Xiyuan Yang (杨希渊)

About Me

Xiyuan Yang (杨希渊) is currently a sophomore undergraduate in the School of Artificial Intelligence at Shanghai Jiao Tong University (SJTU-SAI). In his freshman year, he ranked 2nd of 62 students by GPA and received five scholarships, including the National Scholarship. He is a member of the MAGIC research group at SJTU, supervised by Prof. Siheng Chen.

My research interest lies in continuously pushing the boundaries of what language intelligence can do, enabling it to accomplish increasingly complex and valuable tasks. I focus on two key directions. The first is agentic harness construction: building workflows that maximize the potential of base language models through agentic tool calling and agentic memory. The second is agentic training-data construction: synthesizing high-quality agent trajectories and feeding them back into the model's mid-training and post-training stages, so that the model acquires these agentic capabilities intrinsically.

Topics I'm currently interested in:

Self-Evolving Agents for Autonomous LLM Post-Training
Agentic Trajectory Data Construction
Agentic Tool Calling Benchmarks

Download my CV

Education

Shanghai Jiao Tong University

B.S. in Artificial Intelligence, School of Artificial Intelligence

2024 - 2028

GPA: 4.1/4.3 (Ranked 2 out of 62)

Score: 94.0/100

Scholarships: National Scholarship (top 3%), "Han Ying Ju Hua" Scholarship (15 recipients per year), Zhiyuan Honor Scholarship (top 50%), SJTU Undergraduate Excellence Scholarship

Selected Top-Graded Courses:

Comprehensive Programming Practice: 100/100
Probability and Statistics (Honor): 100/100
Algorithm Design and Analysis: 100/100
Numerical Analysis: 100/100
Linear Algebra (Honor): 98/100
Fundamentals of Programming (Honor): 98/100
Fundamentals of ML, DL and RL: 98/100

Experience

MAGIC Lab

Undergraduate Research Assistant

Advisor: Prof. Siheng Chen

2024 - Present

Publications & Tech Projects

DataMaster: Data-Centric Autonomous AI Research

Yaxin Du*, Xiyuan Yang*, Zhifan Zhou, Wanxu Liu, Zixing Lei, Zimeng Chen, Fenyi Liu, Haotian Wu, Yuzhu Cai, Zexi Liu, Xinyu Zhu, Wenhao Wang, Linfeng Zhang, Chen Qian, Siheng Chen

arXiv / Code / RedNote

We introduce DataMaster, a data-agent framework for task-conditioned autonomous data engineering. Given a fixed learning algorithm, DataMaster improves downstream performance by searching for external data, selecting and composing datasets, and applying cleaning or transformation. Its DataTree, shared Data Pool, and Global Memory coordinate branch exploration and the reuse of evidence across branches. DataMaster improves the medal rate on MLE-Bench Lite by 32.27% and surpasses the instruct model on GPQA.

ICML 2026 Regular

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

Wenhao Wang*, Peizhi Niu*, Gongyi Zou*, Xiyuan Yang*, Jingxing Wang*, Haoting Shi, Yaxin Du, Jingyi Chai, Xianghe Pang, Shuo Tang, Yanfeng Wang, Siheng Chen

arXiv / Code

We introduce MCP-Persona, a benchmark for evaluating LLM agents on personalized MCP tools in realistic social and productivity environments. It simulates account- and database-grounded tasks across Reddit, Xiaohongshu, Lark, and Slack, exposing challenges that go beyond generic information seeking. Experiments with state-of-the-art agents reveal substantial gaps in personalized tool use, establishing MCP-Persona as a practical testbed for personal-application agents.

ICLR 2026 Poster

InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents

Yaxin Du, Yuanshuo Zhang, Xiyuan Yang, Yifan Zhou, Cheng Wang, Gongyi Zou, Xianghe Pang, Wenhao Wang, Menglan Chen, Shuo Tang, Zhiyu Li, Feiyu Xiong, Siheng Chen

arXiv / Dataset / Code / Project

We introduce InfoMosaic-Bench, the first benchmark dedicated to multi-source information seeking in tool-augmented agents. Covering 6 representative domains (medicine, finance, maps, video, web, and multi-domain integration), InfoMosaic-Bench requires agents to combine general-purpose search with domain-specific tools. Tasks are synthesized with InfoMosaic-Flow, a scalable pipeline that grounds task conditions in verified tool outputs, enforces cross-source dependencies, and filters out shortcut cases solvable by trivial lookup.

AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent

Jingru Fan, Yufan Dang, Jingyao Wu, Huatao Li, Runde Yang, Xiyuan Yang, Yuheng Wang, Chen Qian

arXiv / Code / Models

We introduce AppCopilot, a multimodal, multi-agent mobile agent designed for seamless cross-app operation. It implements a complete end-to-end pipeline encompassing data collection, model training, fine-tuning, efficient inference, and deployment across PC and mobile platforms. At the model level, it integrates multimodal foundation models with robust bilingual (Chinese-English) support. The reasoning and control layer employs chain-of-thought reasoning, hierarchical task decomposition, and multi-agent collaboration.

Open Source Projects

Active GitHub contributor with 30+ open-source repositories and 1,500+ commits.

Open-Source Projects

IntelliSearch V3.1: Unifying Search, Empowering Action for tool-calling autonomous agents.

SAI Community: The first open-source forum for SAIers (SJTU-SAI students), covering courses, careers, and the future.

Technical Blog: 120+ articles on CS and AI, 450k+ words.

LLM Infra Docs: Analysis of the core source code of large-scale open-source LLM infrastructure.

Agent Codebase: Generalized codebase for agentic scientific research pipelines.

PaperFlow: Never use Overleaf again!

SyncFlash: Lightning-fast configuration of your server.

Course Labs

LLM Reasoning: Enhancing reasoning abilities for LLMs using reinforcement learning.

Clustering: Clustering of high-dimensional data with intrinsic low-rank structure.

Image Scaling: 2D image scaling based on classical interpolation algorithms.

CTR-Press: Compress-then-Refine for training-free KV cache compression.

L-PHYM: Long-horizon language-driven physics-based motion control.

Data Structure: Course notes and source code for data structures.

CSAPP: Computer Systems: A Programmer's Perspective (labs and notes).

LLM: Learning notes for Stanford CS336 (Language Modeling from Scratch).

Technical Blogs

Maintainer of Xiyuan Yang's Technical Blog. I regularly publish technical content on computer science and AI. To date, I have written more than 120 articles totaling over 450,000 words.

GitHub Repository / Blog Website