👋 About Me

I am a first-year M.S. student at the Gaoling School of Artificial Intelligence (GSAI), Renmin University of China (RUC), advised by Prof. Xin Zhao. My research interests include Natural Language Processing (NLP), Large Language Models (LLMs), and Agents, with a particular focus on improving models’ planning, tool use, and long-horizon reasoning in complex real-world tasks.

My email is sunshuang@ruc.edu.cn.


🔍 Research Interests

My research centers on LLMs, World Models, and Agents. I am especially interested in systematically strengthening the foundational capabilities of LLMs, and in improving their generalization and practical utility in complex real-world settings through LLM-based feedback simulation and tool-augmented agents.

  • Enhancing foundational LLM capabilities: I study how to combine continued pre-training (CPT), supervised fine-tuning (SFT), reinforcement learning (RL), and test-time scaling (TTS) to expand models’ knowledge boundaries, with an emphasis on more effective data construction and training strategies.
  • World model learning: I investigate learnable “surrogate environments” that simulate execution and interaction feedback. These approximate real environment dynamics at low cost, reduce reliance on heavyweight execution stacks (e.g., containers), and improve scalability.
  • Agent applications and tool use: I aim to enhance LLMs’ ability to use tools (e.g., web search, code tools, and command-line operations) in realistic workflows, enabling them to solve long-horizon real-world tasks.

🔥 News

  • [2026-02-04] We released SWE-Master and SWE-World, aiming to lower the barrier to training code agents and to democratize SWE agent research.
    • SWE-Master: an open, end-to-end post-training pipeline for SWE agents, along with practical enhancements such as Language Server Protocol (LSP)-based code navigation.
    • SWE-World: an LLM-based surrogate environment that simulates environment feedback, breaking the traditional dependence on heavyweight SWE environments (e.g., Docker) and enabling the first end-to-end Docker-free training framework.
  • [2025-05-22] We released SimpleDeepSearcher: by synthesizing and filtering a small set of high-quality samples for SFT, we substantially improve deep information-seeking capability and outperform contemporary RL-based methods.

  • [2025-03-06] We released YuLan-Mini-Instruct: a compact yet strong 2.4B instruction-tuned model, post-trained from the YuLan-Mini base model. Trained efficiently on both open and synthetic data, it achieves competitive performance against mainstream small models such as Qwen2.5-1.5B-Instruct and LLaMA-3.2-3B-Instruct.

📝 Publications

arXiv 2026

SWE-World: Building Software Engineering Agents in Docker-Free Environments

Shuang Sun*, Huatong Song*, Lisheng Huang*, Jinhao Jiang*, Ran Le, Zhihao Lv, Zongchao Chen, Yiwen Hu, Wenyang Luo, Wayne Xin Zhao†, Yang Song†, Hongteng Xu, Tao Zhang, Ji-Rong Wen
(* Equal contribution; † Corresponding)

  • We propose SWE-World, a Docker-free framework that simulates execution feedback with LLMs, enabling end-to-end training and inference for SWE agents (SFT/RL/TTS) without relying on Docker, thereby substantially lowering the infrastructure barrier.

Paper Code 🤗HuggingFace WeChat

arXiv 2026

SWE-Master: A Fully Open, End-to-End Post-Training Pipeline for Software Engineering Agents

Huatong Song*, Lisheng Huang*, Shuang Sun*, Jinhao Jiang*, Ran Le, Daixuan Cheng, Guoxin Chen, Yiwen Hu, Zongchao Chen, Wayne Xin Zhao†, Yang Song†, Tao Zhang, Ji-Rong Wen
(* Equal contribution; † Corresponding)

  • We open-source a fully reproducible end-to-end post-training pipeline for SWE agents (data → long-horizon SFT → RL → TTS), and introduce LSP-based structured code navigation to improve interaction efficiency.

Paper Code 🤗HuggingFace WeChat

EMNLP Findings 2025

SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis

Shuang Sun*, Huatong Song*, Yuhao Wang, Ruiyang Ren, Jinhao Jiang, Junjie Zhang, Fei Bai, Jia Deng, Wayne Xin Zhao†, Zheng Liu†, Lei Fang†, Zhongyuan Wang, Ji-Rong Wen
(* Equal contribution; † Corresponding)

  • We propose a framework that synthesizes reasoning trajectories through real web search and curates them with multiple criteria, and show that SFT on only 871 high-quality samples substantially improves deep information-seeking capability, outperforming contemporary RL-based approaches.

Paper Code 🤗HuggingFace WeChat

🎖 Honors

  • 2025 Outstanding Graduate, Northeastern University (Top 0.9%)
  • 2023 Huawei Scholarship (Top 2.3%)
  • 2022 National Scholarship (Top 0.2%)

📖 Education

  • Sep. 2025 – Present M.S. student, Gaoling School of Artificial Intelligence, Renmin University of China
  • Sep. 2021 – Jun. 2025 B.E., School of Computer Science and Engineering, Northeastern University

💻 Internships

  • Oct. 2025 – Present Nanbeige LLM Lab, Boss Zhipin