Publications

You can also find my articles on my Google Scholar profile.

Conference Papers



ReMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning

Ziyu Wan, Yunxiang Li, Xiaoyu Wen, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen
NeurIPS 2025

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Shulin Huang, Linyi Yang, Yan Song, Shuang Chen, Leyang Cui, Ziyu Wan, Qingcheng Zeng, Ying Wen, Kun Shao, Weinan Zhang, Jun Wang, Yue Zhang
NeurIPS 2025 Datasets & Benchmarks Track

Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models

Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang, David Henry Mguni, Jun Wang
ECML-PKDD 2025

Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: the Past, Present, and Future

Yan Song*, He Jiang*, Haifeng Zhang, Zheng Tian, Weinan Zhang, Jun Wang
AAMAS 2024

Journal Articles



An Empirical Study on Google Research Football Multi-Agent Scenarios

Yan Song, He Jiang, Zheng Tian, Haifeng Zhang, Yingping Zhang, Jiangcheng Zhu, Zonghong Dai, Weinan Zhang, Jun Wang
Machine Intelligence Research, Volume 21, pages 549–570 (2024)

Preprints



Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs

Luoyang Sun, Jiwen Jiang, Yifeng Ding, Fengfa Li, Yan Song, Haifeng Zhang, Jian Ying, Lei Ren, Kun Zhan, Wei Chen, Yan Xie, Cheng Deng
arXiv preprint • 2026

Natural Language Reinforcement Learning

Xidong Feng, Bo Liu, Yan Song, Haotian Fu, Ziyu Wan, Girish A. Koushik, Zhiyuan Hu, Mengyue Yang, Ying Wen, Jun Wang
arXiv preprint • 2024

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang
arXiv preprint • 2024