Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Pages

Posts

portfolio

publications

An empirical study on Google Research Football multi-agent scenarios

Published in Machine Intelligence Research, Volume 21, pages 549–570, 2024

Yan Song, He Jiang, Zheng Tian, Haifeng Zhang, Yingping Zhang, Jiangcheng Zhu, Zonghong Dai, Weinan Zhang & Jun Wang

Recommended citation: Song, Y., Jiang, H., Tian, Z., Zhang, H., Zhang, Y., Zhu, J., Dai, Z., Zhang, W., & Wang, J. (2024). An empirical study on Google Research Football multi-agent scenarios. Machine Intelligence Research, 21, 549–570.
Download Paper

Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: the Past, Present, and Future

Published in AAMAS 2024, 2024

Yan Song*, He Jiang*, Haifeng Zhang, Zheng Tian, Weinan Zhang, Jun Wang

Recommended citation: Song, Y., Jiang, H., Zhang, H., Tian, Z., Zhang, W., & Wang, J. (2024). Boosting Studies of Multi-Agent Reinforcement Learning on Google Research Football Environment: the Past, Present, and Future. Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
Download Paper

AI-Olympics: Exploring the Generalization of Agents through Open Competitions

Published in IJCAI Demo, 2024

Chen Wang, Yan Song, Shuai Wu, Sa Wu, Ruizhi Zhang, Shu Lin, Haifeng Zhang

Recommended citation: Wang, C., Song, Y., Wu, S., Wu, S., Zhang, R., Lin, S., & Zhang, H. (2024). AI-Olympics: Exploring the Generalization of Agents through Open Competitions. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) Demo Track.
Download Paper

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Released as an arXiv preprint, 2024

Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang

Recommended citation: Wang, J., Fang, M., Wan, Z., Wen, M., Zhu, J., Liu, A., Gong, Z., Song, Y., Chen, L., Ni, L. M., Yang, L., Wen, Y., & Zhang, W. (2024). OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models. arXiv preprint arXiv:2410.09671.
Download Paper

Natural Language Reinforcement Learning

Submitted as an arXiv preprint, 2024

Xidong Feng, Bo Liu, Yan Song, Haotian Fu, Ziyu Wan, Girish A. Koushik, Zhiyuan Hu, Mengyue Yang, Ying Wen, Jun Wang

Recommended citation: Feng, X., Liu, B., Song, Y., Fu, H., Wan, Z., Koushik, G. A., Hu, Z., Yang, M., Wen, Y., & Wang, J. (2024). Natural Language Reinforcement Learning. arXiv preprint arXiv:2411.14251.
Download Paper

Efficient Reinforcement Learning with Large Language Model Priors

Published in ICLR 2025, 2025

Xue Yan, Yan Song, Xidong Feng, Mengyue Yang, Haifeng Zhang, Haitham Bou Ammar, Jun Wang

Recommended citation: Yan, X., Song, Y., Feng, X., Yang, M., Zhang, H., Bou Ammar, H., & Wang, J. (2025). Efficient Reinforcement Learning with Large Language Model Priors. International Conference on Learning Representations (ICLR).
Download Paper

Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models

Published in ECML-PKDD 2025, 2025

Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang, David Henry Mguni, Jun Wang

Recommended citation: Yan, X., Song, Y., Cui, X., Christianos, F., Zhang, H., Mguni, D. H., & Wang, J. (2025). Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD).
Download Paper

ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning

Published in NeurIPS 2025 Datasets & Benchmarks Track, 2025

Shulin Huang, Linyi Yang, Yan Song, Shuang Chen, Leyang Cui, Ziyu Wan, Qingcheng Zeng, Ying Wen, Kun Shao, Weinan Zhang, Jun Wang, Yue Zhang

Recommended citation: Huang, S., Yang, L., Song, Y., Chen, S., Cui, L., Wan, Z., Zeng, Q., Wen, Y., Shao, K., Zhang, W., Wang, J., & Zhang, Y. (2025). ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning. Advances in Neural Information Processing Systems (NeurIPS) Datasets & Benchmarks Track.
Download Paper

REMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning

Published in NeurIPS 2025, 2025

Ziyu Wan, Yunxiang Li, Xiaoyu Wen, Yan Song, Hanjing Wang, Linyi Yang, Mark Schmidt, Jun Wang, Weinan Zhang, Shuyue Hu, Ying Wen

Recommended citation: Wan, Z., Li, Y., Wen, X., Song, Y., Wang, H., Yang, L., Schmidt, M., Wang, J., Zhang, W., Hu, S., & Wen, Y. (2025). REMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning. Advances in Neural Information Processing Systems (NeurIPS).
Download Paper

Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs

Released as an arXiv preprint, 2026

Luoyang Sun, Jiwen Jiang, Yifeng Ding, Fengfa Li, Yan Song, Haifeng Zhang, Jian Ying, Lei Ren, Kun Zhan, Wei Chen, Yan Xie, Cheng Deng

Recommended citation: Sun, L., Jiang, J., Ding, Y., Li, F., Song, Y., Zhang, H., Ying, J., Ren, L., Zhan, K., Chen, W., Xie, Y., & Deng, C. (2026). Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs. arXiv preprint arXiv:2602.10377.
Download Paper

talks

teaching