ReMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning
Wan, Z., Li, Y., Wen, X., Song, Y., Wang, H., Yang, L., Schmidt, M., Wang, J., Zhang, W., Hu, S., & Wen, Y. (2025). ReMA: Learning to Meta-Think for LLMs with Multi-Agent Reinforcement Learning. Advances in Neural Information Processing Systems (NeurIPS).