💻 Internship
- 2025.06 - Present, Tongyi Laboratory, Summer Intern
  Topic: RL + LLM Agent
- 2024.07 - 2025.06, Taobao & Tmall Group of Alibaba, Research Intern
  Topic: LLM Agent Planning
  Contribution: We investigate how effective pseudocode-style plans, which are more concise and structured than natural-language (NL) plans, are for agent reasoning. Building on two planning-oriented rewards that we design, we further introduce PGPO, a preference optimization method that strengthens the reasoning capabilities of LLM agents.
🎓 Education
- 2023.09 - 2026.03 (expected), M.S.@SJTU, Computer Science and Technology, Shanghai, China.
- 2019.09 - 2023.06, B.S.@SEU, Computer Science and Technology, Nanjing, China.
🌲 Service
- Reviewer: ACL Rolling Review, AAAI
- Student Works
- Volunteers