💻 Internship

  • 2025.06 - Present, Tongyi Laboratory, Summer Intern

    Topic: RL + LLM Agent

  • 2024.07 - 2025.06, Taobao & Tmall Group of Alibaba, Research Intern

    Topic: LLM Agent Planning

    Contribution: We investigate how effective pseudocode-style plans, which are more concise and structured than natural-language (NL) plans, are for agent reasoning. Building on two planning-oriented rewards that we design, we further introduce PGPO, a preference optimization method that improves the reasoning capabilities of LLM agents.

🎓 Education

  • 2023.09 - 2026.03 (expected), M.S.@SJTU, Computer Science and Technology, Shanghai, China.
  • 2019.09 - 2023.06, B.S.@SEU, Computer Science and Technology, Nanjing, China.

🌲 Service

  • Reviewers: ACL Rolling Review, AAAI
  • Student Works
  • Volunteers