I’m a Ph.D. student at the New Laboratory of Pattern Recognition (NLPR), University of Chinese Academy of Sciences, advised by Prof. Yan Huang. I strongly believe in the power of interdisciplinary collaboration and its potential to drive impactful research. If you are interested in partnering on research projects or in offering internship or exchange opportunities, I would be thrilled to connect with you.

My research interests cover Multimodal Large Language Models.

🔥 News

  • 2026.01: 🎉🎉 One paper on Browser Agent was accepted by TMLR!
  • 2025.11: 🎉🎉 Two papers on Multi-View Clustering and Deepfake were accepted by AAAI 2026!
  • 2025.07: 🎉🎉 One technical report on Kwai Keye-VL was released!
  • 2025.05: 🎉🎉 One paper on DPO (Direct Preference Optimization) was accepted by ICML 2025!
  • 2025.02: 🎉🎉 One paper on GUI Agent was accepted by CVPR 2025!
  • 2024.06: 🎉🎉 One paper on Knowledge Editing Benchmark was accepted by NeurIPS 2024 Datasets and Benchmarks Track!
  • 2023.08: 🎉🎉 One paper on Mobile Agent was accepted by MobiCom 2024 Summer Round!

šŸ“ Publications

arXiv

PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG

Tao Yu, Minghui Zhang, Zhiqing Cui, Hao Wang, Zhongtian Luo, Shenghua Chai, Junhao Gong, Yuzhao Peng, Yuxuan Zhou, Yujia Yang, Zhenghao Zhang, Haopeng Jin, Xinming Wang, Yufei Xiong, Jiabing Yang, Jiahao Yuan, Hanqing Wang, Hongzhu Yi, YiFan Zhang, Yan Huang, Liang Wang

arXiv

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Tao Yu, Haopeng Jin, Hao Wang, Shenghua Chai, Yujia Yang, Junhao Gong, Jiaming Guo, Minghui Zhang, Xinlong Chen, Zhenghao Zhang, Yuxuan Zhou, Yufei Xiong, Shanbin Zhang, Jiabing Yang, Hongzhu Yi, Xinming Wang, Cheng Zhong, Xiao Ma, Zhang Zhang, Yan Huang, Liang Wang

TMLR

BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions

Tao Yu, Zhengbo Zhang, Zhiheng Lyu, Junhao Gong, Hongzhu Yi, Xinming Wang, Yuxuan Zhou, Jiabing Yang, Ping Nie, Yan Huang, Wenhu Chen

arXiv

Aligning Multimodal LLM with Human Preference: A Survey

Tao Yu, Yi-Fan Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, Kun Wang, Xingyu Lu, Yunhang Shen, Guibin Zhang, Dingjie Song, Yibo Yan, Tianlong Xu, Qingsong Wen, Zhang Zhang, Yan Huang, Liang Wang, Tieniu Tan

Others

🎖 Honors and Awards

  • 2024.12, National Scholarship (0.4%).
  • 2023.12, National Encouragement Scholarship (0.4%).
  • 2022.12, National Scholarship (0.4%).
  • 2022.06, First Grade Scholarship (0.4%).

📖 Education

  • 2025.09 - Current, Ph.D. Student in Pattern Recognition and Intelligent Systems (Institute of Automation, Chinese Academy of Sciences)
  • 2021.09 - 2025.06, Bachelor's degree in Computer Science and Technology (School of Computer Science and Technology, Harbin Institute of Technology), GPA: 93.79/100 (Ranking: 2/135)

💻 Internships

  • 2025.01 - 2025.07, Multimodal Understanding and Application Group, Kuaishou Technology, China.
  • 2024.03 - 2024.06, New Laboratory of Pattern Recognition (NLPR), Institute of Automation, China.
  • 2023.05 - 2024.07, Institute for AI Industry Research (AIR), Tsinghua University, China.