Career
Internship experience.
Tencent WXG-LLM (QINGYUN Program)
2025.05 – Present
Multimodal Large Model Algorithm Intern
1. Responsible for WeChat's OCR content understanding project based on multimodal large models, building data pipelines and training frameworks for the OCR initiative.
2. Led the Wedoc project, using small-scale VLMs to achieve document parsing performance and scenario generalization that surpass traditional models.
ByteDance Doubao Team (Seed-VLM)
2024.12 – 2025.05
Multimodal Interaction and World Model Intern
1. Participated in the chart-text understanding project, using multimodal chain-of-thought interaction data to enhance the model's ability to understand complex charts.
2. Participated in building the Doubao-Evals platform, responsible for constructing evaluations for the VLM's video understanding, 3D vision, and content perception initiatives.
International Digital Economy Academy (IDEA)
2023.08 – 2024.11
Large Model Algorithm Research Intern
1. Proposed ChartBench, the first dataset focused on evaluating how well multimodal large models understand charts that lack data point annotations.
2. Proposed ChartMoE, which replaces the linear layer between the ViT and the LLM with a multi-expert architecture, demonstrating excellent performance and interpretability.
International Digital Economy Academy (IDEA)
2022.09 – 2023.08
Data Algorithm Research Intern
1. Proposed ECL, a collaborative learning framework that integrates contrastive learning proxy tasks, and demonstrated its effectiveness on four datasets.
2. Proposed HLC, a hierarchical reasoning architecture that expands label information based on multimodal data to improve the reasoning performance of vision-language models.
Tencent TEG AI Lab
2021.07 – 2022.07
Virtual Human 3D Reconstruction Algorithm Intern
1. Proposed REALY, an evaluation method and benchmark for 3D face reconstruction based on local region alignment.
2. Proposed FFHQ-UV, a standardized framework that obtains high-quality texture datasets by correcting poses and removing lighting and expressions from 2D face images.