Career

Internships I have completed.

Tencent WXG-LLM (QINGYUN Program)

2025.05 – Present

Multimodal Large Model Algorithm Intern

1. Responsible for WeChat's OCR content-understanding project based on multimodal large models, building the data pipelines and training frameworks for the OCR initiative.
2. Led the Wedoc project, using small-scale VLMs to achieve document-parsing performance and scenario generalization that surpass traditional models.

Multimodal Large Model, OCR, VLM

ByteDance Doubao Team (Seed-VLM)

2024.12 – 2025.05

Multimodal Interaction and World Model Intern

1. Participated in the chart-text understanding project, using multimodal chain-of-thought interaction data to improve the model's understanding of complex charts.
2. Helped build the Doubao-Evals platform, responsible for constructing the VLM evaluations for video understanding, 3D vision, and content perception.

Multimodal Interaction, Model Evaluation, Chart Understanding

International Digital Economy Academy (IDEA)

2023.08 – 2024.11

Large Model Algorithm Research Intern

1. Proposed ChartBench, the first dataset focused on evaluating how well multimodal large models understand charts that lack data-point annotations.
2. Proposed ChartMoE, which replaces the linear connector between the ViT and the LLM with a multi-expert (mixture-of-experts) architecture, achieving strong performance and interpretability.

Dataset Construction, Multi-Expert Architecture, Model Research

International Digital Economy Academy (IDEA)

2022.09 – 2023.08

Data Algorithm Research Intern

1. Proposed ECL, a collaborative learning framework that integrates contrastive-learning proxy tasks, and demonstrated its effectiveness on four datasets.
2. Proposed HLC, a hierarchical reasoning architecture that uses multimodal data to expand label information, improving the reasoning performance of vision-language models.

Collaborative Learning, Hierarchical Reasoning, Vision-Language Model

Tencent TEG AI Lab

2021.07 – 2022.07

Virtual Human 3D Reconstruction Algorithm Intern

1. Proposed REALY, a benchmark and evaluation method for 3D face reconstruction based on local region alignment.
2. Proposed FFHQ-UV, a standardized framework that builds a high-quality face texture dataset by correcting pose and removing lighting and expression from 2D face images.

3D Reconstruction, Face Image Processing, Evaluation Benchmark