Publications

(2025). VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning. NeurIPS 2025.

(2025). Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma? ICCV 2025.

(2025). Mitigating Object Hallucinations via Sentence-Level Early Intervention. ICCV 2025.

(2025). Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs. Preprint.

(2025). Logits-based Finetuning. Preprint.

(2024). Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition. NeurIPS 2025.

(2024). VisionZip: Longer is Better but Not Necessary in Vision Language Models. CVPR 2025.

(2023). Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation. CVPR 2024.

(2023). LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model. Tech Report.

(2023). ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation. ICLR 2024.
