1

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition