07 Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
10 VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning