Text-image retrieval (9)
- RzenEmbed: Towards Comprehensive Multimodal Retrieval
- U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
- MetaEmbed: Scaling Multimodal Retrieval at Test-Time with Flexible Late Interaction
- Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
- Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
- CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
- SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression
- FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training
- Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning