홍 주영 – Page 2 – Robotics and Computer Vision Lab

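홍 주영 on [ICML 2026] Are Object-Centric Representations Better At Compositional Generalization? 07/27/2026
Q1. What is the purpose of the k-means experiment? >> DINOv2+k-means and DINOv2+CA are comparison experiments that check whether DINOSAURv2's advantage comes simply from having reduced the number of tokens. Presumably the reviewers, "just the tokens…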
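이 예은 on [arXiv 2025] UniFGVC: Universal Training-Free Few-Shot Fine-Grained Visual Classification via Attribute-Aware Multimodal Retrieval 07/27/2026
Hello 주영! Thank you for your comment. There were experiments related to your question, but I don't think I managed to include them in the review, so I'd like to explain them here in a comment. Incorrect (discriminative description…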
이 예은 on [arXiv 2025] UniFGVC: Universal Training-Free Few-Shot Fine-Grained Visual Classification via Attribute-Aware Multimodal Retrieval 07/27/2026
Hello 주연, thank you for your comment! I also enjoyed reading this paper, in that it raises classification performance by extracting well-chosen discriminative captions for fine-grained classes without any training…
이 예은 on [arXiv 2025] UniFGVC: Universal Training-Free Few-Shot Fine-Grained Visual Classification via Attribute-Aware Multimodal Retrieval 07/27/2026
Hello 승현, thank you for your comment! Q1) You mentioned that the Category-Discriminative Visual Captioner step selects t samples that are visually similar to the target image; if each class consists of K images…
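홍 주영 on ICML 2026 Attendance Report 07/27/2026
Rather than any single conversation, what stays with me are the moments when authors candidly explained their follow-up work and the limitations of their current methods. More than the results written in the papers, what the authors actually treat as important…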

Author: 홍 주영

[Arxiv 2026] RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval

[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

E5-V, VLM2Vec, VLM2Vec-V2: Generative MLLMs as Embedding Models

[ICLR 2023] CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment

[ECCV 2024] InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

[Arxiv 2026] Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking

[Arxiv 2026] DeepSeek-OCR 2: Visual Causal Flow

[ICCV 2025] Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval

[Arxiv 2026] Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
