홍 주영 – Robotics and Computer Vision Lab

Author: 홍 주영

[Arxiv 2024] Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?

[CVPR 2025] Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models

[CVPR 2025] VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models

[Arxiv 2026] RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval

[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

E5-V, VLM2Vec, VLM2Vec-V2: Generative MLLMs as Embedding Models

[ICLR 2023] CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Alignment

[ECCV 2024] InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

[Arxiv 2026] Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
