X-Review – Page 4 – Robotics and Computer Vision Lab

[arXiv 2026] SEAR: Simple and Efficient Adaptation of Visual Geometric Transformers for RGB+Thermal 3D Reconstruction

Hello, this is 손우진 (Woojin Son). The paper I'll be reviewing this time is SEAR: Simple and Efficient Adaptation of Visual Geometric Transformers for RGB+Thermal 3D Reconstruction, which registers RGB and thermal images into a shared 3D space. These days…

X-Review

[RO-MAN 2023] Affective Computing for Human-Robot Interaction Research: Four Critical Lessons for the Hitchhiker

Hello. This time I read a paper on how Affective Computing (AC) should be used in Human-Robot Interaction (HRI) research. Put simply, when we make a robot read and respond to a person's emotions or affective state…

Paper X-Review

[CVPR 2026] FINER: MLLMs Hallucinate under Fine-grained Negative Queries

Hello. Today I read FINER, a paper on fine-grained hallucination in MLLMs. It is a CVPR 2026 oral paper, and its motivating question is whether MLLMs really match the visual information in an image against the text at a fine-grained level…

Paper X-Review

[ArXiv 2026] From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

Hello. The paper I've brought for review this time is From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation. To briefly explain just the concept, as the title suggests…

X-Review

[CVPR 2026] POGA: Paraphrased and Oppositional Graph Alignment for Fine-Grained Cross-Modal Retrieval

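Abstract: Most models used to generate embeddings for retrieval were trained for other purposes, so they tend to focus on coarse object categories rather than on the fine-grained attributes of objects. In addition, distinguishing between different descriptions…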

X-Review

[ICRA 2026] VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

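Hello. In this week's X-Review, I'll review work that turns unscripted real-life human videos into VLA pretraining data. Following my previous review of ActiveMimic, this is another study in the same vein of bringing egocentric human video into robot learning…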

X-Review

[NeurIPS 2025] Mitigating Semantic Collapse in Partially Relevant Video Retrieval

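Hello. In this X-Review, I'd like to introduce a paper on a topic that is new for me: the Partially Relevant Video Retrieval (PRVR) problem. This paper tackles the semantic collapse problem that frequently arises in PRVR, in terms of both text embeddings and video embeddings…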

X-Review

[ICML 2026 Oral] Necessary Conditions for Compositional Generalization of Embedding Models

Today's review turned out longer than I expected. This paper tries to answer the question: what embedding structure does compositional generalization require? Venue: ICML 2026 Oral. Authors: Arnas Uselis, Andrea Dittadi, Seong Joon…

X-Review

[arXiv 2026] What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

Intro: Multi-agent systems (MAS) are being actively studied as one way of putting LLMs to use. Recent work has mainly proposed agent roles or tool configurations, but when it comes to communication between agents…

Paper X-Review

[arXiv 2022] Exploring Visual Explanations for Contrastive Language-Image Pre-training

Hello. The paper I've brought this time shows that the similarity maps between CLIP's vision and text embeddings behave opposite to what one would expect, and proposes a fix. Abstract: The authors first consider Contrastive Language-Image Pre-training,…
