X-Review – Page 8 – Robotics and Computer Vision Lab

[arXiv 2026] Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video

Hello. The paper I'm reviewing today is Video-MME-v2. Video-MME is the most widely used dataset in long video understanding. The Video-MME team recently released a new dataset, so I'm reviewing that paper. Introduction Recently…

X-Review

[arXiv 2026] Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned

Hello. The paper I've brought for this review is Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned. As the title suggests, recently in mobile robot navigation, a lot of…

Paper X-Review

[arXiv 2026] Zero-shot World Models Are Developmentally Efficient Learners

Hello, today I've brought a world model paper. It's not just any world model, though, but a Zero-shot World Model, and I was curious in what sense it is zero-shot and where it can be used, so I picked it up…

X-Review

[2026 RA-L] ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

Hello, this is 손우진. The paper I'm reviewing this time integrates a thermal camera into a VLA model to build a temperature-based robot system. Normally, research that uses thermal for 6D pose estimation…

X-Review

[IROS 2025] OpenRoboCare: A Multimodal Multi-Task Expert Demonstration Dataset for Robot Caregiving

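Hello. I'm very interested in care robots and would really, really like to do research in this area. Wondering whether there was any benchmark for caregiving, I looked around and found a paper that built its dataset really well…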

X-Review

[arXiv 2026] Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets

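Multi-agent systems have recently been adopted across a variety of domains. They are used in various ways, such as comparing experts by assigning different personas or dividing up work, and in practice adopting multi-agent systems has led to meaningful…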

Paper X-Review

[CVPR 2026] DIvide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding

Hello. Today's paper is on long video understanding: rather than sampling frames the same way for every query, it applies a suitable frame selection strategy depending on the query type, a method that…

Paper X-Review

[ICLR 2026] PRUNE REDUNDANCY, PRESERVE ESSENCE: VISION TOKEN COMPRESSION IN VLMS VIA SYNERGISTIC IMPORTANCE-DIVERSITY

Hello, the paper I've brought this time is again a token pruning paper for VLMs. It came up in a search with keywords similar to the method I'm analyzing, and I read it to check its idea. Let's get right into the review. Abstract VLMs…

Paper

[CVPR 2026] WANDERLAND: Geometrically Grounded Simulation for Open-World Embodied AI

Hello. The paper I've brought for this review is WANDERLAND: Geometrically Grounded Simulation for Open-World Embodied AI, a CVPR 2026 highlight paper. This paper, in recent embodied AI and visual navigation…

X-Review

[ICML 2026] DECO: Decoupled Multimodal Diffusion Transformer for Bimanual Dexterous Manipulation with a Plugin Tactile Adapter

Hello, for this week's X-review I've brought a tactile-related study. In a project proposal I recently worked on, I wrote that we would add a tactile sensing module to an existing pretrained VLA, and how to do this effectively…

