X-Review – Page 7 – Robotics and Computer Vision Lab

[arXiv 2025] Is Diversity All You Need for Scalable Robotic Manipulation?

Hello, this time I'd like to review a study that asks whether data diversity is really always beneficial in robot manipulation learning. The work comes from Agibot, and the authors look at task diversity, multi-embodiment pre-training, expert…

X-Review

[ICLR 2026] CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally

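This paper starts from the question, “Does CLIP really understand inputs only as a bag of words?” It analyzes the causes of CLIP's compositionality failures by separating them into a lack of information inside the encoders and a cross-modal alignment problem. Venue: ICLR 2026…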

Conference X-Review

[CVPR 2025] Video Depth Anything

Intro Monocular depth estimation (MDE) has recently shown strong results thanks to advances in depth foundation models. However, these models still have limitations…

Paper X-Review

[NeurIPS 2025] FastVID: Dynamic Density Pruning for Fast Video Large Language Models

Hello, today I'll be reading a paper on multimodal token compression. Intro Recent Video-LLMs show strong performance in video understanding. However, a video consists of many frames, and each frame…

X-Review

[HRI 2026] Learning Human Preferences over a Human-Robot Collaboration Based on Explicit and Implicit Human Feedback

Hello. This is a preference-aware paper, but unusually it also takes implicit human feedback into account. Let's get started. 1. Introduction As robot hardware and physical manipulation capabilities advance, robots, for human users,…

X-Review

[CoRL 2024] APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs

Today I've brought a preference-aware paper. Preference-aware work is about robots recognizing what a person prefers and reflecting that in their actions. I read it because I find human-robot interaction interesting. Let's begin the review…

Paper X-Review

[ICML 2026] VideoBrain: Learning Adaptive Frame Sampling for Long Video Understanding

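Hello, I've been digging only into SAR lately and things were starting to feel stale, and since I had just registered for the ICML conference, I looked through what papers were there. The phrase "adaptive frame sampling" caught my eye, so I ended up reading this paper…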

X-Review

[ICLR 2026 Workshop] World Action Models are Zero-shot Policies

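Hello, this week I'd like to introduce WAM. In recent research on robot foundation models, work on reducing robot data's reliance on teleoperation, and model architectures for operating in the 3D real world using existing data,…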

X-Review

[CVPR 2026] EgoX: Egocentric Video Generation from a Single Exocentric Video

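Hello, today I've brought a paper that converts third-person video into first-person video, as in the video above. The visual results were impressive, so I was curious how they did it and gave it a read. Introduction Generating first-person-view video…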

Paper X-Review

[ICLR 2026] AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

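Hello, the paper I've brought this time is again a token pruning paper for VLMs. It uses an approach similar to the method I've been analyzing, and rather than giving the impression, as existing methods do, of boosting performance and then forcing a claim, its analysis is clean…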
