X-Review – Page 12 – Robotics and Computer Vision Lab

[CoRL 2025] DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control

Hello, this is 최인하. Getting robots to operate robustly across diverse tasks and environments is an important topic in the robot domain. With the arrival of VLA models, this problem seemed to be solved to some extent. However…

Paper X-Review

[arXiv 2025] WorldVLA: Towards Autoregressive Action World Model

Hello, today I will explain WorldVLA. I have been reading VLA-related papers steadily of late, and I have come to feel that the ability to generalize(?) about the world is very much needed….

X-Review

[arXiv 2026] Probing and Bridging Geometry-Interaction Cues for Affordance Reasoning in Vision Foundation Models

This paper appears to have been submitted to CVPR 2026, but that is hard to confirm at this point. It analyzes the affordance reasoning capability of various VFMs. Abstract The authors…

Paper X-Review

[CVPR 2025] CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Hello, the paper I am reviewing this time is CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos, published at CVPR 2025. Let's get straight into the review. Introduction In dynamic urban environments…

Paper X-Review

[arXiv 2026] Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

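Today's paper deals with an analysis of LLM evaluation and the method used for that analysis. Typical benchmarks evaluate on the basis of accuracy. However, whether the LLM actually has no knowledge of that information (empty…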

Conference X-Review

[ECCV 2024] Self-Supervised Any-Point Tracking by Contrastive Random Walks

Intro The task this paper targets is Tracking Any Point (TAP); in TAP-Vid: A Benchmark for Tracking Any Point in a Video, a paper written by DeepMind, it was first…

X-Review

[ECCV 2024] InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

There was a model called InternVideo that reigned as the foundation model of the video field. A review of that paper was written up in 2023 by researcher 임근택 in a clear, easy-to-read form: [InternVideo…

X-Review

[CVPR 2023] Shape-Constraint Recurrent Flow for 6D Object Pose Estimation

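Hello, this is 손우진. The paper I am introducing today covers 6D pose estimation from a single RGB image. A single RGB image carries no depth information, so recovering the 6D information is not easy. Also, 6D…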

X-Review

[ICRA 2025] Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs

Hello. In this X-Review I will cover a paper that aims to discover attributes from a robot's perspective. Like CaP and VoxPoser, it uses an approach in which the LLM directly generates code that calls APIs hierarchically, and this…

Paper X-Review

[arXiv 2026] Agentic Very Long Video Understanding

Hello. The paper I am reviewing this time deals with long video understanding: not "long" as in roughly one hour, but "very long"!! VU of up to about 50 hours. Let's begin the review. Intro In this paper, “very long…
