X-Review – Page 19 – Robotics and Computer Vision Lab

[arXiv 2025] IGen: Scalable Data Generation for Robot Learning from Open-World Images

안녕하세요, 이번주는 로봇을 위한 합성데이터 생성 방법론을 제안한 논문을 리뷰해보려고 합니다. 최근의 비디오 생성 모델에 대항해 VFM, VLM 등을 활용해 비디오 생성 모델 만큼 확장성…

[ECCV 2024]Thermal3D-GS :Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis

안녕하세요 손우진입니다. 오늘은 제가 지금 껏 리뷰했던 6D pose estimation 분야가 아닌 graphics 분야의 논문을 들고왔습니다. 이번년도 연구 타이틀은 Multispectral 통해 object perception 과 6D…

Paper X-Review

[NIPS 2025] Don’t Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention

이번 주 X-Review에선 25년도 NeurIPS에 게재된 논문 <Don’t Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention>을 소개해드리겠습니다. 88.9%의 pruning ratio에도 기존 성능의…

X-Review

[NIPS 2017]Attention Is All You Need

안녕하세요 최인하입니다. 오늘은 예전부터 리뷰하고 싶었던 Attention Is All You Need 논문을 리뷰해 보려고 합니다. 기존 자연어 처리 모델들이 attention으로 Encoder와 Decoder가 연결되어있는 구조로 좋은…

Paper X-Review

[RA-L2025] VL-TGS: Trajectory Generation and Selection Using Vision Language Models in Mapless Outdoor Environments

본 논문은 지도 없는 야외 환경에서 로봇이 사람 중심(Human-centered)의 주행을 수행할 수 있도록 새로운 알고리즘을 제안하는 논문입니다. Intro 야외 환경은 공사 현장이나 계절 변화 등…

Paper X-Review

[2024 ECCV] VideoAgent: Long-form Video Understanding with Large Language Model as Agent

안녕하세요. 이번에 소개할 논문은 Long-form Video Understanding 태스크 논문이며 긴 영상을 처리하는 방식을 인간이 비디오를 이해하는 흐름을 모사해 방법론을 제안합니다. 저자는 이를 위해 VideoAgent라는 에이전트…

Conference X-Review

[CVPR 2025]Compositional Caching for Training-free Open-vocabulary Attribute Detection

제가 이번에 리뷰할 논문은 속성을 활용하여 물체를 인지하는 Attribute detection이라는 연구입니다. 제가 담당하고 있는 파지 과제에서 속성정보를 활용하여 유의미한 물체를 인식하는 연구를 진행하고있는데, 서베이를 하다…

Paper X-Review

[CVPR 2025] What’s in the Image? A Deep-Dive into the Vision of Vision Language Models

안녕하세요. 새해 첫 엑스리뷰로는 기존에 읽어왔던 AVQA 관련 논문보단 VLM 에 관련된 논문을 들고왔습니다. 뭔가 한 태스크에 시야가 갇히는 느낌이 없지않아 있어서, 좀 다른 시야를…

Paper X-Review

[arXiv 2025]OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation

안녕하세요. 이번에 리뷰할 논문은 OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation입니다. 2025년 9–10월쯤 아카이브에 올라온 논문인데, 읽어보니 현재 연구실에서 돌리고 있는 모바일 플랫폼에도 적용…

Paper X-Review

[ICRA 2023] Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs

안녕하세요, 허재연 입니다. 오늘 리뷰할 논문은 ICRA 2023에 게재된 논문으로, 인접 프레임 간의 관계 변화를 포착하는 데 어려움을 겪는 기존 모델들의 한계를 극복하기 위해 Cross-Modality…

Category: X-Review

[arXiv 2025] IGen: Scalable Data Generation for Robot Learning from Open-World Images

[ECCV 2024]Thermal3D-GS :Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis

[NIPS 2025] Don’t Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention

[NIPS 2017]Attention Is All You Need

[RA-L2025] VL-TGS: Trajectory Generation and Selection Using Vision Language Models in Mapless Outdoor Environments

[2024 ECCV] VideoAgent: Long-form Video Understanding with Large Language Model as Agent

[CVPR 2025]Compositional Caching for Training-free Open-vocabulary Attribute Detection

[CVPR 2025] What’s in the Image? A Deep-Dive into the Vision of Vision Language Models

[arXiv 2025]OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation

[ICRA 2023] Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs

Conference Deadline

NEW POST

New Comment