X-Review – Page 10 – Robotics and Computer Vision Lab

[RSS 2025] NaVILA: Legged Robot Vision-Language-Action Model for Navigation

Hello. The paper I am reviewing today is NaVILA: Legged Robot Vision-Language-Action Model for Navigation. Several follow-up papers frequently cite it or use it as a baseline, so I figured I should read it and looked it up…

X-Review

[ICCV 2025] Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval

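1. Introduction Text-Video Retrieval is the task of retrieving the video that corresponds to a given text, or the text that corresponds to a given video. Until now, dual-encoder architectures based on CLIP or BERT were mainly used; they were computationally efficient, but…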

X-Review

[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Hello, this is Choi In-ha (최인하). This time I will review Diffusion Policy. It is a paper I kept meaning to get to, but it needed some background to understand, so it took me a long time. I don't think I fully understand it yet, but…

X-Review

[arXiv 2025] Motus: A Unified Latent Action World Model

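I came across the paper for this review after finishing work on my own paper, while searching the literature on my next research topics, Long-horizon Task and Failure Detection, and was drawn in by its provocative title. Latent Action, World Model…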

Paper X-Review

[CVPR 2023] R2Former: Unified Retrieval and Reranking Transformer for Place Recognition

Hello. This is my first X-review. Let's get right to it. 1. Introduction In VPR, retrieval is usually carried out in two stages: global retrieval first, followed by reranking. Papers so far first use global retrieval to get the top N images…

Paper X-Review

[IROS 2025] GSPR: Multimodal Place Recognition using 3D Gaussian Splatting for Autonomous Driving

This paper applies 3D Gaussian Splatting, a recently popular topic, to the field of Place Recognition (PR). Whereas existing PR models focused on abstract fusion at the feature level, this paper…

Paper X-Review

[RSS 2023] Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Hello. Last week I reviewed SmolVLA, one of the representative VLA models. The baseline mentioned in that paper was the Vision Action (VA) based ACT, which caught my interest, so…

Paper X-Review

[arXiv 2025] VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation

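Hello, this is Heo Jae-yeon (허재연). Today I have brought another Video Scene Graph Generation paper. Judging from its format and release date, I suspect it was submitted to CVPR 2026. Unlike the methods so far, the information from a VLM…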

Paper X-Review

[arXiv 2025] GigaWorld-0: World Models as Data Engine to Empower Embodied AI – Part 1… GigaWorld-0-Video

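The paper for this review was published by GigaAI, which has recently been producing notable results in robot learning. The method that I consider the most impressive of the research GigaAI is carrying out…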

X-Review

[arXiv 2026] Sim2real Image Translation Enables Viewpoint Robust Policies from Fixed-Camera Datasets

Hello. In this week's X-review, I will review a paper that addresses the lack of robustness of VLAs to viewpoint changes by using sim data, while effectively reducing the sim2real gap that arises in the process…
