Paper – Page 3 – Robotics and Computer Vision Lab

[WACV 2024] Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering

Hello. The paper I'll introduce in today's X-Review is <Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering>, published at WACV 2024. Personally, at the moment, Audio-Visual Question…

Paper X-Review

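[NeurIPS 2024] To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty

Brief introduction: This paper presents a method for quantifying the uncertainty of LLM answers. In particular, it proposes a technique for quantifying knowledge-deficit (epistemic) uncertainty, which directly affects hallucination, and that can be proven mathematically…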

Paper X-Review

[ICRA 2021] ViNG: Learning Open-World Navigation with Visual Goals

Hello. The paper I'm reviewing this time is ViNG: Learning Open-World Navigation with Visual Goals. It was published at ICRA 2021 and deals with Visual Goal-Conditioned Navigation. In fact, the time before last…

Paper X-Review

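[ECCV 2024 Workshops] Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

Hello. Today I've again brought a paper that I read partly to follow up on AVQA-related work. To broaden my thinking about which figures and which experiments will be needed when I later write an AVQA-related paper…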

Paper X-Review

[CVPR 2025] VisionZip: Longer is Better but Not Necessary in Vision Language Models

Hello. Today's X-Review covers VisionZip, a paper published at CVPR 2025. As the title suggests, it concerns vision token efficiency in VLMs, and personally, while the VisionZip method itself is good, this method's…

Paper X-Review

[CVPR 2023] Align and Attend: Multimodal Summarization with Dual Contrastive Losses

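Hello, this is 황찬미. The paper I'll look at today addresses the problem of multimodal summarization in the video summarization task. When a video comes in as input, a single unified model that can summarize both the text and the video, MSMO (Multimodal Summarization…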

Paper X-Review

[ICLR 2018] SEMI-PARAMETRIC TOPOLOGICAL MEMORY FOR NAVIGATION

Hello. The paper I've brought for this review is Semi-Parametric Topological Memory For Navigation, published at ICLR 2018. Although it is an old paper, within navigation, visual navigation, and within that, a geometric map…

Paper X-Review

[NeurIPS 2025] Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

The paper I'll introduce today is the Video-RAG paper presented at NeurIPS 2025. It proposes a RAG technique that improves the understanding of long videos through visually-aligned information…

Paper X-Review

[ArXiv 2025] VLA-0: Building State-of-the-Art VLAs with Zero Modification

This review covers a fresh VLA paper from NVIDIA. As VLA research has become more active recently, the trend is toward methods that change the architecture or use specialized representations. This paper…

Paper X-Review

[ICRA 2025] HeLiOS: Heterogeneous LiDAR Place Recognition via Overlap-based Learning and Local Spherical Transformer

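It has been a while, so to get back into the swing of writing X-Reviews, I'll lightly review a paper I read during my internship. It is a paper called HeLiOS, published at ICRA 2025, and Seoul National University's 김아영…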
