X-Review – Page 11 – Robotics and Computer Vision Lab

[CoRL 2024] LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos

Hello. The paper I'm reviewing this time is LeLaN: Learning A Language-Conditioned Navigation Policy from In-the-Wild Videos, published at CoRL 2024. The basic concept of this paper is language-conditioned…

Paper X-Review

[arxiv 2025] Solving Spatial Supersensing Without Spatial Supersensing

Hello! I'm Lee Jae-yoon (이재윤), and somehow I've ended up writing my first X-Review. I expected my first X-Review to be on ResNet or Transformer, but since I joined Geun-taek's (근택) paper project this time, Long video understanding…

Paper X-Review

[2025 ICLR] Retrieval Head Mechanistically Explains Long Context Factuality

Hello. The paper I'm introducing this time analyzes, through the model's internal attention heads, how long-context LLMs actually find and use information in long inputs. Let's get right into the review. 1.…

X-Review

[ArXiv 2025] Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Hello, the paper I'm reviewing today is Active Video Perception (AVP). It is a Long Video Understanding study that addresses the shortcomings of existing agentic pipelines. Introduction: Long Video Understanding (LVU) mostly…

Paper X-Review

[Arxiv 2026] BabyVision: Visual Reasoning Beyond Language

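Hello. The paper I've brought this time argues that the visual abilities of recent MLLMs rely heavily on language priors, and it provides a benchmark for evaluating a model's fundamental visual ability. On January 13…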

X-Review

[arXiv 2025] A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

The paper I'm reviewing this time was released in mid-December of last year and applies existing pretrained VLMs as-is to affordance reasoning. It stands out for its very large performance improvement, and other…

X-Review

[Arxiv 2026] Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

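Today I'll review an LLM paper recently released by the DeepSeek AI research team. Since the DeepSeek team drew a great deal of attention around this time last year with its MoE-based model, this paper, released on January 12, also has a lot of…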

X-Review

[CoRL 2022] RECON: Rapid Exploration for Open-World Navigation with Latent Goal Models

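Hello. The paper I'm reviewing this time is RECON: Rapid Exploration Controllers for Outcome-driven Navigation, presented at CoRL in 2022. The material that was only touched on briefly in ViKiNG, which I reviewed a while back, is covered in detail in RECON…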

Paper X-Review

[IJCV 2025] Guiding Audio-Visual Question Answering with Collective Question Reasoning

Hello, I've brought another AVQA-related paper this time. Methodologically, in my current research, how to fuse the outputs of each modality well…

Paper X-Review

[NeurIPS 2025] VideoLucy: Deep Memory Backtracking for Long Video Understanding

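Brief introduction to the paper: This paper presents an agent-based framework for Long Video Understanding. Designing an agent that uses an LLM to find important information in a video or to integrate information and generate an answer…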
