Paper – Page 3 – Robotics and Computer Vision Lab

[TMLR 2026] A Survey of Token Compression for Efficient Multimodal Large Language Models (1)

Hello. In today's X-Review, I'd like to introduce a survey paper on token compression for images, video, and audio in MLLMs. After submitting a paper on the Audio-Visual Question Answering task last week, before graduating, VLMs…

Paper X-Review

[RA-L 2022] Socially CompliAnt Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation

Hello. The paper I'll review this time is a dataset paper published in RA-L in 2022, titled Socially CompliAnt Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation. Let's go straight into the review…

Paper X-Review

[AAAI 2026] VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

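Hello! The paper I'm introducing this time proposes the Chain-of-Shot framework (VideoChat-A1), a progressive, shot-level reasoning approach, to address how hard it is to understand long videos effectively in Long Video Understanding. In this paper, existing…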

Paper X-Review

[arXiv 2025] LongVideoAgent: Multi-Agent Reasoning with Long Videos

Why was it proposed? "Crucially, most prior systems are non-agentic models: they process a static, pre-encoded or down-sampled video." In other words, prior work performed its analysis on video that had been encoded (or down-sampled) in advance. Such…

Paper X-Review

[arXiv 2025] LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

Hello. The paper I'll review this time is LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry, which was posted on arXiv two months ago. Until now, navigation papers based on image goals and language prompts…

Paper X-Review

[CVPR 2025] Apollo: An Exploration of Video Understanding in Large Multimodal Models

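Hello, my third X-Review covers a paper called Apollo. Because it points out the problems in video-LLM research to date (as of the paper's writing) and then proposes the authors' own model, even readers unfamiliar with the LVU task should find it fairly (?) interesting…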

Paper X-Review

[NeurIPS 2023] DAC-DETR: Divide the Attention Layers and Conquer

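Hello, this is 허재연 (Jaeyeon Heo). Today I've brought a DETR-related paper. Among recent vision models, a great many are variants built on the DETR architecture, and tasks across many fields are adopting the DETR architecture…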

Paper X-Review

[NeurIPS 2025] VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT

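Hello. The paper I'm introducing this time points out that, in Long Video Understanding, existing LLM agents face uncertainty when using tools and accumulate errors as a result. To solve this, it proposes a reasoning structure based on uncertainty-aware CoT and plan-adjust…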

Paper X-Review

[arXiv 2025] VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning

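Introduction. The paper I'm introducing today presents a method that uses multiple agents to perform video understanding. According to the paper, in existing approaches the initial plan does not change during reasoning, a fixed…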

Paper X-Review

[CVPR 2022] Contrastive Test-Time Adaptation

Hello. The paper I'll review this time is Contrastive Test-Time Adaptation, published at CVPR 2022. In Test Time Adaptation, at test time the model itself, so as to fit the actual target domain, on its own…
