Paper – Page 4 – Robotics and Computer Vision Lab

[CVPR 2026] Think, Then Verify: A Hypothesis–Verification Multi-Agent Framework for Long Video Understanding

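Hello. Today's paper comes from the long video understanding field: rather than blindly exploring a long video, it proposes first forming a hypothesis about each answer choice and then verifying it against evidence in the video…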

[ICLR 2020] Dream to Control: Learning Behaviors by Latent Imagination (Dreamer)

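I reviewed Dreamer, a model that continues the lineage of latent world models and the starting point of a line of papers that has recently reached DreamerV4. I hope you enjoy reading it. First, when reading Dreamer, reinforcement learning, world models,…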

Paper X-Review

[NIPS 2023] Scaling Open-Vocabulary Object Detection

Hello, the paper I am reviewing this time is a NIPS spotlight paper published by Google DeepMind in 2023. I read it as part of the follow-up reading before joining our team's current project. Limited detection datasets…

Paper X-Review

[CVPR 2026] ApET: Approximation-Error Guided Token Compression for Efficient VLMs

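Hello. The paper I am bringing this time is again a token pruning paper for VLMs. This paper concerns how pruning papers through 2025 depend to some extent on the ViT's [CLS] token or on visual-text attention information at the LLM decoder stage…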

Paper X-Review

[AAAI 2026] SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation

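Hello, today I brought a paper that is rather curious (maybe only to me?). While wondering “hmm, what would be interesting” about VLA, I heard this was a paper that did something or other at AAAI 2026, so I skimmed it and found that the VLA I knew…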

Paper X-Review

[CVPR 2026] VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Hello. The paper I am reviewing this time is VidEoMT: Your ViT is Secretly Also a Video Segmentation Model, which appeared at CVPR 2026. Currently, in the navigation planning field, for generating actions…

Paper X-Review

[CVPR 2024] Optimal Transport Aggregation for Visual Place Recognition

Introduction. In VPR, an image is described by an appearance pattern descriptor. Performing VPR well therefore comes down to extracting a discriminative descriptor for each image. To do so, under changing illumination, movement, and time…

Paper X-Review

[CVPR 2026] VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

Hello. Today I will review a video understanding study accepted to CVPR 2026. Lately I have been keeping an eye on efficient frameworks that use few frames and small models yet still reach reasonable performance! This…

Paper X-Review

[ICML 2025] SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Hello. The paper I am bringing this time is again a token pruning paper for VLMs. Recent VLM token pruning papers perform much better, but, like FastV from 2024, the visual-text… at the LLM decoder stage…

Paper X-Review

[CVPR 2025] Efficient Motion-Aware Video MLLM

Hello. The paper I am reviewing this time is Efficient Motion-Aware Video MLLM. Compressed video already contains structure such as I-frames, P/B-frames, and motion vectors, and the motion carried within it…
