Paper – Page 2 – Robotics and Computer Vision Lab

[AAAI 2026] SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation

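Hello, today I have brought a paper that is rather curious (maybe only to me?). While wondering "hmm, what would be fun" in VLA, I heard this one had something or other to do with AAAI 2026, so I skimmed through it, and the VLA I thought I knew…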

[CVPR 2026] VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Hello. The paper I am reviewing this time is VidEoMT: Your ViT is Secretly Also a Video Segmentation Model, published at CVPR 2026. Currently, in the navigation planning field, when generating actions…

Paper X-Review

[CVPR 2024] Optimal Transport Aggregation for Visual Place Recognition

Introduction: In VPR, an image is described by an appearance pattern descriptor. To perform VPR well, then, it is important to extract a discriminative descriptor for each image. To do so, changing illumination, movement, and time…

Paper X-Review

[CVPR 2026] VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

Hello, today I will review a video understanding work accepted to CVPR 2026. Lately I have been following efficient frameworks that use few frames and small models yet still achieve reasonable performance! This…

Paper X-Review

[ICML 2025] SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Hello, the paper I have brought this time is again a token pruning paper for VLMs. Recent VLM token pruning papers have improved performance considerably, but, like FastV from 2024, at the LLM decoder stage the visual-text…

Paper X-Review

[CVPR 2025] Efficient Motion-Aware Video MLLM

Hello. The paper I am reviewing this time is Efficient Motion-Aware Video MLLM. Compressed video already contains structures such as I-frames, P/B-frames, and motion vectors, and the motion contained in them…

Paper X-Review

[ICCV 2025] Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Hello, the paper I have brought this time is a token pruning paper for VLMs. As a next research topic, how to prune visual tokens in VLMs well, or analyzing existing methods to see why they do or do not work…

Paper X-Review

[ICLR 2024] CLIPSELF: VISION TRANSFORMER DISTILLS ITSELF FOR OPEN-VOCABULARY DENSE PREDICTION

Hello, today I would like to review CLIPself, an ICLR 2024 Spotlight paper. Since it is an object detection paper, I expect many of you will find it an interesting read. CLIP…

Paper X-Review

[arXiv] On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

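Hello. Today I would like to introduce the paper on entropy dynamics in reinforcement learning that I presented at the last seminar. My explanation at that seminar was too hard to follow. Today, the questions I received at the seminar…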

Paper X-Review

[arXiv 2026] Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation

Hello. The paper I am reviewing this time is Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation, written by the AMAP lab of China's Alibaba Group. If you think about real robot delivery or last-mile scenarios…

Category: Paper
