X-Review – Page 14 – Robotics and Computer Vision Lab

[CVPR 2025] Efficient Motion-Aware Video MLLM

Hello. The paper I'm reviewing this time is Efficient Motion-Aware Video MLLM. Compressed video already contains structure such as I-frames, P/B-frames, and motion vectors, and the motion contained in it…

X-Review

[ICLR 2026] Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning

Hello, this week I'd like to cover large-scale RL. When a policy is learned through RL, it tends to fit too closely to the optimal behavior and struggles to handle varied situations, and on top of that, reward shaping…

X-Review

[RA-L 2026] Guiding Robotic Cloth Grasping in Darkness: Infrared Semantic Segmentation and Grasping Position Selection

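Hello, this is Woojin Son (손우진). Today, rather than the 6D pose estimation methods or dataset-construction papers I have mostly covered so far, I'd like to review a paper on a task that is new for me: robot manipulation. Recently, using multispectral data…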

X-Review

[CVPR 2025] VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models

Today I'll review a paper that analyzes compositionality in video. As it turns out, it is by Dr. Dahun Kim of Google DeepMind, who gave a seminar at our lab in 2022. Let's begin the review. Venue: CVPR 2025 Authors: Dahun Kim,…

X-Review

[NeurIPS 2025] Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era

Hello. The paper I'm reviewing today is Towards Implicit Aggregation (hereafter ImAge), which currently achieves SOTA in Visual Place Recognition. As the "Place Recognition in the Transformer Era" in its title suggests, it is very clean and…

X-Review

[arXiv 2026] LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Paper information. Authors: Lucas Maes*¹, Quentin Le Lidec*², Damien Scieur¹·³, Yann LeCun², Randall Balestriero⁴. 1: Mila & Université de Montréal, 2: New York University, 3: Samsung SAIL,…

X-Review

[NeurIPS 2025] RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Hello. The paper I'm introducing today is RoboRefer, a 3D-aware VLM. 1. Introduction: For a robot to interact well with complex environments, understanding 3D space is important. Accordingly, in embodied AI, open-world spatial…

Paper X-Review

[ICCV 2025] Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

Hello. The paper I've brought this time is on token pruning in VLMs. As my next research topic, how to prune visual tokens in VLMs well, or analyzing existing methods to see why they work or do not work…

Paper X-Review

[ICLR 2024] CLIPSELF: VISION TRANSFORMER DISTILLS ITSELF FOR OPEN-VOCABULARY DENSE PREDICTION

Hello, today I'd like to review CLIPSelf, an ICLR 2024 Spotlight paper. Since it is an object detection paper, I expect many of you will find it an interesting read. CLIP…

Paper X-Review

[arXiv] On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

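Hello. Today I'd like to introduce the paper on entropy dynamics in reinforcement learning that I presented at the last seminar. My explanation at that seminar was too hard to follow. Today, the questions I received at the seminar…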
