X-Review – Page 3 – Robotics and Computer Vision Lab

[ICLR 2026] DiVE-k: Differential Visual Reasoning for Fine-Grained Image Recognition

Abstract: LVLMs hold extensive textual knowledge but struggle to apply it to fine-grained image recognition, often failing to distinguish visually similar categories. When the exact string matches…

X-Review

[ICML 2026] Are Object-Centric Representations Better At Compositional Generalization?

Today I will review a paper that analyzes whether object-centric representations are really robust at composition. Venue: ICML 2026 Authors: Ferdinand Kapl, Amir Mohammad Karimi Mamaghan, Maximilian Seitzer, Karl Henrik…

X-Review

[CVPR 2025] Sonata: Self-Supervised Learning of Reliable Point Representations

Hello, this is Son Woojin. The paper I am reviewing today is “Sonata: Self-Supervised Learning of Reliable Point Representations”, a 3D self-supervised learning paper presented at CVPR 2025. While looking for a foundation model for 3D points…

Paper X-Review

[AAAI 2026] Rethinking Visual Token Reduction in LVLMs Under Cross-Modal Misalignment

Abstract: LVLMs are said to convert visual inputs into dense sequences of patches in order to capture fine-grained semantics. Unlike textual tokens, these visual tokens are numerous and…

X-Review

[ACL 2025] Shifting from Ranking to Set Selection for Retrieval Augmented Generation

The paper introduced in this X-Review is Shifting from Ranking to Set Selection for Retrieval-Augmented Generation. The paper treats retrieval in RAG not simply as a problem of ranking relevant passages, but as one of what the question requires…

X-Review

[CVPR 2026] Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy

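Hello. Having recently become interested in emotion again, I read Nano-EmoX this time. In this paper, an AI jointly considers a person's facial expressions, voice, video, and speech, and simply guessing “happy” or “sad”…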

Paper X-Review

[ECCV 2024] PALM: Predicting Actions through Language Models

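Hello, the paper I am reviewing this time addresses a task called action anticipation. I plan to move on to this research topic as soon as my creative semester program paper is wrapped up, so I read it as an introduction. Action Anticipation…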

Paper X-Review

[RA-L 2025] GeNIE: A Generalizable Navigation System for In-the-Wild Environments

Hello. The paper I have brought for this review is GeNIE: A Generalizable Navigation System for In-the-Wild Environments. This paper, which won the Earth Rover Challenge, a competition held at IROS 2025…

X-Review

Protected: [Peer Review] Text-Conditioned Static-Dynamic Composition for Composed Video Retrieval

There is no excerpt because this is a protected post.

X-Review

[arXiv 2026] Modular Sensory Stream for Integrating Physical Feedback in Vision-Language-Action Models

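Hello, this is Choi Inha. Lately I have only been reading papers that fit my ongoing project, but a paper came out from RLWRLD, so I decided to read it. Let me begin the review. People do not perform tasks using visual information alone. Even when using a keyboard…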
