X-Review – Robotics and Computer Vision Lab

[ICML 2026] Causal-JEPA: Learning World Models through Object-Level Latent Masking

Hello. Today I will review Causal-JEPA (C-JEPA), presented at ICML 2026. Translated literally, the title means learning a world model through object-level latent masking; the core concept is that, through object-level masking, between objects…

Paper X-Review

[ICML 2026] Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

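Hello. Today I will review Zooming without Zooming, a paper accepted to ICML 2026 whose poster I looked at closely while attending ICML. As a brief explanation before getting into it: repeatedly cropping and enlarging small regions of the image during inference…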

Paper X-Review

[ICLR 2021] AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

Hello, this is 강희승. Following last week's Transformer review, I would like to review ViT, which applies the Transformer to computer vision research. Because ViT is now widely adopted in VLMs as well, in order to go over it once more, this…

X-Review

[ICRA 2025] Dual-BEV Nav: Dual-layer BEV-based Heuristic Path Planning for Robotic Navigation in Unstructured Outdoor Environments

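Introduction The authors once again open by saying that robots still do path planning poorly. First, from the perspective of local path planning, they say that the constantly changing outdoor environment prevents the traversability map from being detected properly. This…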

X-Review

[CVPR 2025] ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

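Abstract VLMs trained on large-scale image-text data have achieved substantial improvements in classification, but model performance becomes dependent on prompt quality. Recent studies report that, for the generalization ability of VLMs, LLM-generated visual descriptions…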

X-Review

[ICML 2026] Adaptive Token Refinement in Long-Tailed Large Vision-Language Models Fine-Tuning

I will review an ICML paper that tries to solve the long-tailed problem that most models have. Venue: ICML 2026. Authors: Wenjun Miao, Mingda Li, Yanchao Hao, Zheng Wei. Affiliation: Beihang University (Beijing), Tencent…

X-Review

[arXiv 2026] XEmoGPT: An Explainable Multimodal Emotion Recognition Framework with Cue-Level Perception and Reasoning

Hello. This time I read the paper XEmoGPT: An Explainable Multimodal Emotion Recognition Framework with Cue-Level Perception and Reasoning. This paper is about emotion recognition AI, starting from "this person looks sad"…

X-Review

[arXiv 2025] UniFGVC: Universal Training-Free Few-Shot Fine-Grained Visual Classification via Attribute-Aware Multimodal Retrieval

Hello. This week I would like to introduce UniFGVC, a paper that performs few-shot fine-grained visual classification (FGVC) via retrieval. 1. Introduction Fine-grained visual classification (FGVC) concerns, as with bird species or car models, a broader superordinate category (superordinate…

Paper X-Review

[RSS 2024] Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

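Hello, this is 조성민. Since this is my first post, there will probably be many rough spots. If you have feedback on the writing as well as questions about the content, I would appreciate it if you left both in the comments. As for which paper to pick for my first review…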

Paper X-Review

[NIPS 2017] Attention Is All You Need

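Context Hello, this is 강희승, a researcher at RCV. After ICML, I am finally writing my first X-review. Now that about four months have passed since I officially joined RCV, over the course of the basic training, quite a few papers…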
