X-Review – Page 18 – Robotics and Computer Vision Lab

[arXiv 2025] Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model

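The paper I'm reviewing this time was posted to arXiv on August 11. It trains the model with a reward designed to elicit Chain-of-Thought reasoning about affordance. It looks like a new approach, and given that the authors also released data for CoT…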

X-Review

[2023 CVPR] Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring

Hello. The paper I'm introducing this time analyzes temporal modeling when extending a pretrained CLIP model to the video domain. Video tasks include high-level tasks such as Retrieval, and Video…

X-Review

[ICCV2025] Object-centric Video Question Answering with Visual Grounding and Referring

Hello. This is researcher Seongjun Park (박성준). This is a Video Question Grounding study recently published at ICCV2025. Introduction: Video Question Grounding, a field I have recently been surveying with interest, is basically Video Question Answering, but the model…

X-Review

[ICCV 2025] DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding

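I'm reviewing a video understanding task once again. I picked this paper because it appears to deal with the design of video representations in MLLMs. 1. Introduction The development of multimodal large language models (MLLMs) has, in image-based…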

X-Review

[WACV 2024] Revisiting Token Pruning for Object Detection and Instance Segmentation

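Hello, the paper I'm reviewing this time is about token pruning. This is my first encounter with token pruning papers, so no matter how simple the authors say the method is, it felt difficult and unfamiliar to me and was hard to read…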

Paper X-Review

[IEEE 2024 IJCNN] Image Caption Method from Coarse to Fine Based On Dual Encoder-Decoder Framework

Hello, this review covers a paper on fine-grained image caption generation. Fine-grained caption generation has recently become possible with GPT-family foundation models, but in this paper, a separate foundation…

Conference X-Review

[ICCV2025] Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching

The paper I'm introducing this time was published at ICCV2025 and addresses the feature matching task. Back when I was writing a homography estimation paper, I would occasionally read papers on feature matching methods…

X-Review

[ICCV 2025] RoboPearls: Editable Video Simulation for Robot Manipulation

Hello, this week I'm going to review a paper proposing RoboPearls, a video-based simulation environment. Seunghyeon (승현) actually pointed me to this paper, suggesting it might be useful for the LLM project. LLMs and 3DGS…

Paper X-Review

[TMM 2025] Spatial-Temporal Saliency Guided Unbiased Contrastive Learning for Video Scene Graph Generation

Hello, this is Jaeyeon Heo (허재연). Once again I've brought a video-based Scene Graph Generation (SGG) paper. The paper I'm covering today was published in IEEE TRANSACTIONS ON MULTIMEDIA (TMM) and focuses on identifying objects. Review…

Paper X-Review

[CVPR 2025] UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection

Hello, this is my 71st X-Review. This paper is UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection, from CVPR 2025. Let's get right into it. 1. Introduction Existing…

