X-Review – Page 19 – Robotics and Computer Vision Lab

[ACM MM 2024] Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

Hello. Today's X-Review covers a paper on an AVQA (Audio-Visual Question Answering) method, published at ACM MM 2024. I will explain the AVQA task itself alongside the paper. 1. Introduction Until recently, I…

X-Review

[CVPR 2025] Cross-modal Causal Relation Alignment for Video Question Grounding

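Hello, this is 박성준, a researcher at the lab. The paper I am reviewing today is a CVPR 2025 Highlight paper on Video Question Grounding (VQG). Introduction Video Question Answering (VideoQA) takes a video and a natural-language question as input, and the corresponding…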

Conference X-Review

[CVPR 2025] NVILA: Efficient Frontier Visual Language Models

The paper I am introducing today is an NVIDIA paper published at CVPR 2025 and, as the title suggests, it presents a way to build efficient VLMs. However, when I searched for the paper, the CVPR format…

Conference X-Review

[ICCV 2025] A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

The paper I am reviewing this time is on the ICCV 2025 paper list. It works in two stages: finding an affordance, then generating an action for it. So that affordance learning can be applied to real applications…

Paper X-Review

[IEEE CBMI 2024] Is CLIP the main roadblock for fine-grained open-world perception?

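Hello. I came across today's paper while looking for work that uses CLIP to find objects at a fine-grained level, and I read it because of its title. To interpret the title: CLIP as the main bottleneck for open-world perception at a fine-grained level,…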

Paper X-Review

[arXiv 2022] BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation

Hello. The paper I have brought this time is BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation, posted to arXiv in 2022. This time, by reading BinsFormer itself, which underlies the Scale Depth paper I reviewed earlier, what the key…

Paper X-Review

[ICCV 2025] MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning

Hello, this is my 70th X-Review. This paper is MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning, presented at ICCV 2025. Let's get started. 1.…

X-Review

[CVPR Workshop 2025] Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations

Hello. This week's X-Review covers a paper on robotic manipulation data. I decided to read it because its use of video diffusion struck me as novel. The work proposes a framework called Robots Imitating Generated Videos (RIGVid),…

X-Review

[arXiv 2024] Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts

Hello. This time I have brought a paper on a text-related task, but one about segmentation rather than detection or recognition. The work does not propose a text segmentation model; rather, it has the Segment Anything Model perform text segmentation…

Paper X-Review

[ECCV 2024] Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

Hello, this is my 75th X-Review. This paper is Diffusion Models for Monocular Depth Estimation, published at ECCV 2024. Let's begin the review. 1. Introduction MDE is one…
