Robotics and Computer Vision Lab

김 영규 on [IROS 2025] Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels, 10/20/2025
Hello 태주, thank you for your comment. To answer: A1. When the authors ran experiments while varying the amount of real data, with 150 real data samples the real data…
김 영규 on [IROS 2025] Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels, 10/20/2025
Hello 석준, thank you for your comment. As for the ablations, on the question of whether "training is most effective when sim data and real data are combined in this fine-grained way," experiments were also run…
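홍 주영 on [ICCV 2023] Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval, 10/20/2025
Yes, the paper does not individually distinguish or dynamically adjust "the level of granularity the text requires (object vs. event)." For every text query, both levels (object-phrase, event-sentence)…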
김 태주 on [ArXiv 2025] VLA-0: Building State-of-the-Art VLAs with Zero Modification, 10/20/2025
Q1. I think the ability to return actions as text is itself achieved to some degree by the three skills described in the paper, and it seems to be one of the paper's key points.…
김 태주 on [ArXiv 2025] VLA-0: Building State-of-the-Art VLAs with Zero Modification, 10/20/2025
Q1. I am curious how continuous action values are normalized to a fixed integer range. A1. The exact implementation will likely become clear once the code is released…

[AAAI 2025] Patch-level Sounding Object Tracking for Audio-Visual Question Answering

[CVPR 2025] SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

[CoRL 2025] Planning from Point Clouds over Continuous Actions for Multi-object Rearrangement

[ACCV 2024] Vision language models are blind: Failing to translate detailed visual features into words

Improving Language Understanding by Generative Pre-Training

[CoRL 2025] O3Afford: One-Shot 3D Object-to-Object Affordance Grounding for Generalizable Robotic Manipulation

[CoRL 2025] One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation

[CVPR 2024] Open-Vocabulary Calibration for Fine-tuned CLIP

[AAAI 2025] HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models

[CVPR 2025] Object-aware Sound Source Localization via Audio-Visual Scene Understanding
