Robotics and Computer Vision Lab

김 영규 on [IROS 2025] Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels (10/20/2025)
Hello 태주, thank you for your comment. To answer: A1. When the authors ran experiments while varying the amount of real data, with 150 real data samples the real data…
김 영규 on [IROS 2025] Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels (10/20/2025)
Hello 석준, thank you for your comment. The ablations also include experiments on the question of whether "training on sim and real data combined in this fine-grained way is the most effective"…
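홍 주영 on [ICCV 2023] Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval (10/20/2025)
That's right, the paper does not individually distinguish or dynamically adjust "the level of granularity the text requires (object vs. event)." For every text query, both levels (object-phrase, event-sentence)…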
김 태주 on [ArXiv 2025] VLA-0: Building State-of-the-Art VLAs with Zero Modification (10/20/2025)
Q1. I think the ability to return actions as text is itself achieved to some extent by the three skills described in the paper, and it seems to be one of the paper's key points.…
김 태주 on [ArXiv 2025] VLA-0: Building State-of-the-Art VLAs with Zero Modification (10/20/2025)
Q1. I am curious about how continuous action values are normalized to a fixed integer range. A1. The exact implementation will become clear once the code is released…

[ICML 2021] Learning Transferable Visual Models From Natural Language Supervision

[CVPR 2023] Teaching Structured Vision & Language Concepts to Vision & Language Models

CoRL 2025 Attendance Review

[arXiv 2022] Disentangled Representation Learning for Text-Video Retrieval

[NeurIPS 2024] Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

CoRL 2025 Attendance Report

[CoRL 2025] Learning from 10 Demos: Generalisable and Sample-Efficient Policy Learning with Oriented Affordance Frames

CoRL 2025 Attendance Report

[AAAI 2024] Object-Aware Adaptive-Positivity Learning for Audio-Visual Question Answering

SmolVLM: Redefining small and efficient multimodal models
