Robotics and Computer Vision Lab

정 의철 on [NIPS2025] Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding 01/13/2026
Hello 성준, thank you for the great review. In the Structured Reasoning section you mentioned that sub-questions are generated; which generation model is used for this, and what prompt is used…
신 인택 on [NIPS 2017] Attention Is All You Need 01/13/2026
Hello 인하, I see you covered the Transformer. Both when I first saw the Transformer and even now, while using cross attention or self attention in my modules, how the computation…
김기현 on [arXiv 2025] SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics 01/12/2026
Hello 영규, thank you for your comment. Regarding asynchronous inference, the paper does not explicitly or quantitatively evaluate it as performing better; qualitatively, faster responsiveness and continuous motion…
김 영규 on [arXiv 2025] SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics 01/12/2026
Hello 기현, thank you for the review. I think you explained the structure of SmolVLA well. In the experimental results for asynchronous inference, the performance improving…
김기현 on [arXiv 2025] IGen: Scalable Data Generation for Robot Learning from Open-World Images 01/12/2026
Hello 영규, thank you for the great review. What particularly impressed me while reading it was that, from a single image, a sequence including not only the robot's actions but also visual observations…

[AAAI 2024] SECap: Speech Emotion Captioning with Large Language Model

[CVPR 2023] Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network

[arXiv 2024] DEPTH PRO: Sharp Monocular Metric Depth In Less Than a Second

[RA-L 2024] Uncertainty-Aware Suction Grasping for Cluttered Scenes

[NeurIPS 2020] FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence

[CVPR 2023] DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

[CVPR 2024] pix2gestalt: Amodal Segmentation by Synthesizing Wholes

[ECCV 2024] SegPoint: Segment Any Point Cloud via Large Language Model

[CVPR 2023] Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

[ECCV 2024] HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
