Robotics and Computer Vision Lab

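신 인택 on What are World Models..? (12/22/2025)
Hello 영규, thank you for the great post. It looks like you wrote it while organizing what you had learned, and although this is a field I don't know, you seem to have organized it well enough that I could follow it to some extent…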
신 인택 on [CVPR 2025] Video Summarization with Large Language Models (12/22/2025)
Hello 찬미, thank you for the great post. I had never read anything on the video summarization task, so I decided to read this. First, regarding how the approach actually operates, I roughly…
김기현 on What are World Models..? (12/21/2025)
Hello 영규, thank you for the good review. A question came to mind while reading it. As I understand it, the world models introduced here mainly work with video and…
이 재찬 on [IROS 2025] VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model (12/16/2025)
승현, thank you for reading the review. 1. I think it is a valid question, but because this paper treats pick-and-place as a low-level primitive action, for the "in motion" state during keyframe selection…
이 재찬 on [IROS 2025] VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model (12/16/2025)
인하, thank you for reading the review! On the points you raised: 1. Was the velocity computed only for the wrist keypoint? -> The centroid of all the hand keypoints is computed, and that…

What are World Models..?

[ICCV 2025] Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

[CVPR 2024] Koala: Key frame-conditioned long video-LLM

[IROS 2025] VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model

[AAAI 2025] V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

[arXiv 2025] Rethinking Intermediate Representation for VLM-based Robot Manipulation

[arXiv 2025] EgoDemoGen: Novel Egocentric Demonstration Generation Enables Viewpoint-Robust Manipulation

[arXiv 2025] DeepSeek-OCR: Contexts Optical Compression

[RSS 2022] ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints

[CVPR 2020] Counterfactual Samples Synthesizing for Robust Visual Question Answering
