Robotics and Computer Vision Lab

안 우현 on [arXiv 2025] OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation (01/06/2026)
Hello 우진, thank you for your comment. As I mentioned in the review, if, for example, a sample has only the current image and a language prompt, with no 2D pose or goal image,…
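홍 주영 on [EMNLP 2025] X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning (01/06/2026)
Thank you for the good question. Evaluating each q–v pair separately, or comparing q against several videos at once, would be computationally efficient, but, with an absolute criterion for each video, the LLM…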
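홍 주영 on [EMNLP 2025] X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning (01/06/2026)
Thank you for the good question. Rather than simply a "corrector" that fixes the backbone model's wrong answers, X-CoT compares differences that embedding similarity alone does not reveal well, as a complementary…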
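이 예은 on [CVPR 2025] Compositional Caching for Training-free Open-vocabulary Attribute Detection (01/05/2026)
Hello 승현, thank you for the good review! As for the reason the product of the DB-based score and the LLM-based score is used when computing compatibility, the influence of the LLM's bias…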
이 재윤 on [EMNLP 2025] X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning (01/05/2026)
Hello 주영, thank you for the good review. X-CoT's consistent performance improvement, not only on foundation models such as CLIP but also on top of X-Pool, which already performs well,…

CVPR 2025 Conference Report

[arXiv 2025] [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster

[arXiv 2025] Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data

[AAAI 2025] Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior

[arXiv 2025] DidSee: Diffusion-Based Depth Completion for Material-Agnostic Robotic Perception and Manipulation

[CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval

CVPR 2025 Conference Report

CVPR 2025 Conference Report

[arXiv 2024] Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts

[CVPR 2024] Bridging the Gap Between End-to-End and Two-Step Text Spotting
