Robotics and Computer Vision Lab

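손 건화 on [CVPR 2024] WorDepth: Variational Language Prior for Monocular Depth Estimation, 08/14/2025
Hello, thank you for reading the review. In the paper, the mean and variance obtained from the text serve as a prior that represents the distribution of the various scenes that fit the text. However…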
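손 건화 on [CVPR 2024] WorDepth: Variational Language Prior for Monocular Depth Estimation, 08/14/2025
Hello, thank you for reading the review. In the latent space it is a d-dimensional vector that does not have the same form as image-space information, so, to match the image dimensions, all…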
손 건화 on [CVPR 2024] WorDepth: Variational Language Prior for Monocular Depth Estimation, 08/14/2025
Hello, thank you for reading the review. The paper does not mention which portion was used as the 1%, but, as you said, because it is selected at random, at a particular epoch…
정 윤서 on [ICCV 2023] CLIPTER: Looking at the Bigger Picture in Scene Text Recognition, 08/13/2025
Thank you for the comment. As you can see from the model architecture, the text encoder is not used. Only the image encoder part of the VLM is taken to embed the scene image…
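정 윤서 on [TPAMI 2025] Instruction-Guided Scene Text Recognition, 08/13/2025
Hello. Thank you for the comment. 1. Quite literally, the condition can be understood as supplying additional information about the image in advance. Taking the question as an example, in the image…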

[arXiv 2025] Depth Anything with Any Prior

[CVPR 2025] Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment

[CVPR 2022] Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation

[arXiv 2025] Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups

ICRA 2025 Conference Attendance Report

[CVPR 2020] On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

[ICLR 2025] Dense Video Object Captioning from Disjoint Supervision

[arXiv 2024] EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

[arXiv 2025] OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

[ICCV 2023] Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
