Robotics and Computer Vision Lab

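손 건화 on [CVPR 2024] WorDepth: Variational Language Prior for Monocular Depth Estimation, 08/14/2025
Hello, thank you for reading the review. In the paper, the mean and variance obtained from the text serve as a prior that represents the distribution of the various scenes that fit the text. However…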
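손 건화 on [CVPR 2024] WorDepth: Variational Language Prior for Monocular Depth Estimation, 08/14/2025
Hello, thank you for reading the review. In the latent space it is a d-dimensional vector and does not have the same form as image-space information, so to match the image dimensions, all…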
손 건화 on [CVPR 2024] WorDepth: Variational Language Prior for Monocular Depth Estimation, 08/14/2025
Hello, thank you for reading the review. The paper does not mention which portion was used as the 1%, but as you said, because it is selected at random, in a particular epoch…
정 윤서 on [ICCV 2023] CLIPTER: Looking at the Bigger Picture in Scene Text Recognition, 08/13/2025
Thank you for the comment. As you can see from the model architecture, the text encoder is not used. Only the image encoder part of the VLM is taken, and the scene image is embedded…
정 윤서 on [TPAMI 2025] Instruction-Guided Scene Text Recognition, 08/13/2025
Hello. Thank you for the comment. 1. Quite literally, the condition can be understood as additional information about the image given in advance. Taking the question as an example, the image…

[CoRL 2023 Oral] Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

[ICML 2025] FG-CLIP: Fine-Grained Visual and Textual Alignment

[ICRA 2022] Affordance Learning from Play for Sample-Efficient Policy Learning

[CVPR 2024] YOLO-World: Real-Time Open-Vocabulary Object Detection

[NeurIPS 2024] Scene Graph Generation with Role-Playing Large Language Models

[AAAI 2023] DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

[ICLR 2025] TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval

[CVPR 2025] Distilling Monocular Foundation Model for Fine-grained Depth Completion

[RSS 2025] Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation

[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
