Robotics and Computer Vision Lab

황 유진 on [AAAI 2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark (08/12/2025)
Hello 우현, thank you for your reply. Regarding the Table 4 ablation experiment you asked about, the validity of the proposed caption enhancement structure was argued from how useful captions are in the single-modality setting; as for the caption…
이 승현 on [CVPRw 2024] Strategies to Leverage Foundation Model Knowledge in Object Affordance Grounding (08/11/2025)
Thank you for the question. 1. Yes. [1] processes co-attention together with self-attention in the transformer used for multimodal reasoning, so that in a bi-modal transformer structure the region corresponding to the prompt can be identified…
신 인택 on [arXiv 2025] Fine Tuning without Catastrophic Forgetting via Selective Low Rank Adaptation (08/11/2025)
Hello 재연, thank you for your reply. First, there were no experiments on other tasks such as detection or segmentation. Also, I checked, and DINO is not a VLM but a self…
신 인택 on [IEEE CBMI 2024] Is CLIP the main roadblock for fine-grained open-world perception? (08/11/2025)
Hello 승현, thank you for your reply. First, you can understand it as the second possibility: the matching scheme is inadequate, which makes it hard to draw out fine-grained information. Also…
신 인택 on [IEEE CBMI 2024] Is CLIP the main roadblock for fine-grained open-world perception? (08/11/2025)
Hello 지연, thank you for your reply. The reason Fig. 3 compares CLIP and OWL-ViT is that CLIP, which we know well, and the model that uses it as a backbone (which the authors…

[NAACL 2018] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[PMLR 2021] Learning Transferable Visual Models From Natural Language Supervision

[arXiv 2021] QAHOI: Query-Based Anchors for Human-Object Interaction Detection

[ICRA 2019] Build your own hybrid thermal/EO camera for autonomous vehicle

MPViT: Multi-Path Vision Transformer for Dense Prediction

[CVPR 2020] Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision

[VISAPP 2022] Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics

Barlow Twins: Self-Supervised Learning via Redundancy Reduction

SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

[CVPR 2017] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
