신 인택 – Robotics and Computer Vision Lab

Author: 신 인택

[IJCV 2025] Guiding Audio-Visual Question Answering with Collective Question Reasoning

[CVPR 2025] What’s in the Image? A Deep-Dive into the Vision of Vision Language Models

[WACV 2024] CAD – Contextual Multi-modal Alignment for Dynamic AVQA

[신인택] Looking Back on 2025

[NeurIPS 2020] Object-Centric Learning with Slot Attention

[ACM MM 2024] Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

[ECCV 2024 Workshops] Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

[NeurIPS 2024] Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

[arXiv 2023] ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

[CVPR 2024 Highlight] SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
