April 2023 – Page 2 – Robotics and Computer Vision Lab

Month: April 2023

[ICCV 2021] Group-Free 3D Object Detection via Transformers

[CVPR 2023] R2Former: Unified Retrieval and Reranking Transformer for Place Recognition

[CVPR 2022] RBGNet: Ray-based Grouping for 3D Object Detection

[ICASSP 2022] MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

[CVPR 2023] Masked Motion Encoding for Self-Supervised Video Representation Learning

[WACV 2022] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

[ICASSP 2022] Speech emotion recognition with co-attention based multi-level acoustic information

[ICLR 2019] A Closer Look at Few-shot Classification
