
Robotics and Computer Vision Lab

AI in Sensing, AI in Perception, AI in Action


Profile

정 의철

[arXiv 2025] CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
  • Posted on: 05/18/2025
  • Comments: 4 Comments
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
  • Posted on: 05/05/2025
  • Comments: 2 Comments
[2023 CVPR] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval
  • Posted on: 04/28/2025
  • Comments: 4 Comments
[2022 NIPS] Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
  • Posted on: 04/07/2025
  • Comments: 8 Comments
[2022 NIPS] On the Representation Collapse of Sparse Mixture of Experts
  • Posted on: 04/07/2025
  • Comments: 8 Comments
[CVPR 2022] X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
  • Posted on: 03/24/2025
  • Comments: 2 Comments
[CVPR 2024] Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
  • Posted on: 03/18/2025
  • Comments: No Comments
[2025 WACV] Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge
  • Posted on: 02/03/2025
  • Comments: 2 Comments
[2024 CVPR] The Neglected Tails in Vision-Language Models
  • Posted on: 01/20/2025
  • Comments: 6 Comments
[2024 EACL] Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
  • Posted on: 01/12/2025
  • Comments: 5 Comments


New Posts

  • [arXiv 2025.02] SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
  • [arXiv 2024] Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
  • [arXiv 2025] Accurate and Efficient Zero-Shot 6D Pose Estimation with Frozen Foundation Models
  • [NIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
  • [CVPR 2023] Feature Aggregated Queries for Transformer-based Video Object Detectors

New Comments

  1. 정 의철 on [2025 CVPR] Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions (09/08/2025)

    Hello 성준님, thank you for the question. First, even when different modalities enter the co-attention, their dimensions can be matched through a projection. Query-aware adaptive filtering merely…

  2. 정 의철 on [2025 CVPR] Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions (09/08/2025)

    Hello 유진님, thank you for the question. The video-level caption carries the global information of the video, so it can be said to capture the video's overall content.…

  3. 이상인 on [ECCV 2018] CBAM: Convolutional Block Attention Module (09/08/2025)

    Hello. I wrote this review quite a while ago, so I don't remember the paper 100%, but I think I can explain it to some extent with my current knowledge.…

  4. 최 인하 on [arXiv 2025.02] SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation (09/08/2025)

    Hello 재찬님, thank you for the good paper review. Reading the review, I came to appreciate that in robot manipulation, not only the object-centric position but also semantic orientation information is very important…

  5. 박 성준 on [NIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering (09/08/2025)

    Hello 홍주영 연구원님, thank you for the good comment. The authors do mention the importance of the localizer's initial performance. They also seem to trust the performance of BLIP-2(?), and additionally…


Only a strong tenacity that never gives up makes the small difference.
