[arXiv] Unifying Deep Local and Global Features for Image Search

Bingyi Cao, Andre Araujo, Jack Sim (Google Research, USA)

[Figure 1] Pipeline of the paper's image retrieval method

This paper is Google's follow-up to DELF [4] on features for image retrieval. DELF used only a local descriptor tower. DELG (DEep Local and Global features) adds a global descriptor tower on top of DELF, which improves retrieval performance and also improves matching between local descriptors.
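Here is a minimal NumPy sketch of a global-then-local retrieval pipeline that shows the two-tower idea in practice. It assumes the descriptors have already been extracted. Global descriptors rank the whole database, and local descriptors then rerank a short list. The paper reranks with geometric verification (RANSAC). This sketch replaces that step with a simpler count of mutual nearest-neighbour matches, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names. Mutual-NN counting is a simplified stand-in
# for the RANSAC-based geometric verification used in the paper.

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def count_mutual_nn(q_local, d_local):
    """Count mutual nearest-neighbour pairs between two sets of local descriptors."""
    sim = l2_normalize(q_local) @ l2_normalize(d_local).T  # (Nq, Nd) cosine similarities
    q_to_d = sim.argmax(axis=1)  # best database descriptor for each query descriptor
    d_to_q = sim.argmax(axis=0)  # best query descriptor for each database descriptor
    return int(np.sum(d_to_q[q_to_d] == np.arange(len(q_local))))

def retrieve(q_global, db_global, q_local, db_local, top_k=2):
    """Stage 1: rank the database by global similarity. Stage 2: rerank the top_k by local matches."""
    scores = l2_normalize(db_global) @ l2_normalize(q_global)
    shortlist = np.argsort(-scores)[:top_k]
    matches = [count_mutual_nn(q_local, db_local[i]) for i in shortlist]
    # sorted() is stable, so shortlisted images with equal match counts keep their global order
    order = sorted(range(len(shortlist)), key=lambda j: -matches[j])
    return [int(shortlist[j]) for j in order]
```

A real system would count geometrically consistent inliers instead of raw mutual matches. The two-stage structure stays the same.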

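Another model that extracts local and global descriptors with a single network is [1] (HF-Net). [1] is trained by distillation: it learns its local descriptors from SuperPoint and its global descriptors from NetVLAD. DELG, by contrast, is not built on other pre-trained models, so its performance is not capped by those teacher models.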

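The paper makes three main contributions:

DELG is designed to extract local and global descriptors at once from a single CNN. It uses generalized mean (GeM) pooling [2] for the global descriptor and attentive local feature detection for the local descriptors.
A convolutional autoencoder learns low-dimensional local descriptors, so conventional post-processing such as PCA is no longer needed.
The model is trained end to end using only image-level labels. The gradient flow between the two heads is controlled so that training does not disrupt the desired descriptor representations.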
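The first contribution builds on GeM pooling [2]. For each channel, GeM computes (mean over spatial positions of x^p)^(1/p). With p = 1 this is average pooling, and as p grows it approaches max pooling. Below is a minimal NumPy sketch of the global head. It assumes an (H, W, C) feature map with non-negative activations. The default p = 3 is an assumption and does not come from this post. The whitening weights are hypothetical placeholders for the learned FC layer.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized mean (GeM) pooling over the spatial dimensions.

    feature_map: (H, W, C) non-negative activations, e.g. after a ReLU.
    p = 1 gives average pooling; large p approaches max pooling.
    """
    x = np.clip(feature_map, eps, None)  # keep bases strictly positive before x ** p
    return np.mean(x ** p, axis=(0, 1)) ** (1.0 / p)

def global_descriptor(feature_map, whiten_w, whiten_b, p=3.0):
    """GeM pooling, then an FC layer acting as learned whitening, then L2 normalization.

    whiten_w (C, D) and whiten_b (D,) are hypothetical stand-ins for trained FC weights.
    """
    pooled = gem_pool(feature_map, p)  # (C,)
    g = pooled @ whiten_w + whiten_b   # (D,)
    return g / np.linalg.norm(g)
```

Because GeM is a power mean, its output for a fixed feature map never decreases as p increases. That is why a single parameter can move smoothly between average-like and max-like pooling.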

DELG

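The global tower applies GeM pooling to the ResNet backbone's feature map and then an FC layer that performs whitening, producing the global descriptor. The local tower follows DELF and produces local descriptors through attention layers. An autoencoder reduces the dimensionality of these local features, and a reconstruction loss is computed by comparing the autoencoder's reconstruction with the original feature vectors. During backpropagation, the gradients from the reconstruction loss and the attention loss are stopped before they reach the backbone. The local descriptor branch is attached to a shallower layer of the network. If it were allowed to update the backbone, it would degrade the global descriptor and, as a result, hurt the local descriptors as well.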
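Here is a minimal NumPy sketch of the local branch described above. It makes several simplifying assumptions:

- The attention network and the autoencoder's 1x1 convolutions are each reduced to a single linear map. A 1x1 convolution is a per-location matrix multiply.
- The weights are hypothetical placeholders.
- The attention loss is omitted.
- NumPy has no autograd, so the stop-gradient is only marked in a comment.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + e^x), which keeps attention scores positive."""
    return np.logaddexp(0.0, x)

def local_branch(feature_map, w_att, w_enc, w_dec, top_k=5):
    """Hypothetical sketch of a DELG-style local head on one (H, W, C) feature map.

    w_att (C,), w_enc (C, d) and w_dec (d, C) are placeholder weights.
    In a training framework the input would first go through a stop-gradient
    (e.g. tf.stop_gradient or torch.Tensor.detach), so that the attention and
    reconstruction losses do not update the shared backbone.
    """
    h, w, c = feature_map.shape
    feats = feature_map.reshape(h * w, c)  # one row per spatial location
    scores = softplus(feats @ w_att)       # (H*W,) attention score per location
    reduced = feats @ w_enc                # (H*W, d) autoencoder bottleneck
    recon = reduced @ w_dec                # (H*W, C) reconstruction of the input features
    recon_loss = float(np.mean((recon - feats) ** 2))  # reconstruction (MSE) loss
    keep = np.argsort(-scores)[:top_k]     # highest-attention locations become keypoints
    desc = reduced[keep]
    desc = desc / np.linalg.norm(desc, axis=1, keepdims=True)  # unit-norm local descriptors
    rows, cols = np.unravel_index(keep, (h, w))
    return np.stack([rows, cols], axis=1), desc, recon_loss
```

The low-dimensional descriptors are read straight from the bottleneck. The reconstruction loss is what pushes that bottleneck to keep the information in the original features.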

Results

Table 1 shows that DELG achieves state-of-the-art performance among the current retrieval methods it is compared against.

Overall Thoughts

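It looks as if DELG simply attaches a global descriptor and an AE (autoencoder) to DELF, so I was surprised that it outperforms the other existing models. I think the significant step is that, for the first retrieval stage, DELG drops DELF's local-feature retrieval and replaces it with global-descriptor retrieval. The paper says the AE reduces dimensionality effectively enough to replace PCA, so I need to look more closely at how the AE works.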
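For comparison with the AE remark above, here is a NumPy sketch of the PCA post-processing that DELG's autoencoder replaces. It assumes descriptors are stacked row-wise in an (N, C) array, and the function names are hypothetical. A purely linear autoencoder trained with a squared-error loss is known to span the same subspace as PCA. DELG's autoencoder differs because it is trained jointly with the rest of the local head rather than fitted afterwards.

```python
import numpy as np

def fit_pca(x, k):
    """Return the mean and the top-k principal directions (k, C) of descriptors x (N, C)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)  # rows of vt are orthonormal
    return mean, vt[:k]

def reconstruction_mse(x, mean, basis):
    """Project onto an orthonormal basis (k, C), map back, and measure the mean squared error."""
    codes = (x - mean) @ basis.T  # (N, k) low-dimensional descriptors
    recon = codes @ basis + mean  # (N, C) back in the original space
    return float(np.mean((recon - x) ** 2))
```

Among all rank-k orthogonal projections, the PCA basis gives the lowest reconstruction error on the data it was fitted to. This is the baseline a learned reduction has to match or beat.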

References

[1] Sarlin, Paul-Edouard, et al. “From coarse to fine: Robust hierarchical localization at large scale.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[2] Radenović, Filip, et al. “Fine-tuning CNN image retrieval with no human annotation.” IEEE Transactions on Pattern Analysis and Machine Intelligence. 2018.
[3] Deng, Jiankang, et al. “Arcface: Additive angular margin loss for deep face recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
[4] Noh, Hyeonwoo, et al. “Large-scale image retrieval with attentive deep local features.” Proceedings of the IEEE International Conference on Computer Vision. 2017.

1 thought on “[arXiv] Unifying Deep Local and Global Features for Image Search”

Jo-won says:

05/17/2020 at 23:53

What is the purpose of passing the backbone network's feature volume through an autoencoder and computing a reconstruction loss?

Author: 한 대찬
