[CVPR 2016] Structure-from-Motion Revisited

Git: https://colmap.github.io/ [COLMAP]

In this review, I cover a paper on Structure-from-Motion (SfM), a topic I have wanted to study for some time.
Among the many SfM papers, I chose this one because the authors released their SfM system as open source, and its 1.8K stars suggest it is widely recognized.

Structure-from-Motion?

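Video 1 below is one I shared with fellow researchers who may be hearing about SfM for the first time. I hope it gives at least a rough idea of how SfM works. For those without time to watch it: SfM can be thought of as automatic stitching in 3D. It uses the relationships between 2D images to recover the camera poses and the 3D structure of the scene.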

Video 1. Structure-from-Motion (SfM) pipeline

Incremental perspective structure from motion

Among the various families of SfM methods, the method in this paper belongs to incremental SfM.
This section briefly introduces the incremental SfM pipeline shown in Figure 1.

Figure 1. Incremental Structure-from-Motion pipeline

1. Compute features
2. Match images
3. Reconstruct
 3-1. Solve for the pose and 3D points from two cameras
 3-2. Solve for the pose of additional camera(s) that observe reconstructed 3D points
 3-3. Solve for new 3D points that are viewed in at least two cameras
 3-4. Bundle adjust to minimize reprojection error
Repeat steps 3-2 to 3-4.
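To make the control flow of step 3 concrete, here is a toy sketch that ignores geometry entirely. It only tracks which image observes which 3D point (track). The `observations` dict, the `min_visible` threshold, and the greedy rule of registering the image that sees the most existing points are assumptions for illustration. COLMAP's actual next-best-view selection is more elaborate, and bundle adjustment appears only as a comment.

```python
from typing import Dict, List, Set, Tuple


def incremental_order(
    observations: Dict[str, Set[int]],  # image -> ids of the tracks (3D points) it observes
    seed: Tuple[str, str],              # initial two-view pair (step 3-1)
    min_visible: int = 2,               # hypothetical minimum of 2D-3D matches to register
) -> List[str]:
    """Toy model of steps 3-1 to 3-4: returns the order in which images are registered."""
    registered: List[str] = list(seed)
    # Step 3-1: tracks seen by both seed images become the first 3D points.
    points: Set[int] = observations[seed[0]] & observations[seed[1]]
    while True:
        # Step 3-2: among unregistered images, pick the one observing the most 3D points.
        candidates = [
            (len(observations[img] & points), img)
            for img in observations
            if img not in registered
        ]
        candidates = [c for c in candidates if c[0] >= min_visible]
        if not candidates:
            break  # nothing left that can be registered
        _, best = max(candidates)
        registered.append(best)
        # Step 3-3: triangulate every track now seen by at least two registered images.
        seen_by: Dict[int, int] = {}
        for img in registered:
            for track in observations[img]:
                seen_by[track] = seen_by.get(track, 0) + 1
        points |= {t for t, n in seen_by.items() if n >= 2}
        # Step 3-4: bundle adjustment would refine all poses and points here (omitted).
    return registered


obs = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {2, 3, 4, 6},
    "D": {3, 4, 7},
    "E": {8},
}
print(incremental_order(obs, ("A", "B")))  # ['A', 'B', 'C', 'D']
```

Image D sees only one reconstructed point at first. It can be registered only after C is added and track 4 is triangulated, and this dependency is why steps 3-2 to 3-4 are repeated. E shares no track with the model and is never registered.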

Correspondence Search
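– Feature Extraction. Detect local features in every image, using detectors such as SIFT, ORB, or SURF.
– Matching. Match features across images to find potentially overlapping image pairs.
– Geometric Verification. Check each potentially overlapping pair with projective geometry. One option is a homography H, which models a purely rotating camera or a planar scene. The other is epipolar geometry, which models general camera motion in 3D: the essential matrix E for calibrated cameras, or the fundamental matrix F for uncalibrated ones. Raw matches contain outliers, so the model is estimated robustly with RANSAC [2], and the pair is kept if enough matches are inliers. The result is the scene graph (Figure 2).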
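As a sketch of geometric verification, the snippet below fits a homography H to putative matches with RANSAC. It treats the pair as verified when enough inliers remain. It covers only the H case; an essential or fundamental matrix would be estimated the same way with a different minimal solver. It also uses a plain DLT without Hartley's coordinate normalization. The threshold, iteration count, and `min_inliers` are hypothetical values, not COLMAP's settings.

```python
import numpy as np


def homography_dlt(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Direct linear transform: H (3x3) with dst ~ H @ [x, y, 1] from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # h is the right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 3)


def apply_h(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map Nx2 points through H and divide by the homogeneous coordinate."""
    homo = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homo[:, :2] / homo[:, 2:3]


def verify_pair(src, dst, thresh=2.0, iters=500, min_inliers=15, seed=0):
    """RANSAC over minimal 4-point samples; returns (H, inlier mask, verified?)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=4, replace=False)
        H = homography_dlt(src[sample], dst[sample])
        # Degenerate samples can map points to infinity; such errors never count as inliers.
        with np.errstate(divide="ignore", invalid="ignore"):
            inliers = np.linalg.norm(apply_h(H, src) - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        return None, best, False
    H = homography_dlt(src[best], dst[best])  # refit on the whole consensus set
    return H / H[2, 2], best, bool(best.sum() >= min_inliers)


# Synthetic pair: 50 matches related by a known homography, the first 10 corrupted.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(50, 2))
H_true = np.array([[1.0, 0.1, 5.0], [0.05, 0.9, -3.0], [1e-4, 2e-4, 1.0]])
dst = apply_h(H_true, src)
dst[:10] += rng.uniform(20, 40, size=(10, 2))
H, inliers, verified = verify_pair(src, dst)
print(verified, inliers.sum())  # True 40
```

Every corrupted match is shifted by at least 20 pixels in each coordinate, far outside the 2-pixel threshold. Any all-inlier sample recovers the true H and a consensus of exactly the 40 clean matches.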

Figure 2. Scene graph (tracks graph): a bipartite graph between the estimated 3D points and the images
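One common way to obtain the tracks behind this graph is to merge pairwise feature matches with union-find. Every connected set of matched features becomes one track (one 3D point), and each (track, image) pair is an edge of the bipartite graph. This is a generic sketch with hypothetical names (`build_tracks`, `bipartite_edges`), not COLMAP's internal data structure.

```python
from typing import Dict, List, Tuple

Feature = Tuple[str, int]  # (image name, feature index within that image)


def build_tracks(matches: List[Tuple[Feature, Feature]]) -> List[List[Feature]]:
    """Merge pairwise feature matches into tracks with union-find."""
    parent: Dict[Feature, Feature] = {}

    def find(f: Feature) -> Feature:
        parent.setdefault(f, f)
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving keeps the trees shallow
            f = parent[f]
        return f

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[Feature, List[Feature]] = {}
    for f in list(parent):
        groups.setdefault(find(f), []).append(f)

    # A consistent track observes its 3D point at most once per image; drop conflicting ones.
    return [
        sorted(g) for g in groups.values()
        if len({image for image, _ in g}) == len(g)
    ]


def bipartite_edges(tracks: List[List[Feature]]) -> List[Tuple[int, str]]:
    """Edges of the point-image bipartite graph as (track id, image name)."""
    return [(tid, image) for tid, track in enumerate(tracks) for image, _ in track]


matches = [
    (("A", 0), ("B", 3)),
    (("B", 3), ("C", 1)),
    (("A", 1), ("C", 5)),
    (("A", 2), ("B", 4)),
    (("B", 4), ("A", 7)),  # would put two features of image A into one track
]
tracks = build_tracks(matches)
print(tracks)                   # [[('A', 0), ('B', 3), ('C', 1)], [('A', 1), ('C', 5)]]
print(bipartite_edges(tracks))  # [(0, 'A'), (0, 'B'), (0, 'C'), (1, 'A'), (1, 'C')]
```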

Incremental Reconstruction
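– Initialization. Select the seed two-view pair that the reconstruction starts from. This choice must be made carefully, because it affects robustness, accuracy, and runtime. Starting in a dense region of the image graph, with many overlapping cameras, usually gives a more robust and accurate reconstruction thanks to the extra redundancy. Starting in a sparser region lowers runtime, because bundle adjustment then works on a sparser problem.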
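– Image Registration. A new image is registered against the selected two-view reconstruction in three steps. Its 2D features are matched to the reconstructed 3D points. The Perspective-n-Point (PnP) problem is then solved on these 2D-3D correspondences, which gives the pose of the new camera. Outliers among the correspondences are handled with RANSAC [2].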
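– Triangulation. A newly registered image must observe existing scene points. It can also extend scene coverage: triangulation creates new 3D points from features observed in at least two registered images. # Needs further study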
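– Bundle Adjustment (BA) [3]. Image registration and triangulation are closely related but run as separate procedures. Uncertainty in a camera pose therefore propagates to the points triangulated from it, and vice versa, so the reprojection error E accumulates. BA jointly refines the camera parameters P_c and the 3D points X_k by minimizing E = Σ_j ρ_j(‖π(P_c, X_k) − x_j‖²). Here π projects a 3D point into the image, x_j is the observed 2D feature, and ρ_j is a loss function that can down-weight outliers. This minimization is solved with the Levenberg-Marquardt (LM) method.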
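The geometry of image registration and triangulation can be sketched with the linear (DLT) methods described in Hartley and Zisserman [1]. This is a simplification. The paper estimates poses robustly with RANSAC and proposes its own robust triangulation for multi-view tracks. The sketch below instead uses exact, outlier-free synthetic data and assumes known intrinsics K. `resection_dlt` needs at least 6 non-coplanar 2D-3D correspondences. All function names are hypothetical.

```python
import numpy as np


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project Nx3 world points with a 3x4 camera matrix P to Nx2 pixel coordinates."""
    x = np.column_stack([X, np.ones(len(X))]) @ P.T
    return x[:, :2] / x[:, 2:3]


def resection_dlt(X: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Linear PnP: 3x4 P (up to scale) with x ~ P [X; 1] from >= 6 2D-3D pairs."""
    rows = []
    for Xw, (u, v) in zip(X, x):
        Xh, zero = np.append(Xw, 1.0), np.zeros(4)
        rows.append(np.concatenate([Xh, zero, -u * Xh]))
        rows.append(np.concatenate([zero, Xh, -v * Xh]))
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)


def pose_from_P(P: np.ndarray, K: np.ndarray):
    """Split P = s K [R | t] into (R, t) when the intrinsics K are known."""
    M = np.linalg.solve(K, P)
    M = M / np.cbrt(np.linalg.det(M[:, :3]))  # det(sR) = s^3 removes scale and sign
    U, _, Vt = np.linalg.svd(M[:, :3])
    return U @ Vt, M[:, 3]  # nearest rotation matrix, translation


def triangulate_dlt(Ps, xs) -> np.ndarray:
    """Linear triangulation of one 3D point from >= 2 cameras and its pixel observations."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1][:3] / vt[-1][3]


# Existing model: a reference camera K[I | 0] and 20 reconstructed 3D points.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P_ref = K @ np.column_stack([np.eye(3), np.zeros(3)])
points = np.random.default_rng(0).uniform(-1, 1, size=(20, 3)) + [0.0, 0.0, 5.0]

# Step 3-2: register a new camera from its 2D observations of those points.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([0.1, -0.2, 4.0])
P_new_true = K @ np.column_stack([R_true, t_true])
P_new = resection_dlt(points, project(P_new_true, points))
R, t = pose_from_P(P_new, K)

# Step 3-3: triangulate a new point seen by both registered cameras.
X_true = np.array([[0.3, -0.4, 6.0]])
X_tri = triangulate_dlt([P_ref, P_new], [project(P_ref, X_true)[0], project(P_new_true, X_true)[0]])
print(R, t, X_tri)
```

`pose_from_P` uses det(sR) = s³ to remove the unknown scale and sign of the DLT solution. The SVD step then projects the result onto the nearest rotation matrix, which matters once the correspondences are noisy.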

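To show what Levenberg-Marquardt does with the reprojection error E from the bundle adjustment step, the toy below refines a single 3D point while the cameras stay fixed (structure-only refinement). Real BA optimizes all cameras and points jointly with sparse solvers; COLMAP relies on Ceres Solver for this. In this sketch, ρ_j is simply the squared error and the Jacobian is taken by finite differences. The damping schedule (×0.1 on success, ×10 on failure) is an assumption.

```python
import numpy as np


def reproject(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Pixel coordinates of a single 3D point X under a 3x4 camera P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def residuals(X, Ps, obs) -> np.ndarray:
    """Stacked reprojection errors pi(P_c, X) - x_j over all observing cameras."""
    return np.concatenate([reproject(P, X) - x for P, x in zip(Ps, obs)])


def refine_point_lm(X0, Ps, obs, iters=50, lam=1e-3, eps=1e-6):
    """Levenberg-Marquardt on one 3D point with all cameras held fixed."""
    X = np.asarray(X0, dtype=float)
    r = residuals(X, Ps, obs)
    for _ in range(iters):
        # Forward-difference Jacobian of the residual vector w.r.t. (X, Y, Z).
        J = np.empty((r.size, 3))
        for k in range(3):
            dX = np.zeros(3)
            dX[k] = eps
            J[:, k] = (residuals(X + dX, Ps, obs) - r) / eps
        A, g = J.T @ J, J.T @ r
        # Damped normal equations with Marquardt's diagonal scaling.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        r_new = residuals(X + step, Ps, obs)
        if r_new @ r_new < r @ r:
            X, r = X + step, r_new  # accept: behave more like Gauss-Newton
            lam *= 0.1
        else:
            lam *= 10.0             # reject: behave more like gradient descent
        if np.linalg.norm(step) < 1e-12:
            break
    return X, float(r @ r)


# Three cameras with known poses and a noise-free observation of one point in each.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
Ps = [K @ np.column_stack([np.eye(3), t]) for t in ([0.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0])]
X_true = np.array([0.3, -0.2, 5.0])
obs = [reproject(P, X_true) for P in Ps]

X_est, err = refine_point_lm(X_true + [0.2, -0.1, 0.5], Ps, obs)
print(X_est, err)
```

The damping λ moves each step between Gauss-Newton (small λ, fast near the minimum) and scaled gradient descent (large λ, safe when the quadratic model is poor). This switching is what makes LM the usual choice for BA.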
Contributions

Figure 3. Scores for different numbers of points (left and right) with different distributions (top and bottom) in the image for L = 3.
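Figure 3 illustrates the paper's next-best-view score, which favors images that see more already-reconstructed points and see them spread more uniformly. In my reading of the paper, the image is divided into a pyramid of L grids with K_l = 2^l cells per side. Each cell containing at least one visible point counts as full and adds the weight w_l = K_l to the score. The sketch below follows that reading; the function name and the example points are hypothetical.

```python
import numpy as np


def visibility_score(points: np.ndarray, width: float, height: float, levels: int = 3) -> int:
    """Multi-resolution grid score for the 2D projections of already-reconstructed points."""
    score = 0
    for level in range(1, levels + 1):
        bins = 2 ** level   # K_l: grid resolution doubles at every level
        weight = bins       # w_l = K_l: finer levels reward spread-out points more
        cols = np.minimum((points[:, 0] / width * bins).astype(int), bins - 1)
        rows = np.minimum((points[:, 1] / height * bins).astype(int), bins - 1)
        full_cells = len(set(zip(rows.tolist(), cols.tolist())))
        score += weight * full_cells
    return score


spread = np.array([[10, 10], [90, 10], [10, 90], [90, 90]], dtype=float)
clustered = np.array([[10, 10], [11, 10], [10, 11], [11, 11]], dtype=float)
print(visibility_score(spread, 100, 100))     # 56
print(visibility_score(clustered, 100, 100))  # 14
```

With the same number of points, the spread-out set fills a new cell at every level and scores 56, while the clustered set fills one cell per level and scores only 14. Figure 3 illustrates this same effect of the point distribution.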

COLMAP

[1] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[2] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981.
[3] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. Fitzgibbon. Bundle adjustment: a modern synthesis. Vision Algorithms: Theory and Practice, Springer, 2000.

Author: 김 태주