Paper Reading: Stereo DSO

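For the very first post, let me write a paper reading. Switching between Chinese and English while writing in markdown + vim is a hassle, so I got lazy and wrote some parts entirely in English.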

Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras

Abstract

Optimization objectives:

  1. intrinsic/extrinsic parameters of all keyframes
  2. depths of all selected pixels

Integrate constraints from static stereo (the stereo constraint between the left and right cameras is static) into the bundle adjustment pipeline of temporal multi-view stereo.
Fixed-baseline stereo resolves scale drift.

? It also reduces the sensitivity to large optical flow and to the rolling shutter effect, which are known shortcomings of direct image alignment methods.

1. Introduction

stem from: to originate from, to be caused by
heuristically: in a heuristic way (by rule of thumb rather than by proof)
hallucinate: to perceive something that is not there
strip down: to reduce to its simplest form

Strasdat et al. proposed expanding the concept of keyframes to integrate scale, together with a double window optimization (TODO: figure out what this is).

Direct methods aim at computing geometry and motion directly from the images thereby skipping the intermediate keypoint selection step.

The key idea of LSD SLAM is to incrementally track the camera and simultaneously perform a pose graph optimization in order to keep the entire camera trajectory globally consistent. The authors argue that this does not reduce the accumulated error but merely spreads it over the whole trajectory (so what exactly is the point of the pose graph?).

Three drawbacks of DSO:

  1. The reported performance was obtained on a photometrically calibrated dataset; in its absence, the performance degrades.
  2. Scale drift
  3. DSO is quite sensitive to geometric distortion such as that induced by fast motion and rolling shutter. While techniques for calibrating rolling shutter exist for direct SLAM algorithms, these are often quite involved and far from real-time capable.

Contribution:

  1. A stereo version of DSO, with a detailed description of the proposed combination of temporal multi-view stereo and static stereo.
  2. An evaluation showing that Stereo DSO performs well (details skipped, see Section 3).

2. Direct Sparse VO with Stereo Camera

  • Absolute scale can be directly calculated from static stereo from the known baseline of the stereo camera
  • Static stereo can provide initial depth estimation for multi-view stereo
  • Static Stereo can only accurately triangulate 3D points within a limited depth range while this limit is resolved by temporal multi-view stereo.
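To make the first and third bullets concrete, here is a minimal sketch of the textbook rectified-stereo relation `z = f * b / d` and its first-order error `dz ≈ z² / (f * b) * dd`. The focal length and baseline below are illustrative numbers of my own, not values from the paper; the point is only that the depth error grows quadratically with depth, which is why static stereo is only accurate within a limited range.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (metres) of a point seen by a rectified stereo pair: z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dz ~= z^2 / (f * b) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Illustrative values only (assumed, not taken from the paper).
f, b = 720.0, 0.54
for d in (40.0, 4.0):
    z = depth_from_disparity(f, b, d)
    print(f"disparity {d:5.1f} px -> depth {z:6.2f} m, "
          f"error per pixel of disparity ~ {depth_error(f, b, z):.2f} m")
```

A ten times smaller disparity means a ten times larger depth but a hundred times larger depth error, so far-away points have to rely on temporal multi-view stereo.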

New stereo frames are first tracked with respect to their reference keyframe in a coarse-to-fine manner.

A joint optimization of their poses, affine brightness parameters (two parameters: a and b), as well as the depths of all the observed 3D points and the camera intrinsics, is performed.

2.1 Notation

Nothing important.

2.2 Direct Image Alignment Formulation

$$E_{ij} = \sum_{p \in P_i} \omega_p \left\| I_j[p'] - I_i[p] \right\|_\gamma$$

where $\omega_p$ is the weight defined below (the larger the gradient, the smaller the weight; not sure why).

$$\omega_p = \frac{c^2}{c^2 + \left\| \nabla I_i(p) \right\|_2^2}$$
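As a sanity check on the two formulas above, here is a small sketch that evaluates the weighted energy over a handful of pixels. I am assuming that the $\gamma$ norm is the Huber norm, and the values of `c` and `gamma` are placeholders I picked for illustration, not the paper's settings. The warped intensities $I_j[p']$ are taken as given inputs; the warping itself is not shown.

```python
def huber(r, gamma):
    """Huber norm: quadratic near zero, linear for |r| > gamma."""
    a = abs(r)
    return 0.5 * r * r if a <= gamma else gamma * (a - 0.5 * gamma)


def gradient_weight(gx, gy, c):
    """w_p = c^2 / (c^2 + |grad I_i(p)|^2): high-gradient pixels get less weight."""
    return c * c / (c * c + gx * gx + gy * gy)


def photometric_energy(samples, c=50.0, gamma=9.0):
    """E_ij for samples = [(I_i[p], I_j[p'], gx, gy), ...], one tuple per pixel in P_i.

    c and gamma are placeholder values (assumptions, not the paper's settings).
    """
    return sum(gradient_weight(gx, gy, c) * huber(ij - ii, gamma)
               for ii, ij, gx, gy in samples)


# Two pixels: a flat one with a small residual, a high-gradient one with a large residual.
samples = [(100.0, 101.0, 0.0, 0.0), (100.0, 110.0, 30.0, 40.0)]
print(photometric_energy(samples))
```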

The photometric error is very sensitive to sudden illumination changes.

2.3 Tracking

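All the points inside the active window are projected into the new frame. Then the pose of the new frame is optimized by minimizing the energy function.
In the earlier monocular DSO, depths were initialized with random values, so a particular motion pattern was always needed for initialization. In this paper, because the affine brightness transfer factors of the stereo image pair are still unknown at this point, NCC over a 3×5 neighborhood is used to search along the horizontal epipolar line.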
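Here is a rough sketch of what such an NCC search could look like for a rectified pair. I am assuming the 3×5 neighborhood means 3 rows by 5 columns and that disparities are searched at integer steps; `search_disparity` and its parameters are names I made up, not the authors' code. NCC subtracts the patch mean and divides by the patch norm, so it is unaffected by an affine brightness change between the two images, which is exactly why it is usable before a and b are known.

```python
import math


def patch(img, x, y, half_w=2, half_h=1):
    """Flatten the (2*half_h+1) x (2*half_w+1) neighbourhood around (x, y)."""
    return [img[v][u]
            for v in range(y - half_h, y + half_h + 1)
            for u in range(x - half_w, x + half_w + 1)]


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom > 0 else 0.0


def search_disparity(left, right, x, y, max_disp, half_w=2, half_h=1):
    """Return (best disparity, best NCC score) for pixel (x, y) of the left image."""
    ref = patch(left, x, y, half_w, half_h)
    best_d, best_score = 0, -1.0
    for d in range(0, max_disp + 1):
        xr = x - d                      # same row: horizontal epipolar line
        if xr - half_w < 0:             # patch would leave the image
            break
        score = ncc(ref, patch(right, xr, y, half_w, half_h))
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

The images are plain lists of rows here to keep the sketch dependency-free; the caller has to make sure the patch around `(x, y)` stays inside both images vertically and on the right side.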

2.4 Frame Management

The basic idea is to check if the scene or the illumination has sufficiently changed.

  • scene change: measured by the mean squared optical flow and the mean squared optical flow without rotation between the current frame and the last keyframe.
  • illumination change: measured by the relative brightness factor $|a_j - a_i|$.
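The two criteria above could be combined roughly as follows. Combining them as a weighted sum against a threshold is my own assumption (the notes do not say how they are combined), and all weights and the threshold are hypothetical placeholders.

```python
import math


def rms_flow(points_a, points_b):
    """Root of the mean squared displacement between matched pixel positions."""
    n = len(points_a)
    return math.sqrt(sum((xa - xb) ** 2 + (ya - yb) ** 2
                         for (xa, ya), (xb, yb) in zip(points_a, points_b)) / n)


def needs_new_keyframe(flow, flow_no_rot, a_i, a_j,
                       w_flow=1.0, w_flow_t=1.0, w_a=1.0, threshold=1.0):
    """Hypothetical rule: weighted sum of the three change measures vs. a threshold.

    flow        : rms optical flow between current frame and last keyframe
    flow_no_rot : same, but with the rotation removed (translation only)
    a_i, a_j    : affine brightness parameters of the keyframe and current frame
    """
    score = w_flow * flow + w_flow_t * flow_no_rot + w_a * abs(a_j - a_i)
    return score > threshold
```

Here `flow` would come from `rms_flow` applied to the positions of the active points before and after warping, and `flow_no_rot` from the same computation with the rotation set to identity.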

-> A point is selected if its gradient is above a threshold and it is the largest-gradient point within its block.

-> Before a candidate point is activated and optimized in the windowed optimization, its inverse depth is constantly refined by the following non-keyframes. (TODO: find out how this is done)

-> Out with the old, in with the new: when points are marginalized, candidate points are added to the joint optimization.

-> The constraints from static stereo introduce scale information into the system, and they also provide good geometric priors to temporal multi-view stereo.
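The point selection rule from the first arrow above (gradient above a threshold and the largest in its block) can be sketched like this. The fixed block size and the single global threshold are simplifications I chose for illustration; the input is assumed to be a precomputed gradient-magnitude image.

```python
def select_points(grad_mag, block=4, threshold=10.0):
    """Per block x block cell, pick the pixel with the largest gradient magnitude,
    provided that magnitude exceeds `threshold`. Returns a list of (x, y)."""
    h, w = len(grad_mag), len(grad_mag[0])
    selected = []
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            best, best_xy = threshold, None
            for y in range(y0, min(y0 + block, h)):
                for x in range(x0, min(x0 + block, w)):
                    if grad_mag[y][x] > best:
                        best, best_xy = grad_mag[y][x], (x, y)
            if best_xy is not None:      # nothing above threshold -> no point here
                selected.append(best_xy)
    return selected
```

Picking at most one point per block spreads the selected points over the image instead of clustering them on a few strong edges.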

2.5 Windowed Optimization

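-> Temporal Multi-View Stereo: ordinary stereo between images taken at different times.
-> Static Stereo:
-> Stereo Coupling: a parameter $\lambda$ is introduced to balance the weights of the two kinds of constraints above.
-> Marginalization: before marginalizing a keyframe, all points in the active window that are not observed by the last two keyframes are marginalized first.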
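For the marginalization step, the standard tool is the Schur complement. Below is a minimal sketch, assuming each marginalized point contributes a single scalar unknown (its inverse depth), so the point block of the Hessian is diagonal and can be eliminated one point at a time. The function name and data layout are mine, not the authors' implementation.

```python
def marginalize_points(H_xx, b_x, point_terms):
    """Schur complement over points with scalar (inverse-depth) blocks.

    H_xx        : n x n list of lists, Hessian block of the remaining variables
    b_x         : length-n list, matching right-hand side
    point_terms : list of (h_xd, h_dd, b_d) per marginalized point, where
                  h_xd is the length-n coupling column, h_dd > 0 the point's
                  own Hessian entry and b_d its right-hand-side entry
    Returns the reduced (H, b) that acts on the remaining variables only:
        H = H_xx - sum_k h_k h_k^T / h_dd_k
        b = b_x  - sum_k h_k * b_d_k / h_dd_k
    """
    n = len(b_x)
    H = [row[:] for row in H_xx]
    b = b_x[:]
    for h_xd, h_dd, b_d in point_terms:
        for i in range(n):
            b[i] -= h_xd[i] * b_d / h_dd
            for j in range(n):
                H[i][j] -= h_xd[i] * h_xd[j] / h_dd
    return H, b


H, b = marginalize_points([[4.0, 0.0], [0.0, 4.0]], [1.0, 2.0],
                          [([1.0, 2.0], 2.0, 4.0)])
print(H, b)
```

The reduced system keeps the information the dropped points carried about the remaining variables, which then acts as a prior in later optimizations.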

3. Evaluation

Skipped for now.

4. Conclusion

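Two things that could be done in the future:

  • Loop closure and a database for map maintenance (I think 半闲居士 has already done this)
  • Dynamic object handling to further boost the VO accuracy and robustness. (Add depth-aware dynamic object detection and then filter those objects out?)

I still have a lot to learn in SLAM, but it feels like the direct-method line of work has already been done? Sad..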

Original post: https://www.cnblogs.com/tweed/p/10432217.html
