ICRA2022 SLAM相关论文整理

时间：2023-07-21 08:37:00 lb膜视觉传感器基础传感器dgs66

视觉SLAM相关

线、面、3Dfeature

EDPLVO: Efficient Direct Point-Line Visual Odometry

周博paper，使用高效的线特性，DSO-based

This paper introduces an efficient direct visual odometry (VO) algorithm using points and lines. Pixels on lines are generally adopted in direct methods. However, the original photometric error is only defined for points. It seems difficult to extend it to lines. In previous works, the collinear constraints for points on lines are either ignored [1] or introduce heavy computational load into the resulting optimization system [2]. This paper extends the photometric error for lines. We prove that the 3D points of the points on a 2D line are determined by the inverse depths of the endpoints of the 2D line, and derive a closed-form solution for this problem. This property can significantly reduce the number of variables to speed up the optimization, and can make the collinear constraint exactly satisfied. Furthermore, we introduce a two-step method to further accelerate the optimization, and prove the convergence of this method. The experimental results show that our algorithm outperforms the state-of-the-art direct VO algorithms.

UV-SLAM: Unconstrained Line-Based SLAM Using Vanishing Points for Structural Mapping

优化方法，点线约束灭影点约束，代码开源

In feature-based simultaneous localization and mapping (SLAM), line features complement the sparsity of point features, making it possible to map the surrounding environment structure. Existing approaches utilizing line features have primarily employed a measurement model that uses line re-projection. However, the direction vectors used in the 3D line mapping process cannot be corrected because the line measurement model employs only the lines’ normal vectors in the Plucker coordinate. As a result, problems like degeneracy that occur during the 3D line mapping process cannot be solved. To tackle the problem, this paper presents a UV-SLAM, which is an unconstrained line-based SLAM using vanishing points for structural mapping. This paper focuses on using structural regularities without any constraints, such as the Manhattan world assumption. For this, we use the vanishing points that can be obtained from the line features. The difference between the vanishing point observation calculated through line features in the image and the vanishing point estimation calculated through the direction vector is defined as a residual and added to the cost function of optimization-based SLAM. Furthermore, through Fisher information matrix rank analysis, we prove that vanishing point measurements guarantee a unique mapping solution. Finally, we demonstrate that the localization accuracy and mapping quality are improved compared to the state-of-the-art algorithms using public datasets.

MSC-VO: Exploiting Manhattan and Structural Constraints for Visual Odometry

曼哈顿世界线说

Visual odometry algorithms tend to degrade when facing low-textured scenes from e.g. human-made environments, where it is often difficult to find a sufficient number of point features. Alternative geometrical visual cues, such as lines, which can often be found within these scenarios, can become particularly useful. Moreover, these scenarios typically present structural regularities, such as parallelism or orthogonality, and hold the Manhattan World assumption. Under these premises, in this work, we introduce MSC-VO, an RGB-D -based visual odometry approach that combines both point and line features and leverages, if exist, those structural regularities and the Manhattan axes of the scene. Within our approach, these structural constraints are initially used to estimate accurately the 3D position of the extracted lines. These constraints are also combined next with the estimated Manhattan axes and the reprojection errors of points and lines to refine the camera pose by means of local map optimization. Such a combination enables our approach to operate even in the absence of the aforementioned constraints, allowing the method to work for a wider variety of scenarios. Furthermore, we propose a novel multi-view Manhattan axes estimation procedure that mainly relies on line features. MSC-VO is assessed using everal public datasets, outperforming other state-of-the-art solutions, and comparing favourably even with some SLAM methods.

Integrating Point and Line Features for Visual-Inertial Initialization

点线特征用于初始化

Accurate and robust initialization is crucial in visual-inertial system, which significantly affects the localization accuracy. Most of the existing feature-based initialization methods rely on point features to estimate initial parameters. However, the performance of these methods often decreases in real scene, as point features are unstable and may be discontinuously observed especially in low textured environments. By contrast, line features, providing richer geometrical information than points, are also very common in man-made buildings. Thereby, in this paper, we propose a novel visual-inertial initialization method integrating both point and line features. Specifically, a closed-form method of line features is presented for initialization, which is combined with point-based method to build an integrated linear system. Parameters including initial velocity, gravity, point depth and line’s endpoints depth can be jointly solved out. Furthermore, to refine these parameters, a global optimization method is proposed, which consists of two novel nonlinear least squares problems for respective points and lines. Both gravity magnitude and gyroscope bias are considered in refinement. Extensive experimental results on both simulated and public datasets show that integrating point and line features in initialization stage can achieve higher accuracy and better robustness compared with pure point-based methods.

Unified Representation of Geometric Primitives for Graph-SLAM Optimization Using Decomposed Quadrics

3D物体的统一表示，可以用一个表示方式表示点线面，3D物体，方便优化In Simultaneous Localization And Mapping (SLAM) problems, high-level landmarks have the potential to build compact and informative maps compared to traditional point-based landmarks. In this work, we focus on the parameterization of frequently used geometric primitives including points, lines, planes, ellipsoids, cylinders, and cones. We first present a unified representation based on quadrics, leading to a consistent and concise formulation. Then we further study a decomposed model of quadrics that discloses the symmetric and degenerated properties of a primitive. Based on the decomposition, we develop geometrically meaningful quadrics factors in the settings of a graph-SLAM problem. Then in simulation experiments, it is shown that the decomposed formulation has better efficiency and robustness to observation noises than baseline parameterizations. Finally, in real-world experiments, the proposed back-end framework is demonstrated to be capable of building compact and regularized maps.

多相机系统

Globally Optimal Relative Pose Estimation for Multi-Camera Systems with Known Gravity Direction

多相机系统位姿估计

Multiple-camera systems have been widely used in self-driving cars, robots, and smartphones. In addition, they are typically also equipped with IMUs (inertial measurement units). Using the gravity direction extracted from the IMU data, the y-axis of the body frame of the multi-camera system can be aligned with this common direction, reducing the original three degree-of-freedom(DOF) relative rotation to a single DOF one. This paper presents a novel globally optimal solver to compute the relative pose of a generalized camera. Existing optimal solvers based on LM (Levenberg-Marquardt) method or SDP (semidefinite program) are either iterative or have high computational complexity. Our proposed optimal solver is based on minimizing the algebraic residual objective function. According to our derivation, using the least-squares algorithm, the original optimization problem can be converted into a system of two polynomials with only two variables. The proposed solvers have been tested on synthetic data and the KITTI benchmark. The experimental results show that the proposed methods have competitive robustness and accuracy compared with the existing state-of-the-art solvers.

Information-Theoretic Online Multi-Camera Extrinsic Calibration

多相机在线外参标定，keyframe提取方法很特别

Calibration of multi-camera systems is essential for lifelong use of vision-based headsets and autonomous robots. In this work, we present an information-based framework for online extrinsic calibration of multi-camera systems. While previous work largely focuses on monocular, stereo, or strictly non-overlapping field-of-view (FoV) setups, we allow arbitrary con-figurations while also exploiting overlapping pairwise FoV when possible. In order to efficiently solve for the extrinsic calibration parameters, which increase linearly with the number of cameras, we propose a novel entropy-based keyframe measure and bound the backend optimization complexity by selecting informative motion segments that minimize the maximum entropy across all extrinsic parameter partitions. We validate the pipeline on three distinct platforms to demonstrate the generality of the method for resolving the extrinsics and performing downstream tasks. Our code is available at https://github.com/edexheim/info_ext_calib.

Balancing the Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry

多相机视觉惯性里程计的前端算法

We present a multi-camera visual-inertial odometry system based on factor graph optimization which estimates motion by using all cameras simultaneously while retaining a fixed overall feature budget. We focus on motion tracking in challenging environments, such as narrow corridors, dark spaces with aggressive motions, and abrupt lighting changes. These scenarios cause traditional monocular or stereo odometry to fail. While tracking motion with extra cameras should theoretically prevent failures, it leads to additional complexity and computational burden. To overcome these challenges, we introduce two novel methods to improve multi-camera feature tracking. First, instead of tracking features separately in each camera, we track features continuously as they move from one camera to another. This increases accuracy and achieves a more compact factor graph representation. Second, we select a fixed budget of tracked features across the cameras to reduce back-end optimization time. Our proposed method was extensively tested using a hardware-synchronized device consisting of an IMU and four cameras (a front stereo pair and two lateral) in scenarios including: an underground mine, large open spaces, and building interiors with narrow stairs and corridors. Compared to stereo-only state-of-the-art visual-inertial odometry methods, our approach reduces the drift rate, relative pose error, by up to 80% in translation and 39% in rotation.

VIO+GPS

Tightly-Coupled GNSS-Aided Visual-Inertial Localization

VIO紧耦合融合GPS（黄老师实验室论文）

A navigation system which can output drift-free global trajectory estimation with local consistency holds great potential for autonomous vehicles and mobile devices. We propose a tightly-coupled GNSS-aided visual-inertial navigation system (GAINS) which is able to leverage the complementary sensing modality from a visual-inertial sensing pair, which provides high-frequency local information, and a Global Navigation Satellite System (GNSS) receiver with low-frequency global observations. Specifically, the raw GNSS measurements (including pseudorange, carrier phase changes, and Doppler frequency shift) are carefully leveraged and tightly fused within a visual-inertial framework. The proposed GAINS can accurately model the raw measurement uncertainties by canceling the atmospheric effects (e.g., ionospheric and tropospheric delays) which requires no prior model information. A robust state initialization procedure is presented to facilitate the fusion of global GNSS information with local visual-inertial odometry, and the spatiotemporal calibration between IMU-GNSS are also optimized in the estimator. The proposed GAINS is evaluated on extensive Monte-Carlo simulations on a trajectory generated from a large-scale urban driving dataset with specific verification for each component (i.e., online calibration and system initialization). GAINS also demonstrates competitive performance against existing state-of-the-art methods on a publicly available dataset with ground truth.

GVINS: Tightly Coupled GNSS-Visual-Inertial Fusion for Smooth and Consistent State Estimation

融合GPS,港科，VINS-based

Visual-inertial odometry (VIO) is known to suffer from drifting, especially over long-term runs. In this paper, we present GVINS, a non-linear optimization based system that tightly fuses global navigation satellite system (GNSS) raw measurements with visual and inertial information for real-time and drift-free state estimation. Our system aims to provide accurate global 6-DoF estimation under complex indoor-outdoor environments where GNSS signals may be intermittent or even inaccessible. To establish the connection between global measurements and local states, a coarse-to-fine initialization procedure is proposed to efficiently calibrate the transformation online and initialize GNSS states from only a short window of measurements. The GNSS code pseudorange and Doppler shift measurements, along with visual and inertial information, are then modelled and used to constrain the system states in a factor graph framework. For complex and GNSS-unfriendly areas, the degenerate cases are discussed and carefully handled to ensure robustness. Thanks to the tightly coupled multi-sensor approach and system design, our system fully exploits the merits of three types of sensors and is able to seamlessly cope with the transition between indoor and outdoor environments, where satellites are lost and reacquired.

其他

Map-Based Visual-Inertial Localization: A Numerical Study

基于特征点地图的VIO定位方法，EKF和SKF比较，MSCKF-based（黄老师实验室论文）

We revisit the problem of efficiently leveraging prior map information within a visual-inertial estimation framework. The use of traditional landmark-based maps with 2D-to-3D measurements along with the recently introduced keyframe-based maps with 2D-to-2D measurements are investigated. The full joint estimation of the prior map is compared within a visual-inertial simulator to the Schmidt-Kalman filter (SKF) and measurement inflation methods in terms of their computational complexity, consistency, accuracy, and memory usage. This study shows that the SKF can enable efficient and consistent estimation for small workspace scenarios and the use of 2D-to-3D landmark maps have the highest levels of accuracy. Keyframe-based 2D-to-2D maps can reduce the required state size while still enabling accuracy gains. Finally, we show that measurement inflation methods, after tuning, can be accurate and efficient for large-scale environments if the guarantee of consistency is relaxed.

Periodic SLAM: Using Cyclic Constraints to Improve the Performance of Visual-Inertial SLAM on Legged Robots

legged robots图像抖动很大，提出适用于相机剧烈运动情况的VIO算法，利用一些周期性可预测的规律提高定位精度。

Methods for state estimation that rely on visual information are challenging on legged robots due to rapid changes in the viewing angle of onboard cameras. In this work, we show that by leveraging structure in the way that the robot locomotes, the accuracy of visual-inertial SLAM in these challenging scenarios can be increased. We present a method that takes advantage of the underlying periodic predictability often present in the motion of legged robots to improve the performance of the feature tracking module within a visual-inertial SLAM system. Our method performs multi-session SLAM on a single robot, where each session is responsible for mapping during a distinct portion of the robot’s gait cycle. Our method produces lower absolute trajectory error than several state-of-the-art methods for visual-inertial SLAM in both a simulated environment and on data collected on a quadrupedal robot executing dynamic gaits. On real-world bounding gaits, our median trajectory error was less than 35% of the error of the next best estimate provided by state-of-the-art methods.

1D-LRF Aided Visual-Inertial Odometry for High-Altitude MAV Flight

用TOF辅助VIO定位，欢迎阅读

This paper addresses the problem of visual-inertial odometry (VIO) with a downward facing monocular camera when a micro aerial vehicle (MAV) flying at high altitude (over 100 meters). It is important to note that large scene depth causes visual motion constraints significantly less informative than that in near-sighted scenarios as considered in most existing VIO methods. To cope with this challenge, we develop an efficient MSCKF-based VIO algorithm aided by a single 1D laser range finder (LRF), termed LRF-VIO, which runs in real time on an embedded system. The key idea of the proposed LRF-VIO is to fully exploit the limited metric distance information provided by the 1D LRF to disambiguate the scale during visual feature tracking, thus improving the VIO performance at high altitude. Specifically, during the MSCKF visual measurement update, we deliberately constrain the depth of those SLAM features co-planar with the single LRF measuring point. Additionally, delayed initialization of features utilizes the LRF measurements whenever possible, and online extrinsic calibration between the LRF and monocular camera is performed to further improve estimation accuracy and robustness. The proposed LRF-VIO is extensively validated in both indoor and outdoor real-world experiments, outperforming the state-of-the-art methods.

Accurate and Robust Object-Oriented SLAM with 3D Quadric Landmarks in Outdoors

Object-oriented SLAM is a popular technology in autonomous driving and robotics. In this paper, we propose a stereo visual SLAM with a robust quadric landmark representation method. The system consists of four components, including deep learning detection, quadric landmark initialization, object data association and object pose optimization. State-of-the-art quadric-based SLAM algorithms always face observation-related problems and are sensitive to observation noise, which limits their application in outdoor scenes. To solve this problem, we propose a quadric initialization method based on the separation of the quadric parameters method, which improves the robustness to observation noise. The sufficient object data association algorithm and object-oriented optimization with multiple cues enable a highly accurate object pose estimation that is robust to local observations. Experimental results show that the proposed system is more robust to observation noise and significantly outperforms current state-of-the-art methods in outdoor environments. In addition, the proposed system demonstrates real-time performance.

FEJ2: A Consistent Visual-Inertial State Estimator Design

FEJ的改进，比FEJ一致性更好（黄老师实验室论文）

Continuous-Time Spline Visual-Inertial Odometry

连续时间的VIO

We propose a continuous-time spline-based formulation for visual-inertial odometry (VIO). Specifically, we model the poses as a cubic spline, whose temporal derivatives are used to synthesize linear acceleration and angular velocity, which are compared to the measurements from the inertial measurement unit (IMU) for optimal state estimation. The spline boundary conditions create constraints between the camera and the IMU, with which we formulate VIO as a constrained nonlinear optimization problem. Continuous-time pose representation makes it possible to address many VIO challenges, e.g., rolling shutter distortion and sensors that may lack synchronization. We conduct experiments on two publicly available datasets that demonstrate the state-of-the-art accuracy and real-time computational efficiency of our method.

Continuous-Time vs. Discrete-Time Vision-Based SLAM: A Comparative Study

同上，方法更清晰

Robotic practitioners generally approach the vision-based SLAM problem through discrete-time formulations. This has the advantage of a consolidated theory and very good understanding of success and failure cases. However, discrete-time SLAM needs tailored algorithms and simplifying assumptions when high-rate and/or asynchronous measurements, coming from different sensors, are present in the estimation process. Conversely, continuous-time SLAM, often overlooked by practitioners, does not suffer from these limitations. Indeed, it allows integrating new sensor data asynchronously without adding a new optimization variable for each new measurement. In this way, the integration of asynchronous or continuous high-rate streams of sensor data does not require tailored and highly-engineered algorithms, enabling the fusion of multiple sensor modalities in an intuitive fashion. On the down side, continuous time introduces a prior that could worsen the trajectory estimates in some unfavorable situations. In this work, we aim at systematically comparing the advantages and limitations of the two formulations in vision-based SLAM. To do so, we perform an extensive experimental analysis, varying robot type, speed of motion, and sensor modalities. Our experimental analysis suggests that, independently of the trajectory type, continuous-time SLAM is superior to its discrete counterpart whenever the sensors are not time-synchronized.

HiPE: Hierarchical Initialization for Pose Graphs

Pose graph optimization is a non-convex optimization problem encountered in many areas of robotics perception. Its convergence to an accurate solution is conditioned by two factors: the non-linearity of the cost function in use and the initial configuration of the pose variables. In this paper, we presented HiPE: a novel hierarchical algorithm for pose graph initialization. Our approach exploits a coarse-grained graph that encodes an abstract representation of the problem geometry. We construct this graph by combining maximum likelihood estimates coming from local regions of the input. By leveraging the sparsity of this representation, we can initialize the pose graph in a non-linear fashion, without computational overhead compared to existing methods. The resulting initial guess can effectively bootstrap the fine-grained optimization that is used to obtain the final solution. In addition, we performed an empirical analysis on the impact of different cost functions on the final estimate. Our experimental evaluation shows that the usage of HiPE leads to a more efficient and robust optimization process, comparing favorably with state- of-the-art methods.

Constrained Visual-Inertial Localization with Application and Benchmark in Laparoscopic Surgery

We propose a novel method to tackle the visual-inertial SLAM localization problem for constrained camera movements. We use residuals from the different modalities to jointly optimize a global cost function. The residuals emerge from IMU measurements, stereoscopic feature points, and constraints on possible solutions in $S E (3)$ . In settings where image quality is low and disturbances are frequent, the residuals reduce the complexity of the problem and make localization feasible. We verify the advantages of our method in a suitable medical use case and produce a dataset capturing a minimally invasive surgery in the abdomen. Our novel clinical dataset MITI is comparable to state-of-the-art evaluation datasets contains calibration and synchronization and is available at https://mediatum.ub.tum.de/1621941.

DM-VIO: Delayed Marginalization Visual-Inertial Odometry

VIO 边缘化优化 Delayed Marginalization

We present DM-VIO, a monocular visual-inertial odometry system based on two novel techniques called delayed marginalization and pose graph bundle adjustment. DM-VIO performs photometric bundle adjustment with a dynamic weight for visual residuals. We adopt marginalization, which is a popular strategy to keep the update time constrained, but it cannot easily be reversed, and linearization points of connected variables have to be fixed. To overcome this we propose delayed marginalization: The idea is to maintain a second factor graph, where marginalization is delayed. This allows us to later readvance this delayed graph, yielding an updated marginalization prior with new and consistent linearization points. In addition, delayed marginalization enables us to inject IMU information into already marginalized states. This is the foundation of the proposed pose graph bundle adjustment, which we use for IMU initialization. In contrast to prior works on IMU initialization, it is able to capture the full photometric uncertainty, improving the scale estimation. We evaluate our system on the EuRoC, TUM-VI, and 4Seasons datasets, which comprise flying drone, large-scale handheld, and automotive scenarios. Thanks to the proposed IMU initialization, our system exceeds the state of the art in visual-inertial odometry, even outperforming stereo-inertial methods while using only a single camera and IMU. The code will be published at http://vision.in.tum.de/dm-vio

Leveraging Structural Information to Improve Point Line Visual-Inertial Odometry

点线VIO优化，武汉大学

Leveraging line features to improve the accuracy of the SLAM system has been studied in many works. However, making full use of the characteristics of different line features (parallel, non-parallel) to improve the SLAM system is rarely mentioned. In this paper, we designed a VIO system based on points and straight lines, which divides straight lines into structural (that is, straight lines parallel to each other) and non-structural. To optimize the line features effectively, we used two-parameter representation methods for both structural and non-structural straight lines. Furthermore, we designed a stable line matching method based on frame-to-frame (2d-2d) and map-to-frame (3d-2d) strategies which can significantly improve the trajectory accuracy of the system. We conducted ablation experiments on synthetic data and public datasets, and also compared our method with state-of-the-art algorithms. The experiments verified the combination of different line features can improve the accuracy of the VIO system, and also demonstrated the effectiveness of our system.

Decoupled Right Invariant Error States for Consistent Visual-Inertial Navigation

features放在状态变量里时，IEKF的改进，计算更轻量，且保证一致性（黄老师实验室论文）

The invariant extended Kalman filter (IEKF) is proven to preserve the observability property of visual-inertial navigation systems (VINS) and suitable for consistent estimator design. However, if features are maintained in the state vector, the propagation of IEKF will become more computationally expensive because these features are involved in the covariance propagation. To address this issue, we propose two novel algorithms which preserve the system consistency by leveraging the invariant state representation and ensure efficiency by decoupling features from covariance propagation. The first algorithm combines right invariant error states with first-estimates Jacobian (FEJ) technique, by decoupling the features from the Lie group representation and utilizing FEJ for consistent estimation. The second algorithm is designed specifically for sliding-window filter-based VINS as it associates the features to an active cloned pose for Lie group representation. A new pseudo-anchor change algorithm is also proposed to maintain the features in the state vector longer than the window span. Both decoupled right- and left-invariant error based VINS methods are implemented for a complete comparison. Extensive Monte-Carlo simulations on three simulated trajectories and real world evaluations on the TUM-VI datasets are provided to verify our analysis and demonstrate that the proposed algorithms can achieve improved accuracy than a state-of-art filter-based VINS algorithm using FEJ.

Multirobot Collaborative Monocular SLAM Utilizing Rendezvous

多机器人协同SLAM

This paper proposes a collaborative monocular SLAM including a map fusion algorithm that utilizes rendezvous. Unlike existing rendezvous-based approaches that require additional sensors, the proposed system uses a monocular camera only. Our system can recognize robot rendezvous using non-static features (NSFs) without fiducial markers to identify team robots. NSFs, which are abandoned as outliers in typical SLAM systems for not supporting ego-motion, can include relative bearing measurements between robots in a rendezvous situation. The proposed pipeline consists of 1) a feature identification module that extracts the relative bearing measurements between robots from NSFs consisting of anonymous bearing vectors with false positives, and 2) a map fusion module that integrates local maps from each robot into a global map using identified measurements. The feature identification module can operate quickly using the proposed alternating minimization algorithm formulated by two sub-problems with closed-form solutions. The experimental results confirm that our multi-robot SLAM recognizes rendezvous rapidly and robustly, and fuses local maps into a global map accurately.

PixSelect: Less but Reliable Pixels for Accurate and Efficient Localization

华为，SFM heatmap 特征选择策略

Accurate camera pose estimation is a fundamental requirement for numerous applications, such as autonomous driving, mobile robotics, and augmented reality. In this work, we address the problem of estimating the global 6 DoF camera pose from a single RGB image in a given environment. Previous works consider every part of the image valuable for localization. However, many image regions such as the sky, occlusions, and repetitive non-distinguishable patterns cannot be utilized for localization. In addition to adding unnecessary computation efforts, extracting and matching features from such regions produce many wrong matches which in turn degrades the localization accuracy and efficiency. Our work addresses this particular issue and shows by exploiting an interesting concept of sparse 3D models that we can exploit discriminatory environment parts and avoid useless image regions for the sake of a single image localization. Interestingly, through avoiding selecting keypoints from non-reliable image regions such as trees, bushes, cars, pedestrians, and occlusions, our work acts naturally as an outlier filter. This makes our system highly efficient in that minimal set of correspondences is needed and highly accurate as the number of outliers is low. Our work exceeds state-ofthe-art methods on outdoor Cambridge Landmarks dataset. With only relying on single image at inference, it outweighs in terms of accuracy methods that exploit pose priors and/or reference 3D models while being much faster. By choosing as little as 100 correspondences, it surpasses similar methods that localize from thousands of correspondences, while being more efficient. In particular, it achieves, compared to these methods, an improvement of localization by 33% on OldHospital scene. Furthermore, It outstands direct pose regressors even those that learn from sequence of images. Our work will be publicly available

Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World

learning 方法恢复VO尺度

Monocular visual odometry (VO) has attracted extensive research attention by providing real-time vehicle motion from cost-effective camera images. However, state-of-the-art optimization-based monocular VO methods suffer from the scale inconsistency problem for long-term predictions. Deep learning has recently been introduced to address this issue by leveraging stereo sequences or ground-truth motions in the training dataset. However, it comes at an additional cost for data collection, and such training data may not be available in all datasets. In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, whereas in the real domain no stereo or ground-truth data are required in either the training or inference phases. Specifically, we first train a scale-aware disparity network using both monocular real images and stereo virtual data. The virtual-to-real domain gap is bridged by using an adversarial training strategy to map images from both domains into a shared feature space. The resulting scale-consistent disparities are then integrated with a direct VO system by constructing a virtual stereo objective that ensures the scale consistency over long trajectories. Additionally, to address the suboptimality issue caused by the separate optimization backend and the learning process, we further propose a mutual reinforcement pipeline that allows bidirectional information flow between learning and optimization, which boosts the robustness and accuracy of each other. We demonstrate the effectiveness of our framework on the KITTI and vKITTI2 datasets.

Tightly-Coupled Magneto-Visual-Inertial Fusion for Long Term Localization in Indoor Environment

We propose in this paper a tightly-coupled fusion of visual, inertial and magnetic data for long-term localization in indoor environment. Unlike state-of-the-art Visual-Inertial SLAM (VISLAM) solutions that reuse visual map to prevent drift, we present in this paper an extension of the Multi-State Constraint Kalman Filter (MSCKF) that exploits a magnetic map. It makes our solution more robust to variations of the environment appearance. The experimental results demonstrate that the localization accuracy of the proposed approach is almost the same over time periods longer than a year.

VIP-SLAM: An Efficient Tightly-Coupled RGB-D Visual Inertial Planar SLAM

RGBD用平面约束矫正累计漂移In this paper, we propose a tightly-coupled SLAM system fused with RGB, Depth, IMU and structured plane information. Traditional sparse points based SLAM systems always maintain a mass of map points to model the environment. Huge number of map points bring us a high computational complexity, making it difficult to be deployed on mobile devices. On the other hand, planes are common structures in man-made environment especially in indoor environments. We usually can use a small number of planes to represent a large scene. So the main purpose of this article is to decrease the high complexity of sparse points based SLAM. We build a lightweight back-end map which consists of a few planes and map points to achieve efficient Bundle Adjustment (BA) with an equal or better accuracy. We use homography constraints to eliminate the parameters of numerous plane points in the optimization and reduce the complexity of BA. We separate the parameters and measurements in homography and point-to-plane constraints and compress the measurements part to further effectively improve the speed of BA. We also integrate the plane information into the whole system to realize robust planar feature extraction, data association, and global consistent planar reconstruction. Finally, we perform an ablation study and compare our method with similar methods in simulation and real environment data. Our system achieves obvious advantages in accuracy and efficiency. Even if the plane parameters are involved in the optimization, we effectively simplify the back-end map by using plane structures. The global bundle adjustment is nearly 2 times faster than the sparse points based SLAM algorithm.

Electric Sense Based Pose Estimation and Localization for Small Underwater Robots

Accurate pose estimation and localization technology is always a challenge for small underwater robots, since the under-water lighting conditions could limit the use of cameras while the cramped environments restrict the use of sonars. In nature, some fishes perceive other creatures by sensing the weakly changes in their environmental electric field. Inspired by such passive electric sense behavior in fish, this letter presents an electro-localization scheme based on passive electric sense for short-distance accurate pose estimation and localization of small underwater robots. Our scheme includes a hardware solution to the electric sense and a pose estimation method. Specifically, first, we design a hardware solution including an electric emitter placed in the underwater environment and an electric receiver that can be carried by a small underwater robot. Then we construct the theoretical model of the electric field generated by our designed hardware. Finally, we propose an electric sense based method to estimate the position and orientation of the free-swimming robot, where a particle filter is utilized to combine the odometer and electric sense based measurements while the dynamic model of the robot is also included. Localization experiments for the small underwater robot equipped with the electric receiver are conducted, and the experimental results demonstrate the robustness and effectiveness of our proposed electric sense based pose estimation approach, especially in

Improved State Propagation through AI-Based Pre-Processing and Down-Sampling of High-Speed Inertial Data

We present a novel approach to improve 6 degree-of-freedom state propagation for unmanned aerial vehicles in a classical filter through pre-processing of high-speed inertial data with AI algorithms. We evaluate both an LSTM-based approach as well as a Transformer encoder architecture. Both algorithms take as input short sequences of fixed length N of high-rate inertial data provided by an inertial measurement unit (IMU) and are trained to predict in turn one pre-processed IMU sample that minimizes the state propagation error of a classical filter across M sequences. This setup allows us to provide sufficient temporal history to the networks for good performance while maintaining a high propagation rate of pre-processed IMU samples important for later deployment on real-world systems. In addition, our network architectures are formulated to directly accept input data at variable rates thus minimizing necessary data preprocessing. The results indicate that the LSTM based architecture outperforms the Transformer encoder architecture and significantly improves the propagation error even for long IMU propagation times.

Improved State Propagation through AI-Based Pre-Processing and Down-Sampling of High-Speed Inertial Data

Joint Localization Based on Split Covariance Intersection on the Lie Group

This paper presents a pose fusion method that accounts for the possible correlations among measurements. The proposed method can handle data fusion problems whose uncertainty has both independent and dependent parts. Different from the existing methods, the uncertainties of the various states or measurements are modeled on the Lie algebra and projected to the manifold through the exponential map, which is more precise than that modeled in the vector space. The correlation is based on the theory of covariance intersection, where the independent and dependent parts are split to yield a more consistent result. In this paper, we provide a novel method for the correlated pose fusion algorithm on the manifold. Theoretical derivation and analysis are detailed first, and then, the experimental results are presented to support the proposed theory. The main contributions are threefold: (1) We provide a theoretical foundation for the split covariance intersection filter performed on the manifold, where the uncertainty is associated with the Lie algebra. (2) The proposed method gives an explicit fusion formalism on SE(3) and SE(2), which covers the most use cases in the field of robotics. (3) We present a localization framework that can work for both single-robot and multirobot systems, where not only the fusion with possible correlation is derived on the manifold but also the state evolution and relative pose computation are performed on the manifold. The experimental results va

数据集

HD Ground - a Database for Ground Texture Based Localization

开源地面纹理数据集

We present the HD Ground Database, a comprehensive database for ground texture based localization. It contains sequences of a variety of textures, obtained using a downward facing camera. In contrast to existing databases of ground images, the HD Ground Database is larger, has a greater variety of textures, and has a higher image resolution with less motion blur. Also, our database enables the first systematic study of how natural changes of the ground that occur over time affect localization performance, and it allows to examine a teach-and-repeat navigation scenario. We use the HD Ground Database to evaluate four state-of-the-art localization approaches for global localization, localization with the approximate pose being known, and relative localization.

The Visual-Inertial-Dynamical Multirotor Dataset

人工加了强干扰的无人机数据，抖动很严重Recently, the community has witnessed numerous datasets built for developing and testing state estimators. However, for some applications such as aerial transportation or search-and-rescue, the contact force or other disturbance must be perceived for robust planning and control, which is beyond the capacity of these datasets. This paper introduces a Visual-Inertial-Dynamical (VID) dataset, not only focusing on traditional six degrees of freedom (6-DOF) pose estimation but also providing dynamical characteristics of the flight platform for external force perception or dynamics-aided estimation. The VID dataset contains hardware synchronized imagery and inertial measurements, with accurate ground truth trajectories for evaluating common visual-inertial estimators. Moreover, the proposed dataset highlights rotor speed and motor current measurements, control inputs, and ground truth 6-axis force data to evaluate external force estimation. To the best of our knowledge, the proposed VID dataset is the first public dataset containing visual-inertial and complete dynamical information in the real world for pose and external force evaluation.

M2DGR: A Multi-Sensor and Multi-Scenario SLAM Dataset for Ground Robots

无人车数据

We introduce M2DGR: a novel large-scale dataset collected by a ground robot with a full sensor-suite including six fish-eye and one sky-pointing RGB cameras, an infrared camera, an event camera, a Visual-Inertial Sensor (VI-sensor), an inertial measurement unit (IMU), a LiDAR, a consumer-grade Global Navigation Satellite System (GNSS) receiver and a GNSS-IMU navigation system with real-time kinematic (RTK) signals. All those sensors were well-calibrated and synchronized, and their data were recorded simultaneously. The ground truth trajectories were obtained by the motion capture device, a laser 3D tracker, and an RTK receiver. The dataset comprises 36 sequences (about 1TB) captured in diverse scenarios including both indoor and outdoor environments. We evaluate state-of-the-art SLAM algorithms on M2DGR. Results show that existing solutions perform poorly in some scenarios. For the benefit of the research community, we make the dataset and tools public. The webpage of our project is https://github.com/SJTU-ViSYS/M2DGR.

三维重建

Performance Guarantees for Spectral Initialization in Rotation Averaging and Pose-Graph SLAM

rotation average 优化算法，用于三维重建

In this work we present the first initialization methods equipped with explicit performance guarantees that are adapted to the pose-graph simultaneous localization and mapping (SLAM) and rotation averaging (RA) problems. SLAM and rotation averaging are typically formalized as large-scale nonconvex point estimation problems, with many bad local minima that can entrap the smooth optimization methods typically applied to solve them; the performance of standard SLAM and RA algorithms thus crucially depends upon the quality of the estimates used to initialize this local search. While many initialization methods for SLAM and RA have appeared in the literature, these are typically obtained as purely heuristic approximations, making it difficult to determine whether (or under what circumstances) these techniques can be reliably deployed. In contrast, in this work we study the problem of initialization through the lens of spectral relaxation. Specifically, we derive a simple spectral relaxation of SLAM and RA, the form of which enables us to exploit classical linear-algebraic techniques (eigenvector perturbation bounds) to control the distance from our spectral estimate to both the (unknown) ground-truth and the global minimizer of the estimation problem as a function of measurement noise. Our results reveal the critical role that spectral graph-theoretic properties of the measurement network play in controlling estimation accuracy; moreover, as a by-product of our analysis we obtain new bounds on the estimation error for the maximum likelihood estimators in SLAM and RA, which are likely to be of independent interest. Finally, we show experimentally that our spectral estimator is very effective in practice, producing initializations of comparable or superior quality at lower computational cost compared to existing state-of-the-art techniques.

Incremental Abstraction in Distributed Probabilistic SLAM Graphs

三维重建之平面重建

Scene graphs represent the key components of a scene in a compact and semantically rich way, but are difficult to build during incremental SLAM operation because of the challenges of robustly identifying abstract scene elements and optimising continually changing, complex graphs. We present a distributed, graph-based SLAM framework for incrementally building scene graphs based on two novel components. First, we propose an incremental abstraction framework in which a neural network proposes abstract scene elements that are incorporated into the factor graph of a feature-based monocular SLAM system. Scene elements are confirmed or rejected through optimisation and incrementally replace the points yielding a more dense, semantic and compact representation. Second, enabled by our novel routing procedure, we use Gaussian Belief Propagation (GBP) for distributed inference on a graph processor. The time per iteration of GBP is structure agnostic and we demonstrate the speed advantages over direct methods for inference of heterogeneous factor graphs. We run our system on real indoor datasets using planar abstractions and recover the major planes with significant compression.

Crossview Mapping with Graph-Based Geolocalization on City-Scale Street Maps

浙大zhangguofeng老师实验室

3D environment mapping has been actively studied recently with the development of autonomous driving and augmented reality. Although many image-based methods are proposed due to their convenience and flexibility compared to other complex sensors, few works focus on fixing the inherent scale ambiguity of image-based methods and registering the reconstructed structure to the real-world 3D map, which is very important for autonomous driving. This paper presents a low-cost mapping solution that is able to refine and align the monocular reconstructed point cloud given a public street map. Specifically, we first find the association between the street map and the reconstructed point cloud structure by a novel graph-based geolocalization method. Then, optimized with the corresponding relationship, the map accuracy is significantly improved. The rich environment information can also be associated with the point cloud by the geographical location. Experiments show that our geolocalization algorithm can locate the scene on a gigantic city-scale map (173.46 km2 ) in two minutes and support 3D map reconstruction with absolute scale and rich environmental information from Internet videos.

DynamicFilter: An Online Dynamic Objects Removal Framework for Highly Dynamic Environments

Emergence of massive dynamic objects will diversify spatial structures when robots navigate in urban environments. Therefore, the online removal of dynamic objects is critical. In this paper, we introduce a novel online removal framework for highly dynamic urban environments. The framework consists of the scan-to-map front-end and the map-to-map back-end modules. Both the front- and back-ends deeply integrate the visibility-based approach and map-based approach. The experiments validate the framework in highly dynamic simulation scenarios and real-world dataset.

Memory-Efficient Gaussian Fitting for Depth Images in Real Time

Computing consumes a significant portion of energy in many robotics applications, especially the ones involving energy-constrained robots. In addition, memory access accounts for a signficant portion of the computing energy. For mapping a 3D environment, prior approaches reduce the map size while incurring a large memory overhead used for storing sensor measurements and temporary variables during computation. In this work, we present a memory-efficient algorithm, named Single-Pass Gaussian Fitting (SPGF), that accurately constructs a compact Gaussian Mixture Model (GMM) which approximates measurements from a depthmap generated from a depth camera. By incrementally constructing the GMM one pixel at a time in a single pass through the depthmap, SPGF achieves higher throughput and orders-of-magnitude lower memory overhead than prior multi-pass approaches. By processing the depthmap row-by-row, SPGF exploits intrinsic properties of the camera to efficiently and accurately infer surface geometries, which leads to higher precision than prior approaches while maintaining the same compactness of the GMM. Using a low-power ARM Cortex-A57 CPU on the NVIDIA Jetson TX2 platform, SPGF operates at 32fps, requires 43KB of memory overhead, and consumes only 0.11J per frame (depthmap). Thus, SPGF enables real-time mapping of large 3D environments on energy-constrained rob

语义SLAM

Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge

In visual semantic navigation, the robot navigates to a target object with egocentric visual observations and the class label of the target is given. It is a meaningful task inspiring a surge of relevant research. However, most of the existing models are only effective for single-agent navigation, and a single agent has low efficiency and poor fault tolerance when conducting more complicated tasks. Multi-agent collaboration can improve the efficiency and has strong application potentials. In this paper, we propose the multi-agent visual semantic navigation, in which multiple agents collaborate with others to find multiple target objects. It is a challenging task that requires agents to learn reasonable collaboration strategies to perform efficient exploration under the restrictions of communication bandwidth. We develop a hierarchical decision framework based on semantic mapping, scene prior knowledge, and communication mechanism to solve this task. The experimental results in unseen scenes with both seen objects and unseen objects illustrate the higher accuracy and efficiency of the proposed model compared with the single-agent model.

LTSR: Long-Term Semantic Relocalization Based on HD Map for Autonomous Vehicles

华为，基于语义地图的定位，车载

Highly accurate and robust relocalization or localization initialization ability is of great importance for autonomous vehicles (AVs). Traditional GNSS-based methods are not reliable enough in occlusion and multipath conditions. In this paper we propose a novel long-term semantic relocalization algorithm based on HD map and semantic features which are compact in representation. Semantic features appear widely on urban roads, and are robust to illumination, weather, view-point and appearance changes. Repeated structures, missed and false detections make data association (DA) highly ambiguous. To this end, a robust semantic feature matching method based on a new local semantic descriptor which encodes the spatial and normal relationship between semantic features is performed. Further, we introduce an accurate, efficient, yet simple outlier removal method which works by assessing the local and global geometric consistencies and temporal consistency of semantic matching pairs. The experimental results on our urban dataset demonstrate that our approach performs better in accuracy and robustness compared with the current state-of-the-art methods.

DA-LMR: A Robust Lane Marking Representation for Data Association

多段直线的表示和对应的数据关联

While complete localization approaches are widely studied in the literature, their data association and data representation subprocesses usually go unnoticed. However, both are a key part of the final pose estimation. In this work, we present DA-LMR (Delta-Angle Lane Marking Representation), a robust data representation in the context of localization approaches. We propose a representation of lane markings that encodes how a curve changes in each point and includes this information in an additional dimension, thus providing a more detailed geometric structure description of the data. We also propose DC-SAC (Distance-Compatible Sample Consensus), a data association method. This is a heuristic version of RANSAC that dramatically reduces the hypothesis space by distance compatibility restrictions. We compare the presented methods with some state-of-the-art data representation and data association approaches in different noisy scenarios. The DA-LMR and DC-SAC produce the most promising combination among those compared, reaching 98.1% in precision and 99.7% in recall for noisy data with 0.5 m of standard deviation.

SemLoc: Accurate and Robust Visual Localization with Semantic and Structural Constraints from Prior Maps

基于语义地图定位，EM算法

Semantic information and geometrical structures of a prior map can be leveraged in visual localization to bound drift errors and improve accuracy. In this paper, we propose SemLoc, a pure visual localization system, for accurate localization in a prior semantic map. To tightly couple semantic and structure information from prior maps, a hybrid constraint is presented by using the Dirichlet distribution. Then, with the local landmarks and their semantic states tracked in the frontend, the camera poses and data associations are jointly optimized through Expectation-Maximization (EM) algorithm. We validate the effectiveness of our approach in both monocular and stereo modes on the public KITTI dataset. Experimental results demonstrate that our system can greatly reduce drift errors with an satisfying real-time performance. Compared with several state-of-the-art visual localization systems, the proposed framework achieves a competitive localization performance.

Robust Monocular Localization in Sparse HD Maps Leveraging Multi-Task Uncertainty Estimation

learning 方法

Robust localization in dense urban scenarios using a low-cost sensor setup and sparse HD maps is highly relevant for the current advances in autonomous driving, but remains a challenging topic in research. We present a novel monocular localization approach based on a sliding-window pose graph that leverages predicted uncertainties for increased precision and robustness against challenging scenarios and per-frame failures. To this end, we propose an efficient multi-task uncertainty-aware perception module, which covers semantic segmentation, as well as bounding box detection, to enable the localization of vehicles in sparse maps, containing only lane borders and traffic lights. Further, we design differentiable cost maps that are directly generated from the estimated uncertainties. This opens up the possibility to minimize the reprojection loss of amorphous map elements in an association-free and uncertainty-aware manner. Extensive evaluation on the Lyft 5 dataset shows that, despite the sparsity of the map, our approach enables robust and accurate 6D localization in challenging urban scenarios using only monocular camera images and vehicle odometry.

Object-Based Visual-Inertial Navigation System on Matrix Lie Group

通过tracking 车辆bounding box来辅助进行定位

In this paper, we propose a novel object-based visual-inertial navigation system fully embedded in a matrix Lie group and built upon the invariant Kalman filtering theory. Specifically, we focus on relative pose measurements of objects and derive an error equation at the associated tangent space. We prove that the observability property does not suffer from the filter inconsistency and nonlinear error terms are identically zero at the object initialization. A thorough Monte-Carlo simulation reveals that our approach yields consistent estimates and is very robust to a large initial state uncertainty. Furthermore, we demonstrate a real-world application to the KITTI dataset with a deep neural network-based 3D object detector. Experimental results report that noises on pose measurements follow a Gaussian-like density matching our assumption. The proposed method improves the localization and object global mapping accuracy by probabilistically accounting for inertial readings and object pose uncertainties at multiple views.

Towards Accurate Loop Closure Detection in Semantic SLAM with 3D Semantic Covisibility Graphs

语义slam之闭环检测Loop closure is necessary for correcting errors accumulated in simultaneous localization and mapping (SLAM) in unknown environments. However, conventional loop closure methods based on low-level geometric or image features may cause high ambiguity by not distinguishing similar scenarios. Thus, incorrect loop closure can occur. Though semantic 2D image information is considered in some literature to detect loop closures, there is little work that compares 3D scenes as an integral part of a semantic SLAM system. This paper introduces an approach integrated in a semantic SLAM system to combine high-level 3D semantic information and low-level feature information to conduct accurate loop closure detection and effective drift reduction. The effectiveness of our approach is demonstrated in testing results.

A Right Invariant Extended Kalman Filter for Object Based SLAM

With the recent advance of deep learning based object recognition and estimation, it is possible to consider object level SLAM where the pose of each object is estimated in the SLAM process. In this paper, based on a novel Lie group structure, a right invariant extended Kalman filter (RI-EKF) for object based SLAM is proposed. The observability analysis shows that the proposed algorithm automatically maintains the correct unobservable subspace, while standard EKF (Std-EKF) based SLAM algorithm does not. This results in a better consistency for the proposed algorithm comparing to Std-EKF. Finally, simulations and real world experiments validate not only the consistency and accuracy of the proposed algorithm, but also the practicability of the proposed RI-EKF for object based SLAM problem. The MATLAB code of the algorithm is made publicly available.

LaneMatch: A Practical Real-Time Localization Method Via Lane-Matching

This paper presents LaneMatch, a localization method for use in autonomous vehicles (AVs). It utilizes lane matching to obtain an AV’s lane occupancy and current pose estimation. Matching is performed on a compact low-resolution road map generated from satellite images. Our approach addresses the following challenges for AVs using this map: (1) misalignment between roads on the satellite images and their global coordinates, and (2) incomplete or incorrect lane detection outputs. First, LaneMatch estimates the offset between the AV‚Äôs global pose in the global coordinate system and its local map pose on the map. Secondly, LaneMatch utilizes a spatio-temporal integration of a particle filter and a factor graph to resolve lane-matching ambiguities. It strategically constrains the dimensionality of variables to obtain real-time performance. We use highway experiments to evaluate the processing time, occupancy accuracy, lateral/longitudinal and position/heading errors. These experiments show that LaneMatch localizes our AV rather precisely on the road map in real-time and can be used for navigation and planning purposes even in GNSS-unfriendly areas.

Binary Graph Descriptor for Robust Relocalization on Heterogeneous Data

In this paper, we propose a novel binary graph descriptor to improve loop detection for visual SLAM systems. Our contribution is fourfold: i) a graph embedding technique for generating binary descriptors which conserve both spatial and histogram frequential information extracted from images; ii) a generic mean of combining multiple layers of heterogeneous information into the proposed binary graph (BiG) descriptor, coupled with a matching and geometric checking method; iii) an implementation of our descriptor into an incremental Bag-of-Words (iBoW) structure without pre

锐单商城拥有海量元器件数据手册、IC替代型号，打造电子元器件IC百科大全！

ICRA2022 SLAM相关论文整理

目录

相关文章