A Survey of 3D Detection Methods Based on View-Specific Feature Extraction

Date: 2022-08-28 22:00:01

Author | 柒柒 @ Zhihu

Source | https://zhuanlan.zhihu.com/p/458068647

Editor | 3D视觉工坊

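This article surveys recent 3D detection papers and lists, by category, the works I consider most important and representative. The detailed readings focus on 3D detection methods built on view-specific features. Additions and corrections are welcome.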

I. Papers by Category

1. 3D detection methods based on LiDAR point clouds (LiDAR only)

Method	Category	Venue, year	Code released
Part-A^2	LiDAR only	TPAMI 2020	√
PointRCNN	LiDAR only	CVPR 2019	√
STD	LiDAR only	ICCV 2019
PV-RCNN	LiDAR only	CVPR 2020	√
PointPillars	LiDAR only	CVPR 2019	√
MVP	LiDAR only	NeurIPS 2021	√
SE-SSD	LiDAR only	CVPR 2021	√
SA-SSD	LiDAR only	CVPR 2020	√
HVPR	LiDAR only	CVPR 2021	√
LiDAR RCNN	LiDAR only	CVPR 2021	√
SECOND	LiDAR only	Sensors 2018	√
3DIoUMatch	LiDAR only	CVPR 2021	√
CenterPoint	LiDAR only	CVPR 2021	√
3DSSD	LiDAR only	CVPR 2020	√
CIA-SSD	LiDAR only	AAAI 2021

2. 3D detection methods based on multi-modal fusion (LiDAR+RGB)

Method	Category	Venue, year	Code released
AVOD-FPN	LiDAR+RGB	IROS 2018	√
F-PointNet	LiDAR+RGB	CVPR 2018	√
F-ConvNet	LiDAR+RGB	IROS 2019	√
4D-Net	LiDAR+RGB	ICCV 2021
MV3D	LiDAR+RGB	CVPR 2017	√
CM3D	LiDAR+RGB	WACV 2021
H^2 3D RCNN	LiDAR+RGB	TCSVT 2021	√

3. 3D detection methods based on monocular images (Monocular)

Method	Category	Venue, year	Code released
AutoShape	Monocular	ICCV 2021	√
CaDDN	Monocular	CVPR 2021	√
MonoDLE	Monocular	CVPR 2021	√
DDMP	Monocular	CVPR 2021	√
GUPNet	Monocular	ICCV 2021
FCOS3D	Monocular	ICCVW 2021	√
PGD	Monocular	CoRL 2021	√
MonoGRNet	Monocular	TPAMI 2021

4. 3D detection methods based on stereo images (Stereo)

Method	Category	Venue, year	Code released
SIDE	Stereo	WACV 2022
LIGA-Stereo	Stereo	ICCV 2021	√
E2E-PL	Stereo	CVPR 2020	√

5. 3D detection methods based on view-specific feature extraction

Method	Category	Venue, year	Code released
H^2 3D RCNN	Front & Bird view	TCSVT 2021	√
PointPillars	Bird view	CVPR 2019	√
F-PointNet	Frustum	CVPR 2018	√
F-ConvNet	Frustum	IROS 2019	√
TANet	Bird view	AAAI 2020	√

6. 3D detection methods based on feature augmentation / pseudo point cloud generation (pseudo augment)

Method	Category	Venue, year	Code released
PointPainting	pseudo augment	CVPR 2020
PointAugmenting	pseudo augment	CVPR 2021
E2E-PL	pseudo augment	CVPR 2020	√
Pseudo-LiDAR	pseudo augment	CVPR 2019	√
Pseudo-LiDAR++	pseudo augment	ICLR 2020	√
MVP	pseudo augment	NeurIPS 2021	√

7. 3D detection methods based on Transformers (Transformer)

Method	Category	Venue, year	Code released
VoTr	Transformer	ICCV 2021
CT3D	Transformer	ICCV 2021	√
M3DETR	Transformer	arXiv
DETR3D	Transformer	CoRL 2021	√
PoinTr	Transformer	ICCV 2021	√

8. 3D detection methods based on semi-supervised learning (Semi supervised)

Method	Category	Venue, year	Code released
3DAL	Semi supervised	arXiv
3DIoUMatch	Semi supervised	CVPR 2021	√
WS3D	Semi supervised	TPAMI 2021	√

II. Paper-by-Paper Readings

1. H^2 3D RCNN (TCSVT 2021)

Paper: https://arxiv.org/pdf/2107.14391.pdf
Affiliation: University of Science and Technology of China
Code: https://github.com/djiajunustc/H-23D_R-CNN
The paper in one sentence: it combines the characteristics of features from different projection views into a 3D feature with richer spatial structure information.

Related reading, by stone: 2021 TCSVT, From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection. Link: https://zhuanlan.zhihu.com/p/396436677

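[Figure: network architecture]

[Figure: results on the KITTI test set]

The overall network is still a two-stage design: the first stage extracts proposals, and the second stage refines the predictions.

Two points are worth noting:

First, in the first stage (proposal generation), the authors project the point cloud onto both a front view and a bird's-eye view. In the front view, features are extracted over a cylindrical partition, which is what the paper calls Cylindrical Coordinates. In the bird's-eye view, features are extracted in a pillar-like fashion, which is what the paper calls Cartesian Coordinates.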

Given the input point clouds, we sequentially project the points into the perspective view and the bird-eye view, and leverage 2D backbone networks for feature extraction.
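To make the two projections concrete, here is a minimal sketch of how a single point could be assigned to a cell in each view. It is only an illustration under my own assumptions: the function names, the bin sizes (`phi_res_deg`, `z_res`, `xy_res`) and the range offsets are hypothetical and are not taken from the paper or its code. The front view bins by azimuth and height and collapses the range; the bird's-eye view bins by x and y and collapses the height.

```python
import math

def front_view_cell(point, phi_res_deg=1.0, z_res=0.5, z_min=-3.0):
    """Front (perspective) view: bin by azimuth angle and height.

    The range rho = sqrt(x^2 + y^2) is collapsed, so all points along
    one viewing ray fall into the same cell. Bin sizes are assumptions.
    """
    x, y, z = point
    phi_deg = math.degrees(math.atan2(y, x))
    return math.floor(phi_deg / phi_res_deg), math.floor((z - z_min) / z_res)

def bird_view_cell(point, xy_res=0.5, x_min=0.0, y_min=-40.0):
    """Bird's-eye view: bin by x and y, collapsing the height z (pillar-like)."""
    x, y, _ = point
    return math.floor((x - x_min) / xy_res), math.floor((y - y_min) / xy_res)

point = (10.2, 5.1, -0.8)          # (x, y, z) in the LiDAR frame
fv_cell = front_view_cell(point)
bev_cell = bird_view_cell(point)
print(fv_cell, bev_cell)           # (26, 4) (20, 90)
```

Each view then yields a 2D grid of cells that an ordinary 2D backbone can process, which is the point the quoted sentence makes.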

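Second, in the second stage (refinement), the authors design the HV RoI Pooling module. They argue that voxel RoI pooling has a problem: when grouping around the same grid point with different query radii, the concentric query ranges can contain the same voxel features, so the same non-empty voxels may be sampled repeatedly. In addition, the computational cost grows cubically with the grid number G and the query radius r.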

the non-empty voxels within a small and a large query range centered on the same grid point can be highly reduplicate, which may incur the significance of multi-scale grouping. Besides, the computational cost of grouping neighbor voxels grows cubically with the partition grid number G and the query range r.
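Both complaints can be checked with a few lines of arithmetic. The sketch below is a simplification of my own, not the paper's operator: it assumes a cube-shaped query range of half-width `r` around an integer voxel coordinate (a real implementation may use a different distance), and the helper names are hypothetical.

```python
def voxels_in_range(center, r):
    """All integer voxel coordinates within a cube of half-width r around center."""
    cx, cy, cz = center
    return {(cx + dx, cy + dy, cz + dz)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            for dz in range(-r, r + 1)}

def grouping_cost(G, r):
    """Voxel lookups for a G x G x G set of RoI grid points, each querying range r."""
    return G ** 3 * (2 * r + 1) ** 3

small = voxels_in_range((0, 0, 0), 1)
large = voxels_in_range((0, 0, 0), 2)

# The small range is fully contained in the large one, so every voxel
# gathered at the small scale is gathered again at the large scale.
print(len(small), len(large), small <= large)      # 27 125 True

# Doubling G multiplies the cost by 2^3 = 8.
print(grouping_cost(12, 1) // grouping_cost(6, 1))  # 8
```

Under this simplification, the cost is the product of two cubic terms, one in G and one in r, which is how I read the "grows cubically" remark.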

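2. PointPillars (CVPR 2019)

Paper: https://arxiv.org/pdf/1812.05784.pdf
Affiliation: nuTonomy: an APTIV company
The paper in one sentence: A novel encoder which utilizes PointNets to learn a representation of point clouds organized in vertical columns (pillars).

Related reading, by 柒柒: arXiv | 3D Detection | PointPillars. Link: https://zhuanlan.zhihu.com/p/389034609

[Figure: network architecture]

[Figure: results on the KITTI test set]

The core of this paper is the Pillar Feature Network (pillar-based point cloud feature extraction). Its job is to turn point cloud data into an ordinary 3D tensor, i.e. point clouds → C×H×W. The full pipeline is: point clouds → D×P×N → C×P×N → C×P → C×H×W, where:

1. point clouds → D×P×N. D=9: the point's coordinates (x, y, z), its offsets from the arithmetic mean of all points in the same pillar (x_c, y_c, z_c), its offsets from the pillar's x, y center (x_p, y_p), and its reflectance r. P=12000 is the number of pillars per scene. N=100 is the number of points sampled in each pillar.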

The points in each pillar are then augmented with x_c, y_c, z_c, x_p and y_p where the c subscript denotes distance to the arithmetic mean of all points in the pillar and the p subscript denotes the offset from the pillar x, y center.

2. D×P×N → C×P×N is a per-point linear layer, implemented as a convolution.

Next, we use a simplified version of PointNet where, for each point, a linear layer is applied followed by BatchNorm and ReLU to generate a C×P×N sized tensor.

3. C×P×N → C×P takes a max over the N dimension, similar to max pooling.

This is followed by a max operation over the channels to create an output tensor of size C×P.

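4. C×P → C×H×W expands the P dimension. P=12000 is the number of pillars in the scene, so the author's reading is that these pillar features describe the complete scene reasonably well. Each pillar feature is therefore placed back at its pillar's original location on an H×W canvas, which gives the familiar C×H×W 3D tensor.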

Once encoded, the features are scattered back to the original pillar locations to create a pseudo-image of size C×H×W where H and W indicate the height and width of the canvas.
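The four steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function names and the 1.0 m cell size are hypothetical, the "linear layer" uses fixed hand-picked weights instead of learned ones, BatchNorm is omitted, and the padding or sampling to fixed P and N is skipped so that pillars simply hold however many points fall into them.

```python
from collections import defaultdict

def pillarize(points, cell=1.0):
    """Group (x, y, z, r) points by the integer (col, row) of their pillar."""
    pillars = defaultdict(list)
    for p in points:
        pillars[(int(p[0] // cell), int(p[1] // cell))].append(p)
    return pillars

def augment(pts, center_xy):
    """Step 1: decorate each point to D = 9 features."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    mz = sum(p[2] for p in pts) / n
    cx, cy = center_xy
    return [[x, y, z, r, x - mx, y - my, z - mz, x - cx, y - cy]
            for x, y, z, r in pts]

def linear_relu(feat, weights):
    """Step 2: per-point linear layer + ReLU, D -> C (BatchNorm omitted)."""
    return [max(0.0, sum(w * f for w, f in zip(row, feat))) for row in weights]

def encode(points, weights, H, W, cell=1.0):
    """Return a C x H x W pseudo-image as nested lists."""
    C = len(weights)
    canvas = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for (col, row), pts in pillarize(points, cell).items():
        center = ((col + 0.5) * cell, (row + 0.5) * cell)
        per_point = [linear_relu(f, weights) for f in augment(pts, center)]
        # Step 3: max over the points of the pillar -> one C-vector.
        pooled = [max(channel) for channel in zip(*per_point)]
        # Step 4: scatter the vector back to the pillar's canvas location.
        if 0 <= row < H and 0 <= col < W:
            for c in range(C):
                canvas[c][row][col] = pooled[c]
    return canvas

# Toy weights (C = 2): channel 0 copies z, channel 1 copies reflectance r.
weights = [[0, 0, 1, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0, 0, 0]]
points = [(0.2, 0.3, 1.0, 0.5), (0.8, 0.6, 2.0, 0.1), (2.5, 1.5, 0.5, 0.9)]
canvas = encode(points, weights, H=2, W=3)
print(canvas[0])   # [[2.0, 0.0, 0.0], [0.0, 0.0, 0.5]]
```

The first two points share pillar (0, 0), so channel 0 keeps the larger z of the two; cells that contain no points stay at zero in this sketch.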

At this point the core part is done. From here on, the point cloud data can be handled with the detection methods we already know well.

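3. F-PointNet (CVPR 2018)

Paper: https://arxiv.org/pdf/1711.08488.pdf
Affiliation: Stanford University
Code: https://github.com/charlesq34/frustum-pointnets
The paper in one sentence: it uses a 2D detector to build frustum regions that constrain 3D detection.

[Figure: network architecture]

[Figure: results on the KITTI test set]

Judging from the architecture diagram, this paper clearly belongs to LiDAR+RGB fusion. Unlike AVOD-FPN, listed above, which fuses at the feature level, F-PointNet fuses at the result level. In other words, the LiDAR data is not fused with 2D image features directly; it only uses the frustum region mapped out from the 2D image detections.

Concretely, the overall pipeline is: feed in an RGB image to get 2D detections, map each detection to a 3D frustum region, then predict foreground/background points and the 3D box inside that frustum.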

As shown in the Figure, our system for 3D object detection consists of three modules: frustum proposal, 3D instance segmentation, and 3D amodal bounding box estimation.

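Two points deserve emphasis:

First, why is the frustum effective (see Section 4.1, Frustum Proposal, of the paper)? For one thing, current 2D detectors are relatively mature in general scenes. For another, constraining the 3D box with the 2D detection result greatly shrinks the point cloud search space.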

The resolution of data produced by most 3D sensors, especially real-time depth sensors, is still lower than RGB images from commodity cameras. Therefore, we leverage mature 2D object detector to propose 2D object regions in RGB images as well as to classify objects.

Second, each frustum detects only one object. The authors' explanation: one frustum corresponds to one 2D bounding box, and one 2D bounding box predicts exactly one object.

Note that each frustum contains exactly one object of interest. Here those "other" points could be points of non-relevant areas (such as ground, vegetation) or other instances that occlude or are behind the object of interest.
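The frustum proposal step can be pictured as a simple filter: keep only the points whose image projection lands inside the 2D box. The sketch below is my own minimal version, not the released code. It assumes the points are already in the camera frame (x right, y down, z forward), so the LiDAR-to-camera transform is skipped, and the intrinsics `fx, fy, cx, cy` and the box are made-up values.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy

def points_in_frustum(points, box2d, fx, fy, cx, cy):
    """Keep points in front of the camera whose projection falls inside box2d.

    box2d is (u_min, v_min, u_max, v_max) from a 2D detector.
    """
    u_min, v_min, u_max, v_max = box2d
    kept = []
    for p in points:
        if p[2] <= 0:          # behind the camera, cannot be in the frustum
            continue
        u, v = project(p, fx, fy, cx, cy)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append(p)
    return kept

# Hypothetical intrinsics and a hypothetical 2D detection.
intrinsics = dict(fx=700.0, fy=700.0, cx=600.0, cy=180.0)
box2d = (550.0, 150.0, 650.0, 220.0)
points = [(0.0, 0.0, 10.0),    # projects to (600, 180): inside
          (5.0, 0.0, 10.0),    # projects to (950, 180): outside
          (0.5, 0.2, 20.0),    # farther away but on a ray through the box: inside
          (0.0, 0.0, -5.0)]    # behind the camera
kept = points_in_frustum(points, box2d, **intrinsics)
print(kept)   # [(0.0, 0.0, 10.0), (0.5, 0.2, 20.0)]
```

Note that the two kept points sit at very different depths. This is exactly why the later segmentation step is needed: the frustum still contains background or occluding points besides the one object of interest.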

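4. F-ConvNet (IROS 2019)

Paper: https://arxiv.org/abs/1903.01864
Affiliation: South China University of Technology
Code: https://github.com/zhixinwang/frustum-convnet
The paper in one sentence: it predicts 3D boxes from a group of frustums rather than from the features of a single frustum.

[Figure: network architecture]

[Figure: results on the KITTI test set]

As the architecture diagram also shows, the core idea is similar to F-PointNet above: both use a 2D detector to build frustum regions that constrain 3D detection. The difference is that F-PointNet uses point segmentation features, i.e. point-level features, whereas F-ConvNet uses frustum features, i.e. frustum-level features.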

Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for use of its subsequent component of fully convolutional network (FCN), which spatially fuses frustum-level features and supports an end-to-end and continuous estimation of oriented boxes in the 3D space.

So what advantages does frustum-level have over point-level?

First, F-PointNet is not trained end-to-end;

it is not of end-to-end learning to estimate oriented boxes.

Second, a frustum often contains very few foreground points, which makes it hard to support accurate 3D detection.

Final estimation relies on too few foreground points which themselves are possibly segmented wrongly.

Third (this is my personal reading, and opinions may differ), point-level features tend to be local, whereas frustum-level features are better at representing object structure.
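The "sequence of frustums" idea can be sketched as sliding a depth window along the frustum axis and pooling the points in each window into one vector. This is a rough illustration under my own assumptions, not the paper's network: the parameter names (`height`, `stride`) are hypothetical, raw coordinates stand in for learned point-wise features, and an element-wise max replaces the PointNet that the paper applies to each group.

```python
def frustum_sequence(points, d_min, d_max, height, stride):
    """Slice one frustum along depth into (possibly overlapping) sub-frustums.

    points: camera-frame (x, y, z) points already inside the 2D box's frustum.
    Returns one pooled feature vector per sub-frustum, ordered near to far.
    """
    n_frustums = int((d_max - d_min - height) / stride) + 1
    sequence = []
    for i in range(n_frustums):
        near = d_min + i * stride
        group = [p for p in points if near <= p[2] < near + height]
        if group:
            # Element-wise max over the group stands in for a learned PointNet.
            pooled = [max(axis) for axis in zip(*group)]
        else:
            pooled = [0.0, 0.0, 0.0]   # empty sub-frustum
        sequence.append(pooled)
    return sequence

points = [(0.0, 0.0, 1.0), (1.0, 0.5, 3.0), (-1.0, 0.2, 5.0)]
feature_map = frustum_sequence(points, d_min=0.0, d_max=8.0, height=4.0, stride=2.0)
for row in feature_map:
    print(row)
# [1.0, 0.5, 3.0]
# [1.0, 0.5, 5.0]
# [-1.0, 0.2, 5.0]
```

The rows form a small feature map ordered by depth, which is the kind of input a convolutional network can then fuse along the frustum axis. With `stride` smaller than `height`, neighbouring sub-frustums overlap, so a point can contribute to more than one row.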

5. TANet (AAAI 2020)

Paper: https://arxiv.org/abs/1912.05163
Affiliation: Huazhong University of Science and Technology
Code: https://github.com/happinesslz/TANet

Related reading, by stone: 2020 AAAI, TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. Link: https://zhuanlan.zhihu.com/p/372725085

