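Fundamentals of Multi-Modal Semantic Segmentation
Date: 2022-12-02 17:30:01
Contents
- 1 Characteristics of the sensing modalities
- 2 Deep semantic segmentation
- 3 Multi-modal semantic segmentation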
- 3.1 Multi-modal datasets
- 3.2 Challenges and open problems in multi-modal semantic segmentation
- References
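The goal of semantic segmentation is to partition a scene into several meaningful parts, usually by assigning a semantic label to every pixel in the image (pixel-level semantic segmentation), or by simultaneously detecting objects and labeling them pixel by pixel (instance-level semantic segmentation).
More recently, panoptic segmentation was proposed to unify pixel-level and instance-level semantic segmentation.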
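
To make the three output types concrete, here is a minimal sketch on a toy scene. The class ids (0 = road, 1 = car), the `to_panoptic` helper, and the `class_id * label_divisor + instance_id` encoding are illustrative assumptions; the encoding is one common convention, not something prescribed by this article.

```python
import numpy as np

# Toy 4x6 scene. Class ids are hypothetical: 0 = road ("stuff"), 1 = car ("thing").
# Pixel-level output: one class id per pixel, the two cars are indistinguishable.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance-level output: one id per detected object, 0 where no object was detected.
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])


def to_panoptic(semantic_map, instance_map, label_divisor=1000):
    """Fuse both maps into one id per pixel: class_id * label_divisor + instance_id."""
    return semantic_map * label_divisor + instance_map


panoptic = to_panoptic(semantic, instance)
segments = np.unique(panoptic)  # one entry per segment: the road and each car
print(segments)
```

In this sketch every pixel ends up with exactly one id that carries both its class and, for countable objects, its instance, which is the idea behind the panoptic formulation.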
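1 Characteristics of the sensing modalities
- Visual and thermal cameras: Images captured by visual cameras and thermal cameras provide detailed texture information about a vehicle's surroundings. Visual cameras are sensitive to lighting and weather conditions; thermal cameras are more robust to day/night changes because they detect the infrared radiation related to the heat of objects. However, neither type of camera can directly provide depth information.
- LiDAR (Light Detection And Ranging): Provides accurate depth information about the surroundings in the form of 3D points. LiDAR is an active sensor: it emits laser beams at a certain frequency and measures their reflections. It is robust to different lighting conditions and is less affected than visual cameras by weather conditions such as fog and rain. However, a typical LiDAR cannot capture the fine texture of objects, and its points become sparse for distant objects.
- Radar (Radio Detection And Ranging): Radar emits electromagnetic waves that are reflected by obstacles. It measures the signal runtime to obtain an object's distance, estimates its radial velocity through the Doppler effect, and can also measure its angle. Radar works well under varied lighting and weather conditions, but classifying objects with radar is very challenging because of its low resolution. Radar is widely used in adaptive cruise control and traffic jam assistance systems. Millimeter-wave (mmWave) radar is a short-wavelength radar technology.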
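
The quantities mentioned in the list above can be sketched with textbook formulas. This is an illustration only: the function names, the 77 GHz carrier frequency, and the 0.2 degree angular resolution are assumptions chosen for the example, not values taken from this article.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def radar_range(round_trip_time_s):
    """Range from signal runtime: the wave travels to the target and back."""
    return C * round_trip_time_s / 2.0


def radar_radial_velocity(doppler_shift_hz, carrier_freq_hz):
    """Radial velocity from the Doppler shift: v = f_d * wavelength / 2.

    Sign convention assumed here: a positive shift means an approaching target.
    """
    wavelength = C / carrier_freq_hz
    return doppler_shift_hz * wavelength / 2.0


def lidar_point_spacing(distance_m, angular_resolution_deg):
    """Approximate gap between the hit points of two neighbouring beams."""
    return 2.0 * distance_m * math.tan(math.radians(angular_resolution_deg) / 2.0)


# Hypothetical measurements for illustration.
print(radar_range(1e-6))                    # echo received after 1 microsecond
print(radar_radial_velocity(5000.0, 77e9))  # 5 kHz shift on an assumed 77 GHz carrier
print(lidar_point_spacing(10.0, 0.2))       # beam spacing at 10 m
print(lidar_point_spacing(100.0, 0.2))      # ten times larger at 100 m
```

The last two lines show why LiDAR points thin out with distance: for a fixed angular resolution the gap between neighbouring points grows linearly with range.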
2 Deep semantic segmentation
Datasets for deep semantic segmentation | | |
---|---|---|
Cityscapes | KITTI | Toronto City |
Mapillary Vistas | ApolloScape | |
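Research focus | References |
---|---|
Pixel-level semantic segmentation over multiple classes | [3], [4], [5] |
Road segmentation only | [6], [7] |
Instance-level semantic segmentation of different traffic participants | [8], [9], [10] |
Richer contextual information for semantic segmentation | Dilated (atrous) convolution [11], [12]; multi-scale prediction [13]; conditional random fields (CRFs) added as a post-processing step [14] |
Real-time performance of semantic segmentation | [15] compares the real-time performance of several semantic segmentation architectures in terms of operations (GFLOPs) and inference speed (fps) |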
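
As a rough illustration of two ideas from the table, dilated convolution and operation counts in GFLOPs, the sketch below compares a plain and a dilated stack of 3x3 convolutions. The helper names, the layer sizes, and the convention of counting one multiply-accumulate as two FLOPs are assumptions made for this example; it is not the measurement procedure of any cited paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilation rates."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv_gflops(height, width, c_in, c_out, kernel_size, num_layers):
    """Approximate GFLOPs of stacked stride-1 'same' convolutions (1 MAC = 2 FLOPs).

    The dilation rate does not appear: it spreads the kernel taps apart
    without adding any multiplications.
    """
    macs_per_layer = height * width * c_in * c_out * kernel_size * kernel_size
    return 2 * macs_per_layer * num_layers / 1e9


plain_rf = receptive_field(3, [1, 1, 1])    # three ordinary 3x3 layers
dilated_rf = receptive_field(3, [1, 2, 4])  # same layers with dilation rates 1, 2, 4
gflops = conv_gflops(512, 1024, 64, 64, 3, 3)  # hypothetical 512x1024 feature map

print(plain_rf, dilated_rf, round(gflops, 2))
```

Under these assumptions both stacks cost the same number of operations, while the dilated stack sees a 15-pixel-wide context instead of 7, which is why dilation is attractive for gathering contextual information.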
3 Multi-modal semantic segmentation
3.1 Multi-modal datasets
3.2 Challenges and open problems in multi-modal semantic segmentation
References
- [1] Datasets, Methods, and Challenges
- [2] Deep Multi-modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges
- [3] A. Dewan, G. L. Oliveira, and W. Burgard, “Deep semantic classification for 3d lidar data,” in IEEE/RSJ Int. Conf. Intelligent Robots and Systems, 2017, pp. 3544–3549.
- [4] L. Schneider et al., “Multimodal neural networks: RGB-D for semantic segmentation and object detection,” in Scandinavian Conf. Image Analysis. Springer, 2017, pp. 98–109.
- [5] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., no. 12, pp. 2481–2495, 2017.
- [6] L. Caltagirone, S. Scheidegger, L. Svensson, and M. Wahde, “Fast lidar-based road detection using fully convolutional neural networks,” in IEEE Intelligent Vehicles Symp., 2017, pp. 1019–1024.
- [7] M. Teichmann, M. Weber, M. Zoellner, R. Cipolla, and R. Urtasun, “MultiNet: Real-time joint semantic reasoning for autonomous driving,” in IEEE Intelligent Vehicles Symp., 2018.
- [8] B. Wu, A. Wan, X. Yue, and K. Keutzer, “SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3d lidar point cloud,” in IEEE Int. Conf. Robotics and Automation, May 2018, pp. 1887–1893.
- [9] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proc. IEEE Conf. Computer Vision, 2017, pp. 2980–2988.
- [10] J. Uhrig, E. Rehder, B. Fröhlich, U. Franke, and T. Brox, “Box2Pix: Single-shot instance segmentation by assigning pixels to object boxes,” in IEEE Intelligent Vehicles Symp., 2018.
- [11] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834–848, 2018.
- [12] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “ENet: A deep neural network architecture for real-time semantic segmentation,” arXiv:1606.02147 [cs.CV], 2016.
- [13] A. Roy and S. Todorovic, “A multi-scale CNN for affordance segmentation in RGB images,” in Proc. Eur. Conf. Computer Vision. Springer, 2016, pp. 186–201.
- [14] S. Zheng et al., “Conditional random fields as recurrent neural networks,” in Proc. IEEE Conf. Computer Vision, 2015, pp. 1529–1537.
- [15] M. Siam, M. Gamal, M. Abdel-Razek, S. Yogamani, M. Jagersand, and H. Zhang, “A comparative study of real-time semantic segmentation for autonomous driving,” in Workshop Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 587–597.