Monocular Depth Estimation with MLDA-Net

Date: 2022-08-14 06:30:01

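1. Project Background

In computer vision, the depth of a pixel is the distance from the 3D point it depicts to the camera lens. Depth estimation is an important part of scene perception; judging the distance to objects is a basic survival skill for all living creatures. It is also a cornerstone of many higher-level vision tasks, and its results are widely used in visual navigation, obstacle detection, 3D reconstruction, and related areas

[Figure: visual navigation | obstacle avoidance | 3D reconstruction]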

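The traditional approach recovers depth from the reflection of LiDAR or structured light off object surfaces, but obtaining dense, accurate depth maps this way is too expensive for widespread use. Image-based depth estimation, by contrast, works directly on the (R, G, B) input and needs no expensive equipment to estimate scene depth, so it applies to a wider range of scenarios
Image-based depth estimation methods can be divided by the number of input images into multi-view depth estimation and monocular depth estimation
- Multi-view depth estimation estimates depth from multiple images of the same scene. Classic algorithms include Structure from Motion (SfM) and Multi-View Stereo (MVS). Most of these methods need image pairs or image sequences as input together with known camera parameters, which restricts the input heavily, and their results depend on the quality of feature extraction and feature matching
- Monocular depth estimation needs only a single image, which makes it more flexible than multi-view depth estimation, but because depth cues are missing it is an ill-posed problem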
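One way to see why a single image leaves depth under-determined: under a pinhole camera model, scaling a point's coordinates (and therefore its distance) by the same factor yields exactly the same pixel. The snippet below is a minimal sketch of that ambiguity. The intrinsics `fx`, `fy`, `cx`, `cy` are illustrative values chosen for this example, not numbers read from a KITTI calibration file.

```python
def project(point, fx=720.0, fy=720.0, cx=620.0, cy=187.0):
    """Project a 3D point (X, Y, Z) in camera coordinates to a pixel (u, v)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# A point 5 m away, and the same direction scaled 3x (15 m away).
near = (1.0, 0.5, 5.0)
far = tuple(3.0 * c for c in near)

pixel_near = project(near)
pixel_far = project(far)

# Both points land on the same pixel, so the image alone cannot tell them apart.
print(pixel_near, pixel_far)
```

A learned monocular model resolves this ambiguity only through priors picked up from training data, which is why extra constraints (stereo pairs, consecutive frames, or ground-truth depth) are needed during training.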

[Figure: binocular depth estimation | monocular depth estimation]

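Although it is hard to obtain accurate depth from a single image, the coarse prediction of a monocular depth estimation algorithm can serve as prior knowledge that helps ensure the convergence and robustness of an algorithm. Many hardware devices used in industry can measure depth directly, but each has its drawbacks: LiDAR is expensive; structured-light depth cameras such as Kinect cannot be used outdoors and produce noisy depth maps; binocular cameras need a stereo matching algorithm, which is computationally expensive and performs poorly in low-texture regions. Monocular cameras are the cheapest and most widely deployed devices, so monocular depth estimation has greater application potential than the alternatives
Deep-learning-based monocular depth estimation falls into two categories: supervised and self-supervised. Supervised methods need ground-truth depth as supervision, usually captured with high-precision depth sensors, which greatly limits where they can be used. Recently developed self-supervised methods aim to partly alleviate this drawback by using constraints between consecutive frames to predict depth. Because they do not rely on ground-truth depth, they are easier to apply in many settings.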
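The "constraints between consecutive frames" are usually photometric: a pixel in the target frame is back-projected with the predicted depth, moved by the predicted camera motion, and re-projected into a neighbouring frame, where the colours should match. The following is a minimal single-pixel sketch of that idea, not the loss used by MLDA-Net. It assumes a pinhole camera and a pure-translation motion (rotation is omitted for brevity), and all numeric values are illustrative.

```python
def reproject(u, v, depth, translation, fx, fy, cx, cy):
    """Map pixel (u, v) of the target frame into the source frame.

    Assumes a pinhole camera and a camera motion that is a pure
    translation (tx, ty, tz); rotation is left out of this sketch.
    """
    # Back-project the pixel to a 3D point using the predicted depth.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Express the point in the source camera's coordinate frame.
    tx, ty, tz = translation
    xs, ys, zs = x + tx, y + ty, z + tz
    # Project onto the source image plane.
    return (fx * xs / zs + cx, fy * ys / zs + cy)


def photometric_error(target_value, source_value):
    """L1 difference between a target pixel and its warped source pixel."""
    return abs(target_value - source_value)


FX, FY, CX, CY = 720.0, 720.0, 620.0, 187.0  # illustrative intrinsics

# The same pixel and the same sideways motion, under two depth hypotheses.
shift_near = reproject(CX, CY, 10.0, (0.5, 0.0, 0.0), FX, FY, CX, CY)
shift_far = reproject(CX, CY, 20.0, (0.5, 0.0, 0.0), FX, FY, CX, CY)
print(shift_near, shift_far)
```

The two depth hypotheses send the pixel to different source locations, so a wrong depth generally samples the wrong colour and produces a larger photometric error. That error is the training signal that replaces ground-truth depth. Published methods typically combine the L1 term with a structural-similarity term and handle occlusions, which this sketch leaves out.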
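This project follows the paper "MLDA-Net: Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation", published in IEEE Transactions on Image Processing in 2021. Depth maps predicted by self-supervised methods tend to be blurry and lose many depth details. To address this, the paper proposes MLDA-Net, a self-supervised depth estimation framework based on multi-level dual attention that produces pixel-level depth with accurate boundaries and rich detail. It reaches SOTA on the KITTI benchmark, and its advantages also show on other benchmark datasets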

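2. Technical Challenges

Supervised depth estimation needs ground-truth depth, which takes considerable material and financial resources to obtain, so its cost is high
Current self-supervised methods extract features inadequately, which limits their performance
Depth maps produced by current methods are blurry. As part (b) of the figure shows, depth detail is easily lost for small structures such as gate patterns and plant leaves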

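3. Solution

The framework is the self-supervised depth estimation network MLDA-Net. It takes low-resolution color images as input and estimates the corresponding depth through self-supervision, which effectively reduces the dependence on depth acquisition devices
A multi-level feature extraction (MLFE) strategy extracts rich hierarchical features from layers at different scales for high-quality depth prediction
A dual attention strategy combines global and local attention modules to obtain effective features and to enhance global and local structural information
A re-weighted strategy is used to compute the loss function: the depth outputs of different levels are re-weighted so that the final depth output is supervised effectively
The overall structure of the model is shown in the figure above.
- The leftmost column lists the inputs at several scales; the scales are set by the optional parameter scales. Two convolutional networks extract features from the inputs, and the attention network GA then integrates and further refines them. This part is the model's first network structure, called models["encoder"] in the code
- After feature extraction, the model enters the second network structure, models["depth"], which is the right half of the figure above. This network is built mainly on two attention blocks and finally outputs the depth maps corresponding to the inputs at the different scales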
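The re-weighted loss mentioned above combines the losses computed on the depth outputs of the different levels. The text here does not give the weighting formula, so the sketch below only illustrates the general shape of such a scheme: a normalized weighted sum over per-scale losses. The function name, the loss values, and the weights are all hypothetical and are not taken from the paper or the repository.

```python
def reweighted_loss(per_scale_losses, weights):
    """Combine per-scale losses into one scalar with a normalized weighted sum.

    This is a generic illustration; the actual MLDA-Net weighting may differ.
    """
    if len(per_scale_losses) != len(weights):
        raise ValueError("one weight per scale is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * loss for w, loss in zip(weights, per_scale_losses))
    return weighted / total_weight


# Hypothetical losses for four output scales, finest first.
losses = [0.12, 0.15, 0.20, 0.30]
# Hypothetical weights that emphasize the finest scale.
weights = [4.0, 3.0, 2.0, 1.0]

total = reweighted_loss(losses, weights)
print(total)
```

Giving the finer scales larger weights is one plausible design choice, since the final depth map is produced at the finest scale; other schemes, including learned weights, would fit the same interface.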

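4. Data Preparation

Dataset overview

This project uses the KITTI dataset, an autonomous-driving research dataset jointly sponsored by the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago
The dataset authors recorded 6 hours of real traffic scenes. The dataset consists of several modalities: rectified and synchronized images, LiDAR scans, high-precision GPS information, and IMU acceleration information. The authors also provide benchmarks on the official website for tasks such as optical flow, object detection, and depth estimation
For more information see KITTI. Image source: "Depth Estimation: Introduction to and Usage of the KITTI Dataset"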

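Training dataset

The training dataset is KITTI, which consists of calibrated videos registered with LiDAR measurements. Depth is evaluated against the LiDAR point clouds. The dataset is 176 GB in size
The KITTI data are split into 39,810 monocular image triplets for training, with 4,424 for validation and 697 for testing. Self-supervised monocular depth estimation has three training modes:
- monocular video
- binocular stereo
- monocular plus binocular stereo
The paper experiments with all three modes. This project uses the third, monocular plus binocular stereo; see the original paper for details. The depth_hints data also need to be generated, since experiments show that using depth_hints improves the depth estimation results. For how to generate them, see https://github.com/nianticlabs/depth-hints
This project also provides a smaller KITTI subset, the 10_03 part of the KITTI dataset, together with pre-generated depth_hint data to simplify training.
- The subset can be obtained with the following link and extraction code. Link: https://pan.baidu.com/s/1Rj5biYkAR2pURuzR881GQw , extraction code: n5wa
- The 10_03 data can also be obtained by unzipping the dataset mounted with the project, as shown below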

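# Unzip the archive into the dataset folder. Even though this is only a subset, it is large and unzipping takes more than 10 minutes
# Skip this step if you do not plan to train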
!unzip -oq /home/aistudio/data/data148774/10_03.zip -d ./dataset/

Data format

After unzipping, the dataset folder has the following layout

!tree -L 3 ./dataset

./dataset
├── 2011_10_03
│   ├── 2011_10_03_drive_0027_sync
│   │   ├── image_00
│   │   ├── image_01
│   │   ├── image_02
│   │   ├── image_03
│   │   ├── oxts
│   │   └── velodyne_points
│   ├── 2011_10_03_drive_0034_sync
│   │   ├── image_00
│   │   ├── image_01
│   │   ├── image_02
│   │   ├── image_03
│   │   ├── oxts
│   │   └── velodyne_points
│   ├── 2011_10_03_drive_0042_sync
│   │   ├── image_00
│   │   ├── image_01
│   │   ├── image_02
│   │   ├── image_03
│   │   ├── oxts
│   │   └── velodyne_points
│   ├── 2011_10_03_drive_0047_sync
│   │   ├── image_00
│   │   ├── image_01
│   │   ├── image_02
│   │   ├── image_03
│   │   ├── oxts
│   │   └── velodyne_points
│   ├── calib_cam_to_cam.txt
│   ├── calib_imu_to_velo.txt
│   └── calib_velo_to_cam.txt
└── depth_hints
    └── 2011_10_03
        ├── 2011_10_03_drive_0034_sync
        └── 2011_10_03_drive_0042_sync

33 directories, 3 files
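The velodyne_points folders in the tree above hold the LiDAR scans that serve as depth references. In the raw KITTI release each scan is a flat binary file of 32-bit floats, four per point: x, y, z, and reflectance. The reader below is a minimal sketch using only the standard library. The example path assumes the usual raw-KITTI sub-layout `velodyne_points/data/0000000000.bin`, which the depth-3 tree listing does not show, so treat it as an assumption.

```python
import struct
from pathlib import Path


def read_velodyne_bin(path):
    """Read a KITTI LiDAR scan as a list of (x, y, z, reflectance) tuples.

    Each point is stored as four little-endian float32 values.
    """
    raw = Path(path).read_bytes()
    record = struct.Struct("<4f")
    # Ignore any trailing bytes that do not form a complete point.
    usable = len(raw) - len(raw) % record.size
    return [record.unpack_from(raw, offset)
            for offset in range(0, usable, record.size)]


# Assumed location of the first scan of one drive; adjust to your data.
scan_path = Path("./dataset/2011_10_03/2011_10_03_drive_0027_sync"
                 "/velodyne_points/data/0000000000.bin")
if scan_path.exists():
    points = read_velodyne_bin(scan_path)
    print(len(points), "points; first:", points[0])
```

Turning these points into a depth map additionally requires projecting them into the camera with the calib_velo_to_cam.txt and calib_cam_to_cam.txt files listed in the tree, which is outside the scope of this sketch.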

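5. Environment Setup

Running this project requires cloning the repository, installing the dependencies, and preparing the pretrained model provided with the project
The pretrained model folder contains 4 files: 2 are weights for predicting depth and the other 2 are weights for predicting pose. Training on monocular video requires the model to estimate both depth and camera pose so that the loss function can be computed; see the paper for details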

# Clone the repository
!git clone https://github.com/bitcjm/MLDA-Net-repo

Cloning into 'MLDA-Net-repo'...
remote: Enumerating objects: 54, done.
remote: Counting objects: 100% (54/54), done.
remote: Compressing objects: 100% (43/43), done.
remote: Total 54 (delta 3), reused 51 (delta 3), pack-reused 0
Unpacking objects: 100% (54/54), done.
Checking connectivity... done.

# Install the required libraries
!pip install munch
!pip install scikit-image
!pip install natsort

# Copy the weights into a newly created folder
!mkdir -p ./weights
!cp -r data/data149788/*.pdparams ./weights
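Since the later steps load four weight files from ./weights, it can help to confirm that the copy succeeded before training or evaluating. The check below is a small sketch. The file names are an assumption inferred from the "Loading encoder/depth/pose_encoder/pose weights..." lines that test.py prints, so adjust `EXPECTED_WEIGHTS` if the actual names differ.

```python
from pathlib import Path

# Assumed base names, inferred from the "Loading ... weights" log lines.
EXPECTED_WEIGHTS = ("encoder", "depth", "pose_encoder", "pose")


def missing_weights(folder, names=EXPECTED_WEIGHTS):
    """Return the names whose <name>.pdparams file is absent from folder."""
    folder = Path(folder)
    return [name for name in names
            if not (folder / f"{name}.pdparams").is_file()]


missing = missing_weights("./weights")
if missing:
    print("missing weight files:", missing)
else:
    print("all expected weight files are present")
```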

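6. Model Training

Run the following code to start training. By default a single GPU is used
During training, train.log is created under MLDA-Net-repo/log_train/ to store the training log. Training requires paddle 2.2; with paddle 2.3 the paddle.cumsum function has a problem and the results come out wrong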

%cd MLDA-Net-repo
!python train.py --data_path ../dataset --depth_hint_path ../dataset/depth_hints
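Because of the version constraint noted above, a quick check like the following can be run before launching a long training job. It is a minimal sketch: `is_supported_paddle` is a helper name made up for this example, and it only compares the major and minor parts of `paddle.__version__`.

```python
def is_supported_paddle(version):
    """Return True for PaddlePaddle 2.2.x, the release line this project asks for."""
    parts = version.split(".")
    return len(parts) >= 2 and parts[0] == "2" and parts[1] == "2"


try:
    import paddle
except ImportError:
    print("PaddlePaddle is not installed")
else:
    if is_supported_paddle(paddle.__version__):
        print("paddle", paddle.__version__, "matches the required 2.2 line")
    else:
        print("Warning: paddle", paddle.__version__,
              "found; this project asks for 2.2")
```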

7. Model Evaluation

MLDA-Net can be evaluated on a single GPU with the one-line command below
Note: to test your own model, change the argument to --load_weights_folder your_weight_folder

!python test.py --data_path ../dataset/ --depth_hint_path ../dataset/depth_hints --load_weights_folder ../weights/

W0708 13:21:27.491781 21099 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0708 13:21:27.497159 21099 device_context.cc:465] device: 0, cuDNN Version: 7.6.
loading model from folder ../weights/
Loading encoder weights...
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/io.py:415: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  if isinstance(obj, collections.Iterable) and not isinstance(obj, (
Loading depth weights...
Loading pose_encoder weights...
Loading pose weights...
Epoch 0: StepDecay set learning rate to 0.0001.
Models and images files are saved to:
   ./log_test
There are 11220 training items and 438 validation items

[2022-07-08 13:23:42,986][INFO] val_dateset: rmse_avg=4.196064473287156,rmse_min=1.2780027389526367 ,rmse_max=9.574527740478516 
eval: ...........................................................................................................................................................................................................................

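8. Model Prediction

Predicting on a single image from the test set

For a quick single-image test of the model, run the command below
The predicted depth is saved in the MLDA-Net-repo/predict_figs folder as depth_predict.jpg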

!python predict.py --load_weights_folder ../weights/

W0708 13:27:25.425439 21837 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0708 13:27:25.430135 21837 device_context.cc:465] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/io.py:415: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  if isinstance(obj, collections.Iterable) and not isinstance(obj, (
predict_img saved to ./predict_figs/depth_predict.jpg
eval
   abs_rel |   sq_rel |     rmse | rmse_log |       a1 |       a2 |       a3 | 
&   0.120  &   0.765  &   4.355  &   0.170  &   0.848  &   0.973  &   0.992  \\
-> Done!

# Show the result: the RGB image is on top and the predicted depth map is below
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline

img_depth = Image.open(r"./predict_figs/depth_predict.jpg")
plt.figure(figsize=(10,8))
plt.imshow(img_depth)
plt.axis("off")
plt.show()
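The row of numbers printed by predict.py above (abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3) follows the metric set commonly reported on KITTI. The sketch below implements the usual definitions of those metrics on plain Python lists. Whether predict.py applies exactly these formulas, and which masking, depth capping, or scaling it applies beforehand, is not confirmed by the text here, so read this as an illustration of the standard definitions only. The sample values are made up.

```python
import math


def depth_metrics(gt, pred):
    """Compute common depth metrics for paired positive depths in metres."""
    n = len(gt)
    pairs = list(zip(gt, pred))
    # Threshold accuracy uses the larger of the two depth ratios per pixel.
    ratios = [max(g / p, p / g) for g, p in pairs]
    return {
        "abs_rel": sum(abs(g - p) / g for g, p in pairs) / n,
        "sq_rel": sum((g - p) ** 2 / g for g, p in pairs) / n,
        "rmse": math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n),
        "a1": sum(r < 1.25 for r in ratios) / n,
        "a2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "a3": sum(r < 1.25 ** 3 for r in ratios) / n,
    }


# Made-up ground-truth and predicted depths for four pixels.
gt = [5.0, 10.0, 20.0, 40.0]
pred = [5.5, 9.0, 26.0, 38.0]
metrics = depth_metrics(gt, pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For the error metrics (abs_rel, sq_rel, rmse, rmse_log) lower is better, while for the threshold accuracies (a1, a2, a3) higher is better, with 1.0 meaning every pixel is within the ratio threshold.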

Predicting on your own data

To test on your own image, run the command below
The color_path argument is the path of the image to predict

!python predict.py --load_weights_folder ../weights/ --color_path ../work/color/1.jpg --no_rmse True

W0708 14:43:47.578425 30959 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0708 14:43:47.582940 30959 device_context.cc:465] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/io.py:415: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  if isinstance(obj, collections.Iterable) and not isinstance(obj, (
predict_img saved to ./predict_figs/depth_predict.jpg
can't find depth files, can't count rmse
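To run the same prediction over a folder of images, the command above can be generated once per file. The helper below is a sketch that only builds the command lines from the flags shown above (`--load_weights_folder`, `--color_path`, `--no_rmse`); `build_predict_commands` is a name made up for this example. Note that, according to the log, every run writes to ./predict_figs/depth_predict.jpg, so each result would need to be copied elsewhere before the next run.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_predict_commands(image_dir, weights_dir="../weights/"):
    """Build one predict.py command per image in image_dir, sorted by name."""
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    return [["python", "predict.py",
             "--load_weights_folder", weights_dir,
             "--color_path", str(image),
             "--no_rmse", "True"]
            for image in images]


image_dir = Path("../work/color")
if image_dir.is_dir():
    for command in build_predict_commands(image_dir):
        # Print the commands; pass each list to subprocess.run to execute it.
        print(" ".join(command))
```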

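9. Model Export

The command below exports the trained model into the files needed for inference: the model structure file model.pdmodel, the model weight file model.pdiparams, and model.pdiparams.info, all stored under MLDA-Net-repo/inference/.
Note: the model actually consists of two sub-models, so two sets of files are generated, model_encoder and model_depth.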

!python export.py --load_weights_folder ../weights/ --save_dir ./inference

W0708 14:49:55.859138 31727 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0708 14:49:55.863909 31727 device_context.cc:465] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/framework/io.py:415: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  if isinstance(obj, collections.Iterable) and not isinstance(obj, (
Loaded trained params of model successfully.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return (isinstance(seq, collections.Sequence) and
Model is saved in ./inference.

10. Model Deployment

Deploy in Python using the exported model files and Paddle Inference

!python inference.py

To use your own data, run the command below

!python inference.py --color_path ../work/color/1.jpg --no_rmse True
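For readers who want to see roughly what a Paddle Inference deployment involves, the sketch below loads one exported sub-model and runs a single forward pass with the `paddle.inference` API. It is not the repository's inference.py. The file names follow the model_encoder / model_depth naming mentioned in the export section, but the exact paths are an assumption, and how the encoder outputs are passed to the depth model (the encoder likely has several outputs) is not shown here. Paddle is imported lazily so that the path helper can be used without it.

```python
from pathlib import Path


def exported_paths(save_dir, name):
    """Return (model_file, params_file) for one exported sub-model."""
    base = Path(save_dir) / name
    return (f"{base}.pdmodel", f"{base}.pdiparams")


def load_predictor(save_dir, name, use_gpu=False):
    """Create a Paddle Inference predictor from the exported files."""
    from paddle.inference import Config, create_predictor  # lazy import

    model_file, params_file = exported_paths(save_dir, name)
    config = Config(model_file, params_file)
    if use_gpu:
        # 256 MB initial GPU memory pool on device 0.
        config.enable_use_gpu(256, 0)
    return create_predictor(config)


def run_once(predictor, array):
    """Feed one NumPy array to the first input and return the first output."""
    input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
    input_handle.reshape(list(array.shape))
    input_handle.copy_from_cpu(array)
    predictor.run()
    output_handle = predictor.get_output_handle(
        predictor.get_output_names()[0])
    return output_handle.copy_to_cpu()


# Assumed export location and sub-model name; adjust to your setup.
print(exported_paths("./inference", "model_encoder"))
```

A typical call would be `predictor = load_predictor("./inference", "model_encoder")` followed by `run_once(predictor, image_array)`, where `image_array` is a float32 NumPy array in the layout the exported model expects. That layout depends on how export.py declared the input, so check it with the predictor's input handle before relying on this sketch.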

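11. References

[6th Paper Reproduction Challenge, Task 103] Paddle reproduction of the MLDA-Net depth estimation model
MLDA-Net: Multi-Level Dual Attention-Based Network for Self-Supervised Monocular Depth Estimation
A survey of deep-learning-based monocular depth estimation methods

This project is only a repost. Original link: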
