
Three tricks of the trade for semantic segmentation

Date: 2022-08-11 03:30:00

Author: Peiyuan Liao @ Zhihu

Source: https://www.zhihu.com/question/272988870/answer/562262315

Edited by 极市平台 (Jishi platform)

Published with the author's authorization; do not repost without the author's permission.

I'm no expert; I'm just sharing some tricks I've seen and used while messing around on Kaggle. Pure engineering, no theory, and no guarantee any of it will work for you hhh

The code is taken from posts I've seen on the Kaggle forums and from projects I've done myself.

1. How to optimize IoU

In segmentation we often use intersection over union (IoU) as a metric, defined as IoU(A, B) = |A ∩ B| / |A ∪ B|, i.e. the overlap between the predicted mask and the ground-truth mask divided by their union.

With this definition we can then set a rule: for example, a predicted instance counts as a positive only if its IoU with the corresponding actual instance is greater than 0.5. On top of that you can compute more macroscopic metrics such as F1, F2 and the like.
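As a concrete illustration (a minimal sketch of my own, with made-up toy masks), the IoU of two binary masks and the 0.5 cutoff look roughly like this:

import numpy as np

def mask_iou(pred, gt):
    # IoU of two boolean masks of the same shape
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else 1.0

pred_mask = np.zeros((8, 8), dtype=bool); pred_mask[2:6, 2:6] = True
gt_mask   = np.zeros((8, 8), dtype=bool); gt_mask[3:7, 3:7] = True
iou = mask_iou(pred_mask, gt_mask)     # 9 / 23 ≈ 0.39
is_true_positive = iou > 0.5           # False under the 0.5 rule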

So, how do we optimize IoU? ?_?

Take binary segmentation as an example. As a baseline we first throw a binary cross-entropy at it and see how it does, which gives the following implementation (PyTorch):

 
     
import torch.nn as nn
import torch.nn.functional as F


class BCELoss2d(nn.Module):
    def __init__(self, weight=None, size_average=True):
        super(BCELoss2d, self).__init__()
        self.bce_loss = nn.BCELoss(weight, size_average)

    def forward(self, logits, targets):
        probs        = F.sigmoid(logits)
        probs_flat   = probs.view(-1)
        targets_flat = targets.view(-1)
        return self.bce_loss(probs_flat, targets_flat)

The problem is that optimizing BCE is not the same as optimizing IoU. This paper (https://arxiv.org/pdf/1705.08790.pdf) explains it far better than I can, but the intuition is that within a minibatch the pixels do not actually carry equal weight. Take two images, one whose object has 1000 pixels and one whose object has only 4: a single missed pixel in the second image can cost as much IoU as 250 missed pixels in the first.
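To put numbers on that: missing 1 of the 4 pixels drops the IoU of the small object from 1 to 3/4, and you would have to miss 250 of the 1000 pixels of the large object to get the same 750/1000 = 3/4, yet BCE charges roughly the same amount for each individual wrong pixel in either image.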

"Can't we just optimize IoU directly?"

You can, but it's definitely not the best idea:

 
     
from keras import backend as K


def iou_coef(y_true, y_pred, smooth=1):
    """
    IoU = (|X & Y|) / (|X or Y|)
    """
    intersection = K.sum(K.abs(y_true * y_pred), axis=-1)
    union = K.sum(y_true, -1) + K.sum(y_pred, -1) - intersection
    return (intersection + smooth) / (union + smooth)


def iou_coef_loss(y_true, y_pred):
    return -iou_coef(y_true, y_pred)

The problem with this is that training becomes unstable. As the model goes from bad to good, we want the supervision signal (the loss/metric) to transition smoothly, and naively plugging IoU in as the loss clearly doesn't achieve that.

That's why we have the Lovász-Softmax! "A tractable surrogate for the optimization of the intersection-over-union measure in neural networks"

https://github.com/bermanmaxim/LovaszSoftmax

Why this loss works better than BCE/Jaccard loss, I don't dare to speculate... but from my personal experience the improvement is remarkable \ (???) /
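For reference, a minimal usage sketch, assuming lovasz_losses.py from the repo above is on your import path (the hinge variant takes raw logits, not probabilities; the tensor shapes here are made up):

import torch
from lovasz_losses import lovasz_hinge  # lovasz_losses.py from the repo above

logits = torch.randn(4, 128, 128, requires_grad=True)  # raw per-pixel model outputs
labels = torch.randint(0, 2, (4, 128, 128)).float()    # binary ground-truth masks
loss = lovasz_hinge(logits, labels, per_image=True)
loss.backward()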

Another interesting detail: in the original implementation there is this line:

loss = torch.dot(F.relu(errors_sorted), Variable(grad))

If you replace the relu with elu + 1, it sometimes works better. My guess is that this is because elu + 1 is smoother than relu?
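Concretely, the swap I mean is a one-line change to that line (written from memory, so treat it as a sketch rather than the repo's exact code):

# before: loss = torch.dot(F.relu(errors_sorted), Variable(grad))
loss = torch.dot(F.elu(errors_sorted) + 1, Variable(grad))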

1.1 If you don't care about training time,

try this:

 
     
def symmetric_lovasz(outputs, targets):
    return (lovasz_hinge(outputs, targets) + lovasz_hinge(-outputs, 1 - targets)) / 2

1.2 If your model can't beat the hard examples

add this on top of your loss:

 
     
def focal_loss(self, output, target, alpha, gamma, OHEM_percent):
    output = output.contiguous().view(-1)
    target = target.contiguous().view(-1)

    max_val = (-output).clamp(min=0)
    loss = output - output * target + max_val + ((-max_val).exp() + (-output - max_val).exp()).log()

    # This formula gives us the log sigmoid of 1-p if y is 0 and of p if y is 1
    invprobs = F.logsigmoid(-output * (target * 2 - 1))
    focal_loss = alpha * (invprobs * gamma).exp() * loss

    # Online Hard Example Mining: top x% losses (pixel-wise). Refer to http://www.robots.ox.ac.uk/~tvg/publications/2017/0026.pdf
    OHEM, _ = focal_loss.topk(k=int(OHEM_percent * [*focal_loss.shape][0]))
    return OHEM.mean()

2. Hacking the U-Net

The original U-Net looks like this (Keras):

 
     
from keras.models import Model
from keras.layers import (Input, Conv2D, Conv2DTranspose, MaxPooling2D, BatchNormalization,
                          Activation, SpatialDropout2D, concatenate)


def conv_block(neurons, block_input, bn=False, dropout=None):
    conv1 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(block_input)
    if bn:
        conv1 = BatchNormalization()(conv1)
    conv1 = Activation('relu')(conv1)
    if dropout is not None:
        conv1 = SpatialDropout2D(dropout)(conv1)
    conv2 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(conv1)
    if bn:
        conv2 = BatchNormalization()(conv2)
    conv2 = Activation('relu')(conv2)
    if dropout is not None:
        conv2 = SpatialDropout2D(dropout)(conv2)
    pool = MaxPooling2D((2,2))(conv2)
    return pool, conv2  # returns the block output and the shortcut to use in the uppooling blocks


def middle_block(neurons, block_input, bn=False, dropout=None):
    conv1 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(block_input)
    if bn:
        conv1 = BatchNormalization()(conv1)
    conv1 = Activation('relu')(conv1)
    if dropout is not None:
        conv1 = SpatialDropout2D(dropout)(conv1)
    conv2 = Conv2D(neurons, (3,3), padding='same', kernel_initializer='glorot_normal')(conv1)
    if bn:
        conv2 = BatchNormalization()(conv2)
    conv2 = Activation('relu')(conv2)
    if dropout is not None:
        conv2 = SpatialDropout2D(dropout)(conv2)


    return conv2


def deconv_block(neurons, block_input, shortcut, bn=False, dropout=None):
    deconv = Conv2DTranspose(neurons, (3, 3), strides=(2, 2), padding="same")(block_input)
    uconv = concatenate([deconv, shortcut])
    uconv = Conv2D(neurons, (3, 3), padding="same", kernel_initializer='glorot_normal')(uconv)
    if bn:
        uconv = BatchNormalization()(uconv)
    uconv = Activation('relu')(uconv)
    if dropout is not None:
        uconv = SpatialDropout2D(dropout)(uconv)
    uconv = Conv2D(neurons, (3, 3), padding="same", kernel_initializer='glorot_normal')(uconv)
    if bn:
        uconv = BatchNormalization()(uconv)
    uconv = Activation('relu')(uconv)
    if dropout is not None:
        uconv = SpatialDropout2D(dropout)(uconv)


    return uconv


def build_model(start_neurons, bn=False, dropout=None):    
    input_layer = Input((128, 128, 1))
    # 128 -> 64
    conv1, shortcut1 = conv_block(start_neurons, input_layer, bn, dropout)
    # 64 -> 32
    conv2, shortcut2 = conv_block(start_neurons * 2, conv1, bn, dropout)
    # 32 -> 16
    conv3, shortcut3 = conv_block(start_neurons * 4, conv2, bn, dropout)
    # 16 -> 8
    conv4, shortcut4 = conv_block(start_neurons * 8, conv3, bn, dropout)
    #Middle
    convm = middle_block(start_neurons * 16, conv4, bn, dropout)
    # 8 -> 16
    deconv4 = deconv_block(start_neurons * 8, convm, shortcut4, bn, dropout)
    # 16 -> 32
    deconv3 = deconv_block(start_neurons * 4, deconv4, shortcut3, bn, dropout)
    # 32 -> 64
    deconv2 = deconv_block(start_neurons * 2, deconv3, shortcut2, bn, dropout)
    # 64 -> 128
    deconv1 = deconv_block(start_neurons, deconv2, shortcut1, bn, dropout)
    #uconv1 = Dropout(0.5)(uconv1)
    output_layer = Conv2D(1, (1,1), padding="same", activation="sigmoid")(deconv1)
    model = Model(input_layer, output_layer)
    return model

But instead of transposed convolutions, we would generally choose upsampling followed by a 3x3 conv. For the reasons, see this article: Deconvolution and Checkerboard Artifacts (I highly recommend Distill in general; the quality of the blog is exceptional).
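As a sketch of what that swap could look like in the Keras code above (my variant of the first two lines of deconv_block, not the original author's code):

from keras.layers import UpSampling2D, Conv2D, concatenate

def upsample_block(neurons, block_input, shortcut):
    # nearest-neighbour upsampling + 3x3 conv in place of the strided Conv2DTranspose,
    # which is the usual fix for checkerboard artifacts
    up = UpSampling2D(size=(2, 2))(block_input)
    up = Conv2D(neurons, (3, 3), padding='same', kernel_initializer='glorot_normal')(up)
    return concatenate([up, shortcut])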

Going a step further: in real-world projects you usually don't have that much compute for training, so we have to find ways to graft pretrained classification models into the U-Net. ʕ•ᴥ•ʔ

The key to replacing the encoder with a pretrained model is figuring out how to cleanly pull out the information the pretrained model extracts at different scales, and how to attach it efficiently to the decoder. Inception and MobileNet are common choices for this kind of grafting, but here I'll analyse the more intuitive ResNet/ResNeXt family:

 
     
def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)


        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)


        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)


        return x

You can clearly see that feature maps at different scales are produced by different layers, so we can pick a few of them to concatenate, upsample and convolve. The one thing to be careful about is not to concatenate mismatched scales, otherwise the final output may come out a different size from the input image. Below is one workable architecture; note that to increase the resolution of the feature maps, I removed the max-pool that follows conv1 in the original ResNet:

 
     
def __init__(self):
        super().__init__()
        self.resnet = models.resnet34(pretrained=True)


        self.conv1 = nn.Sequential(
            self.resnet.conv1,
            self.resnet.bn1,
            self.resnet.relu,
        )


        self.encoder2 = self.resnet.layer1 # 64
        self.encoder3 = self.resnet.layer2 #128
        self.encoder4 = self.resnet.layer3 #256
        self.encoder5 = self.resnet.layer4 #512


        self.center = nn.Sequential(
            ConvBn2d(512,512,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            ConvBn2d(512,256,kernel_size=3,padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2,stride=2),
        )


        self.decoder5 = Decoder(256+512,512,64)
        self.decoder4 = Decoder(64 +256,256,64)
        self.decoder3 = Decoder(64 +128,128,64)
        self.decoder2 = Decoder(64 +64 ,64 ,64)
        self.decoder1 = Decoder(64     ,32 ,64)


        self.logit = nn.Sequential(
            nn.Conv2d(384, 64, kernel_size=3, padding=1),
            nn.ELU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1, padding=0),
        )


    def forward(self, x):
        mean=[0.485, 0.456, 0.406]
        std=[0.229,0.224,0.225]
        x=torch.cat([
           (x-mean[2])/std[2],
           (x-mean[1])/std[1],
           (x-mean[0])/std[0],
        ],1)


        e1 = self.conv1(x)
        e2 = self.encoder2(e1)
        e3 = self.encoder3(e2)
        e4 = self.encoder4(e3)
        e5 = self.encoder5(e4)


        f = self.center(e5)
        d5 = self.decoder5(f, e5)
        d4 = self.decoder4(d5,e4)
        d3 = self.decoder3(d4,e3)
        d2 = self.decoder2(d3,e2)
        d1 = self.decoder1(d2)

On how to design the decoder, there are two more small tricks worth borrowing:

The first is Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks (https://arxiv.org/pdf/1803.02579.pdf). You can think of it as a kind of attention that recalibrates the feature maps with very few parameters. See the paper for the details; for the implementation, the following PyTorch code can serve as a reference:

 
     
class sSE(nn.Module):
    def __init__(self, out_channels):
        super(sSE, self).__init__()
        self.conv = ConvBn2d(in_channels=out_channels,out_channels=1,kernel_size=1,padding=0)
    def forward(self,x):
        x=self.conv(x)
        #print('spatial',x.size())
        x=F.sigmoid(x)
        return x


class cSE(nn.Module):
    def __init__(self, out_channels):
        super(cSE, self).__init__()
        self.conv1 = ConvBn2d(in_channels=out_channels,out_channels=int(out_channels/2),kernel_size=1,padding=0)
        self.conv2 = ConvBn2d(in_channels=int(out_channels/2),out_channels=out_channels,kernel_size=1,padding=0)
    def forward(self,x):
        x=nn.AvgPool2d(x.size()[2:])(x)
        #print('channel',x.size())
        x=self.conv1(x)
        x=F.relu(x)
        x=self.conv2(x)
        x=F.sigmoid(x)
        return x


class Decoder(nn.Module):
    def __init__(self, in_channels, channels, out_channels):
        super(Decoder, self).__init__()
        self.conv1 = ConvBn2d(in_channels, channels, kernel_size=3, padding=1)
        self.conv2 = ConvBn2d(channels, out_channels, kernel_size=3, padding=1)
        self.spatial_gate = sSE(out_channels)
        self.channel_gate = cSE(out_channels)


    def forward(self, x, e=None):
        x = F.upsample(x, scale_factor=2, mode='bilinear', align_corners=True)
        #print('x',x.size())
        #print('e',e.size())
        if e is not None:
            x = torch.cat([x,e],1)


        x = F.relu(self.conv1(x),inplace=True)
        x = F.relu(self.conv2(x),inplace=True)
        #print('x_new',x.size())
        g1 = self.spatial_gate(x)
        #print('g1',g1.size())
        g2 = self.channel_gate(x)
        #print('g2',g2.size())
        x = g1*x + g2*x
        return x

The second is that, to further encourage robustness across scales, we can introduce hypercolumns and directly concatenate the feature maps from every scale:

 
     
f = torch.cat((
            F.upsample(e1,scale_factor= 2, mode='bilinear',align_corners=False),
            d1,
            F.upsample(d2,scale_factor= 2, mode='bilinear',align_corners=False),
            F.upsample(d3,scale_factor= 4, mode='bilinear',align_corners=False),
            F.upsample(d4,scale_factor= 8, mode='bilinear',align_corners=False),
            F.upsample(d5,scale_factor=16, mode='bilinear',align_corners=False),
        ),1)


f = F.dropout2d(f,p=0.50)
logit = self.logit(f)

An even fancier approach is to compare the feature map at each scale directly against a downsized ground truth, compute a loss at each scale, and take a weighted average of the per-scale losses at the end. For details see this discussion: Deep semi-supervised learning | Kaggle (https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/63715); I won't go over it again here.
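A minimal sketch of that deep-supervision idea (my own simplification, using plain BCE-with-logits per scale rather than whatever loss the thread actually uses, and with made-up scale weights):

import torch.nn.functional as F


def deep_supervision_loss(side_logits, target, weights=(1.0, 0.5, 0.25, 0.125)):
    # side_logits: list of (B, 1, h_i, w_i) logits predicted at different decoder scales
    # target:      (B, 1, H, W) full-resolution binary mask
    total = 0.0
    for logit, w in zip(side_logits, weights):
        t = F.interpolate(target, size=logit.shape[-2:], mode='nearest')  # the "downsized gt"
        total = total + w * F.binary_cross_entropy_with_logits(logit, t)
    return total / sum(weights[:len(side_logits)])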

3. Training

Honestly, I think training is really case by case: heuristics that work on task A may not work nearly as well on task B. So I'll just introduce one trick that is useful in most situations: Cosine Annealing with Snapshot Ensembles (https://arxiv.org/abs/1704.00109).

It sounds fancy, but in practice it just means warm-restarting the learning rate at regular intervals. That way, in the same amount of training time you end up with several converged local minima instead of just one, so you have many more models in hand for ensembling. (The figures in the paper illustrate the cyclic schedule nicely.)


The implementation is actually pretty simple:

 
     
import numpy as np

CYCLE = 8000
LR_INIT = 0.1
LR_MIN = 0.001
scheduler = lambda x: ((LR_INIT - LR_MIN) / 2) * (np.cos(np.pi * (np.mod(x - 1, CYCLE) / CYCLE)) + 1) + LR_MIN

Then just call scheduler(iteration) every batch/epoch to update the learning rate, and you're done.
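A minimal sketch of how that might be wired into a PyTorch training loop, reusing CYCLE, LR_INIT and scheduler from the snippet above (the model here is a stand-in, and saving a snapshot at the end of every cycle is my addition for the ensembling step):

import torch

model = torch.nn.Conv2d(1, 1, 3, padding=1)      # stand-in for your segmentation network
optimizer = torch.optim.SGD(model.parameters(), lr=LR_INIT, momentum=0.9)

for iteration in range(1, 3 * CYCLE + 1):        # stand-in for looping over your dataloader
    for group in optimizer.param_groups:
        group['lr'] = scheduler(iteration)       # set the learning rate by hand each step
    # ... forward pass, loss.backward(), optimizer.step() go here ...
    if iteration % CYCLE == 0:                   # end of a cycle: keep a snapshot to ensemble later
        torch.save(model.state_dict(), 'snapshot_%d.pth' % (iteration // CYCLE))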

4. Some other small tricks (continuously updated)

The one that comes to mind right now is the 1st-place solution of DSB2018 (the 2018 Data Science Bowl). Instead of doing instance segmentation with Mask R-CNN, they had a U-Net produce class probability maps and then used a watershed to carefully split apart instances that sit close together, and they finished a good margin ahead of second place. It has to be said that sometimes, rather than studying the model, studying the data and distilling the key insight brings far more gains......
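A minimal sketch of that probability-map + watershed idea (not the actual winning code; prob and the two thresholds are made-up placeholders):

import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed  # skimage.morphology.watershed in older versions


def instances_from_prob(prob, fg_thresh=0.5, seed_thresh=0.8):
    fg = prob > fg_thresh              # binary foreground mask
    seeds = prob > seed_thresh         # confident "cores", ideally one per instance
    markers, _ = ndi.label(seeds)      # give each core its own integer label
    # flood from the markers over the inverted probability map, restricted to the foreground
    return watershed(-prob, markers, mask=fg)   # (H, W) ints: 0 = background, 1..N = instances


prob = np.random.rand(128, 128)        # toy stand-in for a U-Net probability map
labels = instances_from_prob(prob)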

Finally, a plug for my own repo: liaopeiyuan/ml-arsenal-public (https://github.com/liaopeiyuan/ml-arsenal-public), which has the source code for every Kaggle competition I've taken part in, currently including two top-1% solutions: TGS Salt and Quick Draw Doodle. Issues/pull requests welcome! XD
