【Few-Shot】Learning Survey: A Review of Few-Shot Learning Research

Time: 2022-09-28 20:30:00


Source: Zhihu column "Jy的炼丹炉" (Jy's Alchemy Furnace)

URL: https://zhuanlan.zhihu.com/p/389781532 (will be removed on request in case of infringement)

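In the big-data era, deep learning models have achieved state-of-the-art results in tasks such as image classification and text classification. However, their success depends heavily on large amounts of training data. In real-world scenarios, some categories have only a small amount of data or of labeled data, and labeling unlabeled data costs a great deal of time and labor. Humans, by contrast, can learn quickly from only a few examples. The concept of few-shot learning was proposed [2,3] to bring machine learning closer to the way humans think.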

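This article divides few-shot learning methods into three categories: model fine-tuning, data augmentation, and transfer learning.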

Paper:

http://www.jos.org.cn/jos/ch/reader/create_pdf.aspx?file_no=6138&flag=1&journal_id=jos&year_id=2021

Few-Shot Learning Based on Model Fine-Tuning

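Fine-tuning-based methods are the traditional approach to few-shot learning. A model is usually pre-trained on large-scale data. Then the parameters of the fully connected layer, or the top layers, of the neural network are fine-tuned on the target few-shot dataset. (This generally works when the target dataset has a distribution similar to that of the source dataset.)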
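The following minimal Python sketch fine-tunes only the top layer. It assumes a hypothetical `frozen_backbone` function that stands in for a network pre-trained on source data. Here that function is just a fixed, hand-written feature map. The sketch trains a linear softmax head on the few target samples with plain SGD. The names, learning rate, and epoch count are illustrative choices and do not come from the survey.

```python
import math

# Hypothetical stand-in for a pre-trained backbone: a FIXED feature map that is
# never updated during fine-tuning. A real backbone would be a trained network.
def frozen_backbone(x):
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry acts as a bias feature

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, f):
    return [sum(w_i * f_i for w_i, f_i in zip(row, f)) for row in W]

def fine_tune_head(samples, labels, num_classes, lr=0.1, epochs=200):
    """SGD on a linear softmax head only; the backbone stays frozen."""
    feats = [frozen_backbone(x) for x in samples]
    dim = len(feats[0])
    W = [[0.0] * dim for _ in range(num_classes)]  # one weight row per class
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = softmax(logits(W, f))
            for c in range(num_classes):
                g = p[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(logit_c)
                for d in range(dim):
                    W[c][d] -= lr * g * f[d]
    return W

def predict(W, x):
    scores = logits(W, frozen_backbone(x))
    return scores.index(max(scores))
```

The backbone is never updated, so only `num_classes × feature_dim` parameters are learned. That small parameter count is what makes this approach workable with a handful of labeled examples.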

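2018, ULMFiT (universal language model fine-tuning). The model has three stages: (1) language model pre-training; (2) language model fine-tuning; (3) classifier fine-tuning. Its main innovation is the way the language model is fine-tuned, with varying learning rates. [1] Howard J, Ruder S. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018.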
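In ULMFiT, "varying learning rates" refers to two techniques described in the Howard and Ruder paper:

- Discriminative fine-tuning: each lower layer uses the learning rate of the layer above, divided by 2.6.
- Slanted triangular learning rates (STLR): a short linear warm-up followed by a long linear decay.

Below is a minimal sketch of both schedules. It uses the defaults suggested in that paper (`cut_frac=0.1`, `ratio=32`, peak rate 0.01). The function names are our own.

```python
import math

def slanted_triangular_lr(t, total_steps, eta_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at step t under ULMFiT's slanted triangular schedule."""
    cut = math.floor(total_steps * cut_frac)  # step at which the peak rate is reached
    if t < cut:
        p = t / cut  # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    return eta_max * (1 + p * (ratio - 1)) / ratio

def discriminative_lrs(top_lr, num_layers, factor=2.6):
    """Per-layer learning rates, top layer first; each lower layer is divided by `factor`."""
    return [top_lr / factor ** i for i in range(num_layers)]
```

The schedule starts at `eta_max / ratio`, peaks at `eta_max` after `cut` steps, and decays back to `eta_max / ratio` at the last step.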

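2019, another fine-tuning scheme, built mainly on the following mechanisms: (1) using a lower learning rate for the few-shot classes; (2) using an adaptive gradient optimizer during fine-tuning; (3) fine-tuning the entire network when the source and target datasets differ substantially. [2] Nakamura A, Harada T. Revisiting fine-tuning for few-shot learning. arXiv preprint arXiv:1910.00216, 2019.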

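Fine-tuning-based methods are relatively simple. In real scenarios, however, the target dataset is often dissimilar to the source dataset, and fine-tuning then tends to overfit the target dataset.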

Few-Shot Learning Based on Data Augmentation

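The fundamental problem in few-shot learning is that there are too few samples, which reduces sample diversity. When data is limited, data augmentation can be used to increase sample diversity. This article divides data-augmentation methods into three types, based on unlabeled data, on data synthesis, and on feature augmentation.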

2.1 Methods based on unlabeled data: use unlabeled data to expand the few-shot dataset.

Semi-supervised learning

2016, Wang et al.: Wang YX, Hebert M. Learning from small sample sets by combining unsupervised meta-training with CNNs. In: Advances in Neural Information Processing Systems. 2016. 244−252.
2018, MAML extended with semi-supervised learning: Boney R, Ilin A. Semi-supervised few-shot learning with MAML. In: Proc. of the ICLR (Workshop). 2018.
2018, prototypical networks extended with unlabeled data: Ren MY, Triantafillou E, Ravi S, et al. Meta-learning for semi-supervised few-shot classification. arXiv preprint arXiv:1803.00676, 2018.
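Ren et al. show one way unlabeled data can enter a metric-based model: they refine prototypical-network prototypes with a soft k-means step.

- Each unlabeled embedding is softly assigned to the classes by a softmax over negative squared distances to the initial prototypes.
- Each prototype becomes a weighted mean of its support embeddings plus the unlabeled ones.

The sketch below performs one such refinement step. It assumes the embeddings are already computed (plain lists of floats). It omits the distractor-cluster variants from the paper.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refine_prototypes(support, unlabeled):
    """support: dict class -> list of embeddings; unlabeled: list of embeddings."""
    protos = {c: mean(xs) for c, xs in support.items()}
    classes = list(protos)
    # Soft assignment of each unlabeled point: softmax over negative distances
    weights = []
    for u in unlabeled:
        logits = [-sq_dist(u, protos[c]) for c in classes]
        m = max(logits)
        e = [math.exp(l - m) for l in logits]
        s = sum(e)
        weights.append([v / s for v in e])
    refined = {}
    for k, c in enumerate(classes):
        dim = len(protos[c])
        total = [sum(x[d] for x in support[c]) for d in range(dim)]
        count = float(len(support[c]))
        # Unlabeled points contribute in proportion to their soft assignment
        for u, w in zip(unlabeled, weights):
            for d in range(dim):
                total[d] += w[k] * u[d]
            count += w[k]
        refined[c] = [t / count for t in total]
    return refined
```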

Transductive learning

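Transductive learning can be regarded as a sub-problem of semi-supervised learning. It assumes the unlabeled data are the test data themselves and aims for the best generalization on exactly those samples.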

2019, transductive propagation network (TPN): Liu Y, Lee J, Park M, et al. Learning to propagate labels: Transductive propagation network for few-shot learning. arXiv preprint arXiv:1805.10002, 2018.
2019, cross attention network: Hou RB, Chang H, Ma BP, et al. Cross attention network for few-shot classification. In: Advances in Neural Information Processing Systems. 2019. 4003−4014.
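Transductive methods such as TPN classify all query samples of an episode jointly, by propagating labels over a graph built on the support and query embeddings. Below is a minimal label-propagation sketch. It assumes precomputed embeddings, a Gaussian affinity with a fixed width `sigma`, and `alpha=0.8`. TPN instead learns the graph scale per example and uses the closed form (I − αS)⁻¹Y. The iterative update below converges to a scaled version of that solution, so the predicted classes agree.

```python
import math

def label_propagation(points, labels, num_classes, alpha=0.8, sigma=1.0, iters=100):
    """Transductive label propagation over support + query embeddings.

    points: embeddings of all support and query samples.
    labels: class index for support samples, None for query samples.
    alpha and the fixed sigma are illustrative assumptions.
    """
    n = len(points)

    def affinity(i, j):
        if i == j:
            return 0.0  # no self-loops
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        return math.exp(-d2 / (2 * sigma ** 2))

    W = [[affinity(i, j) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in W]
    # Symmetrically normalised graph S = D^-1/2 W D^-1/2
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    Y = [[1.0 if lab == c else 0.0 for c in range(num_classes)] for lab in labels]
    F = [row[:] for row in Y]
    # Iterate F <- alpha * S F + (1 - alpha) * Y; fixed point is (1 - alpha)(I - alpha S)^-1 Y
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(num_classes)]
             for i in range(n)]
    return [row.index(max(row)) for row in F]
```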

2.2 Methods based on data synthesis

Data-synthesis methods expand the training data by synthesizing new labeled data for the few-shot classes.

Generative adversarial network (GAN): Mehrotra A, Dukkipati A. Generative adversarial residual pairwise networks for one shot learning. arXiv preprint arXiv:1703.08033, 2017.
Representation learning for few-shot learning: learn a general representation model on a source dataset with abundant data, then fine-tune the model on novel classes that have little data. Hariharan B, Girshick R. Low-shot visual recognition by shrinking and hallucinating features. In: Proc. of the IEEE Int’l Conf. on Computer Vision. 2017. 3018−3027.

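Meta-learning data generation: a data-generation model produces virtual ("hallucinated") data to increase sample diversity. Combined with meta-learning, the generator and the classification algorithm are trained jointly, end to end. Wang YX, Girshick R, Hebert M, et al. Low-shot learning from imaginary data. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 7278−7286.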

Variational autoencoder (VAE) + GAN: combines the strengths of both in a new network, f-VAEGAN-D2. Xian Y, Sharma S, Schiele B, et al. f-VAEGAN-D2: A feature generating framework for any-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 10275−10284.

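Meta-learning: uses a meta-learned network to combine (deform) images from the training set, forming an expanded support set. Chen Z, Fu Y, Wang YX, et al. Image deformation meta-networks for one-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 8680−8689.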

2.3 Methods based on feature augmentation

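Both of the approaches above use auxiliary data to enrich the sample space. Sample diversity can also be increased by augmenting the feature space, because one key to few-shot learning is obtaining a feature extractor that generalizes well.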

2017, AGA model: learns a mapping for synthesizing data so that a sample's attributes reach a desired value or strength. Dixit M, Kwitt R, Niethammer M, et al. AGA: Attribute guided augmentation. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2017. 7455−7463.

Feature space transfer network (FATTEN): models how feature trajectories change as object pose changes. Liu B, Wang X, Dixit M, et al. Feature space transfer for data augmentation. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 9090−9098.

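Delta-encoder: after seeing only a few samples of an unseen category, it synthesizes new samples of that category and uses them to train a classifier. The model extracts transferable intra-class deformations ("deltas") between same-class training samples. It then applies these deltas to the few samples of a novel class, which effectively synthesizes new samples. Schwartz E, Karlinsky L, Shtok J, et al. Delta-encoder: An effective sample synthesis method for few-shot object recognition. In: Advances in Neural Information Processing Systems. 2018. 2845−2855.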
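The Delta-encoder uses a learned encoder/decoder pair to extract a "delta" from two samples of the same seen class and apply it to a novel-class sample. The sketch below is a deliberately simplified stand-in. It uses a plain vector difference as the delta instead of a learned network. It only illustrates the idea of transferring intra-class variation to a new class and is not the paper's model.

```python
def extract_deltas(train_pairs):
    """train_pairs: list of (x_a, x_b) feature vectors from the SAME seen class.

    Simplification: the delta is a raw difference vector, not a learned code.
    """
    return [[b - a for a, b in zip(x_a, x_b)] for x_a, x_b in train_pairs]

def synthesize(novel_example, deltas):
    """Apply each intra-class delta to a single novel-class example."""
    return [[v + d for v, d in zip(novel_example, delta)] for delta in deltas]
```

Each delta produces one synthetic sample. A single novel-class example plus many seen-class deltas therefore yields an enlarged training set for that class.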

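Dual TriNet (bidirectional network): each image class has richer features in the semantic space, so image features can be augmented through mutual mapping between the label semantic space and the image feature space. Chen Z, Fu Y, Zhang Y, et al. Semantic feature augmentation in few-shot learning. arXiv preprint arXiv:1804.05298, 2018.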

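Adversarial features: proposes replacing a fixed attention mechanism with an uncertain attention map M. The method works as follows:
(1) Features extracted from the input image are average-pooled and classified, giving a cross-entropy loss l.
(2) The gradient of l with respect to M gives the update direction that maximizes l.
(3) M is updated along that direction.
Shen W, Shi Z, Sun J. Learning from adversarial features for few-shot classification. arXiv preprint arXiv:1903.10225, 2019.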

The research progress on data-augmentation-based few-shot learning models suggests two directions for future improvement.

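1) Better use of unlabeled data: the real world contains a large amount of unlabeled data, and ignoring it loses a lot of information. Using unlabeled data better and more sensibly is a very important direction for improvement.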

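2) Better use of auxiliary features: in few-shot learning, having too few samples reduces feature diversity. Auxiliary datasets or auxiliary attributes can be used for feature augmentation to increase that diversity. This helps the model extract better features and improves classification accuracy.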

Few-Shot Learning Based on Transfer Learning

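Transfer learning: uses old knowledge to learn new knowledge. Its main goal is to quickly transfer knowledge that has already been learned to a new domain.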

3.1 Metric learning

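Metric learning assigns a query sample to a class by computing its distance to samples whose classes are known and finding the nearest class. Metric-learning methods generally share a pipeline with two modules. The embedding module maps samples into a vector space. The metric module then outputs a similarity score between the embedded vectors.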
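The embedding-module/metric-module pipeline can be sketched as follows. All three components are illustrative assumptions:

- The embedding is a hypothetical placeholder (L2 normalization of the raw input); a real method would use a trained network.
- The metric is cosine similarity.
- The final prediction is a softmax-weighted vote over the support labels, in the style of matching networks.

```python
import math

def embed(x):
    """Hypothetical embedding module: here just L2 normalisation of the raw input."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]

def cosine(a, b):
    """Metric module: cosine similarity of two already unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def classify(query, support, support_labels, num_classes):
    q = embed(query)
    sims = [cosine(q, embed(s)) for s in support]
    m = max(sims)
    w = [math.exp(s - m) for s in sims]  # softmax attention over support samples
    total = sum(w)
    probs = [0.0] * num_classes
    for weight, y in zip(w, support_labels):
        probs[y] += weight / total  # attention-weighted vote for each label
    return probs.index(max(probs)), probs
```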

Siamese neural network: learns a metric from data and then uses the learned metric to compare and match samples of unknown classes. The two twin networks share one set of parameters and weights. Koch G, Zemel R, Salakhutdinov R. Siamese neural networks for one-shot image recognition. In: Proc. of the ICML Deep Learning Workshop. 2015.
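In Koch et al.'s formulation, the two twin branches share weights. The similarity of a pair is a sigmoid of a weighted L1 distance between the two embeddings. The minimal sketch below uses hand-picked hypothetical values rather than trained ones for the one-layer ReLU branch, its weights, `alpha`, and the bias term, which is added here for illustration.

```python
import math

def twin_embed(x, weights):
    """Shared twin branch: both inputs use the SAME weights (hypothetical 1-layer ReLU)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def siamese_similarity(x1, x2, weights, alpha, bias=0.0):
    """Sigmoid of a weighted L1 distance between the twin embeddings."""
    h1 = twin_embed(x1, weights)
    h2 = twin_embed(x2, weights)
    z = sum(a * abs(u - v) for a, u, v in zip(alpha, h1, h2)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

With negative `alpha`, a larger embedding distance lowers the score. Training on "same/different" pairs would learn these weights instead of fixing them by hand.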

Matching networks: Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning. In: Advances in Neural Information Processing Systems. 2016. 3630−3638.

LSTM + matching network: Jiang LB, Zhou XL, Jiang FW, Che L. One-shot learning based on improved matching network. Systems Engineering and Electronics, 2019, 41(6): 1210−1217.

Multi-attention network model: Wang P, Liu L, Shen C, et al. Multi-attention network for one shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2017. 2721−2729.

Prototypical networks: Snell J, Swersky K, Zemel RS. Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems. 2017. 4077−4087.
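Prototypical networks represent each class by the mean of its embedded support samples. They classify a query with a softmax over negative squared Euclidean distances to these prototypes. The minimal sketch below assumes the embeddings were already produced by some embedding network; here they are plain lists of floats.

```python
import math

def prototypes(support_emb, support_labels):
    """Class prototype = mean of that class's support embeddings."""
    sums, counts = {}, {}
    for e, y in zip(support_emb, support_labels):
        if y not in sums:
            sums[y] = [0.0] * len(e)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], e)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def proto_predict(query_emb, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    classes = sorted(protos)
    logits = [-sum((q - p) ** 2 for q, p in zip(query_emb, protos[c])) for c in classes]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = {c: e / z for c, e in zip(classes, exps)}
    return max(probs, key=probs.get), probs
```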

Prototypical networks + unlabeled data (see Section 2.1)

Hybrid attention-based prototypical network: Gao TY, Han X, Liu ZY, Sun MS. Hybrid attention-based prototypical networks for noisy few-shot relation classification. In: Proc. of the AAAI Conf. on Artificial Intelligence. 2019. 6407−6414.

Hierarchical attention prototypical network (HAPN): Sun SL, Sun QF, Zhou K, Lv TC. Hierarchical attention prototypical networks for few-shot text classification. In: Proc. of the Conf. on Empirical Methods in Natural Language Processing and the 9th Int’l Joint Conf. on Natural Language Processing (EMNLP-IJCNLP). 2019. 476−485.

Relation network: Sung F, Yang Y, Zhang L, et al. Learning to compare: Relation network for few-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 1199−1208.

Deep comparison network: Zhang X, Sung F, Qiang Y, et al. Deep comparison: Relation columns for few-shot learning. arXiv preprint arXiv:1811.07100, 2018.

Hilliard N, Phillips L, Howland S, et al. Few-shot learning with metric-agnostic conditional embeddings. arXiv preprint arXiv:1802.04376, 2018.

Covariance metric network (CovaMNet): Li W, Xu J, Huo J, Wang L, Yang G, Luo J. Distribution consistency based covariance metric networks for few-shot learning. In: Proc. of the AAAI Conf. on Artificial Intelligence. 2019. 8642−8649.

Deep nearest neighbor neural network (DN4): Li W, Wang L, Xu J, Huo J, Gao Y, Luo J. Revisiting local descriptor based image-to-class measure for few-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 7260−7268.

Li H, Eigen D, Dodge S, et al. Finding task-relevant features for few-shot learning by category traversal. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 1−10.

3.2 Meta-learning-based methods

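The goal of meta-learning is to give a model the ability to learn, so that it can acquire meta-knowledge automatically. Meta-knowledge is knowledge that can be learned outside the ordinary model-training process. Examples include the model's hyperparameters, the initial parameters of the neural network, the network architecture, and the optimizer.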
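Most of the meta-learning methods below are trained episodically. Each training task is an N-way K-shot problem sampled from the base classes. It has a small support set for adaptation and a query set for evaluation. The survey text does not spell this protocol out, so the sketch below shows the common convention as an assumption. Here `dataset` is a hypothetical dict that maps each class to a list of examples.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries, rng=random):
    """Sample one N-way K-shot episode.

    dataset: dict class -> list of examples (hypothetical format).
    Returns (support, query) as lists of (example, episode_label) pairs,
    where episode labels are re-indexed 0..n_way-1 for this episode only.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        picked = rng.sample(dataset[c], k_shot + q_queries)  # no overlap between sets
        support += [(x, episode_label) for x in picked[:k_shot]]
        query += [(x, episode_label) for x in picked[k_shot:]]
    return support, query
```

Re-indexing labels per episode keeps the model from memorizing particular base classes. This is part of what pushes it to learn "how to learn" rather than a fixed classifier.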

Neural Turing machine: Graves A, Wayne G, Danihelka I. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.

Memory-augmented neural network (MANN): Santoro A, Bartunov S, Botvinick M, et al. One-shot learning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065, 2016.
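NTM and MANN read from external memory by content-based addressing:

1. The controller emits a key, which is compared with every memory row by cosine similarity.
2. The similarities, scaled by a key strength `beta`, are turned into weights by a softmax.
3. The read vector is the weighted sum of the rows.

The minimal sketch below covers the read operation only. The controller and the writing mechanism (for example MANN's least-recently-used access) are left out. The value `beta=10.0` is an arbitrary choice.

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)  # small epsilon avoids division by zero

def content_read(memory, key, beta=10.0):
    """Read weights = softmax(beta * cosine(key, row)); read vector = weighted sum of rows."""
    sims = [beta * cosine_sim(key, row) for row in memory]
    m = max(sims)
    e = [math.exp(s - m) for s in sims]
    z = sum(e)
    w = [v / z for v in e]
    read = [sum(w_i * row[d] for w_i, row in zip(w, memory)) for d in range(len(memory[0]))]
    return w, read
```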

Meta networks: Munkhdalai T, Yu H. Meta networks. In: Proc. of the Int’l Conf. on Machine Learning (PMLR). 2017. 2554−2563.

Model-agnostic meta-learning (MAML): Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: Proc. of the 34th Int’l Conf. on Machine Learning, Vol.70. 2017. 1126−1135.
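MAML learns an initialization from which one gradient step on a new task already gives good performance. Its meta-gradient therefore has to be taken through the inner update. Consider a scalar linear model f(x) = w·x with mean squared error. The second derivative of its loss is a constant, so the exact MAML meta-gradient can be written by hand without an autodiff library. The toy regression tasks and learning rates below are our own assumptions.

```python
def loss_grad(w, xs, ys):
    """Gradient of mean squared error for the scalar model f(x) = w * x."""
    return 2 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def loss_hess(xs):
    """Second derivative of the same loss w.r.t. w (constant for a linear model)."""
    return 2 * sum(x * x for x in xs) / len(xs)

def maml(tasks, w=0.0, inner_lr=0.1, meta_lr=0.1, steps=500):
    """tasks: list of ((train_xs, train_ys), (test_xs, test_ys)) pairs."""
    for _ in range(steps):
        meta_grad = 0.0
        for (tx, ty), (vx, vy) in tasks:
            w_adapted = w - inner_lr * loss_grad(w, tx, ty)  # inner-loop adaptation step
            # Chain rule through the inner step: d w_adapted / d w = 1 - inner_lr * H
            meta_grad += loss_grad(w_adapted, vx, vy) * (1 - inner_lr * loss_hess(tx))
        w -= meta_lr * meta_grad / len(tasks)  # outer (meta) update
    return w
```

Take two tasks, y = x and y = 3x, with the same inputs. By symmetry, the learned initialization settles at w = 2, halfway between the two task slopes, where one inner step moves equally well toward either task.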

Task-agnostic meta-learning (TAML): Jamal MA, Qi GJ. Task agnostic meta-learning for few-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 11719−11727.

Attention-based task-agnostic meta-learning (ATAML): Xiang J, Havaei M, Chartrand G, et al. On the importance of attention in meta-learning for few-shot text classification. arXiv preprint arXiv:1806.00852, 2018.

MAML improvement: Sun Q, Liu Y, Chua TS, et al. Meta-transfer learning for few-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 403−412.

MAML improvement: Liu Y, Sun Q, Liu AA, et al. LCC: Learning to customize and combine neural networks for few-shot learning. arXiv preprint arXiv:1904.08479, 2019.

Task-aware feature embedding network (TAFE-Net): Wang X, Yu F, Wang R, et al. TAFE-Net: Task-aware feature embeddings for low shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 1831−1840.

Meta-learning model that learns the optimizer: Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: Proc. of the ICLR. 2016.

Attention-based weight generator: Gidaris S, Komodakis N. Dynamic few-shot visual learning without forgetting. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2018. 4367−4375.

Meta-learning with multi-task clustering: Yu M, Guo X, Yi J, et al. Diverse few-shot text classification with multiple metrics. In: Proc. of the NAACL-HLT. 2018. 1206−1215.

3.3 Methods based on graph neural networks

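A graph neural network (GNN) is a deep-learning-based model for processing information in the graph domain. Thanks to its good performance and interpretability, it has recently become a widely used method for graph analysis.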

GNN：Garcia V, Bruna J. Few-shot learning with graph neural networks. In: Proc. of the Int’l Conf. on Learning Representations. 2018.

EGNN：Kim J, Kim T, Kim S, et al. Edge-labeling graph neural network for few-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 11−20.

GNN+DAE: Gidaris S, Komodakis N. Generating classification weights with GNN denoising autoencoders for few-shot learning. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition. 2019. 21−30.
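In the GNN approach of Garcia and Bruna, support and query samples become nodes of one fully connected graph. Each node carries its embedding together with a label vector: one-hot for support nodes, an uninformative vector for queries. Message passing then spreads label information to the queries. The sketch below shows a single, heavily simplified aggregation step. Its edge weights are a fixed softmax over negative squared distances rather than a learned similarity network, and it has no trainable weights. Treat it as an illustration of the information flow only.

```python
import math

def gnn_layer(features, label_vecs):
    """One simplified message-passing step on a fully connected episode graph.

    features: node embeddings (support and query nodes together).
    label_vecs: one-hot vectors for support nodes, uniform vectors for query nodes.
    Edge weights: row-wise softmax over negative squared distances (an assumption;
    the original method learns this similarity with a small network).
    """
    n = len(features)
    num_classes = len(label_vecs[0])
    aggregated = []
    for i in range(n):
        logits = []
        for j in range(n):
            if j == i:
                logits.append(-math.inf)  # exclude self-loops
            else:
                logits.append(-sum((a - b) ** 2 for a, b in zip(features[i], features[j])))
        m = max(logits)
        e = [math.exp(l - m) for l in logits]
        z = sum(e)
        # Aggregate neighbours' label vectors, weighted by edge strength
        aggregated.append([sum(e[j] / z * label_vecs[j][c] for j in range(n))
                           for c in range(num_classes)])
    return aggregated
```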

Datasets and experiments

(1) Omniglot contains 1623 handwritten characters from 50 alphabets. Each character was drawn online by 20 different people via Amazon's Mechanical Turk.

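(2) miniImageNet is split off from ImageNet as a condensed version of it. It contains 100 ImageNet classes with 600 images per class. Typically 64 classes are used for training, 16 for validation, and 20 for testing.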

(3) tieredImageNet is a newer dataset proposed by (Mengye) Ren et al. in 2018 and is also a subset of ImageNet. Unlike miniImageNet, it has more classes, 608 in total.

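(4) CUB (Caltech-UCSD Birds) is a bird image dataset covering 200 bird species, with 11788 images in total. Typically 130 classes are used for training, 20 for validation, and 50 for testing.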

(5) CIFAR-100 has 100 classes, each with 600 images (500 training images and 100 test images). The 100 classes are grouped into 20 superclasses, and each image carries both a class label and a superclass label.

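(6) Stanford Dogs is commonly used for fine-grained image classification. It contains 20580 images of 120 dog classes. Typically 70 classes are used for training, 20 for validation, and 30 for testing.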

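(7) Stanford Cars is commonly used for fine-grained image classification. It contains 16185 images of 196 car classes. Typically 130 classes are used for training, 17 for validation, and 49 for testing.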

Comparison and summary
