PyTorch Python API: A Comprehensive Guide (continuously updated...)

Time: 2023-08-07 22:07:00

诸神沉默不语 - personal CSDN blog post index

For authoritative details, refer to the official documentation.
First updated: 2021.4.23
Last updated: 2022.6.30

Table of contents

0. Common parameters and general notes on functions
1. torch
- 1.1 Tensors
  - 1.1.1 Creation Ops
  - 1.1.2 Indexing, Slicing, Joining, Mutating Ops
- 1.2 Generators
- 1.3 Random Sampling
  - 1.3.1 torch.default_generator
  - 1.3.2 In-place random sampling
  - 1.3.3 Quasi-random sampling
- 1.4 Serialization
- 1.5 Parallelism
- 1.6 Locally disabling gradient computation
- 1.7 Math operations
  - 1.7.1 Pointwise Ops
  - 1.7.2 Reduction Ops
  - 1.7.3 Comparison Ops
  - 1.7.4 Spectral Ops
  - 1.7.5 Other Operations
  - 1.7.6 BLAS and LAPACK Operations
- 1.8 Utilities
2. torch.nn
- 2.1 Containers
- 2.2 Convolution Layers
- 2.3 Pooling layers
- 2.4 Padding Layers
- 2.5 Non-linear Activations (weighted sum, nonlinearity)
- 2.6 Non-linear Activations (other)
- 2.7 Normalization Layers
- 2.8 Recurrent Layers
- 2.9 Transformer Layers
- 2.10 Linear Layers
- 2.11 Dropout Layers
- 2.12 Sparse Layers
3. torch.nn.functional
- 3.1 Convolution functions
- 3.2 Pooling functions
- 3.3 Non-linear activation functions
- 3.4 Linear functions
- 3.5 Dropout functions
4. torch.Tensor
5. Tensor Attributes
- 5.1 `torch.dtype`
- 5.2 `torch.device`
- 5.3 `torch.layout`
- 5.4 `torch.memory_format`
- 5.5 Other attributes not covered in the documentation
6. Tensor Views
7. torch.autograd
- 7.1 Functional higher level API
- 7.2 Locally disabling gradient computation
- 7.3 Default gradient layouts
- 7.4 In-place operations on Tensors
- 7.5 Variable (deprecated)
- 7.6 Tensor autograd functions
- 7.7 Function
- 7.8 Context method mixins
- 7.9 Numerical gradient checking
- 7.10 Profiler
- 7.11 Anomaly detection
- 7.12 Saved tensors default hooks
8. torch.cuda
- 8.1 Random Number Generator
9. torch.cuda.amp
10. torch.backends
- 10.1 torch.backends.cuda
- 10.2 torch.backends.cudnn
11. torch.distributed
12. torch.distributions
13. torch.fft
14. torch.futures
15. torch.fx
16. torch.hub
17. torch.jit
18. torch.linalg
19. torch.overrides
20. torch.profiler
21. torch.nn.init
22. torch.onnx
22. torch.optim
- 22.1 How to use an optimizer
- 22.2 Algorithms
23. Complex Numbers
24. DDP Communication Hooks
25. Pipeline Parallelism
26. Quantization
27. Distributed RPC Framework
28. torch.random
29. torch.sparse
30. torch.Storage
31. torch.utils.benchmark
32. torch.utils.bottleneck
33. torch.utils.checkpoint
34. torch.utils.cpp_extension
35. torch.utils.data
36. Other references not mentioned in the text or footnotes

0. Common parameters and general notes on functions

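Common function parameters
1. input: a Tensor
2. requires_grad: bool, whether autograd should record operations on the returned Tensor
3. size: the shape of the output Tensor, given either as several integers or as a collection (e.g. a list or tuple)
4. device: the device the Tensor is placed on (CUDA or CPU). It can be a torch.device object (see Section 5.2), or a string or an integer used in place of a torch.device.
  Example with a torch.device argument: torch.randn((2,3), device=torch.device('cuda:1'))
  Example with a string argument: torch.randn((2,3), device='cuda:1')
  Example with an integer argument (a legacy device ordinal, here equivalent to 'cuda:1'): torch.rand((2,3), device=1)
A function whose name ends with an underscore (e.g. add_) is an in-place operation.
Parameters can be passed positionally, whereas Keyword Arguments must be passed by name (they are the ones that come after * in the signature).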
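A minimal sketch of the two conventions above: the trailing underscore marks the in-place variant, and arguments listed after * (dtype, device, requires_grad for torch.zeros) must be passed by keyword. The tensor values are arbitrary, and device="cpu" is chosen only so the snippet runs without a GPU.

```python
import torch

x = torch.zeros(2, 3)

# Out-of-place: torch.add returns a new Tensor and leaves x unchanged
y = torch.add(x, 1)

# In-place: Tensor.add_ (trailing underscore) modifies x itself
x.add_(1)

# dtype, device and requires_grad come after * in the signature of
# torch.zeros, so they have to be passed by name
z = torch.zeros(2, 3, dtype=torch.float64, device=torch.device("cpu"), requires_grad=True)

print(y)
print(x)
print(z.dtype, z.device, z.requires_grad)
```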

1. torch

1.1 Tensors

is_tensor(obj): returns True if obj is a Tensor
Note: the official documentation recommends using isinstance(obj, Tensor) instead.

1.1.1 Creation Ops

Note: functions that create Tensors by random sampling are listed under Random Sampling.

tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False)
Constructs a Tensor from data (always copying the data). data can be a list, tuple, NumPy ndarray, scalar, or other array-like data.
from_numpy(ndarray)
Creates a Tensor from a numpy.ndarray. Note that the returned Tensor and the ndarray share the same memory, so modifying one is reflected in the other.
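A small sketch contrasting the memory behavior of from_numpy with tensor(), which copies its input. The array contents are arbitrary.

```python
import numpy as np
import torch

a = np.array([1, 2, 3])

shared = torch.from_numpy(a)  # shares memory with a
copied = torch.tensor(a)      # always copies the data

a[0] = 100  # modify the NumPy array in place

print(shared)  # reflects the change: first element is now 100
print(copied)  # unaffected: first element is still 1
```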
zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size filled with 0.
ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size filled with 1.
ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)
Returns a Tensor filled with 1, with the same size as input.
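A quick sketch showing that size can be given as separate integers, a tuple, or a list, and that ones_like takes the shape (and, by default, the dtype) from its input. The int32 base tensor is an arbitrary choice for illustration.

```python
import torch

a = torch.zeros(2, 3)      # size given as separate integers
b = torch.zeros((2, 3))    # size given as a tuple
c = torch.ones([2, 3])     # size given as a list

# ones_like copies the shape and, with dtype=None, the dtype of input
base = torch.empty(4, 5, dtype=torch.int32)
d = torch.ones_like(base)

print(a.shape, b.shape, c.shape)  # all torch.Size([2, 3])
print(d.shape, d.dtype)           # torch.Size([4, 5]) torch.int32
```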
arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
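Returns a 1-D Tensor with values from the half-open interval [start, end), spaced step apart.
Examples: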

>>> torch.arange(5)
tensor([ 0,  1,  2,  3,  4])
>>> torch.arange(1, 4)
tensor([ 1,  2,  3])
>>> torch.arange(1, 2.5, 0.5)
tensor([ 1.0000,  1.5000,  2.0000])

1.1.2 Indexing, Slicing, Joining, Mutating Ops

cat(tensors, dim=0, *, out=None)
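Concatenates the sequence of Tensors tensors along dimension dim and returns the result. All non-empty Tensors must have the same shape in every dimension except dim.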
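A short sketch of the shape rule for cat: the inputs may differ only in the dimension being concatenated. The shapes are arbitrary examples.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(4, 3)

# Shapes differ only in dim 0, so concatenating along dim 0 is allowed
rows = torch.cat([a, b], dim=0)

c = torch.ones(2, 5)
# Shapes differ only in dim 1, so concatenating along dim 1 is allowed
cols = torch.cat([a, c], dim=1)

print(rows.shape)  # torch.Size([6, 3])
print(cols.shape)  # torch.Size([2, 8])
```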
reshape(input, shape)
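Returns a Tensor with the same data and number of elements as input, but with the specified shape (a view when possible, otherwise a copy). A single dimension may be -1, in which case it is inferred from the remaining dimensions.
Examples: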

>>> a = torch.arange(4.)
>>> torch.reshape(a, (2, 2))
tensor([[ 0.,  1.],
        [ 2.,  3.]])
>>> b = torch.tensor([[0, 1], [2, 3]])
>>> torch.reshape(b, (-1,))
tensor([ 0,  1,  2,  3])

squeeze(input, dim=None, *, out=None)
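Returns a Tensor with all dimensions of size 1 removed from input. If dim is given, only that dimension is squeezed (and only if its size is 1).
The returned Tensor shares storage with input.
Example: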

>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])

stack(tensors, dim=0, *, out=None)
Concatenates a sequence of Tensors of identical shape along a new dimension dim and returns the result.
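A minimal sketch of the difference between stack and cat: stack inserts a new dimension, while cat joins along an existing one. The input values are arbitrary.

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

stacked0 = torch.stack([a, b], dim=0)    # new leading dimension
stacked1 = torch.stack([a, b], dim=1)    # new trailing dimension
concatenated = torch.cat([a, b], dim=0)  # no new dimension

print(stacked0.shape)      # torch.Size([2, 3])
print(stacked1.shape)      # torch.Size([3, 2])
print(concatenated.shape)  # torch.Size([6])
```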
t(input)
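Expects input to be at most 2-D. 0-D and 1-D Tensors are returned as is; a 2-D Tensor is transposed (equivalent to transpose(input, 0, 1)). Returns the result.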
transpose(input, dim0, dim1)
Returns a Tensor that is a transposed version of input, with dimensions dim0 and dim1 swapped.
The returned Tensor shares storage with input.
Example:

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 1.0028, -0.9893,  0.5809],
        [-0.1669,  0.7299,  0.4942]])
>>> torch.transpose(x, 0, 1)
tensor([[ 1.0028, -0.1669],
        [-0.9893,  0.7299],
        [ 0.5809,  0.4942]])

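A small sketch of two points made above: the result of transpose shares storage with its input (so writes show up in both), and t() leaves 1-D input unchanged while transposing 2-D input. The values are arbitrary.

```python
import torch

x = torch.arange(6).reshape(2, 3)

xt = torch.transpose(x, 0, 1)  # shape (3, 2), a view of x
xt[0, 1] = 100                 # writes through to x[1, 0]

v = torch.arange(3)
print(torch.t(v).shape)  # 1-D input is returned unchanged: torch.Size([3])
print(torch.t(x).shape)  # 2-D input is transposed: torch.Size([3, 2])
print(x)                 # x[1, 0] is now 100
```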
unsqueeze(input, dim)
Returns a new Tensor with a dimension of size 1 inserted at position dim. The returned Tensor shares storage with input.
Example:

>>> x = torch.tensor([1, 2, 3, 4])
>>> torch.unsqueeze(x, 0)
tensor([[ 1,  2,  3,  4]])
>>> torch.unsqueeze(x, 1)
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])

nonzero(input, *, out=None, as_tuple=False)

① as_tuple=False: returns a 2-D Tensor in which each row is the index of one nonzero element of input
Example:

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[ 0],
        [ 1],
        [ 2],
        [ 4]])
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]))
tensor([[ 0,  0],
        [ 1,  1],
        [ 2,  2],
        [ 3,  3]])

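② as_tuple=True: returns a tuple of 1-D index Tensors, one per dimension of input (each holds the indices of the nonzero elements along that dimension)
Example: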

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]), as_tuple=True)
(tensor([0, 1, 2, 4]),)
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]), as_tuple=True)
(tensor([0, 1, 2, 3]), tensor([0, 1, 2, 3]))
>>> torch.nonzero(torch.tensor(5), as_tuple=True)
(tensor([0]),)

where()
1. where(condition) is identical to torch.nonzero(condition, as_tuple=True)
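A minimal sketch checking the equivalence above and showing a common use: the tuple returned by where(condition) can index the original Tensor directly. The input matrix is arbitrary.

```python
import torch

x = torch.tensor([[1, -2], [-3, 4]])

idx_where = torch.where(x > 0)
idx_nonzero = torch.nonzero(x > 0, as_tuple=True)

# The tuple form can be used directly for advanced indexing
positives = x[idx_where]

print(idx_where)  # row indices [0, 1] and column indices [0, 1]
print(positives)  # the elements 1 and 4
```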

1.2 Generators

1.3 Random Sampling

manual_seed(seed)
Sets the seed for generating random numbers.
randperm(n, *, generator=None, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False, pin_memory=False): returns a random permutation of the integers from 0 to n - 1
Example:

>>> torch.randperm(4)
tensor([2, 1, 0, 3])
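A short sketch of how manual_seed makes sampling reproducible: resetting the same seed yields the same permutation from randperm. The seed value 0 and n = 10 are arbitrary.

```python
import torch

torch.manual_seed(0)
first = torch.randperm(10)

torch.manual_seed(0)  # reset to the same seed
second = torch.randperm(10)

print(torch.equal(first, second))                 # True
print(sorted(first.tolist()) == list(range(10)))  # True: a permutation of 0..9
```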

1.3.1 torch.default_generator

Returns the default CPU torch.Generator.

rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)
Returns a Tensor of shape size filled with random numbers sampled from a uniform distribution on [0, 1).
rand_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)
Returns a Tensor with the same size as input, filled with random numbers sampled from a uniform distribution on [0, 1).
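A quick sketch checking the [0, 1) range of rand and showing that rand_like inherits shape and dtype from its input. The float64 input is an arbitrary choice; rand_like needs a floating-point input.

```python
import torch

r = torch.rand(3, 4)
like = torch.rand_like(torch.zeros(2, 5, dtype=torch.float64))

in_range = bool((r >= 0).all() and (r < 1).all())
print(r.shape, in_range)       # torch.Size([3, 4]) True
print(like.shape, like.dtype)  # torch.Size([2, 5]) torch.float64
```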

1.3.2 In-place random sampling

1.3.3 Quasi-random sampling

1.4 Serialization

save(obj, f, pickle_module=pickle, pickle_protocol=2, _use_new_zipfile_serialization=True)
load(f, map_location=None, pickle_module=pickle, **pickle_load_args)
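A minimal save/load round trip, assuming an in-memory io.BytesIO buffer in place of a file path (a path string works the same way). The dictionary contents are hypothetical; map_location="cpu" remaps any CUDA tensors onto the CPU when loading.

```python
import io
import torch

state = {"weight": torch.randn(2, 3), "step": 10}  # hypothetical checkpoint contents

buffer = io.BytesIO()  # an in-memory file object
torch.save(state, buffer)
buffer.seek(0)         # rewind before reading

loaded = torch.load(buffer, map_location="cpu")

print(torch.equal(state["weight"], loaded["weight"]), loaded["step"])  # True 10
```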

1.5 Parallelism

1.6 Locally disabling gradient computation

1.7 Math operations

1.7.1 Pointwise Ops

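add(): returns the resulting Tensor
1. add(input, other, *, out=None)
  other is a scalar: other is added to each element of input.
2. add(input, other, *, alpha=1, out=None)
  other is a Tensor: each element of other is first multiplied by the scalar alpha and then added to the corresponding element of input, i.e. $\text{out}_i = \text{input}_i + \text{alpha} \times \text{other}_i$.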
mul(input, other, *, out=None)
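If other is a scalar: each element of input is multiplied by other.
If other is a Tensor: input and other are multiplied element-wise (their shapes must be broadcastable).
Returns the resulting Tensor.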
tanh(input, *, out=None)
Computes the hyperbolic tangent of each element of input. Returns a Tensor.
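A small sketch of the pointwise ops above, including the alpha scaling of add; the input values are arbitrary.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([10.0, 20.0, 30.0])

plus_one = torch.add(x, 1)              # -> [2., 3., 4.]
scaled_sum = torch.add(x, w, alpha=0.5) # x + 0.5 * w -> [6., 12., 18.]
times_two = torch.mul(x, 2)             # -> [2., 4., 6.]
elementwise = torch.mul(x, w)           # -> [10., 40., 90.]
t = torch.tanh(torch.tensor([0.0]))     # tanh(0) = 0

print(plus_one, scaled_sum, times_two, elementwise, t)
```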

1.7.2 Reduction Ops

max()
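1. max(input): returns the maximum value of all elements in input.
2. max(input, dim, keepdim=False, *, out=None): returns a namedtuple (values, indices), where values holds the maximum of each slice of input along dimension dim and indices holds their positions. With keepdim=True, dimension dim is kept with size 1.
3. max(input, other, *, out=None): see maximum() in 1.7.3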
sum(input, *, dtype=None)
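Returns the sum of all elements in input (a Tensor), as a Tensor.
dtype is the desired dtype of the returned Tensor; if specified, input is cast to dtype before the operation is performed.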
mean(input)
Returns the mean of all elements in input (a Tensor), as a Tensor. input must have a floating-point or complex dtype.
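A sketch of the reduction ops above on an arbitrary 2 x 3 matrix, including the namedtuple returned by max with dim and the dtype argument of sum.

```python
import torch

x = torch.tensor([[1.0, 5.0, 2.0],
                  [4.0, 3.0, 6.0]])

overall_max = torch.max(x)  # 6.

# With dim, max returns a namedtuple (values, indices)
values, indices = torch.max(x, dim=1)  # values [5., 6.], indices [1, 2]
kept = torch.max(x, dim=1, keepdim=True).values  # shape (2, 1)

total = torch.sum(x)   # 21.
average = torch.mean(x)  # 3.5

# mean needs a floating-point input, so integer Tensors are cast first
ints = torch.tensor([1, 2, 3, 4])
int_total = torch.sum(ints, dtype=torch.float64)  # 10., as float64
int_mean = torch.mean(ints.float())               # 2.5

print(overall_max, values, indices, kept.shape, total, average, int_total, int_mean)
```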

1.7.3 Comparison Ops

maximum(input, other, *, out=None)
Computes the element-wise maximum of input and other.
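A short sketch showing that maximum and the max(input, other) overload give the same element-wise result; the inputs are arbitrary.

```python
import torch

a = torch.tensor([1, 5, 3])
b = torch.tensor([4, 2, 3])

m1 = torch.maximum(a, b)  # -> [4, 5, 3]
m2 = torch.max(a, b)      # same result via the max(input, other) overload

print(m1, m2)
```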

1.7.4 Spectral Ops

1.7.5 Other Operations

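flatten(input, start_dim=0, end_dim=-1)
Flattens input by reshaping the dimensions from start_dim through end_dim into a single dimension.
Examples: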

>>> t = torch.tensor([[[1, 2],
...                    [3, 4]],
...                   [[5, 6],
...                    [7, 8]]])
>>> torch.flatten(t)
tensor([1, 2, 3, 4, 5, 6, 7, 8])
>>> torch.flatten(t, start_dim=1)
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])

1.7.6 BLAS and LAPACK Operations

Introduction to BLAS
LAPACK

matmul(input, other, *, out=None)
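Matrix product of the two Tensors input and other. The behavior depends on their dimensionality: two 1-D Tensors give a dot product (a scalar), two 2-D Tensors give a matrix-matrix product, a 2-D and a 1-D Tensor give a matrix-vector product, and if either argument has more than 2 dimensions a batched matrix multiplication is performed with broadcasting of the batch dimensions.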
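A sketch of how the output shape of matmul depends on the input dimensionalities; only the shapes matter here, so the values are random.

```python
import torch

v = torch.randn(3)
m = torch.randn(2, 3)
n = torch.randn(3, 4)
batch = torch.randn(10, 3, 4)

dot = torch.matmul(v, v)        # 1-D x 1-D: dot product, shape []
mv = torch.matmul(m, v)         # 2-D x 1-D: matrix-vector, shape [2]
mm = torch.matmul(m, n)         # 2-D x 2-D: matrix-matrix, shape [2, 4]
bmm = torch.matmul(m, batch)    # m is broadcast over the batch: shape [10, 2, 4]

print(dot.shape, mv.shape, mm.shape, bmm.shape)
```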

1.8 Utilities

2. torch.nn

2.1 Containers

Module
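The base class for all neural network modules; your models should subclass Module. Modules can contain other Modules, nested in a tree structure: simply assign the submodules as attributes in the __init__ method.
1. eval()
  Sets the Module to evaluation mode; equivalent to self.train(False). This only affects certain modules, such as Dropout and BatchNorm.
2. parameters(recurse=True)
  Returns an iterator over the Module's parameters (Tensors), typically passed to an optimizer.
3. train(mode=True)
  If mode is True, sets the Module to training mode and training becomes True; otherwise sets it to evaluation mode and training becomes False.
4. zero_grad(set_to_none=False)
  Sets the gradients of all model parameters to zero (or to None if set_to_none=True), similar to the optimizer's zero_grad() (see torch.optim).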
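A minimal sketch exercising the Module methods listed above on a hypothetical one-layer model (TinyNet is an illustrative name, not part of PyTorch). It counts parameters, toggles training mode, and zeroes gradients after a backward pass.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)  # submodule registered as an attribute

    def forward(self, x):
        return self.fc(x)

net = TinyNet()
n_params = sum(p.numel() for p in net.parameters())  # weight 2*4 + bias 2 = 10

net.eval()
print(net.training)  # False
net.train()
print(net.training)  # True

net(torch.randn(3, 4)).sum().backward()  # populate .grad
net.zero_grad(set_to_none=False)         # keep the grad Tensors but fill them with 0
print(n_params, all(bool((p.grad == 0).all()) for p in net.parameters()))
```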
Sequential(*args)
A sequential container. Modules are added in the order they are passed to the constructor. An OrderedDict of modules can also be passed in.
Example:

import torch.nn as nn
from collections import OrderedDict

# Example of using Sequential
model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )

# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))

ModuleList(modules=None)
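Holds submodules in a list. It can be indexed and sliced like a regular Python list, but the Modules it contains are properly registered and visible to all Module methods.
Example: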

import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

MyModule is thus a network containing 10 Linear layers.
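A small sketch of why "automatically registered" matters: layers held in a plain Python list are invisible to parameters(), so an optimizer would never update them. Both class names are hypothetical.

```python
import torch.nn as nn

class WithModuleList(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10) for _ in range(3)])

class WithPlainList(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        # A plain Python list: these Linear layers are NOT registered
        self.layers = [nn.Linear(10, 10) for _ in range(3)]

def count_params(module):
    # Number of parameter Tensors visible through Module.parameters()
    return sum(1 for _ in module.parameters())

print(count_params(WithModuleList()))  # 6: one weight and one bias per layer
print(count_params(WithPlainList()))   # 0: an optimizer would never see them

# Slicing a ModuleList returns another ModuleList
print(type(WithModuleList().layers[:2]).__name__)  # ModuleList
```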

2.2 Convolution Layers

class Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
Applies a 2D convolution over an input signal composed of several input planes.
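A sketch relating Conv2d to its output size. The helper out_size is hypothetical and simply encodes the output-size formula from the Conv2d documentation, $H_{out} = \lfloor (H_{in} + 2 \cdot \text{padding} - \text{dilation} \cdot (\text{kernel\_size} - 1) - 1) / \text{stride} + 1 \rfloor$; the 28 x 28 input size is an arbitrary choice.

```python
import torch
import torch.nn as nn

def out_size(h, k, s=1, p=0, d=1):
    # Hypothetical helper: the Conv2d output-size formula for one spatial dim
    return (h + 2 * p - d * (k - 1) - 1) // s + 1

x = torch.randn(8, 1, 28, 28)  # (N, C_in, H, W)

conv = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5)
y = conv(x)
print(y.shape, out_size(28, 5))  # torch.Size([8, 20, 24, 24]) 24

conv2 = nn.Conv2d(1, 20, 5, stride=2, padding=2)
y2 = conv2(x)
print(y2.shape, out_size(28, 5, s=2, p=2))  # torch.Size([8, 20, 14, 14]) 14
```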

2.3 Pooling layers

2.4 Padding Layers

2.5 Non-linear Activations (weighted sum, nonlinearity)

class ReLU(inplace=False)
Applies the rectified linear unit function element-wise: $\text{ReLU}(x) = (x)^+ = \max(0, x)$
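A tiny sketch checking the formula above: ReLU zeroes negative inputs and matches clamping at 0. The inputs are arbitrary.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, 0.0, 3.0])
out = relu(x)  # -> [0., 0., 3.]

# Equivalent formulation of max(0, x), element-wise
print(out, torch.equal(out, torch.clamp(x, min=0)))
```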

2.6 Non-linear Activations (other)

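class LogSoftmax(dim=None)
Applies $\log(\text{Softmax}(x))$ along dimension dim: $\text{LogSoftmax}(x_i) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)$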
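A minimal sketch of LogSoftmax on arbitrary logits: exponentiating the output recovers Softmax, so each row sums to 1.

```python
import torch
import torch.nn as nn

log_softmax = nn.LogSoftmax(dim=1)  # normalize across each row
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])
out = log_softmax(logits)

row_sums = out.exp().sum(dim=1)  # exp(LogSoftmax) is Softmax: rows sum to 1
matches = torch.allclose(out, torch.log(torch.softmax(logits, dim=1)))
print(row_sums, matches)
```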

2.7 Normalization Layers

class BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
Applies Batch Normalization over a 2D input (N, C) or a 3D input (N, C, L)¹
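A sketch of BatchNorm1d in training mode on an arbitrary (N, C) batch: with the default affine parameters (weight 1, bias 0) each feature is normalized to mean about 0 and biased variance about 1, and the running mean is updated with momentum 0.1. In eval mode the stored running statistics are used instead.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=3)
x = torch.randn(64, 3) * 5 + 10  # (N, C) input with mean ~10, std ~5

bn.train()
y = bn(x)  # normalized with this batch's statistics
print(y.mean(dim=0))                 # close to 0 per feature
print(y.var(dim=0, unbiased=False))  # close to 1 per feature

# running_mean starts at 0 and is updated as 0.9 * old + 0.1 * batch mean
print(bn.running_mean)

bn.eval()
y_eval = bn(x)  # now normalized with running_mean / running_var
```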

2.8 Recurrent Layers

2.9 Transformer Layers

2.10 Linear Layers

class Linear(in_features, out_features, bias=True)
Applies a linear transformation to the input: $y = xA^T + b$
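A short sketch confirming $y = xA^T + b$ against nn.Linear, where linear.weight plays the role of A (shape (out_features, in_features)) and linear.bias is b. The sizes are arbitrary.

```python
import torch
import torch.nn as nn

linear = nn.Linear(in_features=4, out_features=2)
x = torch.randn(5, 4)  # a batch of 5 samples

y = linear(x)
manual = x @ linear.weight.T + linear.bias  # y = x A^T + b

print(linear.weight.shape, linear.bias.shape)  # torch.Size([2, 4]) torch.Size([2])
print(y.shape, torch.allclose(y, manual))      # torch.Size([5, 2]) True
```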

2.11 Dropout Layers

class torch.nn.Dropout²(p=0.5, inplace=False)
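During training, randomly zeroes some elements of the input tensor with probability p, using samples from a Bernoulli distribution. Elements are zeroed independently on every forward call.
This has proven to be an effective technique for regularization and for preventing the co-adaptation of neurons (I don't yet know what this means), as described in the paper Improving neural networks by preventing co-adaptation of feature detectors.
Furthermore, the outputs are scaled by a factor of $\frac{1}{1-p}$ during training, so at evaluation time the module simply returns its input unchanged.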
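A sketch of the train/eval behavior described above, on an all-ones input so the scaling is easy to see: in training mode surviving elements become 1 / (1 - p) = 2.0 for p = 0.5, and in eval mode the input passes through unchanged. The seed and input size are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 0.5
drop = nn.Dropout(p=p)
x = torch.ones(1000)

drop.train()
y = drop(x)
# Surviving elements are scaled by 1 / (1 - p); dropped ones are 0
print(set(y.unique().tolist()))
dropped_fraction = (y == 0).float().mean().item()
print(dropped_fraction)  # roughly p

drop.eval()
print(torch.equal(drop(x), x))  # True: identity at evaluation time
```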

2.12 Sparse Layers

class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)
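A lookup table of embeddings: essentially a large matrix in which each row stores the embedding of one word. Embedding.weight holds this matrix (a Tensor), and its values can be changed through weight.data.
The input is a Tensor of indices (IntTensor or LongTensor), and the output is the corresponding embeddings, of shape (*, embedding_dim), where * is the shape of the input.
num_embeddings is the size of the dictionary (int).
embedding_dim is the size of each embedding vector (int).
1. weight: the learnable weights of shape (num_embeddings, embedding_dim), initialized from $\mathcal{N}(0,1)$.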

Example:

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172], ...  (output truncated; its shape is (2, 4, 3))
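A minimal sketch of the shape rule and the lookup behavior described above: the output shape is the input shape plus (embedding_dim,), and each output vector is simply a row of embedding.weight. It reuses the sizes and indices from the example, written with torch.tensor(..., dtype=torch.long) instead of torch.LongTensor.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3)  # 10 embeddings, each of size 3
indices = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]], dtype=torch.long)

out = embedding(indices)
print(out.shape)  # torch.Size([2, 4, 3]): input shape + (embedding_dim,)

# The lookup selects rows of embedding.weight: index 1 -> row 1
print(torch.equal(out[0, 0], embedding.weight[1]))  # True
```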
