RTP Payload Format for High Efficiency Video Coding (HEVC)

Date: 2022-12-17 01:00:00

Copyright notice: Reproduction without permission is prohibited. Please contact the author (hello@yeshen.org) before reproducing.

This memo describes an RTP payload format for the video coding standard ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, both also known as High Efficiency Video Coding (HEVC) and developed by the Joint Collaborative Team on Video Coding (JCT-VC). The RTP payload format allows for packetization of one or more Network Abstraction Layer (NAL) units in each RTP packet payload as well as fragmentation of a NAL unit into multiple RTP packets. Furthermore, it supports transmission of an HEVC bitstream over a single stream as well as multiple RTP streams. When multiple RTP streams are used, a single transport or multiple transports may be utilized. The payload format has wide applicability in videoconferencing, Internet video streaming, and high-bitrate entertainment-quality video, among others.

本备忘录描述了视频编码标准ITU-T H.265建议书和ISO / IEC国际标准23008-2的RTP这两种格式也被称为高效视频编码（HEVC），由视频编码联合合作组开发。（JCT-VC）。 RTP有效载荷格式允许每个有效载荷格式RTP对一个或多个网络抽象层进行分组有效载荷（NAL）单组和分组单元NAL单元分成多个部分RTP分组。此外，它支持单流和多流RTP流上传输HEVC比特流。当使用多个RTP流时，可以使用单个传输或多个传输。有效载荷格式广泛应用于视频会议、互联网视频流和高比例娱乐质量视频。

1 Introduction

The High Efficiency Video Coding specification, formally published as both ITU-T Recommendation H.265 [HEVC] and ISO/IEC International Standard 23008-2 [ISO23008-2], was ratified by the ITU-T in April 2013; reportedly, it provides significant coding efficiency gains over H.264 [H.264].
This memo describes an RTP payload format for HEVC. It shares its basic design with the RTP payload formats of [RFC6184] and [RFC6190]. With respect to design philosophy, security, congestion control, and overall implementation complexity, it has similar properties to those earlier payload format specifications. This is a conscious choice, as at least RFC 6184 is widely deployed and generally known in the relevant implementer communities. Mechanisms from RFC 6190 were incorporated as HEVC version 1 supports temporal scalability.
In order to help the overlapping implementer community, frequently only the differences between RFCs 6184 and 6190 and the HEVC payload format are highlighted in non-normative, explanatory parts of this memo. Basic familiarity with both specifications is assumed for those parts. However, the normative parts of this memo do not require study of RFCs 6184 or 6190.

2013年4月，ITU-T批准高效视频编码规范，正式作为ITU-T H.265建议书[HEVC]和ISO / IEC国际标准23008-2 [ISO23008-2]发布；据报道，它提供了超过H.264 [H.264]编码效率增益显著。
描述了备忘录HEVC的RTP负荷格式有效。它与[RFC6184]和[RFC6190]的RTP共享其基本设计的有效载荷格式。其设计原理、安全性、拥塞控制和整体复杂性与早期有效载荷格式规范相似。这是一个有意识的选择，因为至少RFC 6184在相关实施者社区被广泛部署和已知。由于HEVC版本1支持时间可伸缩性，因此并入RFC 6190的机制。
为了帮助重叠的实施者社区，通常只有RFC 6184和6190与HEVC有效载荷格式之间的差异显示在备忘录的非标准化和解释性部分中。假设这两个规范基本熟悉这些部件。然而，本备忘录的规范部分不需要研究RFC 6184或6190。

1.1. Overview of the HEVC Codec
H.264 and HEVC share a similar hybrid video codec design. In this memo, we provide a very brief overview of those features of HEVC that are, in some form, addressed by the payload format specified herein. Implementers have to read, understand, and apply the ITU-T/ISO/IEC specifications pertaining to HEVC to arrive at interoperable, well-performing implementations. Implementers should consider testing their design (including the interworking between the payload format implementation and the core video codec) using the tools provided by ITU-T/ISO/IEC, for example, conformance bitstreams as specified in [H.265.1]. Not doing so has historically led to systems that perform badly and that are not secure.
Conceptually, both H.264 and HEVC include a Video Coding Layer (VCL), which is often used to refer to the coding-tool features, and a Network Abstraction Layer (NAL), which is often used to refer to the systems and transport interface aspects of the codecs.

H.264和HEVC共享类似的混合视频编解码器设计。在本备忘录中，我们简要概述了HEVC本文指定的有效载荷格式解决了这些功能。实施者必须阅读、理解、应用和应用HEVC相关的ITU-T / ISO / IEC规范，实现可互操作，性能好。实施者应考虑使用ITU-T / ISO / IEC提供的工具测试其设计（包括有效载荷格式与核心视频编解码器之间的交换），例如[H.265.1]中规定的一致性比特流。历史上没有这样做会导致系统性能差和不安全。
从概念上讲，H.264和HEVC都包括视频编码层（VCL）网络抽象层（NAL），后者通常用于指代码工具的功能，NAL通常用于指代系统和传输编解码器的接口。

1.1.1. Coding-Tool Features

Similar to earlier hybrid-video-coding-based standards, including H.264, the following basic video coding design is employed by HEVC. A prediction signal is first formed by either intra- or motion-compensated prediction, and the residual (the difference between the original and the prediction) is then coded. The gains in coding efficiency are achieved by redesigning and improving almost all parts of the codec over earlier designs. In addition, HEVC includes several tools to make the implementation on parallel architectures easier. Below is a summary of HEVC coding-tool features.
Quad-tree block and transform structure
One of the major tools that contributes significantly to the coding efficiency of HEVC is the use of flexible coding blocks and transforms, which are defined in a hierarchical quad-tree manner. Unlike H.264, where the basic coding block is a macroblock of fixed-size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size of 64x64. Each CTU can be divided into smaller units in a hierarchical quad-tree manner and can represent smaller blocks down to size 4x4. Similarly, the transforms used in HEVC can have different sizes, starting from 4x4 and going up to 32x32. Utilizing large blocks and transforms contributes to the major gain of HEVC, especially at high resolutions.

与早期基于混合视频编码的标准（包括H.264）类似，HEVC采用以下基本视频编码设计。首先通过帧内或运动补偿预测形成预测信号，然后编码残差（原始和预测之间的差）。通过在早期设计中重新设计和改进编解码器的几乎所有部分，可以实现编码效率的提高。此外，HEVC包括几个工具，可以更轻松地在并行体系结构上实现。以下是HEVC编码工具功能的摘要。
四叉树块和变换结构
对HEVC的编码效率有显着贡献的主要工具之一是使用灵活的编码块和变换，其以分层四叉树方式定义。与H.264不同，其中基本编码块是固定大小为16x16的宏块，HEVC定义了最大大小为64x64的编码树单元（CTU）。每个CTU可以以分层四叉树方式划分为较小的单元，并且可以表示小到4x4的较小块。类似地，HEVC中使用的变换可以具有不同的大小，从4x4开始到32x32。利用大块和变换有助于HEVC的主要增益，特别是在高分辨率下。
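
To make the block-size comparison above concrete, the following minimal sketch counts how many 64x64 CTUs versus 16x16 macroblocks are needed to cover a picture, and lists the block sizes reachable by halving a CTU down to 4x4. The function names and the 1920x1080 example are assumptions made for illustration; they do not come from the HEVC specification.

```python
import math


def block_grid(width: int, height: int, block_size: int) -> tuple[int, int]:
    """Return (columns, rows) of square blocks needed to cover a picture.

    Partial blocks at the right and bottom edges still count as one block.
    """
    return math.ceil(width / block_size), math.ceil(height / block_size)


def quadtree_block_sizes(ctu_size: int = 64, min_size: int = 4) -> list[int]:
    """Block sizes reachable by repeatedly halving a CTU down to min_size."""
    sizes = []
    size = ctu_size
    while size >= min_size:
        sizes.append(size)
        size //= 2
    return sizes


if __name__ == "__main__":
    ctu_cols, ctu_rows = block_grid(1920, 1080, 64)
    mb_cols, mb_rows = block_grid(1920, 1080, 16)
    print("CTUs (64x64):", ctu_cols * ctu_rows)
    print("Macroblocks (16x16):", mb_cols * mb_rows)
    print("Quad-tree block sizes:", quadtree_block_sizes())
```

The sketch only counts blocks; it says nothing about how an encoder chooses the split depth for a given CTU.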

Entropy coding

HEVC uses a single entropy-coding engine, which is based on Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC], whereas H.264 uses two distinct entropy coding engines. CABAC in HEVC shares many similarities with CABAC of H.264, but contains several improvements. Those include improvements in coding efficiency and lowered implementation complexity, especially for parallel architectures.

HEVC使用单个熵编码引擎，其基于上下文自适应二进制算术编码（CABAC）[CABAC]，而H.264使用两个不同的熵编码引擎。 HEVC中的CABAC与H.264的CABAC有许多相似之处，但包含一些改进。其中包括提高编码效率和降低实现复杂性，尤其是对于并行架构。

In-loop filtering

H.264 includes an in-loop adaptive deblocking filter, where the blocking artifacts around the transform edges in the reconstructed picture are smoothed to improve the picture quality and compression efficiency. In HEVC, a similar deblocking filter is employed but with somewhat lower complexity. In addition, pictures undergo a subsequent filtering operation called Sample Adaptive Offset (SAO), which is a new design element in HEVC. SAO basically adds a pixel-level offset in an adaptive manner and usually acts as a de-ringing filter. It is observed that SAO improves the picture quality, especially around sharp edges, contributing substantially to visual quality improvements of HEVC.

H.264包括环路内自适应去块滤波器，其中对重建图像中的变换边缘周围的块效应进行平滑以改善图像质量和压缩效率。在HEVC中，采用类似的去块滤波器，但复杂度稍低。此外，图片经历称为样本自适应偏移（SAO）的后续过滤操作，其是HEVC中的新设计元素。 SAO基本上以自适应方式添加像素级偏移，并且通常用作去振铃滤波器。据观察，SAO改善了图像质量，特别是在锐边附近，有助于HEVC的视觉质量改善。

Motion prediction and coding

There have been a number of improvements in this area that are summarized as follows. The first category is motion merge and Advanced Motion Vector Prediction (AMVP) modes. The motion information of a prediction block can be inferred from the spatially or temporally neighboring blocks. This is similar to the DIRECT mode in H.264 but includes new aspects to incorporate the flexible quad-tree structure and methods to improve the parallel implementations. In addition, the motion vector predictor can be signaled for improved efficiency. The second category is high-precision interpolation. The interpolation filter length is increased to 8-tap from 6-tap, which improves the coding efficiency but also comes with increased complexity. In addition, the interpolation filter is defined with higher precision without any intermediate rounding operations to further improve the coding efficiency.

该领域已经有许多改进，总结如下。第一类是运动合并和高级运动矢量预测（AMVP）模式。可以从空间或时间上相邻的块推断预测块的运动信息。这类似于H.264中的DIRECT模式，但包括结合灵活四叉树结构的新方面和改进并行实现的方法。另外，可以用信号通知运动矢量预测器以提高效率。第二类是高精度插值。插值滤波器长度从6抽头增加到8抽头，这提高了编码效率，但也增加了复杂性。此外，插值滤波器以更高的精度定义，无需任何中间舍入操作，以进一步提高编码效率。

Intra prediction and intra-coding
Compared to eight intra prediction modes in H.264, HEVC supports angular intra prediction with 33 directions. This increased flexibility improves both objective coding efficiency and visual quality, as the edges can be better predicted and ringing artifacts around the edges can be reduced. In addition, the reference samples are adaptively smoothed based on the prediction direction. To avoid contouring artifacts, a new interpolative prediction generation is included to improve the visual quality. Furthermore, Discrete Sine Transform (DST) is utilized instead of traditional Discrete Cosine Transform (DCT) for 4x4 intra-transform blocks.

与H.264中的8个帧内预测模式相比，HEVC支持具有33个方向的角度帧内预测。这种增加的灵活性改善了客观编码效率和视觉质量，因为可以更好地预测边缘并且可以减少边缘周围的振铃伪影。另外，基于预测方向自适应地平滑参考样本。为了避免轮廓伪影，包括新的插值预测生成以改善视觉质量。此外，利用离散正弦变换（DST）代替传统的离散余弦变换（DCT）用于4×4帧内变换块。

Other coding-tool features
HEVC includes some tools for lossless coding and efficient screen-content coding, such as skipping the transform for certain blocks. These tools are particularly useful, for example, when streaming the user interface of a mobile device to a large display.

HEVC包括一些用于无损编码和有效屏幕内容编码的工具，例如跳过某些块的变换。例如，当将移动设备的用户界面流式传输到大型显示器时，这些工具特别有用。

1.1.2. Systems and Transport Interfaces

HEVC inherited the basic systems and transport interface designs from H.264. These include the NAL-unit-based syntax structure, the hierarchical syntax and data unit structure, the Supplemental Enhancement Information (SEI) message mechanism, and the video buffering model based on the Hypothetical Reference Decoder (HRD). The hierarchical syntax and data unit structure consists of sequence-level parameter sets, multi-picture-level or picture-level parameter sets, slice-level header parameters, and lower-level parameters. The differences from H.264 in these aspects are summarized in the following.

HEVC继承了H.264的基本系统和传输接口设计。这些包括基于NAL单元的语法结构，分层语法和数据单元结构，补充增强信息（SEI）消息机制，以及基于假设参考解码器（HRD）的视频缓冲模型。分层语法和数据单元结构由序列级参数集，多图片级或图片级参数集，切片级报头参数和较低级参数组成。在下文中，总结了与H.264相比这些方面的差异列表。

Video parameter set (VPS)

A new type of parameter set, called Video Parameter Set (VPS), was introduced. For the first (2013) version of [HEVC], the VPS NAL unit is required to be available prior to its activation, while the information contained in the VPS is not necessary for operation of the decoding process. For future HEVC extensions, such as the 3D or scalable extensions, the VPS is expected to include information necessary for operation of the decoding process, e.g., decoding dependency or information for reference picture set construction of enhancement layers. The VPS provides a “big picture” of a bitstream, including what types of operation points are provided, the profile, tier, and level of the operation points, and some other high-level properties of the bitstream that can be used as the basis for session negotiation and content selection, etc. (see Section 7.1).
Profile, tier, and level
The profile, tier, and level syntax structure that can be included in both the VPS and Sequence Parameter Set (SPS) includes 12 bytes of data to describe the entire bitstream (including all temporally scalable layers, which are referred to as sub-layers in the HEVC specification), and can optionally include more profile, tier, and level information pertaining to individual temporally scalable layers. The profile indicator shows the “best viewed as” profile when the bitstream conforms to multiple profiles, similar to the major brand concept in the ISO Base Media File Format (ISOBMFF) [IS014496-12] [IS015444-12] and file formats derived based on ISOBMFF, such as the 3GPP file format [3GPPFF]. The profile, tier, and level syntax structure also includes indications such as 1) whether the bitstream is free of frame-packed content, 2) whether the bitstream is free of interlaced source content, and 3) whether the bitstream is free of field pictures. When the answer is yes for both 2) and 3), the bitstream contains only frame pictures of progressive source. Based on these indications, clients/players without support of post-processing functionalities for the handling of frame-packed, interlaced source content or field pictures can reject those bitstreams that contain such pictures.

引入了一种新的参数集，称为视频参数集（VPS）。对于[HEVC]的第一个（2013）版本，要求VPS NAL单元在其激活之前可用，而VPS中包含的信息对于解码过程的操作不是必需的。对于未来的HEVC扩展，例如3D或可伸缩扩展，期望VPS包括解码过程的操作所必需的信息，例如解码依赖性或用于增强层的参考图片集构造的信息。 VPS提供比特流的“大图”，包括提供的操作点类型，操作点的配置文件，层和级别，以及可用作基础的比特流的一些其他高级属性用于会话协商和内容选择等（参见第7.1节）。
配置文件，层级和级别可以包括在VPS和序列参数集（SPS）中的简档，层和级语法结构包括12个字节的数据来描述整个比特流（包括所有时间上可伸缩的层，其被称为子层中的子层） HEVC规范），并且可以可选地包括关于各个时间可伸缩层的更多简档，层和级别信息。当比特流符合多个配置文件时，配置文件指示符显示“最佳查看”配置文件，类似于ISO基本媒体文件格式（ISOBMFF）[IS014496-12] [IS015444-12]中的主要品牌概念和基于派生的文件格式在ISOBMFF上，例如3GPP文件格式[3GPPFF]。简档，层和级语法结构还包括诸如1）比特流是否没有帧打包内容，2）比特流是否没有隔行扫描的源内容，以及3）比特流是否没有场图像的指示。当2）和3）的答案都是肯定时，比特流仅包含渐进源的帧图像。基于这些指示，不支持处理帧封装，隔行扫描源内容或场图像的后处理功能的客户端/播放器可以拒绝包含这些图像的那些比特流。

Bitstream and elementary stream

HEVC includes a definition of an elementary stream, which is new compared to H.264. An elementary stream consists of a sequence of one or more bitstreams. An elementary stream that consists of two or more bitstreams has typically been formed by splicing together two or more bitstreams (or parts thereof). When an elementary stream contains more than one bitstream, the last NAL unit of the last access unit of a bitstream (except the last bitstream in the elementary stream) must contain an end of bitstream NAL unit, and the first access unit of the subsequent bitstream must be an Intra-Random Access Point (IRAP) access unit. This IRAP access unit may be a Clean Random Access (CRA), Broken Link Access (BLA), or Instantaneous Decoding Refresh (IDR) access unit.

HEVC包括基本流的定义，与H.264相比是新的。基本流由一个或多个比特流的序列组成。通常通过将两个或更多个比特流（或其部分）拼接在一起来形成由两个或更多个比特流组成的基本流。当基本流包含多于一个比特流时，比特流的最后一个访问单元的最后一个NAL单元（基本流中的最后一个比特流除外）必须包含比特流NAL单元的一端，以及后续比特流的第一个访问单元必须是随机内接入点（IRAP）访问单元。该IRAP访问单元可以是清洁随机访问（CRA），断链接入（BLA）或瞬时解码刷新（IDR）访问单元。
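
The splicing rule above can be checked mechanically. The sketch below models an elementary stream as a list of access units, each a list of NAL unit type values, and verifies that every access unit ending in an end of bitstream NAL unit is followed by an IRAP access unit. The numeric values (37 for end of bitstream, 16 to 23 for IRAP types) are taken from the NAL unit type table of the HEVC specification rather than from this memo's text, and the list-of-lists model and function name are assumptions made for illustration.

```python
EOB_NUT = 37                      # end of bitstream NAL unit type (HEVC spec)
IRAP_TYPES = set(range(16, 24))   # BLA 16-18, IDR 19-20, CRA 21, 22-23 reserved IRAP


def splice_points_valid(access_units: list[list[int]]) -> bool:
    """Check that each bitstream boundary is followed by an IRAP access unit.

    A boundary is recognised by an access unit whose last NAL unit is an
    end of bitstream NAL unit. The final access unit of the elementary
    stream needs no successor.
    """
    for current, following in zip(access_units, access_units[1:]):
        ends_bitstream = bool(current) and current[-1] == EOB_NUT
        if ends_bitstream and not any(t in IRAP_TYPES for t in following):
            return False
    return True


if __name__ == "__main__":
    # VPS, SPS, PPS, IDR | trailing | trailing + EOB | CRA | trailing
    good = [[32, 33, 34, 19], [1], [1, 37], [21], [1]]
    bad = [[32, 33, 34, 19], [1, 37], [1]]
    print(splice_points_valid(good), splice_points_valid(bad))
```

This checks only one direction of the rule: it cannot detect a splice where the end of bitstream NAL unit itself is missing.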

Random access support

HEVC includes signaling in the NAL unit header, through NAL unit types, of IRAP pictures beyond IDR pictures. Three types of IRAP pictures, namely IDR, CRA, and BLA pictures, are supported: IDR pictures are conventionally referred to as closed group-of-pictures (closed-GOP) random access points whereas CRA and BLA pictures are conventionally referred to as open-GOP random access points. BLA pictures usually originate from splicing of two bitstreams or part thereof at a CRA picture, e.g., during stream switching. To enable better systems usage of IRAP pictures, altogether six different NAL units are defined to signal the properties of the IRAP pictures, which can be used to better match the stream access point types as defined in the ISOBMFF [IS014496-12] [IS015444-12], which are utilized for random access support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH]. Pictures following an IRAP picture in decoding order and preceding the IRAP picture in output order are referred to as leading pictures associated with the IRAP picture. There are two types of leading pictures: Random Access Decodable Leading (RADL) pictures and Random Access Skipped Leading (RASL) pictures. RADL pictures are decodable when the decoding started at the associated IRAP picture; RASL pictures are not decodable when the decoding started at the associated IRAP picture and are usually discarded. HEVC provides mechanisms to enable specifying the conformance of a bitstream wherein the originally present RASL pictures have been discarded. Consequently, system components can discard RASL pictures, when needed, without worrying about causing the bitstream to become non-compliant.

HEVC包括在NAL单元头中通过NAL单元类型在IDR图片之外的IRAP图片中的信令。支持三种类型的IRAP图像，即IDR，CRA和BLA图像：IDR图像通常被称为封闭图像组（闭合GOP）随机访问点，而CRA和BLA图像通常被称为开放图像。 -GOP随机访问点。 BLA图像通常源自在CRA图像处的两个比特流或其部分的拼接，例如，在流切换期间。为了能够更好地使用IRAP图像，总共定义了六个不同的NAL单元来发信号通知IRAP图像的属性，这些单元可用于更好地匹配ISOBMFF中定义的流接入点类型[IS014496-12] [IS015444- 12]，用于3GP-DASH [3GPDASH]和MPEG DASH [MPEGDASH]中的随机接入支持。按照解码顺序在IRAP图片之后并且在输出顺序中在IRAP图片之前的图片被称为与IRAP图片相关联的前导图片。有两种类型的前导图像：随机访问可解码前导（RADL）图像和随机访问跳过前导（RASL）图像。当解码开始于相关的IRAP图像时，RADL图像是可解码的;当解码在相关的IRAP图像处开始并且通常被丢弃时，RASL图像不可解码。 HEVC提供了能够指定比特流的一致性的机制，其中最初存在的RASL图像已被丢弃。因此，系统组件可以在需要时丢弃RASL图片，而不用担心导致比特流变得不合规。
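
As an illustration of the RASL handling described above, the following sketch starts decoding at an IRAP picture and discards the RASL pictures associated with it, while keeping RADL pictures. It is a simplification under stated assumptions: each picture is represented by a single NAL unit type value, the numeric type values come from the HEVC specification's NAL unit type table rather than from this memo, and only the RASL pictures tied to the starting IRAP picture are handled. The function name is hypothetical.

```python
RADL_TYPES = {6, 7}               # RADL_N, RADL_R
RASL_TYPES = {8, 9}               # RASL_N, RASL_R
IRAP_TYPES = set(range(16, 24))   # BLA 16-18, IDR 19-20, CRA 21, 22-23 reserved IRAP


def random_access(picture_types: list[int], start_index: int) -> list[int]:
    """Return the pictures a decoder would keep when tuning in at start_index.

    RASL pictures that follow the starting IRAP picture (and precede the
    next IRAP picture) are dropped because their references are missing.
    """
    if picture_types[start_index] not in IRAP_TYPES:
        raise ValueError("decoding must start at an IRAP picture")

    kept = [picture_types[start_index]]
    dropping_rasl = True
    for nal_type in picture_types[start_index + 1:]:
        if nal_type in IRAP_TYPES:
            # A later IRAP picture is reached with decoding already running,
            # so this sketch stops discarding from here on.
            dropping_rasl = False
        if dropping_rasl and nal_type in RASL_TYPES:
            continue
        kept.append(nal_type)
    return kept


if __name__ == "__main__":
    stream = [21, 8, 9, 6, 1, 1, 21, 9, 1]   # CRA, RASL, RASL, RADL, ...
    print(random_access(stream, 0))
```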

Temporal scalability support

HEVC includes an improved support of temporal scalability, by inclusion of the signaling of TemporalId in the NAL unit header, the restriction that pictures of a particular temporal sub-layer cannot be used for inter prediction reference by pictures of a lower temporal sub-layer, the sub-bitstream extraction process, and the requirement that each sub-bitstream extraction output be a conforming bitstream. Media-Aware Network Elements (MANEs) can utilize the TemporalId in the NAL unit header for stream adaptation purposes based on temporal scalability.

HEVC包括通过在NAL单元头中包含TemporalId的信令来改进对时间可伸缩性的支持，特定时间子层的图片不能用于由较低时间子层的图片进行帧间预测参考的限制，子比特流提取过程，以及每个子比特流提取输出是一致的比特流的要求。媒体感知网络元件（MANE）可以基于时间可伸缩性利用NAL单元头中的TemporalId用于流适配目的。
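
A MANE that thins a stream by temporal sub-layer could look roughly like the sketch below. It assumes the two-byte NAL unit header layout shown in Figure 1 of this memo, with the TID field in the three least significant bits of the second byte, and it assumes (per the HEVC specification) that this field carries TemporalId plus one. The function names are hypothetical, and the real sub-bitstream extraction process has additional rules that are not modelled here.

```python
def temporal_id(nal_unit: bytes) -> int:
    """Read TemporalId from the second header byte (TID field minus 1)."""
    if len(nal_unit) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    return (nal_unit[1] & 0x07) - 1


def drop_higher_sub_layers(nal_units: list[bytes], max_temporal_id: int) -> list[bytes]:
    """Keep only NAL units whose TemporalId does not exceed the target."""
    return [nal for nal in nal_units if temporal_id(nal) <= max_temporal_id]


if __name__ == "__main__":
    # Three trailing-picture NAL units (type 1) with TID fields 1, 2 and 3.
    units = [bytes([0x02, 0x01]), bytes([0x02, 0x02]), bytes([0x02, 0x03])]
    kept = drop_higher_sub_layers(units, max_temporal_id=1)
    print([temporal_id(nal) for nal in kept])
```

Because pictures of a lower sub-layer never reference pictures of a higher one, the units that remain after this filter stay decodable.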

Temporal sub-layer switching support
HEVC specifies, through NAL unit types present in the NAL unit header, the signaling of Temporal Sub-layer Access (TSA) and Step-wise Temporal Sub-layer Access (STSA). A TSA picture and pictures following the TSA picture in decoding order do not use pictures prior to the TSA picture in decoding order with TemporalId greater than or equal to that of the TSA picture for inter prediction reference. A TSA picture enables up-switching, at the TSA picture, to the sub-layer containing the TSA picture or any higher sub-layer, from the immediately lower sub-layer. An STSA picture does not use pictures with the same TemporalId as the STSA picture for inter prediction reference. Pictures following an STSA picture in decoding order with the same TemporalId as the STSA picture do not use pictures prior to the STSA picture in decoding order with the same TemporalId as the STSA picture for inter prediction reference. An STSA picture enables up-switching, at the STSA picture, to the sub-layer containing the STSA picture, from the immediately lower sub-layer.

HEVC通过NAL单元报头中存在的NAL单元类型指定时间子层接入（TSA）和逐步时间子层接入（STSA）的信令。按照解码顺序在TSA图像之后的TSA图像和图像不使用在TSA图像之前的图像的解码顺序，其中TemporalId大于或等于用于帧间预测参考的TSA图像的TemporalId。 TSA图像使得能够在TSA图像处从直接较低的子层向上切换到包含TSA图像或任何较高子层的子层。 STSA图片不使用具有与STSA图片相同的TemporalId的图片用于帧间预测参考。以与STSA图片相同的TemporalId的解码顺序的STSA图片之后的图片不使用与STSA图片在解码顺序中的图片之前具有与用于帧间预测参考的STSA图片相同的TemporalId。 STSA图片使得能够在STSA图片上从紧邻的较低子层向上包含STSA图片的子层。

Sub-layer reference or non-reference pictures

The concept and signaling of reference/non-reference pictures in HEVC are different from H.264. In H.264, if a picture may be used by any other picture for inter prediction reference, it is a reference picture; otherwise, it is a non-reference picture, and this is signaled by two bits in the NAL unit header. In HEVC, a picture is called a reference picture only when it is marked as “used for reference”. In addition, the concept of sub-layer reference picture was introduced. If a picture may be used by another picture with the same TemporalId for inter prediction reference, it is a sub-layer reference picture; otherwise, it is a sub-layer non-reference picture. Whether a picture is a sub-layer reference picture or sub-layer non-reference picture is signaled through NAL unit type values.

HEVC中的参考/非参考图片的概念和信令与H.264不同。在H.264中，如果图片可以被任何其他图片用于帧间预测参考，则它是参考图片; 否则，它是非参考图像，并且这由NAL单元头中的两个比特用信号通知。在HEVC中，仅当图片被标记为“用于参考”时才将图片称为参考图片。此外，还介绍了子层参考图的概念。如果图片可以被具有相同TemporalId的另一个其他图片用于帧间预测参考，则它是子层参考图片; 否则，它是子层非参考图片。通过NAL单元类型值来用信号通知图片是子层参考图片还是子层非参考图片。
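
The signaling through NAL unit type values can be sketched as a one-line test. The rule used here is an assumption drawn from the NAL unit type table of the HEVC specification, not from this memo's text: among the VCL types 0 to 15, even values (TRAIL_N, TSA_N, STSA_N, RADL_N, RASL_N and the reserved _N values) denote sub-layer non-reference pictures, and odd values denote sub-layer reference pictures. The function name is hypothetical.

```python
def is_sub_layer_non_reference(nal_unit_type: int) -> bool:
    """True for sub-layer non-reference picture types.

    Assumes the HEVC type numbering in which types 0..14 with an even
    value are the "_N" variants. IRAP types (16 and above) and non-VCL
    types never qualify.
    """
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0


if __name__ == "__main__":
    for nal_type, name in [(0, "TRAIL_N"), (1, "TRAIL_R"), (8, "RASL_N"), (19, "IDR_W_RADL")]:
        print(name, is_sub_layer_non_reference(nal_type))
```

A network element under load might use such a test to pick pictures of the highest sub-layer that can be discarded without affecting other pictures in that sub-layer.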

Extensibility

Besides the TemporalId in the NAL unit header, HEVC also includes the signaling of a six-bit layer ID in the NAL unit header, which must be equal to 0 for a single-layer bitstream. Extension mechanisms have been included in the VPS, SPS, Picture Parameter Set (PPS), SEI NAL unit, slice headers, and so on. All these extension mechanisms enable future extensions in a backward-compatible manner, such that bitstreams encoded according to potential future HEVC extensions can be fed to then-legacy decoders (e.g., HEVC version 1 decoders), and the then-legacy decoders can decode and output the base-layer bitstream.

除了NAL单元头中的TemporalId之外，HEVC还包括NAL单元头中的六比特层ID的信令，对于单层比特流，该信令必须等于0。扩展机制已包括在VPS，SPS，图片参数集（PPS），SEI NAL单元，切片报头等中。所有这些扩展机制以后向兼容的方式实现未来的扩展，使得根据潜在的未来HEVC扩展编码的比特流可以被馈送到当时的传统解码器（例如，HEVC版本1解码器），并且当时的传统解码器可以解码和输出基层比特流。

Bitstream extraction
HEVC includes a bitstream-extraction process as an integral part of the overall decoding process. The bitstream extraction process is used in the process of bitstream conformance tests, which is part of the HRD buffering model.
HEVC包括比特流提取过程，作为整个解码过程的组成部分。比特流提取过程用于比特流一致性测试过程，这是HRD缓冲模型的一部分。

Reference picture management

The reference picture management of HEVC, including reference picture marking and removal from the Decoded Picture Buffer (DPB) as well as Reference Picture List Construction (RPLC), differs from that of H.264. Instead of the reference picture marking mechanism based on a sliding window plus adaptive Memory Management Control Operation (MMCO) described in H.264, HEVC specifies a reference picture management and marking mechanism based on Reference Picture Set (RPS), and the RPLC is consequently based on the RPS mechanism. An RPS consists of a set of reference pictures associated with a picture, consisting of all reference pictures that are prior to the associated picture in decoding order, that may be used for inter prediction of the associated picture or any picture following the associated picture in decoding order. The reference picture set consists of five lists of reference pictures: RefPicSetStCurrBefore, RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr, and RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetLtCurr contain all reference pictures that may be used in inter prediction of the current picture and that may be used in inter prediction of one or more of the pictures following the current picture in decoding order. RefPicSetStFoll and RefPicSetLtFoll consist of all reference pictures that are not used in inter prediction of the current picture but may be used in inter prediction of one or more of the pictures following the current picture in decoding order. RPS provides an “intra-coded” signaling of the DPB status, instead of an “inter-coded” signaling, mainly for improved error resilience. The RPLC process in HEVC is based on the RPS, by signaling an index to an RPS subset for each reference index; this process is simpler than the RPLC process in H.264.

HEVC的参考图像管理（包括参考图像标记和从解码图像缓冲器（DPB）中去除以及参考图像列表构造（RPLC））不同于H.264。代替基于滑动窗口加上H.264中描述的自适应存储器管理控制操作（MMCO）的参考图片标记机制，HEVC指定基于参考图片集（RPS）的参考图片管理和标记机制，因此RPLC基于RPS机制。 RPS由与图片相关联的一组参考图片组成，包括在解码顺序中在相关图片之前的所有参考图片，其可用于相关图片的帧间预测或在解码中关联图片之后的任何图片订购。参考图片集包括五个参考图片列表; RefPicSetStCurrBefore，RefPicSetStCurrAfter，RefPicSetStFoll，RefPicSetLtCurr和RefPicSetLtFoll。 RefPicSetStCurrBefore，RefPicSetStCurrAfter和RefPicSetLtCurr包含可以在当前图片的帧间预测中使用的所有参考图片，并且可以用于按照解码顺序在当前图片之后的一个或多个图片的帧间预测中。 RefPicSetStFoll和RefPicSetLtFoll由未在当前图片的帧间预测中使用的所有参考图片组成，但是可以用于按解码顺序在当前图片之后的一个或多个图片的帧间预测中。 RPS提供DPB状态的“帧内编码”信令，而不是“帧间编码”信令，主要用于改善错误恢复。 HEVC中的RPLC过程基于RPS，通过向每个参考索引的RPS子集发信号通知索引;这个过程比H.264中的RPLC过程简单。
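
The "intra-coded" nature of RPS signaling can be illustrated with a small model: because each picture carries the complete set of reference pictures that must remain, a decoder can reconcile its DPB against that set without knowing any earlier marking commands. In the sketch below, pictures are identified by picture order count values, which is an assumption made for illustration, as are the class, field, and function names. It is a sketch of the idea, not the decoding process defined in [HEVC].

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePictureSet:
    """The five RPS lists, holding picture identifiers (here: POC values)."""
    st_curr_before: list[int] = field(default_factory=list)
    st_curr_after: list[int] = field(default_factory=list)
    st_foll: list[int] = field(default_factory=list)
    lt_curr: list[int] = field(default_factory=list)
    lt_foll: list[int] = field(default_factory=list)

    def current_candidates(self) -> set[int]:
        """Pictures that may be used for inter prediction of the current picture."""
        return set(self.st_curr_before + self.st_curr_after + self.lt_curr)

    def all_pictures(self) -> set[int]:
        """Every picture that must stay available as a reference."""
        return self.current_candidates() | set(self.st_foll + self.lt_foll)


def apply_rps(dpb: set[int], rps: ReferencePictureSet) -> tuple[set[int], set[int], set[int]]:
    """Return (kept, unused, missing) after reconciling the DPB with the RPS.

    kept:    pictures in the DPB that the RPS still lists
    unused:  pictures in the DPB the RPS no longer lists (no longer references)
    missing: pictures the RPS lists but the DPB lacks (a sign of loss)
    """
    signalled = rps.all_pictures()
    return dpb & signalled, dpb - signalled, signalled - dpb


if __name__ == "__main__":
    rps = ReferencePictureSet(st_curr_before=[8, 4], st_curr_after=[12], st_foll=[0])
    print(apply_rps({0, 4, 8, 16}, rps))
```

The `missing` set hints at why this style of signaling helps error resilience: a lost reference picture becomes visible as soon as the next RPS is received.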

Ultra-low delay support

HEVC specifies a sub-picture-level HRD operation, for support of the so-called ultra-low delay. The mechanism specifies a standard-compliant way to enable delay reduction below a one-picture interval. Coded Picture Buffer (CPB) and DPB parameters at the sub-picture level may be signaled, and utilization of this information for the derivation of CPB timing (wherein the CPB removal time corresponds to decoding time) and DPB output timing (display time) is specified. Decoders are allowed to operate the HRD at the conventional access-unit level, even when the sub-picture-level HRD parameters are present.

HEVC指定子图像级HRD操作，以支持所谓的超低延迟。该机制规定了一种符合标准的方法，可以在一个图像间隔内实现延迟降低。可以用信号通知子图像级别的编码图像缓冲器（CPB）和DPB参数，并且利用该信息来推导CPB定时（其中CPB移除时间对应于解码时间）和DPB输出定时（显示时间）是指定。即使存在子图像级HRD参数，也允许解码器在常规接入单元级别操作HRD。

New SEI messages

HEVC inherits many H.264 SEI messages with changes in syntax and/or semantics making them applicable to HEVC. Additionally, there are a few new SEI messages reviewed briefly in the following paragraphs.
The display orientation SEI message informs the decoder of a transformation that is recommended to be applied to the cropped decoded picture prior to display, such that the pictures can be properly displayed, e.g., in an upside-up manner.
The structure of pictures SEI message provides information on the NAL unit types, picture-order count values, and prediction dependencies of a sequence of pictures. The SEI message can be used, for example, for concluding what impact a lost picture has on other pictures.
The decoded picture hash SEI message provides a checksum derived from the sample values of a decoded picture. It can be used for detecting whether a picture was correctly received and decoded.
The active parameter sets SEI message includes the IDs of the active video parameter set and the active sequence parameter set and can be used to activate VPSs and SPSs. In addition, the SEI message includes the following indications: 1) An indication of whether “full random accessibility” is supported (when supported, all parameter sets needed for decoding of the remainder of the bitstream when random accessing from the beginning of the current coded video sequence (CVS) by completely discarding all access units earlier in decoding order are present in the remaining bitstream, and all coded pictures in the remaining bitstream can be correctly decoded); 2) An indication of whether there is no parameter set within the current CVS that updates another parameter set of the same type preceding in decoding order. An update of a parameter set refers to the use of the same parameter set ID but with some other parameters changed. If this property is true for all CVSs in the bitstream, then all parameter sets can be sent out-of-band before session start.
The decoding unit information SEI message provides information regarding coded picture buffer removal delay for a decoding unit. The message can be used in very-low-delay buffering operations.
The region refresh information SEI message can be used together with the recovery point SEI message (present in both H.264 and HEVC) for improved support of gradual decoding refresh. This supports random access from inter-coded pictures, wherein complete pictures can be correctly decoded or recovered after an indicated number of pictures in output/display order.

HEVC继承了许多H.264 SEI消息，其语法和/或语义发生了变化，使其适用于HEVC。此外，以下段落中简要回顾了一些新的SEI消息。
显示方向SEI消息通知解码器建议在显示之前应用于裁剪的解码图像的变换，使得可以例如以颠倒的方式正确地显示图像。
图片SEI消息的结构提供关于NAL单元类型，图片顺序计数值和图片序列的预测依赖性的信息。例如，可以使用SEI消息来总结丢失的图片对其他图片的影响。
解码图像散列SEI消息提供从解码图像的样本值导出的校验和。它可用于检测图片是否被正确接收和解码。
活动参数集SEI消息包括活动视频参数集和活动序列参数集的ID，并且可以用于激活VPS和SPS。另外，SEI消息包括以下指示：1）是否支持“完全随机可访问性”的指示（当支持时，当从当前CVS的开始随机访问时解码剩余比特流所需的所有参数集通过完全丢弃在解码顺序中较早的所有访问单元存在于剩余的比特流中，并且可以正确地解码剩余比特流中的所有编码图像）; 2）指示在当前CVS内是否没有参数集，其更新在解码顺序之前的相同类型的另一参数集。参数集的更新是指使用相同的参数集ID但更改了一些其他参数。如果此属性对于比特流中的所有CVS都为真，那么所有参数集都可以在会话开始之前发送到带外。
解码单元信息SEI消息提供关于解码单元的编码图像缓冲器移除延迟的信息。该消息可用于极低延迟的缓冲操作。
区域刷新信息SEI消息可以与恢复点SEI消息（存在于H.264和HEVC中）一起使用，以改进对逐渐解码刷新的支持。这支持来自帧间编码图像的随机访问，其中在输出/显示顺序中指示数量的图像之后可以正确地解码或恢复完整图像。
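
The idea behind the decoded picture hash SEI message can be sketched as follows: the receiver hashes its decoded sample values and compares the result with the digests carried in the message. This is a simplified illustration under stated assumptions: samples are 8-bit and each colour plane is already available as a raw byte string, one MD5 digest is computed per plane, and the function names are hypothetical. The exact sample arrangement and the available hash methods are defined in [HEVC] and are not reproduced here.

```python
import hashlib


def picture_md5(planes: list[bytes]) -> list[str]:
    """Compute one MD5 digest per colour plane of a decoded picture."""
    return [hashlib.md5(plane).hexdigest() for plane in planes]


def picture_matches(planes: list[bytes], expected_digests: list[str]) -> bool:
    """Compare locally computed digests with those signalled by the sender."""
    return picture_md5(planes) == expected_digests


if __name__ == "__main__":
    # A tiny 4x4 luma plane and two 2x2 chroma planes.
    decoded = [bytes([16] * 16), bytes([128] * 4), bytes([128] * 4)]
    signalled = picture_md5(decoded)          # what the encoder would send

    corrupted = [bytes([16] * 15 + [17]), decoded[1], decoded[2]]
    print(picture_matches(decoded, signalled), picture_matches(corrupted, signalled))
```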

1.1.3. Parallel Processing Support

The reportedly significantly higher encoding computational demand of HEVC over H.264, in conjunction with the ever-increasing video resolution (both spatially and temporally) required by the market, led to the adoption of VCL coding tools specifically targeted to allow for parallelization on the sub-picture level. That is, parallelization occurs, at the minimum, at the granularity of an integer number of CTUs. The targets for this type of high-level parallelization are multicore CPUs and DSPs as well as multiprocessor systems. In a system design, to be useful, these tools require signaling support, which is provided in Section 7 of this memo. This section provides a brief overview of the tools available in [HEVC].
Many of the tools incorporated in HEVC were designed keeping in mind the potential parallel implementations in multicore/multiprocessor architectures. Specifically, for parallelization, four picture partition strategies, as described below, are available.
Slices are segments of the bitstream that can be reconstructed independently from other slices within the same picture (though there may still be interdependencies through loop filtering operations). Slices are the only tool that can be used for parallelization that is also available, in virtually identical form, in H.264. Parallelization based on slices does not require much inter-processor or inter-core communication (except for inter-processor or inter-core data sharing for motion compensation when decoding a predictively coded picture, which is typically much heavier than inter-processor or inter-core data sharing due to in-picture prediction), as slices are designed to be independently decodable. However, for the same reason, slices can require some coding overhead. Further, slices (in contrast to some of the other tools mentioned below) also serve as the key mechanism for bitstream partitioning to match Maximum Transfer Unit (MTU) size requirements, due to the in-picture independence of slices and the fact that each regular slice is encapsulated in its own NAL unit. In many cases, the goal of parallelization and the goal of MTU size matching can place contradicting demands on the slice layout in a picture. The realization of this situation led to the development of the more advanced tools mentioned below.
Dependent slice segments allow for fragmentation of a coded slice into fragments at CTU boundaries without breaking any in-picture prediction mechanisms. They are complementary to the fragmentation mechanism described in this memo in that they need the cooperation of the encoder. As a dependent slice segment necessarily contains an integer number of CTUs, a decoder using multiple cores operating on CTUs can process a dependent slice segment without communicating parts of the slice segment’s bitstream to other cores. Fragmentation, as specified in this memo, in contrast, does not guarantee that a fragment contains an integer number of CTUs.
In Wavefront Parallel Processing (WPP), the picture is partitioned into rows of CTUs. Entropy decoding and prediction are allowed to use data from CTUs in other partitions. Parallel processing is possible through parallel decoding of CTU rows, where the start of the decoding of a row is delayed by two CTUs, so as to ensure that data related to a CTU above and to the right of the subject CTU is available before the subject CTU is decoded. Using this staggered start (which appears like a wavefront when represented graphically), parallelization is possible with up to as many processors/cores as the picture contains CTU rows.
Because in-picture prediction between neighboring CTU rows within a picture is allowed, the required inter-processor/inter-core communication to enable in-picture prediction can be substantial. The WPP partitioning does not result in the creation of more NAL units compared to when it is not applied; thus, WPP cannot be used for MTU size matching, though slices can be used in combination for that purpose.
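
The staggered start of WPP can be modelled with a simple schedule. The sketch below assumes an idealised decoder in which every CTU takes one time step and one core is available per CTU row; under those assumptions, the CTU in row r and column c can start at step 2r + c, which guarantees that its left neighbour and the CTU above and to the right have already finished. The function names and the unit-time model are assumptions made for illustration.

```python
def wpp_start_steps(rows: int, cols: int, lag: int = 2) -> list[list[int]]:
    """Earliest start step of every CTU when each row trails the one above by `lag` CTUs."""
    return [[r * lag + c for c in range(cols)] for r in range(rows)]


def wpp_total_steps(rows: int, cols: int, lag: int = 2) -> int:
    """Steps needed to decode the whole picture with one core per CTU row."""
    return (rows - 1) * lag + cols


def dependencies_met(steps: list[list[int]]) -> bool:
    """Check that left and above-right neighbours start before each CTU."""
    rows, cols = len(steps), len(steps[0])
    for r in range(rows):
        for c in range(cols):
            if c > 0 and steps[r][c - 1] >= steps[r][c]:
                return False
            if r > 0 and c + 1 < cols and steps[r - 1][c + 1] >= steps[r][c]:
                return False
    return True


if __name__ == "__main__":
    rows, cols = 17, 30   # 1920x1080 covered by 64x64 CTUs
    schedule = wpp_start_steps(rows, cols)
    print("dependencies met:", dependencies_met(schedule))
    print("parallel steps:", wpp_total_steps(rows, cols), "serial steps:", rows * cols)
```

Real decoders see uneven per-CTU cost and synchronisation overhead, so the step counts above are an upper bound on the benefit rather than a prediction.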
Tiles define horizontal and vertical boundaries that partition a picture into tile columns and rows. The scan order of CTUs is changed to be local within a tile (in the order of a CTU raster scan of a tile), before decoding the top-left CTU of the next tile in the order of tile raster scan of a picture. Similar to slices, tiles break in-picture prediction dependencies (including entropy decoding dependencies). However, they do not need to be included into individual NAL units (same as WPP in this regard); hence, tiles cannot be used for MTU size matching, though slices can be used in combination for that purpose. Each tile can be processed by one processor/core, and the inter-processor/inter-core communication required for in-picture prediction between processing units decoding neighboring tiles is limited to conveying the shared slice header in cases where a slice spans more than one tile, and loop-filtering-related sharing of reconstructed samples and metadata. In this respect, tiles are less demanding in terms of inter-processor communication bandwidth compared to WPP due to the in-picture independence between two neighboring partitions.

1.1.4. NAL Unit Header
HEVC maintains the NAL unit concept of H.264 with modifications. HEVC uses a two-byte NAL unit header, as shown in Figure 1. The payload of a NAL unit refers to the NAL unit excluding the NAL unit header.

+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Type    |  LayerId  | TID |
+-------------+-----------------+
Figure 1: The Structure of the HEVC NAL Unit Header

The semantics of the fields in the NAL unit header are as specified in [HEVC] and described briefly below for convenience. In addition to the name and size of each field, the corresponding syntax element name in [HEVC] is also provided.
F: 1 bit forbidden_zero_bit. Required to be zero in [HEVC]. Note that the inclusion of this bit in the NAL unit header was to enable transport of HEVC video over MPEG-2 transport systems (avoidance of start code emulations) [MPEG2S]. In the context of this memo, the value 1 may be used to indicate a syntax violation, e.g., for a NAL unit resulting from aggregating a number of fragmented units of a NAL unit but missing the last fragment, as described in Section 4.4.3.
Type: 6 bits nal_unit_type. This field specifies the NAL unit type as defined in Table 7-1 of [HEVC]. If the most significant bit of this field of a NAL unit is equal to 0 (i.e., the value of this field is less than 32), the NAL unit is a VCL NAL unit. Otherwise, the NAL unit is a non-VCL NAL unit. For a reference of all currently defined NAL unit types and their semantics, please refer to Section 7.4.2 in [HEVC].
LayerId: 6 bits nuh_layer_id. Required to be equal to zero in [HEVC]. It is anticipated that in future scalable or 3D video coding extensions of this specification, this syntax element will be used to identify additional layers that may be present in the CVS, wherein a layer may be, e.g., a spatial scalable layer, a quality scalable layer, a texture view, or a depth view.
TID: 3 bits nuh_temporal_id_plus1. This field specifies the temporal identifier of the NAL unit plus 1. The value of TemporalId is equal to TID minus 1. A TID value of 0 is illegal; this ensures that at least one bit in the NAL unit header is equal to 1, so that start code emulations can be considered independently in the NAL unit header and in the NAL unit payload data.
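As an illustration of the field layout in Figure 1, the following is a minimal parsing sketch. The names NalHeader and parse_nal_header are hypothetical and chosen for this example only; the bit positions follow the figure (1-bit F, 6-bit Type, 6-bit LayerId, 3-bit TID).

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int          # forbidden_zero_bit
    nal_type: int   # nal_unit_type
    layer_id: int   # nuh_layer_id
    tid: int        # nuh_temporal_id_plus1

    @property
    def is_vcl(self) -> bool:
        # Type values below 32 (most significant bit 0) are VCL NAL units.
        return self.nal_type < 32

    @property
    def temporal_id(self) -> int:
        # TemporalId is TID minus 1; a TID of 0 is illegal.
        if self.tid == 0:
            raise ValueError("TID value 0 is illegal")
        return self.tid - 1


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into the four header fields."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    hdr = (data[0] << 8) | data[1]
    return NalHeader(
        f=(hdr >> 15) & 0x01,
        nal_type=(hdr >> 9) & 0x3F,
        layer_id=(hdr >> 3) & 0x3F,
        tid=hdr & 0x07,
    )


if __name__ == "__main__":
    h = parse_nal_header(bytes([0x26, 0x01]))
    print(h, h.is_vcl, h.temporal_id)
```

For example, the header bytes 0x26 0x01 decode to Type 19, LayerId 0, and TID 1, that is, a VCL NAL unit with TemporalId 0.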

1.2. Overview of the Payload Format

This payload format defines the following processes required for transport of HEVC coded data over RTP [RFC3550]:
o Usage of RTP header with this payload format
o Packetization of HEVC coded NAL units into RTP packets using three types of payload structures: a single NAL unit packet, aggregation packet, and fragment unit
o Transmission of HEVC NAL units of the same bitstream within a single RTP stream or multiple RTP streams (within one or more RTP sessions), where within an RTP stream transmission of NAL units may be either non-interleaved (i.e., the transmission order of NAL units is the same as their decoding order) or interleaved (i.e., the transmission order of NAL units is different from the decoding order)
o Media type parameters to be used with the Session Description Protocol (SDP) [RFC4566]
o A payload header extension mechanism and data structures for enhanced support of temporal scalability based on that extension mechanism.
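The payload structures listed above can be told apart by the Type field of the two-byte payload header. The sketch below assumes the type values assigned in the later sections of the payload format specification (48 for aggregation packets, 49 for fragmentation units, 50 for the payload header extension, PACI); these values are not stated in this section, and the function name payload_structure is hypothetical.

```python
# Type values assumed from the later sections of the payload format.
AP_TYPE = 48    # aggregation packet
FU_TYPE = 49    # fragmentation unit
PACI_TYPE = 50  # payload header extension (PACI) packet


def payload_structure(payload: bytes) -> str:
    """Name the payload structure of an RTP payload from its header Type."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the two-byte payload header")
    # Type occupies bits 1..6 of the first byte (bit 0 is the F bit).
    nal_type = (payload[0] >> 1) & 0x3F
    if nal_type == AP_TYPE:
        return "aggregation packet"
    if nal_type == FU_TYPE:
        return "fragmentation unit"
    if nal_type == PACI_TYPE:
        return "PACI packet"
    return "single NAL unit packet"


if __name__ == "__main__":
    print(payload_structure(bytes([0x62, 0x01])))
```

A real depacketizer would go on to parse each structure; this sketch only shows the initial dispatch.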

2 Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119].
In this document, the above key words will convey that interpretation only when in ALL CAPS. Lowercase uses of these words are not to be interpreted as carrying the significance described in RFC 2119.
This specification uses the notion of setting and clearing a bit when bit fields are handled. Setting a bit is the same as assigning that bit the value of 1 (On). Clearing a bit is the same as assigning that bit the value of 0 (Off).
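The set and clear operations can be written as simple bit-mask helpers. This is an illustrative sketch; the helper names and the convention that position 0 is the least significant bit are assumptions of the example, not of the specification.

```python
def set_bit(value: int, pos: int) -> int:
    """Assign the value 1 (On) to the bit at position pos (0 = least significant)."""
    return value | (1 << pos)


def clear_bit(value: int, pos: int) -> int:
    """Assign the value 0 (Off) to the bit at position pos (0 = least significant)."""
    return value & ~(1 << pos)


if __name__ == "__main__":
    first_byte = 0x40
    # Setting the most significant bit of the first header byte sets the F bit.
    print(hex(set_bit(first_byte, 7)))
    print(hex(clear_bit(0xC0, 7)))
```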

3 Definitions and Abbreviations

3.1. Definitions
This document uses the terms and definitions of [HEVC]. Section 3.1.1 lists relevant definitions from [HEVC] for convenience. Section 3.1.2 provides definitions specific to this memo.

3.1.1. Definitions from the HEVC Specification

access unit: A set of NAL units that are associated with each other according to a specified classification rule, that are consecutive in decoding order, and that contain exactly one coded picture.
BLA access unit: An access unit in which the coded picture is a BLA picture.
BLA picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
Coded Video Sequence (CVS): A sequence of access units that consists, in decoding order, of an IRAP access unit with NoRaslOutputFlag equal to 1, followed by zero or more access units that are not IRAP access units with NoRaslOutputFlag equal to 1, including all subsequent access units up to but not including any subsequent access unit that is an IRAP access unit with NoRaslOutputFlag equal to 1.
Informative note: An IRAP access unit may be an IDR access unit, a BLA access unit, or a CRA access unit. The value of NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA access unit, and each CRA access unit that is the first access unit in the bitstream in decoding order, is the first access unit that follows an end of sequence NAL unit in decoding order, or has HandleCraAsBlaFlag equal to 1.
CRA access unit: An access unit in which the coded picture is a CRA picture.
CRA picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to CRA_NUT.
IDR access unit: An access unit in which the coded picture is an IDR picture.
IDR picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
IRAP access unit: An access unit in which the coded picture is an IRAP picture.
IRAP picture: A coded picture for which each VCL NAL unit has nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23), inclusive.
layer: A set of VCL NAL units that all have a particular value of nuh_layer_id and the associated non-VCL NAL units, or one of a set of syntactical structures having a hierarchical relationship.
operation point: A bitstream created from another bitstream by operation of the sub-bitstream extraction process with the other bitstream, a target highest TemporalId, and a target-layer identifier list as input.
random access: The act of starting the decoding process for a bitstream at a point other than the beginning of the bitstream.
sub-layer: A temporal scalable layer of a temporal scalable bitstream consisting of VCL NAL units with a particular value of the TemporalId variable, and the associated non-VCL NAL units.
sub-layer representation: A subset of the bitstream consisting of NAL units of a particular sub-layer and the lower sub-layers.
tile: A rectangular region of coding tree blocks within a particular tile column and a particular tile row in a picture.
tile column: A rectangular region of coding tree blocks having a height equal to the height of the picture and a width specified by syntax elements in the picture parameter set.
tile row: A rectangular region of coding tree blocks having a height specified by syntax elements in the picture parameter set and a width equal to the width of the picture.
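Two of the definitions above lend themselves to a short sketch: classifying a coded picture as BLA, IDR, CRA, or another IRAP picture from the nal_unit_type values of its VCL NAL units, and selecting a sub-layer representation by TemporalId. The numeric values for BLA_W_LP (16) and RSV_IRAP_VCL23 (23) appear in the text; the values in between are taken from Table 7-1 of [HEVC]. The function names and the tuple-based NAL unit representation are assumptions made for illustration.

```python
# nal_unit_type values in the IRAP range (Table 7-1 of [HEVC]).
BLA_TYPES = {16, 17, 18}   # BLA_W_LP, BLA_W_RADL, BLA_N_LP
IDR_TYPES = {19, 20}       # IDR_W_RADL, IDR_N_LP
CRA_TYPES = {21}           # CRA_NUT
IRAP_RANGE = range(16, 24)  # BLA_W_LP (16) .. RSV_IRAP_VCL23 (23), inclusive


def classify_picture(vcl_nal_types):
    """Classify a coded picture from the types of all its VCL NAL units."""
    types = set(vcl_nal_types)
    if not types:
        raise ValueError("a coded picture has at least one VCL NAL unit")
    if types <= BLA_TYPES:
        return "BLA"
    if types <= IDR_TYPES:
        return "IDR"
    if types <= CRA_TYPES:
        return "CRA"
    if all(t in IRAP_RANGE for t in types):
        return "IRAP"
    return "non-IRAP"


def sub_layer_representation(nal_units, highest_temporal_id):
    """Keep the NAL units of one sub-layer and all lower sub-layers.

    nal_units is an iterable of (temporal_id, data) tuples in decoding order.
    """
    return [nu for nu in nal_units if nu[0] <= highest_temporal_id]


if __name__ == "__main__":
    print(classify_picture([19, 19]))
    print(sub_layer_representation([(0, b"a"), (2, b"b"), (1, b"c")], 1))
```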

3.1.2. Definitions Specific to This Memo

dependee RTP stream: An RTP stream on which another RTP stream depends. In a Multiple RTP streams on a Single media Transport (MRST) or Multiple RTP streams on Multiple media Transports (MRMT) configuration, all RTP streams except the highest RTP stream are dependee RTP streams.
highest RTP stream: The RTP stream on which no other RTP stream depends. The RTP stream in a Single RTP stream on a Single media Transport (SRST) is the highest RTP stream.
Media-Aware Network Element (MANE): A network element, such as a middlebox, selective forwarding unit, or application-layer gateway that is capable of parsing certain aspects of the RTP payload headers or the RTP payload and reacting to their contents.
Informative note: The concept of a MANE goes beyond normal routers or gateways in that a MANE has to be aware of the signaling (e.g., to learn about the payload type mappings of the media streams), and in that it has to be trusted when working with Secure RTP (SRTP). The advantage of using MANEs is that they allow packets to be dropped according to the needs of the media coding. For example, if a MANE has to drop packets due to congestion on a certain link, it can identify and remove those packets whose elimination produces the least adverse effect on the user experience. After dropping packets, MANEs must rewrite RTCP packets to match the changes to the RTP stream, as specified in Section 7 of [RFC3550].
Media Transport: As used in the MRST, MRMT, and SRST definitions below, Media Transport denotes the transport of packets over a transport association identified by a 5-tuple (source address, source port, destination address, destination port, transport protocol). See also Section 2.1.13 of [RFC7656].
Informative note: The term “bitstream” in this document is equivalent to the term “encoded stream” in [RFC7656].

Multiple RTP streams on a Single media Transport (MRST): Multiple RTP streams carrying a single HEVC bitstream on a Single Transport. See also Section 3.5 of [RFC7656].
Multiple RTP streams on Multiple media Transports (MRMT): Multiple RTP streams carrying a single HEVC bitstream on Multiple Transports. See also Section 3.5 of [RFC7656].
NAL unit decoding order: A NAL unit order that conforms to the constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].
NAL unit output order: A
