The h.264 Sequence Parameter Set

This is a follow-up to my World’s Smallest h.264 Encoder post. I’ve received several emails asking for precise details about two entities in the h.264 bitstream: the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS). Both contain information that an h.264 decoder needs to decode the video data, for example the resolution and frame rate of the video.

Recall that an h.264 bitstream contains a sequence of Network Abstraction Layer (NAL) units. The SPS and PPS are both types of NAL units. The SPS NAL unit contains parameters that apply to a series of consecutive coded video pictures, referred to as a “coded video sequence” in the h.264 standard. The PPS NAL unit contains parameters that apply to the decoding of one or more individual pictures inside a coded video sequence.

In my simple encoder, we emitted a single SPS and PPS at the start of the video data stream. A more complex encoder will often insert them periodically in the data, for two reasons: first, a decoder often needs to start decoding mid-stream; second, the encoder may wish to vary parameters for different parts of the stream in order to meet compression or quality goals.

In my trivial encoder, the h.264 SPS and PPS were hardcoded in hex as:

/* h.264 bitstreams */
const uint8_t sps[] =
{0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2};
const uint8_t pps[] =
{0x00, 0x00, 0x00, 0x01, 0x68, 0xce, 0x38, 0x80};
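Before going field by field, it may help to see how the first byte after the start code splits up. This is a minimal sketch, assuming the Annex B framing used in the arrays above (a four-byte 00 00 00 01 start code followed by the one-byte NAL header). The nal_header struct and parse_nal_header function are names invented for this illustration, not part of my encoder or of any library.

```c
#include <stdint.h>

/* The three fields packed into the one-byte NAL unit header. */
typedef struct {
    unsigned forbidden_zero_bit; /* u(1), top bit */
    unsigned nal_ref_idc;        /* u(2), next two bits */
    unsigned nal_unit_type;      /* u(5), low five bits */
} nal_header;

/* Split a NAL header byte into its fields (hypothetical helper). */
static nal_header parse_nal_header(uint8_t b)
{
    nal_header h;
    h.forbidden_zero_bit = (b >> 7) & 0x01u;
    h.nal_ref_idc        = (b >> 5) & 0x03u;
    h.nal_unit_type      = b & 0x1fu;
    return h;
}
```

Applied to byte 4 of each array, 0x67 splits into nal_ref_idc 3 and nal_unit_type 7 (an SPS), and 0x68 splits into nal_ref_idc 3 and nal_unit_type 8 (a PPS).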

Let’s decode this into something readable from the spec. The first thing I did was to look at section 7 of the h.264 specification. I saw that at a minimum I had to choose how to fill in the SPS parameters in the table below. In the table, as in the standard, the type u(n) indicates an unsigned integer of n bits, and ue(v) indicates an unsigned Exp-Golomb-coded value of a variable number of bits. The spec doesn’t seem to define the maximum number of bits anywhere, but the reference encoder software uses 32. (People wishing to explore the security of decoder software may find it interesting to violate this assumption!)

Parameter Name                        Type   Value  Comments
forbidden_zero_bit                    u(1)   0      Despite being forbidden, it must be set to 0!
nal_ref_idc                           u(2)   3      3 means it is “important” (this is an SPS)
nal_unit_type                         u(5)   7      Indicates this is a sequence parameter set
profile_idc                           u(8)   66     Baseline profile
constraint_set0_flag                  u(1)   0      We’re not going to honor constraints
constraint_set1_flag                  u(1)   0      We’re not going to honor constraints
constraint_set2_flag                  u(1)   0      We’re not going to honor constraints
constraint_set3_flag                  u(1)   0      We’re not going to honor constraints
reserved_zero_4bits                   u(4)   0      Better set them to zero
level_idc                             u(8)   10     Level 1, sec A.3.1
seq_parameter_set_id                  ue(v)  0      We’ll just use id 0.
log2_max_frame_num_minus4             ue(v)  0      Let’s have as few frame numbers as possible
pic_order_cnt_type                    ue(v)  0      Keep things simple
log2_max_pic_order_cnt_lsb_minus4     ue(v)  0      Fewer is better.
num_ref_frames                        ue(v)  0      We will only send I slices
gaps_in_frame_num_value_allowed_flag  u(1)   0      We will have no gaps
pic_width_in_mbs_minus1               ue(v)  7      SQCIF is 8 macroblocks wide
pic_height_in_map_units_minus1        ue(v)  5      SQCIF is 6 macroblocks high
frame_mbs_only_flag                   u(1)   1      We will not do field/frame encoding
direct_8x8_inference_flag             u(1)   0      Used for B slices. We will not send B slices
frame_cropping_flag                   u(1)   0      We will not do frame cropping
vui_parameters_present_flag           u(1)   0      We will not send VUI data
rbsp_stop_one_bit                     u(1)   1      Stop bit. I missed this at first and it caused me much trouble.

Some key things here are the profile (profile_idc) and level (level_idc) that I chose, and the picture width and height. If you encode the above table in hex, you will get the values in the SPS array declared above.
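To make the table-to-hex step concrete, here is a rough sketch of a bit writer that emits the table's fields in order. The names bit_writer, put_u, put_ue and write_sps are invented for this illustration. The sketch assumes the caller supplies a large enough buffer, and it skips emulation-prevention bytes, which happens to be safe here because this payload never contains two consecutive zero bytes. A ue(v) value v is written as k zero bits followed by v + 1 in k + 1 bits, where k = floor(log2(v + 1)).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal MSB-first bit writer over a caller-supplied buffer. */
typedef struct {
    uint8_t *buf;
    size_t   cap;     /* capacity in bytes */
    size_t   bitpos;  /* number of bits written so far */
} bit_writer;

/* u(n): write the low n bits of v, most significant bit first. */
static void put_u(bit_writer *w, unsigned n, uint32_t v)
{
    assert(n <= 32 && w->bitpos + n <= w->cap * 8);
    while (n-- > 0) {
        if ((v >> n) & 1u)
            w->buf[w->bitpos / 8] |= (uint8_t)(0x80u >> (w->bitpos % 8));
        w->bitpos++;
    }
}

/* ue(v): k leading zero bits, then (v + 1) written in k + 1 bits. */
static void put_ue(bit_writer *w, uint32_t v)
{
    uint32_t code = v + 1;
    unsigned k = 0;
    assert(v < 0xffffffffu);
    while ((code >> k) > 1)
        k++;                      /* k = floor(log2(code)) */
    put_u(w, k, 0);
    put_u(w, k + 1, code);
}

/* Write the SPS NAL unit from the table, without the start code.
   Returns the number of bytes used. */
static size_t write_sps(uint8_t *out, size_t cap)
{
    bit_writer w = { out, cap, 0 };
    memset(out, 0, cap);          /* put_u only sets 1 bits */

    put_u(&w, 1, 0);    /* forbidden_zero_bit */
    put_u(&w, 2, 3);    /* nal_ref_idc */
    put_u(&w, 5, 7);    /* nal_unit_type */
    put_u(&w, 8, 66);   /* profile_idc */
    put_u(&w, 4, 0);    /* constraint_set0_flag .. constraint_set3_flag */
    put_u(&w, 4, 0);    /* reserved_zero_4bits */
    put_u(&w, 8, 10);   /* level_idc */
    put_ue(&w, 0);      /* seq_parameter_set_id */
    put_ue(&w, 0);      /* log2_max_frame_num_minus4 */
    put_ue(&w, 0);      /* pic_order_cnt_type */
    put_ue(&w, 0);      /* log2_max_pic_order_cnt_lsb_minus4 */
    put_ue(&w, 0);      /* num_ref_frames */
    put_u(&w, 1, 0);    /* gaps_in_frame_num_value_allowed_flag */
    put_ue(&w, 7);      /* pic_width_in_mbs_minus1 */
    put_ue(&w, 5);      /* pic_height_in_map_units_minus1 */
    put_u(&w, 1, 1);    /* frame_mbs_only_flag */
    put_u(&w, 1, 0);    /* direct_8x8_inference_flag */
    put_u(&w, 1, 0);    /* frame_cropping_flag */
    put_u(&w, 1, 0);    /* vui_parameters_present_flag */
    put_u(&w, 1, 1);    /* rbsp_stop_one_bit */

    return (w.bitpos + 7) / 8;    /* unused trailing bits stay zero */
}
```

The table adds up to 55 bits, so write_sps fills 7 bytes: 0x67, 0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2. Those are the bytes that follow the start code in the sps array.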

A question I got a couple of times in email was about the width and height parameters: specifically, what to do if the picture width or height is not an integer multiple of the macroblock size. Recall that, for the 4:2:0 sampling scheme in my encoder, a macroblock consists of 16×16 luma samples. In this case, you would round the coded size up to a whole number of macroblocks, set the frame_cropping_flag to 1, and trim the excess pixels in the horizontal and vertical directions with the frame_crop_left_offset, frame_crop_right_offset, frame_crop_top_offset, and frame_crop_bottom_offset parameters, which are present in the bitstream only if the frame_cropping_flag is set to one.
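As a worked example of the cropping arithmetic, here is a small sketch that turns the SPS size fields into a displayed picture size. It assumes 4:2:0 chroma, where each crop offset counts units of 2 luma samples horizontally and 2 * (2 - frame_mbs_only_flag) luma samples vertically. The sps_size_fields struct and sps_display_size function are names invented for this illustration.

```c
#include <assert.h>

/* SPS fields that determine the displayed luma size (4:2:0 assumed).
   When frame_cropping_flag is 0, pass all four offsets as 0. */
typedef struct {
    unsigned pic_width_in_mbs_minus1;
    unsigned pic_height_in_map_units_minus1;
    unsigned frame_mbs_only_flag;
    unsigned frame_crop_left_offset;
    unsigned frame_crop_right_offset;
    unsigned frame_crop_top_offset;
    unsigned frame_crop_bottom_offset;
} sps_size_fields;

/* Compute the cropped width and height in luma samples. */
static void sps_display_size(const sps_size_fields *s,
                             unsigned *width, unsigned *height)
{
    assert(s->frame_mbs_only_flag <= 1);

    /* Crop units for 4:2:0: 2 samples wide, 2 or 4 samples high. */
    unsigned crop_unit_x = 2;
    unsigned crop_unit_y = 2 * (2 - s->frame_mbs_only_flag);

    /* Coded size: whole macroblocks of 16x16 luma samples. A map unit
       is one macroblock row for frames, two for field-coded streams. */
    unsigned coded_w = (s->pic_width_in_mbs_minus1 + 1) * 16;
    unsigned coded_h = (2 - s->frame_mbs_only_flag)
                     * (s->pic_height_in_map_units_minus1 + 1) * 16;

    *width  = coded_w - crop_unit_x
            * (s->frame_crop_left_offset + s->frame_crop_right_offset);
    *height = coded_h - crop_unit_y
            * (s->frame_crop_top_offset + s->frame_crop_bottom_offset);
}
```

For a progressive 1920×1080 stream, for instance, the coded height has to be 68 macroblocks (1088 samples), so pic_height_in_map_units_minus1 is 67 and frame_crop_bottom_offset is 4, which removes the extra 8 rows.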

One interesting problem that we see fairly often with h.264 is when the container format (MP4, MOV, etc.) contains different values for some of these parameters than the SPS and PPS. In this case, we find different video players handle the streams differently.

A handy tool for decoding h.264 bitstreams, including the SPS, is the h264bitstream tool. It comes with a command line program that decodes a bitstream to the parameter names defined in the h.264 specification. Let’s look at its output for a sample MP4 file I downloaded from YouTube. First, I extract the h.264 NAL units from the file using ffmpeg:

ffmpeg.exe -i "Old Faithful.mp4" -vcodec copy -vbsf h264_mp4toannexb -an of.h264

The NAL units now reside in the file of.h264. I then run the h264_analyze command from the h264bitstream package to produce the following output:

h264_analyze of.h264
!! Found NAL at offset 4 (0x0004), size 25 (0x0019)
==================== NAL ====================
forbidden_zero_bit : 0
nal_ref_idc : 3
nal_unit_type : 7 ( Sequence parameter set )
======= SPS =======
profile_idc : 100
constraint_set0_flag : 0
constraint_set1_flag : 0
constraint_set2_flag : 0
constraint_set3_flag : 0
reserved_zero_4bits : 0
level_idc : 31
seq_parameter_set_id : 0
chroma_format_idc : 1
residual_colour_transform_flag : 0
bit_depth_luma_minus8 : 0
bit_depth_chroma_minus8 : 0
qpprime_y_zero_transform_bypass_flag : 0
seq_scaling_matrix_present_flag : 0
log2_max_frame_num_minus4 : 3
pic_order_cnt_type : 0
log2_max_pic_order_cnt_lsb_minus4 : 3
delta_pic_order_always_zero_flag : 0
offset_for_non_ref_pic : 0
offset_for_top_to_bottom_field : 0
num_ref_frames_in_pic_order_cnt_cycle : 0
num_ref_frames : 1
gaps_in_frame_num_value_allowed_flag : 0
pic_width_in_mbs_minus1 : 79
pic_height_in_map_units_minus1 : 44
frame_mbs_only_flag : 1
mb_adaptive_frame_field_flag : 0
direct_8x8_inference_flag : 1
frame_cropping_flag : 0
frame_crop_left_offset : 0
frame_crop_right_offset : 0
frame_crop_top_offset : 0
frame_crop_bottom_offset : 0
vui_parameters_present_flag : 1
=== VUI ===
aspect_ratio_info_present_flag : 1
aspect_ratio_idc : 1
sar_width : 0
sar_height : 0
overscan_info_present_flag : 0
overscan_appropriate_flag : 0
video_signal_type_present_flag : 0
video_signal_type_present_flag : 0
video_format : 0
video_full_range_flag : 0
colour_description_present_flag : 0
colour_primaries : 0
transfer_characteristics : 0
matrix_coefficients : 0
chroma_loc_info_present_flag : 0
chroma_sample_loc_type_top_field : 0
chroma_sample_loc_type_bottom_field : 0
timing_info_present_flag : 1
num_units_in_tick : 100
time_scale : 5994
fixed_frame_rate_flag : 1
nal_hrd_parameters_present_flag : 0
vcl_hrd_parameters_present_flag : 0
low_delay_hrd_flag : 0
pic_struct_present_flag : 0
bitstream_restriction_flag : 1
motion_vectors_over_pic_boundaries_flag : 1
max_bytes_per_pic_denom : 0
max_bits_per_mb_denom : 0
log2_max_mv_length_horizontal : 11
log2_max_mv_length_vertical : 11
num_reorder_frames : 0
max_dec_frame_buffering : 1
=== HRD ===
cpb_cnt_minus1 : 0
bit_rate_scale : 0
cpb_size_scale : 0
initial_cpb_removal_delay_length_minus1 : 0
cpb_removal_delay_length_minus1 : 0
dpb_output_delay_length_minus1 : 0
time_offset_length : 0

The only additional thing I’d like to point out here is that this particular SPS also contains information about the frame rate of the video (see timing_info_present_flag). These parameters must be closely checked when you generate bitstreams to ensure they agree with the container format that the h.264 will eventually be muxed into. Even a small error, such as 29.97 fps in one place and 30 fps in another, can result in severe audio/video synchronization problems.
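To check the frame rate an SPS advertises, you can compute it from the two VUI timing fields. This is a sketch under the usual assumption for fixed-frame-rate frame-coded streams: one tick is a field period, so a frame lasts two ticks. The function name sps_frame_rate is invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Frames per second implied by the VUI timing fields, assuming
   fixed_frame_rate_flag is 1 and each frame spans two ticks. */
static double sps_frame_rate(uint32_t num_units_in_tick, uint32_t time_scale)
{
    assert(num_units_in_tick > 0);
    return (double)time_scale / (2.0 * (double)num_units_in_tick);
}
```

With the values from the dump above (num_units_in_tick of 100 and time_scale of 5994) this works out to 5994 / 200 = 29.97 frames per second, which is the number the container should agree with.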

Next time I will write about the h.264 Picture Parameter Set (PPS).

Time: 2024-08-08 08:30:21
