GOP (Group of Pictures) is one of the important parameters of DASH media. An incorrectly set GOP may degrade the overall quality of the media, may cause visible glitches in adaptive streaming scenarios when a media player switches between quality levels, or may even prevent the media player from playing the media at all.
In this article we’ll see what GOP is, why it is important, how to set it correctly, and how to inspect existing media in order to find out whether GOP was set correctly.
Versions of software and documents that were used while creating this article:
- DASH-IF IOP v2.0 and v2.9 (draft for v3.0).
- FFmpeg (64-bit, git-cd960c8, 2015-01-15).
- x264 (64-bit, 20141220-git-40bb568).
- x265 (64-bit, 1.4, build 397, 8bpp).
- MP4Box from GPAC (64-bit, 0.5.2-DEV-rev27).
- mp4dump from Bento4 (32-bit, 1.4.2-586).
- dash.js 1.2.0.
The “Sintel” movie trailer is used as the test media. You can find this media, as well as other free-to-use media, here.
What is GOP
Compressed H.264 (also known as AVC) and H.265 (also known as HEVC) video consists of frames of different types: I-frames (also known as key frames, the least compressed), P-frames and B-frames (the most compressed). These frames are organized in a specific order. Let’s say we have a sequence of frames: I B B P B B I B B P B B. A GOP starts with an I-frame, so the GOP length (the distance between 2 consecutive I-frames) is, in this case, equal to 6. The whole sequence, hence, consists of 2 groups of pictures.
Depending on the context, the acronym “GOP” may be used to describe different things. If the point of interest is the specific order of frame types, then GOP means the “GOP structure”. If the point of interest is the number of frames in a GOP, then GOP means the “GOP length”. This article focuses on the GOP length, since it’s more important when creating correct DASH media that is ready for adaptive streaming.
It’s also important to understand that GOP is a concept that applies only to video content, since only this type of content consists of pictures.
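To make the definition concrete, here is a minimal Python sketch that splits a sequence of frame types into groups, each starting at an I-frame. The function name split_into_gops is my own and not part of any tool, and the sketch assumes the frames are listed in the same order as in the example sequence.

```python
def split_into_gops(frame_types):
    """Split a sequence of frame types ('I', 'P', 'B') into GOPs.

    Every 'I' opens a new group. Frames that appear before the first 'I'
    are ignored, since they do not belong to a complete GOP.
    """
    gops = []
    for frame in frame_types:
        if frame == "I":
            gops.append([frame])      # an I-frame opens a new GOP
        elif gops:
            gops[-1].append(frame)    # P/B frames join the current GOP
    return gops


sequence = "I B B P B B I B B P B B".split()
gops = split_into_gops(sequence)
print(len(gops))                # 2 groups of pictures
print([len(g) for g in gops])   # [6, 6], the GOP length of each group
```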
The Importance of GOP
There are many reasons why GOP is important, and it’s essential to consider them all while creating DASH media that is ready for adaptive streaming. Here are just a few of them:
- Precision of seeking in a media player. Media players can start decoding only from an I-frame, since it is the only frame type that is encoded without references to other frames. So, if the GOP length is big, I-frames will be far away from each other, and a media player that seeks only to I-frames won’t let the user navigate to a place between them. Instead, it will jump to the nearest I-frame. This effect may be undesirable if the precision of seeking is important. By making the GOP length smaller, the precision of seeking may be increased.
- Seamless quality level switching. In adaptive streaming scenarios, media may have multiple quality levels. In the DASH standard, these quality levels are called “Representations”. The idea is that, depending on network conditions or bandwidth (usually; other factors may also exist), the highest quality that can be downloaded in time will be shown to the user. Since network conditions and bandwidth may vary over time, the media player may detect the change and seamlessly switch to another Representation. But the media player can start the playback of another Representation only from an I-frame, so, in order to make seamless switching possible, I-frames in different Representations must be at the same locations. Otherwise, there will be noticeable “jumps” in the playback, or, depending on the media player’s capabilities, the playback may stop altogether.
- Media segmentation. The media player does not retrieve the whole media (with all its Representations, etc.). Instead, it retrieves only the chunks of media that it needs at the current moment, and those that it can actually retrieve without stopping the playback. These chunks are called segments. In order to make seamless switching possible, segments must start with an I-frame, and segment durations must be the same across Representations. Also, the stream can be split into segments only at I-frames.
- Media server load (number of segment requests). If the GOP length is very small, segments can be very short, and then there will be a lot of them. This is good if the network conditions change very fast: in this case, the media player will be able to respond to such changes faster. But this also has a negative impact on the media server load, since the media player will be performing a lot of segment requests. Increasing the segment duration decreases the media server load, but the media player won’t be able to switch to other quality levels as fast.
- Playback latency. Since the GOP length limits how short a segment can be, a smaller GOP length allows shorter segments, which the media player can load faster. As a result, the playback also starts faster.
- Media content quality. The GOP length may also affect the quality of the media (in both a good and a bad way). According to one study, the “perfect” GOP length, for the best objective media quality, is from 5 to 8 seconds.
That’s a lot of things to consider, and some of them, unfortunately, may affect others. Which GOP length you should choose depends on your requirements. But the good news is that by setting the GOP length we may address all these scenarios.
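The trade-off between server load and switching speed can be put into numbers. The sketch below is only an illustration: the 10-minute duration is a made-up value, the function name segment_tradeoff is mine, and the “wait before a switch” is a rough upper bound that ignores the player’s buffer.

```python
import math


def segment_tradeoff(media_seconds, segment_seconds):
    """Return (requests per Representation, worst-case seconds until a switch)."""
    requests = math.ceil(media_seconds / segment_seconds)
    # A player can change Representation only at a segment boundary, so in
    # the worst case it waits for almost one whole segment.
    return requests, segment_seconds


# Hypothetical 10-minute clip and three candidate segment durations.
for seconds in (2, 4, 10):
    requests, wait = segment_tradeoff(600, seconds)
    print(f"{seconds:>2} s segments: {requests} requests, up to {wait} s before a switch")
```

Shorter segments mean more requests but faster reaction to bandwidth changes; longer segments mean the opposite.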
Types of GOP
There are 2 types of GOP. It’s important to know about them, since in DASH-IF IOP you may find this requirement (section 3.3):
“switching of video streams at closed GOP boundaries”
So, these GOP types are:
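- Closed GOP. It starts with an I-frame, and none of its frames reference frames from other, surrounding GOPs.
- Open GOP. It also contains an I-frame, but some of its frames (typically B-frames that are displayed before that I-frame) may reference frames from the previous GOP, so such a GOP can’t be decoded completely on its own.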
Although open GOP increases the compression efficiency, it also increases the complexity of decoders. As a result, not all media players support it. That’s why it’s safer to use the closed GOP, and that’s why DASH-IF IOP explicitly mentions it.
Producing DASH Media
DASH-IF IOP mentions 2 video compression formats: AVC and HEVC. I’m going to use the x264 encoder for AVC media and the x265 encoder for HEVC media. The audio compression format will be AAC, which is also mentioned in DASH-IF IOP.
Other encoding parameters, compression efficiency, video frame sizes, and so forth are not important in this article. The only important thing is GOP.
Know Your Media
It’s very important to know the media that you’re going to work with before doing the encoding and segmenting. Speaking of GOP, the only important thing that we have to consider is the number of frames per second (FPS). Knowing the FPS and the segment duration, we can easily set the GOP length to the desired value. Let’s say that there are 24 FPS in our media and we decided that the segment duration will be 4 seconds. This means that we need to put I-frames at locations that are 24 * 4 = 96 frames apart from each other (a segment will start with an I-frame and will contain 96 frames). So, the GOP length is 96.
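The same arithmetic as a small Python sketch. The function name gop_length is my own. It uses exact fractions so that fractional frame rates (for example 30000/1001) are handled without rounding, and it refuses combinations where a segment would not contain a whole number of frames.

```python
from fractions import Fraction


def gop_length(fps, segment_seconds):
    """Number of frames per segment, i.e. the GOP length to request.

    Both arguments may be ints, Fractions, or strings such as "30000/1001".
    """
    frames = Fraction(fps) * Fraction(segment_seconds)
    if frames.denominator != 1:
        raise ValueError(
            f"{segment_seconds} s at {fps} FPS is {float(frames):.3f} frames, "
            "not a whole number; pick another segment duration"
        )
    return int(frames)


print(gop_length(24, 4))                       # 96, as in the text
print(gop_length("30000/1001", "1001/500"))    # 60 frames in 2.002 s
```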
Let’s create DASH media that consists of one video and one audio stream. The video stream will have 3 Representations (quality levels) that differ only by their bit rate. For AVC video, these bit rates will be: 400 kbps, 700 kbps and 1100 kbps. For HEVC video, bit rates will be: 200 kbps, 350 kbps and 600 kbps (the quality is close to AVC). The segment duration is 4 seconds. The audio stream will have only 1 Representation, and it will be 128 kbps.
The audio and video content will be separated from each other. Such media is called non-multiplexed (non-muxed). We’re not going to use multiplexed (muxed) media, because DASH-IF IOP explicitly says (section 3.2.1):
“only non-multiplexed Representations are supported, i.e. each Representation only contains a single media component”
Creating the AVC Media
Before creating DASH media, we have to produce appropriately encoded media first.
In FFmpeg, there are 2 ways of producing media with I-frames at specific locations:
- By specifying the GOP length, so that I-frames will be only at specific locations.
- By forcing I-frames to be at specific locations, but allowing the encoder to insert additional I-frames wherever it wants. This is good for seeking, but it decreases the compression efficiency.
It’s up to you which one to use, but in this article we’re going to use the first one (specifying the GOP length), not only because it provides better compression efficiency, but also because it better demonstrates what GOP is.
Let’s create our 3 Representations for video content. The commands are:
- 400 kbps: ffmpeg -r 24 -i "1080-png\sintel_trailer_2k_%04d.png" -c:v libx264 -b:v 400k -minrate 400k -maxrate 400k -bufsize 800k -g 96 -keyint_min 96 -sc_threshold 0 -profile:v high -level 3.1 -flags +cgop -movflags faststart -preset veryslow -tune animation -r 24 -pix_fmt yuv420p -filter:v "scale='trunc(oh*a/2)*2:720'" -vstats_file "720p-400k-stats.txt" "720p-400k.mp4"
- 700 kbps: ffmpeg -r 24 -i "1080-png\sintel_trailer_2k_%04d.png" -c:v libx264 -b:v 700k -minrate 700k -maxrate 700k -bufsize 1400k -g 96 -keyint_min 96 -sc_threshold 0 -profile:v high -level 3.1 -flags +cgop -movflags faststart -preset veryslow -tune animation -r 24 -pix_fmt yuv420p -filter:v "scale='trunc(oh*a/2)*2:720'" -vstats_file "720p-700k-stats.txt" "720p-700k.mp4"
- 1100 kbps: ffmpeg -r 24 -i "1080-png\sintel_trailer_2k_%04d.png" -c:v libx264 -b:v 1100k -minrate 1100k -maxrate 1100k -bufsize 2200k -g 96 -keyint_min 96 -sc_threshold 0 -profile:v high -level 3.1 -flags +cgop -movflags faststart -preset veryslow -tune animation -r 24 -pix_fmt yuv420p -filter:v "scale='trunc(oh*a/2)*2:720'" -vstats_file "720p-1100k-stats.txt" "720p-1100k.mp4"
Important parameters are:
- -g, -keyint_min and -sc_threshold. If we want to force a specific GOP length, we have to make -g and -keyint_min equal, and set -sc_threshold to 0 so that scene changes don’t trigger extra I-frames. Also, make sure that the input and output FPS (-r parameters) are set correctly.
- -flags +cgop. This enables (+) the closed (c) GOP mode.
- -vstats_file. The stats file contains information about types of frames and their locations. This is useful if we want to ensure that the closed GOP was used, and that positions of I-frames are correct. While doing the encoding, FFmpeg also shows in the output what it (the x264 encoder, to be more precise) is going to do, so it’s a good idea to read this output too.
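Since the three commands differ only in the bit rate, they can be generated instead of typed by hand. The following Python sketch builds the same argument list as the commands above; the function name x264_args is mine, every FFmpeg option is copied from those commands, and the buffer size is assumed to always be twice the bit rate, as it is in all three of them.

```python
def x264_args(kbps, fps=24, segment_seconds=4):
    """Build the FFmpeg argument list for one AVC Representation."""
    gop = fps * segment_seconds          # 24 * 4 = 96 frames per segment
    name = f"720p-{kbps}k"
    return [
        "ffmpeg", "-r", str(fps),
        "-i", r"1080-png\sintel_trailer_2k_%04d.png",
        "-c:v", "libx264",
        "-b:v", f"{kbps}k", "-minrate", f"{kbps}k", "-maxrate", f"{kbps}k",
        "-bufsize", f"{2 * kbps}k",      # assumption: buffer = 2 x bit rate
        # Fixed GOP length: equal -g and -keyint_min, no scene-cut I-frames.
        "-g", str(gop), "-keyint_min", str(gop), "-sc_threshold", "0",
        "-profile:v", "high", "-level", "3.1",
        "-flags", "+cgop",               # closed GOP
        "-movflags", "faststart",
        "-preset", "veryslow", "-tune", "animation",
        "-r", str(fps), "-pix_fmt", "yuv420p",
        "-filter:v", "scale='trunc(oh*a/2)*2:720'",
        "-vstats_file", f"{name}-stats.txt",
        f"{name}.mp4",
    ]


for kbps in (400, 700, 1100):
    args = x264_args(kbps)
    print(args[-1], "-> GOP length", args[args.index("-g") + 1])
    # To actually encode, pass the list to subprocess.run(args, check=True).
```

Generating the commands from one place makes it harder to end up with Representations whose GOP lengths accidentally differ.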
Now, let’s create 1 Representation for audio content. The command is:
- 128 kbps: ffmpeg -i "audio.flac" -strict experimental -c:a aac -b:a 128k "audio-128k.m4a"
Creating the DASH AVC Media
We have appropriately encoded media, so the next step is to segment it and create the DASH manifest file (MPD). The command is:
- mp4box -dash 4000 -frag 4000 -rap -frag-rap -profile dashavc264:live -segment-name %s- "720p-400k.mp4" "720p-700k.mp4" "720p-1100k.mp4" "audio-128k.m4a" -out "DASH\AVC\sintel.mpd"
Important parameters are:
- -dash. This is the segment duration in milliseconds (4000, that is, 4 seconds, as we want). Another interesting parameter here is -frag, the duration of a fragment in a segment. Fragments and segments may look very similar, but segments are physical files, and fragments are logical chunks of data inside a segment. So, it’s possible to have multiple fragments in one physical segment. In this case, we have 1 fragment per segment.
- -profile. By specifying the “dashavc264:live” profile, we instruct MP4Box to produce media that meets DASH-IF IOP requirements.
And that’s it! Now we have DASH AVC media with the correct GOP across the different video Representations. We can test the playback in the dash.js media player (in a web browser that supports MSE (Media Source Extensions)).
Creating DASH HEVC Media
Everything that was mentioned in this article for AVC also applies to HEVC. The reason why this separate section exists is that at the moment (January 2015), FFmpeg has an issue with passing encoding parameters to the x265 encoder. So, the way the media is created, and the way the tools are used, is different.
First, let’s create appropriately encoded media. The commands are:
- Raw YUV for x265: ffmpeg -r 24 -i "1080-png\sintel_trailer_2k_%04d.png" -r 24 -pix_fmt yuv420p -filter:v "scale='trunc(oh*a/2)*2:720'" "720p.yuv"
- 200 kbps: x265_64 --fps 24 --input-res 1280x720 "720p.yuv" --profile main --level-idc 3.1 --no-high-tier --no-open-gop --keyint 96 --min-keyint 96 --no-scenecut --bitrate 200 --vbv-maxrate 200 --vbv-bufsize 400 --sar 1:1 --preset veryslow "720p-200k.h265"
- 350 kbps: x265_64 --fps 24 --input-res 1280x720 "720p.yuv" --profile main --level-idc 3.1 --no-high-tier --no-open-gop --keyint 96 --min-keyint 96 --no-scenecut --bitrate 350 --vbv-maxrate 350 --vbv-bufsize 700 --sar 1:1 --preset veryslow "720p-350k.h265"
- 600 kbps: x265_64 --fps 24 --input-res 1280x720 "720p.yuv" --profile main --level-idc 3.1 --no-high-tier --no-open-gop --keyint 96 --min-keyint 96 --no-scenecut --bitrate 600 --vbv-maxrate 600 --vbv-bufsize 1200 --sar 1:1 --preset veryslow "720p-600k.h265"
Important parameters are:
- --no-open-gop. This disables the open GOP mode, so the encoder uses the closed GOP mode.
- --keyint, --min-keyint and --no-scenecut. This forces the encoder to use a constant GOP length.
At this moment we have a bunch of .h265 files, which are raw HEVC bitstreams. In order to be able to segment these files and create DASH media, we need to put each raw bitstream into the MP4 container. The commands are:
- 200 kbps: mp4box -add "720p-200k.h265" "720p-200k.mp4"
- 350 kbps: mp4box -add "720p-350k.h265" "720p-350k.mp4"
- 600 kbps: mp4box -add "720p-600k.h265" "720p-600k.mp4"
Speaking of audio, it’s created the same way as in the AVC case.
Now we are ready to segment the media and make DASH media from it. The command is:
- mp4box -dash 4000 -frag 4000 -rap -frag-rap -profile live -segment-name %s- "720p-200k.mp4" "720p-350k.mp4" "720p-600k.mp4" "audio-128k.m4a" -out "DASH\HEVC\sintel.mpd"
Unfortunately, no web browsers support HEVC at this moment (January 2015), so if you want to try the playback, you may use the Osmo4 (MP4Client) player, which is a part of GPAC.
Inspecting DASH Media
Sometimes, there’s a need to find out whether GOP was set correctly. Since the process of creating DASH media consists of 2 steps (encoding and segmentation), there are 2 types of media that we may have to deal with:
- Encoded and prepared for segmentation. This is the best type of media for inspection, since it’s pretty easy to inspect, especially if you’re in control of the encoding process.
- Already segmented (DASH). Inspecting such media is more difficult, since it’s already split into many files. At the same time, it’s the most common case, since people never say “hey, I created a wrongly encoded media and now its DASH version is not working”; usually, it’s more like “hey, my DASH media is not working. What’s wrong with it?”, without saying anything about how it was created.
Let’s see what we can do in both cases. I’m going to describe ways of inspecting media that I know about, so if you know other ways to do it, leave a comment!
When we were creating AVC media, we were passing the -vstats_file (stats file) parameter to FFmpeg. After the media is encoded, we now have the stats file. This file contains some useful information about frames, including their types. So, by looking at frames of type I (I-frames) and their positions, we can see whether the GOP length is correct.
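Looking through a stats file by eye gets tedious, so here is a minimal Python sketch that does it. It assumes that every line of the stats file contains a “frame=” field and a “type=” field followed by I, P or B, which is what I see in the files produced by my FFmpeg build; other builds may format the file differently. The function names are mine, and the lines in the demo are synthetic, not real FFmpeg output.

```python
import re

# Assumed shape of a stats line: "... frame=   97 ... type= I"
VSTATS_LINE = re.compile(r"frame=\s*(\d+).*?type=\s*([IPB])")


def i_frame_numbers(lines):
    """Return the frame numbers of all I-frames found in the given lines."""
    numbers = []
    for line in lines:
        match = VSTATS_LINE.search(line)
        if match and match.group(2) == "I":
            numbers.append(int(match.group(1)))
    return numbers


def gop_lengths(i_frames):
    """Distances between consecutive I-frames."""
    return [b - a for a, b in zip(i_frames, i_frames[1:])]


# Synthetic demo: 200 frames with an I-frame every 96 frames.
demo = [
    f"frame={n:6d} q= 20.0 type= {'I' if (n - 1) % 96 == 0 else 'P'}"
    for n in range(1, 201)
]
positions = i_frame_numbers(demo)
print(positions)               # [1, 97, 193]
print(gop_lengths(positions))  # [96, 96]
```

For a real file, pass the open file object instead of the demo list, for example i_frame_numbers(open("720p-400k-stats.txt")). If every distance equals the intended GOP length, the GOP is constant.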
If the stats file is not available, we can use MP4Box in order to get the information about frames. It can be done like this:
- mp4box -dts "720p-1100k.mp4"
This command will produce the *_ts.txt file, which contains some information about each frame. What we’re interested in is RAP (Random Access Point). For I-frames, RAP is equal to 1. For other frames it will be 0. So, by looking at positions of I-frames, we will be able to say whether the GOP length is correct.
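Once the RAP values are extracted for each Representation (one 0 or 1 per frame, in file order), it is easy to check that the I-frames are at the same locations everywhere. The Python sketch below deliberately does not parse the *_ts.txt file itself, since I don’t want to assume its exact layout; it works on plain lists of RAP flags. The function names and the demo data are hypothetical.

```python
def rap_positions(rap_flags):
    """Indexes of the frames whose RAP flag is 1."""
    return [index for index, flag in enumerate(rap_flags) if flag == 1]


def raps_aligned(representations):
    """True if all Representations have RAPs at exactly the same frames.

    representations: dict that maps a name to its list of RAP flags.
    """
    positions = [rap_positions(flags) for flags in representations.values()]
    return all(p == positions[0] for p in positions)


def constant_gop(length, frames):
    """Synthetic RAP flags: one RAP every `length` frames."""
    return [1 if n % length == 0 else 0 for n in range(frames)]


good = {"400k": constant_gop(96, 300), "700k": constant_gop(96, 300)}
bad = {"400k": constant_gop(96, 300), "700k": constant_gop(100, 300)}
print(raps_aligned(good))  # True
print(raps_aligned(bad))   # False
```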
After Segmenting (DASH Media)
Checking such media is difficult, but the first thing you can check is the number of segments for different Representations of the same content. If GOP is constant, then there must be the same number of segments for each Representation. And if GOP is not constant, then there’s a chance that the number of segments for different Representations will differ. That’s not a very reliable way of checking GOP, but at least it’s something.
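Counting the segments can be automated with a few lines of Python. The sketch assumes that segment files are named like “720p-400k-1.m4s” (Representation name, a dash, the segment number), which is what the “-segment-name %s-” option used earlier gives me; adjust the pattern if your names differ. The function name and the demo file names are made up for illustration.

```python
import re
from collections import Counter

# Assumed naming scheme: <representation>-<number>.m4s
SEGMENT_NAME = re.compile(r"^(?P<rep>.+)-(?P<index>\d+)\.m4s$")


def count_segments(file_names):
    """Map each Representation name to the number of its segment files."""
    counts = Counter()
    for name in file_names:
        match = SEGMENT_NAME.match(name)
        if match:                       # skips the MPD, init segments, etc.
            counts[match.group("rep")] += 1
    return dict(counts)


# Synthetic directory listing.
names = [f"720p-400k-{i}.m4s" for i in range(1, 14)]
names += [f"720p-700k-{i}.m4s" for i in range(1, 13)]
names += ["sintel.mpd", "720p-400k-init.mp4"]
counts = count_segments(names)
print(counts)                            # {'720p-400k': 13, '720p-700k': 12}
print(len(set(counts.values())) == 1)    # False: the counts differ
```

For a real directory, feed it the file names, for example count_segments(p.name for p in pathlib.Path("DASH/AVC").iterdir()), and compare only the video Representations with each other.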
Another, more reliable, but more difficult way to check GOP in DASH media is by using the mp4dump utility from Bento4. The command is:
- mp4dump "720p-400k-1.m4s"
In the output, in the “trun” atom, you will see the sample count. It must be equal to the GOP length. If it’s not, then this is a bad sign (unless it’s the last segment, which is allowed to contain fewer frames than the GOP length). Of course, in this case, you have to check all segments in all Representations, which is exactly what makes this approach difficult for manual inspection.
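If running mp4dump on every segment by hand is too slow, the sample count can also be read directly from the file. This is not how mp4dump works internally, just a minimal Python sketch of my own that walks the MP4 boxes (4-byte big-endian size, 4-byte type), descends into “moof” and “traf”, and reads the sample count that follows the version and flags in each “trun”. The helper names are mine, and the segment in the demo is built in memory rather than taken from real media.

```python
import struct

CONTAINERS = {b"moof", b"traf"}   # boxes that contain other boxes


def iter_boxes(data, start=0, end=None):
    """Yield (type, body_start, body_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:                        # 64-bit size follows the type
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:                      # box runs to the end of the data
            size = end - pos
        if size < header or pos + size > end:
            break                            # truncated or corrupt box
        yield box_type, pos + header, pos + size
        pos += size


def trun_sample_counts(data, start=0, end=None):
    """Sample counts of all 'trun' boxes, in file order."""
    counts = []
    for box_type, body_start, body_end in iter_boxes(data, start, end):
        if box_type in CONTAINERS:
            counts.extend(trun_sample_counts(data, body_start, body_end))
        elif box_type == b"trun" and body_end - body_start >= 8:
            # 1 byte version + 3 bytes flags, then a 32-bit sample count
            counts.append(struct.unpack_from(">I", data, body_start + 4)[0])
    return counts


def box(box_type, payload):
    """Build a box with a 32-bit size (used only for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic segment: styp, then moof > traf > trun with 96 samples, then mdat.
trun = box(b"trun", struct.pack(">II", 0, 96))
segment = box(b"styp", b"msdh" + bytes(4)) + box(b"moof", box(b"traf", trun)) + box(b"mdat", b"")
print(trun_sample_counts(segment))   # [96]
```

For a real segment, read the file as bytes first, for example trun_sample_counts(open("720p-400k-1.m4s", "rb").read()). With 1 fragment per segment there is one “trun” per video segment, so the single value should equal the GOP length; with several fragments per segment, compare the sum.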
But how to get segments, if everything that we have is a URL to MPD? In this case we can use the “mp4-dash-clone.py” script from Bento4. It will download DASH media from the provided URL to the provided location. Unfortunately, the script at the moment (January 2015) supports only on-demand media.
As you can see, it’s much easier to control the correctness of the DASH media at the early stages of its creation. So, it’s a good idea to keep all possible information from each stage. Then, if the media ever needs to be inspected, you’ll have everything that is needed to do it fast and with correct and reliable results.
An incorrectly set GOP may affect the media playback in a variety of ways. Choosing the correct GOP is also not so easy: you have to consider many things that may affect the playback as well. Server load, playback latency, how fast the media player will be able to switch between different Representations of the same content: these are just a few reasons why the GOP length must be chosen carefully. At the same time, once you know what GOP is and how to control it, it becomes really easy to create correct media. Hopefully, this article provided this knowledge.
Speaking of media inspection, it may be pretty difficult if you’re not in control of the media creation process. I guess the best approach here is to simply educate people who create media to collect the required information at all stages of DASH media creation and keep it somewhere, so that it’s possible to access this information when it’s needed. What do you think? How would you solve this problem?