MPEG-2

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.
Revision as of 18:52, 27 November 2003


MPEG-2 (1994) is the designation for a group of audio and video coding standards agreed upon by MPEG (the Moving Picture Experts Group) and published as the international standard ISO/IEC 13818. MPEG-2 is typically used to encode audio and video for broadcast signals, including digital satellite and cable TV. MPEG-2, with some modifications, is also the coding format used by standard commercial DVD movies.

MPEG-2 is similar to MPEG-1, but also provides support for interlaced video (the format used by broadcast TV systems). MPEG-2 video is not optimized for low bit rates (less than 1 Mbit/s), but outperforms MPEG-1 at 3 Mbit/s and above. MPEG-2 also introduces and defines Transport Streams, which are designed to carry digital video and audio over unreliable media and are used in broadcast applications. With some enhancements, MPEG-2 is also the current standard for HDTV transmission. A standards-compliant MPEG-2 decoder should be capable of playing back MPEG-1 streams.

MPEG-2 audio, defined in Part 3 of the standard, enhances MPEG-1 audio by allowing the coding of audio programs with more than two channels. Part 3 allows this to be done either in a backwards-compatible way, so that MPEG-1 audio decoders can still decode the two main stereo components of the presentation, or in a non-backwards-compatible way, which lets encoders make better use of the available bandwidth. MPEG-2 supports various audio formats, including MPEG-2 AAC.

MPEG-2 video coding (simplified)

MPEG-2 is for the generic coding of moving pictures and associated audio. It creates a video stream out of three types of frame data (intra frames, forward predicted frames and bidirectionally predicted frames) arranged in a specified order called the GOP structure (GOP = Group Of Pictures; see below).

Typically the originating material is a video sequence at a pre-set pixel resolution at 25 (CCIR) or 29.97 (FCC) frames/second with sound.

MPEG-2 supports both interlaced and progressive scan video streams. In progressive scan streams, the basic unit of encoding is a frame, while in interlaced streams, the basic unit is a field. In the discussion below, the generic terms "picture" and "image" refer to either fields or frames, depending on the type of stream.
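
The relationship between frames and fields can be illustrated with a minimal sketch. It assumes a frame is simply a list of scan lines and that the top field holds the even-numbered lines; the function names are hypothetical and not taken from the standard.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two interlaced fields."""
    top_field = frame[0::2]     # lines 0, 2, 4, ...
    bottom_field = frame[1::2]  # lines 1, 3, 5, ...
    return top_field, bottom_field


def weave_fields(top_field, bottom_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame
```

Each field has half the vertical resolution of the frame, and weaving the two fields together restores the original line order.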

The MPEG-2 stream is made up of a series of data frames encoding pictures. The three ways of encoding a picture are: intra-coded (I picture), forward predictive (P picture) and bidirectional predictive (B picture).

The video image is separated into one luminance (Y) channel and two chrominance channels (also called the color difference signals U and V). It is also divided into 16x16 pixel "macroblocks", which are the basic unit of coding within a picture. Each macroblock is divided into four 8x8 luminance blocks. The number of 8x8 chrominance blocks per macroblock depends on the chrominance format of the source image. For example, in the common 4:2:0 format, there is one chrominance block per macroblock for each of the two chrominance channels, making a total of six blocks per macroblock.
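
The block count for other chrominance formats follows from the same arithmetic. The sketch below assumes a 16x16 macroblock and describes each format by its horizontal and vertical chrominance subsampling factors; the table and function name are illustrative, not part of the standard's syntax.

```python
# (horizontal, vertical) chrominance subsampling factors per format.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),  # chrominance halved in both directions
    "4:2:2": (2, 1),  # chrominance halved horizontally only
    "4:4:4": (1, 1),  # chrominance at full resolution
}

MACROBLOCK_SIZE = 16  # luminance samples per macroblock side (assumed)
BLOCK_SIZE = 8        # samples per block side


def blocks_per_macroblock(chroma_format):
    """Return the total number of 8x8 blocks in one macroblock."""
    h_sub, v_sub = CHROMA_SUBSAMPLING[chroma_format]
    luma_blocks = (MACROBLOCK_SIZE // BLOCK_SIZE) ** 2
    chroma_width = MACROBLOCK_SIZE // h_sub
    chroma_height = MACROBLOCK_SIZE // v_sub
    blocks_per_channel = (chroma_width // BLOCK_SIZE) * (chroma_height // BLOCK_SIZE)
    return luma_blocks + 2 * blocks_per_channel
```

For 4:2:0 this gives the six blocks described above; 4:2:2 and 4:4:4 give eight and twelve blocks respectively.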

In the case of I pictures, the actual image data is then passed through the encoding process described below. P and B pictures are first subjected to a process of "motion compensation", in which they are correlated with the previous (and in the case of B pictures, the next) image. Each macroblock in the P or B picture is then associated with an area in the previous or next image that is well-correlated with it. The "motion vector" that maps the macroblock to its correlated area is encoded, and then the difference between the two areas is passed through the encoding process described below.
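
A minimal sketch of the search for a well-correlated area is shown below. It assumes pictures are lists of rows of luminance values, uses an exhaustive search with the sum of absolute differences (SAD) as the matching criterion, and works at whole-pixel precision only. The standard does not prescribe how an encoder finds its motion vectors, so the function names, the search range and the cost measure are all assumptions of this example.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size areas."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def find_motion_vector(cur, ref, cx, cy, size=16, search=4):
    """Search ref around (cx, cy) for the best match to the block in cur."""
    height, width = len(ref), len(ref[0])
    best_vector = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate areas that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


def residual(cur, ref, cx, cy, vector, size=16):
    """Difference between the block and its matched area in ref."""
    dx, dy = vector
    return [
        [cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i] for i in range(size)]
        for j in range(size)
    ]
```

Only the vector and the residual need to be coded; when the match is good, the residual is mostly small values that compress well in the steps that follow.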

Each block is treated with an 8x8 discrete cosine transform (DCT). The resulting DCT coefficients are then quantized according to a pre-defined scheme, re-ordered to maximize the probability of long runs of zeros, and run-length coded. Finally, a fixed-table Huffman encoding scheme is applied.
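
The first three steps of this pipeline can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a slow direct DCT, a single uniform quantization step instead of a quantization matrix, a zig-zag scan as the re-ordering, and plain (run, value) pairs. It omits the Huffman stage and the separate treatment an encoder gives the DC coefficient.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Direct (slow) two-dimensional DCT-II of an 8x8 block, orthonormal scaling."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (
                        block[y][x]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[v][u] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization (an assumption; real encoders weight each coefficient)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def zigzag_order():
    """(row, column) positions of an 8x8 block in zig-zag scan order."""
    return sorted(
        ((r, c) for r in range(N) for c in range(N)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length(values):
    """Encode a sequence as (number of preceding zeros, nonzero value) pairs."""
    pairs, run = [], 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros are dropped, as an end-of-block code would imply


def encode_block(block, step=16):
    """DCT, quantize, zig-zag scan and run-length code one 8x8 block."""
    levels = quantize(dct_2d(block), step)
    scanned = [levels[r][c] for r, c in zigzag_order()]
    return run_length(scanned)
```

The zig-zag scan visits low-frequency coefficients first, so the high-frequency coefficients that quantization has reduced to zero cluster at the end of the scan, which is what makes run-length coding effective.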

I pictures exploit spatial redundancy, while P and B pictures also exploit temporal redundancy. Because adjacent frames in a video stream are often well-correlated, P pictures may be 10% of the size of I pictures, and B pictures 2% of their size.

The sequence of different frame types is called the Group of Pictures (GOP) structure. There are many possible structures, but a common one is 15 frames long and has the sequence I_BB_P_BB_P_BB_P_BB_P_BB_. A similar 12 frame sequence (I_BB_P_BB_P_BB_P_BB_) is also common. The ratio of I, P and B pictures in the GOP structure is determined by the nature of the video stream and the bandwidth constraints on the output stream, although encoding time may also be an issue. This is particularly true in live transmission and in real-time environments with limited computing resources, as a stream containing many B pictures can take three times longer to encode than an I-picture-only file.
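
To make the trade-off concrete, the sketch below generates a GOP pattern and estimates its average picture size relative to an I-picture-only stream. It assumes a P picture every third position with B pictures in between, and takes the rough size ratios quoted above (P at 10% and B at 2% of an I picture) at face value; the function names are hypothetical.

```python
# Rough picture sizes relative to an I picture, as quoted in the text.
RELATIVE_SIZE = {"I": 1.0, "P": 0.10, "B": 0.02}


def gop_pattern(length, p_spacing=3):
    """Picture types for one GOP: an I picture, a P every p_spacing, B otherwise."""
    return "".join(
        "I" if i == 0 else "P" if i % p_spacing == 0 else "B"
        for i in range(length)
    )


def average_relative_size(pattern):
    """Mean picture size of the GOP as a fraction of an I picture."""
    return sum(RELATIVE_SIZE[picture] for picture in pattern) / len(pattern)
```

With these assumptions, `gop_pattern(15)` contains one I, four P and ten B pictures, and its average picture is roughly a tenth of the size of an I picture.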

The output bit rate of an MPEG-2 encoder can be constant or variable, with the maximum bit rate determined by the playback medium. For example, DVD-Video limits MPEG-2 video to 9.8 Mbit/s, and the combined video, audio and subpicture streams to 10.08 Mbit/s. To achieve a constant bit rate, the degree of quantization is iteratively altered until the output meets the bit-rate requirement. Increasing quantization leads to visible artefacts when the stream is decoded, generally in the form of "mosaicing", where the discontinuities at the edges of macroblocks become more visible as the bit rate is reduced. Sony launched the MPEG IMX professional videotape format, which records at bit rates of up to 50 Mbit/s using the 4:2:2 Profile at Main Level (422P@ML).
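
The iterative adjustment of quantization can be sketched in a few lines. This is a toy model rather than a real rate-control algorithm: `encode_size` is a hypothetical callable that returns the number of bits a picture would need at a given quantizer scale, the scale range of 1 to 31 is an assumption of the example, and practical encoders also track a buffer model across pictures.

```python
def bits_per_picture(bit_rate, frame_rate):
    """Average bit budget for one picture at a constant bit rate."""
    return bit_rate / frame_rate


def choose_quantizer(encode_size, budget_bits, q_min=1, q_max=31):
    """Return the smallest quantizer scale whose output fits the budget.

    A coarser (larger) scale yields fewer bits but more visible artefacts,
    so the search starts from the finest scale and stops at the first fit.
    """
    for q in range(q_min, q_max + 1):
        if encode_size(q) <= budget_bits:
            return q
    return q_max  # budget cannot be met; use the coarsest scale


# Hypothetical picture whose coded size is inversely proportional to the scale.
def toy_size(q):
    return 2_400_000 / q


budget = bits_per_picture(6_000_000, 25)  # 6 Mbit/s at 25 frames/second
chosen = choose_quantizer(toy_size, budget)
```

Under this toy model the budget is 240,000 bits per picture, so the search settles on a scale of 10; a more complex picture would push the scale higher and make mosaicing more likely.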

MPEG-2 audio encoding

MPEG-2 also introduces new audio encoding methods. These are:
- low bit-rate encoding with halved sampling rate (MPEG-1 Layer 1/2/3 LSF)
- multichannel encoding with up to 5.1 channels

MPEG audio plays only a minor role on DVD; Dolby Digital is nearly always used. Dolby Digital 5.1 at 384 kbit/s and Dolby Digital 2.0 at 192 kbit/s are common for DVD movies. The bit rate of Dolby Digital on DVD is restricted to 448 kbit/s, which is not enough for high-quality 5.1 audio.

DTS is an additional option, at 754 or 1509 kbit/s.

MPEG-2 standards

ISO/IEC 13818-1
Systems - describes synchronization and multiplexing of video and audio.
ISO/IEC 13818-2
Video - compression codec for interlaced and non-interlaced video signals.
ISO/IEC 13818-3
Audio - compression codec for perceptual coding of audio signals. A multichannel-enabled extension of MPEG-1 audio (Layers I, II and III; Layer III is commonly known as MP3).
ISO/IEC 13818-4
Describes procedures for testing compliance.
ISO/IEC 13818-5
Describes systems for software simulation.
ISO/IEC 13818-6
Describes extensions for DSM-CC (Digital Storage Media Command and Control).
ISO/IEC 13818-7
Advanced Audio Coding (AAC)
ISO/IEC 13818-9
Extension for real-time interfaces.
ISO/IEC 13818-10
Conformance extensions for DSM-CC.