CSE 228: Week 3 Part 2
JPEG continued and MPEG Introduction
JPEG Stages

Quantization:
JPEG uses a quantization table to quantize the results of the DCT. Remember that JPEG is a lossy compression scheme. Quantization lets us choose which coefficients receive fine quantization and which receive coarse quantization. Coefficients that are finely quantized will be reconstructed close to, or exactly the same as, their original values, while coarsely quantized coefficients will not be reconstructed as accurately, if at all. A quantization table is an 8x8 matrix of integers, one for each of the 64 DCT coefficients of a block. Each entry in this table is an 8-bit integer. To quantize the data, one merely divides each DCT coefficient by the corresponding quantization value and keeps the integer portion of the result. Therefore, the higher the integer in the quantization table, the coarser and more compressed the result becomes. The quantization table is chosen by the compression application, not fixed by the JPEG standard. A great deal of work goes into creating "good" quantization tables that achieve both good image quality and good compression. Also, because it is not fixed by the standard, the quantization table must be stored with the compressed image in order to decompress it.
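The divide-and-truncate step described above can be sketched in a few lines of Python. This is only an illustration: the table `qtable` and the coefficient values in `dct_block` are made up for the example, not taken from any real encoder, and the code keeps the integer portion exactly as described in these notes (practical encoders typically round to the nearest integer instead).

```python
def quantize(dct_block, qtable):
    """Divide each DCT coefficient by its table entry, keep the integer part."""
    return [[int(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(dct_block, qtable)]


def dequantize(qblock, qtable):
    """Multiply back by the table entry; the discarded fraction is lost."""
    return [[c * q for c, q in zip(crow, qrow)]
            for crow, qrow in zip(qblock, qtable)]


# Hypothetical table: small divisors (fine) for low frequencies in the
# top-left corner, large divisors (coarse) for high frequencies.
qtable = [[1 + 4 * (u + v) for v in range(8)] for u in range(8)]

# Hypothetical DCT output: a few low-frequency values and one
# small high-frequency value in the bottom-right corner.
dct_block = [[0] * 8 for _ in range(8)]
dct_block[0][0] = 240   # DC coefficient
dct_block[0][1] = -37
dct_block[1][0] = 22
dct_block[7][7] = 9

quantized = quantize(dct_block, qtable)
restored = dequantize(quantized, qtable)

print(quantized[0][0], quantized[0][1], quantized[1][0], quantized[7][7])
print(restored[0][0], restored[0][1], restored[1][0], restored[7][7])
```

In this example the DC value survives exactly (its divisor is 1), the two low-frequency values come back slightly changed, and the high-frequency value is quantized to zero and cannot be recovered.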
Entropy Encoding:
Last time we saw that DCT converts the spatial intensity values into the frequency domain and orders the results in a zig-zag sequence. Also, we saw that there tends to be little or no high-frequency change in an 8x8 block. Here is where we really use this to our advantage. The first step in entropy encoding is to Run-Length Encode the zero values. Because there will be many zeroes in the high-frequency results of the DCT, we can produce long run-length encoded strings of zeroes with the zig-zag sequence. As an example, consider a block that only has non-zero values for the DC and the first two AC coefficients, and write {0,n} for a run of n zeroes. If we encode row-by-row, we get DC, AC1, {0,6}; AC2, {0,7}; and then {0,8} for each of the six remaining rows. If we use the zig-zag sequence, however, we get DC, AC1, AC2, {0,61}, a tremendous space savings! Another important point to remember here is that the DC coefficients are encoded as differences from the previous block's DC coefficient (except for the DC coefficient of the first block, which has no predecessor). Because there tends to be little change from block to block, this gives us additional space savings. The final step in entropy encoding is to Huffman code the results. The Huffman table can be specified by the JPEG application, or the encoder can create its own by examining the data. In either case, the table must also be written with the compressed results in order to decompress.
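To make the row-by-row versus zig-zag comparison concrete, here is a small Python sketch. The helper names (`zigzag_order`, `run_length_zeros`, `dc_differences`) and the coefficient values are invented for this illustration; a run of n zeroes is written as the pair (0, n), mirroring the {0,n} notation above. Huffman coding is left out.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Cells on the same anti-diagonal share row + col; the traversal
    # direction alternates from one diagonal to the next.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def run_length_zeros(values):
    """Collapse each run of zeroes into a (0, run_length) pair."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            if run:
                out.append((0, run))
                run = 0
            out.append(v)
    if run:
        out.append((0, run))
    return out


def dc_differences(dc_values):
    """Code each block's DC value as a difference from the previous block."""
    out, prev = [], 0          # the first block is effectively stored as-is
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out


# Block with only DC, AC1 and AC2 non-zero (values are made up).
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 50, -3, 2

row_by_row = [sym for row in block for sym in run_length_zeros(row)]
zigzag = run_length_zeros([block[r][c] for r, c in zigzag_order()])

print(len(row_by_row), row_by_row)
print(len(zigzag), zigzag)
print(dc_differences([50, 52, 51, 51]))
```

Encoding each row separately produces 11 symbols for this block, while the zig-zag scan produces 4, because all 61 trailing zeroes fall into a single run.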
JPEG Decompression
Decompressing a JPEG image is basically applying the compression process in reverse. First, the Huffman table is read and the image is entropy decoded. Then, the quantization table is read and the result is de-quantized. The process of de-quantization will naturally produce a blocky result. In order to fix this, most applications will interpolate the pixel values to bring them closer to a smooth curve. Next, the IDCT (Inverse-DCT) will be performed on this result, and from this the image will have been reconstructed.
Lossy JPEG
Loss is introduced into the JPEG compression scheme in 3 ways:
Lossless JPEG
Lossless JPEG eliminates the 3 sources of loss shown above. It encodes pixels in a 4x4 block, storing values for 3 of the pixels and encoding one as the difference from another pixel. This means that this algorithm has a maximum compression of around 25%.
Multi-Resolution JPEG
In order to deal with limited bandwidth, there are multiple resolution JPEG formats.
These multi-resolution formats are commonly used on the WWW. These allow a rough representation of the image to be drawn first, then the image is refined, and then the image is refined again into the final version.
Results of JPEG Compression
In general, the following results are seen with JPEG compression:
Remembering that the original image used 24 bpp, there is a lot of savings here!
MPEG
Background of MPEG
MPEG stands for the Moving Picture Experts Group (you will also see "Motion" or "Movie" used informally in place of "Moving"). This is not affiliated with Hollywood; rather it is a group of computer scientists trying to make a standard for digital representation of video.
There are several flavors of MPEG:
Why MPEG instead of JPEG for video?
JPEG is an algorithm designed exclusively for digital images. While MPEG does utilize JPEG to some extent, motion video has some additional properties that JPEG does not consider:
Because video is displayed at 30 frames per second, even JPEG cannot give us the compression necessary to make digital video feasible. However, if we can exploit the relationship between successive frames (there will likely be little or no change from frame to frame), we can compress even more. How does MPEG do this?
Interframe Coding
With 30 frames per second, you will naturally expect differences between successive frames of a video sequence to be relatively small. MPEG achieves a great deal of compression by exploiting the relationship between successive frames. Rather than encoding one initial frame and then sending only differences for all the remaining frames, MPEG uses a windowing approach. Windowing breaks up the video sequence into smaller subsequences and encodes differences only within a window, not between windows. This is done for two reasons:
Each of these windows in MPEG is called a Group of Pictures (GOP). How long is a GOP? The answer: as long as you like. The length of a GOP is not specified by the MPEG standard, and a video sequence can even contain GOPs of different lengths.
Frame Types
Because I and P frames are used to predict other P and B frames, they are called Reference Frames.
Motion Estimation
Motion estimation in MPEG operates on Macroblocks. A macroblock is a 16x16 pixel region in a frame. There are two primary types of motion estimation, forward and backward. Forward prediction predicts how a macroblock from the previous reference frame moves forward into the current frame. Backward prediction predicts how a macroblock from the next reference frame moves back into the current frame. Examples are shown below, with Forward prediction in red (left-to-right) and Backward prediction in green (right-to-left).

Motion estimation operates as follows: First, compare a macroblock of the current frame against all 16x16 regions of the frame you are predicting from. Then, select the 16x16 region with the least mean-squared error relative to the current macroblock and encode two things: a motion vector, which identifies the 16x16 region you are predicting from, and the error values (the differences) for each pixel in the macroblock. This is done only once, on the combined Y, U, and V values. Subsampling and separation of the Y, U, and V bands come later.
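The search described above can be sketched as an exhaustive block match in Python. This is a simplified illustration under several assumptions: frames are plain 2D lists holding a single value per pixel, the function names are hypothetical, and the demo uses a 4x4 block on a 12x12 frame to keep the numbers small (the default `size` is the 16 used by MPEG macroblocks). Real encoders usually restrict the search to a window around the macroblock rather than scanning the whole frame.

```python
def mse(cur, cy, cx, ref, ry, rx, size):
    """Mean-squared error between a block of cur and a block of ref."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            d = cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx]
            total += d * d
    return total / (size * size)


def estimate_motion(cur, ref, top, left, size=16):
    """Compare the block at (top, left) in cur against every block of ref.

    Returns the motion vector (dy, dx) to the best-matching region and
    the per-pixel error values for that match.
    """
    best = None
    for y in range(len(ref) - size + 1):
        for x in range(len(ref[0]) - size + 1):
            err = mse(cur, top, left, ref, y, x, size)
            if best is None or err < best[0]:
                best = (err, y, x)
    _, by, bx = best
    vector = (by - top, bx - left)
    residual = [[cur[top + dy][left + dx] - ref[by + dy][bx + dx]
                 for dx in range(size)] for dy in range(size)]
    return vector, residual


def frame_with_patch(top, left, height=12, width=12, size=4):
    """Blank frame containing one distinctive size x size patch."""
    frame = [[0] * width for _ in range(height)]
    for dy in range(size):
        for dx in range(size):
            frame[top + dy][left + dx] = 10 + 4 * dy + dx
    return frame


ref = frame_with_patch(2, 3)   # object in the reference frame
cur = frame_with_patch(5, 6)   # same object, moved down 3 and right 3

vector, residual = estimate_motion(cur, ref, top=5, left=6, size=4)
print(vector)
```

Here the best match is found 3 rows up and 3 columns to the left in the reference frame, and because the object did not change, every error value is zero.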
There are four types of macroblocks:
It is important to remember that P and B frames can contain intracoded macroblocks as well as predicted macroblocks if there is no efficient way to predict the macroblock.
Decoding vs. Presentation order
An MPEG stream is actually stored and processed in decoding order rather than presentation order. Examples of both follow:
Presentation Order
I1 B2 B3 B4 P5 B6
B7 B8 P9 B10 B11 B12 I13
Decoding Order
I1 P5 B2 B3 B4 P9
B6 B7 B8 I13 B10 B11 B12
The reason for the difference is that in order to decode a predicted frame, all frames that it may be predicting from must be decoded first. Therefore, since B2..4 may all be predicting from both I1 and P5, both must be decoded before B2..4. This distinction becomes very important when you work with MPEG.
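The reordering rule can be written as a short Python sketch: every reference frame (I or P) is moved ahead of the B frames that come before it in presentation order. The function name and the string labels for frames are just conventions chosen for this example.

```python
def decoding_order(presentation):
    """Reorder frames so each reference frame precedes the B frames
    that are displayed before it but predicted from it."""
    out, pending_b = [], []
    for frame in presentation:
        if frame.startswith("B"):
            pending_b.append(frame)   # must wait for the next reference frame
        else:
            out.append(frame)         # I or P frame goes out first
            out.extend(pending_b)     # then the B frames that needed it
            pending_b = []
    out.extend(pending_b)             # trailing B frames, if any
    return out


presentation = "I1 B2 B3 B4 P5 B6 B7 B8 P9 B10 B11 B12 I13".split()
print(" ".join(decoding_order(presentation)))
```

Applied to the presentation order above, this reproduces the decoding order listed in the example.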
Independent vs. Dependent GOPs
Independent GOPs do not depend on any frames of the previous GOP for prediction. Dependent GOPs depend on a reference frame from another GOP for prediction. Examples follow (in decoding order):
Case 1: GOP1 is dependent upon GOP2 (which starts with I13)
I1 P5 B2 B3 B4 P9
B6 B7 B8 I13 B10 B11 B12
Case 2: GOP1 is not dependent upon GOP2 (which starts with I13)
I1 P5 B2 B3 B4 P9
B6 B7 B8 P12 B10 B11 I13
To illustrate the difference, imagine trying to perform a simple edit operation and cut out GOP2. In order to do this, I13 must be removed. In the first case, B10..12 can then no longer be decoded, since they depend on I13. In the second case no frames in the first GOP depend on the second GOP, making this operation possible. As shown here, if you want to make a dependent GOP independent, end the current GOP with a P frame.
Bandwidth
Bandwidth is a major concern in digital video. For MPEG, you can remember the following:
MPEG-1 is fixed to a maximum 1.2 Mbit/second bandwidth. If an encoded MPEG-1 stream exceeds this, the encoder will have to make the quantization coarser (to increase compression) and re-encode the sequence. This idea is called feedback: the output of the encoder is analyzed and used to adjust the encoder's settings, and the sequence is re-encoded until the result is acceptable.
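The feedback idea can be sketched as a simple loop in Python. Everything here is a toy model: `toy_encoder` is a stand-in that assumes the bit rate falls in inverse proportion to a quantizer scale, the starting rate of 4.8 Mbit/second is made up, and the upper limit `max_scale` is an assumed parameter. A real encoder would actually re-encode the video on each pass.

```python
def encode_with_feedback(encode, max_bits_per_second, scale=1, max_scale=31):
    """Re-encode with coarser quantization until the stream fits the budget.

    `encode` is any function that takes a quantizer scale and returns
    the resulting bit rate in bits per second.
    """
    while True:
        rate = encode(scale)
        if rate <= max_bits_per_second or scale >= max_scale:
            return scale, rate
        scale += 1    # too big: quantize more coarsely and try again


def toy_encoder(scale):
    """Hypothetical encoder model: bit rate shrinks as quantization coarsens."""
    return 4_800_000 / scale


scale, rate = encode_with_feedback(toy_encoder, max_bits_per_second=1_200_000)
print(scale, rate)
```

With this toy model the loop stops at the first scale whose output fits within the 1.2 Mbit/second limit.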
Motion Estimation and Subsampling:
In MPEG, motion estimation is done BEFORE subsampling and separation of the Y, U, and V bands. This means that there is only one motion vector for a macroblock rather than one for each of the 3 bands (Y, U, and V). The results of motion estimation will then be processed similarly to JPEG in the following manner:
Note that due to subsampling of U and V, one 16x16 macroblock will contain 4 Y blocks, 1 U, and 1 V block.
This sequence is done for all types of frames. Although it may at first seem counterintuitive, the error matrices from motion estimation are also passed through DCT and the remaining steps of JPEG. The reason for this is that you expect little or no change in the macroblock as it moves from frame to frame. Any change that the macroblock does go through will likely be a change that affects the entire region, or a low-frequency, gradual change from one side of the region to another. As an example, consider a macroblock that is forward predicted into a region that is covered by a shadow. The Y component of each pixel may drop by roughly the same amount, and the U and V values will not change. The error matrix for Y is then nearly constant, so DCT will encode this region-wide change as the DC coefficient, and the remaining AC coefficients in this case would be near zero. This enables us to save a great deal of space.
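The shadow example can be checked numerically. The Python sketch below applies a straightforward (unoptimized) 2D DCT to an 8x8 error block in which every luminance value dropped by the same amount; the drop of 12 is an arbitrary value chosen for illustration, and the orthonormal scaling used here is one common convention.

```python
import math


def dct_2d(block):
    """Direct 2D DCT-II of a square block, with orthonormal scaling."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


# Error block for a macroblock that moved into a shadow: every
# luminance value is lower by the same (made-up) amount.
error_block = [[-12] * 8 for _ in range(8)]
coeffs = dct_2d(error_block)

dc = coeffs[0][0]
largest_ac = max(abs(coeffs[u][v])
                 for u in range(8) for v in range(8) if (u, v) != (0, 0))
print(round(dc, 6), largest_ac < 1e-9)
```

All of the change ends up in the single DC coefficient; the 63 AC coefficients are zero up to floating-point error, so after the zig-zag scan they collapse into one run of zeroes.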
Error Handling:
MPEG-1 is a format designed for computer use only; it is not intended for broadcast purposes. In MPEG-1, if you lose an I frame then the entire GOP is lost. If you lose a P frame, you can lose all frames until the next reference frame.