[FFmpeg-user] Question about macroblocks in soft telecined video

Sun Sep 6 10:26:59 EEST 2020

On 09/06/2020 02:26 AM, Carl Eugen Hoyos wrote:
> Am So., 6. Sept. 2020 um 06:20 Uhr schrieb Mark Filipak
> <markfilipak.windows+ffmpeg at gmail.com>:
> 
>> I would guess that, for an undecoded video that's soft telecined (i.e. @24/1.001 FPS),
>> the interlace in the macroblocks is field-based (i.e. the same as if @30/1.001 FPS),
>> not frame-based (i.e. the same as if @24 FPS).
> 
> This does not make sense.

Okay. I followed what I wrote with an example (below), so I'll go with the example because 
examples are usually easier to understand than abstractions.

>> Specifically, for YCbCr420, blocks 5 (Cb) & 6 (Cr) are deinterlaced into top & bottom
>> fields (in the top-half & bottom-half of the blocks) rather than being progressive.
> 
> This seems more difficult to understand but I don't think it makes sense either.

A YCbCr420 macroblock contains 6 blocks: blocks 1..4, which are 256 samples of Y (luminance) at full 
resolution, and blocks 5+6, which are each 64 samples of chrominance: Cb & Cr, at 1/4 resolution.
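
To make those counts concrete, here is a minimal arithmetic sketch. It assumes 8x8-sample blocks (the usual MPEG-2 block size); the variable names are mine, purely for illustration.

```python
# Sample counts for one YCbCr420 macroblock, assuming 8x8-sample blocks.
BLOCK_W = BLOCK_H = 8
samples_per_block = BLOCK_W * BLOCK_H        # 64 samples in any one block

y_blocks = 4     # blocks 1..4 tile a 16x16 luminance area
cb_blocks = 1    # block 5
cr_blocks = 1    # block 6

y_samples = y_blocks * samples_per_block     # 256
cb_samples = cb_blocks * samples_per_block   # 64
cr_samples = cr_blocks * samples_per_block   # 64

# Each chrominance block covers the same 16x16 picture area as the four
# Y blocks, so it carries one sample per 2x2 Y samples ("1/4 resolution").
chroma_fraction = cb_samples / y_samples     # 0.25

print(y_samples, cb_samples, cr_samples, chroma_fraction)
```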

I'm happy to explain.

For undecoded, frame-based (so-called "progressive") macroblocks, the Y-blocks (1..4) are 4-way 
interleaved in the stream (2x2 samples/quad, 4x4 quads/block, 2x1 blocks/field, and 1x2 
fields/frame) -- [1] -- and must be deinterleaved by the decoder to arrive at whole and unbroken 
(concurrent) frames. Staying with frame-based macroblocks, the Cb & Cr blocks (5 & 6) are entirely 
uninterleaved because 64 samples (remember, 1/4 resolution) fit in 64 bytes.

[1] I reserve the word "interlace" for whole sample rows, only, not parts of sample rows. 
Frame-based macroblocks are never deinterlaced when they're encoded. Consequently, they don't have 
to be interlaced when they're decoded because they're already interlaced.

For undecoded, field-based (so-called "interlaced") macroblocks, in addition to the 4-way 
interleaves, the 2x2 quads between blocks 1 & 3 (and also between blocks 2 & 4) are deinterlaced in 
the stream and must be interlaced by the decoder to arrive at whole and unbroken frames. The 
equivalent deinterlace for chrominance is that the rows in the top half of block 5 (Cb) are deinterlaced 
with the rows in the bottom half of block 5, and must be interlaced by the decoder. Likewise for 
block 6 (Cr).
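
The interlacing step for one chrominance block, as I describe it above, can be sketched like this. It only models the layout described in this post (top-field rows in the top half, bottom-field rows in the bottom half); I have not checked it against an actual bitstream, and the function name and row labels are hypothetical.

```python
def weave(field_stacked_rows):
    """Interlace a block whose top half holds top-field rows and whose
    bottom half holds bottom-field rows into display (frame) row order."""
    half = len(field_stacked_rows) // 2
    top = field_stacked_rows[:half]
    bottom = field_stacked_rows[half:]
    frame_rows = []
    for top_row, bottom_row in zip(top, bottom):
        frame_rows.append(top_row)     # even display rows: top field
        frame_rows.append(bottom_row)  # odd display rows: bottom field
    return frame_rows

# Hypothetical 8-row block; each label stands for one row of 8 samples.
stored = ["T0", "T1", "T2", "T3", "B0", "B1", "B2", "B3"]
woven = weave(stored)
print(woven)
```

After weaving, the rows alternate between the two fields (T0, B0, T1, B1, ...), which is the order the decoder needs to deliver whole frames.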

You see, to you guys, all that's important is that frames come out of the decoder as whole, unbroken 
streams. But to anyone who examines disc files (VOBs), the stuff on the disc is all we have to work 
with when trying to figure out a strategy for identifying the nature of the source. And the nature 
of the source is important.

Okay, to proceed. Soft telecined video is actually 24/1.001 frames per second of video even though 
the metadata tells the decoder to produce 30/1.001 FPS. Of course, the metadata is the key to how 
the stream 'teaches' the decoder how to telecine. MPV is smart enough to recognize 24/1.001 FPS data 
and to ignore the metadata and to play at 24/1.001 FPS. Ffmpeg can do the same thing (and thereby 
eliminate the need to transcode), but the ffmpeg user has to tell ffmpeg to do it.
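
The rate arithmetic behind that can be sketched as follows, taking the coded rate as 24/1.001 FPS and assuming the common 2:3 cadence, in which coded frames alternately contribute 2 and 3 fields. The cadence list and variable names are my own illustration, not something read out of a stream.

```python
from fractions import Fraction

coded_fps = Fraction(24000, 1001)    # 24/1.001 FPS actually stored

# Assumed 2:3 cadence: four coded frames yield 2 + 3 + 2 + 3 = 10 fields.
cadence = [2, 3, 2, 3]
fields_per_coded_frame = Fraction(sum(cadence), len(cadence))   # 5/2

field_rate = coded_fps * fields_per_coded_frame   # fields per second
output_fps = field_rate / 2                       # two fields per output frame

print(coded_fps, field_rate, output_fps)
```

So four coded frames become five output frames, which is how 24/1.001 FPS of video ends up being presented at 30/1.001 FPS.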

Okay, so what does this have to do with macroblocks? Well, I'm writing a video glossary and I want 
it to be complete. For example, think back to the controversy we've had regarding the meaning of 
"interlace". From your perspective, interlace is a function that the decoder performs. From my 
perspective, interlace is a condition of the stream. From your perspective, it seems like I don't 
know what I'm talking about. From my perspective, it seems like ffmpeg developers are using sloppy 
nomenclature. We have both been right and wrong.

Studying macroblocks has shown me what your perspective is. To you, interlacing is what the decoder 
must do. Of course that's correct. To me (prior to studying macroblocks), interlacing was an 
architectural feature of the stream (program or transport), and saying that fields that are clearly 
deinterlaced in the stream are 'interlaced' just didn't make sense. Of course, your perspective is 
after decoding while my perspective is prior to decoding (because discs contain undecoded 
macroblocks!). I hope I've made myself clearly understood.

By the way, I have made pictures of all this stuff. Would you like to see them?

-- 
Racism is like roaches. When you switch on the light, they scurry.
But if you don't switch on the light, you don't know they're there.