Mark Filipak (ffmpeg) markfilipak at bog.us
Mon Sep 28 23:54:42 EEST 2020

On 09/28/2020 03:49 PM, James Darnley wrote:
> On 28/09/2020, Mark Filipak (ffmpeg) <markfilipak at bog.us> wrote:
>> On 09/27/2020 03:31 PM, James Darnley wrote:
>>> On 27/09/2020, Mark Filipak (ffmpeg) <markfilipak at bog.us> wrote:
>>>> 2, Are the width & height indexes in bytes or samples? If bytes, how are
>>>> 8-bit v. 10-bit v. 12-bit
>>>> pixel formats handled at the index generation end?
>>> Width and height are given in pixels.  How that relates to bytes in
>>> memory depends on the pixel format.  Different planes can have
>>> different sizes like in the extremely common yuv420p.
>> Ah-ha #1. I think you've answered another question. The planes that ffmpeg
>> refers to are the Y, Cb,
>> and Cr samples, is that correct?
> If the pixel format is a YCbCr format, such as yuv420p, then yes.  If
> it matters to you: I am not sure of the exact order of the planes.  It
> is probably documented in the pixel format header.

Yes, I'm familiar with pixfmt.h.
I find this surprising. But then, ffmpeg is full of surprises, eh?

I had expected a single ffmpeg video processing/pipeline format that every decoder would 
output. Having many differing pixel formats seems a point of complexity that invites error.

Regarding the order of the planes, I suspect there is none. I've not examined the source code, but I 
suspect that 3 unique buffer pointers are supplied to the decoder. Also surprising is that the word 
"plane" is apparently used for both video and audio.

> RGB is also available and maybe some other more niche ones.  Oh, alpha
> channels too.  Again see the pixel format.
>> So, I'm going to make some statements that can be confirmed or refuted --
>> making statements rather
>> than asking questions is just part of my training, not arrogance. Statements
>> are usually clearer.
>> I'm trying to nail down the structures for integration into my glossary.
>> For YCbCr420, 8-bit, 720x576 (for example), the planes are separate and the
>> structures are:
>> Y: 2-dimensional, 720x576 byte array.
>> Cb: 2-dimensional, 180x144 byte array.
>> Cr: 2-dimensional, 180x144 byte array.
> What do you mean by 2 dimensional?

Width x Height.

>  IMO you should think of the planes
> as a single block of memory each.  The first pixel will be the first
> byte.  In your example the first plane in a yuv420p picture will be at
> least 720*576 bytes long.  The two chroma planes will have 360x288
> samples each with their own linesize.  I'm not sure how you got
> 180x144.  The subsampling is only a factor of 2 for 4:2:0.

I don't know what you mean. In 4:2:0 format, there is one Cb sample and one Cr sample for every 4 Y samples.
180x144 = (720/2)x(576/2). ...Argh! Wrong! ...Duh?

Of course I should have written 360x288 -- my bad. 8-] ...brain fart! (How embarrassing.)
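
A minimal sketch of the corrected arithmetic, assuming 4:2:0 halves the chroma resolution in both directions as James describes. The function name and the two shift parameters are illustrative only; the code does not call into ffmpeg.

```python
def plane_dimensions(width, height, log2_chroma_w=1, log2_chroma_h=1):
    """Return the (width, height) in samples of the Y, Cb and Cr planes.

    log2_chroma_w / log2_chroma_h are the horizontal and vertical
    subsampling shifts; 1 and 1 describe 4:2:0 (halved in both directions).
    """
    # Ceiling shift, so an odd luma dimension still gets a final chroma sample.
    chroma_w = (width + (1 << log2_chroma_w) - 1) >> log2_chroma_w
    chroma_h = (height + (1 << log2_chroma_h) - 1) >> log2_chroma_h
    return {
        "Y": (width, height),
        "Cb": (chroma_w, chroma_h),
        "Cr": (chroma_w, chroma_h),
    }


dims = plane_dimensions(720, 576)
for name, (w, h) in dims.items():
    print(f"{name}: {w}x{h} = {w * h} samples")
```

For 720x576 this gives 360x288 chroma planes, and 414720 / 103680 = 4, which matches one Cb and one Cr sample per four Y samples.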

> The linesize can make it larger than that.  The linesize also says how
> many bytes are between the start of a row and the start of the next.
> The same color space and subsampling could be expressed in a few
> different ways.  Again it is the pixel format which says how the data
> is laid out in memory.  You will probably have yuv420p
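
To illustrate the linesize point, here is a rough sketch of how a sample at (x, y) might be located inside one plane when rows are padded. The linesize of 736 is a made-up value chosen only to be larger than the 720-pixel width; real values depend on how the buffer was allocated.

```python
def sample_offset(x, y, linesize, bytes_per_sample=1):
    """Byte offset of sample (x, y) from the start of a plane.

    linesize is the number of bytes from the start of one row to the
    start of the next, which may exceed width * bytes_per_sample.
    """
    return y * linesize + x * bytes_per_sample


width, height = 720, 576
linesize = 736  # hypothetical padded row length in bytes (assumption)

# One plane modelled as a single block of memory, padding included.
plane = bytearray(linesize * height)

# Write the Y sample at column 5, row 2.
plane[sample_offset(5, 2, linesize)] = 200

print(len(plane), "bytes allocated for", width * height, "visible samples")
```

The plane is "at least" width * height bytes, as James puts it: the extra bytes at the end of each row are padding and carry no picture data.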
>> Specifically, the decoder's output is not in macroblock format, correct? The
>> reason I ask for
>> confirmation is that H.262 implies that even raw pictures are in macroblock
>> format, improbable as
>> that may seem.
> An AVFrame might not come from a source that has macroblocks.  I have
> no idea what H.262 says.

Okay, some architecture, okay? I'm interested in how ffmpeg programmatically represents frames 
during processing. (In MPEG program streams, each frame is represented as (W/16)*(H/16) macroblocks.)
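
A small sketch of the (W/16)*(H/16) count, assuming 16x16 luma macroblocks and assuming that dimensions which are not a multiple of 16 are rounded up to whole macroblocks. The function name is illustrative and the rounding rule is a simplification.

```python
def macroblock_count(width, height, mb_size=16):
    """Number of mb_size x mb_size macroblocks needed to cover a frame."""
    # Round up so a partial macroblock at the right/bottom edge is counted.
    mb_cols = (width + mb_size - 1) // mb_size
    mb_rows = (height + mb_size - 1) // mb_size
    return mb_cols * mb_rows


print(macroblock_count(720, 576))    # 45 columns * 36 rows
print(macroblock_count(1920, 1080))  # 1080 is not a multiple of 16
```

For 720x576 the count is 45 * 36 = 1620 macroblocks.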

>>>   Byte order
>>> (endianness) of larger samples depends on the pixel format (but it is
>>> usually native).  The number of bytes used for a sample is given in
>>> the pixel format.  The bits are in the low N bits.
>> Ah-ha #2. I think you've answered yet another question: The arrays are
>> bytes, not bits, correct? So,
>> going from 8-bit samples to 10-bit samples doubles the sizes of the arrays,
>> correct?
> You cannot easily address bits in C and ffmpeg doesn't bother with bit
> fields.  yuv420p10 will use 16-bit words with the samples in the low
> 10 bits and the high 6 are zero.  This does have the effect of
> doubling the size of the memory buffers.
> P.S.   When I say pixel format I mean the specific ffmpeg feature.
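
The doubling can be checked with a short sketch. It assumes samples are stored in whole bytes (so 10-bit samples occupy 16-bit words), ignores any linesize padding, and assumes little-endian byte order for the packed example; the function name is illustrative.

```python
import struct


def plane_bytes(width, height, bits_per_sample):
    """Unpadded size in bytes of one plane, with whole-byte samples."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # 8 -> 1, 10 or 12 -> 2
    return width * height * bytes_per_sample


y8 = plane_bytes(720, 576, 8)
y10 = plane_bytes(720, 576, 10)
print(y8, y10, y10 // y8)

# The largest 10-bit sample stored in a little-endian 16-bit word:
# the value sits in the low 10 bits and the high 6 bits are zero.
word = struct.pack("<H", 0x3FF)
value = int.from_bytes(word, "little")
print(word.hex(), value >> 10)
```

Under these assumptions the 8-bit Y plane is 414720 bytes and the 10-bit one is 829440 bytes, i.e. exactly twice as large, and a 12-bit format would occupy the same 16-bit words.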


Thanks again, James. I'm going to assume that Y, Cb, and Cr are buffered separately, i.e. that 
there's no frame struct per se.

I think that wraps it up vis-a-vis ffmpeg internal representation of video.

The U.S. political problem? Amateurs are doing the street fighting.
The Princeps Senatus and the Tribunus Plebis need their own armies.
