[FFmpeg-devel] RFC: new packed pixel formats (machine vision)

Tue Oct 15 09:55:51 EEST 2024

Hi All,

I want to pick up a discussion I started last week
(https://ffmpeg.org/pipermail/ffmpeg-devel/2024-October/334585.html)
in a new thread, with the relevant information organized in one place.
This is about adding pixel formats common in machine vision to ffmpeg
(though I understand some of these formats may also be used by cinema
cameras), and supporting them as input formats in swscale, so that it
becomes easy to use ffmpeg for machine vision purposes. I already have
such software and it will be open-sourced in good time, but right now
it depends on a proprietary conversion layer from Basler that I need to
replace, e.g. by this proposal.

Example formats are the 10 and 12 bit Bayer formats. The 10 bit ones
cannot currently be represented in an AVPixFmtDescriptor, because the
effective bit depth of the red and blue channels is 2.5 bits, while
component depths must be integers. Other examples are 10 bit gray
formats where multiple values are packed without padding across byte
boundaries (e.g. four 10-bit pixels packed into 5 bytes, so not aligned
to 16 or 32 bits).

See https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-monochrome-pixel-formats.html
for a diagram of the Mono10p format and
https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-raw-bayer-pixel-formats.html
for diagrams of the packed and unpacked Bayer formats.
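
To make the "four 10-bit pixels in 5 bytes" layout concrete, here is a
minimal sketch of unpacking one such group. It assumes the samples are
stored LSB first (sample 0 occupies the lowest 10 bits of the 40-bit
little-endian group); that bit order is an assumption and should be
checked against the diagrams linked above and the camera
documentation. The function name is hypothetical, nothing like it
exists in ffmpeg today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: unpack one 5-byte group into four 10-bit samples.
 * ASSUMPTION: LSB-first packing, i.e. the 5 bytes form a 40-bit
 * little-endian integer and sample i sits at bits [10*i, 10*i + 9].
 */
static void unpack_mono10p_group(const uint8_t src[5], uint16_t dst[4])
{
    uint64_t bits = 0;

    assert(src != NULL && dst != NULL);

    /* Assemble the 40-bit group, lowest byte first. */
    for (int i = 0; i < 5; i++)
        bits |= (uint64_t)src[i] << (8 * i);

    /* Extract four 10-bit fields. */
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)((bits >> (10 * i)) & 0x3FF);
}
```

A full line would be processed by calling this once per 5 input bytes;
handling a width that is not a multiple of 4 is left out of the sketch.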

Here is a proposal for how these new formats could be encoded into
AVPixFmtDescriptor, so that they can then be used in ffmpeg/swscale.
I have taken care that none of the existing pixel formats, or any code
dealing with them, would be affected. New code would be needed to
handle the new formats, though (av_read_image_line2,
av_write_image_line2 and functions printing info about
AVPixFmtDescriptors, plus swscale of course). I commit to doing a full
audit to ensure nothing else is missed.

First, two new flags are needed (usages are shown below in the example
new pixel formats). I propose:
- AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL, which indicates that each
component depth (an int) holds a 16 bit numerator and a 16 bit
denominator packed into that int. That should be able to store any
depth that could ever occur and, importantly, allows for the
fractional bit depths needed for the Bayer formats.
- AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED, which indicates formats that are
bit-wise packed in a way that is not aligned on 1, 2 or 4 bytes (e.g.
four 10-bit values in 5 bytes). This flag is needed because
AV_PIX_FMT_FLAG_BITSTREAM formats are aligned to 8 or 32 bits, and
this kind of unaligned packing needs special handling (see below).
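
The numerator/denominator encoding could be wrapped in small helpers
along these lines. This is only a sketch: the helper names are
hypothetical and not part of libavutil, and the numerator is limited
to 15 bits here so that the packed value stays a non-negative int.

```c
#include <assert.h>

/*
 * Hypothetical helpers for the proposed rational depth encoding:
 * numerator in the upper 16 bits, denominator in the lower 16 bits.
 */
static inline int depth_pack(unsigned num, unsigned den)
{
    /* Keep the sign bit clear and reject a zero denominator. */
    assert(num <= 0x7FFF && den >= 1 && den <= 0xFFFF);
    return (int)((num << 16) | den);
}

static inline unsigned depth_num(int packed)
{
    return (unsigned)packed >> 16;
}

static inline unsigned depth_den(int packed)
{
    return (unsigned)packed & 0xFFFF;
}
```

With these, a depth of 2.5 bits is depth_pack(10, 4) and a depth of 5
bits is depth_pack(10, 2). Note that in C the shift needs its own
parentheses, (10 << 16) + 4, because + binds tighter than <<.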

Using these flags, here are some example new pixel formats:
    [AV_PIX_FMT_BAYER_RGGB10] = {
        .name = "bayer_rggb10",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4, (10<<16) + 4 */
            { 0, 2, 0, 0, 655362 },  /* 5: 10/2 */
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL,
    },
    [AV_PIX_FMT_BAYER_RGGB12] = {
        .name = "bayer_rggb12",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 3 },
            { 0, 2, 0, 0, 6 },
            { 0, 2, 0, 0, 3 },
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER,
    },
    [AV_PIX_FMT_GRAY10P] = {
        .name = "gray10p",
        .nb_components = 1,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 10 },       /* Y */
        },
        .flags = AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
    },
    [AV_PIX_FMT_BAYER_RGGB10P] = {
        .name = "bayer_rggb10p",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4, (10<<16) + 4 */
            { 0, 2, 0, 0, 655362 },  /* 5: 10/2 */
            { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL |
                 AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
    },
    [AV_PIX_FMT_BAYER_RGGB12P] = {
        .name = "bayer_rggb12p",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 2, 0, 0, 3 },
            { 0, 2, 0, 0, 6 },
            { 0, 2, 0, 0, 3 },
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
                 AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
    },

When AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED is encountered, one needs to
find out how many bytes are used to store how many samples (by
"sample" I mean one color channel value or one gray scale value).
This information can be derived from the AVPixFmtDescriptor as
follows:
- gray10p: sum(component_bit_depths)=10. The least common multiple of
10 and 8 is 40, so 40/10=4 samples are packed into 40/8=5 bytes.
- bayer_rggb10p: sum(component_bit_depths)=2.5+5+2.5=10. The least
common multiple of 10 and 8 is 40, so 40/10=4 samples are packed into
40/8=5 bytes.
- bayer_rggb12p: sum(component_bit_depths)=3+6+3=12. The least common
multiple of 12 and 8 is 24, so 24/12=2 samples are packed into 24/8=3
bytes.
The presence of the AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED flag indicates
that such a computation is needed, and keeps it flexible how many
samples are packed into how many bytes.
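
The computation above can be sketched as follows. The struct and
function names are hypothetical, and the input is assumed to be the
already summed component depth in whole bits (10 or 12 in the examples
above).

```c
#include <assert.h>

/* Hypothetical result type: one packing group of the format. */
struct pack_group {
    unsigned samples; /* samples per group */
    unsigned bytes;   /* bytes per group   */
};

/* Greatest common divisor (Euclid). */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Smallest group that ends on a byte boundary:
 * lcm(bits_per_sample, 8) bits, split into samples and bytes.
 */
static struct pack_group pack_group_for_depth(unsigned bits_per_sample)
{
    struct pack_group g;
    unsigned lcm;

    assert(bits_per_sample > 0);
    lcm = bits_per_sample / gcd_u(bits_per_sample, 8) * 8;
    g.samples = lcm / bits_per_sample;
    g.bytes   = lcm / 8;
    return g;
}
```

For a depth of 10 this yields 4 samples in 5 bytes, and for a depth of
12 it yields 2 samples in 3 bytes, matching the list above.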

I have not yet thought about whether this would also allow turning v210
(v210enc/dec, AV_CODEC_ID_V210) into a pixel format and deprecating
the encoder/decoder (presumably it's a good thing to remove this
special handling), or whether the scheme would then run into a
limitation. bitpacked_enc (AV_CODEC_ID_BITPACKED) should also be
examined. I will leave that for a later stage, after comments on the
above proposal.

Looking forward to hearing what you/the list think!

All the best,
Dee