[FFmpeg-devel] [RFC] swscale modernization proposal

Sun Jun 23 22:00:24 EEST 2024

On Sun, Jun 23, 2024 at 7:46 PM Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Sun, Jun 23, 2024 at 12:19:13AM +0200, Vittorio Giovara wrote:
> > On Sat, Jun 22, 2024 at 3:22 PM Niklas Haas <ffmpeg at haasn.xyz> wrote:
> >
> > > Hey,
> > >
> > > As some of you know, I got contracted (by STF 2024) to work on improving
> > > swscale, over the course of the next couple of months. I want to share my
> > > current plans and gather feedback + measure sentiment.
> > >
> > > ## Problem statement
> > >
> > > The two issues I'd like to focus on for now are:
> > >
> > > 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
> > >    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> > > 2. Complicated context management, with cascaded contexts, threading, stateful
> > >    configuration, multi-step init procedures, etc.; and related bugs
> > >
> > > In order to make these feasible, some amount of internal re-organization of
> > > duties inside swscale is prudent.
> > >
> > > ## Proposed approach
> > >
> > > The first step is to create a new API, which will (tentatively) live in
> > > <libswscale/avscale.h>. This API will initially start off as a near-copy of the
> > > current swscale public API, but with the major difference that I want it to be
> > > state-free and only access metadata in terms of AVFrame properties. So there
> > > will be no independent configuration of the input chroma location etc. like
> > > there is currently, and no need to re-configure or re-init the context when
> > > feeding it frames with different properties. The goal is for users to be able
> > > to just feed it AVFrame pairs and have it internally cache expensive
> > > pre-processing steps as needed. Finally, avscale_* should ultimately also
> > > support hardware frames directly, in which case it will dispatch to some
> > > equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> > > defer this to a future milestone)
> > >
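To make the "state-free, cache internally" idea concrete, here is a minimal
sketch of how such a context could decide when to redo expensive setup. All
names (frame_props, scale_cache, scale_frame) are hypothetical and are not
part of the proposed avscale.h; frame_props merely stands in for the handful
of AVFrame properties that would matter.

```c
/* Hypothetical stand-in for the relevant AVFrame properties. */
struct frame_props {
    int width, height;
    int format;      /* pixel format id */
    int colorspace;  /* matrix coefficients id */
    int chroma_loc;  /* chroma sample location id */
};

/* Hypothetical context: holds only cached derived state, no user config. */
struct scale_cache {
    int valid;
    struct frame_props src, dst;
    int rebuilds;    /* number of expensive setups, for illustration only */
};

static int props_equal(const struct frame_props *a,
                       const struct frame_props *b)
{
    return a->width == b->width && a->height == b->height &&
           a->format == b->format && a->colorspace == b->colorspace &&
           a->chroma_loc == b->chroma_loc;
}

/* Convert one frame pair; returns 0 on success. */
int scale_frame(struct scale_cache *c,
                const struct frame_props *src,
                const struct frame_props *dst)
{
    if (!c->valid || !props_equal(&c->src, src) ||
        !props_equal(&c->dst, dst)) {
        /* Expensive pre-processing (filters, LUTs, ...) would be rebuilt
         * here, only when the frame properties actually changed. */
        c->src = *src;
        c->dst = *dst;
        c->valid = 1;
        c->rebuilds++;
    }
    /* The actual conversion would run here using the cached state. */
    return 0;
}
```

The caller never re-inits anything: feeding a frame pair with different
properties simply invalidates the cached state on the next call.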
> > > After this API is established, I want to start expanding the functionality in
> > > the following manner:
> > >
> > > ### Phase 1
> > >
> > > For basic operation, avscale_* will just dispatch to a sequence of swscale_*
> > > invocations. In the basic case, it will just directly invoke swscale with
> > > minimal overhead. In more advanced cases, it might resolve to a *sequence* of
> > > swscale operations, with other operations (e.g. colorspace conversions a la
> > > vf_colorspace) mixed in.
> > >
> > > This will allow us to gain new functionality in a minimally invasive way, and
> > > will let API users start porting to the new API. This will also serve as a good
> > > "selling point" for the new API, allowing us to hopefully break up the legacy
> > > swscale API afterwards.
> > >
> > > ### Phase 2
> > >
> > > After this is working, I want to cleanly separate swscale into two distinct
> > > components:
> > >
> > > 1. vertical/horizontal scaling
> > > 2. input/output conversions
> > >
> > > Right now, these operations both live inside the main SwsContext, even though
> > > they are conceptually orthogonal. Input handling is done entirely by the
> > > abstract callbacks lumToYV12 etc., while output conversion is currently
> > > "merged" with vertical scaling (yuv2planeX etc.).
> > >
> > > I want to cleanly separate these components so they can live inside independent
> > > contexts, and be considered as semantically distinct steps. (In particular,
> > > there should ideally be no more "unscaled special converters", instead this can
> > > be seen as a special case where there simply is no vertical/horizontal scaling
> > > step)
> > >
> > > The idea is for the colorspace conversion layer to sit in between the
> > > input/output converters and the horizontal/vertical scalers. This all would be
> > > orchestrated by the avscale_* abstraction.
> > >
> > > ## Implementation details
> > >
> > > To avoid performance loss from separating "merged" functions into their
> > > constituents, care needs to be taken such that all intermediate data, in
> > > addition to all involved look-up tables, will fit comfortably inside the L1
> > > cache. The approach I propose, which is also (afaict) used by zscale, is to
> > > loop over line segments, applying each operation in sequence, on a small
> > > temporary buffer.
> > >
> > > e.g.
> > >
> > > void hscale_row(pixel *dst, const pixel *src, int img_width)
> > > {
> > >     // SIZE: 256 or some other small-ish figure, possibly a design
> > >     // constant of the API so that SIMD implementations can be
> > >     // appropriately unrolled
> > >     const int SIZE = 256;
> > >
> > >     pixel tmp[SIZE];
> > >     for (int i = 0; i < img_width; i += SIZE) {
> > >         int pixels = min(SIZE, img_width - i);
> > >
> > >         { /* inside read input callback */
> > >             unpack_input(tmp, src, pixels);
> > >             // the amount of separation here will depend on the performance
> > >             apply_matrix3x3(tmp, yuv2rgb, pixels);
> > >             apply_lut3x1d(tmp, gamma_lut, pixels);
> > >             ...
> > >         }
> > >
> > >         hscale(dst, tmp, filter, pixels);
> > >
> > >         src += pixels;
> > >         dst += scale_factor(pixels);
> > >     }
> > > }
> > >
> > > This function can then output rows into a ring buffer for use inside the
> > > vertical scaler, after which the same procedure happens (in reverse) for the
> > > final output pass.
> > >
> > > Possibly, we also want to additionally limit the size of a row for the
> > > horizontal scaler, to allow arbitrarily large input images.
> > >
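For reference, a compilable reduction of the segment loop quoted above. This
is only a sketch under simplifying assumptions: 8-bit single-channel samples,
no horizontal scaling (1:1), and a plain gain standing in for the matrix/LUT
stages. The names SEG_SIZE, unpack_u8, apply_gain, pack_u8 and process_row
are made up for illustration and do not exist in swscale.

```c
#include <stdint.h>

#define SEG_SIZE 256  /* assumed segment length; small enough to stay in L1 */

/* Stage 1: widen 8-bit samples to float in [0, 1]. */
static void unpack_u8(float *tmp, const uint8_t *src, int n)
{
    for (int k = 0; k < n; k++)
        tmp[k] = src[k] / 255.0f;
}

/* Stage 2: stand-in for the colorspace steps (matrix, LUT, ...). */
static void apply_gain(float *tmp, float gain, int n)
{
    for (int k = 0; k < n; k++)
        tmp[k] *= gain;
}

/* Stage 3: round, clamp and narrow back to 8 bits. */
static void pack_u8(uint8_t *dst, const float *tmp, int n)
{
    for (int k = 0; k < n; k++) {
        float v = tmp[k] * 255.0f + 0.5f;
        if (v < 0.0f)   v = 0.0f;
        if (v > 255.0f) v = 255.0f;
        dst[k] = (uint8_t)v;
    }
}

/* Run all stages segment by segment, so the intermediate data never
 * exceeds SEG_SIZE floats regardless of the row width. */
void process_row(uint8_t *dst, const uint8_t *src, int width, float gain)
{
    float tmp[SEG_SIZE];

    for (int i = 0; i < width; i += SEG_SIZE) {
        int n = width - i < SEG_SIZE ? width - i : SEG_SIZE;

        unpack_u8(tmp, src + i, n);
        apply_gain(tmp, gain, n);
        pack_u8(dst + i, tmp, n);
    }
}
```

With a real horizontal scaler in the middle, the segment boundaries would
also have to account for the filter taps that straddle two segments, which
this sketch sidesteps entirely.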
> > > ## Comments / feedback?
> > >
> > > Does the above approach seem reasonable? How do people feel about introducing
> > > a new API vs. trying to hammer the existing API into the shape I want it to be?
> > >
> > > I've attached an example of what <avscale.h> could end up looking like. If
> > > there is broad agreement on this design, I will move on to an implementation.
> > >
> >
> > What do you think of the concept of kernels like
> > https://github.com/lu-zero/avscale/blob/master/kernels/rgb2yuv.c
> > The idea is that there is a bit of analysis on input and output format
> > requested, and either a specialized kernel is used, or a chain of kernels
> > is built and data is passed along.
> > Among the design goals of that library, there was also readability (so that
> > the flow was always under control) and the ease of writing assembly and/or
> > shader for any single kernel.
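A rough illustration of the "analyse the formats, then build a chain of
kernels" idea, as I understand it. This is not the API of the linked avscale
repository: the enum, the struct and every function name below are
hypothetical. Samples are packed float triplets, and the YUV kernels use the
full-range BT.601 coefficients.

```c
#include <stddef.h>

enum pixfmt { FMT_RGB, FMT_BGR, FMT_YUV };  /* hypothetical format ids */

/* A kernel transforms n packed 3-component pixels in place. */
typedef void (*kernel_fn)(float *px, size_t n);

#define MAX_KERNELS 4
struct chain {
    kernel_fn k[MAX_KERNELS];
    int len;
};

static void swap_rb(float *px, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float t = px[3*i];
        px[3*i]     = px[3*i + 2];
        px[3*i + 2] = t;
    }
}

static void rgb_to_yuv(float *px, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float r = px[3*i], g = px[3*i + 1], b = px[3*i + 2];
        float y = 0.299f * r + 0.587f * g + 0.114f * b;
        px[3*i]     = y;
        px[3*i + 1] = (b - y) / 1.772f;  /* Cb */
        px[3*i + 2] = (r - y) / 1.402f;  /* Cr */
    }
}

static void yuv_to_rgb(float *px, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float y = px[3*i], cb = px[3*i + 1], cr = px[3*i + 2];
        float r = y + 1.402f * cr;
        float b = y + 1.772f * cb;
        px[3*i]     = r;
        px[3*i + 1] = (y - 0.299f * r - 0.114f * b) / 0.587f;
        px[3*i + 2] = b;
    }
}

/* Analyse the format pair: go through RGB as the common intermediate and
 * append only the kernels that are actually needed. */
struct chain build_chain(enum pixfmt src, enum pixfmt dst)
{
    struct chain c = { .len = 0 };

    if (src == dst)
        return c;                      /* nothing to do */
    if (src == FMT_BGR) c.k[c.len++] = swap_rb;
    if (src == FMT_YUV) c.k[c.len++] = yuv_to_rgb;
    if (dst == FMT_BGR) c.k[c.len++] = swap_rb;
    if (dst == FMT_YUV) c.k[c.len++] = rgb_to_yuv;
    return c;
}

void run_chain(const struct chain *c, float *px, size_t n)
{
    for (int i = 0; i < c->len; i++)
        c->k[i](px, n);
}
```

A specialized kernel (say, a fused BGR to YUV routine) would simply be an
extra branch in build_chain() that returns a one-element chain; each kernel
stays small enough to rewrite in assembly or as a shader on its own.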
>
> I think I have not looked at Luca's work before, so I cannot comment on it
> specifically.
> But I think what you suggest is what Niklas intends to do.
> swscale has evolved over a long time from code with a very small subset of
> the current features. The code is in need of being "refactored" into some
> cleaner kernel / modular design.
> Also, as you mention lu_zero, I had talked with him very briefly and he will
> be on the next extra member vote for the GA (whoever initiates it, I'll try to
> make sure Luca is not forgotten). Just saying, I have not forgotten
> him, just that I wanted to accumulate more people before bringing that up.
>
>
> >
> > Needless to say I support the plan of renaming the library so that it can
>
> As the main author of libswscale, i find this quite offensive.
>

Looks like Rust is not so popular, so bigger coins are in FFland.

>
> thx
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Everything should be made as simple as possible, but not simpler.
> -- Albert Einstein