[FFmpeg-devel] [RFC] swscale modernization proposal

Sat Jun 29 14:47:43 EEST 2024

On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas <ffmpeg at haasn.xyz> wrote:
> Hey,
> 
> As some of you know, I got contracted (by STF 2024) to work on improving
> swscale, over the course of the next couple of months. I want to share my
> current plans and gather feedback + measure sentiment.
> 
> ## Problem statement
> 
> The two issues I'd like to focus on for now are:
> 
> 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
>    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> 2. Complicated context management, with cascaded contexts, threading, stateful
>    configuration, multi-step init procedures, etc; and related bugs
> 
> In order to make these feasible, some amount of internal re-organization of
> duties inside swscale is prudent.
> 
> ## Proposed approach
> 
> The first step is to create a new API, which will (tentatively) live in
> <libswscale/avscale.h>. This API will initially start off as a near-copy of the
> current swscale public API, but with the major difference that I want it to be
> state-free and only access metadata in terms of AVFrame properties. So there
> will be no independent configuration of the input chroma location etc. like
> there is currently, and no need to re-configure or re-init the context when
> feeding it frames with different properties. The goal is for users to be able
> to just feed it AVFrame pairs and have it internally cache expensive
> pre-processing steps as needed. Finally, avscale_* should ultimately also
> support hardware frames directly, in which case it will dispatch to some
> equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> defer this to a future milestone)

So, I've spent the past few days implementing this API and hooking it up to
swscale internally. (For testing, I am also replacing `vf_scale` with the
equivalent AVScale-based implementation to see how the new API impacts
existing users.) It mostly works so far, with some leftover translation
issues that I have to address before it can be sent upstream.
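The caching behaviour described in the quoted proposal (feed AVFrame pairs, redo expensive setup only when frame properties change) can be sketched roughly as follows. This is a hypothetical, self-contained model, not the draft implementation: `FrameProps`, `ScaleCache`, `scale_prepare()` and `expensive_setup()` are illustrative names, and plain ints stand in for the real AVFrame enum fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of AVFrame properties the scaler would key its
 * cached state on. Plain ints stand in for the real enum types. */
typedef struct FrameProps {
    int width, height;
    int format;       /* stands in for enum AVPixelFormat */
    int colorspace;   /* enum AVColorSpace */
    int primaries;    /* enum AVColorPrimaries */
    int transfer;     /* enum AVColorTransferCharacteristic */
    int range;        /* enum AVColorRange */
    int chroma_loc;   /* enum AVChromaLocation */
} FrameProps;

/* Hypothetical per-context cache remembering the last src/dst pair. */
typedef struct ScaleCache {
    bool valid;
    FrameProps src, dst;
    int num_inits;    /* how often the expensive setup has run */
} ScaleCache;

static bool props_equal(const FrameProps *a, const FrameProps *b)
{
    return a->width == b->width && a->height == b->height &&
           a->format == b->format && a->colorspace == b->colorspace &&
           a->primaries == b->primaries && a->transfer == b->transfer &&
           a->range == b->range && a->chroma_loc == b->chroma_loc;
}

/* Placeholder for the expensive work (filter design, LUTs, ...). */
static void expensive_setup(ScaleCache *cache)
{
    cache->num_inits++;
}

/* Called once per frame pair; setup is re-run only when properties change,
 * so the caller never has to re-init the context explicitly. */
static void scale_prepare(ScaleCache *cache, const FrameProps *dst,
                          const FrameProps *src)
{
    assert(cache && dst && src);
    if (cache->valid && props_equal(&cache->src, src) &&
        props_equal(&cache->dst, dst))
        return; /* cache hit: reuse existing state */

    expensive_setup(cache);
    cache->src   = *src;
    cache->dst   = *dst;
    cache->valid = true;
}
```

Because the comparison covers every property that influences the conversion, a caller could, for example, switch the transfer function mid-stream without any explicit re-init call.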

------

One of the things I was thinking about was how to configure
scalers/dither modes, which sws currently controls, somewhat clunkily,
with flags. IMO, flags are not the right design here - if anything, each
should be a separate enum/int, controllable separately for chroma
resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
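A minimal sketch of what replacing the scaler flags with separate enums could look like, assuming hypothetical names (`ScalerKind`, `ScalerConfig`, `resolve_scalers()`); none of these exist in libswscale. Falling back from the chroma setting to the main scaler when left on AUTO is one possible default, not something the proposal specifies.

```c
#include <assert.h>

/* Hypothetical scaler selection, replacing SWS_BILINEAR/SWS_BICUBIC/... */
typedef enum ScalerKind {
    SCALER_AUTO = 0,  /* let the library decide (e.g. from a preset) */
    SCALER_POINT,     /* nearest neighbour */
    SCALER_BILINEAR,
    SCALER_BICUBIC,
    SCALER_LANCZOS,
} ScalerKind;

/* Independent settings for the two resampling stages. */
typedef struct ScalerConfig {
    ScalerKind scaler;        /* main scaling, e.g. 50x50 -> 80x80 */
    ScalerKind scaler_chroma; /* chroma resampling, e.g. 4:2:0 -> 4:4:4 */
} ScalerConfig;

/* Resolve AUTO values: the main scaler falls back to `fallback`, and the
 * chroma scaler falls back to whatever the main scaler resolved to. */
static ScalerConfig resolve_scalers(ScalerConfig cfg, ScalerKind fallback)
{
    assert(fallback != SCALER_AUTO);
    if (cfg.scaler == SCALER_AUTO)
        cfg.scaler = fallback;
    if (cfg.scaler_chroma == SCALER_AUTO)
        cfg.scaler_chroma = cfg.scaler;
    return cfg;
}
```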

That said, I think that for most end users, such fine-grained
options don't really provide any value - unless you're already
knee-deep in signal theory, the actual differences between, say,
"natural bicubic spline" and "Lanczos" are obtuse at best and alien at
worst.

My idea was to provide a single `int quality`, which the user can set to
tune the speed <-> quality trade-off on an arbitrary numeric scale from
0 to 10, with 0 being the fastest (alias everything, nearest neighbour,
drop half chroma samples, etc.), the default being something in the
vicinity of 3-5, and 10 being the maximum quality (full linear
downscaling, anti-aliasing, error diffusion, etc.).
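To make the preset idea concrete, here is a hypothetical mapping from `quality` to individual knobs. The struct, the enum names and every intermediate threshold are assumptions for illustration; only the endpoints (0 = nearest neighbour, dropped chroma, no anti-aliasing; 10 = linear-light downscaling, anti-aliasing, error diffusion) follow the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical knobs that a single `quality` value would drive. */
typedef enum Scaler { SCALER_POINT, SCALER_BILINEAR, SCALER_BICUBIC, SCALER_LANCZOS } Scaler;
typedef enum Dither { DITHER_NONE, DITHER_ORDERED, DITHER_ERROR_DIFFUSION } Dither;

typedef struct QualitySettings {
    Scaler scaler;
    bool   full_chroma;   /* false: drop chroma samples where possible */
    bool   linear_light;  /* downscale in linear light */
    bool   antialias;
    Dither dither;
} QualitySettings;

/* Map quality 0..10 (clamped) to concrete settings. The thresholds are
 * illustrative assumptions, not values proposed in the RFC. */
static QualitySettings quality_preset(int quality)
{
    QualitySettings s;
    if (quality < 0)  quality = 0;
    if (quality > 10) quality = 10;

    s.scaler       = quality == 0 ? SCALER_POINT
                   : quality <= 2 ? SCALER_BILINEAR
                   : quality <= 7 ? SCALER_BICUBIC
                                  : SCALER_LANCZOS;
    s.full_chroma  = quality >= 1;
    s.antialias    = quality >= 3;
    s.linear_light = quality >= 9;
    s.dither       = quality == 0 ? DITHER_NONE
                   : quality <= 8 ? DITHER_ORDERED
                                  : DITHER_ERROR_DIFFUSION;
    return s;
}
```

Since callers only ever see the integer, a better tone mapping or dithering mode could later be slotted into the higher levels without any API change.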

The upside of this approach is that it would be vastly simpler for most
end users. It would also track newly added functionality automatically;
e.g. if we get a higher-quality tone mapping mode, it can be
retroactively added to the higher quality presets. The biggest downside
I can think of is that doing this would arguably violate the semantics
of a "bitexact" flag: since what each preset does may change over time,
the same settings would produce different output relative to
a previous version of libswscale - unless we maybe also force a specific
quality level in bitexact mode?

Open questions:

1. Is this a good idea, or do the downsides outweigh the benefits?
2. Is an "advanced configuration" API still needed, in addition to the
   quality presets?

------

I have attached my current working draft of the public half of
<avscale.h>, for reference. You can also find my implementation draft at
the time of writing here:

https://github.com/haasn/FFmpeg/blob/avscale/libswscale/avscale.h
-------------- next part --------------
/*
 * Copyright (C) 2024 Niklas Haas
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#ifndef SWSCALE_AVSCALE_H
#define SWSCALE_AVSCALE_H

/**
 * @file
 * @ingroup libsws
 * Higher-level wrapper around libswscale + related libraries, which is
 * capable of handling more advanced colorspace transformations.
 */

#include "libavutil/frame.h"
#include "libavutil/log.h"

/**
 * Main external API structure. New fields can be added to the end with
 * minor version bumps. Removal, reordering and changes to existing fields
 * require a major version bump. sizeof(AVScaleContext) is not part of the ABI.
 */
typedef struct AVScaleContext {
    const AVClass *av_class;

    /**
     * Private context used for internal data.
     */
    struct AVScaleInternal *internal;

    /**
     * Private data of the user, can be used to carry app specific stuff.
     */
    void *opaque;

    /**
     * Bitmask of AV_SCALE_* flags.
     */
    int64_t flags;

    /**
     * How many threads to use for processing, or 0 for automatic selection.
     */
    int threads;

    /**
     * Quality factor (0-10). The default quality is [TBD]. Higher values
     * sacrifice speed in exchange for quality.
     *
     * TODO: explain what changes at each level
     */
    int quality;
} AVScaleContext;

enum {
    /**
     * Force bit-exact output. This will prevent the use of platform-specific
     * optimizations that may lead to slight differences in rounding, in favor
     * of always maintaining exact bit output compatibility with the reference
     * C code.
     *
     * Note: This is also available under the name "accurate_rnd" for
     * backwards compatibility.
     */
    AV_SCALE_BITEXACT = 1 << 0,

    /**
     * Return an error on underspecified conversions. Without this flag,
     * unspecified fields are defaulted to sensible values.
     */
    AV_SCALE_STRICT = 1 << 1,
};

/**
 * Allocate an AVScaleContext and set its fields to default values. The
 * resulting struct should be freed with avscale_free_context().
 */
AVScaleContext *avscale_alloc_context(void);

/**
 * Free the scaling context and everything associated with it, and write NULL
 * to the provided pointer.
 */
void avscale_free_context(AVScaleContext **ctx);

/**
 * Get the AVClass for AVScaleContext. It can be used in combination with
 * AV_OPT_SEARCH_FAKE_OBJ for examining options.
 *
 * @see av_opt_find().
 */
const AVClass *avscale_get_class(void);

/**
 * Statically test whether a conversion is supported. Values of NONE (for
 * pixel formats) or UNSPECIFIED (for color properties) are ignored.
 *
 * @return 1 if the conversion is supported, or 0 otherwise.
 */
int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
                          enum AVColorTransferCharacteristic src);

/**
 * Scale source data from `src` and write the output to `dst`. This is
 * merely a convenience wrapper around `avscale_frame_slice(ctx, dst, src, 0,
 * src->height)`.
 *
 * @param ctx   The scaling context.
 * @param dst   The destination frame.
 *
 *              The data buffers may either be already allocated by the caller
 *              or left unset (NULL), in which case they will be allocated by
 *              the scaler. The latter may have performance advantages - e.g.
 *              in certain cases some (or all) output planes may be references
 *              to input planes, rather than copies.
 * @param src   The source frame. If the data buffers are set to NULL, then
 *              this function performs no conversion. It will instead merely
 *              initialize internal state that *would* be required to perform
 *              the operation, as well as returning the correct error code for
 *              unsupported frame combinations.
 *
 * @return 0 on success, a negative AVERROR code on failure.
 */
int avscale_frame(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src);

/**
 * Like `avscale_frame`, but operates only on the (source) rows from
 * `slice_start` to `slice_start + slice_height`.
 *
 * Note: For interlaced or vertically subsampled frames, `slice_start` and
 * `slice_height` must be aligned to a multiple of the subsampling size
 * (typically 2, or 4 in the case of interlaced subsampled material).
 *
 * @param ctx   The scaling context.
 * @param dst   The destination frame. See avscale_frame() for more details.
 * @param src   The source frame. See avscale_frame() for more details.
 * @param slice_start   First row of slice, relative to `src`
 * @param slice_height  Number of (source) rows in the slice
 *
 * @return 0 on success, a negative AVERROR code on failure.
 */
int avscale_frame_slice(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src,
                        int slice_start, int slice_height);

#endif /* SWSCALE_AVSCALE_H */
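The slice alignment rule documented for avscale_frame_slice() could be enforced by a caller-side helper like the one below. This is a sketch, assuming the alignment equals the vertical chroma subsampling factor (`log2_chroma_h`, following the AVPixFmtDescriptor convention), doubled for interlaced content; `slice_alignment()`, `for_each_slice()` and `count_rows()` are hypothetical names, and the callback is where a real caller would invoke avscale_frame_slice().

```c
#include <assert.h>
#include <stdbool.h>

/* Vertical alignment required for a slice: the vertical chroma subsampling
 * factor (log2_chroma_h = 1 for 4:2:0), doubled for interlaced content. */
static int slice_alignment(int log2_chroma_h, bool interlaced)
{
    return (1 << log2_chroma_h) * (interlaced ? 2 : 1);
}

/* Check a (slice_start, slice_height) pair against the documented rule. */
static bool slice_is_valid(int slice_start, int slice_height, int src_height,
                           int align)
{
    return slice_start >= 0 && slice_height > 0 &&
           slice_start + slice_height <= src_height &&
           slice_start % align == 0 && slice_height % align == 0;
}

/* Per-slice callback; a real caller would call avscale_frame_slice() here. */
typedef void (*slice_cb)(void *opaque, int slice_start, int slice_height);

/* Split `src_height` rows into aligned bands of about `band` rows each.
 * Returns the number of slices emitted, or -1 if the frame height is not a
 * multiple of the alignment. */
static int for_each_slice(int src_height, int band, int align,
                          slice_cb cb, void *opaque)
{
    int n = 0;
    if (align <= 0 || src_height % align)
        return -1;
    if (band < align)
        band = align;
    band = (band + align - 1) / align * align; /* round up to alignment */

    for (int y = 0; y < src_height; y += band) {
        int h = src_height - y < band ? src_height - y : band;
        assert(slice_is_valid(y, h, src_height, align));
        cb(opaque, y, h);
        n++;
    }
    return n;
}

/* Example callback: sums up the rows it was handed. */
static void count_rows(void *opaque, int slice_start, int slice_height)
{
    (void)slice_start;
    *(int *)opaque += slice_height;
}
```

Rejecting frames whose height is not a multiple of the alignment is deliberately conservative; such frames would presumably be converted in one go with avscale_frame() instead.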