[FFmpeg-devel] [RFC] swscale modernization proposal
Niklas Haas
ffmpeg at haasn.xyz
Sat Jun 29 14:47:43 EEST 2024
On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas <ffmpeg at haasn.xyz> wrote:
> Hey,
>
> As some of you know, I got contracted (by STF 2024) to work on improving
> swscale, over the course of the next couple of months. I want to share my
> current plans and gather feedback + measure sentiment.
>
> ## Problem statement
>
> The two issues I'd like to focus on for now are:
>
> 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
> IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> 2. Complicated context management, with cascaded contexts, threading, stateful
> configuration, multi-step init procedures, etc; and related bugs
>
> In order to make these feasible, some amount of internal re-organization of
> duties inside swscale is prudent.
>
> ## Proposed approach
>
> The first step is to create a new API, which will (tentatively) live in
> <libswscale/avscale.h>. This API will initially start off as a near-copy of the
> current swscale public API, but with the major difference that I want it to be
> state-free and only access metadata in terms of AVFrame properties. So there
> will be no independent configuration of the input chroma location etc. like
> there is currently, and no need to re-configure or re-init the context when
> feeding it frames with different properties. The goal is for users to be able
> to just feed it AVFrame pairs and have it internally cache expensive
> pre-processing steps as needed. Finally, avscale_* should ultimately also
> support hardware frames directly, in which case it will dispatch to some
> equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> defer this to a future milestone)
So, I've spent the past few days implementing this API and hooking it up to
swscale internally. (For testing, I am also replacing `vf_scale` with an
equivalent AVScale-based implementation, to see how the new API impacts
existing users.) It mostly works so far, with some leftover translation
issues that I have to address before it can be sent upstream.
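The quoted proposal says users should be able to "just feed it AVFrame pairs" and have expensive pre-processing cached internally. A minimal sketch of how such a state-free entry point could decide when to redo setup, assuming the cache key is simply the set of frame properties on both sides. `FrameProps`, `ScaleCache` and `cache_prepare()` are hypothetical names, not part of the draft header or of libswscale.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the AVFrame properties a state-free scaler would
 * compare between calls. In real code these would be read from AVFrame's
 * format, width, height, colorspace, color_primaries, color_trc,
 * color_range and chroma_location fields. */
typedef struct FrameProps {
    int format, width, height;
    int colorspace, primaries, trc, range, chroma_loc;
} FrameProps;

/* Hypothetical per-context cache: the expensive setup (filter coefficients,
 * LUTs, sub-contexts, ...) is only rebuilt when the src/dst pair changes. */
typedef struct ScaleCache {
    bool valid;
    FrameProps src, dst;
    int rebuilds; /* how often the setup had to be redone */
} ScaleCache;

static bool props_equal(const FrameProps *a, const FrameProps *b)
{
    return a->format    == b->format    && a->width      == b->width      &&
           a->height    == b->height    && a->colorspace == b->colorspace &&
           a->primaries == b->primaries && a->trc        == b->trc        &&
           a->range     == b->range     && a->chroma_loc == b->chroma_loc;
}

/* Returns true if the setup was (re)built for this frame pair, false if
 * the cached state could be reused as-is. */
static bool cache_prepare(ScaleCache *c, const FrameProps *dst,
                          const FrameProps *src)
{
    assert(c && dst && src);
    if (c->valid && props_equal(&c->src, src) && props_equal(&c->dst, dst))
        return false;
    /* ... expensive re-initialization would happen here ... */
    c->src   = *src;
    c->dst   = *dst;
    c->valid = true;
    c->rebuilds++;
    return true;
}
```

With this shape, every call to a frame-level function could start with `cache_prepare()`, so a mid-stream change (say, the source colorspace switching to AVCOL_SPC_BT2020_NCL) triggers a rebuild without the caller ever having to re-init the context explicitly.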
------
One of the things I was thinking about was how to configure
scalers/dither modes, which sws currently, somewhat clunkily, controls
with flags. IMO, flags are not the right design here - if anything, these
should be separate enums/ints, controllable independently for chroma
resampling (4:4:4 <-> 4:2:0) and main scaling (e.g. 50x50 <-> 80x80).
That said, I think that for most end users, such fine-grained options
do not really provide much value - unless you're already knee-deep in
signal theory, the actual differences between, say, "natural bicubic
spline" and "Lanczos" are obscure at best and alien at worst.
My idea was to provide a single `int quality`, which the user can set to
tune the speed <-> quality trade-off on an arbitrary numeric scale from
0 to 10, with 0 being the fastest (alias everything, nearest neighbour,
drop half chroma samples, etc.), the default being something in the
vicinity of 3-5, and 10 being the maximum quality (full linear
downscaling, anti-aliasing, error diffusion, etc.).
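To make the idea concrete, here is a sketch of how a single `quality` value could expand into the separate settings mentioned above (main scaler, chroma scaler, dither mode). Every enum, field name and threshold below is hypothetical and chosen only for illustration; none of it exists in libswscale or the draft header.

```c
#include <assert.h>

/* Hypothetical knobs driven by one quality value. */
typedef enum ScalerKind {
    SCALER_POINT,      /* nearest neighbour */
    SCALER_BILINEAR,
    SCALER_BICUBIC,
    SCALER_LANCZOS,
} ScalerKind;

typedef enum DitherKind {
    DITHER_NONE,
    DITHER_ORDERED,
    DITHER_ERROR_DIFFUSION,
} DitherKind;

typedef struct ScalePreset {
    ScalerKind scaler;        /* main scaling, e.g. 50x50 <-> 80x80 */
    ScalerKind chroma_scaler; /* chroma resampling, e.g. 4:4:4 <-> 4:2:0 */
    DitherKind dither;
    int linear_downscale;     /* downscale in linear light */
} ScalePreset;

/* Map quality 0..10 to a preset; out-of-range values are clamped.
 * The thresholds are arbitrary placeholders, not a proposal. */
static ScalePreset preset_from_quality(int quality)
{
    ScalePreset p = { SCALER_POINT, SCALER_POINT, DITHER_NONE, 0 };

    if (quality < 0)  quality = 0;
    if (quality > 10) quality = 10;

    if (quality >= 2) {
        p.scaler        = SCALER_BILINEAR;
        p.chroma_scaler = SCALER_BILINEAR;
        p.dither        = DITHER_ORDERED;
    }
    if (quality >= 4)
        p.scaler = SCALER_BICUBIC;
    if (quality >= 7) {
        p.scaler        = SCALER_LANCZOS;
        p.chroma_scaler = SCALER_BICUBIC;
        p.dither        = DITHER_ERROR_DIFFUSION;
    }
    if (quality >= 9)
        p.linear_downscale = 1;
    return p;
}
```

Because each level is defined cumulatively, a new higher-quality feature could be slotted in by adding one more threshold, which is the "track newly added functionality automatically" property described below.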
The upside of this approach is that it would be vastly simpler for most
end users. It would also track newly added functionality automatically;
e.g. if we get a higher-quality tone mapping mode, it can be
retroactively added to the higher quality presets. The biggest downside
I can think of is that this would arguably violate the semantics of
the "bitexact" flag: since the meaning of each preset would change as
new functionality is added, the same quality setting could produce
different output from one version of libswscale to the next - unless we
also force a specific quality level in bitexact mode?
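One possible reading of "force a specific quality level in bitexact mode", sketched under the assumption that bitexact simply overrides whatever the user requested. `effective_quality()` and both `SKETCH_*` macros are hypothetical; only the bit value mirrors `AV_SCALE_BITEXACT` from the draft header.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit as AV_SCALE_BITEXACT in the draft header. */
#define SKETCH_FLAG_BITEXACT (1 << 0)

/* Hypothetical: the level bitexact mode is pinned to. Its definition
 * would have to stay frozen across releases for results to be stable. */
#define SKETCH_BITEXACT_QUALITY 3

/* Returns the quality level actually used for a conversion. */
static int effective_quality(int64_t flags, int requested)
{
    if (flags & SKETCH_FLAG_BITEXACT)
        return SKETCH_BITEXACT_QUALITY; /* ignore the requested level */
    if (requested < 0)
        return 0;
    if (requested > 10)
        return 10;
    return requested;
}
```

The trade-off is that bitexact users would lose access to the quality knob entirely; the alternative would be to freeze the definition of every level instead of just one.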
Open questions:
1. Is this a good idea, or do the downsides outweigh the benefits?
2. Is an "advanced configuration" API still needed, in addition to the
quality presets?
------
I have attached my current working draft of the public half of
<avscale.h>, for reference. You can also find my implementation draft at
the time of writing here:
https://github.com/haasn/FFmpeg/blob/avscale/libswscale/avscale.h
-------------- next part --------------
/*
 * Copyright (C) 2024 Niklas Haas
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */
#ifndef SWSCALE_AVSCALE_H
#define SWSCALE_AVSCALE_H
/**
 * @file
 * @ingroup libsws
 * Higher-level wrapper around libswscale + related libraries, which is
 * capable of handling more advanced colorspace transformations.
 */
#include "libavutil/frame.h"
#include "libavutil/log.h"
/**
 * Main external API structure. New fields can be added to the end with
 * minor version bumps. Removal, reordering and changes to existing fields
 * require a major version bump. sizeof(AVScaleContext) is not part of the ABI.
 */
typedef struct AVScaleContext {
    const AVClass *av_class;

    /**
     * Private context used for internal data.
     */
    struct AVScaleInternal *internal;

    /**
     * Private data of the user, can be used to carry app specific stuff.
     */
    void *opaque;

    /**
     * Bitmask of AV_SCALE_* flags.
     */
    int64_t flags;

    /**
     * How many threads to use for processing, or 0 for automatic selection.
     */
    int threads;

    /**
     * Quality factor (0-10). The default quality is [TBD]. Higher values
     * sacrifice speed in exchange for quality.
     *
     * TODO: explain what changes at each level
     */
    int quality;
} AVScaleContext;
enum {
    /**
     * Force bit-exact output. This will prevent the use of platform-specific
     * optimizations that may lead to slight differences in rounding, in favor
     * of always maintaining exact bit output compatibility with the reference
     * C code.
     *
     * Note: This is also available under the name "accurate_rnd" for
     * backwards compatibility.
     */
    AV_SCALE_BITEXACT = 1 << 0,

    /**
     * Return an error on underspecified conversions. Without this flag,
     * unspecified fields are defaulted to sensible values.
     */
    AV_SCALE_STRICT   = 1 << 1,
};
/**
 * Allocate an AVScaleContext and set its fields to default values. The
 * resulting struct should be freed with avscale_free_context().
 */
AVScaleContext *avscale_alloc_context(void);

/**
 * Free the scaling context and everything associated with it, and write NULL
 * to the provided pointer.
 */
void avscale_free_context(AVScaleContext **ctx);
/**
 * Get the AVClass for AVScaleContext. It can be used in combination with
 * AV_OPT_SEARCH_FAKE_OBJ for examining options.
 *
 * @see av_opt_find().
 */
const AVClass *avscale_get_class(void);
/**
 * Statically test if a conversion is supported. Values of AV_PIX_FMT_NONE,
 * or of the respective *_UNSPECIFIED constant for the color properties, are
 * ignored.
 *
 * Returns 1 if the conversion is supported, or 0 otherwise.
 */
int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
                          enum AVColorTransferCharacteristic src);
/**
 * Scale source data from `src` and write the output to `dst`. This is
 * merely a convenience wrapper around `avscale_frame_slice(ctx, dst, src, 0,
 * src->height)`.
 *
 * @param ctx   The scaling context.
 * @param dst   The destination frame.
 *
 *              The data buffers may either be already allocated by the caller
 *              or left unset (NULL), in which case they will be allocated by
 *              the scaler. The latter may have performance advantages - e.g.
 *              in certain cases some (or all) output planes may be references
 *              to input planes, rather than copies.
 * @param src   The source frame. If the data buffers are set to NULL, then
 *              this function performs no conversion. It will instead merely
 *              initialize internal state that *would* be required to perform
 *              the operation, as well as returning the correct error code for
 *              unsupported frame combinations.
 *
 * @return 0 on success, a negative AVERROR code on failure.
 */
int avscale_frame(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src);
/**
 * Like `avscale_frame`, but operates only on the (source) row range from
 * `slice_start` to `slice_start + slice_height`.
 *
 * Note: For interlaced or vertically subsampled frames, `slice_start` and
 * `slice_height` must be aligned to a multiple of the subsampling size
 * (typically 2, or 4 in the case of interlaced subsampled material).
 *
 * @param ctx          The scaling context.
 * @param dst          The destination frame. See avscale_frame() for more details.
 * @param src          The source frame. See avscale_frame() for more details.
 * @param slice_start  First row of slice, relative to `src`
 * @param slice_height Number of (source) rows in the slice
 *
 * @return 0 on success, a negative AVERROR code on failure.
 */
int avscale_frame_slice(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src,
                        int slice_start, int slice_height);
#endif /* SWSCALE_AVSCALE_H */
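The alignment rule documented for `avscale_frame_slice()` can be illustrated with a small, self-contained helper that splits a frame into slices whose boundaries respect it. This is a sketch under the assumption that the required alignment is the vertical chroma subsampling factor (`1 << log2_chroma_h`, as stored in libavutil's `AVPixFmtDescriptor`), doubled for interlaced content, which yields the "typically 2, or 4" values from the doc comment. `slice_alignment()` and `plan_slices()` are hypothetical helpers, not part of the draft API.

```c
#include <assert.h>

/* Required row alignment for slice boundaries (hypothetical rule):
 * vertical subsampling factor, doubled for interlaced material because
 * each field is subsampled separately. */
static int slice_alignment(int log2_chroma_h, int interlaced)
{
    int align = 1 << log2_chroma_h;
    return interlaced ? align * 2 : align;
}

/* Split `height` source rows into at most `nb_slices` slices whose starts
 * are multiples of `align`. Every slice height is also a multiple of
 * `align`, except possibly the last one when `height` itself is not.
 * Returns the number of slices written to starts[] / heights[]. */
static int plan_slices(int height, int nb_slices, int align,
                       int *starts, int *heights)
{
    assert(align > 0 && starts && heights);
    int units = (height + align - 1) / align; /* height in align-sized units */
    if (nb_slices > units)
        nb_slices = units;                     /* never emit empty slices */

    int n = 0, start = 0;
    for (int i = 0; i < nb_slices; i++) {
        /* spread the units as evenly as possible over the slices */
        int end = (int)((long long)units * (i + 1) / nb_slices) * align;
        if (end > height)
            end = height;
        starts[n]  = start;
        heights[n] = end - start;
        n++;
        start = end;
    }
    return n;
}
```

Each planned slice would then be handed to the draft `avscale_frame_slice(ctx, dst, src, starts[i], heights[i])`, e.g. one slice per worker thread. For a 1080-row interlaced 4:2:0 frame split four ways, this yields slices of 268, 272, 268 and 272 rows, all starting on multiples of 4.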