[FFmpeg-devel] [PATCH] libavfilter: scale_cuda filter adds dynamic command values

Mon Nov 26 21:35:39 EET 2018

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, 26 November 2018 20:10, Timo Rothenpieler <timo at rothenpieler.org> wrote:

> On 26.11.2018 19:09, msanders wrote:
>
> > Hi,
> > This patch adds command support for dynamically changing the size in the “scale_cuda” resize filter. In fact, it’s the first GPU filter accepting realtime commands. With similar changes it should be possible to port this to other hardware accelerators. The only limitation is that the values cannot exceed the initial values, so the graph must be set up with the largest values you may need.
> > One example: { -filter_complex "scale_cuda=720:576,hwdownload,format=nv12,zmq" }
> > And then you can use the <c> or ZMQ messages to change the width and/or height.
> > Warning: For this patch to make sense, you also need to apply this other patch, which fixes the hwdownload filter:
> > https://patchwork.ffmpeg.org/patch/11165/
>
> I'm not sure if this is such a good idea. There's a lot of places all
> over the codebase that have a hardcoded assumption about how hardware
> frames in general, and CUDA frames in particular work.
>
I think it's a good idea because I implemented it, I'm using it, and it works!

Of course you can't change the HW frames themselves, only the "size" of the images inside the HW frames. That's all.
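
A minimal sketch of that constraint, assuming a hypothetical ScaleState struct and scale_set_size() helper (neither name comes from the patch): the allocated size stays fixed at graph setup, and a command may only shrink or restore the image size within it.

```c
/* Hypothetical state, for illustration only; these are not the fields
 * used by the actual patch. */
typedef struct ScaleState {
    int alloc_w, alloc_h; /* size the HW frames were set up with; never changes */
    int cur_w, cur_h;     /* image size currently written into those frames */
} ScaleState;

/* Apply a runtime size command.
 * Returns 0 on success, -1 if the request is invalid or exceeds the
 * initial (allocated) size. On failure the current size is left untouched. */
static int scale_set_size(ScaleState *s, int w, int h)
{
    if (!s || w <= 0 || h <= 0)
        return -1;
    if (w > s->alloc_w || h > s->alloc_h)
        return -1;
    s->cur_w = w;
    s->cur_h = h;
    return 0;
}
```

With a graph set up as 720x576, a request for 640x480 would be accepted and a request for 1280x720 would be rejected, which is why the graph has to start with the largest size you may need.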

> A lot of code checks the width/height from the hwframes ctx instead of
> the frame itself, because it needs the real size (1088 for 1080p for
> example) of the underlying buffer at all times.
> So those consumers would straight up ignore any scaling done to the
> frames, reading messed up data instead.
>
In all the checks I have done, everything works as expected.
Only "hwdownload" has trouble, and I have also provided the patch that solves it.
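
To illustrate the "1088 for 1080p" point in the quote above, here is a small sketch of how an aligned buffer height differs from the visible height. The 16-row alignment is an assumption (it is one value that yields 1088), and align_up() is a hypothetical helper that mirrors the usual round-up-to-power-of-two idiom.

```c
/* Round v up to the next multiple of a.
 * a must be a power of two for the bit mask to be valid. */
static int align_up(int v, int a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Two heights a consumer may see for the same frame. */
typedef struct FrameHeights {
    int visible; /* rows that contain picture data */
    int buffer;  /* rows actually allocated in the underlying buffer */
} FrameHeights;

/* Compute both heights, assuming a 16-row allocation alignment. */
static FrameHeights frame_heights(int visible)
{
    FrameHeights h;
    h.visible = visible;
    h.buffer  = align_up(visible, 16);
    return h;
}
```

A consumer that reads the buffer height sees 1088 for a 1080p frame, while one that reads the frame's own height sees 1080. A runtime resize only changes the second number.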

> On top of that, in the specific case of CUDA, the CUDA pix_fmt is
> defined with the assumption that the entire frame is in a single
> continuous buffer with all planes right after one another. This would
> most prominently affect nvenc, as its API only takes one lone CUdevptr
> as input, and then has a fixed idea about how the data behind it looks.
>
I take this into account in the code. See the part where I always use the original (context) boundaries for secondary planes.
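
One way to read this, sketched for NV12 in a single contiguous buffer: the chroma plane offset is derived from the allocated (context) height, not from the current image height, so it does not move when the image shrinks. Nv12Layout and nv12_layout() are hypothetical names, and an even allocated height is assumed.

```c
#include <stddef.h>

/* Byte offsets of the two NV12 planes inside one contiguous allocation. */
typedef struct Nv12Layout {
    size_t luma_offset;   /* Y plane starts at the beginning of the buffer */
    size_t chroma_offset; /* interleaved UV plane follows the full Y plane */
    size_t total_size;    /* bytes covered by both planes */
} Nv12Layout;

/* Compute the layout from the row pitch (in bytes) and the ALLOCATED height.
 * The current image height is deliberately not a parameter: the planes
 * stay where the context placed them, whatever size is drawn into them. */
static Nv12Layout nv12_layout(size_t pitch, size_t alloc_height)
{
    Nv12Layout l;
    l.luma_offset   = 0;
    l.chroma_offset = pitch * alloc_height;
    l.total_size    = l.chroma_offset + pitch * (alloc_height / 2);
    return l;
}
```

For a buffer allocated as 720x576 with a pitch of, say, 768 bytes, the UV plane starts at 768 * 576 whether the current image is 720x576 or 640x480. A consumer that takes a single device pointer and assumes the original layout still finds the chroma where it expects it.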

> So it would produce output with random data to the right and bottom of
> the scaled frame, still with the outer dimensions before the
> re-configuration.
>
No. The output is always placed at the top (the beginning) of the buffer. No other alignment is allowed. Nothing is random at all.

> The only way this could possibly work is if a new hw_frames_ctx is
> created on reconfiguration.

No. This isn't viable. If you recreate the hw_context, then the following HW filters fail.
Please, believe me. I've already tried it and it doesn't work.

> With nvenc this would actually work without any changes, as it re-reads
> the width/height out of it on every frame already, and initially only
> gets the CUDA-Context and sw_pix_fmt out of it, so those would need to
> stay the same, which isn't an issue.

Why would you want to "resize" right before the "encoder"? That will produce a bitstream with size changes. Not a good idea.

Think about what the current "vf_scale" does: it already supports live size changes, which is useful when you're doing some filtering/processing. In that situation it is good for "scale_cuda" to behave the same as "scale" (i.e. accept dynamic commands). Then you can do more work inside the GPU before downloading frames to RAM (remember that with my patch for "hwdownload" this works). Just think of the CPU time wasted on a plain "scale" instead of "scale_cuda" when you need live size changes.

> But for a bunch of other hardware filters more work is needed. Especially
> as some parts overlap with other APIs.

I don't see the problem. Please explain it better.

Regards.