[FFmpeg-devel] [PATCH v1] avcodec/v210enc: add yuv420p/yuv420p10 input pixel format support

Sun Sep 22 19:59:57 EEST 2019

On Sat, Sep 21, 2019 at 10:49:21PM -0400, Devin Heitmueller wrote:
> 
> > On Sep 21, 2019, at 4:44 PM, Michael Niedermayer <michael at niedermayer.cc> wrote:
> > 
> >> The patch just expands 4:2:0 to 4:2:2 while properly supporting interlaced chroma.  
> > 
> > 4:2:0 and 4:2:2 have a chroma plane with different resolution.
> > converting between planes of different resolution is what i called scaling.
> > 
> > 
> >> It avoids having to auto insert the swscale filter in the case where there is no scaling required (e.g. H.264 4:2:0 video being output to decklink in its original resolution).
> > 
> > yes, doing an operation in the encoder avoids a filter being inserted which
> > does that operation.
> > Thats true for every encoder and every filter.
> 
> The key thing here is the encoder is already touching every pixel, so avoiding having the need for the filter essentially allows the conversion to happen at essentially zero cost (as we repack the pixels into the requisite v210 layout).
> 
> > Also replacing interpolation by a nearest neighbor implementation
> > is quite expectedly faster.
> 
> Yes, and we can certainly argue about whether doing interpolation of chroma when doing 4:2:0 to 4:2:2 actually has any visible benefit.  I can however say the cost of having swscaler in the pipeline is considerable.  In fact I didn’t appreciate it myself until I was trying to deliver 1080p60 in realtime to four decklink outputs and couldn’t keep up on my target platform.  And because filters generally aren’t threaded, I got hit with one of those cases where I had to break out the profiler and ask “why on Earth is the main ffmpeg thread so busy?"
> 
> 
> > one problem is
> > the user can setup the scale filter with high quality in mind or with 
> > low quality and speed in mind.
> > But after this patch she always gets low quality because the low quality
> > convertion code is hardcoded in the encoder which pretends to support 420.
> > The outside code has no chance to know it shouldnt feed 420 if high quality
> > is wanted.
> 

> The user can still insert a scaler explicitly or use the pix_fmt argument so the format filter gets put into the pipeline.

The problem is the user first has to know about the "in encoder convert" and
fully understand the implications. Only then can she insert a scaler and 
format filter manually (unless she is copy and pasting this from somewhere).
That is alot of knowledge for the average user

> 
> > 
> > Also why should this be in one encoder and not be available to other
> > encoders supporting 4:2:2 input ?
> > A solution should work for all of them
> 
> I would assume this would really only be helpful in encoders which only support 4:2:2 and not 4:2:0, since typical encoders that accept 4:2:0 would preserve that in their resulting encoding (i.e. they wouldn’t blindly upscale 4:2:0 to 4:2:2 for no good reason).

Maybe. But then there are specifications that demand 4:2:2 in cases where the
underlaying encoder supports 4:2:0. In these cases 4:2:2 would need to be
delivered and delivering 4:2:0 directly to the encoder would not produce
a compliant result (with current encoders)

> 
> I did actually consider doing a separate filter which just does packed/planer conversion and 4:2:0 to 4:2:2 (as opposed to swscaler).  In this case though the additional modularity in such a filter was outweighed by my goal to minimize the number of times I’m copying the frame data.  Combining it with the v210 encoding meant only a single pass over the data.
> 
> > 
> > Iam not sure what is the best solution but simply hardcoding this in
> > one encoder feels rather wrong
> 
> The scale filter performs three basic roles:
> 1.  Scaling
> 2.  Packed to planer conversion (or vice versa)
> 3.  Colorspace conversion
> 

> I supposed potentially someone could redesign swscale to include the option to not take the slow path for cases where scaling isn’t actually required (i.e. cases where only 2 and 3 are needed).

For this look at ff_get_unscaled_swscale() a new 420->422 scaler can easily be
added in there, it would not be 0 copy though. But it certainly would
make sense for the cases where this codepath i used

> 
> Just so we’re all on the same page - this wasn’t a case of random or premature optimization.  I have a specific use case where I’m decoding four instances of 1080p60 video and the platform can’t keep up without this change.  It’s the result of actually profiling the entire pipeline as opposed to some unit test with a benchmark.  In fact I don’t particularly agree with Limin's numbers where he used the benchmark option, since that fails to take into account caching behavior or memory bandwidth on a platform which is constrained (a problem which is exacerbated when running multiple instances).  In a perfect world we would have very small operations which each perform some discrete function, and we can combine all of those in a pipeline.  In the real world though, significant benefits can be gained by combining certain operations to avoid copying the same pixels over and over again.

I certainly see the benefit this 0copy422to420 has. And iam not
against such an optimization. But it needs to be done in a
cleaner way. Iam not sure how to do that ATM.

There possibly could be a function which produces a list of line pointers
so that for a image for each plane and line a pointer is produced.
and encoders which use this would have a capability flag set.

This would then mean that a encoder supporting 422 input
could be fed with 420 and a encoder supporting 420 could be
fed with 422 converted by just different values in the pointer list.

Also this feature would be known from the outside by the flag and
be controlled by the outside. such a encoder would never claim support of
other pixel formats

Thanks

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20190922/80f07c2d/attachment.sig>