[FFmpeg-devel] [PATCH] swscale/arm: add ff_nv{12, 21}_to_{argb, rgba, abgr, bgra}_neon

Thu Nov 19 18:29:23 CET 2015

On Thu, Nov 19, 2015 at 04:50:54PM +0100, Michael Niedermayer wrote:
> On Thu, Nov 19, 2015 at 11:48:53AM +0100, Clément Bœsch wrote:
> > From: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > 
> > Signed-off-by: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > Signed-off-by: Clément Bœsch <clement at stupeflix.com>
> > 
> > ---
> > The function takes about 29ms with a 1080p source (testsrc2) on a
> > cortex-a8. Though, 16ms (more than half the time!) is spent in the vst2
> > call. Any suggestions on how to speed this up?
> > 
> > Also, the reference code seems to cause some kind of ringing, while our
> > ASM doesn't:
> >   http://b.pkh.me/nv12-rgba-ref.png
> >   http://b.pkh.me/nv12-rgba-neon.png
> 
> what did you test exactly here ?

./ffmpeg -f lavfi -i testsrc2 -vf format=nv12,format=rgba -ss 1 -frames:v 1 -y nv12-rgba-ref.png

(on ARM though, and with -cpuflags 0)

> but there are several code paths for rgb output, one uses LUTs and
> not all use full resolution chroma
> 

Yeah, we noticed...

Note: on x86 there is some yuv2rgb MMX code, but it's not called by the
command above because it doesn't handle nv12 (only yuv420 & friends), so
the chroma issue is reproducible there too (it's calling the LUT path).

> 
> > 
> > Last, we noticed that the y_offset is scaled to 1<<9 for some reason we
> > couldn't figure out. Hopefully we're doing it correctly here.
> 
> [...]
> > +.macro compute_half_line dst half_y ofmt
> > +    vmovl.u8            q7, \half_y                                    @ 8px of Y
> > +    vdup.16             q5, r9
> > +    vsub.s16            q7, q5
> > +    vmull.s16           q1, d14, d0                                    @ q1 = (srcY - y_offset) * y_coeff (left)
> > +    vmull.s16           q2, d15, d0                                    @ q2 = (srcY - y_offset) * y_coeff (right)
> 
> if you do something like (srcY) * y_coeff - y_offset2
> then you could keep a bit more precision in the requested brightness
> correction

The code in swscale/output.c seems to always use the form we use here. Is
it on purpose?
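
To make the difference between the two forms concrete, here is a minimal C
sketch. It is not taken from swscale: the function names, COEFF_BITS and
OFFSET_BITS are hypothetical and only chosen for illustration. It assumes
the brightness offset comes in with some fractional bits, which the
subtract-first form has to round away before the multiply, while the
multiply-first form can carry them into the product domain.

```c
#include <stdint.h>

/* Hypothetical fixed-point layout, for illustration only:
 * y_coeff carries COEFF_BITS fractional bits, and the luma offset
 * carries OFFSET_BITS fractional bits. Offsets are assumed >= 0. */
#define COEFF_BITS  13
#define OFFSET_BITS 9

/* Form used in the patch: (srcY - y_offset) * y_coeff.
 * The offset is rounded to whole luma levels before the subtraction,
 * so its fractional part is lost. */
static int32_t luma_sub_then_mul(uint8_t y, int32_t y_offset_fp, int32_t y_coeff)
{
    int32_t y_offset = (y_offset_fp + (1 << (OFFSET_BITS - 1))) >> OFFSET_BITS;
    return ((int32_t)y - y_offset) * y_coeff;
}

/* Suggested form: srcY * y_coeff - y_offset2.
 * y_offset2 is the offset pre-multiplied by the coefficient, so the
 * fractional part of the offset survives in the result. */
static int32_t luma_mul_then_sub(uint8_t y, int32_t y_offset_fp, int32_t y_coeff)
{
    int32_t y_offset2 = (int32_t)(((int64_t)y_offset_fp * y_coeff) >> OFFSET_BITS);
    return (int32_t)y * y_coeff - y_offset2;
}
```

With a whole-number offset (for example 16 << OFFSET_BITS) both functions
return the same value, so the two forms only diverge once the offset has a
fractional part, i.e. with a custom brightness setting. That would also
mean the second form is not bitexact with the first in that case.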

> OTOH maybe you want to be bitexact to some existing codepath
> 

Right... I suppose we don't have many tests with custom
brightness/contrast/saturation. Should I expose them in vf_scale and
see how much breaks? :)

> either way, your patch passes fate with arm qemu here so i have
> no objections if you also tested it and it works
> but maybe others have more comments about the asm ...
> 

-- 
Clément B.