[FFmpeg-devel] [PATCH] swscale/arm: add ff_nv{12, 21}_to_{argb, rgba, abgr, bgra}_neon

Fri Nov 20 18:46:16 CET 2015

On Thu, Nov 19, 2015 at 06:29:23PM +0100, Clément Bœsch wrote:
> On Thu, Nov 19, 2015 at 04:50:54PM +0100, Michael Niedermayer wrote:
> > On Thu, Nov 19, 2015 at 11:48:53AM +0100, Clément Bœsch wrote:
> > > From: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > > 
> > > Signed-off-by: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> > > Signed-off-by: Clément Bœsch <clement at stupeflix.com>
> > > 
> > > ---
> > > The function takes about 29ms with a 1080p source (testsrc2) on a
> > > Cortex-A8. However, 16ms (more than half the time!) is spent in the vst2
> > > call. Any suggestions on how to speed this up?
> > > 
> > > Also, the reference code seems to cause some kind of ringing, while our
> > > ASM doesn't:
> > >   http://b.pkh.me/nv12-rgba-ref.png
> > >   http://b.pkh.me/nv12-rgba-neon.png
> > 
> > What exactly did you test here?
> 
> ./ffmpeg -f lavfi -i testsrc2 -vf format=nv12,format=rgba -ss 1 -frames:v 1 -y nv12-rgba-ref.png
> 
> (on ARM though, and with -cpuflags 0)
> 
> > But there are several code paths for RGB output; one uses LUTs, and
> > not all of them use full-resolution chroma.
> > 
> 
> Yeah, we noticed...
> 
> Note: on x86 there is some yuv2rgb MMX code, but it's not called above
> because it doesn't handle nv12 (only yuv420 & friends), so the chroma
> issue is reproducible there as well (it's calling the LUT path).
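For readers less familiar with the format, a scalar C sketch of NV12/NV21 to RGBA conversion shows how chroma is addressed. NV12 has a full-resolution Y plane plus one interleaved U,V plane at half resolution in both directions (NV21 swaps each pair to V,U), so one chroma pair covers a 2x2 block of luma. This sketch simply replicates chroma over that block, so it is *not* a full-resolution chroma path. The function name nv12_to_rgba_ref, the BT.601 limited-range coefficients and the float math are assumptions for illustration; this is neither the patch's NEON code nor swscale's C code.

```c
#include <stdint.h>
#include <stddef.h>

/* Clamp an intermediate value to 0..255 and round to nearest. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f)   return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/*
 * Hypothetical scalar reference: NV12 (or NV21 if is_nv21) to packed RGBA.
 * Assumes BT.601 limited-range coefficients and even width/height.
 * Chroma is replicated over each 2x2 luma block (no interpolation).
 */
static void nv12_to_rgba_ref(uint8_t *dst, ptrdiff_t dst_stride,
                             const uint8_t *y_plane, ptrdiff_t y_stride,
                             const uint8_t *uv_plane, ptrdiff_t uv_stride,
                             int width, int height, int is_nv21)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *src_y  = y_plane + y * y_stride;
        /* One chroma row serves two luma rows. */
        const uint8_t *src_uv = uv_plane + (y / 2) * uv_stride;
        uint8_t *out = dst + y * dst_stride;

        for (int x = 0; x < width; x++) {
            /* One interleaved chroma pair serves two luma columns. */
            const uint8_t *pair = src_uv + (x / 2) * 2;
            int u = is_nv21 ? pair[1] : pair[0]; /* NV12: U,V  NV21: V,U */
            int v = is_nv21 ? pair[0] : pair[1];
            float luma = 1.164f * (src_y[x] - 16);

            out[4 * x + 0] = clamp_u8(luma + 1.596f * (v - 128));                        /* R */
            out[4 * x + 1] = clamp_u8(luma - 0.391f * (u - 128) - 0.813f * (v - 128));   /* G */
            out[4 * x + 2] = clamp_u8(luma + 2.018f * (u - 128));                        /* B */
            out[4 * x + 3] = 255;                                                        /* A */
        }
    }
}
```

A full-resolution chroma path would instead interpolate between neighbouring chroma samples, so its output can legitimately differ from this one around sharp chroma edges.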
> 
> > 
> > > 
> > > Lastly, we noticed that y_offset is scaled by 1<<9 for some reason we
> > > couldn't figure out. Hopefully we're doing it correctly here.
> > 
> > [...]
> > > +.macro compute_half_line dst half_y ofmt
> > > +    vmovl.u8            q7, \half_y                                    @ 8px of Y
> > > +    vdup.16             q5, r9
> > > +    vsub.s16            q7, q5
> > > +    vmull.s16           q1, d14, d0                                    @ q1 = (srcY - y_offset) * y_coeff (left)
> > > +    vmull.s16           q2, d15, d0                                    @ q2 = (srcY - y_offset) * y_coeff (right)
> > 
> > If you do something like (srcY) * y_coeff - y_offset2,
> > then you could keep a bit more precision in the requested brightness
> > correction.
> 
> The code in swscale/output.c seems to always use the form we use here. Is
> that intentional?

If srcY has some extra bits of precision, then it should be fine.
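To make the precision argument concrete, here is a scalar C sketch comparing three orderings: the subtract-first form used in the quoted macro, the multiply-first form suggested in review, and the subtract-first form with extra fractional bits on srcY. COEFF_BITS = 13, the function names and the 16.25 offset are hypothetical values chosen for illustration; they are not swscale's actual constants or scaling (including the 1<<9 factor mentioned earlier).

```c
#include <stdint.h>

/* Hypothetical precision: y_coeff carries 13 fractional bits. */
#define COEFF_BITS 13

/* Round to nearest (half away from zero) without needing libm. */
static int32_t round_i32(double v)
{
    return (int32_t)(v < 0 ? v - 0.5 : v + 0.5);
}

/* Exact (srcY - offset) * coeff, expressed in the same 2^COEFF_BITS scale. */
static double luma_exact(int srcY, double offset, double coeff)
{
    return (srcY - offset) * coeff * (1 << COEFF_BITS);
}

/* Subtract first, as in the quoted macro: the offset is an integer in 8-bit
 * luma units, so a fractional request is rounded to a whole step.
 * 16-bit operands mirror the inputs of vmull.s16. */
static int32_t luma_sub_first(int srcY, double offset, double coeff)
{
    int16_t y_offset = (int16_t)round_i32(offset);
    int16_t y_coeff  = (int16_t)round_i32(coeff * (1 << COEFF_BITS));
    return (int32_t)(srcY - y_offset) * y_coeff;
}

/* Multiply first: the offset is pre-scaled into the product's fixed-point
 * domain (roughly offset * y_coeff), so its fractional part survives. */
static int32_t luma_mul_first(int srcY, double offset, double coeff)
{
    int32_t y_coeff   = round_i32(coeff * (1 << COEFF_BITS));
    int32_t y_offset2 = round_i32(offset * coeff * (1 << COEFF_BITS));
    return srcY * y_coeff - y_offset2;
}

/* Subtract first, but srcY and the offset carry `extra` fractional bits;
 * the result carries an additional 2^extra scale factor. */
static int32_t luma_sub_first_extra_bits(int srcY, double offset,
                                         double coeff, int extra)
{
    int32_t y_offset = round_i32(offset * (1 << extra));
    int32_t y_coeff  = round_i32(coeff * (1 << COEFF_BITS));
    return ((srcY << extra) - y_offset) * y_coeff;
}
```

With srcY = 100, a requested offset of 16.25 and coeff = 1.164, the subtract-first result is off by about 0.29 in output units, while the multiply-first and extra-bits variants stay under 0.01, i.e. within the rounding of y_coeff itself.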


> 

> > OTOH, maybe you want to be bitexact with some existing code path.
> > 
> 
> Right... I suppose we don't have many tests with custom
> brightness/contrast/saturation. Should I expose them in vf_scale and
> see how much breaks? :)

FATE tests for contrast/brightness/saturation are welcome.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu