[FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon

Wed May 11 22:02:33 CEST 2016

On Wed, May 11, 2016 at 9:04 PM, Reimar Döffinger <Reimar.Doeffinger at gmx.de>
wrote:

>
>
> On 11.05.2016, at 20:37, Michael Niedermayer <michael at niedermayer.cc>
> wrote:
>
> > On Wed, May 11, 2016 at 06:39:20PM +0200, Matthieu Bouron wrote:
> >> From: Matthieu Bouron <matthieu.bouron at stupeflix.com>
> >>
> >> ---
> >>
> >> Hello,
> >>
> >> Here are some benchmarks of the attached patch on an rpi2.
> >>
> >> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
> >>
> >> With patch:    avg=0.001159 speed=44,1x
> >> Without patch: avg=0.001297 speed=40,8x
> >>
> >> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
> >>
> >
> >> With patch:    avg=0.001374 speed=45,6x
> >> Without patch: avg=0.000782 speed=64,6x
> >
> > so its slower ? or am i misreading this ?
>
> Yes, that seems weird.
> Also, what are common filter lengths?
>

Sorry, I inverted the two s16p results; the NEON version is actually faster:

Without patch: avg=0.001374 speed=45,6x
With patch:    avg=0.000782 speed=64,6x
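
To put the avg figures in perspective, here is a small sketch that
computes the ratio between two abench averages. The helper name
avg_ratio is hypothetical, and it only divides the numbers quoted in
this thread; it makes no claim about how abench derives them.

```c
#include <stdio.h>

/* Hypothetical helper: ratio of the average time without the patch to
 * the average time with it. A value above 1.0 means the patched
 * version spent less time on average. */
static double avg_ratio(double avg_without, double avg_with)
{
    return avg_without / avg_with;
}

/* Prints the ratios for the two runs quoted in this thread. */
static void print_ratios(void)
{
    printf("s16p: %.2fx\n", avg_ratio(0.001374, 0.000782));
    printf("fltp: %.2fx\n", avg_ratio(0.001297, 0.001159));
}
```

With the numbers above this comes to roughly 1.76x for s16p and 1.12x
for fltp.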

> Because for a length of 4 or 8 or 16 I'd think this would be much better
> fully unrolled.
> And for longer ones at least partially unrolled.
>

The common filter length seems to be 32, but it may vary.
As for the small performance gain in the float version, it seems to be
due to switching between VFP and NEON instructions (I'm not 100% sure).
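
For reference, the partial unrolling discussed above can be sketched in
plain C as follows. This is only an illustration, not the code from the
patch: the function name apply_filter_x4_float_sketch is hypothetical,
and it assumes the filter length is a non-zero multiple of 4 (as with
the length of 32 mentioned above).

```c
#include <stddef.h>

/* Hypothetical scalar sketch: dot product of src and filter, unrolled
 * by 4. filter_length is assumed to be a multiple of 4; there is no
 * tail handling. The four independent accumulators mirror the four
 * float lanes of a 128-bit NEON register. */
static float apply_filter_x4_float_sketch(const float *src,
                                          const float *filter,
                                          size_t filter_length)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i;

    for (i = 0; i < filter_length; i += 4) {
        acc0 += src[i + 0] * filter[i + 0];
        acc1 += src[i + 1] * filter[i + 1];
        acc2 += src[i + 2] * filter[i + 2];
        acc3 += src[i + 3] * filter[i + 3];
    }

    /* Horizontal add of the four partial sums. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

For a fixed length of 4 or 8 the loop could be dropped entirely, which
is the fully unrolled case suggested above.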

Matthieu

[...]