[FFmpeg-devel] [PATCH] swresample/resample: improve bessel function accuracy

Tue Nov 3 13:08:31 CET 2015

On Tue, Nov 3, 2015 at 2:17 AM, Clément Bœsch <u at pkh.me> wrote:
> On Mon, Nov 02, 2015 at 10:10:33PM -0500, Ganesh Ajjanagadde wrote:
>> On Mon, Nov 2, 2015 at 9:32 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>> > On Mon, Nov 2, 2015 at 6:49 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>> >> On Mon, Nov 2, 2015 at 6:37 PM, wm4 <nfxjfg at googlemail.com> wrote:
>> >>> On Mon,  2 Nov 2015 14:49:54 -0500
>> >>> Ganesh Ajjanagadde <gajjanagadde at gmail.com> wrote:
>> >>>
>> >>>> This improves accuracy for the bessel function, and this in turn should
>> >>>> improve the quality of the Kaiser window.
>> >>>
>> >>>
>> >>> "Should"? Does it or does it not? If you don't know, why the patch?
>> >>
>> >> It improves the window in the sense of mathematically matching the
>> >> Kaiser window expression due to the improved bessel function accuracy.
>> >> That claim I have verified and placed in the commit message with
>> >> evidence.
>> >>
>> >> What that translates into in terms of resampling accuracy, I don't
>> >> know. Normally, such things do help reduce the noise floor, but I
>> >> don't know an easy way to test via FATE or associated tools. I put
>> >> that caveat in the bottom lines of the patch.
>> >
>> > Turns out the init speed is definitely improved as well (~20% boost
>> > with default settings).
>> > I did a cheap trick of calling build filter 1000 times to get a large
>> > number of runs.
>> > Results (x86-64, Haswell, GNU/Linux):
>> >
>> > test: fate-swr-resample-dblp-44100-2626
>> >
>> > new:
>> > 995894468 decicycles in build_filter(loop 1000),     256 runs,      0 skips
>> > 1029719302 decicycles in build_filter(loop 1000),     512 runs,      0 skips
>> > 984101131 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
>> >
>> > old:
>> > 1250020763 decicycles in build_filter(loop 1000),     256 runs,      0 skips
>> > 1246353282 decicycles in build_filter(loop 1000),     512 runs,      0 skips
>> > 1220017565 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
>> >
>> > Thus, this patch has both things going for it luckily. Will leave to
>> > the maintainer (Michael I believe) to give details of accuracy
>> > benefits as translated to the actual resampling if easily testable,
>> > and I will add the perf numbers to the message.
>>
>> One last comment on performance (this is something I have discussed
>> with Timothy privately), if one removes the crippling
>> -fno-tree-vectorize, FATE still passes on my GCC 5.2 setup, and one
>> gets further ~5 % perf improvement here:
>> 949318408 decicycles in build_filter(loop 1000),     256 runs,      0 skips
>> 948795082 decicycles in build_filter(loop 1000),     512 runs,      0 skips
>> 928925076 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
>>
>> I am sure a lot of other places may benefit.
>
> It's been a while I want to remove -fno-tree-vectorize. We've already
> observed relatively small performance improvements in various places. If
> it's still buggy, compilers need to be fixed anyway.
>
> --extra-cflags=-fno-tree-vectorize is a good enough workaround if some
> people explicitly want to disable it.

Yes, but it needs to be documented. I was thinking of the following:
Add both a --enable-vectorize and --disable-vectorize in configure.
If neither have been used explicitly, disable vectorization by
default, but print out a highlighted (e.g warning/info) message saying
that build may be slower, but we keep disabled by default due to bad
compilers, etc. If users want to suppress the message, they should
choose one of the two.
That way:
1. Vectorization is still disabled by default so we don't regress anything.
2. All users get this info, and they can make a decision as they see fit.

>
> About the bessel patch itself, I'm happy that you benched it, because its
> performance are actually relevant: last time I tried to replace the
> current unrolled loop with a real loop (trying to match libavresample
> code), I observed a noticeable overall slow down while resampling (the
> init is very slow). Yes, no number, sorry :)

No need for a number, Michael already posted in a commit message
giving ~ 10% from a bessel unrolling for instance. The perf
improvement I showed is over master which has this.

It is an init, meant to be called once at the beginning. Of course if
the file is short, it may be relevant. If the init is being called
multiple times, then something is wrong with the API/app. Yes, we care
about performance everywhere, just saying that it seemed to me to be a
slightly less important issue here. I might be entirely wrong. If it
is that important, we should consider caching or building filters
offline for common sampling rates and number of taps, at least if
!CONFIG_SMALL. That will greatly speed it up. I believe wm4 did some
stuff with this recently, so he may be able to comment on this or
suggest something from an API user's perspective?

There is one last issue: copyright. I added a link to the boost
project as a comment, but I don't know the method for this, and Carl
sent me a private email about it asking me to do something.
Suggestions?

>
> [...]
>
> --
> Clément B.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>