[FFmpeg-devel] [PATCH] swresample/resample: improve bessel function accuracy
gajjanag at mit.edu
Tue Nov 3 04:10:33 CET 2015
On Mon, Nov 2, 2015 at 9:32 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> On Mon, Nov 2, 2015 at 6:49 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>> On Mon, Nov 2, 2015 at 6:37 PM, wm4 <nfxjfg at googlemail.com> wrote:
>>> On Mon, 2 Nov 2015 14:49:54 -0500
>>> Ganesh Ajjanagadde <gajjanagadde at gmail.com> wrote:
>>>> This improves accuracy for the bessel function, and this in turn should
>>>> improve the quality of the Kaiser window.
>>> "Should"? Does it or does it not? If you don't know, why the patch?
>> It improves the window in the sense of mathematically matching the
>> Kaiser window expression due to the improved bessel function accuracy.
>> That claim I have verified and placed in the commit message with
>> What that translates into in terms of resampling accuracy, I don't
>> know. Normally, such things do help reduce the noise floor, but I
>> don't know an easy way to test via FATE or associated tools. I put
>> that caveat in the bottom lines of the patch.
> Turns out the init speed is definitely improved as well (~20% boost
> with default settings).
> I did a cheap trick of calling build filter 1000 times to get a large
> number of runs.
> Results (x86-64, Haswell, GNU/Linux):
> test: fate-swr-resample-dblp-44100-2626
> 995894468 decicycles in build_filter(loop 1000), 256 runs, 0 skips
> 1029719302 decicycles in build_filter(loop 1000), 512 runs, 0 skips
> 984101131 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
> 1250020763 decicycles in build_filter(loop 1000), 256 runs, 0 skips
> 1246353282 decicycles in build_filter(loop 1000), 512 runs, 0 skips
> 1220017565 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
> Thus, this patch has both things going for it luckily. Will leave to
> the maintainer (Michael I believe) to give details of accuracy
> benefits as translated to the actual resampling if easily testable,
> and I will add the perf numbers to the message.
One last comment on performance (this is something I have discussed
with Timothy privately), if one removes the crippling
-fno-tree-vectorize, FATE still passes on my GCC 5.2 setup, and one
gets further ~5 % perf improvement here:
949318408 decicycles in build_filter(loop 1000), 256 runs, 0 skips
948795082 decicycles in build_filter(loop 1000), 512 runs, 0 skips
928925076 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
I am sure a lot of other places may benefit. Yes, I know it was buggy
on older GCC versions, etc, but I see no reason to cripple users with
The commit 973859f5230e77beea7bb59dc081870689d6d191 is ancient and has
a "sloppy" commit message.
"It provides no speed benefit" is not true, and even back then was
never backed up by hard numbers, though the commit would have been ok
on non x86 at the time, possibly even now. As can be seen clearly from
of Mans's evidence came from non x86 architectures which are Mans's
expertise. In the absence of any complaints, the change was committed
and x86 people have suffered from it. Yes, particular versions may be
buggy, but such things should be checked more carefully especially
since this is "free performance".
Mans's argument was that adventurous people could add it back in.
Needless to say, it was not documented anywhere, and casual users like
me (and I suspect some others here) were completely unaware of this.
It is quite obscure and easy to miss unless one digs through configure
(or config.mak), or finds out from someone else.
I also recall (and this prompted the private conversation with
Timothy) that Timothy's memcpy change from the loop actually yield no
difference with a half decent auto-vectoriser. It is a low hanging
fruit in my view, and I have posted a perf number above.
At the very least, we should inform users of this low hanging thing
e.g by printing out during configure an "info" message - they can try
out an "adventurous" build with this, test out FATE, and if it passes,
go ahead with a new build.
Or we could have a "dev option" for --enable-vectorize.
Or at a more ambitious scale, we could collect configs where this will
work, test for them, and only disable otherwise.
There are a number of possibilities, for which a separate thread is better.
>>> ffmpeg-devel mailing list
>>> ffmpeg-devel at ffmpeg.org
More information about the ffmpeg-devel