[FFmpeg-devel] [PATCH] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
Michael Niedermayer
michael at niedermayer.cc
Sun Oct 11 00:50:06 CEST 2015
On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote:
> On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde
> <gajjanagadde at gmail.com> wrote:
> > This uses Stein's binary GCD algorithm:
> > https://en.wikipedia.org/wiki/Binary_GCD_algorithm
> > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard architectures.
> > Have not benchmarked, so I can't comment
>
> Before you submit a patch that's supposed to make something faster,
> you should benchmark it to verify that it is in fact faster. Do this
> with inputs of various sizes on both 32- and 64-bit architectures and
> both with and without compilers that support __builtin_ctzll(v).
a quick benchmark with
--- a/tests/fate-run.sh
+++ b/tests/fate-run.sh
@@ -296,7 +296,7 @@ if test $err != 0 && test $gen != "no" ; then
fi
if test $err = 0; then
- rm -f $outfile $errfile $cmpfile $cleanfiles
+ rm -f $outfile $cmpfile $cleanfiles
and START/STOP_TIMER and running make fate -j12 -k
shows that the new code appears significantly faster (x86-64 sandybridge + gcc)
old:
aac-al18_44.err: 1748 decicycles in gcd, 4096 runs, 0 skips
aac-am00_88.err: 2496 decicycles in gcd, 4096 runs, 0 skips
aac-am00_88.err: 2498 decicycles in gcd, 8192 runs, 0 skips
aac-er_eld1001np_44_ep0.err: 2433 decicycles in gcd, 1023 runs, 1 skips
aac-er_eld1001np_44_ep0.err: 2467 decicycles in gcd, 2047 runs, 1 skips
aac-er_eld1001np_44_ep0.err: 2352 decicycles in gcd, 4095 runs, 1 skips
aac-er_eld2000np_48_ep0.err: 1737 decicycles in gcd, 4096 runs, 0 skips
aac-er_eld2100np_48_ep0.err: 1052 decicycles in gcd, 4096 runs, 0 skips
aac-fixed-al18_44.err: 1764 decicycles in gcd, 4096 runs, 0 skips
aac-fixed-er_eld1001np_44_ep0.err: 2484 decicycles in gcd, 4096 runs, 0 skips
aac-fixed-er_eld2000np_48_ep0.err: 1749 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-adx.err: 1411 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-adx.err: 1392 decicycles in gcd, 8192 runs, 0 skips
acodec-adpcm-adx.err: 1403 decicycles in gcd, 16384 runs, 0 skips
acodec-adpcm-adx-trellis.err: 1414 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-adx-trellis.err: 1416 decicycles in gcd, 8192 runs, 0 skips
acodec-adpcm-adx-trellis.err: 1417 decicycles in gcd, 16384 runs, 0 skips
acodec-adpcm-ima_qt.err: 1399 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-ima_qt.err: 1406 decicycles in gcd, 8192 runs, 0 skips
acodec-adpcm-ima_qt-trellis.err: 1487 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-ima_qt-trellis.err: 1489 decicycles in gcd, 8192 runs, 0 skips
adpcm_ms-stereo.err: 2751 decicycles in gcd, 4096 runs, 0 skips
alac-24-level-0.err: 684 decicycles in gcd, 4096 runs, 0 skips
alac-24-level-1.err: 684 decicycles in gcd, 4096 runs, 0 skips
alac-24-level-2.err: 696 decicycles in gcd, 4096 runs, 0 skips
alac-24-lpc-orders.err: 680 decicycles in gcd, 4096 runs, 0 skips
amrwb-8k85.err: 1052 decicycles in gcd, 1023 runs, 1 skips
dca-core.err: 1800 decicycles in gcd, 511 runs, 1 skips
dca-core.err: 1850 decicycles in gcd, 1023 runs, 1 skips
dca-xll.err: 1439 decicycles in gcd, 4096 runs, 0 skips
filter-metadata-scenedetect.err: 794 decicycles in gcd, 4096 runs, 0 skips
filter-pixdesc-gbrp14le.err: 660 decicycles in gcd, 127 runs, 1 skips
filter-pixfmts-hflip.err: 761 decicycles in gcd, 63 runs, 1 skips
filter-pixfmts-hflip.err: 731 decicycles in gcd, 127 runs, 1 skips
flac-24-comp-8.err: 665 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-canl3_sony_c.err: 824 decicycles in gcd, 1023 runs, 1 skips
h264-conformance-canl3_sony_c.err: 772 decicycles in gcd, 2047 runs, 1 skips
h264-conformance-frext-hpcafl_bcrm_c.err: 767 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcaflnl_bcrm_c.err: 724 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcamapalq_bcrm_b.err: 812 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcvfl_bcrm_a.err: 761 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcvflnl_bcrm_a.err: 764 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-ls_sva_d.err: 815 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-ls_sva_d.err: 820 decicycles in gcd, 8192 runs, 0 skips
h264-conformance-ls_sva_d.err: 808 decicycles in gcd, 16383 runs, 1 skips
lagarith-yuy2.err: 789 decicycles in gcd, 15 runs, 1 skips
lagarith-yuy2.err: 726 decicycles in gcd, 31 runs, 1 skips
lossless-meridianaudio.err: 1268 decicycles in gcd, 4096 runs, 0 skips
lossless-meridianaudio.err: 1279 decicycles in gcd, 8189 runs, 3 skips
lossless-meridianaudio.err: 1305 decicycles in gcd, 16381 runs, 3 skips
lossless-truehd-5.1-downmix-2.0.err: 1036 decicycles in gcd, 4096 runs, 0 skips
lossless-truehd-5.1.err: 1016 decicycles in gcd, 4096 runs, 0 skips
opus-testvector01.err: 1246 decicycles in gcd, 4096 runs, 0 skips
opus-testvector05.err: 1025 decicycles in gcd, 4096 runs, 0 skips
opus-testvector07.err: 1002 decicycles in gcd, 4096 runs, 0 skips
opus-testvector07.err: 1008 decicycles in gcd, 8191 runs, 1 skips
opus-testvector10.err: 1241 decicycles in gcd, 2047 runs, 1 skips
ra-288.err: 656 decicycles in gcd, 4096 runs, 0 skips
sipr-16k.err: 681 decicycles in gcd, 4096 runs, 0 skips
swr-resample-fltp-44100-2626.err: 704 decicycles in gcd, 127 runs, 1 skips
twinvq.err: 2410 decicycles in gcd, 4096 runs, 0 skips
vqf-demux.err: 2441 decicycles in gcd, 511 runs, 1 skips
vqf-demux.err: 2506 decicycles in gcd, 1023 runs, 1 skips
vqf-demux.err: 2478 decicycles in gcd, 2047 runs, 1 skips
vqf-demux.err: 2405 decicycles in gcd, 4095 runs, 1 skips
vsynth2-r210.err: 697 decicycles in gcd, 127 runs, 1 skips
vsynth2-r210.err: 698 decicycles in gcd, 255 runs, 1 skips
vsynth_lena-ffv1-v0.err: 723 decicycles in gcd, 127 runs, 1 skips
vsynth_lena-mpeg2-ilace.err: 1655 decicycles in gcd, 2047 runs, 1 skips
wavpack-cuesheet.err: 1199 decicycles in gcd, 4096 runs, 0 skips
wavpack-cuesheet.err: 1173 decicycles in gcd, 8192 runs, 0 skips
new:
aac-al18_44.err: 742 decicycles in gcd, 4096 runs, 0 skips
aac-am00_88.err: 743 decicycles in gcd, 4096 runs, 0 skips
aac-am00_88.err: 742 decicycles in gcd, 8192 runs, 0 skips
aac-er_eld1001np_44_ep0.err: 757 decicycles in gcd, 4096 runs, 0 skips
aac-er_eld2000np_48_ep0.err: 768 decicycles in gcd, 4096 runs, 0 skips
aac-er_eld2100np_48_ep0.err: 598 decicycles in gcd, 4096 runs, 0 skips
aac-fixed-al18_44.err: 745 decicycles in gcd, 4096 runs, 0 skips
aac-fixed-er_eld1001np_44_ep0.err: 881 decicycles in gcd, 4096 runs, 0 skips
aac-fixed-er_eld2000np_48_ep0.err: 813 decicycles in gcd, 4096 runs, 0 skips
ac3-fixed-2.0.err: 462 decicycles in gcd, 31 runs, 1 skips
ac3-fixed-2.0.err: 548 decicycles in gcd, 63 runs, 1 skips
ac3-fixed-2.0.err: 586 decicycles in gcd, 127 runs, 1 skips
ac3-fixed-2.0.err: 590 decicycles in gcd, 255 runs, 1 skips
acodec-adpcm-adx.err: 736 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-adx.err: 733 decicycles in gcd, 8192 runs, 0 skips
acodec-adpcm-adx.err: 739 decicycles in gcd, 16384 runs, 0 skips
acodec-adpcm-adx-trellis.err: 878 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-adx-trellis.err: 877 decicycles in gcd, 8192 runs, 0 skips
acodec-adpcm-adx-trellis.err: 877 decicycles in gcd, 16384 runs, 0 skips
acodec-adpcm-ima_qt.err: 724 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-ima_qt.err: 724 decicycles in gcd, 8192 runs, 0 skips
acodec-adpcm-ima_qt-trellis.err: 702 decicycles in gcd, 4096 runs, 0 skips
acodec-adpcm-ima_qt-trellis.err: 724 decicycles in gcd, 8192 runs, 0 skips
adpcm_ms-stereo.err: 736 decicycles in gcd, 4096 runs, 0 skips
alac-24-level-0.err: 406 decicycles in gcd, 4096 runs, 0 skips
alac-24-level-1.err: 415 decicycles in gcd, 4096 runs, 0 skips
alac-24-level-2.err: 418 decicycles in gcd, 4096 runs, 0 skips
alac-24-lpc-orders.err: 434 decicycles in gcd, 4096 runs, 0 skips
dca-xll.err: 953 decicycles in gcd, 4096 runs, 0 skips
delphine-cin-video.err: 786 decicycles in gcd, 511 runs, 1 skips
filter-metadata-scenedetect.err: 655 decicycles in gcd, 4096 runs, 0 skips
filter-pixfmts-il.err: 584 decicycles in gcd, 127 runs, 1 skips
filter-pixfmts-null.err: 866 decicycles in gcd, 3 runs, 1 skips
filter-pixfmts-null.err: 548 decicycles in gcd, 7 runs, 1 skips
filter-pixfmts-null.err: 671 decicycles in gcd, 15 runs, 1 skips
filter-pixfmts-null.err: 548 decicycles in gcd, 31 runs, 1 skips
filter-pixfmts-null.err: 504 decicycles in gcd, 63 runs, 1 skips
filter-pixfmts-null.err: 497 decicycles in gcd, 127 runs, 1 skips
filter-pixfmts-scale.err: 537 decicycles in gcd, 63 runs, 1 skips
filter-pixfmts-scale.err: 555 decicycles in gcd, 127 runs, 1 skips
flac-16-chmode-left_side.err: 899 decicycles in gcd, 510 runs, 2 skips
flac-24-comp-8.err: 366 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcafl_bcrm_c.err: 583 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcaflnl_bcrm_c.err: 691 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcamapalq_bcrm_b.err: 735 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcvfl_bcrm_a.err: 699 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-frext-hpcvflnl_bcrm_a.err: 679 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-ls_sva_d.err: 590 decicycles in gcd, 4096 runs, 0 skips
h264-conformance-ls_sva_d.err: 589 decicycles in gcd, 8192 runs, 0 skips
h264-conformance-ls_sva_d.err: 585 decicycles in gcd, 16383 runs, 1 skips
interplay-mve-16bit.err: 695 decicycles in gcd, 511 runs, 1 skips
lavf-ffm.err: 918 decicycles in gcd, 2047 runs, 1 skips
lossless-meridianaudio.err: 668 decicycles in gcd, 4096 runs, 0 skips
lossless-meridianaudio.err: 635 decicycles in gcd, 8192 runs, 0 skips
lossless-meridianaudio.err: 670 decicycles in gcd, 16384 runs, 0 skips
lossless-truehd-5.1-downmix-2.0.err: 511 decicycles in gcd, 4096 runs, 0 skips
lossless-truehd-5.1.err: 632 decicycles in gcd, 4096 runs, 0 skips
opus-testvector01.err: 849 decicycles in gcd, 4096 runs, 0 skips
opus-testvector05.err: 579 decicycles in gcd, 4096 runs, 0 skips
opus-testvector07.err: 669 decicycles in gcd, 4096 runs, 0 skips
opus-testvector07.err: 661 decicycles in gcd, 8192 runs, 0 skips
ra-288.err: 422 decicycles in gcd, 4096 runs, 0 skips
sipr-16k.err: 436 decicycles in gcd, 4096 runs, 0 skips
twinvq.err: 775 decicycles in gcd, 4096 runs, 0 skips
vqf-demux.err: 708 decicycles in gcd, 4096 runs, 0 skips
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151011/a62b6aa4/attachment.sig>
More information about the ffmpeg-devel
mailing list