[FFmpeg-devel] [PATCH] avcodec/jpeg2000: replace naive pow call with smarter exp2fi

Michael Niedermayer michaelni at gmx.at
Tue Dec 8 19:49:58 CET 2015


On Mon, Dec 07, 2015 at 09:50:49PM -0500, Ganesh Ajjanagadde wrote:
> pow is a very wasteful function for this purpose. A low hanging fruit
> would be simply to replace with exp2f, and that does yield some speedup.
> However, there are 2 drawbacks of this:
> 1. It does not exploit the integer nature of the argument.
> 2. (minor) Some platforms lack a proper exp2f routine, making benefits available
> only to non broken libm.
> 3. exp2f does not solve the same issue that plagues pow, namely terrible
> worst case performance. This is a fundamental issue known as the
> "table-maker's dilemma" recognized by Prof. Kahan himself and
> subsequently elaborated and researched by many others. All this is clear from benchmarks below.
> 
> This exploits the IEEE-754 format to get very good performance even in
> the worst case for integer powers of 2. This solves all the issues noted
> above. Function tested with clang usan over [-1000, 1000] (beyond range of
> relevance for this, which is [-255, 255]), patch itself with FATE.
> 
> Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
> the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
> Low number of runs also given to prove the point about worst case:
> 
> pow:
>  216270 decicycles in pow,       1 runs,      0 skips
>  110175 decicycles in pow,       2 runs,      0 skips
>   56085 decicycles in pow,       4 runs,      0 skips
>   29013 decicycles in pow,       8 runs,      0 skips
>   15472 decicycles in pow,      16 runs,      0 skips
>    8689 decicycles in pow,      32 runs,      0 skips
>    5295 decicycles in pow,      64 runs,      0 skips
>    3599 decicycles in pow,     128 runs,      0 skips
>    2748 decicycles in pow,     256 runs,      0 skips
>    2304 decicycles in pow,     511 runs,      1 skips
>    2072 decicycles in pow,    1022 runs,      2 skips
>    1963 decicycles in pow,    2044 runs,      4 skips
>    1894 decicycles in pow,    4091 runs,      5 skips
>    1860 decicycles in pow,    8184 runs,      8 skips
> 
> exp2f:
>  134140 decicycles in pow,       1 runs,      0 skips
>   68110 decicycles in pow,       2 runs,      0 skips
>   34530 decicycles in pow,       4 runs,      0 skips
>   17677 decicycles in pow,       8 runs,      0 skips
>    9175 decicycles in pow,      16 runs,      0 skips
>    4931 decicycles in pow,      32 runs,      0 skips
>    2808 decicycles in pow,      64 runs,      0 skips
>    1747 decicycles in pow,     128 runs,      0 skips
>    1208 decicycles in pow,     256 runs,      0 skips
>     952 decicycles in pow,     512 runs,      0 skips
>     822 decicycles in pow,    1024 runs,      0 skips
>     765 decicycles in pow,    2047 runs,      1 skips
>     722 decicycles in pow,    4094 runs,      2 skips
>     693 decicycles in pow,    8190 runs,      2 skips
> 
> exp2fi:
>    2740 decicycles in pow,       1 runs,      0 skips
>    1530 decicycles in pow,       2 runs,      0 skips
>     955 decicycles in pow,       4 runs,      0 skips
>     622 decicycles in pow,       8 runs,      0 skips
>     477 decicycles in pow,      16 runs,      0 skips
>     368 decicycles in pow,      32 runs,      0 skips
>     317 decicycles in pow,      64 runs,      0 skips
>     291 decicycles in pow,     128 runs,      0 skips
>     277 decicycles in pow,     256 runs,      0 skips
>     268 decicycles in pow,     512 runs,      0 skips
>     265 decicycles in pow,    1024 runs,      0 skips
>     263 decicycles in pow,    2048 runs,      0 skips
>     263 decicycles in pow,    4095 runs,      1 skips
>     260 decicycles in pow,    8191 runs,      1 skips
> 
> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>

probably ok

thx

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151208/1c780a3a/attachment.sig>


More information about the ffmpeg-devel mailing list