[FFmpeg-cvslog] avcodec/jpeg2000: replace naive pow call with smarter exp2fi
Ganesh Ajjanagadde
git at videolan.org
Wed Dec 9 04:03:58 CET 2015
ffmpeg | branch: master | Ganesh Ajjanagadde <gajjanagadde at gmail.com> | Mon Dec 7 21:31:41 2015 -0500| [42868ca569f33b91b0e61ecc3065e7199e9ca58a] | committer: Ganesh Ajjanagadde
avcodec/jpeg2000: replace naive pow call with smarter exp2fi
pow is a very wasteful function for this purpose. A low hanging fruit
would be simply to replace with exp2f, and that does yield some speedup.
However, there are 2 drawbacks of this:
1. It does not exploit the integer nature of the argument.
2. (minor) Some platforms lack a proper exp2f routine, making benefits available
only to non broken libm.
3. exp2f does not solve the same issue that plagues pow, namely terrible
worst case performance. This is a fundamental issue known as the
"table-maker's dilemma" recognized by Prof. Kahan himself and
subsequently elaborated and researched by many others. All this is clear from benchmarks below.
This exploits the IEEE-754 format to get very good performance even in
the worst case for integer powers of 2. This solves all the issues noted
above. Function tested with clang usan over [-1000, 1000] (beyond range of
relevance for this, which is [-255, 255]), patch itself with FATE.
Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
Low number of runs also given to prove the point about worst case:
pow:
216270 decicycles in pow, 1 runs, 0 skips
110175 decicycles in pow, 2 runs, 0 skips
56085 decicycles in pow, 4 runs, 0 skips
29013 decicycles in pow, 8 runs, 0 skips
15472 decicycles in pow, 16 runs, 0 skips
8689 decicycles in pow, 32 runs, 0 skips
5295 decicycles in pow, 64 runs, 0 skips
3599 decicycles in pow, 128 runs, 0 skips
2748 decicycles in pow, 256 runs, 0 skips
2304 decicycles in pow, 511 runs, 1 skips
2072 decicycles in pow, 1022 runs, 2 skips
1963 decicycles in pow, 2044 runs, 4 skips
1894 decicycles in pow, 4091 runs, 5 skips
1860 decicycles in pow, 8184 runs, 8 skips
exp2f:
134140 decicycles in pow, 1 runs, 0 skips
68110 decicycles in pow, 2 runs, 0 skips
34530 decicycles in pow, 4 runs, 0 skips
17677 decicycles in pow, 8 runs, 0 skips
9175 decicycles in pow, 16 runs, 0 skips
4931 decicycles in pow, 32 runs, 0 skips
2808 decicycles in pow, 64 runs, 0 skips
1747 decicycles in pow, 128 runs, 0 skips
1208 decicycles in pow, 256 runs, 0 skips
952 decicycles in pow, 512 runs, 0 skips
822 decicycles in pow, 1024 runs, 0 skips
765 decicycles in pow, 2047 runs, 1 skips
722 decicycles in pow, 4094 runs, 2 skips
693 decicycles in pow, 8190 runs, 2 skips
exp2fi:
2740 decicycles in pow, 1 runs, 0 skips
1530 decicycles in pow, 2 runs, 0 skips
955 decicycles in pow, 4 runs, 0 skips
622 decicycles in pow, 8 runs, 0 skips
477 decicycles in pow, 16 runs, 0 skips
368 decicycles in pow, 32 runs, 0 skips
317 decicycles in pow, 64 runs, 0 skips
291 decicycles in pow, 128 runs, 0 skips
277 decicycles in pow, 256 runs, 0 skips
268 decicycles in pow, 512 runs, 0 skips
265 decicycles in pow, 1024 runs, 0 skips
263 decicycles in pow, 2048 runs, 0 skips
263 decicycles in pow, 4095 runs, 1 skips
260 decicycles in pow, 8191 runs, 1 skips
Reviewed-by: Michael Niedermayer <michael at niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=42868ca569f33b91b0e61ecc3065e7199e9ca58a
---
libavcodec/jpeg2000.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/libavcodec/jpeg2000.c b/libavcodec/jpeg2000.c
index c988409..afd0966 100644
--- a/libavcodec/jpeg2000.c
+++ b/libavcodec/jpeg2000.c
@@ -192,6 +192,21 @@ void ff_jpeg2000_set_significance(Jpeg2000T1Context *t1, int x, int y,
// static const uint8_t lut_gain[2][4] = { { 0, 0, 0, 0 }, { 0, 1, 1, 2 } }; (unused)
+static inline float exp2fi(int x) {
+ /* Normal range */
+ if (-126 <= x && x <= 128)
+ return av_int2float(x+127 << 23);
+ /* Too large */
+ else if (x > 128)
+ return INFINITY;
+ /* Subnormal numbers */
+ else if (x > -150)
+ return av_int2float(1 << (x+149));
+ /* Negligibly small */
+ else
+ return 0;
+}
+
static void init_band_stepsize(AVCodecContext *avctx,
Jpeg2000Band *band,
Jpeg2000CodingStyle *codsty,
@@ -221,7 +236,7 @@ static void init_band_stepsize(AVCodecContext *avctx,
* R_b = R_I + log2 (gain_b )
* see ISO/IEC 15444-1:2002 E.1.1 eqn. E-3 and E-4 */
gain = cbps;
- band->f_stepsize = pow(2.0, gain - qntsty->expn[gbandno]);
+ band->f_stepsize = exp2fi(gain - qntsty->expn[gbandno]);
band->f_stepsize *= qntsty->mant[gbandno] / 2048.0 + 1.0;
break;
default:
More information about the ffmpeg-cvslog
mailing list