[Ffmpeg-devel] mpegaudiodec.c and armv5te optimizations
Wed Oct 4 00:47:23 CEST 2006
On Tuesday 03 October 2006 23:34, Michael Niedermayer wrote:
> > I would like to ask those who are familiar with mp3 decoding algorithm
> > in mpegaudiodec.c better if there could be any really nasty things
> > happening after changing current
> > #define MULH(a,b) (((int64_t)(a) * (int64_t)(b))>>32)
> > #define FIXHR(a) ((int)((a) * (1LL<<32) + 0.5))
> > to something like
> > #define MULH(a,b) (((int64_t)(a) * (int16_t)(b))>>16)
> > #define FIXHR(a) ((int16_t)((a) * (1LL<<16) + 0.5))
> > in low quality decoding mode.
> > I tried to decode a few mp3 files and the difference does not seem to be
> > very noticeable (samples seem to differ +-4 at most).
> i think the change should be ok (for ARM) for x86 it should be slower
Sure, I just wanted to know whether reducing the precision of these constants from
32 bits to 16 bits could have any other negative effect. And this optimization
can really only be used for processors that have a special instruction for
Anyway, here is a simple patch attached.
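
To make the precision question concrete, here is a small standalone sketch (it is not the attached patch). It puts the current and the proposed macros side by side and computes the difference between their results. The _HP/_LP suffixes and the mulh_error() helper are names made up for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Current macros from mpegaudiodec.c, as quoted above (32-bit constants). */
#define MULH_HP(a,b) (((int64_t)(a) * (int64_t)(b))>>32)
#define FIXHR_HP(a)  ((int)((a) * (1LL<<32) + 0.5))

/* Proposed low quality variants (16-bit constants). */
#define MULH_LP(a,b) (((int64_t)(a) * (int16_t)(b))>>16)
#define FIXHR_LP(a)  ((int16_t)((a) * (1LL<<16) + 0.5))

/* Hypothetical helper: high precision product minus low precision product
 * for a sample x and a coefficient c. The range 0 <= c <= 0.499 is an
 * assumption of this sketch; it keeps both rounded fixed-point constants
 * inside their integer types. */
static int mulh_error(int32_t x, double c)
{
    assert(c >= 0.0 && c <= 0.499);
    return (int)(MULH_HP(x, FIXHR_HP(c)) - MULH_LP(x, FIXHR_LP(c)));
}
```

With these definitions, mulh_error(1 << 20, 0.25) is 0 because 0.25 is exact in both formats, while mulh_error(1 << 20, 0.3) is -4 because 0.3 is rounded more coarsely in 16 bits. The difference scales with the magnitude of x.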
Tested with the latest MPlayer SVN (with some tweaks to get it to compile with
HAVE_ARMV5TE defined). Configured using:
CFLAGS="-O4 -pipe -ffast-math -fomit-frame-pointer -mcpu=arm926ej-s -DHAVE_ARMV5TE" ./configure --disable-libavcodec_mpegaudio_hp
Results of decoding an mp3 file to /dev/null:
ffmp3 (current): 58.7 seconds
ffmp3 (patched): 56.6 seconds
libmad:          46.2 seconds
The effect is minimal and quite disappointing. We gain very little (about 3.6%,
from 58.7 to 56.6 seconds) but lose some precision. It is understandable, though:
the compiler can't load and pack two 16-bit constants into one register, and it
does not take into account the 1 clock penalty incurred when the result of a
multiplication is used immediately in the next instruction.
So for any really useful results, fully assembler optimized code is required.
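
The constant packing mentioned above can be sketched in portable C. This is only a model of the idea, not code from the patch. The helper names are hypothetical, and the assumption is that the target has 32x16-bit multiplies that can pick either half of a register as the 16-bit operand (the ARMv5TE SMULWB/SMULWT pair behaves like this), so two constants can share one register.

```c
#include <stdint.h>

/* Pack two signed 16-bit constants into one 32-bit word:
 * lo goes to bits 0..15, hi goes to bits 16..31. */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Model of a 32x16 multiply that uses the bottom half of the packed
 * word and keeps the product shifted right by 16. */
static int32_t mulw_bottom(int32_t x, uint32_t packed)
{
    int16_t c = (int16_t)(packed & 0xFFFF);
    return (int32_t)(((int64_t)x * c) >> 16);
}

/* Same, but using the top half of the packed word. */
static int32_t mulw_top(int32_t x, uint32_t packed)
{
    int16_t c = (int16_t)(packed >> 16);
    return (int32_t)(((int64_t)x * c) >> 16);
}
```

Hand-written assembler can keep such a packed word in a single register and issue both multiplies from it, which halves the number of constant loads. It can also place an independent instruction between a multiply and the first use of its result.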
This correlates with another test. I also benchmarked simple_idct.c in the
dct-test code with the 16-bit multiplication macros using armv5te instructions
enabled. It gets only something like ~5% improvement, while the current
assembler optimized simple_idct_armv5te.S code provides ~50% better
performance (and I think it can still be improved).
Maybe I'll try to do proper fast low quality armv5te mp3 decoding
The only interesting thing left to try is to run this patched version on an Intel
XScale CPU. In the previous tests, getting rid of 32bit*32bit->64bit
multiplications (compiling with low precision) also gained quite little on
my arm926ej-s CPU, but it made quite a big difference on XScale, as tested by
Aurelien Jacobs. Maybe this patch has a more noticeable effect on
XScale as well.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 547 bytes
Desc: not available