[Ffmpeg-devel] [PATCH] simple_idct_armv5te optimization

Siarhei Siamashka siarhei.siamashka
Sat Sep 30 18:09:22 CEST 2006


Hello All,

Here is a patch that improves simple IDCT performance on ARMv5TE. The row
processing is almost completely rewritten: it takes instruction scheduling
into account and avoids redundant data loads (almost all data is processed
in registers, and there is even one spare, mostly unused 'lr' register
left :)).
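
To illustrate what "processed in registers" means for a row, here is a rough C
sketch of one row pass. It is not the code from the attached patch: the
function name idct_row_sketch is hypothetical, and the constants are assumed
values in the style of the usual fixed-point simple IDCT. The point is only
that all eight coefficients are loaded once and every intermediate lives in a
local variable.

```c
#include <stdint.h>

/* Assumed 14-bit fixed-point cosine constants (illustrative, not from the patch). */
enum { W1 = 22725, W2 = 21407, W3 = 19266, W4 = 16383,
       W5 = 12873, W6 = 8867,  W7 = 4520,  ROW_SHIFT = 11 };

/* One row pass: load the eight coefficients once, then work on locals only. */
static void idct_row_sketch(int16_t *row)
{
    const int r0 = row[0], r1 = row[1], r2 = row[2], r3 = row[3];
    const int r4 = row[4], r5 = row[5], r6 = row[6], r7 = row[7];

    /* Even part, with the rounding constant folded into the DC term. */
    int a0 = W4 * r0 + (1 << (ROW_SHIFT - 1));
    int a1 = a0, a2 = a0, a3 = a0;

    a0 +=  W2 * r2 + W4 * r4 + W6 * r6;
    a1 +=  W6 * r2 - W4 * r4 - W2 * r6;
    a2 += -W6 * r2 - W4 * r4 + W2 * r6;
    a3 += -W2 * r2 + W4 * r4 - W6 * r6;

    /* Odd part. */
    const int b0 = W1 * r1 + W3 * r3 + W5 * r5 + W7 * r7;
    const int b1 = W3 * r1 - W7 * r3 - W1 * r5 - W5 * r7;
    const int b2 = W5 * r1 - W1 * r3 + W7 * r5 + W3 * r7;
    const int b3 = W7 * r1 - W5 * r3 + W3 * r5 - W1 * r7;

    /* Butterfly and store; assumes arithmetic right shift of negative ints. */
    row[0] = (int16_t)((a0 + b0) >> ROW_SHIFT);
    row[7] = (int16_t)((a0 - b0) >> ROW_SHIFT);
    row[1] = (int16_t)((a1 + b1) >> ROW_SHIFT);
    row[6] = (int16_t)((a1 - b1) >> ROW_SHIFT);
    row[2] = (int16_t)((a2 + b2) >> ROW_SHIFT);
    row[5] = (int16_t)((a2 - b2) >> ROW_SHIFT);
    row[3] = (int16_t)((a3 + b3) >> ROW_SHIFT);
    row[4] = (int16_t)((a3 - b3) >> ROW_SHIFT);
}
```

In C the compiler decides what actually stays in registers; in hand-written
assembly that choice, and the order of the multiplies, is made explicitly.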

For benchmarking I used a modified dct-test on a Nokia 770:
./dct-test -i
...
IDCT INT: err_inf=1 err2=0.01318437 syserr=0.00285000 maxout=266 blockSumErr=64
IDCT INT: 136.6 kdct/s
...
IDCT SIMPLE-C: err_inf=1 err2=0.00667969 syserr=0.00130000 maxout=266 blockSumErr=64
IDCT SIMPLE-C: 103.6 kdct/s
...
IDCT SIMPLE-ARM: err_inf=1 err2=0.00667500 syserr=0.00130000 maxout=266 blockSumErr=64
IDCT SIMPLE-ARM: 140.8 kdct/s
...
IDCT SIMPLE-ARMv5TE: err_inf=1 err2=0.00667969 syserr=0.00130000 maxout=266 blockSumErr=64
IDCT SIMPLE-ARMv5TE: 153.4 kdct/s

After the patch:
IDCT SIMPLE-ARMv5TE: err_inf=1 err2=0.00667969 syserr=0.00130000 maxout=266 blockSumErr=64
IDCT SIMPLE-ARMv5TE: 158.8 kdct/s
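
For reference, the relative gain can be worked out from the kdct/s figures
above. A minimal helper for that arithmetic (the name speedup_percent is
hypothetical):

```c
/* Relative gain in percent when throughput goes from `before` to `after`. */
static double speedup_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

speedup_percent(153.4, 158.8) comes out at about 3.5, so the row rewrite alone
gains roughly 3.5% over the previous ARMv5TE code; against SIMPLE-C
(103.6 kdct/s) the gain is about 53%.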

Column processing can be optimized in a similar way later, and in general
there seem to be a lot of things that can be improved (pixel clipping, for
example). We can get really fast video decoding after a few iterations. It
is good that this work was started :)
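
As a reference point for the pixel clipping mentioned above, this is the plain
C operation that any optimized version would have to match: saturating a value
to the 0..255 range. The name clip_pixel_sketch is hypothetical and the code
is not taken from the patch.

```c
#include <stdint.h>

/* Saturate an intermediate result to the 8-bit pixel range 0..255. */
static uint8_t clip_pixel_sketch(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}
```

Written this way it costs two compares per pixel, which is why it looks like a
candidate for a faster formulation.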

By the way, it would be interesting to test this code on some Intel XScale
processors (judging by the Intel manual, XScale seems to be very sensitive to
instruction ordering and can incur a lot of performance penalties).

PS. I am really starting to like the ARMv5TE DSP instructions :)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple_idct_armv5te_rows.diff
Type: text/x-diff
Size: 17155 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20060930/596baeed/attachment.diff>


