[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().

Justin Ruggles justin.ruggles
Thu Feb 3 02:08:44 CET 2011


---
On 01/31/2011 03:19 PM, Ronald S. Bultje wrote:
> On Mon, Jan 31, 2011 at 2:53 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> On Mon, 31 Jan 2011, Justin Ruggles wrote:
>>
>>> I get some very weird mmx2 results when I remove the first sub and
>>> change jae to ja.
>>>
>>> Athlon64 X2 6000+
>>> sse2: 3006 -> 2753
>>> mmx2: 5228 -> 5453
>>>  mmx: 5490 -> 5430
>>>
>>> Atom 330
>>> sse2:  6834 -> 3779
>>> mmx2:  9951 -> 10525
>>>  mmx: 11390 -> 11325
>>>
>>> Both CPUs are consistent in the change, except that on Athlon64 the mmx2
>>> version is slower than the mmx version.  What do you suggest?
>>
>> I usually blame such weird results on code alignment, but I have no
>> systematic way to fix them.
> Same here, try adding an ALIGN <num> (8 or 16) directly before a loop
> statement, or disassemble before/after and see where alignment could
> cause issues.

Thanks for the suggestion.  Below is a chart of the results for
adding ALIGN 8 and ALIGN 16 before each of the 2 loops.

LOOP1/LOOP2   MMX   MMX2   SSE2
-------------------------------
NONE/NONE :  5270   5283   2757
   NONE/8 :  5200   5077   2644
  NONE/16 :  5723   3961   2161
   8/NONE :  5214   5339   2787
      8/8 :  5198*  5083   2722
     8/16 :  5936   3902   2128
  16/NONE :  6613   4788   2580
     16/8 :  5490   3702   2020
    16/16 :  5474   3680*  2000*

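The attached patch uses ALIGN 8 for both loops in the MMX version and
ALIGN 16 for both loops in the MMX2 (mmxext) and SSE2 versions.  These
are the fastest results in each column of the chart (marked with *).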
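
As a quick cross-check of the chart, the fastest alignment per column can
be picked out mechanically.  The sketch below is only an illustration and
is not part of the patch; the numbers are copied from the chart, and the
names timings, columns and best_alignment are made up for this example.

```python
# Timings copied from the chart above (lower is faster).
# Keys are the (LOOP1, LOOP2) alignment; values are (MMX, MMX2, SSE2).
timings = {
    ("NONE", "NONE"): (5270, 5283, 2757),
    ("NONE", "8"):    (5200, 5077, 2644),
    ("NONE", "16"):   (5723, 3961, 2161),
    ("8", "NONE"):    (5214, 5339, 2787),
    ("8", "8"):       (5198, 5083, 2722),
    ("8", "16"):      (5936, 3902, 2128),
    ("16", "NONE"):   (6613, 4788, 2580),
    ("16", "8"):      (5490, 3702, 2020),
    ("16", "16"):     (5474, 3680, 2000),
}
columns = ("MMX", "MMX2", "SSE2")


def best_alignment(col):
    """Return the (LOOP1, LOOP2) pair with the lowest timing for one column."""
    idx = columns.index(col)
    return min(timings, key=lambda pair: timings[pair][idx])


best = {col: best_alignment(col) for col in columns}

for col in columns:
    loop1, loop2 = best[col]
    value = timings[best[col]][columns.index(col)]
    print(f"{col:>4}: {loop1}/{loop2} ({value})")
```

This agrees with the starred entries: 8/8 for MMX and 16/16 for MMX2 and
SSE2.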

 libavcodec/Makefile         |    6 ++-
 libavcodec/ac3dsp.c         |   51 ++++++++++++++++++++++++++++++++
 libavcodec/ac3dsp.h         |   44 ++++++++++++++++++++++++++++
 libavcodec/ac3enc.c         |   35 ++++------------------
 libavcodec/x86/Makefile     |    4 ++
 libavcodec/x86/ac3dsp.asm   |   67 +++++++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/ac3dsp_mmx.c |   45 +++++++++++++++++++++++++++++
 libavcodec/x86/x86util.asm  |   10 ++++++
 8 files changed, 232 insertions(+), 30 deletions(-)
 create mode 100644 libavcodec/ac3dsp.c
 create mode 100644 libavcodec/ac3dsp.h
 create mode 100644 libavcodec/x86/ac3dsp.asm
 create mode 100644 libavcodec/x86/ac3dsp_mmx.c

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-x86-optimized-versions-of-exponent_min.patch
Type: text/x-patch
Size: 12978 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110202/572e03bb/attachment.bin>


