[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().

Justin Ruggles justin.ruggles
Mon Jan 31 19:18:18 CET 2011


On 01/31/2011 12:21 AM, Loren Merritt wrote:

>> +cglobal ac3_exponent_min_%1, 3,4,2, exp, reuse_blks, expn, offset
>> +    cmp  reuse_blksq, 0
> 
> shl sets flags.

OK, this works fine with:
shl reuse_blksq, 8
jz .end
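
For reference, here is a rough scalar C sketch of what the quoted loop
computes, as read off the assembly in this patch. The function name is made
up for illustration; the 256-byte block stride comes from the shl by 8 and
the sub by 256. It assumes expn <= 256 so that blocks do not overlap. The
early return plays the role of the shl/jz pair: when reuse_blks is 0 the
shifted value is also 0, so ZF is set and the function skips straight to
.end.

```c
#include <stdint.h>

/* Scalar model of the quoted SIMD loop (hypothetical name, not the patch's
 * C fallback). For each coefficient i, store the minimum of exp[i] over
 * block 0 and the following reuse_blks blocks, 256 bytes apart, into
 * block 0. Assumes expn <= 256. */
static void exponent_min_model(uint8_t *exp, int reuse_blks, int expn)
{
    if (reuse_blks == 0)            /* shl reuse_blksq, 8 ; jz .end */
        return;

    for (int i = 0; i < expn; i++) {
        /* mova m0, [expq+offsetq] with offset = reuse_blks * 256 */
        uint8_t min_exp = exp[i + 256 * reuse_blks];

        /* .nextblk: walk back through the blocks down to offset 0 */
        for (int blk = reuse_blks - 1; blk >= 0; blk--) {
            uint8_t e = exp[i + 256 * blk];
            if (e < min_exp)        /* unsigned byte minimum, like PMINUB */
                min_exp = e;
        }

        exp[i] = min_exp;           /* mova [expq], m0 */
    }
}
```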

>> +    je .end
>> +    sub        expnq, mmsize
>> +    shl  reuse_blksq, 8
>> +.nextexp:
>> +    mov      offsetq, reuse_blksq
>> +    mova          m0, [expq+offsetq]
>> +    sub      offsetq, 256
>> +.nextblk:
>> +    PMINUB        m0, [expq+offsetq], m1
>> +    sub      offsetq, 256
>> +    jae .nextblk
>> +    mova      [expq], m0
>> +    add         expq, mmsize
>> +    sub        expnq, mmsize
>> +    jae .nextexp
> 
> ja, and remove the first sub
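
As a sanity check on that suggestion, here is a small C model of the two
loop-control variants. It only counts iterations. The function names are
invented for illustration, and it assumes expn is a nonzero multiple of the
step (mmsize); for other values the two forms do not give the same trip
count. jae is modelled as "no borrow from the sub" and ja as "no borrow and
result not zero".

```c
#include <stddef.h>

/* Original form: subtract once before the loop, then "sub / jae". */
static size_t trips_sub_jae(size_t expn, size_t step)
{
    size_t trips = 0;
    int borrow;

    expn -= step;                   /* sub expnq, mmsize (before the loop) */
    do {
        trips++;                    /* loop body runs once */
        borrow = expn < step;       /* CF produced by the sub below */
        expn -= step;               /* sub expnq, mmsize */
    } while (!borrow);              /* jae .nextexp: taken when CF = 0 */
    return trips;
}

/* Suggested form: no initial subtract, then "sub / ja". */
static size_t trips_sub_ja(size_t expn, size_t step)
{
    size_t trips = 0;
    int borrow, zero;

    do {
        trips++;                    /* loop body runs once */
        borrow = expn < step;       /* CF produced by the sub below */
        expn -= step;               /* sub expnq, mmsize */
        zero = (expn == 0);         /* ZF produced by the sub */
    } while (!borrow && !zero);     /* ja .nextexp: CF = 0 and ZF = 0 */
    return trips;
}
```

Under the stated assumption both variants run expn / step iterations, so
the change removes one instruction without altering the result.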


I get some very weird mmx2 results when I remove the first sub and
change jae to ja (before -> after):

Athlon64 X2 6000+
sse2: 3006 -> 2753
mmx2: 5228 -> 5453
 mmx: 5490 -> 5430

Atom 330
sse2:  6834 -> 3779
mmx2:  9951 -> 10525
 mmx: 11390 -> 11325

Both CPUs show the same trend: sse2 and mmx get faster while mmx2 gets
slower.  On the Athlon64 the mmx2 version even ends up slower than the mmx
version.  What do you suggest?

-Justin



