[FFmpeg-devel] [PATCH] Add x86-optimized function ac3_or_abs_int16() and use in log2_tab().
Sat Feb 12 13:48:23 CET 2011
Loren Merritt <lorenm at u.washington.edu> writes:
>>+%macro PABSW2_MMX 6 ; dst1, dst2, src1, src2, temp1, temp2
>>+ mova %1, %3
>>+ mova %2, %4
>>+ mova %5, %1
>>+ mova %6, %2
>>+ psraw %5, 15
>>+ psraw %6, 15
>>+ pxor %1, %5
>>+ pxor %2, %6
>>+ psubw %1, %5
>>+ psubw %2, %6
>>+%macro PABSW2_SSSE3 6 ; dst1, dst2, src1, src2, unused, unused
>>+ pabsw %1, %3
>>+ pabsw %2, %4
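For readers following the quoted macros, here is a rough scalar C model of what one 16-bit lane of the psraw/pxor/psubw sequence does. It is only a sketch: the name abs16_signmask is made up for illustration and is not part of the patch or of FFmpeg. The sign mask is built with a comparison instead of a right shift, because right-shifting a negative value is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of PABSW2_MMX. */
static uint16_t abs16_signmask(int16_t x)
{
    /* psraw by 15: all ones for negative x, all zeros otherwise */
    uint16_t mask = (x < 0) ? 0xFFFFu : 0x0000u;
    /* pxor then psubw: (x ^ mask) - mask, wrapping at 16 bits */
    return (uint16_t)(((uint16_t)x ^ mask) - mask);
}
```

For -32768 this wraps to 0x8000, which is also what pabsw stores in the lane for that input.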
> Already in x86util.asm
> But you don't actually want to compute (bit-or of abs), right? You
> want to compute (log2 of max of abs). Since MMX has min/max
> instructions and doesn't have abs, try running signed min/max first
> and doing abs only once in the tail.
> That way might be faster in C too, on cpus with scalar cmov/min/max
> and without scalar abs.
So the description could be made more general, allowing both approaches.
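A minimal C sketch of the two approaches, to show why either would do for this purpose: the highest set bit of the OR of the absolute values is the highest set bit of the largest absolute value, so the log2 comes out the same. The names or_abs_int16_c, max_abs_int16_c and ilog2 are placeholders for illustration, not the functions in the patch, and ilog2 is assumed to return 0 for an input of 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Approach 1: bit-or of the absolute values. */
static int or_abs_int16_c(const int16_t *src, size_t len)
{
    int acc = 0;
    for (size_t i = 0; i < len; i++) {
        int v = src[i];            /* widen so -(-32768) does not overflow */
        acc |= v < 0 ? -v : v;
    }
    return acc;
}

/* Approach 2: signed min/max in the loop, abs only once in the tail. */
static int max_abs_int16_c(const int16_t *src, size_t len)
{
    int lo = 0, hi = 0;            /* 0 is a safe start: max |x| >= 0 */
    for (size_t i = 0; i < len; i++) {
        int v = src[i];
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    return -lo > hi ? -lo : hi;
}

/* Placeholder helper: floor(log2(v)), returning 0 for v == 0. */
static int ilog2(unsigned v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}
```

The two return values generally differ, but ilog2() of either one gives the same result, which is all the caller needs here.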
mans at mansr.com