[FFmpeg-devel] [PATCH] Altivec split-radix FFT

Guillaume POIRIER poirierg
Wed Aug 26 00:07:05 CEST 2009


Hi,

On Mon, Aug 24, 2009 at 1:43 PM, Loren Merritt<lorenm at u.washington.edu> wrote:
> Fixed oprofile thanks to Måns.
> Now measures 1.85x FFT speedup, on the sizes vorbis uses.
>
> On Mon, 24 Aug 2009, Guillaume POIRIER wrote:
>
>>> I used raw asm rather than intrinsics because gcc adds a ginormous
>>> overhead
>>> to each function call. Is there anything I need to do to make it work on
>>> ppc64, if it doesn't already?
>>
>> I'll look into this. I think all you need to do is avoid referring to
>> the general purpose registers by name explicitly.
>
> You mean using register numbers instead? That's the only thing gas
> supports.

I have no idea what I'm talking about ;-)

Your asm looks OK at first glance, but something in it keeps it from
working on PPC64:

Starting program: /home/guillaume/ffmpeg-svn/libavcodec/fft-test
FFT 512 test
Checking...

Program received signal SIGSEGV, Segmentation fault.
0x1001115800000000 in ?? ()
(gdb) up
#1  0x00000000100107d0 in .ff_fft_calc_altivec ()
(gdb) disassemble $pc-32 $pc+32
Dump of assembler code from 0x100107b0 to 0x100107f0:
0x00000000100107b0 <.ff_fft_calc_altivec+524>:  ld      r9,-31728(r2)
0x00000000100107b4 <.ff_fft_calc_altivec+528>:  rldicr  r0,r0,3,60
0x00000000100107b8 <.ff_fft_calc_altivec+532>:  add     r9,r9,r0
0x00000000100107bc <.ff_fft_calc_altivec+536>:  ld      r11,0(r9)
0x00000000100107c0 <.ff_fft_calc_altivec+540>:  mtctr   r11
0x00000000100107c4 <.ff_fft_calc_altivec+544>:  stw     r2,-4(r1)
0x00000000100107c8 <.ff_fft_calc_altivec+548>:  li      r2,16
0x00000000100107cc <.ff_fft_calc_altivec+552>:  bctrl
0x00000000100107d0 <.ff_fft_calc_altivec+556>:  lwz     r2,-4(r1)
0x00000000100107d4 <.ff_fft_calc_altivec+560>:  ld      r9,608(r31)
0x00000000100107d8 <.ff_fft_calc_altivec+564>:  lwz     r0,0(r9)
0x00000000100107dc <.ff_fft_calc_altivec+568>:  extsw   r0,r0
0x00000000100107e0 <.ff_fft_calc_altivec+572>:  cmpwi   cr7,r0,4
0x00000000100107e4 <.ff_fft_calc_altivec+576>:  bgt-    cr7,0x10010810 <.ff_fft_calc_altivec+620>
0x00000000100107e8 <.ff_fft_calc_altivec+580>:  ld      r11,616(r31)
0x00000000100107ec <.ff_fft_calc_altivec+584>:  ld      r9,608(r31)
End of assembler dump.
(gdb) print $r2
$1 = 16
(gdb) print $r1
$2 = 17359809783712

I need to look into this. I've never ported code to PPC64, so now's a
good time to start...
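
The faulting address is suggestive: 0x1001115800000000 has what looks like
an ordinary text-section address (0x10011158, just past
ff_fft_calc_altivec at 0x100107b0) in its upper 32 bits and zeros in the
lower 32. On big-endian PowerPC that is exactly what an 8-byte `ld` returns
when it reads a 4-byte entry followed by a zero word, and the
`rldicr r0,r0,3,60` before the `ld` above does scale the table index by 8.
A minimal sketch of that arithmetic follows. It is only a hypothesis to
check (for example, whether some dispatch table ends up with 4-byte
entries), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order, as a ".long"
 * directive lays it out on PowerPC. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Read 8 bytes as one big-endian 64-bit value, as "ld" does on PowerPC. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    size_t k;

    assert(p != NULL);
    for (k = 0; k < 8; k++)
        v = (v << 8) | p[k];
    return v;
}

/* Hypothetical scenario: a 4-byte code address followed by a zero word,
 * fetched with a single 8-byte load at the same address. */
static uint64_t misread_entry(uint32_t code_addr)
{
    uint8_t mem[8];

    store_be32(mem, code_addr);   /* the 4-byte entry */
    store_be32(mem + 4, 0);       /* whatever follows it; zero here */
    return load_be64(mem);        /* what a 64-bit "ld" would hand to mtctr */
}
```

With code_addr = 0x10011158 this yields 0x1001115800000000, the same value
gdb reports as the PC.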


>> Your patch doesn't apply cleanly here:
>> patch -p1 --dry-run < ../fft_altivec.diff
>> [...]
>> Did I miss something?
>
> That command works for me, on top of svn-r19689.

The problem was that my version of "patch" was confused by the CR/LF
line endings. Fixed locally.
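
A CR/LF-mangled diff can be normalized with a few lines of C before it is
fed to patch. The following is a minimal sketch (strip_crlf and
strip_crlf_str are hypothetical names, not FFmpeg functions): it turns
every CR/LF pair into a bare LF and leaves any other carriage return alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Rewrite CR/LF pairs as bare LF, in place.
 * A lone '\r' not immediately followed by '\n' is kept, so only
 * DOS-style line endings are touched.
 * Returns the new length; no NUL terminator is written here. */
static size_t strip_crlf(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;   /* drop the CR; the LF is copied on the next pass */
        buf[out++] = buf[in];
    }
    return out;
}

/* Convenience wrapper for NUL-terminated strings. */
static char *strip_crlf_str(char *s)
{
    s[strip_crlf(s, strlen(s))] = '\0';
    return s;
}
```

To fix a whole patch file, read it into a buffer, call strip_crlf() and
write back the returned number of bytes; the dos2unix utility, where
installed, does essentially the same conversion.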

Guillaume
-- 
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.

