[FFmpeg-devel] TEXTRELs in fft_mmx.asm (was Re: [PATCH] split-radix FFT)

Dominik 'Rathann' Mierzejewski dominik
Sat Nov 1 22:21:01 CET 2008


Hi.

Sorry for dragging out an old thread, but...

[...]
> From 548a4d20ec39d14829f46ae2ee0325908d095097 Mon Sep 17 00:00:00 2001
> From: Loren Merritt <pengvado at akuvian.org>
> Date: Wed, 23 Jul 2008 22:55:09 -0600
> Subject: [PATCH] split-radix FFT
>  c is 1.9x faster than previous c (on various x86 cpus), sse is 1.6x faster than previous sse.
> 
> ---
>  libavcodec/Makefile         |    5 +
>  libavcodec/dsputil.h        |    9 +-
>  libavcodec/fft.c            |  371 ++++++++++++++++++++++------------
>  libavcodec/i386/fft_3dn.c   |  111 +----------
>  libavcodec/i386/fft_3dn2.c  |  110 ++---------
>  libavcodec/i386/fft_mmx.asm |  467 +++++++++++++++++++++++++++++++++++++++++++
>  libavcodec/i386/fft_sse.c   |  149 ++++----------
>  7 files changed, 783 insertions(+), 439 deletions(-)
>  create mode 100644 libavcodec/i386/fft_mmx.asm
> 
[...]
> diff --git a/libavcodec/i386/fft_mmx.asm b/libavcodec/i386/fft_mmx.asm
> new file mode 100644
> index 0000000..c0a9bd5
> --- /dev/null
> +++ b/libavcodec/i386/fft_mmx.asm
[...]
> +%macro DECL_FFT 2-3 ; nbits, cpu, suffix
> +%xdefine list_of_fft fft4%2, fft8%2
> +%if %1==5
> +%xdefine list_of_fft list_of_fft, fft16%2
> +%endif
> +
> +%assign n 1<<%1
> +%rep 17-%1
> +%assign n2 n/2
> +%assign n4 n/4
> +%xdefine list_of_fft list_of_fft, fft %+ n %+ %3%2
> +
> +align 16
> +fft %+ n %+ %3%2:
> +    call fft %+ n2 %+ %2
> +    add r0, n*4 - (n&(-2<<%1))
> +    call fft %+ n4 %+ %2
> +    add r0, n*2 - (n2&(-2<<%1))
> +    call fft %+ n4 %+ %2
> +    sub r0, n*6 + (n2&(-2<<%1))
> +    lea r1, [ff_cos_ %+ n GLOBAL]
> +    mov r2d, n4/2
> +    jmp pass%3%2
> +
> +%assign n n*2
> +%endrep
> +%undef n
> +
> +align 8
> +dispatch_tab%3%2: pointer list_of_fft
> +
> +; On x86_32, this function does the register saving and restoring for all of fft.
> +; The others pass args in registers and don't spill anything.
> +cglobal ff_fft_dispatch%3%2, 2,5,0, z, nbits
> +    lea r2, [dispatch_tab%3%2 GLOBAL]
> +    mov r2, [r2 + (nbitsq-2)*gprsize]
> +    call r2
> +    RET
> +%endmacro ; DECL_FFT
> +
> +DECL_FFT 5, _sse
> +DECL_FFT 5, _sse, _interleave
> +DECL_FFT 4, _3dn
> +DECL_FFT 4, _3dn, _interleave
> +DECL_FFT 4, _3dn2
> +DECL_FFT 4, _3dn2, _interleave

... these 6 DECL_FFT invocations seem to be causing text relocations (textrels) even on x86_64.
I've already given up on avoiding textrels in FFmpeg on x86_32,
but on x86_64 this is the only problematic case.

Here's how I found them:

$ ./configure --enable-shared --disable-static --enable-gpl --enable-swscale --enable-postproc --enable-avfilter --enable-avfilter-lavf --enable-pthreads
$ make
...
yasm -f elf -DARCH_X86_64 -m amd64 -DPIC -g dwarf2 -I i386/ -o i386/fft_mmx.o i386/fft_mmx.asm
...

Note that fft_mmx.asm is assembled with -DPIC, i.e. as position-independent code.

$ eu-readelf -l libavcodec.so.52
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x44966c 0x44966c R E 0x200000
  LOAD           0x449670 0x0000000000649670 0x0000000000649670 0x014c78 0x2b8530 RW  0x200000
  DYNAMIC        0x4510e0 0x00000000006510e0 0x00000000006510e0 0x0001f0 0x0001f0 RW  0x8
  NOTE           0x000190 0x0000000000000190 0x0000000000000190 0x000024 0x000024 R   0x4
  GNU_EH_FRAME   0x4295c0 0x00000000004295c0 0x00000000004295c0 0x005ee4 0x005ee4 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x8

 Section to Segment mapping:
  Segment Sections...
   00      [RO: .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame]
   01      .ctors .dtors .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
   02      .dynamic
   03      [RO: .note.gnu.build-id]
   04      [RO: .eh_frame_hdr]
   05     

(note the size of the first (text) segment)

$ eu-readelf -r libavcodec.so.52

Relocation section [ 7] '.rela.dyn' for section [ 0] '' at offset 0x10048 contains 3839 entries:
  Offset              Type            Value               Addend Name
  0x000000000036b948  X86_64_RELATIVE 000000000000000000  +3582944 
  0x000000000036b950  X86_64_RELATIVE 000000000000000000  +3583024 
  0x000000000036b958  X86_64_RELATIVE 000000000000000000  +3583200 
...
  0x000000000036cba8  X86_64_RELATIVE 000000000000000000  +3590800 
  0x000000000036cbb0  X86_64_RELATIVE 000000000000000000  +3590864 
  0x000000000036cbb8  X86_64_RELATIVE 000000000000000000  +3590928 
  0x00000000006496a0  X86_64_RELATIVE 000000000000000000  +4358183 
  0x00000000006496b0  X86_64_RELATIVE 000000000000000000  +3716537 
  0x00000000006496c0  X86_64_RELATIVE 000000000000000000  +3716541 
...

Every relocation whose offset falls within a segment that is loaded
without write permission is a text relocation[1]. Note that all the
relocations at the beginning fall within the first segment
(0x0 to 0x44966c), which has only Read and Execute permissions.
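
As a rough illustration of that check (a sketch only, not something I ran
as part of the investigation), the rule can be written in a few lines of
Python. The segment table and the two sample relocation lines are copied
from the eu-readelf output above; the names LOAD_SEGMENTS,
is_text_relocation and relocation_offsets are made up for this example.

```python
# LOAD segments copied from the `eu-readelf -l` output: (vaddr, memsz, flags)
LOAD_SEGMENTS = [
    (0x000000, 0x44966C, "R E"),
    (0x649670, 0x2B8530, "RW"),
]


def is_text_relocation(offset, segments=LOAD_SEGMENTS):
    """Return True if offset lies in a LOAD segment mapped without write permission."""
    for vaddr, memsz, flags in segments:
        if vaddr <= offset < vaddr + memsz:
            return "W" not in flags
    return False  # outside every LOAD segment: not classified here


def relocation_offsets(readelf_output):
    """Yield the Offset column from `eu-readelf -r`-style relocation lines."""
    for line in readelf_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            yield int(fields[0], 16)


# Two relocation lines taken from the listing: one per LOAD segment.
SAMPLE = """\
  0x000000000036b948  X86_64_RELATIVE 000000000000000000  +3582944
  0x00000000006496a0  X86_64_RELATIVE 000000000000000000  +4358183
"""

for off in relocation_offsets(SAMPLE):
    kind = "text relocation" if is_text_relocation(off) else "ok (writable segment)"
    print(f"{off:#x}: {kind}")
```

With the values above, 0x36b948 lands in the R E segment and is flagged,
while 0x6496a0 lands in the RW segment and is not.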

Let's find where they come from:
$ for addr in `eu-readelf -r libavcodec.so.52 | grep 0x000000000036 | awk '{print $1;}'` ; do eu-addr2line -f -S -e libavcodec.so.52 $addr ; done | grep asm | sort -u
i386/fft_mmx.asm:461
i386/fft_mmx.asm:462
i386/fft_mmx.asm:463
i386/fft_mmx.asm:464
i386/fft_mmx.asm:465
i386/fft_mmx.asm:466

Loren (or anyone else familiar with the code): is it possible to avoid them?

PS. The same can be done with the equivalent tools from binutils (readelf and addr2line, i.e. without the eu- prefix).

[1] http://people.redhat.com/drepper/textrelocs.html

-- 
MPlayer http://mplayerhq.hu | Livna http://rpm.livna.org
There should be a science of discontent. People need hard times and
oppression to develop psychic muscles.
	-- from "Collected Sayings of Muad'Dib" by the Princess Irulan



