[FFmpeg-devel] [PATCH] fft_tab_neon is not PIC enough
jreiser at bitwagon.com
Sun Jul 15 16:55:34 CEST 2012
On 07/15/2012 02:04 AM, Reimar Döffinger wrote:
> On 14 Jul 2012, at 17:24, John Reiser <jreiser at bitwagon.com> wrote:
>> fft_tab_neon has addresses in .rodata, which is bad for shared libraries.
> Could you please be more specific about what exactly "bad" means?
"Bad" means "does not fulfill the expectations of residing in a read-only
segment." The question becomes, "After what time does 'read-only' apply?"
That is, what is the earliest time after which there are no more writes?
For a physical ROM, the time is immediately after hardware manufacturing.
For the current code, the time is after return from dlopen() or LoadLibrary()
of the shared library into the current process. That's a large
difference in time, and it matters for some environments.
My environment wants "no more writes to .text" after placement of the
library into the file system of an operating system, and current code
does not work with this restriction. Current code requires writing
those addresses during dlopen() or LoadLibrary() into a process.
For Linux or Android, a shared library containing current fft_tab_neon
is marked with the property DT_TEXTREL, which means that the read-only
.text segment requires relocation. After mmap()ing the .text segment
as read-only (no PF_W in .p_flags) then dlopen() applies PROT_W to the
.text segment using mprotect(), relocates the addresses, then removes
PROT_W from the .text segment. The whole operation takes time and
complex code. My environment does not support those extra operations.
>> The attached patch provides the additional relocation by changing
>> the table to an array of branch instructions. The patch also
>> checks the bounds of the index. In Thumb mode the 11-bit displacement
>> field of a short unconditional branch can span +/- 1024 words
>> which is +/- 2048 bytes. By moving the routine to near the middle,
>> then the other code could double in size before exceeding
>> the limit on span.
> This seems extremely brittle to me, especially since I am not aware of a reason why the linker couldn't reorder the functions in whatever way it wants, so the absolute minimum would be supporting an offset that can handle the size of the whole .text segment (or reorganize the code so the linker will not be able to separate it, e.g. by putting everything in a single function, but that is still ugly and brittle).
> For other architectures we have solved this problem by storing the difference between a label and the symbols in a table (those will always be real constants), at full 32 or 64 bits and manually add those onto the PC, then jump to the calculated address.
If you want the property that the linker can change placement of the fft<N>
functions arbitrarily, then change the residence of the fft_tab_neon table
from .rodata to just .data, and discard the patch I submitted. (There is
an assembler pseudo-operation 'const' which enhances placement into section
.rodata, but I could not find something equivalent for section .data.)
The current code is broken for my environment. The patched code works in
my environment and all other known current cases. Enhancements are welcome,
but DT_TEXTREL is prohibited.
More information about the ffmpeg-devel