[Ffmpeg-devel] [RFC] Addition of JIT accelerated scaler for ARM into libswscale

Siarhei Siamashka siarhei.siamashka
Tue Jan 23 23:39:00 CET 2007

On Tuesday 23 January 2007 14:30, Reimar Doeffinger wrote:

> > A natural solution for getting good scaler performance is to use JIT
> > style dynamic code generation. I spent full two days on the last weekend
> > and got some initial scaler implementation working (it is quite simple
> > and straightforward and uses less than 300 lines of code):
> > https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
> What is the point of those four mprotects?? AFAICT at most you would
> want to do one mprotect at the end to remove the write permission, but
> if that is worth the extra dependency...

One thing that is a bit different on ARM is that instruction cache coherency
is not guaranteed automatically for self-modifying code, so an explicit cache
flush is required. The cache flush is performed by privileged instructions and
can't be done in user mode, so the operating system has to provide some API
for it. The "Instruction Memory Barriers" part of the ARM Architecture
Reference Manual [1] recommends that operating systems provide this
functionality as a syscall through the 'SWI 0xF00000' instruction. I searched
the web for ARM, dynamic code generation and cache flushing, and found a
chunk of code that the mono virtual machine uses to flush the cache [2]:

void
mono_arch_flush_icache (guint8 *code, gint size)
{
	__asm __volatile ("mov r0, %0\n"
			"mov r1, %1\n"
			"mov r2, %2\n"
			"swi 0x9f0002       @ sys_cacheflush"
			: /* no outputs */
			: "r" (code), "r" (code + size), "r" (0)
			: "r0", "r1", "r3" );
}

So the syscall number on Linux is actually different from the one recommended
by ARM, and this code is apparently not portable (systems other than Linux may
use something different).
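
For comparison, current GCC and Clang expose a compiler builtin,
__builtin___clear_cache, that hides the per-OS syscall difference. The sketch
below is not taken from the mono source or from the scaler code:
flush_icache_range is a hypothetical wrapper name, and the builtin is assumed
to be available in the toolchain in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper (not from mono or libswscale): flush the
 * instruction cache for [start, start + len) through the GCC/Clang
 * builtin instead of a hand-written syscall.
 * Returns 0 on success, -1 if the range is empty. */
static int flush_icache_range(void *start, size_t len)
{
    assert(start != NULL);
    if (len == 0)
        return -1;              /* nothing to flush */

    char *begin = start;
    /* The compiler expands this to whatever the target needs; on
     * targets with coherent instruction caches it may do nothing. */
    __builtin___clear_cache(begin, begin + len);
    return 0;
}
```

A caller would invoke flush_icache_range(buf, nbytes) after writing generated
code into buf and before jumping to it.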

It would be reasonable to assume that when we mmap an executable block of
memory, the instruction cache has already been flushed for that area.
Unfortunately, there seem to be some issues, probably caused by bugs:

So all these mprotect calls in my code are there to ensure that cache
flushing works correctly. We do the following:
* mmap a memory block with permission to execute code from it
* generate simple code for a function that should return 0
* call this code and check that it really returned 0
* do mprotect to disable and re-enable code execution (and hope that it does
  a cache flush)
* generate simple code for a function that should return 1
* call this code and check that it really returned 1 (without the mprotect
  calls it would still return 0)
* finally do mprotect to disable and re-enable code execution to have the
  instruction cache flushed again
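
The steps above can be sketched roughly as follows. This is not the code from
the repository linked earlier: jit_alloc_checked, toggle_exec and
emit_return_const are hypothetical names, the instruction encodings for ARM,
AArch64 and x86 are supplied only so that the sketch can run on more than one
machine, and whether the mprotect toggle really flushes the cache is exactly
what the check tests.

```c
#define _DEFAULT_SOURCE         /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

typedef int (*jit_fn)(void);

/* Write machine code for "return value;" at dst; returns bytes written.
 * Encodings are assumptions of this sketch, one per supported target. */
static size_t emit_return_const(uint8_t *dst, int value)
{
    assert(value >= 0 && value <= 255);
#if defined(__arm__)
    /* ARM mode: mov r0, #value ; bx lr */
    const uint32_t insn[2] = { 0xE3A00000u | (uint32_t)value, 0xE12FFF1Eu };
#elif defined(__aarch64__)
    /* mov w0, #value ; ret */
    const uint32_t insn[2] = { 0x52800000u | ((uint32_t)value << 5),
                               0xD65F03C0u };
#elif defined(__x86_64__) || defined(__i386__)
    /* mov eax, imm32 ; ret */
    const uint8_t insn[6] = { 0xB8, (uint8_t)value, 0, 0, 0, 0xC3 };
#else
#error "this sketch has no instruction encoding for the target"
#endif
    memcpy(dst, insn, sizeof insn);
    return sizeof insn;
}

/* Disable and re-enable execution, hoping that the kernel flushes the
 * instruction cache for the range as a side effect. */
static int toggle_exec(void *buf, size_t len)
{
    if (mprotect(buf, len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    return mprotect(buf, len, PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* Map an executable buffer and run the return-0 / return-1 check on it.
 * Returns the buffer if every step behaved as expected, NULL otherwise. */
static void *jit_alloc_checked(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    jit_fn fn = (jit_fn)buf;
    int ok = 0;

    emit_return_const(buf, 0);
    if (fn() == 0 && toggle_exec(buf, len) == 0) {
        emit_return_const(buf, 1);
        /* Stale cached code would still return 0 here. */
        if (fn() == 1 && toggle_exec(buf, len) == 0)
            ok = 1;
    }
    if (!ok) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

A NULL result from jit_alloc_checked(4096) would mean that one of the steps
failed, in which case a caller could fall back to a non-JIT code path.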

After all these steps have completed successfully, we can be sure that
everything works as expected. The only way this code can break is if the
original mmapped buffer already contains some cached instructions, in which
case the third step would result in a crash. But we can't do anything to
prevent this anyway (and the probability of a crash in this situation should
be extremely low). I just want to be sure that a broken mmap (one that does
not flush the cache) will not result in the following pattern:
* we mmap a buffer and generate some scaling code in it
* the video resolution changes, the buffer is unmapped, and a second mmap
  returns a buffer at the same address with instructions already cached
* we generate new scaling code
* calling the generated code executes the old scaler code because it is
  fetched from the cache, resulting in undefined behaviour

Anyway, I think that using mmap/mprotect should be the most portable
way of maintaining instruction cache coherency. It should probably work on 
all POSIX systems.

[1] http://www.arm.com/community/academy/resources.html
[2] http://svn.myrealbox.com/source/trunk/mono/mono/mini/mini-arm.c
