[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions

Siarhei Siamashka siarhei.siamashka
Mon Jan 14 07:50:49 CET 2008

On 13 January 2008, Vadim Lebedev wrote:
> I'm running your program as follows:
> gcc 4.1.2  -O3 -fomit-frame-pointer  -msse -o vector_fmul_test
> vector_fmul_test.c
> ./vector_fmul_test 2000
> And the output is:
> Function: 'vector_fmul_c', time=73.910 (cycles/element=288.713)
> Function: 'vector_fmul_c_unrolled', time=73.010 (cycles/element=285.195)
> Function: 'vector_fmul_c_other_unrolled', time=72.999
> (cycles/element=285.152)
> Function: 'vector_fmul_c_simd', time=0.141 (cycles/element=0.552)
> Any idea why it is so slow (except simd case)?

The easiest (and probably the only reliable) way to check what's happening
is to add the '-S' option to gcc to get assembly output and look at the
generated code, or to disassemble the binary with objdump. As a completely
random guess, gcc might have used software floating-point emulation for
whatever reason.
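
To make the check concrete, here is a minimal sketch of a file one could
feed to gcc for inspection. The function name 'vector_fmul_sketch', its
signature, and the file name are assumptions made for illustration; the
real loop in vector_fmul_test.c may look different.

```c
/* fmul_sketch.c: hypothetical stand-in for the scalar loop in the benchmark.
 * The name and signature are assumptions, not copied from vector_fmul_test.c.
 *
 * Get the assembly listing (no main() is needed for this step):
 *   gcc -O3 -fomit-frame-pointer -msse -S fmul_sketch.c -o fmul_sketch.s
 * Or disassemble an already built binary:
 *   objdump -d vector_fmul_test
 *
 * What to look for in the loop body:
 *   - x87 (fmuls/fmul) or SSE (mulss/mulps) instructions: hardware FP
 *   - a call to a libgcc helper such as __mulsf3: software FP emulation
 */
void vector_fmul_sketch(float *dst, const float *src, int len)
{
    int i;

    /* Multiply dst by src element by element, in place. */
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}
```

If the listing shows a helper call instead of a multiply instruction inside
the loop, that would support the software emulation guess; if it shows plain
hardware multiplies, the cause is probably somewhere else.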

Anyway, x86 results are quite boring. It would be most interesting to hear
from fellow non-x86 users ;) I noticed that the use of ffvorbis on the sh4
architecture was mentioned recently.
