[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions

Loren Merritt lorenm
Tue Jan 8 09:11:00 CET 2008

On Tue, 8 Jan 2008, Siarhei Siamashka wrote:

> And 'vector_fmul_*' functions look like 'low-hanging fruit' in the sense
> that they seem to be quite easy to optimize :) But there is another
> interesting thing: the C implementation of these functions is very
> straightforward and does not even unroll loops. And assembly or other SIMD
> optimizations for these functions exist only for x86 and ppc at the moment.
> Is this intentional, with code readability the main priority for them? Or
> could some tweaks be added to improve 'generic C' code performance?
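For illustration, here is a minimal sketch of the two C styles being discussed. It assumes a vector_fmul-style signature that multiplies dst by src element-wise in place (dst[i] *= src[i]). The names vector_fmul_simple and vector_fmul_unroll4 are hypothetical, not FFmpeg's actual functions, and whether the unrolled form is faster at all depends on the compiler and CPU, which is exactly what would need benchmarking.

```c
/* Straightforward loop: one multiply per iteration.
 * Hypothetical name; assumes in-place dst[i] *= src[i] semantics. */
static void vector_fmul_simple(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Same computation manually unrolled by 4 (hypothetical name).
 * The four multiplies in each iteration are independent, which may let
 * a pipelined scalar FPU overlap their latencies. Assumes dst and src
 * do not partially overlap. A tail loop handles len not divisible by 4. */
static void vector_fmul_unroll4(float *dst, const float *src, int len)
{
    int i = 0;
    for (; i + 3 < len; i += 4) {
        float a0 = dst[i]     * src[i];
        float a1 = dst[i + 1] * src[i + 1];
        float a2 = dst[i + 2] * src[i + 2];
        float a3 = dst[i + 3] * src[i + 3];
        dst[i]     = a0;
        dst[i + 1] = a1;
        dst[i + 2] = a2;
        dst[i + 3] = a3;
    }
    /* Remaining 0..3 elements. */
    for (; i < len; i++)
        dst[i] *= src[i];
}
```

Both functions produce identical results for non-overlapping buffers (and for dst == src, i.e. squaring in place); the only difference is how much independent work each loop iteration exposes to the CPU.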

My logic was: I could tell at a glance that SIMD would be faster than 
scalar x86 code, so I wrote the SSE. Once that was done, the C version was 
not used on any of my CPUs. So there was no point in optimizing it when I 
couldn't benchmark what effect any potential optimization would have on 
any CPU it's actually used on.

The same goes for you: if you write a VFP version, then what reason do you 
have for tweaking the C to run better on your ARM? None, unless you have 
reason to believe that other ARM processors without VFP still have the 
same scalar float characteristics.

--Loren Merritt
