[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext

Jason Garrett-Glaser darkshikari
Tue Apr 21 05:29:04 CEST 2009


2009/4/20 Ramiro Polla <ramiro.polla at gmail.com>:
> On Mon, Apr 20, 2009 at 9:40 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Mon, Apr 20, 2009 at 02:29:09AM -0300, Ramiro Polla wrote:
>>> On Mon, Apr 20, 2009 at 12:14 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> > On Sun, Apr 19, 2009 at 10:10:05PM -0300, Ramiro Polla wrote:
>>> >> The attached file moves MLP's dot product to DSPContext. The filter order
>>> >> is a maximum of 8, and in the rematrix stage it's a maximum of 5+2
>>> >> channels for MLP and 7+0 channels for TrueHD, so it all fits in 8
>>> >> (hopefully) optimized functions.
>>> >
>>> > the functions are too small, the call overhead is too much
>>> > 1-8 multiplications and 1-8 additions is not enough ...
>>>
>>> I thought that would happen too, but strangely there was a speedup.
>>
>> you wrote the whole function in asm() and that was slower?
>
> Attached are three asm variants: sse2, sse4, and altivec.
>
> Here are the benchmarks:
>
> - on x86
> current:  3700ms
> array of functions in dspcontext:
> c      :  3300ms
> sse2   :  3400ms
> sse4   :  3200ms
> inlined in mlpdec.c:
> c      :  3500ms
> sse2   :  3200ms
> sse4   :  3100ms
>
> - on x86_64 (can't run sse4)
> current:  2070ms
> array of functions in dspcontext:
> c      :  2600ms (badly vectorized)
> c      :  1920ms (not vectorized)
> sse2   :  2450ms
> inlined in mlpdec.c:
> c      :  2800ms (badly vectorized)
> c      :  1980ms (not vectorized)
> sse2   :  2450ms

Have you tried benching it on a 64-bit system with SSE4?
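
For anyone following along, the trade-off being benchmarked above can be
sketched roughly as below. This is not the code from the patch: the names
(dot_generic, dot_table, dot_dispatch) and the int32_t/int64_t types are
assumptions made for illustration only. It contrasts a run-time loop with an
array of eight fixed-order functions selected by filter order, which is the
general shape of the "array of functions in dspcontext" variant.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ORDER 8 /* the quoted mail gives 8 as the maximum filter order */

/* Generic version: the trip count is only known at run time. */
static int64_t dot_generic(const int32_t *a, const int32_t *b, int order)
{
    int64_t sum = 0;
    for (int i = 0; i < order; i++)
        sum += (int64_t)a[i] * b[i];
    return sum;
}

/* One specialised function per order. The constant trip count lets the
 * compiler unroll fully; an asm version could be slotted in per order. */
#define DOT_FUNC(n)                                             \
static int64_t dot_##n(const int32_t *a, const int32_t *b)      \
{                                                               \
    int64_t sum = 0;                                            \
    for (int i = 0; i < (n); i++)                               \
        sum += (int64_t)a[i] * b[i];                            \
    return sum;                                                 \
}

DOT_FUNC(1) DOT_FUNC(2) DOT_FUNC(3) DOT_FUNC(4)
DOT_FUNC(5) DOT_FUNC(6) DOT_FUNC(7) DOT_FUNC(8)

typedef int64_t (*dot_fn)(const int32_t *, const int32_t *);

/* Hypothetical stand-in for a table of function pointers in a DSP context. */
static const dot_fn dot_table[MAX_ORDER] = {
    dot_1, dot_2, dot_3, dot_4, dot_5, dot_6, dot_7, dot_8
};

/* Pick the function once per filter order; each call is then indirect. */
static int64_t dot_dispatch(const int32_t *a, const int32_t *b, int order)
{
    assert(order >= 1 && order <= MAX_ORDER);
    return dot_table[order - 1](a, b);
}
```

The indirect call is the overhead Michael is worried about for so few
multiply-adds; whether the unrolling pays for it is exactly what the numbers
above are measuring.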

Dark Shikari


