[FFmpeg-devel] [PATCH 3/3] Use DSPContext.vector_fmul() and DSPContext.vector_fmul_reverse() in floating-point version of apply_window(). 46% faster in function apply_window().

Sat Jan 22 18:52:28 CET 2011

Justin Ruggles <justin.ruggles at gmail.com> writes:

> On 01/18/2011 10:42 AM, Michael Niedermayer wrote:
>
>> On Wed, Jan 05, 2011 at 04:32:40PM -0500, Justin Ruggles wrote:
>>> On 01/05/2011 04:06 PM, Loren Merritt wrote:
>>>
>>>> On Tue, 4 Jan 2011, Justin Ruggles wrote:
>>>>
>>>>> Currently we have vector_fmul() for: C, neon, vfp, altivec, 3dnow, sse
>>>>>
>>>>> I implemented vector_fmul_copy() for C, altivec, 3dnow, and sse to use 2
>>>>> src and 1 dst. The Altivec version of vector_fmul_copy() has not been
>>>>> tested, but I implemented it in the hope that someone else will test and
>>>>> review it.  Here are some benchmarks on my Athlon64.  Benchmark numbers
>>>>> are in dezicycles.
>>>>>
>>>>> I also tried to rewrite the current C version in SSE.  It was faster
>>>>> than the fmul_copy+fmul_reverse since it basically merges the 2 loops,
>>>>> but it was slower than vector_fmul_copy(512).  I left that out of the
>>>>> patch.  If anyone is interested I can send it...
>>>>
>>>> I predict that all of the vector_fmul_* mentioned here are memory-bound on 
>>>> intel and arithmetic-bound on amd.
>>>>
>>>> Is there any reason to keep both the 2-arg and 3-arg version of 
>>>> vector_fmul?
>>>
>>>
>>> I tested using vector_fmul_copy with same value for src0 and dst and it
>>> ended up being slower.  I thought it was weird, so I kept both versions.
>>>  Maybe I did something wrong in my tests though...
>>>
>>> Also, I'll try benchmarking these on my laptop (Intel Atom 330, 64-bit
>>> Ubuntu).
>> 
>> Is there a patch i should review left in this thread or should i be waiting
>> for a new one?
>
> new patch attached.  i did more testing, and changing the existing
> vector_fmul() works fine.
>
> -Justin
>
>
> From 5cfe33808452718da0e475694018c56ccb077b2b Mon Sep 17 00:00:00 2001
> From: Justin Ruggles <justin.ruggles at gmail.com>
> Date: Thu, 13 Jan 2011 15:28:06 -0500
> Subject: [PATCH] Change DSPContext.vector_fmul() from dst=dst*src to dest=src0*src1.
> MIME-Version: 1.0
> Content-Type: multipart/mixed; boundary="------------1.7.0.4"
>
> This is a multi-part message in MIME format.
> --------------1.7.0.4
> Content-Type: text/plain; charset=UTF-8; format=fixed
> Content-Transfer-Encoding: 8bit
>
> ---
>  libavcodec/aacenc.c                |    2 +-
>  libavcodec/ac3enc.c                |    4 +-
>  libavcodec/ac3enc_fixed.c          |    2 +-
>  libavcodec/ac3enc_float.c          |   16 ++++--------
>  libavcodec/arm/dsputil_init_neon.c |    2 +-
>  libavcodec/arm/dsputil_neon.S      |   45 +++++++++++++++++------------------
>  libavcodec/atrac3.c                |    2 +-
>  libavcodec/dsputil.c               |    4 +-
>  libavcodec/dsputil.h               |    2 +-
>  libavcodec/nellymoserenc.c         |    6 ++--
>  libavcodec/ppc/float_altivec.c     |   10 ++++----
>  libavcodec/twinvq.c                |    4 +-
>  libavcodec/vorbis_dec.c            |    2 +-
>  libavcodec/x86/dsputil_mmx.c       |   24 +++++++++---------
>  14 files changed, 60 insertions(+), 65 deletions(-)

OK, but you missed the ARM VFP function.  I'll fix that and push.
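
For readers following the thread, here is a minimal plain-C sketch of the semantic change named in the patch subject (dst=dst*src becoming dst=src0*src1), and of how a windowing step could combine a forward and a reversed multiply as the thread subject describes. All function names ending in _sketch are hypothetical stand-ins written for this illustration, not the actual DSPContext code, and the symmetric half-window layout in apply_window_sketch() is an assumption.

```c
#include <stddef.h>

/* Old two-operand semantics: dst[i] = dst[i] * src[i] (in place). */
void fmul_inplace_sketch(float *dst, const float *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* New three-operand semantics: dst[i] = src0[i] * src1[i].
 * Passing the same pointer for dst and src0 reproduces the old behaviour. */
void fmul_sketch(float *dst, const float *src0, const float *src1, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* Reversed multiply: dst[i] = src0[i] * src1[len - 1 - i]. */
void fmul_reverse_sketch(float *dst, const float *src0, const float *src1,
                         size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Assumed layout: the window is symmetric and only its first half
 * (n / 2 coefficients) is stored. The first half of the input is multiplied
 * by the half-window read forwards, the second half by the same
 * coefficients read backwards. n is assumed to be even. */
void apply_window_sketch(float *out, const float *in,
                         const float *half_window, size_t n)
{
    size_t n2 = n / 2;

    fmul_sketch(out, in, half_window, n2);
    fmul_reverse_sketch(out + n2, in + n2, half_window, n2);
}
```

With the three-operand form the windowed samples can be written straight to a separate output buffer, so no copy is needed before the multiply; callers that still want the in-place behaviour pass the destination as src0.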

-- 
Måns Rullgård
mans at mansr.com