[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext

Michael Niedermayer michaelni
Fri May 15 17:11:52 CEST 2009


On Wed, May 13, 2009 at 05:03:03PM -0300, Ramiro Polla wrote:
> Hi,
> 
> On Wed, Apr 29, 2009 at 9:58 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Wed, Apr 29, 2009 at 01:15:14AM -0300, Ramiro Polla wrote:
> [...]
> >> +void ff_mlp_filter_channel_x86_64(int32_t *firbuf, const int32_t *fircoeff, int firorder,
> >> +                                  int32_t *iirbuf, const int32_t *iircoeff, int iirorder,
> >> +                                  unsigned int filter_shift, int32_t mask, int blocksize,
> >> +                                  int32_t *sample_buffer)
> >> +{
> >> +    void *firjump = ff_mlp_firtable_x86_64[firorder];
> >> +    void *iirjump = ff_mlp_iirtable_x86_64[iirorder];
> >> +
> >> +    blocksize = -blocksize;
> >> +
> >> +    __asm__ volatile(
> >> +        "1:                        \n\t"
> >> +        "xor     %%rsi      , %%rsi\n\t"
> >> +        "jmp    *%[firjump]        \n\t"
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x1c, ff_mlp_firorder_x86_64_8)
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x18, ff_mlp_firorder_x86_64_7)
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x14, ff_mlp_firorder_x86_64_6)
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x10, ff_mlp_firorder_x86_64_5)
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x0c, ff_mlp_firorder_x86_64_4)
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x08, ff_mlp_firorder_x86_64_3)
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x04, ff_mlp_firorder_x86_64_2)
> >> +        MUL64("%[firbuf]", "%[fircoeff]", 0x00, ff_mlp_firorder_x86_64_1)
> >> +        MANGLE(ff_mlp_firorder_x86_64_0)":\n\t"
> >> +        "jmp    *%[iirjump]        \n\t"
> >> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x0c, ff_mlp_iirorder_x86_64_4)
> >> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x08, ff_mlp_iirorder_x86_64_3)
> >> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x04, ff_mlp_iirorder_x86_64_2)
> >> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x00, ff_mlp_iirorder_x86_64_1)
> >
> > you probably could put some of the coeffs in registers
> 
> Added the first 3 FIR coeffs, at which point gcc started complaining
> that there were no free registers left.
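For readers unfamiliar with the trick in the quoted asm: "jmp *%[firjump]" enters the unrolled MUL64 chain at the step selected by ff_mlp_firtable_x86_64[firorder], so exactly firorder multiply-accumulates run without any loop counter; the IIR chain works the same way. A rough C equivalent uses switch fallthrough in the style of Duff's device. This is only a sketch under our own assumptions, not code from the patch: the name dot_unrolled is hypothetical, and the limit of 8 simply mirrors the eight FIR MUL64 steps quoted above.

```c
#include <stdint.h>

/* Hypothetical C rendering of the computed jump into an unrolled
 * multiply-accumulate chain. Valid orders are 0..8; any other value
 * accumulates nothing. The sum is kept in 64 bits, like %rsi in the asm. */
static int64_t dot_unrolled(const int32_t *buf, const int32_t *coeff, int order)
{
    int64_t acc = 0;

    switch (order) {  /* enter the chain at element order - 1 */
    case 8: acc += (int64_t)buf[7] * coeff[7]; /* fall through */
    case 7: acc += (int64_t)buf[6] * coeff[6]; /* fall through */
    case 6: acc += (int64_t)buf[5] * coeff[5]; /* fall through */
    case 5: acc += (int64_t)buf[4] * coeff[4]; /* fall through */
    case 4: acc += (int64_t)buf[3] * coeff[3]; /* fall through */
    case 3: acc += (int64_t)buf[2] * coeff[2]; /* fall through */
    case 2: acc += (int64_t)buf[1] * coeff[1]; /* fall through */
    case 1: acc += (int64_t)buf[0] * coeff[0]; /* fall through */
    default: break;   /* order 0 (the _0 label) or out of range: chain ends */
    }
    return acc;
}
```

Entering at case 3, for example, accumulates buf[2]*coeff[2], buf[1]*coeff[1] and buf[0]*coeff[0], which correspond to byte offsets 0x08, 0x04 and 0x00 in the MUL64 lines.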
> 
> >> +        MANGLE(ff_mlp_iirorder_x86_64_0)":\n\t"
> >
> >> +        "mov     %%rsi      , %%rax\n\t"
> >
> > useless
> 
> Removed.
> 
> >> +        "shr     %%cl       , %%rax\n\t"
> >> +
> >> +        "mov     %%rax      , %%rdx\n\t"
> >> +        "add    (%[sample]) , %%rax\n\t"
> >> +        "and     %[mask]    , %%rax\n\t"
> >> +        "sub              $4,  %[firbuf]\n\t"
> >> +        "sub              $4,  %[iirbuf]\n\t"
> >
> > these 2 buffers can apparently be merged, simplifying addressing
> 
> Merged, and coeffs too.
> 
> >> +        "mov     %%eax      , (%[firbuf])\n\t"
> >> +        "mov     %%eax      , (%[sample])\n\t"
> >
> > this looks mildly redundant ...
> 
> I tried removing firbuf and instead using *sample directly, but this
> led to slower code.
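Read together, the quoted asm computes one output sample per iteration: sum the FIR and IIR products into one accumulator, shift right by filter_shift, add the residual already in the sample buffer, apply the mask, then store the result both to the new filter-state slot and back to the sample buffer (hence the two stores). A minimal C sketch of one iteration, under stated assumptions: the function name and argument layout are hypothetical, and the IIR state update (result minus the shifted sum) is our assumption, since the quoted asm saves the shifted sum in %rdx but is cut off before any IIR store.

```c
#include <stdint.h>

/* Hypothetical sketch of one iteration of the quoted filter loop.
 * firbuf[0..firorder-1] and iirbuf[0..iirorder-1] hold past state, newest
 * first; the new state goes one slot below, at index -1, mirroring the
 * "sub $4" pointer decrements. The caller then moves both pointers down. */
static int32_t filter_one_sample(int32_t *firbuf, const int32_t *fircoeff, int firorder,
                                 int32_t *iirbuf, const int32_t *iircoeff, int iirorder,
                                 unsigned int filter_shift, int32_t mask,
                                 int32_t *sample)
{
    int64_t accum = 0;
    int32_t result;
    int i;

    for (i = 0; i < firorder; i++)          /* FIR chain */
        accum += (int64_t)firbuf[i] * fircoeff[i];
    for (i = 0; i < iirorder; i++)          /* IIR chain, same accumulator */
        accum += (int64_t)iirbuf[i] * iircoeff[i];

    accum >>= filter_shift;                 /* shr %cl, %rax */
    result = (int32_t)((accum + *sample) & mask); /* add (sample), and mask */

    firbuf[-1] = result;                    /* mov %eax, (firbuf) */
    iirbuf[-1] = result - (int32_t)accum;   /* ASSUMED IIR state update */
    *sample    = result;                    /* mov %eax, (sample) */
    return result;
}
```

The caller would decrement firbuf and iirbuf by one after each sample, as the two sub instructions do, so the state history grows downward in memory.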
> 
> I also tried switching sample_buffer from
> [MAX_BLOCKSIZE][MAX_CHANNELS] to [MAX_CHANNELS][MAX_BLOCKSIZE] so that
> each channel's samples would be contiguous in memory, but this also
> led to slower code overall.
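To make the layout trade-off concrete: in C the last index of a 2-D array varies fastest, so with [MAX_BLOCKSIZE][MAX_CHANNELS] consecutive samples of one channel are MAX_CHANNELS elements apart, while [MAX_CHANNELS][MAX_BLOCKSIZE] makes them adjacent. The sketch below measures both strides; the EX_ sizes are illustrative assumptions, not the decoder's actual constants.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_BLOCKSIZE 160  /* illustrative value only */
#define EX_MAX_CHANNELS    8  /* illustrative value only */

/* Layout used by sample_buffer: one row per sample position, channels interleaved. */
static int32_t by_time[EX_MAX_BLOCKSIZE][EX_MAX_CHANNELS];
/* Alternative layout that was tried: one row per channel. */
static int32_t by_channel[EX_MAX_CHANNELS][EX_MAX_BLOCKSIZE];

/* Bytes between sample i and sample i + 1 of the same channel. */
static ptrdiff_t stride_by_time(void)
{
    return (const char *)&by_time[1][0] - (const char *)&by_time[0][0];
}

static ptrdiff_t stride_by_channel(void)
{
    return (const char *)&by_channel[0][1] - (const char *)&by_channel[0][0];
}
```

With these sizes the per-channel stride drops from 32 bytes to 4; as reported above, that better locality did not translate into faster code overall.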
> 
> I renamed the MUL macros as per Mans' suggestion, and reworked most of
> the asm code (32-bit now keeps some pointers in registers and is
> much faster). I also removed the attempt to manually schedule MUL32
> because it led to uglier code and Dark_Shikari suggested it wouldn't
> do much good anyway because of out-of-order execution.
> 
> Order of patches:
> include_mlp_h.diff
> join_states_coeffs.diff
> x86_filter.diff
> 
> speedup:
> 32-bit: 12.59%
> 64-bit:  9.98%
> 
> I haven't pursued SSE4 any further because the x86_32 code is very close
> in speed, and I have other work to do.
> 
> Ramiro Polla

>  mlpdsp.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> ea3cc210c99e4f980a38ff26342846adae7e7dd6  include_mlp_h.diff

ok

[...]

>  dsputil.h |    4 ++--
>  mlp.h     |    2 +-
>  mlpdec.c  |   15 ++++++++-------
>  mlpdsp.c  |    8 ++++++--
>  4 files changed, 17 insertions(+), 12 deletions(-)
> b4a586612c90d2e5430ac416f16dbaa12f282383  join_states_coeffs.diff

ok

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I hate to see young programmers poisoned by the kind of thinking
Ulrich Drepper puts forward since it is simply too narrow -- Roman Shaposhnik