[FFmpeg-devel] [PATCH 04/11] x86: dcadsp: implement SSE lfe_dir
christophe.gisquet at gmail.com
Tue Feb 11 08:41:52 CET 2014
2014-02-11 3:01 GMT+01:00 Michael Niedermayer <michaelni at gmx.at>:
> On Fri, Feb 07, 2014 at 10:35:22PM +0100, Christophe Gisquet wrote:
> I think you can merge the scale factor into the lfe_fir_* tables
> avoiding some instructions
I'm not sure I see how to do this. The scale is not constant (varying
across calls), and if I understand lfe_fir_* tables to refer to the
input (not coeffs), then indeed it's 4 or 8 inputs. I'm already
scaling them with mulps IN, SCALE (rather than on output, see
my modification to the C code in another patch).
On the other hand, there are 32 or 64 coefficients, so I'd better not scale those.
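
To make the trade-off concrete, here is a minimal scalar sketch. It is not the patch code: the function names, the MAX_IN bound and the row-per-output coefficient layout are made up for illustration. Scaling on output costs one multiply per output sample, scaling on input costs one multiply per input (4 or 8 here), and both give the same result up to floating-point rounding.

```c
#include <stddef.h>

/* Hypothetical bound: the thread mentions 4 or 8 inputs per call. */
enum { MAX_IN = 8 };

/* Scale on output: one extra multiply per output sample.
 * coefs holds n_in coefficients per output, one row after another
 * (an assumed layout, chosen only to keep the sketch short). */
static void fir_scale_out(float *out, const float *in, const float *coefs,
                          size_t n_in, size_t n_out, float scale)
{
    for (size_t k = 0; k < n_out; k++) {
        float v = 0.0f;
        for (size_t j = 0; j < n_in; j++)
            v += in[j] * coefs[k * n_in + j];
        out[k] = v * scale;
    }
}

/* Scale on input: n_in multiplies up front, none in the output loop.
 * Requires n_in <= MAX_IN. */
static void fir_scale_in(float *out, const float *in, const float *coefs,
                         size_t n_in, size_t n_out, float scale)
{
    float scaled[MAX_IN];

    for (size_t j = 0; j < n_in; j++)
        scaled[j] = in[j] * scale;

    for (size_t k = 0; k < n_out; k++) {
        float v = 0.0f;
        for (size_t j = 0; j < n_in; j++)
            v += scaled[j] * coefs[k * n_in + j];
        out[k] = v;
    }
}
```

Pre-scaling the coefficient table instead would need one multiply per coefficient on every call, since the scale changes between calls.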
> also the coeff table looks constant so you can reorder it any
> way at no cost
You're right, I can probably remove 1 or 2 shuffle instructions per call.
But as the ordering might be instruction-set dependent, I would then need
to provide a table-shuffling function to run at table init. Unless I can make sure
the NEON implementation needs the same, and until someone demonstrates
> Not sure whats the fastest way to implement this but
> you could form all 4 needed permutations of the input and then do a
> simpler 4x(mova, mulps, addps) inner loop
You mean, e.g., increase the table size to store the potential
permutations (they might be insn set dependent), and load the proper
one? Then, if I'm not mistaken, I always save one shuffle but need to
reload the input and redo a mulps.
> I maybe have missed a detail here or there but i suspect this can
> be done more efficiently than how its implemented (with differently
> ordered coeff tables)
I think so too, but the coeffs are used in one order and then in the other,
so I think this will save the shuffling of the 4/8-input before the
Thanks for the review,