[FFmpeg-devel] [PATCH 2/5] truehd: break out part of rematrix_channels into platform-specific callback.
michaelni at gmx.at
Thu Mar 20 20:10:13 CET 2014
On Thu, Mar 20, 2014 at 05:59:54PM -0000, Ben Avison wrote:
> On Thu, 20 Mar 2014 17:03:37 -0000, Michael Niedermayer <michaelni at gmx.at> wrote:
> >On Thu, Mar 20, 2014 at 04:06:12PM -0000, Ben Avison wrote:
> >>rematrix_channels does (accum >> 14) afterwards though, so unlike
> >>mlp_filter_channel (where sometimes the post-accumulate shift is 0) I
> >>think you always need the upper 32 bits of the product.
> >the lsbs of the matrix coeffs are 0 in many cases or at least many of
> >the ones I've seen so far, thus the coeffs can be shifted down and
> >the result of the accumulation shifted up which is equivalent to
> >shifting by less than 14
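This minimal C sketch shows why pre-shifting the coefficients is exact. It assumes two things:

- The row is a plain 64-bit multiply-accumulate followed by `>> 14`. The real `rematrix_channels` is not shown here, so `common_zero_lsbs`, `rematrix_ref` and `rematrix_preshift` are hypothetical.
- `>>` on negative values is an arithmetic shift. C leaves this implementation-defined, but common compilers provide it.

If every nonzero coefficient has its low k bits clear, dividing the coefficients by 2^k and shifting the sum right by 14 - k gives the same result, because (x << k) >> 14 == x >> (14 - k):

```c
#include <assert.h>
#include <stdint.h>

/* Largest k <= 14 such that the low k bits of every nonzero coefficient
 * are zero (hypothetical helper). */
static int common_zero_lsbs(const int32_t *coeffs, int n)
{
    int k = 14;
    for (int i = 0; i < n; i++) {
        if (coeffs[i] == 0)
            continue;                 /* zero coefficients impose no limit */
        int tz = 0;
        while (tz < k && !(coeffs[i] & (INT32_C(1) << tz)))
            tz++;
        k = tz;                       /* tz never exceeds the current k */
    }
    return k;
}

/* Reference: full-precision accumulate, then a single >> 14. */
static int32_t rematrix_ref(const int32_t *samples, const int32_t *coeffs, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)samples[i] * coeffs[i];
    return (int32_t)(acc >> 14);
}

/* Pre-shifted variant: coefficients divided by 2^k, sum shifted by 14 - k.
 * Relies on >> being arithmetic for negative values. */
static int32_t rematrix_preshift(const int32_t *samples, const int32_t *coeffs,
                                 int n, int k)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        assert((coeffs[i] & ((INT32_C(1) << k) - 1)) == 0);
        acc += (int64_t)samples[i] * (coeffs[i] >> k);
    }
    return (int32_t)(acc >> (14 - k));
}
```

The smaller the shifted-down coefficient, the fewer product bits the multiply has to produce. Whether that is enough for a 32-bit multiply depends on the sample width.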
> That would only work for a coefficient where the 14 lsbs were zero, so
> only applies to 0x4000, 0x8000 and 0xC000 (assuming 0 is already special
> cased).
if samples are 16 or 24 bit then there's 8 or 16bit left for the matrix
> And it only works when matrix_noise_shift==0.
noise_buffer is 8bit and it's shifted left by at least 8 bit
so you certainly have 8lsb 0 that you can shift away
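The same trick can absorb the noise term. This minimal sketch assumes, as stated above, an 8-bit noise value added after a left shift of at least 8. Its low 8 bits are then zero, so it can be pre-shifted by any k <= 8 together with the coefficients. `noise_shift`, `mix_ref` and `mix_preshift` are hypothetical names, not the decoder's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Full-precision reference: products plus (noise << noise_shift), then >> 14.
 * The noise is scaled by multiplication because left-shifting a negative
 * signed value is undefined in C. */
static int32_t mix_ref(const int32_t *samples, const int32_t *coeffs, int n,
                       int8_t noise, int noise_shift)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)samples[i] * coeffs[i];
    acc += (int64_t)noise * ((int64_t)1 << noise_shift);
    return (int32_t)(acc >> 14);
}

/* Same result with coefficients and noise pre-shifted down by k
 * (assumes k <= 8 <= noise_shift), then a final >> (14 - k). */
static int32_t mix_preshift(const int32_t *samples, const int32_t *coeffs, int n,
                            int8_t noise, int noise_shift, int k)
{
    assert(k >= 0 && k <= 8 && noise_shift >= 8);
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        assert((coeffs[i] & ((INT32_C(1) << k) - 1)) == 0);
        acc += (int64_t)samples[i] * (coeffs[i] >> k);
    }
    acc += (int64_t)noise * ((int64_t)1 << (noise_shift - k));
    return (int32_t)(acc >> (14 - k));
}
```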
> Unless you're thinking of run-time assembly, however, that means the
> number of permutations to expand for each matrix row has gone up from
> 2^8=256 (each of 8 terms may be multiplied or not) to 3^8=6561 (each
> coefficient may be multiplied before or after the shift, or not used at
> all). Multiplies aren't *that* slow on modern CPUs that it's worth
> testing and branching over them inside the inner loop - a branch
> mispredict sets us back a time equivalent to the pipeline length, so if
> it's 50/50 whether we do or not then the average time would be
> (pipeline_length + mul_cycles) / 2, which is typically rather more than
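This small C sketch works through the arithmetic in the quoted paragraph above. It covers two pieces:

- The variant count: each of 8 terms takes 2 treatments (multiply or skip) versus 3 (multiply before the shift, after it, or skip).
- The 50/50 branch cost model `(pipeline_length + mul_cycles) / 2`.

The function names are hypothetical, and the cycle counts in the usage note are illustrative placeholders, not measurements of any particular CPU:

```c
#include <assert.h>

/* Number of specialised code paths for one matrix row when each of `terms`
 * coefficients independently takes one of `choices` treatments. */
static unsigned long row_variants(unsigned choices, unsigned terms)
{
    unsigned long v = 1;
    while (terms--)
        v *= choices;
    return v;
}

/* Average cost per term of testing and branching over a multiply that is
 * needed half the time with no usable pattern: roughly half the branches
 * mispredict (pipeline_length) and the multiply runs half the time
 * (mul_cycles). */
static double branch_over_mul_cost(double pipeline_length, double mul_cycles)
{
    assert(pipeline_length >= 0.0 && mul_cycles >= 0.0);
    return (pipeline_length + mul_cycles) / 2.0;
}
```

With a hypothetical 15-cycle pipeline and 3-cycle multiply, `branch_over_mul_cost(15.0, 3.0)` averages 9 cycles per term. An unconditional multiply costs 3 under the same model.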
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Old school: Use the lowest level language in which you can solve the problem
New school: Use the highest level language in which the latest supercomputer
can solve the problem without the user falling asleep waiting.