[FFmpeg-devel] [PATCH 0/6] truehd: ARM optimisations

Wed Mar 19 21:58:47 CET 2014

Hi Christophe,

> I don't expect my old work to bring much (and the branch [based on the
> fork] is probably outdated), but I noticed the coefficients used by
> mlp_filter_channel were guaranteed to be 16bits. I was expecting to be
> able afterwards not to use 64b arithmetics but that failed.

Good spot! (And to put it on record, each coefficient is guaranteed to be
expressible as a *signed* 16-bit int, which is likely to be important for
some architectures.)

As you noticed, the upper 32 bits of the product are significant, at
least when filter_shift is non-zero. I do actually take advantage of the
fact that when filter_shift == 0, I can use 32 x 32 -> 32-bit multiplies
rather than 32 x 32 -> 64-bit ones, since the former are slightly faster.
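To illustrate why the narrow multiply is only safe when filter_shift == 0,
here is a minimal C sketch. The helper names fir_wide and fir_narrow are
hypothetical and are not taken from the patch or from mlpdsp; the sketch
only models the arithmetic, on the assumption that the filter output is the
low 32 bits of the shifted accumulator.

```c
#include <stdint.h>

/* Hypothetical reference: accumulate full 64-bit products, shift right by
 * filter_shift, then keep the low 32 bits of the result. */
static uint32_t fir_wide(const int32_t *state, const int32_t *coeff,
                         int n, unsigned shift)
{
    uint64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (uint64_t)((int64_t)state[i] * coeff[i]);
    return (uint32_t)(acc >> shift);
}

/* Hypothetical shortcut: 32 x 32 -> 32-bit multiply-accumulate, which only
 * ever sees the low 32 bits of each product. */
static uint32_t fir_narrow(const int32_t *state, const int32_t *coeff, int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (uint32_t)state[i] * (uint32_t)coeff[i];
    return acc;
}
```

With shift == 0 the two agree, because only bits 0..31 of the sum are kept.
With shift == 4 the result needs bits 4..35 of the sum, and bits 32..35
never exist in the narrow version.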

Unfortunately, ARM doesn't have a 16 x 32 -> 64-bit multiply, and the
16 x 32 -> 32-bit one returns the most-significant 32 bits, so it
wouldn't even be any use for the filter_shift == 4 case because carries
up from the least-significant 16 bits of the accumulator would be lost.
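The lost-carry problem can be modelled in plain C. This is a rough sketch,
not ARM code: mul_32x16_hi is a hypothetical stand-in for a 32 x 16 multiply
that keeps only the top 32 bits of the 48-bit product, and the right shift
of a signed value is assumed to be arithmetic, as it is on common compilers.

```c
#include <stdint.h>

/* Hypothetical model: 32 x 16 multiply returning bits 16..47 of the
 * product, so the low 16 bits are discarded before any accumulation. */
static int32_t mul_32x16_hi(int32_t a, int16_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}

/* Accumulate the already-truncated products (inputs assumed small enough
 * that the 32-bit accumulator does not overflow). */
static int32_t sum_truncated(const int32_t *state, const int16_t *coeff, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += mul_32x16_hi(state[i], coeff[i]);
    return acc;
}

/* Accumulate the exact products and discard the low 16 bits only once,
 * at the end, so carries out of the low 16 bits are preserved. */
static int32_t sum_exact(const int32_t *state, const int16_t *coeff, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)state[i] * coeff[i];
    return (int32_t)(acc >> 16);
}
```

For example, two products of 0x8000 each sum to 0x10000, so the exact
version yields 1 after the shift, whereas each truncated product is 0 and
the carry never appears.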

Packing the filter coefficients more tightly into a 16-bit array might be
advantageous on some architectures, but it doesn't look like it would on
ARM. Neither ARMv6 nor NEON seems to allow parallel multiplications of
multiplicands of differing widths, so they'd have to spend time unpacking
the 16-bit values back into 32-bit format before they could use them.
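In scalar C terms, the extra step would look something like the following.
widen_coeffs is a hypothetical name used only for illustration, and a real
SIMD version would use a widening move instead of a loop, but the cost is
the same in kind: an additional pass over the coefficients before any
multiply can start.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unpacking step: sign-extend packed 16-bit coefficients into
 * the 32-bit layout that the multiplies expect. */
static void widen_coeffs(int32_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];  /* implicit sign extension from int16_t to int32_t */
}
```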

Ben