[FFmpeg-devel] [PATCH 0/9] DCA (DTS) decoder optimisations for ARMv6

Mon Jul 22 12:41:22 CEST 2013

On Mon, Jul 15, 2013 at 06:28:08PM +0100, Ben Avison wrote:
> I present here a patch series aimed at making DCA decode practical
> on the Raspberry Pi. This uses an ARM1176JZF-S core, which is
> ARMv6Z + VFPv2. Since DCA is a floating point codec, the
> optimisations mostly rely upon hand-scheduled VFP code and the use
> of short vectors. Note that short vectors are deprecated on Cortex-A8
> and unsupported on Cortex-A9 and later, so the existing NEON
> implementations remain the preferred code for ARMv7 or later.
> 
> Note that some of these patches result in floating point operations
> being performed in a different order, with corresponding effects upon
> rounding, so you might not always see a binary-identical result.
> Additional subtle changes may be caused by the fact that I'm
> configuring the VFP to RunFast mode, which amongst other things will
> flush denormalised numbers to 0.
> 
> I'm afraid I haven't been able to prove this using "make fate" since
> I have been unable to find a base revision in git that passes the
> tests even without any of my patches applied. This even goes for
> supposedly known good revisions from fate.ffmpeg.org, such as
> 786b096 (illustrated at http://tinyurl.com/p5hqrue). I haven't
> identified whether this is due to toolchain or hardware differences:
> I'm using gcc (Debian 4.6.3-14+rpi1) 4.6.3 on ARM1176JZF-S, the one
> on fate.ffmpeg.org is gcc 4.4.7 (Ubuntu/Linaro 4.4.7-1ubuntu2) and
> presumably a Cortex-A9.
> 
> Two of the optimisations rely upon new function pointers. The changes
> to the C code to utilise these pointers are platform-independent, and
> are given in separate patches from the optimisations themselves.
> 
> Benchmarks presented here were gathered using gperftools, a
> statistical sampler. The numbers are the number of samples when
> decoding a 30 minute test stream, averaged over 4 runs; lower numbers
> represent faster operation.
> 
> The combined effect of this patch series is a speedup of
> approximately 67%.
> 
> Ben Avison (9):
>   [ARMv6] Add VFP-accelerated version of synth_filter_float
>   [ARMv6] Add VFP-accelerated version of int32_to_float_fmul_scalar
>   New fmtconvert method, int32_to_float_fmul_scalar_array
>   [ARMv6] Add VFP-accelerated version of
>     int32_to_float_fmul_scalar_array
>   [ARMv6] Add VFP-accelerated version of imdct_half
>   [ARMv6] Add VFP-accelerated version of dca_lfe_fir
>   [ARMv6] Add VFP-accelerated version of fft16
>   New dcadsp method, qmf_32_subbands
>   [ARMv6] Add VFP-accelerated version of qmf_32_subbands
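
To illustrate the point above about operation order and rounding, here is a
minimal sketch in C. The helper names are hypothetical and are not taken from
the patch series; the sketch only assumes IEEE 754 single-precision floats.

```c
/* Hypothetical helpers for illustration only; not part of the patch series. */

/* Evaluate (a + b) + c, rounding the intermediate sum to single precision. */
static float sum_left_first(float a, float b, float c)
{
    /* volatile forces a store to memory, discarding any excess precision */
    volatile float t = a + b;
    return t + c;
}

/* Evaluate a + (b + c): same operands, different association. */
static float sum_right_first(float a, float b, float c)
{
    volatile float t = b + c;
    return a + t;
}
```

With a = 1.0e8f, b = -1.0e8f and c = 1.0f, sum_left_first() returns 1.0f while
sum_right_first() returns 0.0f, because -1.0e8f + 1.0f rounds back to -1.0e8f
in single precision. A rescheduled filter loop can differ from the C reference
in the same way, usually only in the last bits.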

Patchset merged.

Please check that what was merged fully works.

Thanks.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes