[FFmpeg-devel] [PATCH 7/9] sbcenc: add MMX optimizations
James Almer
jamrial at gmail.com
Sat Dec 23 20:35:28 EET 2017
On 12/23/2017 3:01 PM, Aurelien Jacobs wrote:
> This was originally based on libsbc, and was fully integrated into ffmpeg.
>
> Rough speed test:
> C version: speed= 592x
> MMX version: speed= 785x
> ---
> libavcodec/sbcdsp.c | 3 +
> libavcodec/sbcdsp.h | 2 +
> libavcodec/x86/Makefile | 2 +
> libavcodec/x86/sbcdsp.asm | 284 +++++++++++++++++++++++++++++++++++++++++++
> libavcodec/x86/sbcdsp_init.c | 51 ++++++++
> 5 files changed, 342 insertions(+)
> create mode 100644 libavcodec/x86/sbcdsp.asm
> create mode 100644 libavcodec/x86/sbcdsp_init.c
[...]
> +;*******************************************************************
> +;void ff_sbc_calc_scalefactors(int32_t sb_sample_f[16][2][8],
> +; uint32_t scale_factor[2][8],
> +; int blocks, int channels, int subbands)
> +;*******************************************************************
> +INIT_MMX mmx
> +cglobal sbc_calc_scalefactors, 5, 7, 3, sb_sample_f, scale_factor, blocks, channels, subbands, ptr, blk
> + ; subbands = 4 * subbands * channels
> + shl subbandsd, 2
> + cmp channelsd, 2
> + jl .loop_1
> + shl subbandsd, 1
> +
> +.loop_1:
> + sub subbandsq, 8
> + lea ptrq, [sb_sample_fq + subbandsq]
> +
> + ; blk = (blocks - 1) * 64;
> + lea blkq, [blocksq - 1]
> + shl blkd, 6
> +
> + movq m0, [scale_mask]
I insist, this constant can easily be loaded once, outside the loop. You have
enough spare registers to hold a copy.
In any case, the patch looks good regardless of the above.
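
For illustration, a minimal sketch of what hoisting the load could look like. It assumes mm3 is unused in the rest of the function (the cglobal line in the patch declares only three vector registers, bumped to four here), and it places the load before the cmp/jl pair so that the early jump to .loop_1 cannot skip it. The choice of m3 and the changed register count are assumptions, not part of the posted patch:

```asm
INIT_MMX mmx
cglobal sbc_calc_scalefactors, 5, 7, 4, sb_sample_f, scale_factor, blocks, channels, subbands, ptr, blk
    ; assumption: m3 is free; load the mask once, ahead of any jump to .loop_1
    movq          m3, [scale_mask]

    ; subbands = 4 * subbands * channels
    shl    subbandsd, 2
    cmp    channelsd, 2
    jl     .loop_1
    shl    subbandsd, 1

.loop_1:
    sub    subbandsq, 8
    lea    ptrq, [sb_sample_fq + subbandsq]

    ; blk = (blocks - 1) * 64;
    lea    blkq, [blocksq - 1]
    shl    blkd, 6

    movq          m0, m3    ; register-to-register copy replaces the per-iteration memory load
    ; ... remainder of the loop as in the patch (elided in the quote above) ...
```

The copy into m0 is kept because the loop body presumably overwrites m0 on each pass; only the memory access moves out of the loop.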