[FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

Mon Nov 27 01:36:43 EET 2017

On 2017-11-27 00:17, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley <james.darnley at gmail.com> wrote:
>> @@ -152,13 +152,13 @@ RET
>>  %macro FUNCTION_BODY_32 0
>>
>>  %if ARCH_X86_64
>> -    cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
>> +    cglobal flac_enc_lpc_32, 5, 7, 8, mmsize*4, res, smp, len, order, coefs
>>
> 
> Why x4, shouldn't this be x2?

I write 3 more mm registers to the stack, which makes 4 slots of mmsize
in total.  The first slot holds the sign extension for my hacked qword
arithmetic shift, added in the first 32-bit patch.  The 3 new ones store
the "odd" values created in the first inner loop.
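
For anyone wondering why a sign extension value is needed at all: SSE2 and
AVX2 have no packed 64-bit arithmetic right shift, so it has to be built
from other operations.  The C sketch below is not the code from the patch.
It only illustrates one common way to emulate the shift with a logical
shift, an XOR and a subtraction, using a precomputed sign constant of the
kind that could be kept in a stack slot.  The function name sar64_emulated
is hypothetical and the shift count is assumed to be in the range 0..63.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: emulate a 64-bit
 * arithmetic right shift using only operations that have packed qword
 * forms in SSE2 (logical shift, XOR, subtract).
 *
 * Assumes 0 <= shift <= 63.
 */
static int64_t sar64_emulated(int64_t value, unsigned shift)
{
    /* Position the original sign bit occupies after the logical shift. */
    uint64_t sign = UINT64_C(1) << (63 - shift);

    /* Zero-filling shift: the upper bits are wrong for negative inputs. */
    uint64_t logical = (uint64_t)value >> shift;

    /* Sign-extend from the bit marked by 'sign': flip it, then subtract.
     * The final cast relies on two's complement conversion, which is what
     * common compilers implement. */
    return (int64_t)((logical ^ sign) - sign);
}
```

The constant depends only on the shift count, so in a loop with a fixed
shift it can be computed once up front and reloaded when needed.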

I admit that this is a rather ugly construction for a small speed gain,
but I think I've seen other ugly things since writing this.