[FFmpeg-devel] [PATCH] g722 decoder, no licensing fame

Sat Apr 4 22:59:58 CEST 2009

Hi
On Apr 4, 2009, at 1:46 PM, Michael Niedermayer wrote:

> On Sat, Apr 04, 2009 at 12:39:46PM -0700, Kenan Gillet wrote:
>>
>> On Apr 4, 2009, at 12:06 PM, Kenan Gillet wrote:
>>
>>> Hi
>>> On Apr 4, 2009, at 11:46 AM, Kenan Gillet wrote:
>>>>
>>>> On Apr 3, 2009, at 9:42 PM, Michael Niedermayer wrote:
>>>>
>>>>> On Tue, Mar 31, 2009 at 11:34:34PM -0700, Kenan Gillet wrote:
>>>
>>> [...]
>>>>>> +/**
>>>>>> + * adaptive predictor
>>>>>> + *
>>>>>> + * @note On x86 using the MULL macro in a loop is slower than not using the macro.
>>>>>> + */
>>>>>> +static void do_adaptive_prediction(struct G722Band *band, const int cur_diff)
>>>>>> +{
>>>>>> +    int sg[2], limit, i, cur_part_reconst;
>>>>>> +
>>>>>> +    band->qtzd_reconst_mem[1] = band->qtzd_reconst_mem[0];
>>>>>> +    band->qtzd_reconst_mem[0] = av_clip_int16((band->s_predictor + cur_diff) << 1);
>>>>>> +
>>>>>> +    cur_part_reconst = band->s_zero + cur_diff < 0;
>>>>>> +
>>>>>> +    sg[0] = sign_lookup[cur_part_reconst != band->part_reconst_mem[0]];
>>>>>> +    sg[1] = sign_lookup[cur_part_reconst == band->part_reconst_mem[1]];
>>>>>
>>>>> I don't see why a LUT should be used here; it's not really more readable
>>>>> and I doubt it's faster.
>>>>
>>>> it is faster
>>>>
>>>> on a Core 2 Duo 2 GHz, gcc 4.2.1
>>>> LUT based:
>>>> testing for 64kb 16KHz: [OK][ 1574 dezicycles in do_adaptive_prediction, 262130 runs, 14 skips ]
>>>> testing for 56kb 16KHz: [OK][ 1668 dezicycles in do_adaptive_prediction, 262110 runs, 34 skips ]
>>>> testing for 48kb 16KHz: [OK][ 1607 dezicycles in do_adaptive_prediction, 262112 runs, 32 skips ]
>>>> testing for 64Kb  8KHz: [OK][ 1584 dezicycles in do_adaptive_prediction, 131066 runs, 6 skips ]
>>>> testing encoding for 64Kb  16KHz: [OK][ 1558 dezicycles in do_adaptive_prediction, 262108 runs, 36 skips ]
>>>>
>>>>
>>>> non-LUT:
>>>> testing for 64kb 16KHz: [OK][ 1686 dezicycles in do_adaptive_prediction, 262126 runs, 18 skips ]
>>>> testing for 56kb 16KHz: [OK][ 1719 dezicycles in do_adaptive_prediction, 262107 runs, 37 skips ]
>>>> testing for 48kb 16KHz: [OK][ 1689 dezicycles in do_adaptive_prediction, 262126 runs, 18 skips ]
>>>> testing for 64Kb  8KHz: [OK][ 1676 dezicycles in do_adaptive_prediction, 131065 runs, 7 skips ]
>>>> testing encoding for 64Kb  16KHz: [OK][ 1673 dezicycles in do_adaptive_prediction, 262116 runs, 28 skips ]
>>>>
>>>> I can revert to non-LUT if you prefer
>>>
>>> Sorry for the noise, I thought the comment was about using the sg array,
>>> not just the lookup table.
>>> I have removed the use of the lookup table locally.
>>
>> I knew I should not have answered before benchmarking;
>> counterintuitive, but the LUT is still faster
>
> what code did you use for the non LUT variant?
>
> 1 | - (a==b)

I used a==b ? -1 : 1

but the LUT is still faster than 1 | -(a==b) as well, by roughly 1-2% in the runs below.

1 | -(a==b):
     sg[0] = 1 | -(cur_part_reconst == band->part_reconst_mem[0]);
     sg[1] = 1 | -(cur_part_reconst != band->part_reconst_mem[1]);

testing for 64kb 16KHz: [OK][ 1463 dezicycles in do_adaptive_prediction, 262126 runs, 18 skips ]
testing for 56kb 16KHz: [OK][ 1550 dezicycles in do_adaptive_prediction, 262116 runs, 28 skips ]
testing for 48kb 16KHz: [OK][ 1502 dezicycles in do_adaptive_prediction, 262111 runs, 33 skips ]
testing for 64Kb  8KHz: [OK][ 1474 dezicycles in do_adaptive_prediction, 131065 runs, 7 skips ]
testing encoding for 64Kb  16KHz: [OK][ 1447 dezicycles in do_adaptive_prediction, 262108 runs, 36 skips ]

LUT:
     sg[0] = sign_lookup[cur_part_reconst != band->part_reconst_mem[0]];
     sg[1] = sign_lookup[cur_part_reconst == band->part_reconst_mem[1]];
testing for 64kb 16KHz: [OK][ 1443 dezicycles in do_adaptive_prediction, 262110 runs, 34 skips ]
testing for 56kb 16KHz: [OK][ 1519 dezicycles in do_adaptive_prediction, 262121 runs, 23 skips ]
testing for 48kb 16KHz: [OK][ 1477 dezicycles in do_adaptive_prediction, 262100 runs, 44 skips ]
testing for 64Kb  8KHz: [OK][ 1454 dezicycles in do_adaptive_prediction, 131058 runs, 14 skips ]
testing encoding for 64Kb  16KHz: [OK][ 1427 dezicycles in do_adaptive_prediction, 262123 runs, 21 skips ]
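
For reference, a minimal sketch showing that the three forms discussed in this thread compute the same sign for 0/1 inputs. It assumes sign_lookup is { -1, 1 } (the table definition is not part of the quoted hunk) and a two's complement target; the helper names sign_ternary, sign_bitwise, sign_lut and check_sign_variants are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed table contents: index 0 -> -1, index 1 -> +1. */
static const int8_t sign_lookup[2] = { -1, 1 };

/* Ternary form: -1 when the two flags match, +1 otherwise. */
static int sign_ternary(int a, int b)
{
    return a == b ? -1 : 1;
}

/* Branch-free form: -(a == b) is 0 or -1 (all bits set on two's
 * complement), so OR-ing with 1 yields +1 or -1. */
static int sign_bitwise(int a, int b)
{
    return 1 | -(a == b);
}

/* LUT form: (a != b) is 0 or 1 and indexes the table directly. */
static int sign_lut(int a, int b)
{
    return sign_lookup[a != b];
}

/* Compare the three forms over all 0/1 input pairs.
 * Returns the number of pairs checked; asserts on any mismatch. */
static int check_sign_variants(void)
{
    int a, b, checked = 0;

    for (a = 0; a <= 1; a++) {
        for (b = 0; b <= 1; b++) {
            assert(sign_ternary(a, b) == sign_bitwise(a, b));
            assert(sign_ternary(a, b) == sign_lut(a, b));
            checked++;
        }
    }
    return checked;
}
```

This corresponds to the sg[0] case; sg[1] uses the inverted comparison in each variant (sign_lookup[a == b] and 1 | -(a != b)). The sketch only checks equivalence and says nothing about which form is faster on a given compiler.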

BTW, thanks for your check sign trick; it allowed some nice simplifications and some speedup :)

Kenan