[FFmpeg-devel] [PATCH] g722 decoder, no licensing fame
Kenan Gillet
kenan.gillet
Sat Apr 4 22:59:58 CEST 2009
Hi
On Apr 4, 2009, at 1:46 PM, Michael Niedermayer wrote:
> On Sat, Apr 04, 2009 at 12:39:46PM -0700, Kenan Gillet wrote:
>>
>> On Apr 4, 2009, at 12:06 PM, Kenan Gillet wrote:
>>
>>> Hi
>>> On Apr 4, 2009, at 11:46 AM, Kenan Gillet wrote:
>>>>
>>>> On Apr 3, 2009, at 9:42 PM, Michael Niedermayer wrote:
>>>>
>>>>> On Tue, Mar 31, 2009 at 11:34:34PM -0700, Kenan Gillet wrote:
>>>
>>> [...]
>>>>>> +/**
>>>>>> + * adaptive predictor
>>>>>> + *
>>>>>> + * @note On x86 using the MULL macro in a loop is slower than not using the macro.
>>>>>> + */
>>>>>> +static void do_adaptive_prediction(struct G722Band *band, const int cur_diff)
>>>>>> +{
>>>>>> + int sg[2], limit, i, cur_part_reconst;
>>>>>> +
>>>>>> + band->qtzd_reconst_mem[1] = band->qtzd_reconst_mem[0];
>>>>>> + band->qtzd_reconst_mem[0] = av_clip_int16((band->s_predictor + cur_diff) << 1);
>>>>>> +
>>>>>> + cur_part_reconst = band->s_zero + cur_diff < 0;
>>>>>> +
>>>>>> + sg[0] = sign_lookup[cur_part_reconst != band->part_reconst_mem[0]];
>>>>>> + sg[1] = sign_lookup[cur_part_reconst == band->part_reconst_mem[1]];
>>>>>
>>>>> I don't see why a LUT should be used here; it's not really more
>>>>> readable and I doubt it's faster.
>>>>
>>>> it is faster
>>>>
>>>> On a Core 2 Duo 2 GHz, gcc 4.2.1:
>>>> LUT based:
>>>> testing for 64kb 16KHz: [OK][ 1574 dezicycles in do_adaptive_prediction, 262130 runs, 14 skips ]
>>>> testing for 56kb 16KHz: [OK][ 1668 dezicycles in do_adaptive_prediction, 262110 runs, 34 skips ]
>>>> testing for 48kb 16KHz: [OK][ 1607 dezicycles in do_adaptive_prediction, 262112 runs, 32 skips ]
>>>> testing for 64Kb 8KHz: [OK][ 1584 dezicycles in do_adaptive_prediction, 131066 runs, 6 skips ]
>>>> testing encoding for 64Kb 16KHz: [OK][ 1558 dezicycles in do_adaptive_prediction, 262108 runs, 36 skips ]
>>>>
>>>>
>>>> non-LUT:
>>>> testing for 64kb 16KHz: [OK][ 1686 dezicycles in do_adaptive_prediction, 262126 runs, 18 skips ]
>>>> testing for 56kb 16KHz: [OK][ 1719 dezicycles in do_adaptive_prediction, 262107 runs, 37 skips ]
>>>> testing for 48kb 16KHz: [OK][ 1689 dezicycles in do_adaptive_prediction, 262126 runs, 18 skips ]
>>>> testing for 64Kb 8KHz: [OK][ 1676 dezicycles in do_adaptive_prediction, 131065 runs, 7 skips ]
>>>> testing encoding for 64Kb 16KHz: [OK][ 1673 dezicycles in do_adaptive_prediction, 262116 runs, 28 skips ]
>>>>
>>>> I can revert to non-LUT if you prefer.
>>>
>>> Sorry for the noise, I thought the comment was about using the sg
>>> array, not just the lookup table.
>>> Removed the use of the lookup table locally.
>>
>> I knew I should not have answered before benchmarking:
>> counterintuitive, but the LUT is still faster.
>
> what code did you use for the non LUT variant?
>
> 1 | - (a==b)

I used a==b ? -1 : 1, but the LUT is still faster.

Results with 1 | -(a==b):
sg[0] = 1 | -(cur_part_reconst == band->part_reconst_mem[0]);
sg[1] = 1 | -(cur_part_reconst != band->part_reconst_mem[1]);
testing for 64kb 16KHz: [OK][ 1463 dezicycles in do_adaptive_prediction, 262126 runs, 18 skips ]
testing for 56kb 16KHz: [OK][ 1550 dezicycles in do_adaptive_prediction, 262116 runs, 28 skips ]
testing for 48kb 16KHz: [OK][ 1502 dezicycles in do_adaptive_prediction, 262111 runs, 33 skips ]
testing for 64Kb 8KHz: [OK][ 1474 dezicycles in do_adaptive_prediction, 131065 runs, 7 skips ]
testing encoding for 64Kb 16KHz: [OK][ 1447 dezicycles in do_adaptive_prediction, 262108 runs, 36 skips ]

Results with the LUT:
sg[0] = sign_lookup[cur_part_reconst != band->part_reconst_mem[0]];
sg[1] = sign_lookup[cur_part_reconst == band->part_reconst_mem[1]];
testing for 64kb 16KHz: [OK][ 1443 dezicycles in do_adaptive_prediction, 262110 runs, 34 skips ]
testing for 56kb 16KHz: [OK][ 1519 dezicycles in do_adaptive_prediction, 262121 runs, 23 skips ]
testing for 48kb 16KHz: [OK][ 1477 dezicycles in do_adaptive_prediction, 262100 runs, 44 skips ]
testing for 64Kb 8KHz: [OK][ 1454 dezicycles in do_adaptive_prediction, 131058 runs, 14 skips ]
testing encoding for 64Kb 16KHz: [OK][ 1427 dezicycles in do_adaptive_prediction, 262123 runs, 21 skips ]
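
The three variants compared in this thread should all produce the same sign value, so only their speed differs. A minimal sketch of that equivalence follows. It assumes sign_lookup holds { -1, 1 } (the table itself is not shown in this mail), and the helper names sign_lut, sign_ternary, sign_bitwise and variants_agree are hypothetical, introduced only for illustration. The sg[1] lines in the patch use the opposite polarity (== in the LUT index, != in the bitwise form), which works the same way.

```c
/* Assumed table contents (not shown in the thread):
 * index 0 -> -1, index 1 -> +1. */
static const int sign_lookup[2] = { -1, 1 };

/* LUT variant: +1 when a != b, -1 when a == b. */
static int sign_lut(int a, int b)
{
    return sign_lookup[a != b];
}

/* Ternary variant, as in "a==b ? -1 : 1". */
static int sign_ternary(int a, int b)
{
    return a == b ? -1 : 1;
}

/* Bitwise variant: (a == b) is 0 or 1, negation gives 0 or -1,
 * and OR-ing with 1 yields +1 or -1 respectively. */
static int sign_bitwise(int a, int b)
{
    return 1 | -(a == b);
}

/* Returns 1 if all three variants agree for every pair of
 * 0/1 inputs, 0 otherwise. */
static int variants_agree(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            int ref = sign_lut(a, b);
            if (ref != sign_ternary(a, b) || ref != sign_bitwise(a, b))
                return 0;
        }
    return 1;
}
```

Calling variants_agree() from a small test program checks the equivalence; which form is fastest still depends on the compiler and CPU, as the numbers above suggest.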

BTW, thanks for your sign-check trick; it allowed some nice
simplifications and some speedup :)

Kenan