[FFmpeg-devel] [PATCH] AAC: unroll parts of decode_spectrum_and_dequant()

Måns Rullgård mans
Tue Dec 9 14:28:53 CET 2008


Michael Niedermayer wrote:
> On Mon, Dec 08, 2008 at 08:04:10PM -0800, Jason Garrett-Glaser wrote:
>> On Mon, Dec 8, 2008 at 7:58 PM, Jason Garrett-Glaser
>> <darkshikari at gmail.com> wrote:
>> > On Mon, Dec 8, 2008 at 7:34 PM, Alex Converse <alex.converse at gmail.com> wrote:
>> >> On Mon, Dec 8, 2008 at 9:33 PM, Jason Garrett-Glaser
>> >> <darkshikari at gmail.com> wrote:
>> >>
>> >>> On Mon, Dec 8, 2008 at 3:43 PM, Alex Converse <alex.converse at gmail.com> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > The attached patch unrolling sections of decode spectrum saves me
>> >>> > 5.48% on my mpeg4-lc-256kbps stream on my core2 duo.
>> >>> >
>> >>> > Regards,
>> >>> > Alex Converse
>> >>>
>> >>> If dim can only be 2 or 4, wouldn't it be better to do
>> >>>
>> >>> if( dim == 4 ) {
>> >>> do dim 4 stuff
>> >>> }
>> >>> do dim 2 stuff
>> >>>
>> >>> The switch seems unnecessary.
>> >>>
>> >>
>> >> Idiomatically I like the switch better but your way is faster. When I did
>> >> that I also tried reverting access back to forward order and got a slight
>> >> speedup. This made the unsigned loop just like the other three, so I
>> >> unrolled that one too for another benchmark-verified speedup.
>> >>
>> >> The net gain is a 12% decrease in cycles over the original vs 5% before.
>> >
>> > if (vq_ptr[2]) coef[coef_tmp_idx + 2] = 1 - 2*(int)get_bits1(gb);
>> > if (vq_ptr[3]) coef[coef_tmp_idx + 3] = 1 - 2*(int)get_bits1(gb);
>> >
>> > Isn't that a rather unnecessary int -> float conversion?  I'd think
>> > you could do much better than that considering there are only two
>> > possible input values...
>> >
>> > Dark Shikari
>> >
>>
>> Simple proposal for the above:
>>
>> static const float lookup[2] = {1.0, -1.0};
>> if (vq_ptr[2]) coef[coef_tmp_idx + 2] = lookup[get_bits1(gb)];
>
>
> something like:
> if (vq_ptr[2]) ((uint32_t*)coef)[coef_tmp_idx + 2] = (get_bits1(gb)<<31) + 0x3F800000;
>
> might be even faster
> but i agree with robert that this should be a separate patch

Strict aliasing violation.  Depending on CPU it might also be slower.
Most FPUs can generate +-1 constants efficiently.
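
For reference, a minimal sketch of the three variants discussed above,
assuming IEEE-754 single-precision floats. The helper names are
hypothetical and not part of the patch; the bit argument stands in for
the result of get_bits1(gb). The memcpy form builds the same bit pattern
as the pointer cast but without the aliasing problem. Compilers usually
reduce it to a register move, though that is not guaranteed, and whether
it beats the table or the arithmetic form would still need benchmarking
per CPU.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bit pattern variant without the pointer cast.
 * 0x3F800000 is 1.0f in IEEE-754 single precision; bit 31 is the sign. */
static inline float sign_from_bit_bits(unsigned bit)
{
    uint32_t u = 0x3F800000u | ((uint32_t)(bit & 1) << 31);
    float f;
    memcpy(&f, &u, sizeof f);   /* well-defined type punning in C */
    return f;
}

/* Hypothetical helper: the lookup table proposal. */
static inline float sign_from_bit_lut(unsigned bit)
{
    static const float lookup[2] = { 1.0f, -1.0f };
    return lookup[bit & 1];
}

/* Hypothetical helper: the arithmetic form from the patch,
 * which needs an int -> float conversion. */
static inline float sign_from_bit_arith(unsigned bit)
{
    return (float)(1 - 2 * (int)(bit & 1));
}
```

All three map a 0 bit to +1.0f and a 1 bit to -1.0f, so they are
interchangeable as far as the decoded coefficients go.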

-- 
Måns Rullgård
mans at mansr.com
