[FFmpeg-devel] [PATCH] h264 CAVLC coeff_token decoder based on CLZ

Sun Jan 24 12:56:04 CET 2010

On Sun, Jan 24, 2010 at 3:24 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sun, Jan 24, 2010 at 03:24:52AM +0100, Michael Niedermayer wrote:
>> On Sat, Jan 23, 2010 at 06:09:13PM -0800, Jason Garrett-Glaser wrote:
>> > On Sat, Jan 23, 2010 at 6:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > > On Sat, Jan 23, 2010 at 04:15:28PM -0800, Jason Garrett-Glaser wrote:
>> > >> On Sat, Jan 23, 2010 at 11:03 AM, Pascal Massimino
>> > >> <pascal.massimino at gmail.com> wrote:
>> > >> > On Sat, Jan 23, 2010 at 10:18 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > >> >
>> > >> >> On Sat, Jan 23, 2010 at 03:28:53AM +0300, Anatoliy Nenashev wrote:
>> > >> >> > Hi all!
>> > >> >> > I have done some investigation into the H264 CAVLC coeff_token decoder.
>> > >> >> > In the attached patch you can see a special implementation of the VLC decoder for
>> > >> >> > coeff_token which is based on CLZ (count leading zeros).
>> > >> >> > This method reduces the size of the VLC decoding tables for coeff_token from
>> > >> >> > (520+332+280+256)*2 = 2776 bytes to (2*4*16 + 64 + 67 + 63 + 63) = 385 bytes.
>> > >> >>
>> > >> >
>> > >> > FWIW: these tables are not called that often,
>> > >>
>> > >> ~8-24 times per MB isn't that often?
>> > >
>> > > at least 12 times per MB isn't often because the loop filter SSE2 code is
>> > > written in yasm and thus can't be inlined forcing us to do 12 calls per MB
>> > > ;)
>> >
>> > GCC's stupidity costs far more clocks than any function call. If you
>> > don't like function call overhead on x86_32, patches are welcome to
>> > add fastcall support to the yasm macros.
>>
>> i don't like it on x86_64 either
>>
>>
>> >
>> > Furthermore, I suspect you'd get more benefit from inlining
>> > filter_mb_edgev and friends and *NOT* inlining the deblock code than
>> > doing the reverse.
>>
>> i expected this as well, and i tried it at least twice already but gcc ...
>> I also tried to split the slow loop filter path in intra/inter and tried
>> to unroll the loop for the first iteration to be handled separately and
>> and and.
>> gcc doesn't seem to like me and my code
>
> here's an example:
> if (alpha&&beta){
> from my inlined code is turned into:
>         testl   %r15d, %r15d
>         setne   %r9b
>         testl   %r13d, %r13d
>         setne   %r14b
>         testb   %r9b, %r9b
>         je      .L260
>         testb   %r14b, %r14b
>         je      .L260
>
> why is gcc doing this?

Use if(alpha&beta), problem solved.
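
One caveat, shown as a minimal sketch rather than anything taken from the FFmpeg tree: alpha&beta tests whether the two values share a set bit, so it only matches alpha&&beta when both operands are known to be 0 or 1. For arbitrary nonzero values (4 and 2, say) the two conditions disagree. Normalising each operand with != 0 first, which is what the setne instructions in the quoted output compute, keeps the original meaning while still dropping the short-circuit branch. The helper names below are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Original condition: true when both values are nonzero (short-circuits). */
static int both_nonzero_logical(int alpha, int beta)
{
    return alpha && beta;
}

/* Bitwise AND of the raw values: true only if they share a set bit. */
static int raw_bitwise_and(int alpha, int beta)
{
    return (alpha & beta) != 0;
}

/* Reduce each operand to 0/1 first; then & agrees with && for all inputs
 * and both operands are always evaluated (no short-circuit). */
static int both_nonzero_bitwise(int alpha, int beta)
{
    return (alpha != 0) & (beta != 0);
}

/* Hypothetical self-check illustrating where the forms agree and differ. */
static void check_examples(void)
{
    assert(both_nonzero_logical(4, 2) == 1);
    assert(raw_bitwise_and(4, 2) == 0);      /* 4 & 2 == 0: differs from && */
    assert(both_nonzero_bitwise(4, 2) == 1); /* matches && */
    assert(both_nonzero_bitwise(0, 7) == 0);
    assert(both_nonzero_bitwise(0, 0) == 0);
}
```

Whether the normalised form actually produces better code than && depends on the compiler version and flags, so it is worth checking the generated assembly before relying on it.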

Dark Shikari