[FFmpeg-devel] [PATCH] h264 CAVLC coeff_token decoder based on CLZ

Sun Jan 24 03:09:13 CET 2010

On Sat, Jan 23, 2010 at 6:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, Jan 23, 2010 at 04:15:28PM -0800, Jason Garrett-Glaser wrote:
>> On Sat, Jan 23, 2010 at 11:03 AM, Pascal Massimino
>> <pascal.massimino at gmail.com> wrote:
>> > On Sat, Jan 23, 2010 at 10:18 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >
>> >> On Sat, Jan 23, 2010 at 03:28:53AM +0300, Anatoliy Nenashev wrote:
>> >> > Hi all!
>> >> > I have done some investigation into the H264 CAVLC coeff_token decoder.
>> >> > In the attached patch you can see a special implementation of the VLC
>> >> > decoder for coeff_token which is based on CLZ (count leading zeros).
>> >> > This method reduces the size of the VLC decoding tables for coeff_token from
>> >> > (520+332+280+256)*2 = 2776 bytes to (2*4*16 + 64 + 67 + 63 + 63) = 385
>> >> > bytes.
>> >>
>> >
>> > FWIW: these tables are not called that often,
>>
>> ~8-24 times per MB isn't that often?
>
> at least 12 times per MB isn't often because the loop filter SSE2 code is
> written in yasm and thus can't be inlined, forcing us to do 12 calls per MB
> ;)

GCC's stupidity costs far more clocks than any function call.  If you
don't like function call overhead on x86_32, patches are welcome to
add fastcall support to the yasm macros.

Furthermore, I suspect you'd get more benefit from inlining
filter_mb_edgev and friends and *NOT* inlining the deblock code than
doing the reverse.

Dark Shikari
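
For readers unfamiliar with the CLZ approach mentioned in the quoted patch
description, here is a minimal sketch of the general idea: count the leading
zeros of the MSB-aligned bitstream window, then use that count plus a few
suffix bits to index a small table. This is not the code from the patch. The
names (clz32, ToyVLC, toy_table, toy_decode) and the toy code itself (n zeros,
a 1, then 2 suffix bits) are hypothetical and are not the H.264 coeff_token
tables.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for a nonzero 32-bit value.
 * A real decoder would use a compiler builtin or CPU instruction instead. */
static int clz32(uint32_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical toy code: 'zeros' leading 0 bits, a 1 bit, then 2 suffix bits. */
typedef struct {
    uint8_t value; /* decoded symbol */
    uint8_t len;   /* total codeword length in bits */
} ToyVLC;

#define MAX_PREFIX  3
#define SUFFIX_BITS 2

/* Small table indexed by [leading zeros][suffix] instead of one large
 * table indexed by a long fixed-width bit pattern. */
static const ToyVLC toy_table[MAX_PREFIX + 1][1 << SUFFIX_BITS] = {
    { { 0, 3}, { 1, 3}, { 2, 3}, { 3, 3} },
    { { 4, 4}, { 5, 4}, { 6, 4}, { 7, 4} },
    { { 8, 5}, { 9, 5}, {10, 5}, {11, 5} },
    { {12, 6}, {13, 6}, {14, 6}, {15, 6} },
};

/* Decode one symbol from 'window', which holds the next 32 bits of the
 * stream with the first unread bit in the MSB. Returns the symbol and
 * stores the consumed bit count in *len, or returns -1 on an invalid code. */
static int toy_decode(uint32_t window, int *len)
{
    int zeros, suffix;

    if (window == 0)
        return -1;
    zeros = clz32(window);
    if (zeros > MAX_PREFIX)
        return -1;
    /* Skip the zeros and the terminating 1, then take the suffix bits. */
    suffix = (window >> (32 - (zeros + 1 + SUFFIX_BITS)))
             & ((1 << SUFFIX_BITS) - 1);
    *len = toy_table[zeros][suffix].len;
    return toy_table[zeros][suffix].value;
}
```

The table here has 16 two-byte entries. A plain lookup over the longest
codeword (6 bits) would need 64 entries, which is the kind of saving the
quoted size figures describe, at the cost of one CLZ per symbol.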