[Ffmpeg-devel] [PATCH] H264 cabac vlc reading code

Uoti Urpala uoti.urpala
Mon Oct 16 02:19:09 CEST 2006

On Mon, 2006-10-16 at 01:19 +0200, Michael Niedermayer wrote:
> > What's more remarkable is that in my test with the latest version GCC
> > generated equally fast CABAC code from pure C with this patch.
> that part is strange, have you tried to let gcc put range and low into
> registers? maybe gcc reuses the low & range variables in registers in the
> C code and that of course doesn't work with the current asm ...

I haven't run any additional benchmarks yet. I thought about testing
that when I changed the code, but then forgot about it.

> this will break on gcc 2.95 and it's unneeded, use the %1234 style syntax
> if you dislike the numbers due to readability, use a macro
> %[range] vs. %"RANGE" both are equally readable

I don't want to waste effort on gcc 2.95 support. Even if it's equally
readable, it's clumsier to create and change the macros.
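
To make the two operand styles under discussion concrete, here is a minimal sketch. It is not code from the patch: add_named() and add_numbered() are hypothetical helpers, the single addl instruction stands in for the real CABAC asm, and the plain C fallback exists only so the example builds on non-x86 targets. As far as I know, named operands such as %[range] need GCC 3.1 or later, which is why they would break on gcc 2.95.

```c
/* Hypothetical helpers, not taken from the patch. Both add range to low. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))

/* Style 1: named operands (%[range]); assumed to need GCC 3.1 or later. */
static int add_named(int low, int range)
{
    __asm__ ("addl %[range], %[low]"
             : [low] "=r" (low)
             : [range] "r" (range), "0" (low)
             : "cc");
    return low;
}

/* Style 2: numbered operands hidden behind macros (%"RANGE").
 * The macros expand to the operand numbers, so the template string is
 * "addl %1, %0" after string literal concatenation. */
#define LOW   "0"
#define RANGE "1"
static int add_numbered(int low, int range)
{
    __asm__ ("addl %" RANGE ", %" LOW
             : "=r" (low)
             : "r" (range), "0" (low)
             : "cc");
    return low;
}

#else

/* Fallback for other compilers and architectures: plain C. */
static int add_named(int low, int range)    { return low + range; }
static int add_numbered(int low, int range) { return low + range; }

#endif
```

With the macro style, reordering or inserting an operand means renumbering the macros by hand; with named operands the compiler resolves the names, which is the convenience being argued for here.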

> > -        if( get_cabac( &h->cabac, &h->cabac_state[85 + get_cabac_cbf_ctx( h, cat, n ) ] ) == 0 ) {
> > +        if (get_cabac_special(&cabac, &h->cabac_state[85 + get_cabac_cbf_ctx( h, cat, n ) ] ) == 0 ) {
> this changes a static to an always_inline static function, which is a change
> which should be benchmarked independently of the rest

The uses in decode_cabac_residual were already inlined at -O4, so this
shouldn't affect benchmarks. I haven't tested the effect at -O2 (I would
expect it to be positive since START/STOP_TIMER for cabac was noticeably
better at -O4 even though overall performance for h264 wasn't).
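
For readers unfamiliar with the distinction being benchmarked, the sketch below contrasts a plain static function, which the compiler may or may not inline depending on the optimization level, with one that is forced inline through GCC's always_inline attribute. Everything here is illustrative: ALWAYS_INLINE, toy_step_plain(), toy_step_forced() and the run_*() callers are hypothetical names, and the bit-shifting body is a stand-in, not get_cabac().

```c
/* Hypothetical macro; falls back to plain inline outside GCC-compatible
 * compilers. */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Plain static: inlining is left to the optimizer's heuristics. */
static unsigned toy_step_plain(unsigned state, unsigned bit)
{
    return (state << 1) | (bit & 1u);
}

/* Forced inline: the body is expanded at every call site regardless of
 * the optimizer's heuristics. */
static ALWAYS_INLINE unsigned toy_step_forced(unsigned state, unsigned bit)
{
    return (state << 1) | (bit & 1u);
}

/* Two callers with identical logic, so only the inlining decision differs. */
unsigned run_plain(const unsigned char *bits, int n)
{
    unsigned state = 0;
    int i;
    for (i = 0; i < n; i++)
        state = toy_step_plain(state, bits[i]);
    return state;
}

unsigned run_forced(const unsigned char *bits, int n)
{
    unsigned state = 0;
    int i;
    for (i = 0; i < n; i++)
        state = toy_step_forced(state, bits[i]);
    return state;
}
```

Both callers compute the same value; any difference would show up only in the generated code and the timings, which is why the inlining change can be measured separately from the rest of a patch.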

> tabs are forbidden in ffmpeg svn, and besides mixing tabs and spaces for
> indentation is a bad idea independent of the space vs. tab question

I didn't try to polish the patch into a directly committable state; that
version was mainly intended for testing. Even if it is committed, at least
a few related things should be checked before the code is left as it is:
- Whether it's worth having the asm version at all, since the changes
make C equally fast, at least on my machine
- Whether the original asm version, which takes a pointer to CABACContext,
should be kept for the other uses of get_cabac() outside
decode_cabac_residual(), where the function is not inlined with the
CABACContext variables on the stack (in which case gcc seemed to generate
worse code)

> to the change itself, i'd like to understand why it's not faster anymore and why
> the C code ends up equally fast as the asm code before this gets committed, i'll
> look at the generated code if no one else is faster ...

I think it's not too surprising that gcc can generate reasonably good
code with stack variables; after all, it's mostly straightforward
arithmetic. I haven't tried checking the generated code, though.
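
The stack-variable idea can be sketched as follows, under loud assumptions: ToyContext, toy_get_bit() and both decode_*() functions are hypothetical, and the arithmetic is a made-up update that only resembles a range decoder in shape. It is not the CABAC algorithm and not the code from the patch. The point is only the access pattern: working on a local copy whose address does not escape lets the compiler keep low and range in registers across the loop, whereas updates through a pointer may have to go through memory.

```c
#include <stdint.h>

/* Hypothetical toy context with only the two fields named in the thread. */
typedef struct ToyContext {
    uint32_t low;
    uint32_t range;
} ToyContext;

/* Made-up update step, NOT real CABAC arithmetic. */
static inline int toy_get_bit(ToyContext *c)
{
    int bit;
    c->range -= c->range >> 2;        /* shrink the range */
    bit = c->low >= c->range;         /* compare low against the new range */
    if (bit)
        c->low -= c->range;
    c->low = c->low * 2u + 1u;        /* unsigned arithmetic, wraps safely */
    if (c->range < 0x100u)
        c->range <<= 1;               /* crude renormalization */
    return bit;
}

/* Variant 1: operate on the caller's context through the pointer. */
int decode_through_pointer(ToyContext *ctx, int n)
{
    int i, ones = 0;
    for (i = 0; i < n; i++)
        ones += toy_get_bit(ctx);
    return ones;
}

/* Variant 2: copy the context to the stack, work on the copy, and write it
 * back at the end. Once toy_get_bit() is inlined, the compiler is free to
 * hold local.low and local.range in registers for the whole loop. */
int decode_with_local_copy(ToyContext *ctx, int n)
{
    ToyContext local = *ctx;
    int i, ones = 0;
    for (i = 0; i < n; i++)
        ones += toy_get_bit(&local);
    *ctx = local;
    return ones;
}
```

Both variants produce the same bits and leave the context in the same state; whether the second one is actually faster depends on the compiler and flags, and can only be settled by benchmarking or by reading the generated code.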
