[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC
rscheidegger_lists at hispeed.ch
Tue Apr 17 18:14:19 CEST 2012
On 14.04.2012 18:09, Reimar Döffinger wrote:
> On Sat, Apr 14, 2012 at 03:44:35AM +0200, Roland Scheidegger wrote:
>> I ran some numbers (with ffmpeg -benchmark so not only cabac) and the
>> assembly get_cabac improves performance in some test video by roughly
>> 3%. Enabling the assembly decode_significance function improves
>> performance by another 2.5% or so, overall I get just about a bit more
>> than a 5% increase. Dunno if I should be disappointed by that or not but
>> it's better than nothing...
> Honestly? 5% by just optimizing a few functions (particularly for one
> of the codecs that already is among the better optimized ones) is a
> _huge_ improvement.
> I've worked on projects where I was happy to get 0.5% to 1% after a
> few hours work...
Well, that was on a K8. I've also done some quick tests on a Core2-class
chip, and there the improvement is much smaller (less than half). In fact,
one of the inline asm functions (get_cabac_bypass_sign_x86) actually seems
to be slower there, if anything (the numbers are too close to tell without
running more tests), which is a bit strange. I also wrote the last missing
basic cabac inline asm function (get_cabac_bypass_x86); it may be a tiny
bit faster, but again it's too close to tell (on a Core2). Granted, these
functions don't take that much time anyway.
Maybe it's the cmov that makes the difference. All AMD chips have very
fast cmov (same latency and throughput as any other simple ALU or mov
op), whereas it's worse on Intel (throughput might not matter much, but
latency is typically 2, except on P4, where Prescott has a latency of 9.5
and Northwood isn't much better...).
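
To illustrate what is being traded off here, below is a rough C sketch
(not the code from the patch) of the two ways to do a data-dependent
select: a plain conditional, which a compiler may turn into either a
branch or a cmov, and a mask-based form that needs neither. The helper
names select_cond, select_mask and negate_if are made up for this
example. The conditional negate at the end is the kind of sign
application a bypass-sign decode step needs, assuming the mask is
either 0 or -1.

```c
#include <stdint.h>

/* Hypothetical helper: plain conditional select. Depending on the
 * compiler and target, this becomes a branch or a cmov. */
static int32_t select_cond(int cond, int32_t a, int32_t b)
{
    return cond ? a : b;
}

/* Hypothetical helper: the same select using only simple ALU ops.
 * mask is all ones when cond is nonzero, all zeros otherwise. */
static int32_t select_mask(int cond, int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}

/* Hypothetical helper: negate val when mask is -1, leave it unchanged
 * when mask is 0. Assumes mask is exactly 0 or -1. */
static int32_t negate_if(int32_t val, int32_t mask)
{
    return (val ^ mask) - mask;
}
```

Which form is faster depends on the chip, which is the point above: where
cmov has extra latency, the mask form or even a well-predicted branch may
come out ahead, so it would have to be measured per CPU.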