[FFmpeg-devel] [PATCH] h264: assembly version of get_cabac for x86_64 with PIC
rscheidegger_lists at hispeed.ch
Sat Apr 14 03:44:35 CEST 2012
Am 14.04.2012 01:50, schrieb Roland Scheidegger:
> Am 13.04.2012 20:24, schrieb Roland Scheidegger:
>> Am 13.04.2012 19:34, schrieb Reimar Döffinger:
>>> On Fri, Apr 13, 2012 at 04:34:34PM +0200, Roland Scheidegger wrote:
>>>> There are similar functions which could get the same treatment but they
>>>> are less frequently used and since this isn't very nice as we can't use the
>>>> same assembly template focus on this function alone for now.
>>> Huh? The only other ones seem to be decode_significance as far as I can
>>> tell, and the only reason seems to be that they use
>>> Have you tried making them use your BRANCHLESS_GET_CABAC macro?
>>> Code is in libavcodec/x86/h264_i386.h.
>> Yes those are the ones I was talking about. decode_significance_8x8_x86
>> (but not decode_significance_x86) also uses MANGLE itself so needs a
>> change there. Other than that, the macro arguments are quite different
>> hence these need some more ifdefs to make it work.
>> If it's worth it I can give it a try - I just didn't see them in a
>> profile because they are part of decode_cabac_residual_internal (which
>> itself does show up quite prominently).
>>> since the macros are used multiple times the label might cause issues.
>>> So far that reason and
>>> for better readability it might be better to require the surrounding
>>> code to have such a label instead of having it in the macro?
>> I don't think it should cause problems, the code already had a label
>> before. Maybe if moving the label outside the lea should be moved out of
>> the macro too (and instead the reg holding the offset passed in) which
>> would save the decode_significance functions using BRANCHLESS_GET_CABAC
>> twice from performing this lea twice. I don't think that helps
>> readability, though...
> So something like the attached patch maybe?
> I think though the use of labels in different asm blocks is fishy - the
> only reason it is in a different block is because otherwise I want it to
> be an output argument, which means all input arguments get renumbered
> and hence would need completely separate significance functions instead
> of the same ones with some ifdefs (I actually tried to use c to get the
> rip-relative offset but failed pretty miserably trying to pass the label
> to the inline assembly - the offset itself isn't a problem).
> In any case, I'm not sure of the performance difference yet. I can say
> though now these functions indeed show up very high on the list with
> profiling so I guess at least potentially it could help.
I ran some numbers (with ffmpeg -benchmark so not only cabac) and the
assembly get_cabac improves performance in some test video by roughly
3%. Enabling the assembly decode_significance function improves
performance by another 2.5% or so, overall I get just about a bit more
than a 5% increase. Dunno if I should be disappointed by that or not but
it's better than nothing...
(Even with benchmark mode cpu load never got above 75% or so, I guess
that's some issue with multithreaded decoder.)
More information about the ffmpeg-devel