[FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

John Cox jc at kynesim.co.uk
Fri Jan 22 14:29:29 CET 2016


On Fri, 22 Jan 2016 12:18:29 +0100, you wrote:

>Hi,
>
>2016-01-20 15:27 GMT+01:00 John Cox <jc at kynesim.co.uk>:
>> The by22 code gained me an overall factor of two in the abs level decode
>> - the gains do depend a lot on the quantity of residual - you gain a lot
>> more on I-frames than you do otherwise as they tend to have much longer
>> residuals.  The higher the bitrate the more useful this code is.  But as
>> you note it didn't use vast amounts of time relative to everything else
>> anyway.
>>
>> The reworking / simplification of the loop(s) around the abs level
>> decode and the scaling gave me the biggest single improvement.
>
>The thing is, it provided no gain on any Win64 system I had at hand. Or
>only a very minor one, once I switched things off. The amount of new/changed
>code would make it worth discussing, were it not for actual gains on ARM.

I think on ARM things fitted within its register limit more often -
either way it was useful.  Much of the simplification work was structural,
so that it was possible for me to extract simple functions to code in asm.

>> After that the reworking of get_sig_coeff_flag_idxs was a useful gain
>
>Yes, this is the most agreeable part of the non-applied parts.
>
>> Special casing the single coeff path gave a similar gain
>
>This is a big slowdown on Win64 and UHD-bluray like sequences, but
>that can be switched off in that case.

I'm a bit surprised that it generated a big slowdown - some cache must
be running just on the edge, but yes, if you normally have high-bitrate
streams then it isn't wanted.  On my test streams the bitrates were
normally quite low - quite unlike what I would expect from Blu-ray
sequences.

Default it to off on x86 but on on ARM?
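
A minimal sketch of what such a per-architecture default could look like,
for illustration only.  The macro SINGLE_COEFF_FAST_PATH and the helpers
scale_one() and scale_block() are hypothetical names, not taken from the
patch or from FFmpeg, and the scaling formula is a generic
round-and-shift rather than the decoder's actual code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch (not a real FFmpeg option): on for ARM, off elsewhere.
 * It can still be overridden with -DSINGLE_COEFF_FAST_PATH=0 or =1. */
#ifndef SINGLE_COEFF_FAST_PATH
#  if defined(__arm__) || defined(__aarch64__)
#    define SINGLE_COEFF_FAST_PATH 1
#  else
#    define SINGLE_COEFF_FAST_PATH 0
#  endif
#endif

/* Scale one level: (level * scale + rounding) >> shift, saturated to int16.
 * Assumes shift >= 1 and an arithmetic right shift for negative values. */
static int16_t scale_one(int level, int scale, int shift)
{
    int64_t v = ((int64_t)level * scale + ((int64_t)1 << (shift - 1))) >> shift;
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Write n scaled levels into block at the given positions. */
static void scale_block(int16_t *block, const uint8_t *pos,
                        const int16_t *levels, size_t n,
                        int scale, int shift)
{
#if SINGLE_COEFF_FAST_PATH
    /* Special case: a single coefficient skips the loop setup entirely. */
    if (n == 1) {
        block[pos[0]] = scale_one(levels[0], scale, shift);
        return;
    }
#endif
    for (size_t i = 0; i < n; i++)
        block[pos[i]] = scale_one(levels[i], scale, shift);
}
```

Both paths produce the same output; the switch only decides whether the
extra branch is compiled in, so it can be left off where it measures
slower.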

>> After that the scale rework - now probably 75% faster than it was
>> previously but it wasn't taking a huge amount of time.
>
>The work is done, I don't mind.
>
>> And after that all the other bits - my experience with optimising this
>> sort of code (I did a lot of work on a TI H.264 implementation in the
>> past) is that no single change is going to do everything, you just have
>> to polish everything until it goes fast enough.
>
>Sure. There may be positive interactions, but my own figures showed
>the sigmap/greater than flags were the only ones worth optimizing on
>Win64.

Very plausibly.

>> Sorry - I don't quite understand what you've said here.
>
>Doesn't matter anymore, I think I have just laid out the parts
>actually mattering, and for haswell/Win64 (ie x86_64).

I think you've cleared up my misunderstanding in the expanded comments
above.

>I'll reply more in depth to the new patchset, but not until you're on
>holidays. Which should leave me more time for reviewing it, so all the
>better.

Good oh.

JC

