[FFmpeg-devel] [PATCH] swscale: add unscaled copy from yuv420p10 to p010

Timo Rothenpieler timo at rothenpieler.org
Fri Sep 2 12:55:58 EEST 2016


>>>
>>> …or is that really old-school and a modern compiler does all that when optimising?
>>>
>>> Or is readability considered more important than marginal gains in performance?
>>>
>>> Oliver (time travelling from the 1980s)
>>
>> You would still have to add the remaining stride.
>> The linesize is usually larger than the width, so each line is properly
>> aligned.
>>
>> So with your code, you'd still need something like
>>
>> dstUV += dstStride[1] / 2 - 2 * x;
>> src[1] += srcStride[1] / 2 - x;
>> src[2] += srcStride[2] / 2 - x;
>>
>> after it.
> 
> No, the lines after it remain unchanged - only the temporary variables are looping along the x.
> 
> src[1] += srcStride[1] / 2;
> src[2] += srcStride[2] / 2;
> dstUV += dstStride[1] / 2;


It is indeed very slightly faster.

Old:
[bench @ 0x2cbfb20] t:0.006181 avg:0.006270 max:0.013702 min:0.006080
New:
[bench @ 0x33bcb20] t:0.006195 avg:0.006225 max:0.013718 min:0.006060

Going by the averages (0.006270 vs. 0.006225, which should be seconds), it
seems to be about 0.045ms faster on average, or roughly 0.7%.
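
For reference, a minimal sketch of the two loop styles being compared,
reduced to the chroma interleave step only. This is not the code from the
patch: the function names and parameters are made up for illustration, the
strides are assumed to be in bytes (hence the / 2 for 16-bit samples), and
the << 6 reflects P010 keeping its 10 bits in the upper part of each 16-bit
word. In both variants the per-line stride adjustment after the inner loop
is identical; only the inner loop differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: indexed inner loop, base pointers stay put per line. */
static void interleave_uv_indexed(uint16_t *dst_uv, ptrdiff_t dst_stride,
                                  const uint16_t *src_u, ptrdiff_t u_stride,
                                  const uint16_t *src_v, ptrdiff_t v_stride,
                                  int chroma_w, int chroma_h)
{
    for (int y = 0; y < chroma_h; y++) {
        for (int x = 0; x < chroma_w; x++) {
            dst_uv[2 * x]     = (uint16_t)(src_u[x] << 6);
            dst_uv[2 * x + 1] = (uint16_t)(src_v[x] << 6);
        }
        /* Strides are in bytes, samples are 16 bit wide. */
        src_u  += u_stride / 2;
        src_v  += v_stride / 2;
        dst_uv += dst_stride / 2;
    }
}

/* Hypothetical helper: temporary pointers walk along x, base pointers
 * are still advanced by the full stride afterwards. */
static void interleave_uv_pointer(uint16_t *dst_uv, ptrdiff_t dst_stride,
                                  const uint16_t *src_u, ptrdiff_t u_stride,
                                  const uint16_t *src_v, ptrdiff_t v_stride,
                                  int chroma_w, int chroma_h)
{
    for (int y = 0; y < chroma_h; y++) {
        uint16_t *d = dst_uv;
        const uint16_t *u = src_u;
        const uint16_t *v = src_v;

        for (int x = chroma_w; x > 0; x--) {
            *d++ = (uint16_t)(*u++ << 6);
            *d++ = (uint16_t)(*v++ << 6);
        }
        /* Unchanged from the indexed variant: no "- x" correction needed,
         * because only the temporaries moved. */
        src_u  += u_stride / 2;
        src_v  += v_stride / 2;
        dst_uv += dst_stride / 2;
    }
}
```

Both functions write the same output, including leaving any padding between
the line width and the stride untouched, so which one to use comes down to
readability and whatever the compiler makes of it.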

