[FFmpeg-devel] [PATCH 3/3] avcodec/scpr: optimize shift loop.

Sat Sep 9 01:15:43 EEST 2017

On 9/8/2017 6:47 PM, Kieran Kunhya wrote:
> On Fri, 8 Sep 2017 at 22:29 Michael Niedermayer <michael at niedermayer.cc>
> wrote:
> 
>> Speeds code up from 50sec to 15sec
>>
>> Fixes Timeout
>> Fixes: 3242/clusterfuzz-testcase-5811951672229888
>>
>> Found-by: continuous fuzzing process
>> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
>> ---
>>  libavcodec/scpr.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavcodec/scpr.c b/libavcodec/scpr.c
>> index 37fbe7a106..2ef63a7bf8 100644
>> --- a/libavcodec/scpr.c
>> +++ b/libavcodec/scpr.c
>> @@ -827,7 +827,16 @@ static int decode_frame(AVCodecContext *avctx, void *data, int *got_frame,
>>              return ret;
>>
>>          for (y = 0; y < avctx->height; y++) {
>> -            for (x = 0; x < avctx->width * 4; x++) {
>> +            if (!(((uintptr_t)dst) & 7)) {
>> +                uint64_t *dst64 = (uint64_t *)dst;
>> +                int w = avctx->width>>1;
>> +                for (x = 0; x < w; x++) {
>> +                    dst64[x] = (dst64[x] << 3) & 0xFCFCFCFCFCFCFCFCULL;
>> +                }
>> +                x *= 8;
>> +            } else
>> +                x = 0;
>> +            for (; x < avctx->width * 4; x++) {
>>                  dst[x] = dst[x] << 3;
>>              }
>>              dst += frame->linesize[0];
>> --
>> 2.14.1
>>
> 
> This is as clear as mud.

If the row pointer is 8-byte aligned, it processes eight bytes at a time
(width >> 1 64-bit words out of the width * 4 bytes in a row), then
finishes the remaining bytes one at a time.
If the pointer is not aligned, it processes everything one byte at a time
like it used to.

See ff_h2645_extract_rbsp() and add_bytes_c() for other examples of
this optimization.
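
For illustration, here is a minimal standalone sketch of the same pattern
outside of scpr.c. The function names shift3_bytewise() and
shift3_wordwise() are made up for this example and are not FFmpeg API.
Two things differ from the patch on purpose: the sketch uses memcpy()
instead of casting the pointer to uint64_t *, and it masks with 0xF8 per
byte so the result matches the byte loop for any input. The 0xFC mask in
the patch gives the same result provided the top bit of every input byte
is already clear.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference version: shift every byte left by 3, one byte at a time. */
static void shift3_bytewise(uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(dst[i] << 3);
}

/* Same result, but eight bytes per iteration when dst is 8-byte aligned. */
static void shift3_wordwise(uint8_t *dst, size_t len)
{
    size_t i = 0;

    if (!((uintptr_t)dst & 7)) {
        for (; i + 8 <= len; i += 8) {
            uint64_t v;
            memcpy(&v, dst + i, sizeof(v));
            /* Shift all eight bytes at once, then clear the low 3 bits of
             * each byte, which now hold bits shifted in from its neighbour. */
            v = (v << 3) & 0xF8F8F8F8F8F8F8F8ULL;
            memcpy(dst + i, &v, sizeof(v));
        }
    }

    /* Tail bytes, or the whole buffer if dst was not aligned. */
    for (; i < len; i++)
        dst[i] = (uint8_t)(dst[i] << 3);
}
```

Both functions should produce identical output for any buffer, aligned
or not; only the number of loop iterations changes.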