[FFmpeg-devel] [PATCH] IFF: Add error checking to byterun1 decoder

Wed May 26 15:21:03 CEST 2010

Ronald S. Bultje a ?crit :
> Hi,
>
> On Sun, May 23, 2010 at 11:31 AM, Sebastian Vater
> <cdgs.basty at googlemail.com> wrote:
>   
>> Fixed. For preventing the function become non-inlined then I have added
>> av_always_inline to it. I also pass AVCodecContext
>> to it.
>>     
>
> The inling should be in a separate patch.
>   

Fixed by removing it completely!

Doing some benchmarking resulted in "not noticeable" average speed
change at all, but since the av_always_inline makes code larger, so I
decided to remove it.

Benchmark results with without av_always_inline (ffplay Ooze.iff):
33600 dezicycles in decode_byterun, 1 runs, 0 skips
23900 dezicycles in decode_byterun, 2 runs, 0 skips
15900 dezicycles in decode_byterun, 4 runs, 0 skips
9910 dezicycles in decode_byterun, 8 runs, 0 skips
9577 dezicycles in decode_byterun, 16 runs, 0 skips
8592 dezicycles in decode_byterun, 32 runs, 0 skips
8551 dezicycles in decode_byterun, 64 runs, 0 skips
8100 dezicycles in decode_byterun, 128 runs, 0 skips
8117 dezicycles in decode_byterun, 256 runs, 0 skips
8081 dezicycles in decode_byterun, 512 runs, 0 skips
8289 dezicycles in decode_byterun, 1024 runs, 0 skips
9601 dezicycles in decode_byterun, 2048 runs, 0 skips
10337 dezicycles in decode_byterun, 4095 runs, 1 skips
10001 dezicycles in decode_byterun, 8191 runs, 1 skips

Benchmark results with av_always_inline (ffplay Ooze.iff):
102800 dezicycles in decode_byterun, 1 runs, 0 skips
58300 dezicycles in decode_byterun, 2 runs, 0 skips
33430 dezicycles in decode_byterun, 4 runs, 0 skips
18765 dezicycles in decode_byterun, 8 runs, 0 skips
14060 dezicycles in decode_byterun, 16 runs, 0 skips
10840 dezicycles in decode_byterun, 32 runs, 0 skips
9713 dezicycles in decode_byterun, 64 runs, 0 skips
8715 dezicycles in decode_byterun, 128 runs, 0 skips
8488 dezicycles in decode_byterun, 255 runs, 1 skips
8315 dezicycles in decode_byterun, 511 runs, 1 skips
8395 dezicycles in decode_byterun, 1022 runs, 2 skips
9807 dezicycles in decode_byterun, 2046 runs, 2 skips
10508 dezicycles in decode_byterun, 4093 runs, 3 skips
10154 dezicycles in decode_byterun, 8188 runs, 4 skips

>   
>> @@ -226,27 +226,43 @@ static void decodeplane32(uint32_t *dst, const uint8_t *buf, int buf_size, int p
>>          const int8_t value = *buf++;
>>          if (value >= 0) {
>> -            length = value + 1;
>> -            memcpy(dst + x, buf, FFMIN3(length, dst_size - x, buf_end - buf));
>> +            const int length = (unsigned) value + 1;
>> +            if (length > dst_size || length > (int) (buf_end - buf)) { // overflow?
>> +                av_log(avctx, AV_LOG_ERROR, "IFF byterun1 stream buffer overflow\n");
>> +                return AVERROR_INVALIDDATA;
>> +            }
>> +            memcpy(dst, buf, length);
>> +            dst_size -= length;
>> +            dst += length;
>>              buf += length;
>>     
> [..]
>   
>>          } else if (value > -128) {
>> -            length = -value + 1;
>> -            memset(dst + x, *buf++, FFMIN(length, dst_size - x));
>> -        } else { // noop
>> -            continue;
>> +            const int length = (unsigned) -value + 1;
>> +            if (length > dst_size || buf >= buf_end) { // overflow?
>> +                av_log(avctx, AV_LOG_ERROR, "IFF byterun1 stream buffer overflow\n");
>> +                return AVERROR_INVALIDDATA;
>> +            }
>> +            memset(dst, *buf++, length);
>> +            dst_size -= length;
>> +            dst += length;
>> +        } else if (buf >= buf_end) { // noop, return error on overflow, though
>> +            av_log(avctx, AV_LOG_ERROR, "IFF byterun1 stream buffer overflow\n");
>> +            return AVERROR_INVALIDDATA;
>>          }
>> -        x += length;
>> -    }
>> +    } while (dst_size > 0);
>>      return buf - buf_start;
>>  }
>>     
>
> IMO this does a lot more than just adding error checking? Is this
> faster? Or just different?
>   

It is faster, but that's because of error checking which makes the FFMIN
stuff and the for body-checking completely useless.

I first thought it was faster because of the do { ... } while but I was
testing that the last half hour and the reason is that the error
checking is done before the FFMIN stuff so gcc does a small but
different enough register allocation which increases the speed.

Running the same code without error checking didn't cause a measurable
change (with do ... while and FFMIN instead of the if's).

I guess the reason is that FFMIN/FFMIN3 are macros and these evaluate
the dst_size - x, buf_end - buf twice (triple for FFMIN3), which the
normal if error-checker doesn't.

So the speedup is directly related to adding error checking.

Wow! Never thought that adding normal error checking can actually make
code run faster instead of slower.

BTW, with the last patch I also forgot to add doxygen new avctx
parameter, which I fixed with this patch, too.

-- 

Best regards,
                   :-) Basty/CDGS (-:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: iff-byterun1-error.patch
Type: text/x-diff
Size: 4483 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100526/300dce33/attachment.patch>