[FFmpeg-devel] [PATCH] IFF: Change decodeplane8 while to do ... while for getting 1% speedup

Sebastian Vater cdgs.basty
Mon May 10 00:50:56 CEST 2010


Måns Rullgård wrote:
> Sebastian Vater <cdgs.basty at googlemail.com> writes:
>
>   
>> Benchmarking it resulted in a 1% speedup using MRLake.iff.
>>     
>
> Just a note on benchmarking.  Any perceived speedup of less than 5% or
> so must be carefully checked by statistical methods.  Run-to-run
> variance is often of that magnitude, especially if there is anything
> else running on the system.
>   

Since the timing is averaged over all calls (around 4096 for
MRLake.iff), that shouldn't be a problem here.
The results differ by only about +/- 10 dezicycles from run to run, so
I think it's accurate enough.
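
For anyone who wants to make the check suggested in the quoted text concrete: one option is to note the average printed by STOP_TIMER for several separate runs of each build, then compare the difference of the means against the run-to-run spread. The following is only a rough sketch under that assumption. The helper names and the threshold k are made up for illustration and are not part of FFmpeg, and this is a crude standard-error comparison, not a full statistical test.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (n >= 1). */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;
    assert(n >= 1);
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / n;
}

/* Unbiased sample variance of n samples (n >= 2). */
static double sample_variance(const double *x, size_t n)
{
    double acc = 0.0;
    assert(n >= 2);
    const double m = mean(x, n);
    for (size_t i = 0; i < n; i++)
        acc += (x[i] - m) * (x[i] - m);
    return acc / (n - 1);
}

/*
 * Hypothetical helper: returns 1 if the means of a and b differ by more
 * than k standard errors of the difference, 0 otherwise.  Both sides of
 * the comparison are squared so that no sqrt() is needed.
 */
static int difference_exceeds_noise(const double *a, size_t na,
                                    const double *b, size_t nb, double k)
{
    const double diff = mean(a, na) - mean(b, nb);
    const double se2  = sample_variance(a, na) / na
                      + sample_variance(b, nb) / nb;
    return diff * diff > k * k * se2;
}
```

Here a and b would hold the per-run averages for the two builds. With k around 2 or 3, a result of 0 means the measured difference is not clearly separated from the run-to-run noise.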

BTW, for those interested, here is the difference between the
while (...) { ... } and do { ... } while (...) versions:

    START_TIMER;
    const uint64_t *lut = plane8_lut[plane];
    while (buf_size--) {
        const uint64_t v = AV_RN64A(dst) | lut[*buf++];
        AV_WN64A(dst, v);
        dst += 8;
    }
    STOP_TIMER("decodeplane8");

     9d0:       0f b6 75 00             movzbl 0x0(%ebp),%esi
     9d4:       83 c5 01                add    $0x1,%ebp
     9d7:       8b 4c 24 54             mov    0x54(%esp),%ecx
     9db:       8b 5f 04                mov    0x4(%edi),%ebx
     9de:       8b 07                   mov    (%edi),%eax
     9e0:       8b 54 f1 04             mov    0x4(%ecx,%esi,8),%edx
     9e4:       0b 04 f1                or     (%ecx,%esi,8),%eax
     9e7:       09 da                   or     %ebx,%edx
     9e9:       89 07                   mov    %eax,(%edi)
     9eb:       89 57 04                mov    %edx,0x4(%edi)
     9ee:       83 c7 08                add    $0x8,%edi
     9f1:       3b 6c 24 2c             cmp    0x2c(%esp),%ebp
     9f5:       75 d9                   jne    9d0 <decode_frame_ilbm+0x580>

    START_TIMER;
    const uint64_t *lut = plane8_lut[plane];
    do {
        const uint64_t v = AV_RN64A(dst) | lut[*buf++];
        AV_WN64A(dst, v);
        dst += 8;
    } while (--buf_size);
    STOP_TIMER("decodeplane8");

     9b0:       0f b6 75 00             movzbl 0x0(%ebp),%esi
     9b4:       83 c5 01                add    $0x1,%ebp
     9b7:       8b 4c 24 44             mov    0x44(%esp),%ecx
     9bb:       8b 07                   mov    (%edi),%eax
     9bd:       8b 57 04                mov    0x4(%edi),%edx
     9c0:       0b 04 f1                or     (%ecx,%esi,8),%eax
     9c3:       0b 54 f1 04             or     0x4(%ecx,%esi,8),%edx
     9c7:       89 07                   mov    %eax,(%edi)
     9c9:       89 57 04                mov    %edx,0x4(%edi)
     9cc:       83 c7 08                add    $0x8,%edi
     9cf:       83 eb 01                sub    $0x1,%ebx
     9d2:       75 dc                   jne    9b0 <decode_frame_ilbm+0x560>
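
In the listings, the do/while loop ends with a register countdown (sub $0x1,%ebx) where the while loop compares against a stack slot (cmp 0x2c(%esp),%ebp). One caveat: the two forms only behave the same when buf_size is at least 1, because the do/while body always runs once before the count is tested. The sketch below illustrates this with portable stand-ins. rn64/wn64 and the plane_* names are hypothetical helpers for this example, not the real AV_RN64A/AV_WN64A macros or the iff.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for AV_RN64A/AV_WN64A; memcpy keeps the sketch portable. */
static uint64_t rn64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

static void wn64(uint8_t *p, uint64_t v)
{
    memcpy(p, &v, sizeof(v));
}

/* while form: the body is skipped entirely when buf_size == 0. */
static void plane_while(uint8_t *dst, const uint8_t *buf, int buf_size,
                        const uint64_t *lut)
{
    while (buf_size--) {
        wn64(dst, rn64(dst) | lut[*buf++]);
        dst += 8;
    }
}

/* do/while form: the caller must guarantee buf_size >= 1. */
static void plane_do_while(uint8_t *dst, const uint8_t *buf, int buf_size,
                           const uint64_t *lut)
{
    assert(buf_size >= 1); /* the body runs once before the first test */
    do {
        wn64(dst, rn64(dst) | lut[*buf++]);
        dst += 8;
    } while (--buf_size);
}
```

For any buf_size >= 1 both functions write identical output. Calling the do/while form with buf_size == 0 would decrement past zero and run off the buffers, hence the assert in this sketch.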

-- 

Best regards,
                   :-) Basty/CDGS (-:



