[FFmpeg-devel] [PATCH] Optimization of original IFF codec

Michael Niedermayer michaelni
Fri Apr 23 18:33:08 CEST 2010


On Fri, Apr 23, 2010 at 12:26:37PM +0200, Sebastian Vater wrote:
> Arg, was too fast, sorry, please review this one instead.
> 
> Forgot to inline the plane decoder and the declare function prototype as
> const unsigned instead of signed...
> 
> Sebastian Vater a ?crit :
> > Hi again!
> >
> > Sebastian Vater a ?crit :
> >   
> >> I'll do a patch for this soon, but I want to mention though, that the
> >> code in IFF does fiddle with bit-planar modes, thus the data is expected
> >> to be calculated bit-wise. In that case I find it more readable to use
> >> << and >> instead of * and /.
> >>   
> >>     
> > So here is the patch, converting signed to unsigned where appreciated
> > for optimization.

amongth all these optimizations, i am wondering how much faster things become
does that inline speed the code up?
does the changing to unsigned?
you can test easily by using the START/STOP_TIMER makros


[...]

> @@ -95,12 +96,16 @@
>   * @param plane plane number to decode as
>   */
>  #define DECLARE_DECODEPLANE(suffix, type) \
> -static void decodeplane##suffix(void *dst, const uint8_t *const buf, int buf_size, int bps, int plane) \
> +static inline void decodeplane##suffix(void *dst, \
> +                                       const uint8_t *const buf, \
> +                                       const unsigned int buf_size, \
> +                                       const unsigned int bps, \
> +                                       const unsigned int plane) \
>  { \
>      GetBitContext gb; \
> -    int i, b; \
> +    unsigned int i, b; \
>      init_get_bits(&gb, buf, buf_size * 8); \
> -    for(i = 0; i < (buf_size * 8 + bps - 1) / bps; i++) { \
> +    for(i = 0; i < ((buf_size << 3) + bps - 1) / bps; i++) { \

(buf_size * 8 + bps - 1) / bps
could be done outside the loop


>          for (b = 0; b < bps; b++) { \
>              ((type *)dst)[ i*bps + b ] |= get_bits1(&gb) << plane; \
>          } \

and the 2 loops look like they could be done as one loop

that loop then can be unrolled by a factor of 4 and its inside for the
uint8_t type case be implemented like:
    v= lut[get_bits(&gb, 4)];
    AV_WN32A(dst+b, AV_RN32A(dst+b) | v);

(dst being a uint8_t pointer here, void pointers suck as one cannot add
 to them)


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100423/c6de86ce/attachment.pgp>



More information about the ffmpeg-devel mailing list