[FFmpeg-devel] [PATCH/RFC] ALT_BITSTREAMREADER optimizations for the platforms with strict alignment

Måns Rullgård mans
Mon Dec 8 13:03:01 CET 2008


Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> Hello,
>
> The attached patch improves performance for ALT_BITSTREAMREADER on
> the platforms with strictly aligned or slow unaligned memory
> accesses. Current implementation results in the generation of 4
> memory accesses reading data byte after byte for such platforms,
> which is a bit excessive (only reads for 3 bytes are needed to
> ensure availability of 17 bits in bitstreamreader cache). Legacy ARM
> cores (which do not support unaligned memory accesses) can benefit
> from it.
>
> Unfortunately FFmpeg contains some code which violates MIN_CACHE_BITS == 17
> requirement. One of such examples is 'svq1_dec.c'. Patch to fix it
> is also attached, but it probably makes sense to add a new function
> which would accept 1-25 range ('get_bits_medium'?) and not to lose
> performance on the platforms where ALT_BITSTREAM_READER actually can
> provide 25 bits. Some tweaks in this area are possible.
>
> Regression test still fails, so there are some other bugs to find and fix.
>
> -- 
> Best regards,
> Siarhei Siamashka
>
> Index: libavcodec/bitstream.h
> ===================================================================
> --- libavcodec/bitstream.h	(revision 16028)
> +++ libavcodec/bitstream.h	(working copy)
> @@ -424,9 +424,29 @@
>  #   define SKIP_CACHE(name, gb, num)\
>          name##_cache >>= (num);
>  # else
> +
> +#if !ENABLE_FAST_UNALIGNED

Please indent preprocessor directives similarly to surrounding code.
Also, we usually use the HAVE_/CONFIG_ macros in preprocessor tests.

> +#   undef MIN_CACHE_BITS
> +#   define MIN_CACHE_BITS 17
>  #   define UPDATE_CACHE(name, gb)\
> +    {\

Please use do { } while(0) for multi-statement macros.

> +        const uint8_t *bufptr = ((const uint8_t *)(gb)->buffer)+(name##_index>>3);\
> +        int shiftcount = name##_index & 7;\
> +        uint8_t b1 = bufptr[1];\
> +        uint8_t b0 = bufptr[2];\
> +        uint8_t b2 = bufptr[0];\

Those variables are very confusingly named...

> +        shiftcount += 8;\
> +        name##_cache = b0 | (b1 << 8);\
> +        name##_cache |= (b2 << 16);\

... and it looks like AV_RB24().

> +        name##_cache <<= shiftcount;\
> +    }\

If it's faster, it's obviously a good thing, but I'll let you iron out
the bugs before analysing it further.

> Index: libavcodec/svq1dec.c
> ===================================================================
> --- libavcodec/svq1dec.c	(revision 16028)
> +++ libavcodec/svq1dec.c	(working copy)
> @@ -195,7 +195,7 @@
>  
>  #define SVQ1_CALC_CODEBOOK_ENTRIES(cbook)\
>        codebook = (const uint32_t *) cbook[level];\
> -      bit_cache = get_bits (bitbuf, 4*stages);\
> +      bit_cache = get_bits_long (bitbuf, 4*stages);\
>        /* calculate codebook entries for this vector */\
>        for (j=0; j < stages; j++) {\
>          entries[j] = (((bit_cache >> (4*(stages - j - 1))) & 0xF) + 16*j) << (level + 1);\
> @@ -652,7 +652,7 @@
>    init_get_bits(&s->gb,buf,buf_size*8);
>  
>    /* decode frame header */
> -  s->f_code = get_bits (&s->gb, 22);
> +  s->f_code = get_bits_long (&s->gb, 22);
>  
>    if ((s->f_code & ~0x70) || !(s->f_code & 0x60))
>      return -1;

This looks correct.

-- 
M?ns Rullg?rd
mans at mansr.com




More information about the ffmpeg-devel mailing list