[Ffmpeg-devel] A question about 'dct_unquantize_h263_intra'

Michael Niedermayer michaelni
Tue Jan 2 18:44:23 CET 2007


Hi

On Tue, Jan 02, 2007 at 04:23:00AM +0200, Siarhei Siamashka wrote:
> Hello all,
> 
> I started optimizing 'dct_unquantize_h263_intra' for ARM (armv5te). The attached
> patch already improves performance ('dct_unquantize_h263_helper_armv5te' is
> twice as fast as 'dct_unquantize_h263_helper_c', and overall video decoding
> performance also improves visibly). This code is a straightforward
> optimization of 'mpegvideo.c' (assuming only that the result of the
> multiplication does not overflow 16 bits). Right now it takes about 7 cycles
> to process each element, but after looking at 'mpegvideo_mmx.c' I have some
> more optimization ideas.
> 
> Is it safe to assume:
> 1. The result of 'level = level * qmul - qadd' will never overflow signed 16 bits?

yes
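Answer 1 is what allows the ARM code to use a 16-bit multiply. A minimal sketch of a debug variant of the C helper that checks this property, assuming DCTELEM is a 16-bit short (as in dsputil.h at the time); the name dequant_checked is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef short DCTELEM; /* assumption: 16-bit coefficients */

/* Hypothetical debug variant: computes in int, then asserts (in builds
 * without NDEBUG) that the result fits in signed 16 bits before storing. */
static void dequant_checked(DCTELEM *block, int qmul, int qadd, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        int level = block[i];
        if (!level)
            continue; /* zero coefficients stay zero */
        level = level < 0 ? level * qmul - qadd : level * qmul + qadd;
        assert(level >= INT16_MIN && level <= INT16_MAX);
        block[i] = (DCTELEM)level;
    }
}
```

With qmul = 4 and qadd = 3, the block {0, 1, -1, 3} becomes {0, 7, -7, 15}.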


> 2. DCTELEM *block is always at least 8-byte aligned?

yes, if not, it's a bug
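Since a misaligned block is treated as a bug rather than a case to handle, an optimized routine can simply check alignment in debug builds. A minimal sketch, assuming a 16-bit DCTELEM; both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef short DCTELEM; /* assumption: 16-bit coefficients */

/* Returns nonzero if block starts on an 8-byte boundary. */
static int block_is_8byte_aligned(const DCTELEM *block)
{
    return ((uintptr_t)block & 7) == 0;
}

/* Hypothetical entry check for an optimized dequantizer: catches caller
 * bugs in debug builds instead of adding a slow unaligned path. */
static void check_block_alignment(const DCTELEM *block)
{
    assert(block_is_8byte_aligned(block));
}
```

Note that block + 1 on an 8-byte aligned block is only 2-byte aligned, so the check fails for it.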


> 3. Processing extra elements after block[nCoeffs] is safe (up to but not 
> including block[(nCoeffs + 7) / 8 * 8])?

block[0 .. 63] is always safe
nCoeffs <= 64
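Because nCoeffs <= 64 and block[0 .. 63] is always valid memory, rounding the count up to a multiple of 8 for an unrolled loop never leaves the block. A minimal sketch (padded_count is a hypothetical name). Whether processing the extra elements is also harmless depends on their values: the helper leaves zero coefficients untouched, so it is a no-op only if those elements are zero, which is an assumption about the caller rather than something stated above.

```c
#include <assert.h>

/* Rounds a coefficient count up to the next multiple of 8.
 * For 0 <= nCoeffs <= 64 the result is at most 64. */
static int padded_count(int nCoeffs)
{
    int padded;
    assert(nCoeffs >= 0 && nCoeffs <= 64);
    padded = (nCoeffs + 7) / 8 * 8; /* same as (nCoeffs + 7) & ~7 here */
    assert(padded <= 64);
    return padded;
}
```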


[...]
> +
> +#include "../dsputil.h"
> +#include "../mpegvideo.h"
> +#include "../avcodec.h"
> +
> +/**
> + * H.263 dequantizer helper function; it is performance critical
> + * and needs optimized implementations for each architecture
> + */
> +static inline void dct_unquantize_h263_helper_c(DCTELEM *block, int qmul, int qadd, int count)
> +{
> +    int i, level;
> +    for (i = 0; i < count; i++) {
> +        level = block[i];
> +        if (level) {
> +            if (level < 0) {
> +                level = level * qmul - qadd;
> +            } else {
> +                level = level * qmul + qadd;
> +            }
> +            block[i] = level;
> +        }
> +    }
> +}
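The branches in the helper above can be replaced by sign and zero masks, which is the usual shape of SIMD dequantizers. Below is a scalar sketch of that idea; it is not claimed to match 'mpegvideo_mmx.c' instruction for instruction. It assumes a 16-bit DCTELEM and two's complement integers, and dequant_branchless is a hypothetical name:

```c
#include <assert.h>

typedef short DCTELEM; /* assumption: 16-bit coefficients */

/* Branchless form of the helper: level * qmul + sign(level) * qadd,
 * forced back to 0 when level == 0. */
static void dequant_branchless(DCTELEM *block, int qmul, int qadd, int count)
{
    int i;
    assert(count >= 0 && count <= 64);
    for (i = 0; i < count; i++) {
        int level = block[i];
        int neg   = -(level < 0);        /* all ones if level < 0, else 0  */
        int nz    = -(level != 0);       /* all ones if level != 0, else 0 */
        int add   = (qadd ^ neg) - neg;  /* +qadd, or -qadd when negative  */
        block[i]  = (DCTELEM)((level * qmul + add) & nz);
    }
}
```

For qmul = 6 and qadd = 5, the block {0, 2, -2, 0, -5} becomes {0, 17, -17, 0, -35}, the same result the branching version gives.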

this looks like a duplicate of dct_unquantize_h263_inter_c()?
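If one loop body is shared, the intra and inter paths differ only in where the loop starts. A sketch of that, assuming the intra path handles the DC coefficient block[0] separately and dequantizes only the AC coefficients, and that `last` is the inclusive index of the last coefficient to process. All three function names are hypothetical and are not the real mpegvideo.c signatures:

```c
#include <assert.h>

typedef short DCTELEM; /* assumption: 16-bit coefficients */

/* Shared loop: same logic as the quoted helper. */
static void h263_dequant_run(DCTELEM *block, int qmul, int qadd, int count)
{
    int i;
    assert(count >= 0 && count <= 64);
    for (i = 0; i < count; i++) {
        int level = block[i];
        if (level)
            block[i] = (DCTELEM)(level < 0 ? level * qmul - qadd
                                           : level * qmul + qadd);
    }
}

/* Intra: skip block[0] (DC), process indices 1..last. */
static void unquantize_intra_ac(DCTELEM *block, int qmul, int qadd, int last)
{
    h263_dequant_run(block + 1, qmul, qadd, last);
}

/* Inter: process indices 0..last. */
static void unquantize_inter(DCTELEM *block, int qmul, int qadd, int last)
{
    h263_dequant_run(block, qmul, qadd, last + 1);
}
```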



> +
> +/* GCC 3.1 or higher is required to support symbolic names in assembly code */
> +#if (__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1))
> +
> +/**
> + * Code optimized for armv5te: uses the fast single-cycle 16-bit DSP multiply
> + * instruction, is unrolled to process 4 elements per iteration and has its
> + * code scheduled to avoid pipeline stalls. Should take 7 cycles
> + * per element on ARM926EJ-S (Nokia 770)
> + */
> +#define dct_unquantize_h263_helper_armv5te(__block, __qmul, __qadd, __count) \

identifiers starting with __ are reserved in C, please don't use such names
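Renaming the macro parameters (e.g. to block, qmul, qadd, count) is enough to follow this; a static inline function avoids the issue altogether. Below is a portable C sketch of the structure described in the patch comment (4 elements per iteration, all loads before the stores), not the ARM assembly itself. It assumes a 16-bit DCTELEM and a count that is a multiple of 4; dct_unquantize_h263_helper_unrolled and dequant_one are hypothetical names:

```c
#include <assert.h>

typedef short DCTELEM; /* assumption: 16-bit coefficients */

/* Dequantizes a single coefficient; zero stays zero. */
static inline DCTELEM dequant_one(int level, int qmul, int qadd)
{
    if (!level)
        return 0;
    return (DCTELEM)(level < 0 ? level * qmul - qadd : level * qmul + qadd);
}

/* Ordinary (non-reserved) parameter names; unrolled by 4. */
static inline void dct_unquantize_h263_helper_unrolled(DCTELEM *block, int qmul,
                                                       int qadd, int count)
{
    int i;
    assert(count % 4 == 0 && count >= 0 && count <= 64);
    for (i = 0; i < count; i += 4) {
        /* load all four first, mirroring the scheduling idea */
        int l0 = block[i],     l1 = block[i + 1];
        int l2 = block[i + 2], l3 = block[i + 3];
        block[i]     = dequant_one(l0, qmul, qadd);
        block[i + 1] = dequant_one(l1, qmul, qadd);
        block[i + 2] = dequant_one(l2, qmul, qadd);
        block[i + 3] = dequant_one(l3, qmul, qadd);
    }
}
```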


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
