[FFmpeg-devel] Fwd: a64multienc.c and drawutils.c optimisations

Wed Dec 28 12:06:14 CET 2011

On Tue, Dec 27, 2011 at 10:59:03PM +0100, yann.lepetitcorps at free.fr wrote:
> +void ff_memset16(int16_t *dst, int16_t *val, int num)
> +{
> +    int i;
> +    int16_t  set16 = *val;
> +
> +    for(i=0;i<num;i++)
> +        *dst++ = set16;

It is usually a good idea to both benchmark and look at the compiler
output.
This way of writing it is often the worst one: many compilers generate
two increments per loop iteration (one for i and one for dst).
If you trust the compiler, this way of writing it often produces the
best code (as long as the loop is simple enough for the compiler to
"get" it):

> +    for(i=0;i<num;i++)
> +        dst[i] = set16;
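As a minimal, self-contained sketch of the two loop styles discussed above (the names memset16_ptr and memset16_idx are hypothetical, and the value is passed directly instead of through a pointer as in the patch), both can be compiled with e.g. `gcc -O3 -S` and the generated assembly compared:

```c
#include <stdint.h>

/* Pointer-increment style, as in the quoted patch: both i and dst
 * are updated on every iteration. */
static void memset16_ptr(int16_t *dst, int16_t val, int num)
{
    int i;
    for (i = 0; i < num; i++)
        *dst++ = val;
}

/* Indexed style: only i changes, which usually makes the loop easy
 * for the compiler to recognize and auto-vectorize. */
static void memset16_idx(int16_t *dst, int16_t val, int num)
{
    int i;
    for (i = 0; i < num; i++)
        dst[i] = val;
}
```

Both functions store the same values; any difference shows up only in the generated code and in benchmarks.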

If you don't trust the compiler, this variant makes it more explicit
what you want the end result to look like:
int16_t *end = dst + num;
while (dst < end)
  *dst++ = set16;
Disadvantage: the compiler might not recognize it as a loop and thus
skip advanced optimizations such as auto-vectorization.
Depending on the specifics, it may make anywhere from a little to a lot
more sense to unroll the loop.
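A rough sketch of the end-pointer variant unrolled by four, with a scalar tail for the leftover elements. The name memset16_unroll4 and the unroll factor are assumptions for illustration; whether it beats the plain loop depends on compiler and CPU, so it should be benchmarked:

```c
#include <stdint.h>

/* End-pointer loop, manually unrolled by four (hypothetical helper). */
static void memset16_unroll4(int16_t *dst, int16_t val, int num)
{
    int16_t *end = dst + num;

    /* Main body: four stores per iteration while at least four remain. */
    while (end - dst >= 4) {
        dst[0] = val;
        dst[1] = val;
        dst[2] = val;
        dst[3] = val;
        dst += 4;
    }
    /* Tail: the remaining 0 to 3 elements. */
    while (dst < end)
        *dst++ = val;
}
```

Note that manual unrolling can also get in the way of the compiler's own vectorizer, which is one more reason to check the output rather than assume a win.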

> +void ff_memset_sized(char *dst, char *src, int num, int stepsize)
> +{
> +    int i;
> +
> +    for (i = 0; i < num; i++, dst += stepsize)
> +        memcpy(dst, src, stepsize);
> +}

Of course, there is the question of whether a single macro (or
av_always_inline function) with this content would serve the same
purpose as all those different functions.
On non-x86 architectures alignment might be a bit of an issue, though:
this variant does not tell the compiler that dst will always be aligned
to stepsize.
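A single always-inline helper along those lines might look roughly like this. FFmpeg's av_always_inline is replaced by a hypothetical ALWAYS_INLINE macro (GCC/Clang attribute syntax) so the sketch compiles stand-alone; memset_sized, fill16 and fill32 are likewise hypothetical names. When stepsize is a compile-time constant at the call site, the compiler can typically turn each memcpy into a single store, but the byte-pointer interface on its own still does not tell it that dst is aligned to stepsize:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for FFmpeg's av_always_inline (GCC/Clang syntax). */
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

/* Copies the stepsize bytes at src into num consecutive slots of dst. */
static ALWAYS_INLINE void memset_sized(uint8_t *dst, const uint8_t *src,
                                       int num, int stepsize)
{
    int i;
    for (i = 0; i < num; i++, dst += stepsize)
        memcpy(dst, src, stepsize);
}

/* Thin wrappers: the constant stepsize lets the compiler specialize
 * the inlined loop for each element size. */
static void fill16(uint16_t *dst, uint16_t val, int num)
{
    memset_sized((uint8_t *)dst, (const uint8_t *)&val, num, 2);
}

static void fill32(uint32_t *dst, uint32_t val, int num)
{
    memset_sized((uint8_t *)dst, (const uint8_t *)&val, num, 4);
}
```

This keeps one implementation while still producing size-specific code, which is the trade-off the question above is getting at.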