[FFmpeg-devel] libavcodec/blockdsp : add clear_blocks_prores func (SSE, AVX) for prores decoding

Martin Vignali martin.vignali at gmail.com
Tue Oct 10 22:54:16 EEST 2017


>> This is still slower than the memset numbers from the first test, why
>> the high variation in there?
Hello,

Maybe the results in my first email were not very clear.

For the results below, I ran the checkasm test 10 times in each case and
kept the fastest run.
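checkasm has its own benchmarking mode, so the following is not how these numbers were produced. It is a hypothetical standalone sketch of the same "repeat the run, keep the fastest" methodology. The names `bench_min` and `clear_blocks_memset` are made up for illustration, and timing uses standard C `clock()` (CPU time), which is much coarser than the timer checkasm reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Signature shared by the clear_blocks_prores variants discussed here. */
typedef void (*clear_blocks_fn)(int16_t *blocks, ptrdiff_t block_count);

/* Hypothetical function under test: one memset over all blocks. */
static void clear_blocks_memset(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
}

/* Hypothetical harness: time `iters` calls per run, repeat `runs` times and
 * return the fastest per-call time in CPU seconds. Keeping the minimum drops
 * runs that were slowed down by interrupts, cold caches and similar noise. */
static double bench_min(clear_blocks_fn fn, int16_t *blocks,
                        ptrdiff_t block_count, int runs, int iters)
{
    assert(runs > 0 && iters > 0);
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        for (int i = 0; i < iters; i++)
            fn(blocks, block_count);
        double per_call = (double)(clock() - t0) / CLOCKS_PER_SEC / iters;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}
```

For example, `bench_min(clear_blocks_memset, buf, 16, 10, 100000)` mirrors "10 runs, keep the fastest". Because `clock()` has coarse granularity, `iters` has to be large for the per-call figure to mean anything.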


Original benchmark (similar to what proresdec currently does)

using these functions:

static void clear_blocks_prores_c(int16_t * blocks, ptrdiff_t block_count)
{
    int i;
    for (i = 0; i < block_count; i++) {
        memset(blocks+(i << 6), 0, sizeof(int16_t) * 64);
    }
}

static void ff_clear_blocks_prores_sse(int16_t * blocks, ptrdiff_t block_count)
{
    int i;
    for (i = 0; i < block_count; i++)
        ff_clear_block_sse(blocks+(i<<6));
}

static void ff_clear_blocks_prores_avx(int16_t * blocks, ptrdiff_t block_count)
{
    int i;
    for (i = 0; i < block_count; i++)
        ff_clear_block_avx(blocks+(i<<6));
}
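The wrappers above make one call to `ff_clear_block_sse`/`ff_clear_block_avx` per 64-coefficient block. The patch itself adds SSE/AVX code that is not reproduced here. As a rough illustration of the alternative (one loop over the whole contiguous buffer instead of a call per block), here is a hypothetical SSE2-intrinsics sketch. The name `clear_blocks_one_loop` is invented, and it falls back to `memset` on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch, not the patch's code: clear block_count contiguous
 * blocks of 64 int16_t (128 bytes each) in a single loop instead of making
 * one function call per block. */
static void clear_blocks_one_loop(int16_t *blocks, ptrdiff_t block_count)
{
    assert(block_count >= 0);
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i *p   = (__m128i *)blocks;
    __m128i *end = p + block_count * 8; /* 128 bytes per block / 16 per store */
    /* Unaligned stores keep the sketch independent of buffer alignment; with
     * a buffer known to be 16-byte aligned, _mm_store_si128 could be used. */
    while (p < end)
        _mm_storeu_si128(p++, zero);
#else
    /* Portable fallback for targets without SSE2. */
    memset(blocks, 0, sizeof(*blocks) * 64 * block_count);
#endif
}
```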

blockdsp.clear_blocks_prores_c: 570.3
blockdsp.clear_blocks_prores_sse: 325.8
blockdsp.clear_blocks_prores_avx: 190.3



New version:
blockdsp.clear_blocks_prores_c: 138.3
blockdsp.clear_blocks_prores_sse: 274.6
blockdsp.clear_blocks_prores_avx: 137.6

With the new patch, the C version uses a single memset over all blocks:
static void clear_blocks_prores_c(int16_t * blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
}
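The single memset is only a drop-in replacement for the per-block loop because the blocks are stored back to back, 64 `int16_t` apiece, so both clear exactly the same `64 * block_count` coefficients. The following is a small hypothetical self-check of that equivalence; the helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-block clearing, as in the original benchmark. */
static void clear_per_block(int16_t *blocks, ptrdiff_t block_count)
{
    for (ptrdiff_t i = 0; i < block_count; i++)
        memset(blocks + (i << 6), 0, sizeof(int16_t) * 64);
}

/* One memset over the whole contiguous range, as in the new C version. */
static void clear_single_memset(int16_t *blocks, ptrdiff_t block_count)
{
    memset(blocks, 0, sizeof(int16_t) * 64 * block_count);
}

/* Hypothetical check: fill two identical buffers (with a few spare
 * coefficients past the last block), clear both ways and compare.
 * Returns 1 when the buffers end up byte-identical. */
static int same_result(ptrdiff_t block_count)
{
    enum { MAX_BLOCKS = 32 };
    int16_t a[64 * MAX_BLOCKS + 8], b[64 * MAX_BLOCKS + 8];
    assert(block_count >= 0 && block_count <= MAX_BLOCKS);
    for (size_t i = 0; i < sizeof(a) / sizeof(a[0]); i++)
        a[i] = b[i] = (int16_t)(i * 7 + 3);
    clear_per_block(a, block_count);
    clear_single_memset(b, block_count);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

One plausible reason for the large C speedup is that one big memset avoids per-call overhead and lets the C library use its widest stores across the whole buffer. This is a guess and was not measured here.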


Martin

