[FFmpeg-devel] libavcodec/blockdsp : add clear_blocks_prores func (SSE, AVX) for prores decoding

Hendrik Leppkes h.leppkes at gmail.com
Thu Oct 5 19:04:42 EEST 2017


On Thu, Oct 5, 2017 at 4:58 PM, Martin Vignali <martin.vignali at gmail.com> wrote:
> Hello,
>
> Attached are patches that add a dedicated func for clear_block inside
> prores decoding (proresdec2).
>
> Currently the slice decode func uses a loop and calls the
> blockdsp.clear_block func for each block.
>
> After some tests, that seems to be slower than memset (for me).
> I checked using these "fake" funcs in blockdsp:
> static void ff_clear_blocks_prores_sse_loop(int16_t *blocks, ptrdiff_t block_count){
>     int i;
>     for (i = 0; i < block_count; i++)
>         ff_clear_block_sse(blocks+(i<<6));
> }
>
> static void ff_clear_blocks_prores_avx_loop(int16_t *blocks, ptrdiff_t block_count){
>     int i;
>     for (i = 0; i < block_count; i++)
>         ff_clear_block_avx(blocks+(i<<6));
> }
>
> The results in checkasm are (the attached patch is needed to reproduce the test):
> Using the loop:
> blockdsp.clear_blocks_prores_c: 137.8
> blockdsp.clear_blocks_prores_sse: 292.0
> blockdsp.clear_blocks_prores_avx: 230.5
>
>
> Using the new asm func, this is the result (Kaby Lake, os 10.12, Clang 8.1):
> blockdsp.clear_blocks_prores_c: 153.4
> blockdsp.clear_blocks_prores_sse: 284.4
> blockdsp.clear_blocks_prores_avx: 142.2
>
>

This is still slower than the memset numbers from the first test. Why
is the variation between the runs so high?
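
For context, the two C-level strategies being compared can be sketched
roughly as below. This is only an illustration, not the code from the
patch: the *_ref names are hypothetical (they are not FFmpeg functions),
and the size of 64 int16_t coefficients per block is an assumption taken
from the blocks+(i<<6) indexing in the quoted helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64 coefficients per block, implied by blocks + (i << 6). */
#define COEFFS_PER_BLOCK 64

/* Hypothetical stand-in for a per-block clear such as blockdsp.clear_block. */
static void clear_block_ref(int16_t *block)
{
    memset(block, 0, COEFFS_PER_BLOCK * sizeof(*block));
}

/* Loop variant: one call per block, like the quoted *_loop helpers. */
static void clear_blocks_loop_ref(int16_t *blocks, ptrdiff_t block_count)
{
    assert(block_count >= 0);
    for (ptrdiff_t i = 0; i < block_count; i++)
        clear_block_ref(blocks + i * COEFFS_PER_BLOCK);
}

/* Single-call variant: one memset covering all blocks at once. */
static void clear_blocks_memset_ref(int16_t *blocks, ptrdiff_t block_count)
{
    assert(block_count >= 0);
    memset(blocks, 0,
           (size_t)block_count * COEFFS_PER_BLOCK * sizeof(*blocks));
}
```

Both variants zero the same memory. The difference being measured is
only the per-block call overhead versus one contiguous clear.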

- Hendrik

