[FFmpeg-devel] libavcodec/blockdsp : add clear_blocks_prores func (SSE, AVX) for prores decoding

Martin Vignali martin.vignali at gmail.com
Thu Oct 5 17:58:44 EEST 2017


Hello,

Attached are patches that add a dedicated clear_blocks function for
ProRes decoding (proresdec2).

Currently the slice decode functions use a loop and call
blockdsp.clear_block once per block.

In my tests, this loop seems to be slower than memset (for me).
I checked this using the following "fake" functions in blockdsp:
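static void ff_clear_blocks_prores_sse_loop(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i; /* same type as block_count */

    /* each block holds 64 coefficients, hence the stride of i << 6 */
    for (i = 0; i < block_count; i++)
        ff_clear_block_sse(blocks + (i << 6));
}

static void ff_clear_blocks_prores_avx_loop(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i; /* same type as block_count */

    /* each block holds 64 coefficients, hence the stride of i << 6 */
    for (i = 0; i < block_count; i++)
        ff_clear_block_avx(blocks + (i << 6));
}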
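
For comparison, the memset approach mentioned above could be sketched in
plain C roughly as follows. This is only an illustration and not the code
from the attached patches: the names clear_blocks_loop, clear_blocks_memset
and BLOCK_COEFFS are hypothetical, and a block is assumed to hold 64 int16_t
coefficients, as the i << 6 stride implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one block is 64 int16_t coefficients (stride of i << 6). */
#define BLOCK_COEFFS 64

/* Hypothetical per-block variant: one small clear per block. */
static void clear_blocks_loop(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;

    assert(blocks && block_count >= 0);
    for (i = 0; i < block_count; i++)
        memset(blocks + i * BLOCK_COEFFS, 0, BLOCK_COEFFS * sizeof(*blocks));
}

/* Hypothetical single-call variant: clear the whole range at once. */
static void clear_blocks_memset(int16_t *blocks, ptrdiff_t block_count)
{
    assert(blocks && block_count >= 0);
    memset(blocks, 0, (size_t)block_count * BLOCK_COEFFS * sizeof(*blocks));
}
```

Both variants leave the same memory zeroed. They differ only in how many
calls are made, which is the overhead the numbers below are measuring.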

The checkasm results are below (the attached patches are needed to
reproduce the test).
Using the loop:
blockdsp.clear_blocks_prores_c: 137.8
blockdsp.clear_blocks_prores_sse: 292.0
blockdsp.clear_blocks_prores_avx: 230.5


Using the new asm functions (Kaby Lake, OS 10.12, Clang 8.1):
blockdsp.clear_blocks_prores_c: 153.4
blockdsp.clear_blocks_prores_sse: 284.4
blockdsp.clear_blocks_prores_avx: 142.2

The FATE tests pass for me (x86_64).

Since the block_per_slice value in the prores decoder is multiplied by 2
or 4 (depending on the codec), the asm function can process two blocks
per loop iteration (in AVX).
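
The idea of handling two blocks per iteration can be sketched in portable C
as follows. This is a rough illustration under stated assumptions, not the
attached asm: the name clear_blocks_pairs and the BLOCK_COEFFS constant are
hypothetical, a block is assumed to be 64 int16_t coefficients, and the
block count is assumed to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one block is 64 int16_t coefficients. */
#define BLOCK_COEFFS 64

/* Hypothetical helper: clears two consecutive blocks per loop iteration. */
static void clear_blocks_pairs(int16_t *blocks, ptrdiff_t block_count)
{
    ptrdiff_t i;

    /* The block count is assumed to be a multiple of 2, so no tail case. */
    assert(blocks && block_count >= 0 && block_count % 2 == 0);

    for (i = 0; i < block_count; i += 2)
        memset(blocks + i * BLOCK_COEFFS, 0,
               2 * BLOCK_COEFFS * sizeof(*blocks));
}
```

Because the count is always even, the loop needs no remainder handling,
which halves the number of iterations compared with clearing one block at
a time.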

I have also attached a patch that fixes the comment for the clear_block
dsp function (it now needs 32-byte alignment because of AVX), to avoid a
separate thread on the mailing list.

Martin
Jokyo Images
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-libavcodec-blockdsp-fix-comment.-clear_block-need-32.patch
Type: application/octet-stream
Size: 1002 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171005/7b71f75a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-libavcodec-blockdsp-add-clear_block_prores.patch
Type: application/octet-stream
Size: 6048 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171005/7b71f75a/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-libavcodec-proresdec2-use-clear_blocks_prores-for-ea.patch
Type: application/octet-stream
Size: 1626 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171005/7b71f75a/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-libavcodec-blockdsp-cosmetic-indent.patch
Type: application/octet-stream
Size: 2694 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20171005/7b71f75a/attachment-0003.obj>

