[FFmpeg-devel] HEVC patch : adding ASM functions

Ronald S. Bultje rsbultje at gmail.com
Mon Feb 24 03:31:22 CET 2014


On Fri, Feb 21, 2014 at 10:07 AM, Pierre Edouard Lepere <
Pierre-Edouard.Lepere at insa-rennes.fr> wrote:

> Hello
> this patch changes some c functions for the MC and adds ASM functions that
> are enabled through pointers.

> +    add               r9, %3                     ; add 4 for width loop
> +    cmp               r9, widthq                 ; cmp width
> +    jl                %2                         ; width loop

Loops over width is a pretty terrible idea. I understand not all MC
functions in HEVC are powers-of-two, but try to find some way to make the
width essentially be an index in a function pointer array so that
per-function, the width is a fixed integer and this loop can be completely
unrolled - it'll be much faster. This is _totally_ worth the code size
increase, because MC is the single biggest CPU user.

> +epel_extra_before   DB  1                        ;corresponds to
> +max_pb_size         DB  64                       ;corresponds to
MAX_PB_SIZE in hevc.h
> +
> +qpel_extra_before   DB  0,  3,  3,  2            ;corresponds to the
ff_hevc_qpel_extra_before in hevc.c
> +qpel_extra          DB  0,  6,  7,  6            ;corresponds to the
ff_hevc_qpel_extra in hevc.c
> +qpel_extra_after    DB  0,  3,  4,  4

None of this is used in this patch.

> +    cglobal hevc_put_hevc_epel_h16_8, 9, 12, 0 , dst, dststride, src,
> +    PUT_HEVC_EPEL_H    16, 8
> +    RET


So just generally, patch 1 seems to be adding MC SIMD (sse4, x86-64-only);
it would help if the commit message said that. The initialization in
dsp_init is a mess but Michael already pointed that out.

Patch 2 says "changed qpel functions", what was changed?

Patch 3 says "added new epel functions", but, it looks like it does the
same as patch 2, but for epel (chroma) rather than qpel (luma), is that

Patch 4 is the dsp_init portion of patch 1, and patch 5 cleans it up? Maybe
they should be merged?

Patch 6 - how about using yasm macros for this table generation? See
vp9mc.asm (it's basically times 4 db A, B, C, D to get db a, b, c, d, a, b,
c, d, ... plus some magic to export it).


More information about the ffmpeg-devel mailing list