[FFmpeg-devel] [PATCH 1/6] x86: huffyuvdsp: port mmx add_bytes to yasm

Michael Niedermayer michaelni at gmx.at
Thu May 29 14:36:37 CEST 2014


On Thu, May 29, 2014 at 09:10:35AM +0000, Christophe Gisquet wrote:
> 68c to 56c.
> ---
>  libavcodec/x86/huffyuvdsp.asm    | 32 ++++++++++++++++++++++++++++++++
>  libavcodec/x86/huffyuvdsp_init.c |  2 +-
>  libavcodec/x86/huffyuvdsp_mmx.c  | 32 +-------------------------------
>  3 files changed, 34 insertions(+), 32 deletions(-)
> 
> diff --git a/libavcodec/x86/huffyuvdsp.asm b/libavcodec/x86/huffyuvdsp.asm
> index f183ebe..7acab87 100644
> --- a/libavcodec/x86/huffyuvdsp.asm
> +++ b/libavcodec/x86/huffyuvdsp.asm
> @@ -163,3 +163,35 @@ cglobal add_hfyu_left_pred, 3,3,7, dst, src, w, left
>      ADD_HFYU_LEFT_LOOP 0, 1
>  .src_unaligned:
>      ADD_HFYU_LEFT_LOOP 0, 0
> +
> +INIT_MMX mmx
> +cglobal add_bytes, 3,4,4, dst, src, w, size
> +    mov  sizeq, wq

w is an int (32 bits), so this can leave garbage in the high 32 bits of sizeq
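
To illustrate the concern: on x86-64 an int argument only defines the low
32 bits of its register, and the upper half may hold leftover bits. The C
sketch below models that situation. The register value and the helper names
are hypothetical, chosen only for illustration; in the asm the usual remedy
would be a sign extension (e.g. movsxd) before the 64-bit register is used.

```c
#include <stdint.h>

/* Hypothetical model: 'reg' stands for the 64-bit register that carries the
 * int argument w in its low half. The high half is not defined by the
 * caller, so it is filled with arbitrary bits here. */

/* What "mov sizeq, wq" effectively computes: all 64 bits are taken as-is. */
static int64_t width_from_raw_register(uint64_t reg)
{
    return (int64_t)reg;
}

/* What a sign extension computes: only the low 32 bits are trusted.
 * (The uint32_t -> int32_t conversion assumes two's complement, which holds
 * on every x86 compiler.) */
static int64_t width_sign_extended(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}
```

With reg = 0x1234567800000040 (w = 64 below arbitrary high bits), the first
helper yields a huge bogus width while the second yields 64.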


> +    and  sizeq, -2*mmsize

same


> +    jz  .2
> +    add  dstq, sizeq
> +    add  srcq, sizeq
> +    neg  sizeq
> +.1:

> +    movu   m0, [dstq + sizeq]
> +    movu   m1, [srcq + sizeq]
> +    movu   m2, [dstq + sizeq + mmsize]
> +    movu   m3, [srcq + sizeq + mmsize]

These should be mova, so that if this gets extended to SSE* it
doesn't end up with slow unaligned moves.
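
For context: under INIT_MMX both mova and movu assemble to movq, so the choice
only starts to matter once the code is instantiated for SSE, where mova
requires mmsize-aligned addresses. A minimal C sketch of the alignment
condition follows, assuming a caller that wants to pick between an aligned and
an unaligned path (similar in spirit to the .src_unaligned branch in the
quoted context). The helper names are hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when p is a multiple of 'align'.
 * 'align' must be a power of two (8 for MMX, 16 for SSE). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical check: aligned moves are only safe when both the destination
 * and the source satisfy the register-size alignment. */
static int can_use_aligned_moves(const uint8_t *dst, const uint8_t *src,
                                 size_t mmsize)
{
    return is_aligned(dst, mmsize) && is_aligned(src, mmsize);
}
```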


> +    paddb  m1, m0
> +    paddb  m3, m2

> +    movu   [dstq + sizeq], m1
> +    movu   [dstq + sizeq + mmsize], m3

these too
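
As a reading aid, the quoted loop can be modelled in scalar C roughly as
follows. This is a sketch under assumptions, not the patch itself: the
function name is made up, the loop increment and the tail handling are not
visible in the quoted hunk and are guessed here, and MMSIZE is fixed to 8 to
mirror INIT_MMX.

```c
#include <stddef.h>
#include <stdint.h>

#define MMSIZE 8 /* bytes per MMX register, mirrors mmsize under INIT_MMX */

/* Hypothetical scalar model of add_bytes: dst[i] += src[i] for i in [0, w). */
static void add_bytes_model(uint8_t *dst, const uint8_t *src, int w)
{
    /* "and sizeq, -2*mmsize": round w down to a multiple of 16 bytes. */
    ptrdiff_t size = (ptrdiff_t)w & -2 * MMSIZE;
    ptrdiff_t i, j;

    /* "add dstq, sizeq / add srcq, sizeq / neg sizeq": point both buffers at
     * the end of the bulk region and count a negative index up towards 0. */
    dst += size;
    src += size;
    for (i = -size; i < 0; i += 2 * MMSIZE) {
        /* Two registers per iteration; paddb is a wrapping per-byte add. */
        for (j = 0; j < 2 * MMSIZE; j++)
            dst[i + j] = (uint8_t)(dst[i + j] + src[i + j]);
    }

    /* Assumed tail: the remaining w - size bytes, one at a time. */
    for (i = 0; i < w - size; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```

When w is below 16 the bulk loop is skipped entirely, which corresponds to the
"jz .2" in the quoted code.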

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin