[FFmpeg-devel] [PATCH 4/5] x86: hpeldsp: implement SSE2 versions
James Almer
jamrial at gmail.com
Thu May 22 22:13:50 CEST 2014
On 22/05/14 2:48 PM, Christophe Gisquet wrote:
> Those are mostly used in codecs older than H.264, e.g. MPEG-2.
>
> put16 versions:
>      mmx   mmx2  sse2
> x2:  1888  1185  552
> y2:  1778  1092  510
>
> avg16 xy2: 3509(mmx2) -> 2169(sse2)
> ---
> libavcodec/x86/hpeldsp.asm | 115 +++++++++++++++++++++++++++++++-----------
> libavcodec/x86/hpeldsp_init.c | 15 ++++++
> 2 files changed, 100 insertions(+), 30 deletions(-)
>
> diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm
> index 2adead2..1d26c45 100644
> --- a/libavcodec/x86/hpeldsp.asm
> +++ b/libavcodec/x86/hpeldsp.asm
> @@ -35,21 +35,39 @@ SECTION_TEXT
>
> ; void ff_put_pixels8_x2(uint8_t *block, const uint8_t *pixels, ptrdiff_t line_size, int h)
> %macro PUT_PIXELS8_X2 0
> +%if cpuflag(sse2)
> +cglobal put_pixels16_x2, 4,5,4
> +%else
> cglobal put_pixels8_x2, 4,5
> +%endif
> lea r4, [r2*2]
> .loop:
> - mova m0, [r1]
> - mova m1, [r1+r2]
> - PAVGB m0, [r1+1]
> - PAVGB m1, [r1+r2+1]
> + movu m0, [r1+1]
> + movu m1, [r1+r2+1]
I assume movu is needed for the SSE2 version, but unless I'm missing
something there's no need to force it on the MMX version.
AFAIK, old CPUs (the kind that don't have SSE2) have slow unaligned
moves, so performance would be degraded where it matters.
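
For reference, here is a minimal scalar sketch of what the x2 routine computes, assuming the usual PAVGB semantics of a per-byte rounded average, (a + b + 1) >> 1. The name put_pixels16_x2_ref is hypothetical and this is not code from the patch; it only mirrors the prototype quoted in the asm comment. Since the average is symmetric in its two inputs, which of [r1] and [r1+1] goes through the explicit load does not change the result, only the alignment requirements of each access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not FFmpeg code: each output byte is the
 * rounded average of a pixel and its right-hand neighbour, which is what
 * PAVGB computes per byte: (a + b + 1) >> 1. */
static void put_pixels16_x2_ref(uint8_t *block, const uint8_t *pixels,
                                ptrdiff_t line_size, int h)
{
    for (int y = 0; y < h; y++) {
        /* Reads pixels[0..16], i.e. 17 source bytes per row. */
        for (int x = 0; x < 16; x++)
            block[x] = (uint8_t)((pixels[x] + pixels[x + 1] + 1) >> 1);
        block  += line_size;
        pixels += line_size;
    }
}
```

Something like this could serve as the baseline when checking the MMX and SSE2 versions against each other.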