[FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.

James Almer jamrial at gmail.com
Tue Jul 14 04:39:15 CEST 2015


On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> +INIT_XMM sse4
> +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> +    pxor              m0, m0
> +.loop:
> +    mova              m1, [sum0q+mmsize*0]
> +    mova              m2, [sum0q+mmsize*1]
> +    mova              m3, [sum0q+mmsize*2]
> +    mova              m4, [sum0q+mmsize*3]
> +    paddd             m1, [sum1q+mmsize*0]
> +    paddd             m2, [sum1q+mmsize*1]
> +    paddd             m3, [sum1q+mmsize*2]
> +    paddd             m4, [sum1q+mmsize*3]
> +    paddd             m1, m2
> +    paddd             m2, m3
> +    paddd             m3, m4
> +    paddd             m4, [sum0q+mmsize*4]
> +    paddd             m4, [sum1q+mmsize*4]
> +    TRANSPOSE4x4D      1, 2, 3, 4, 5
> +
> +    ; m1 = fs1, m2 = fs2, m3 = fss, m4 = fs12
> +    pslld             m3, 6
> +    pslld             m4, 6
> +    pmulld            m5, m1, m2                ; fs1 * fs2
> +    pmulld            m1, m1                    ; fs1 * fs1
> +    pmulld            m2, m2                    ; fs2 * fs2

If these values are guaranteed to always be positive, this could also be
implemented with pmuludq to get an SSE2 version working, although I'm not
sure it's worth doing. It would take six pmuludq and an awful lot of
shuffling and unpacking, while the SSE4 version is already only ~2x faster
than the C version.

This was already OKed (same with the PSNR SSE2 code), so it should be
pushed.

