[FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.
James Almer
jamrial at gmail.com
Tue Jul 14 04:39:15 CEST 2015
On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> +INIT_XMM sse4
> +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> + pxor m0, m0
> +.loop:
> + mova m1, [sum0q+mmsize*0]
> + mova m2, [sum0q+mmsize*1]
> + mova m3, [sum0q+mmsize*2]
> + mova m4, [sum0q+mmsize*3]
> + paddd m1, [sum1q+mmsize*0]
> + paddd m2, [sum1q+mmsize*1]
> + paddd m3, [sum1q+mmsize*2]
> + paddd m4, [sum1q+mmsize*3]
> + paddd m1, m2
> + paddd m2, m3
> + paddd m3, m4
> + paddd m4, [sum0q+mmsize*4]
> + paddd m4, [sum1q+mmsize*4]
> + TRANSPOSE4x4D 1, 2, 3, 4, 5
> +
> + ; m1 = fs1, m2 = fs2, m3 = fss, m4 = fs12
> + pslld m3, 6
> + pslld m4, 6
> + pmulld m5, m1, m2 ; fs1 * fs2
> + pmulld m1, m1 ; fs1 * fs1
> + pmulld m2, m2 ; fs2 * fs2
If these values are guaranteed to be non-negative, this could also be
implemented with pmuludq to get an SSE2 version working, although I'm not
sure it's worth doing. It would take six pmuludq plus an awful lot of
shuffling and unpacking, when the SSE4 version is already only ~2x faster
than the C version.
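
To illustrate the shuffling involved, here is a rough sketch of one pmulld replaced by two pmuludq, written with C intrinsics rather than x86inc syntax so it can be compiled standalone. The function names mullo_epi32_sse2 and mul4_sse2 are made up for this example and are not part of the patch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: low 32 bits of each of the four 32x32 products,
 * using only SSE2 instructions (a stand-in for SSE4.1 pmulld). */
static __m128i mullo_epi32_sse2(__m128i a, __m128i b)
{
    /* pmuludq multiplies dwords 0 and 2, producing two 64-bit products */
    __m128i even = _mm_mul_epu32(a, b);
    /* move dwords 1 and 3 into the even positions, then multiply again */
    __m128i odd  = _mm_mul_epu32(_mm_srli_epi64(a, 32),
                                 _mm_srli_epi64(b, 32));
    /* pshufd: gather the low dword of each 64-bit product into dwords 0, 1 */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    /* punpckldq: interleave back into lane order 0, 1, 2, 3 */
    return _mm_unpacklo_epi32(even, odd);
}

/* Hypothetical wrapper: multiply four int32 pairs through the helper. */
static void mul4_sse2(const int32_t *x, const int32_t *y, int32_t *out)
{
    __m128i a = _mm_loadu_si128((const __m128i *)x);
    __m128i b = _mm_loadu_si128((const __m128i *)y);
    _mm_storeu_si128((__m128i *)out, mullo_epi32_sse2(a, b));
}
```

That is two pmuludq, two shifts, two pshufd and one punpckldq per pmulld, so three pmulld give the six pmuludq mentioned above. Since only the low 32 bits of each product are kept, the result should match pmulld whatever the sign of the inputs.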
This was already OK'd (same with the psnr sse2 code), so it should be
pushed already.