[FFmpeg-devel] [PATCH 5/5] pp: add SSE2 deInterlaceInterpolateCubic().

Michael Niedermayer michaelni at gmx.at
Sat Nov 17 15:59:17 CET 2012


On Sat, Nov 17, 2012 at 01:07:13PM +0100, Clément Bœsch wrote:
> 2124 decicycles in deInterlaceInterpolateCubic_C, 67100774 runs, 8090 skips
> 458 decicycles in deInterlaceInterpolateCubic_MMX2, 67107146 runs, 1718 skips
> 382 decicycles in deInterlaceInterpolateCubic_SSE2, 67107086 runs, 1778 skips
> ---
>  libpostproc/postprocess_template.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/libpostproc/postprocess_template.c b/libpostproc/postprocess_template.c
> index dc63032..0729e8f 100644
> --- a/libpostproc/postprocess_template.c
> +++ b/libpostproc/postprocess_template.c
> @@ -1497,13 +1497,30 @@ static inline void RENAME(deInterlaceInterpolateLinear)(uint8_t src[], int strid
>   */
>  static inline void RENAME(deInterlaceInterpolateCubic)(uint8_t src[], int stride)
>  {
> -#if TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
> +#if TEMPLATE_PP_SSE2 || TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
>      src+= stride*3;
>      __asm__ volatile(
>          "lea (%0, %1), %%"REG_a"                \n\t"
>          "lea (%%"REG_a", %1, 4), %%"REG_d"      \n\t"
>          "lea (%%"REG_d", %1, 4), %%"REG_c"      \n\t"
>          "add %1, %%"REG_c"                      \n\t"
> +#if TEMPLATE_PP_SSE2
> +        "pxor %%xmm7, %%xmm7                    \n\t"
> +#define REAL_DEINT_CUBIC(a,b,c,d,e)\
> +        "movq " #a ", %%xmm0                    \n\t"\
> +        "movq " #b ", %%xmm1                    \n\t"\
> +        "movq " #d ", %%xmm2                    \n\t"\
> +        "movq " #e ", %%xmm3                    \n\t"\
> +        "pavgb %%xmm2, %%xmm1                   \n\t"\
> +        "pavgb %%xmm3, %%xmm0                   \n\t"\
> +        "punpcklbw %%xmm7, %%xmm0               \n\t"\
> +        "punpcklbw %%xmm7, %%xmm1               \n\t"\
> +        "psubw %%xmm1, %%xmm0                   \n\t"\
> +        "psraw $3, %%xmm0                       \n\t"\
> +        "psubw %%xmm0, %%xmm1                   \n\t"\
> +        "packuswb %%xmm1, %%xmm1                \n\t"\
> +        "movlps %%xmm1, " #c "                  \n\t"
> +#else //TEMPLATE_PP_SSE2

The code should be restructured to run these filters on larger blocks,
that is, at least 16 pixels or the whole width.
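
As a rough illustration of what a 16-pixel-wide version could look like,
here is a sketch using SSE2 intrinsics instead of the inline asm from the
patch. The names cubic_px, cubic_row16_ref and cubic_row16 are made up for
this example and are not part of libpostproc. The arithmetic follows the
instruction sequence quoted above (pavgb, punpcklbw, psubw, psraw, psubw,
packuswb), and a scalar fallback is used when SSE2 is not available.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar model of one output pixel, mirroring the quoted SSE2 sequence.
 * a, b are the lines above the missing line; d, e the lines below. */
static uint8_t cubic_px(uint8_t a, uint8_t b, uint8_t d, uint8_t e)
{
    int outer = (a + e + 1) >> 1;            /* pavgb: rounded average */
    int inner = (b + d + 1) >> 1;            /* pavgb: rounded average */
    int diff  = outer - inner;               /* psubw                  */
    /* psraw $3 is an arithmetic shift, i.e. floor(diff / 8). */
    int shifted = diff >= 0 ? diff >> 3 : -((-diff + 7) >> 3);
    int v = inner - shifted;                 /* psubw                  */
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);   /* packuswb     */
}

/* Reference: 16 pixels, one at a time. */
static void cubic_row16_ref(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                            const uint8_t *d, const uint8_t *e)
{
    assert(dst && a && b && d && e);
    for (int i = 0; i < 16; i++)
        dst[i] = cubic_px(a[i], b[i], d[i], e[i]);
}

/* 16 pixels per call: full-width loads, low and high halves widened
 * separately, then packed back into one 16-byte store. */
static void cubic_row16(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                        const uint8_t *d, const uint8_t *e)
{
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i outer = _mm_avg_epu8(_mm_loadu_si128((const __m128i *)a),
                                 _mm_loadu_si128((const __m128i *)e));
    __m128i inner = _mm_avg_epu8(_mm_loadu_si128((const __m128i *)b),
                                 _mm_loadu_si128((const __m128i *)d));

    __m128i in_lo = _mm_unpacklo_epi8(inner, zero);
    __m128i in_hi = _mm_unpackhi_epi8(inner, zero);
    __m128i df_lo = _mm_srai_epi16(
        _mm_sub_epi16(_mm_unpacklo_epi8(outer, zero), in_lo), 3);
    __m128i df_hi = _mm_srai_epi16(
        _mm_sub_epi16(_mm_unpackhi_epi8(outer, zero), in_hi), 3);

    __m128i res = _mm_packus_epi16(_mm_sub_epi16(in_lo, df_lo),
                                   _mm_sub_epi16(in_hi, df_hi));
    _mm_storeu_si128((__m128i *)dst, res);
#else
    cubic_row16_ref(dst, a, b, d, e);        /* portable fallback */
#endif
}
```

A caller would step x across the row in increments of 16 and pass
pointers into the four neighbouring lines; how the row tail and the
existing 8x8 block layout are handled is left open here.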

Until then this should be OK, but the SSE registers should be added
to the clobber list.
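
To make the clobber point concrete, here is a minimal standalone sketch,
not the actual libpostproc code. deint_cubic8 and deint_cubic8_ref are
hypothetical names, and the asm body is the quoted sequence applied to a
single 8-pixel line with plain pointer operands. The part that matters is
the last line of the asm statement: every xmm register written by the
block (xmm0-xmm3 and xmm7 here) is listed, so the compiler does not keep
live values in them across the statement.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for 8 pixels, same arithmetic as the asm below. */
static void deint_cubic8_ref(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                             const uint8_t *d, const uint8_t *e)
{
    assert(dst && a && b && d && e);
    for (int i = 0; i < 8; i++) {
        int outer = (a[i] + e[i] + 1) >> 1;             /* pavgb    */
        int inner = (b[i] + d[i] + 1) >> 1;             /* pavgb    */
        int diff  = outer - inner;                      /* psubw    */
        /* arithmetic shift right by 3, i.e. floor(diff / 8) */
        int sh    = diff >= 0 ? diff >> 3 : -((-diff + 7) >> 3);
        int v     = inner - sh;                         /* psubw    */
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); /* packuswb */
    }
}

static void deint_cubic8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         const uint8_t *d, const uint8_t *e)
{
#if defined(__GNUC__) && defined(__SSE2__)
    __asm__ volatile(
        "pxor      %%xmm7, %%xmm7 \n\t"
        "movq      (%1), %%xmm0   \n\t"
        "movq      (%2), %%xmm1   \n\t"
        "movq      (%3), %%xmm2   \n\t"
        "movq      (%4), %%xmm3   \n\t"
        "pavgb     %%xmm2, %%xmm1 \n\t"
        "pavgb     %%xmm3, %%xmm0 \n\t"
        "punpcklbw %%xmm7, %%xmm0 \n\t"
        "punpcklbw %%xmm7, %%xmm1 \n\t"
        "psubw     %%xmm1, %%xmm0 \n\t"
        "psraw     $3, %%xmm0     \n\t"
        "psubw     %%xmm0, %%xmm1 \n\t"
        "packuswb  %%xmm1, %%xmm1 \n\t"
        "movlps    %%xmm1, (%0)   \n\t"
        : /* no register outputs; dst is written through memory */
        : "r"(dst), "r"(a), "r"(b), "r"(d), "r"(e)
        /* the asm reads and writes memory and overwrites these xmm regs */
        : "memory", "xmm0", "xmm1", "xmm2", "xmm3", "xmm7"
    );
#else
    deint_cubic8_ref(dst, a, b, d, e);   /* fallback without SSE2/GNU asm */
#endif
}
```

In a tree that must also build with compilers that reject xmm names in
clobber lists, the register names would presumably need to be guarded by
a configure-time check rather than written out unconditionally as above.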


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf