[FFmpeg-devel] [PATCH 5/5] pp: add SSE2 deInterlaceInterpolateCubic().
Clément Bœsch
ubitux at gmail.com
Sun Nov 18 16:47:25 CET 2012
On Sun, Nov 18, 2012 at 01:14:34AM +0100, Michael Niedermayer wrote:
> On Sat, Nov 17, 2012 at 11:14:11PM +0100, Clément Bœsch wrote:
> > On Sat, Nov 17, 2012 at 03:59:17PM +0100, Michael Niedermayer wrote:
> > > On Sat, Nov 17, 2012 at 01:07:13PM +0100, Clément Bœsch wrote:
> > > > 2124 decicycles in deInterlaceInterpolateCubic_C, 67100774 runs, 8090 skips
> > > > 458 decicycles in deInterlaceInterpolateCubic_MMX2, 67107146 runs, 1718 skips
> > > > 382 decicycles in deInterlaceInterpolateCubic_SSE2, 67107086 runs, 1778 skips
> > > > ---
> > > > libpostproc/postprocess_template.c | 25 ++++++++++++++++++++++---
> > > > 1 file changed, 22 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/libpostproc/postprocess_template.c b/libpostproc/postprocess_template.c
> > > > index dc63032..0729e8f 100644
> > > > --- a/libpostproc/postprocess_template.c
> > > > +++ b/libpostproc/postprocess_template.c
> > > > @@ -1497,13 +1497,30 @@ static inline void RENAME(deInterlaceInterpolateLinear)(uint8_t src[], int strid
> > > > */
> > > > static inline void RENAME(deInterlaceInterpolateCubic)(uint8_t src[], int stride)
> > > > {
> > > > -#if TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
> > > > +#if TEMPLATE_PP_SSE2 || TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
> > > > src+= stride*3;
> > > > __asm__ volatile(
> > > > "lea (%0, %1), %%"REG_a" \n\t"
> > > > "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
> > > > "lea (%%"REG_d", %1, 4), %%"REG_c" \n\t"
> > > > "add %1, %%"REG_c" \n\t"
> > > > +#if TEMPLATE_PP_SSE2
> > > > + "pxor %%xmm7, %%xmm7 \n\t"
> > > > +#define REAL_DEINT_CUBIC(a,b,c,d,e)\
> > > > + "movq " #a ", %%xmm0 \n\t"\
> > > > + "movq " #b ", %%xmm1 \n\t"\
> > > > + "movq " #d ", %%xmm2 \n\t"\
> > > > + "movq " #e ", %%xmm3 \n\t"\
> > > > + "pavgb %%xmm2, %%xmm1 \n\t"\
> > > > + "pavgb %%xmm3, %%xmm0 \n\t"\
> > > > + "punpcklbw %%xmm7, %%xmm0 \n\t"\
> > > > + "punpcklbw %%xmm7, %%xmm1 \n\t"\
> > > > + "psubw %%xmm1, %%xmm0 \n\t"\
> > > > + "psraw $3, %%xmm0 \n\t"\
> > > > + "psubw %%xmm0, %%xmm1 \n\t"\
> > > > + "packuswb %%xmm1, %%xmm1 \n\t"\
> > > > + "movlps %%xmm1, " #c " \n\t"
> > > > +#else //TEMPLATE_PP_SSE2
> > >
> > > the code should be restructured to run these filters on larger blocks,
> > > that is, at least 16 pixels or the whole width
> > >
> >
> > I don't feel like doing such a thing soon, so feel free to do it :)
> >
> > > until then this should be ok, but the SSE registers should be added
> > > to the clobber list
> > >
> >
> > Added, new patch attached.
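
For readers unfamiliar with the clobber list mentioned above: in GCC-style extended asm, every register the asm body overwrites without declaring it as an output has to be named after the third colon, otherwise the compiler may assume it still holds a live value. The revised patch is not reproduced in this message, so the following is only a minimal sketch of the idea. The helper name avg8 and the scalar fallback are assumptions made for illustration, not code from libpostproc.

```c
#include <stdint.h>

/* Hypothetical helper: rounded byte-wise average of two 8-byte rows. */
static void avg8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
#if defined(__GNUC__) && defined(__SSE2__)
    __asm__ volatile(
        "movq   (%1), %%xmm0      \n\t"   /* load 8 bytes from a */
        "movq   (%2), %%xmm1      \n\t"   /* load 8 bytes from b */
        "pavgb  %%xmm1, %%xmm0    \n\t"   /* (a + b + 1) >> 1 per byte */
        "movq   %%xmm0, (%0)      \n\t"   /* store 8 bytes to dst */
        :                                 /* no register outputs */
        : "r"(dst), "r"(a), "r"(b)
        : "xmm0", "xmm1", "memory"        /* clobbers: both SSE registers and memory */
    );
#else
    /* Portable fallback with the same rounding as pavgb. */
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
#endif
}
```

Leaving "xmm0" and "xmm1" out of the list would still assemble, which is why this kind of omission tends to be caught in review rather than by the compiler.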
>
> should be ok
>
Applied.
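
As a reading aid for the SSE2 REAL_DEINT_CUBIC macro quoted above, here is a scalar sketch of what one invocation appears to compute for the 8 bytes it loads. It is derived only from the instruction sequence in the quoted diff: the function and helper names are hypothetical, and it is not the C fallback from postprocess_template.c.

```c
#include <stdint.h>

/* Saturate a signed value to 0..255, as packuswb does per lane. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of one REAL_DEINT_CUBIC(a,b,c,d,e) step. */
static void deint_cubic_row_ref(uint8_t *c, const uint8_t *a, const uint8_t *b,
                                const uint8_t *d, const uint8_t *e)
{
    for (int i = 0; i < 8; i++) {
        int inner = (b[i] + d[i] + 1) >> 1;   /* pavgb %xmm2, %xmm1 */
        int outer = (a[i] + e[i] + 1) >> 1;   /* pavgb %xmm3, %xmm0 */
        int diff  = outer - inner;            /* psubw %xmm1, %xmm0 */
        /* psraw $3 rounds toward minus infinity; spelled out portably here. */
        int corr  = diff >= 0 ? diff >> 3 : -((-diff + 7) >> 3);
        c[i] = clip_u8(inner - corr);         /* psubw %xmm0, %xmm1; packuswb */
    }
}
```

The two pavgb steps round each pair average up before the subtraction, so this is an approximation of the cubic kernel rather than an exact evaluation of it.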
--
Clément B.