[FFmpeg-devel] [PATCH] Fix emulated_edge_mc SSE code to not contain SSE2 instructions on x86-32.

Ronald S. Bultje rsbultje at gmail.com
Thu Oct 10 13:56:51 CEST 2013


Hi,

On Thu, Oct 10, 2013 at 7:35 AM, Michael Niedermayer <michaelni at gmx.at>wrote:

> On Wed, Oct 09, 2013 at 09:24:20PM -0400, Ronald S. Bultje wrote:
> > ---
> >  libavcodec/x86/videodsp.asm    | 64
> +++++++++++++++++++++++++-----------------
> >  libavcodec/x86/videodsp_init.c | 11 ++++++--
> >  2 files changed, 47 insertions(+), 28 deletions(-)
> >
> > diff --git a/libavcodec/x86/videodsp.asm b/libavcodec/x86/videodsp.asm
> > index cc2105d..aa865f5 100644
> > --- a/libavcodec/x86/videodsp.asm
> > +++ b/libavcodec/x86/videodsp.asm
> > @@ -100,7 +100,7 @@ cglobal emu_edge_hvar, 5, 6, 1, dst, dst_stride,
> start_x, n_words, h, w
> >      ; FIXME also write a ssse3 version using pshufb
> >      movzx            wd, byte [dstq+start_xq]   ;   w = read(1)
> >      imul             wd, 0x01010101             ;   w *= 0x01010101
> > -    movd             m0, wd
> > +    movd             m0, wd                     ;   FIXME this is sse2,
> not sse
> [...]
> > diff --git a/libavcodec/x86/videodsp_init.c
> b/libavcodec/x86/videodsp_init.c
> > index a709d03..b5bab72 100644
> > --- a/libavcodec/x86/videodsp_init.c
> > +++ b/libavcodec/x86/videodsp_init.c
> > @@ -219,9 +219,14 @@ static av_noinline void
> emulated_edge_mc_sse(uint8_t *buf, ptrdiff_t buf_stride,
> >                                               int block_w, int block_h,
> >                                               int src_x, int src_y, int
> w, int h)
> >  {
> > -    emulated_edge_mc(buf, buf_stride, src, src_stride, block_w, block_h,
> > -                     src_x, src_y, w, h, vfixtbl_sse,
> &ff_emu_edge_vvar_sse,
> > -                     hfixtbl_sse, &ff_emu_edge_hvar_sse);
> > +    emulated_edge_mc(buf, buf_stride, src, src_stride, block_w,
> block_h, src_x,
> > +                     src_y, w, h, vfixtbl_sse, &ff_emu_edge_vvar_sse,
> hfixtbl_sse,
> > +#if ARCH_X86_64
> > +                     &ff_emu_edge_hvar_sse
> > +#else
> > +                     &ff_emu_edge_hvar_mmx
> > +#endif
>
> this is a bit ugly but more important is to fix that bug so
> applied
>
> thanks


I'm not really sure how to properly fix this. I mean, I can temporarily
store on the stack (e.g. mov r0m, wd, followed by movss m0, r0m), but that
introduces an extra instruction in the loop, so we'd probably want a sse
and a sse2 version of the function (on x86-32 only of course; x86-64 would
simply default to the sse2 one), and then have 3 (mmx, sse or sse2) instead
of the current 2 wrappers in videodsp_init.c (again, only on x86-32; x86-64
would always use sse2).

In terms of binary size, it's not that bad, since it's only one small
function that would have 3 implementations (hvar); all others would simply
use the sse version in the sse2 wrapper. Does that sound ok? Loren, any
better ideas?

Ronald


More information about the ffmpeg-devel mailing list