[FFmpeg-devel] [PATCH] lavfi/vf_deshake: fix segfaults #2443

Mon Apr 15 19:45:24 CEST 2013

On Mon, Apr 15, 2013 at 11:43:31AM -0300, João Bernardo wrote:
> > >      2nd - SSE2 instruction PSADBW need memory aligned for 128bit
> > operands
> >  > (XMM)
> > >
> > > diff --git a/libavcodec/x86/motion_est.c b/libavcodec/x86/motion_est.c
> > > index 3ffb002..d828d8a 100644
> > > --- a/libavcodec/x86/motion_est.c
> > > +++ b/libavcodec/x86/motion_est.c
> > > @@ -104,8 +104,10 @@ static int sad16_sse2(void *v, uint8_t *blk2,
> > uint8_t
> > > *blk1, int stride, int h)
> > >          "1:                             \n\t"
> > >          "movdqu (%1), %%xmm0            \n\t"
> > >          "movdqu (%1, %4), %%xmm1        \n\t"
> > > -        "psadbw (%2), %%xmm0            \n\t"
> > > -        "psadbw (%2, %4), %%xmm1        \n\t"
> > > +        "movdqu (%2), %%xmm2            \n\t"
> > > +        "movdqu (%2, %4), %%xmm3        \n\t"
> > > +        "psadbw %%xmm2, %%xmm0          \n\t"
> > > +        "psadbw %%xmm3, %%xmm1          \n\t"
> >
> > The input to this function must be aligned to the blocksize of 8 or 16
> >
> 
> This is not possible with "rx" values on deshake filter (unless you do
> extra copying).

the code can be changed to work on an aligned grid, this does not need
extra copying, just dont start at rx but rather at FFALIGN(rx, 16) or
FFALGN(rx+15, 16)
besides, extra copying is negligible in speed, with rx/y being 20
there are 1600 16x16 compares for 1 copy of a 16x16 block

also if you care about speed change the 1600 compare search to one of
the predictive zonal style searches whatever the more modern ones are
called or even a old one like EPZS, that will give a 100x speedup
while the copy is < 1% slowdown with the current exhaustive search

[...]

> 
> 
> > please see dsputil.h:
> > typedef int (*me_cmp_func)(void /*MpegEncContext*/ *s, uint8_t
> > *blk1/*align width (8 or 16)*/, uint8_t *blk2/*align 1*/, int line_size,
> > int h)/* __attribute__ ((const))*/;
> >
> >
> Actually the requirement is for it be aligned on width 16 for XMM
> registers. On MMX registers, you can have the packed sum
> on misaligned data if you disable the check.

The API requires for width 16 an alignment of 16 for the first src

People who write SIMD, be that SSE2, MMX, NEON, altivec or other can
trust that no caller breaks this rule and all callers can trust that
it will work if they align the first src that way and dont align the
second.
You cant just pick one implementation and change it to work with
unaligned data and then expect that your one caller can saftely call it
with unaligned data in both src while the API still says it is 16byte
aligned.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20130415/6e801e79/attachment.asc>