[FFmpeg-devel] [PATCH 2/5] lavfi/gradfun: fix rounding in MMX code.

Tue Dec 18 03:00:57 CET 2012

On Tue, Dec 18, 2012 at 02:31:24AM +0100, Clément Bœsch wrote:
> On Fri, Dec 07, 2012 at 11:49:40PM +0000, Loren Merritt wrote:
> > On Fri, 7 Dec 2012, Reimar Döffinger wrote:
> > > On 7 Dec 2012, at 16:28, Loren Merritt <lorenm at u.washington.edu> wrote:
> > >> On Fri, 7 Dec 2012, Reimar Döffinger wrote:
> > >>> "Clément Bÿÿsch" <ubitux at gmail.com> wrote:
> > >>>
> > >>>> Current code divide before increasing precision.
> > >>>
> > >>> Are you sure the shift can _never_ overflow? It is definitely very tight.
> > >>
> > >> I expect it can overflow, but the case where it does so also has m=0 for
> > >> sane filter strengths. So the appropriate fix would be to limit strength,
> > >> not change the asm.
> > >
> > > Since the proposal is to change the asm, do you mean the patch is right, it just should be combined with limiting the strength?
> > 
> > Yes.
> > 
> > > I have no objections about that, I just kind of assumed there was a good reason for doing things in this order.
> > 
> > The original order probably was about avoiding overflow, but limiting
> > strength is better.
> > 
> 
> Just to make sure I understand the issue correctly:
> 
> In the ASM, delta is expressed as u16, so delta<<2 will overflow for
> values > 0x3fff. In these cases we want that m, defined by:
>     int m = abs(delta) * thresh >> 16;     (1)
>     m = FFMAX(0, 127 - m);
> ...to be 0, so whatever the value of delta, the expression:
>     m = m * m * delta >> 14;
> ...will always be 0.
> 
> To do so, we just need to make sure that abs(delta) * thresh >> 16 to be
> always >= 127.
> 
> We currently have the threshold in the range from (1<<15)/0.51 to
> (1<<15)/255, with [0.51;255] being the range the user can specify. The
> higher the user specified threshold, the closer we get to 127 (from +oo).
> 
> We need to solve:
> 
>    delta * ((1<<15)/x) >> 16 >= 127 (based on (1)),
>      with delta > 0x3fff and x >= 0.51
> 
> => delta * (1<<15)/x / (1<<16) >= 127
>    delta / (2x) >= 127
> 
> so we have 0.51 <= x <= 2 * delta / 127
> 

Here is where is my mistake, the solution is 0.51 <= x <= 129/2, which is
indeed not in the range.

Does a threshold > 64 makes sense? If so, Michael proposed to fallback on
the C version in these cases. What do people prefer?

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20121218/687cd5c7/attachment.asc>