# [FFmpeg-devel] [PATCH 2/5] lavfi/gradfun: fix rounding in MMX code.

Clément Bœsch ubitux at gmail.com
Tue Dec 18 02:31:24 CET 2012

```On Fri, Dec 07, 2012 at 11:49:40PM +0000, Loren Merritt wrote:
> On Fri, 7 Dec 2012, Reimar Döffinger wrote:
> > On 7 Dec 2012, at 16:28, Loren Merritt <lorenm at u.washington.edu> wrote:
> >> On Fri, 7 Dec 2012, Reimar Döffinger wrote:
> >>> "Clément Bÿÿsch" <ubitux at gmail.com> wrote:
> >>>
> >>>> Current code divide before increasing precision.
> >>>
> >>> Are you sure the shift can _never_ overflow? It is definitely very tight.
> >>
> >> I expect it can overflow, but the case where it does so also has m=0 for
> >> sane filter strengths. So the appropriate fix would be to limit strength,
> >> not change the asm.
> >
> > Since the proposal is to change the asm, do you mean the patch is right, it just should be combined with limiting the strength?
>
> Yes.
>
> > I have no objections about that, I just kind of assumed there was a good reason for doing things in this order.
>
> The original order probably was about avoiding overflow, but limiting
> strength is better.
>

Just to make sure I understand the issue correctly:

In the ASM, delta is expressed as u16, so delta<<2 will overflow for
values > 0x3fff. In these cases we want that m, defined by:
int m = abs(delta) * thresh >> 16;     (1)
m = FFMAX(0, 127 - m);
...to be 0, so whatever the value of delta, the expression:
m = m * m * delta >> 14;
...will always be 0.

To do so, we just need to make sure that abs(delta) * thresh >> 16 to be
always >= 127.

We currently have the threshold in the range from (1<<15)/0.51 to
(1<<15)/255, with [0.51;255] being the range the user can specify. The
higher the user specified threshold, the closer we get to 127 (from +oo).

We need to solve:

delta * ((1<<15)/x) >> 16 >= 127 (based on (1)),
with delta > 0x3fff and x >= 0.51

=> delta * (1<<15)/x / (1<<16) >= 127
delta / (2x) >= 127

so we have 0.51 <= x <= 2 * delta / 127

The smallest overflowing delta possible is 0x3fff, so x must be <=
2*0x3fff/127 = 258.

258 looks in the current [0.51;255] user range.

What am I missing?

Note: it's highly probable I failed at math again.

--
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20121218/e360e281/attachment.asc>
```