[FFmpeg-devel] [PATCHv4] avutil/common: add av_rint64_clip

Sat Nov 14 00:16:57 CET 2015

On Fri, Nov 13, 2015 at 4:52 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Fri, Nov 13, 2015 at 4:28 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
> wrote:
>
>> On Fri, Nov 13, 2015 at 1:28 PM, Ronald S. Bultje <rsbultje at gmail.com>
>> wrote:
>> > Hi,
>> >
>> > On Fri, Nov 13, 2015 at 12:17 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>> > wrote:
>> >
>> >> On Fri, Nov 13, 2015 at 12:10 PM, Ronald S. Bultje <rsbultje at gmail.com>
>> >> wrote:
>> >> > Hi Ganesh,
>> >> > On Nov 13, 2015 12:02 PM, "Ganesh Ajjanagadde" <
>> gajjanagadde at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> The rationale for this function is reflected in the documentation for
>> >> >> it, and is copied here:
>> >> >>
>> >> >> Clip a double value into the long long amin-amax range.
>> >> >> This function is needed because conversion of floating point to
>> integers
>> >> > when
>> >> >> it does not fit in the integer's representation does not necessarily
>> >> > saturate
>> >> >> correctly (usually converted to a cvttsd2si on x86) which saturates
>> >> > numbers
>> >> >> > INT64_MAX to INT64_MIN. The standard marks such conversions as
>> >> undefined
>> >> >> behavior, allowing this sort of mathematically bogus conversions.
>> This
>> >> > provides
>> >> >> a safe alternative that is slower obviously but assures safety and
>> >> better
>> >> >> mathematical behavior.
>> >> >> API:
>> >> >> @param a value to clip
>> >> >> @param amin minimum value of the clip range
>> >> >> @param amax maximum value of the clip range
>> >> >> @return clipped value
>> >> >>
>> >> >> Note that a priori if one can guarantee from the calling side that
>> the
>> >> >> double is in range, it is safe to simply do an explicit/implicit
>> cast,
>> >> >> and that will be far faster. However, otherwise this function should
>> be
>> >> >> used.
>> >> >>
>> >> >> avutil minor version is bumped.
>> >> >>
>> >> >> Reviewed-by: Ronald S. Bultje <rsbultje at gmail.com>
>> >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> >> >> ---
>> >> >>  libavutil/common.h  | 30 ++++++++++++++++++++++++++++++
>> >> >>  libavutil/version.h |  2 +-
>> >> >>  2 files changed, 31 insertions(+), 1 deletion(-)
>> >> >>
>> >> >> diff --git a/libavutil/common.h b/libavutil/common.h
>> >> >> index 6f0f582..f4687ab 100644
>> >> >> --- a/libavutil/common.h
>> >> >> +++ b/libavutil/common.h
>> >> >> @@ -298,6 +298,33 @@ static av_always_inline av_const double
>> >> > av_clipd_c(double a, double amin, double
>> >> >>      else               return a;
>> >> >>  }
>> >> >>
>> >> >> +/**
>> >> >> + * Clip and convert a double value into the long long amin-amax
>> range.
>> >> >> + * This function is needed because conversion of floating point to
>> >> > integers when
>> >> >> + * it does not fit in the integer's representation does not
>> necessarily
>> >> > saturate
>> >> >> + * correctly (usually converted to a cvttsd2si on x86) which
>> saturates
>> >> > numbers
>> >> >> + * > INT64_MAX to INT64_MIN. The standard marks such conversions as
>> >> > undefined
>> >> >> + * behavior, allowing this sort of mathematically bogus conversions.
>> >> > This provides
>> >> >> + * a safe alternative that is slower obviously but assures safety
>> and
>> >> > better
>> >> >> + * mathematical behavior.
>> >> >> + * @param a value to clip
>> >> >> + * @param amin minimum value of the clip range
>> >> >> + * @param amax maximum value of the clip range
>> >> >> + * @return clipped value
>> >> >> + */
>> >> >> +static av_always_inline av_const int64_t av_rint64_clip_c(double a,
>> >> > int64_t amin, int64_t amax)
>> >> >> +{
>> >> >> +#if defined(HAVE_AV_CONFIG_H) && defined(ASSERT_LEVEL) &&
>> ASSERT_LEVEL
>> >> >>= 2
>> >> >> +    if (amin > amax) abort();
>> >> >> +#endif
>> >> >> +    // INT64_MAX+1,INT64_MIN are exactly representable as IEEE
>> doubles
>> >> >> +    if (a >=  9223372036854775808.0 || llrint(a) >= amax)
>> >> >> +        return amax;
>> >> >> +    if (a <= -9223372036854775808.0 || llrint(a) <= amin)
>> >> >> +        return amin;
>> >> >
>> >> > Doesn't this allow negative overflows in the max check? I think you
>> need
>> >> > both overflow checks before the min/max checks. Try the next float val
>> >> > smaller than int64_min as input with a small amax, eg 0. I bet it
>> >> returns 0
>> >> > instead of amin.
>> >>
>> >> They are needed. As you and others can clearly see, I am quite bad
>> >> with this stuff.
>> >
>> >
>> > Hm, so, getting back to my computer, I wanted to test this, and I have
>> this
>> > problem: llrint() works correctly for me for the "undefined" cases, i.e.,
>> > it already does what you're trying to fix in av_rint64_clip_c.
>> >
>> > llrint(-10223372056756029440.000000) returns -9223372036854775808
>> > llrint(10223372056756029440.000000) returns 9223372036854775807
>> >
>> > So, how do I reproduce that llrint() overflows?
>>
>> The link I gave originally
>> http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer
>> gives an illustration. Maybe the weird behavior happens only on
>> 9223372036854775808.0. This happens because INT64_MAX+1 is not
>> representable in long long, and hence signed overflow occurs yielding
>> INT64_MIN (of course undefined). Here is a minimal test program:
>> #include <stdio.h>
>> #include <math.h>
>>
>> int main(void) {
>>     printf("%lld\n", llrint(9223372036854775808.0));
>>     return 0;
>> }
>>
>
> 9223372036854775807
>
> Seems apple's libc got one thing right :)

I personally am not that charitable, looking more carefully at your
asm shows a cmplesd, suggesting slowdown. Here is a source reference:
https://opensource.apple.com/source/Libm/Libm-2026/Source/ARM/llrint.c.
As usual, Apple dumps many implementations of llrint and it is unclear
which is actually being used on OS X at the moment (see e.g other
https://opensource.apple.com/source/Libm/Libm-92/i386.subproj/llrint.c),
but I digress.

They essentially all put special case code like the patch above. Thus
their function is inherently slower than the conformant GNU libm
implementation. A client may very well want a branch free llrint for
speed. Apple offers no performance choice here, forcing a fast llrint
to use cvt2dsi inline or equivalent. Don't know if FFmpeg is affected
by this slowdown.

On that note, I also don't know if Timothy has mentioned some
conversations I have had with him and Michael regarding fast libm and
its varying performance across our platforms of interest. I may send a
mail regarding this soon.

Anyway, will push soon.

>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel