[FFmpeg-devel] [PATCH] Use av_clip_uint8 in swscale.

Frank Barchard fbarchard
Tue Aug 18 00:46:11 CEST 2009


2009/8/17 M?ns Rullg?rd <mans at mansr.com>

> And you might know it's unbounded.
>
yes.  So a general purpose function has to do it the slow way.  But this
function is applied within swscaler on YUV data with a know range.


> > If you combined 3 bytes, its 768 values.
>
> "combined"?

Image operations are usually a function of 2 images, or 4 channels, which
puts a limit on how far out of range they can be.

>
>
> > I know if statements are increasingly efficient, and memory less
> efficient,
> > but the original code had 4 to 6 instructions and potentially 2 branches
> > taken per clipped value.
> > av_clip_uint8() can be optimized to a single instruction on most CPU's
>
> Yes, on those with dedicated clip instructions.  Others will need
> several instructions to support the full 32-bit range.  Even if the
> range is known to be smaller, a table lookup can be slower than a few
> compares and conditional instructions, and it poisons the cache
> needlessly.


Here's a benchmark on my code that is very similar.  This version, including
YUV conversion, runs in 2.97ms

static inline uint32 clip(int32 value) {

  if (value < 0)    return 0u;   if (value > 65535)     return 255u;
return static_cast<uint32>(value >> 8);}

*This code runs in 2.11ms*

static inline uint32 clip(int32 value) {  return
static_cast<uint32>(g_rgb_clip_table[((value) >> 8) +
kClipOverflow]);}

The table is read only, so the cache lines are not dirty, and image
data tends to be coherent and only use a portion of table.  The tables
for simple YUV clipping are 832 bytes.


>
> >> > On x86, there is cmov, but in the above code it would take cmp,
> >> > cmov, cmp, cmov to do each value, whereas the table method takes
> >> > one mov instruction.
> >>
> >> You're forgetting the address calculation.
> >
> > movzx eax,cliptbl[eax*4]
>
> Now you're back at the 4GB table.  And where did the value of
> "cliptbl" come from?  It would have to be loaded from somewhere.


cliptbl is an array.  You can index directly off arrays on x86.


>
>
> --
> M?ns Rullg?rd
> mans at mansr.com
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at mplayerhq.hu
> https://lists.mplayerhq.hu/mailman/listinfo/ffmpeg-devel
>



More information about the ffmpeg-devel mailing list