[FFmpeg-devel] [PATCH] Use av_clip_uint8 in swscale.

Ramiro Polla ramiro.polla
Mon Aug 17 04:00:42 CEST 2009


On Sat, Aug 15, 2009 at 1:34 PM, Reimar
Döffinger <Reimar.Doeffinger at gmx.de> wrote:
> On Sat, Aug 15, 2009 at 12:27:49PM -0300, Ramiro Polla wrote:
>> diff --git a/swscale.c b/swscale.c
>> index c513066..340acfc 100644
>> --- a/swscale.c
>> +++ b/swscale.c
>> @@ -688,21 +688,12 @@ static inline void yuv2nv12XinC(const int16_t *lumFilter, const int16_t **lumSrc
>>
>>  #define YSCALE_YUV_2_PACKEDX_C(type,alpha) \
>>          YSCALE_YUV_2_PACKEDX_NOCLIP_C(type,alpha)\
>> -        if ((Y1|Y2|U|V)&256)\
>> -        {\
>> -            if (Y1>255)   Y1=255; \
>> -            else if (Y1<0)Y1=0;   \
>> -            if (Y2>255)   Y2=255; \
>> -            else if (Y2<0)Y2=0;   \
>> -            if (U>255)    U=255;  \
>> -            else if (U<0) U=0;    \
>> -            if (V>255)    V=255;  \
>> -            else if (V<0) V=0;    \
>> -        }\
>> -        if (alpha && ((A1|A2)&256)){\
>> -            A1=av_clip_uint8(A1);\
>> -            A2=av_clip_uint8(A2);\
>> -        }
>> +        Y1 = av_clip_uint8(Y1); \
>> +        Y2 = av_clip_uint8(Y2); \
>> +        U  = av_clip_uint8(U ); \
>> +        V  = av_clip_uint8(V ); \
>> +        A1 = av_clip_uint8(A1); \
>> +        A2 = av_clip_uint8(A2); \
>
> This
>
>> -            if ((u|v)&256){
>> -                if (u<0)        u=0;
>> -                else if (u>255) u=255;
>> -                if (v<0)        v=0;
>> -                else if (v>255) v=255;
>> -            }
>> -
>> -            uDest[i]= u;
>> -            vDest[i]= v;
>> +            uDest[i]= av_clip_uint8((chrSrc[i       ]+64)>>7);
>> +            vDest[i]= av_clip_uint8((chrSrc[i + VOFW]+64)>>7);
>
> And this need to be benchmarked (well, or at least have a look at the
> generated code).
> If clipping is very, very rare the original code might be faster.

Clipping seems to be very, very rare (I haven't actually come across any).
In yuv2yuv1(), using av_clip_uint8() inside the if (&256) check makes the
code go from ~38000 dezicycles to ~35000 dezicycles (weird, since the
condition is never met). Using the attached synthetic benchmark code, I
see that gcc generates cmov for the current code but not for
av_clip_uint8(), which makes the clip code a little slower when only 1
or 2 values are out of range.
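
For readers without the scrubbed attachments, here is a minimal sketch of
the kind of comparison described above. It is not the attached clip2.c.
clip_uint8() is a local stand-in with the same contract as av_clip_uint8()
(clamp to 0..255), clip_guarded() mimics the bit-8 guard removed by the
patch, and the helpers variants_agree(), sum_plain(), sum_guarded() and
bench() are hypothetical names made up for this example. Note that the
&256 guard only catches values in -256..511, so the two variants are
compared on that range only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Stand-in with the same contract as av_clip_uint8(): clamp to 0..255.
 * Written locally for illustration; not copied from libavutil. */
static inline uint8_t clip_uint8(int a)
{
    if (a & ~0xFF)                 /* some bit outside the low 8 is set */
        return a < 0 ? 0 : 255;
    return (uint8_t)a;
}

/* Guarded style removed by the patch: test bit 8 first, then clamp.
 * Only correct for inputs in -256..511, where bit 8 flags overflow. */
static inline uint8_t clip_guarded(int a)
{
    if (a & 256) {
        if (a > 255)    a = 255;
        else if (a < 0) a = 0;
    }
    return (uint8_t)a;
}

/* Returns 1 if both variants agree for every a in [lo, hi], else 0. */
static int variants_agree(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        if (clip_uint8(a) != clip_guarded(a))
            return 0;
    return 1;
}

/* Summing the clipped samples keeps the loops from being optimized away. */
static unsigned sum_plain(const int *src, int n)
{
    unsigned s = 0;
    for (int i = 0; i < n; i++)
        s += clip_uint8(src[i]);
    return s;
}

static unsigned sum_guarded(const int *src, int n)
{
    unsigned s = 0;
    for (int i = 0; i < n; i++)
        s += clip_guarded(src[i]);
    return s;
}

/* Rough timing of both loops over the same buffer, in clock() ticks.
 * Assumes every src[i] lies in -256..511 so the checksums must match. */
static void bench(const int *src, int n, int reps)
{
    unsigned s1 = 0, s2 = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        s1 += sum_plain(src, n);
    clock_t t1 = clock();
    for (int r = 0; r < reps; r++)
        s2 += sum_guarded(src, n);
    clock_t t2 = clock();
    assert(s1 == s2);
    printf("plain:   %ld ticks\nguarded: %ld ticks\n",
           (long)(t1 - t0), (long)(t2 - t1));
}
```

To try it, fill a buffer with mostly in-range values plus one or two
outliers and call bench() from main(). The tick counts depend heavily on
the compiler, flags and CPU, so they are not comparable to the dezicycle
figures above; looking at the generated assembly (e.g. gcc -O2 -S) is the
more reliable way to see whether cmov or a branch is emitted.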
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clip2.c
Type: text/x-csrc
Size: 1250 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090816/27fbe925/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clip2_main.c
Type: text/x-csrc
Size: 1485 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090816/27fbe925/attachment-0001.c>


