[FFmpeg-devel] [PATCH] improve yuv422p to rgb in libswscale

Baptiste Coudurier baptiste.coudurier
Wed Dec 1 08:27:54 CET 2010


On 11/30/10 11:19 PM, Jason Garrett-Glaser wrote:
> On Tue, Nov 30, 2010 at 7:34 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Tue, Nov 30, 2010 at 07:26:42PM -0800, Baptiste Coudurier wrote:
>>> On 11/30/2010 07:13 PM, Michael Niedermayer wrote:
>>>> On Tue, Nov 30, 2010 at 07:05:05PM -0800, Baptiste Coudurier wrote:
>>>>> On 11/30/2010 06:53 PM, Michael Niedermayer wrote:
>>>>>> On Tue, Nov 30, 2010 at 06:29:32PM -0800, Baptiste Coudurier wrote:
>>>>>>> On 11/30/2010 06:15 PM, Michael Niedermayer wrote:
>>>>>>>> On Tue, Nov 30, 2010 at 05:20:42PM -0800, Baptiste Coudurier wrote:
>>>>>>>>> On 11/30/2010 05:13 PM, Michael Niedermayer wrote:
>>>>>>>>>> On Tue, Nov 30, 2010 at 04:07:20AM -0800, Baptiste Coudurier wrote:
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> $subject, use full vertical data when convert 422p, improve quality a lot.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Baptiste COUDURIER
>>>>>>>>>>> Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
>>>>>>>>>>> FFmpeg maintainer                                  http://www.ffmpeg.org
>>>>>>>>>>
>>>>>>>>>>>      x86/yuv2rgb_template.c |   25 ++++-------
>>>>>>>>>>>      yuv2rgb.c              |  109 ++++++-------------------------------------------
>>>>>>>>>>>      2 files changed, 26 insertions(+), 108 deletions(-)
>>>>>>>>>>> 16f384c9b114c76572a539511c42267ce2942c67  yuv422p_to_rgb.patch
>>>>>>>>>>
>>>>>>>>>> this looks like it would make 420p->rgb quite a bit slower.
>>>>>>>>>
>>>>>>>>> Do you think changing>>1 to>>vshift would make is quite a bit slower ?
>>>>>>>>
>>>>>>>> no but that stuff:
>>>>>>>> @@ -152,134 +144,102 @@
>>>>>>>>     YUV2RGBFUNC(yuv2rgb_c_48, uint8_t, 0)
>>>>>>>>         LOADCHROMA(0);
>>>>>>>>         PUTRGB48(dst_1,py_1,0);
>>>>>>>> -    PUTRGB48(dst_2,py_2,0);
>>>>>>>>
>>>>>>>>         LOADCHROMA(1);
>>>>>>>> -    PUTRGB48(dst_2,py_2,1);
>>>>>>>>         PUTRGB48(dst_1,py_1,1);
>>>>>>>>
>>>>>>>>         LOADCHROMA(2);
>>>>>>>>         PUTRGB48(dst_1,py_1,2);
>>>>>>>> -    PUTRGB48(dst_2,py_2,2);
>>>>>>>>
>>>>>>>>         LOADCHROMA(3);
>>>>>>>> -    PUTRGB48(dst_2,py_2,3);
>>>>>>>>         PUTRGB48(dst_1,py_1,3);
>>>>>>>>     ENDYUV2RGBLINE(48)
>>>>>>>>         LOADCHROMA(0);
>>>>>>>>         PUTRGB48(dst_1,py_1,0);
>>>>>>>> -    PUTRGB48(dst_2,py_2,0);
>>>>>>>>
>>>>>>>>         LOADCHROMA(1);
>>>>>>>> -    PUTRGB48(dst_2,py_2,1);
>>>>>>>>         PUTRGB48(dst_1,py_1,1);
>>>>>>>>     ENDYUV2RGBFUNC()
>>>>>>>>
>>>>>>>> and then running the code twice
>>>>>>>
>>>>>>> This is the C code, the mmx routine for yuv420 to rgb24 is already present.
>>>>>>>
>>>>>>> I don't understand running the code twice, can you please clarify ?
>>>>>>
>>>>>> yes, iam seeing this:
>>>>>> -    for (y=0; y<srcSliceH; y+=2) {\
>>>>>> +    for (y=0; y<srcSliceH; y++) {\
>>>>>>
>>>>>> and thus iam thinking it runs the code twice after the patch,
>>>>>> do i miss something?
>>>>>
>>>>> The old code was processing 2 lines at once, see py_1 and py_2, that's
>>>>> why a lot of code is removed in the macros.
>>>>
>>>> yes but i think its slower after the patch
>>>
>>> I guess it is a bit slower because of LOADCHROMA.
>>
>> yes
>>
>>
>>>
>>>>>
>>>>> Do you agree that the mmx change looks trivial ?
>>>>
>>>> yes mmx looks good
>>>
>>> Well, problem now is that mmx routine will have different behaviour, and
>>> will be better.
>>
>> they are already different
>>
>>
>>> I mean, the C routine is not supposed to be that fast is it ?
>>
>> our code is supposed to be fast
>>
>>
>>>
>>> Also adding a check for some user parameters would require adding tests
>>> in a lot of places I fear.
>>
>> just a seperate video filter that plays a bit with linesize and turns
>> 422 in 420 as a sideeffect
>>
> 
> Isn't all of this what FULL_CHROMA_INP is for or whatnot?
> 

Yes, it should be under FULL_CRHOMA_INP, just plugging the check in
function macros would be complicated, or code would have be duplicated
for yuv422p. Btw, this is the unscaled conversion.

-- 
Baptiste COUDURIER
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
FFmpeg maintainer                                  http://www.ffmpeg.org



More information about the ffmpeg-devel mailing list