[FFmpeg-devel] [PATCH] avcodec/magicyuv: add SIMD for median of 10bits

James Almer jamrial at gmail.com
Wed Dec 28 03:19:05 EET 2016


On 12/25/2016 3:14 PM, James Almer wrote:
> On 12/25/2016 1:11 PM, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Sat, Dec 24, 2016 at 9:29 AM, Paul B Mahol <onemda at gmail.com> wrote:
>>
>>> On 12/24/16, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>> Hi,
>>>>
>>>> On Sat, Dec 24, 2016 at 6:09 AM, Paul B Mahol <onemda at gmail.com> wrote:
>>>>
>>>>> On 12/24/16, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Fri, Dec 23, 2016 at 6:18 PM, James Almer <jamrial at gmail.com> wrote:
>>>>>>
>>>>>>> On 12/23/2016 8:00 PM, Ronald S. Bultje wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Fri, Dec 23, 2016 at 12:32 PM, Paul B Mahol <onemda at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> diff --git a/libavcodec/lossless_videodsp.h b/libavcodec/lossless_videodsp.h
>>>>>>>>>
>>>>>>>> [..]
>>>>>>>>
>>>>>>>>> @@ -32,6 +32,7 @@ typedef struct LLVidDSPContext {
>>>>>>>>>
>>>>>>>> [..]
>>>>>>>>
>>>>>>>>> +    void (*add_magy_median_pred_int16)(uint16_t *dst, const uint16_t *top, const uint16_t *diff, unsigned mask, int w, int *left, int *left_top);
>>>>>>>>>
>>>>>>>>
>>>>>>>> That seems wrong. Why would you add a magicyuv-specific function to
>>>>>>>> the losslessdsp context, which is intended for functions shared between
>>>>>>>> many (not just one) lossless codecs? You probably want a new dsp for
>>>>>>>> magicyuv specifically.
>>>>>>>>
>>>>>>>> I know this is tedious, but we're very specifically trying to prevent
>>>>>>>> dsputil from ever happening again.
>>>>>>>>
>>>>>>>> Ronald
>>>>>>>
>>>>>>> Some functions in this dsp are used only by huffyuv. Only one is used by
>>>>>>> both huffyuv and magicyuv.
>>>>>>> To properly apply what you mention, it would need to be split in two,
>>>>>>> huffyuvdsp and lldsp, then this new function added to a new dsp called
>>>>>>> magicyuvdsp.
>>>>>>
>>>>>>
>>>>>> That would be even better, yes.
>>>>>
>>>>> What about yasm code?
>>>>>
>>>>> I wanted that to be commented on.
>>>>
>>>>
>>>> It's like dithering: the next loop iteration uses the immediately adjacent
>>>> pixel. Can you really SIMD this effectively?
>>>
>>> Apparently, and someone is making money from it.
>>
>>
>> The parallelizable portion of it is the top - topleft difference, and you
>> seem to do that already. Other than that, I don't see much to be done. You
>> can probably use some mmxext instructions like pshufw to make life easier,
>> but I think you'll always be limited by the inherent serial dependency on
>> the left pixel.
>>
>> Ronald
> 
> He can turn the movq + psrlq + psllq + por at the end of the loop into two
> movq + palignr for an ssse3 version of the function (still using mmx regs),
> but not much more than that, I guess.
> And even that will probably not make a noticeable difference, assuming it's
> actually faster.

Looks like it's about 3% faster.
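
For anyone following the thread, here is a rough scalar sketch in C of the two
points discussed above. It is not the code from the patch. The names
median3, add_median_pred16_sketch and palignr64_sketch are made up for
illustration, and the 2-byte shift used in the usage note below is an
assumption, since the actual shift counts are not shown in this thread. The
first function shows why each pixel depends on the one just written (only
top - left_top is independent of it). The second shows how a psrlq + psllq +
por sequence on two 64-bit registers matches what a single palignr computes.

```c
#include <stdint.h>

/* Median of three integers (hypothetical helper). */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; } /* now a <= b */
    if (b > c) b = c;                       /* b = min(b, c) */
    return a > b ? a : b;                   /* max(a, min(b, c)) */
}

/* Scalar median prediction for 16-bit samples (illustrative sketch).
 * "l" carries the pixel written in the previous iteration, so iteration
 * i cannot start before iteration i-1 has finished. */
static void add_median_pred16_sketch(uint16_t *dst, const uint16_t *top,
                                     const uint16_t *diff, unsigned mask,
                                     int w, int *left, int *left_top)
{
    int l  = *left;
    int lt = *left_top;

    for (int i = 0; i < w; i++) {
        /* top[i] - lt does not depend on l: this is the parallel part. */
        int grad = (l + top[i] - lt) & mask;
        l  = (median3(l, top[i], grad) + diff[i]) & mask;
        lt = top[i];
        dst[i] = (uint16_t)l;
    }

    *left     = l;
    *left_top = lt;
}

/* Emulates the 64-bit (mmx register) form of palignr: concatenate hi:lo,
 * shift right by "bytes" bytes, keep the low 64 bits.
 * Only valid for bytes in 1..7 in this sketch. */
static uint64_t palignr64_sketch(uint64_t hi, uint64_t lo, unsigned bytes)
{
    /* psrlq on lo, psllq on hi, then por */
    return (lo >> (8 * bytes)) | (hi << (64 - 8 * bytes));
}
```

With 16-bit samples, palignr64_sketch(cur, prev, 2) would drop the oldest
sample of prev and bring in the lowest sample of cur, which is the kind of
register shuffle the shift/or sequence performs by hand.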


