[FFmpeg-devel] [PATCH 3/4] libavcodec/qsvdec.c: The ff_qsv_decode() now guarantees the consumption of whole packet.

Thu Jul 23 17:59:54 CEST 2015

Hello Michael,

Thursday, July 23, 2015, 6:29:13 PM, you wrote:

>> +        } else {
>> +            bs.Data       = avpkt->data;
>> +            bs.DataLength = avpkt->size;
>> +        }

MN> Does this mean that each packet will be memcpy-ed ?
MN> this would slow things down
Not at all. Copying is used only in the quite rare case when the decoder
does not consume several bytes at the tail of a packet. Those bytes are
then copied into the fifo. The next time, they are usually consumed
completely together with the new packet, so in general decoding only a
reference to the packet data is used.
For several test streams I observe that copying of 2-3 bytes occurs
only at the beginning of decoding, for the first 5-10 frames.
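
To illustrate the idea, here is a minimal standalone sketch. It is not the
code from the patch: the names Carry, Input, carry_select_input and
carry_store_tail, and the fixed capacity, are hypothetical and chosen only
for this example. The packet is referenced directly unless leftover bytes
from an earlier packet are pending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CARRY_CAP 4096            /* hypothetical capacity for this sketch */

typedef struct Carry {
    uint8_t buf[CARRY_CAP];
    size_t  len;                  /* bytes left over from earlier packets */
} Carry;

typedef struct Input {
    const uint8_t *data;          /* what the decoder would be given */
    size_t         size;
    int            copied;        /* 1 if the packet had to be copied */
} Input;

/* Choose the decoder input for a new packet. */
static Input carry_select_input(Carry *c, const uint8_t *pkt, size_t pkt_size)
{
    Input in;
    if (c->len == 0) {
        /* Common case: nothing pending, reference the packet, no copy. */
        in.data = pkt;
        in.size = pkt_size;
        in.copied = 0;
    } else {
        /* Rare case: append the packet after the pending tail bytes. */
        assert(c->len + pkt_size <= CARRY_CAP);
        memcpy(c->buf + c->len, pkt, pkt_size);
        c->len += pkt_size;
        in.data = c->buf;
        in.size = c->len;
        in.copied = 1;
    }
    return in;
}

/* After decoding, keep only the bytes the decoder did not consume. */
static void carry_store_tail(Carry *c, const Input *in, size_t consumed)
{
    size_t tail;
    assert(consumed <= in->size);
    tail = in->size - consumed;
    assert(tail <= CARRY_CAP);
    memmove(c->buf, in->data + consumed, tail);  /* regions may overlap */
    c->len = tail;
}
```

In this sketch a copy happens only while c->len is non-zero, which matches
the behaviour described above: a few tail bytes are carried over, consumed
with the next packet, and decoding returns to the reference-only path.
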
>> -    do {
>> +    while (1) {
>>          ret = get_surface(avctx, q, &insurf);
>>          if (ret < 0)
>>              return ret;
>> +        do {
>> +            ret = MFXVideoDECODE_DecodeFrameAsync(q->session, avpkt->size ? &bs : NULL,
>> +                                                  insurf, &outsurf, &sync);
>> +            if (ret != MFX_WRN_DEVICE_BUSY)
>> +                break;
>> -        ret = MFXVideoDECODE_DecodeFrameAsync(q->session, avpkt->size ? &bs : NULL,
>> -                                              insurf, &outsurf, &sync);
>> -        if (ret == MFX_WRN_DEVICE_BUSY)

>> -            av_usleep(1);
>> +            av_usleep(500);

MN> looks like a unrelated change,
MN> should be in a seperate patch with explanation
These lines (the new decoding loop) are absolutely necessary to consume all
available data in the packets. Otherwise the data is not consumed, the
input_fifo is used all the time, and the output frame sequence is wrong
(duplicates and drops).
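
The principle can be shown with a simplified sketch. It is an assumption
for illustration only: the real loop is driven by the status codes of
MFXVideoDECODE_DecodeFrameAsync, while here a hypothetical callback
(decode_fn) simply reports how many bytes it consumed. The names
consume_all and decode_up_to_four are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder call: returns the number of bytes it consumed. */
typedef size_t (*decode_fn)(const unsigned char *data, size_t size,
                            void *opaque);

/* Keep calling the decoder until the whole packet has been consumed. */
static size_t consume_all(const unsigned char *data, size_t size,
                          decode_fn decode, void *opaque)
{
    size_t offset = 0;
    while (offset < size) {
        size_t used = decode(data + offset, size - offset, opaque);
        assert(used <= size - offset);
        if (used == 0)
            break;        /* no progress: leave the tail to the caller */
        offset += used;
    }
    return offset;        /* total bytes consumed from this packet */
}

/* Example decoder that takes at most 4 bytes per call and counts calls. */
static size_t decode_up_to_four(const unsigned char *data, size_t size,
                                void *opaque)
{
    int *calls = opaque;
    (void)data;
    ++*calls;
    return size < 4 ? size : 4;
}
```

A single call per packet would leave 6 of 10 bytes behind with this example
decoder; the loop calls it three times and consumes all 10.
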

The reason for replacing av_usleep(1) with av_usleep(500) is the
following: I believe it is a bad idea to ask the hardware up to 1000000
times per second whether it is busy, especially in the main decoding loop.
Peak qsv decoding performance is about 2000 fps when the destination is GPU memory.
For system memory the performance is usually less than 1000 fps. So a delay
of 500 microseconds (i.e. 0.5 ms) is much more appropriate.
I can move it to a separate patch, but since the main decoding loop is
totally re-designed, could it possibly be applied as is?
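
The arithmetic behind these numbers, as a small sketch. The helper names
are hypothetical, and the poll rate is only an upper bound, since a real
sleep lasts at least as long as requested.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on busy polls per second for a given sleep interval. */
static uint64_t max_polls_per_second(uint64_t sleep_us)
{
    assert(sleep_us > 0);
    return 1000000u / sleep_us;
}

/* Time available per frame, in microseconds, at a given frame rate. */
static uint64_t frame_period_us(uint64_t fps)
{
    assert(fps > 0);
    return 1000000u / fps;
}
```

With a 1 microsecond sleep the bound is 1000000 polls per second; with
500 microseconds it is 2000. A frame takes 500 microseconds at 2000 fps
and 1000 microseconds at 1000 fps, so a 500 microsecond wait is no longer
than one frame period at either rate.
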

-- 
Best regards,
 Ivan                            mailto:ivan.uskov at nablet.com