[FFmpeg-devel] [PATCH] Generic part of frame multithreading

Alexander Strange astrange
Tue Aug 26 09:23:06 CEST 2008


On Aug 20, 2008, at 2:26 PM, Michael Niedermayer wrote:

> On Tue, Aug 19, 2008 at 10:55:21PM -0400, Alexander Strange wrote:
>> This patch adds the new API and generic implementation for frame
>> multithreading. It doesn't cause any behavior changes, since it's
>> just more functions with no users yet.
>>
>> Some issues:
>
>> - No support for field pictures or slice+frame threading. Without
>> these there'll be some speed regressions for some TV streams,
>> especially for PAFF. I'll try to get the first one done soon, as
>> it's pretty easy.
>
> I assume this regression is limited to when threading is enabled? So
> that there is no regression when it's disabled?

Yes, it doesn't support multithreading field pictures, since they'd  
have two threads for the same AVFrame.
Decoding should be exactly the same without threads on. I did some  
experimenting and it seems like you can't easily tell which is the  
current field by reading the AVFrame - it looks like I'll have to add  
an ff_report_field_progress, pass MpegEncContext picture_structure to  
it, and keep two progress values for the same frame.
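Roughly what I have in mind (a sketch only - FieldProgress and
report_field_progress are made-up names, ff_report_field_progress is just
the proposal above; the PICT_* values are the ones from mpegvideo.h):

    #define PICT_TOP_FIELD     1
    #define PICT_BOTTOM_FIELD  2
    #define PICT_FRAME         3

    typedef struct FieldProgress {
        int progress[2]; /* MB rows finished: [0] = top field, [1] = bottom field */
    } FieldProgress;

    static void report_field_progress(FieldProgress *fp, int picture_structure, int mb_row)
    {
        /* A frame picture advances both fields' counters; a field picture
           advances only its own. Signalling on the frame's condition
           variable is omitted here. */
        if (picture_structure != PICT_BOTTOM_FIELD)
            fp->progress[0] = mb_row;
        if (picture_structure != PICT_TOP_FIELD)
            fp->progress[1] = mb_row;
    }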

>> - The thread.h functions with "decode" in the name need to be
>> renamed, since they'll be used for encoding as well. I haven't
>> thought of a better name for "ff_report_predecode_done" that isn't
>> too vague yet.
>
>> - https://roundup.mplayerhq.hu/roundup/ffmpeg/issue590
>
> Let's see if I understand this correctly
> * user apps never feed NULL into the decode function except at EOF
>  before as well as after your patch
> * user apps feed NULL into the decoder to flush out frames at EOF

Yes.

> * your decoder returns "no frames" while it actually still has frames
>  at EOF
> -> If I understand this correctly it's a bug in the affected decoder ...

In this case, it returns whatever the codec's decode() would have  
returned 3 frames ago (assuming 3 threads). Since h264 returns "no  
picture" if passed a top field, it returns that. I think "no picture"  
isn't really the same thing as "no frames", and if the threading  
dropped "no picture" results from the codec it might confuse clients.


>
>> - Some clients can't handle the extra decoder delay right, so I
>> attached another patch that disables it by default. I don't know
>> whether it's a good idea, since we won't know when it's safe to make
>> it the default again.
>
>> - I'm not sure the documentation is good enough.
>
> what documentation? ;)
>
> Beside that I wouldn't mind if you could describe how it works from a
> high level point of view ...
> (could help me with the review as well as anyone who later wants to do
> something with it)

Oops, I forgot to svn add the header....
Added with a lot more comments, and another document attached. I think
I should explain more about report/await_decode_progress, since the
document doesn't cover it yet.
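For the list, here's the basic shape of that mechanism - a standalone
sketch, not the exact code or names from the patch:

    #include <pthread.h>

    /* Each in-flight frame carries a progress counter plus a mutex and
       condition variable. The thread decoding the frame advances the
       counter as it finishes macroblock rows; a thread decoding a later
       frame blocks until the rows it needs from that reference exist. */
    typedef struct DecodeProgress {
        int             mb_rows_done;
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
    } DecodeProgress;

    static void report_decode_progress(DecodeProgress *p, int mb_row)
    {
        pthread_mutex_lock(&p->mutex);
        p->mb_rows_done = mb_row;
        pthread_cond_broadcast(&p->cond); /* wake every waiting consumer */
        pthread_mutex_unlock(&p->mutex);
    }

    static void await_decode_progress(DecodeProgress *p, int mb_row)
    {
        pthread_mutex_lock(&p->mutex);
        while (p->mb_rows_done < mb_row)
            pthread_cond_wait(&p->cond, &p->mutex);
        pthread_mutex_unlock(&p->mutex);
    }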

>
>
> [...]
>> Index: libavcodec/w32thread.c
>> ===================================================================
>> --- libavcodec/w32thread.c	(revision 14856)
>> +++ libavcodec/w32thread.c	(working copy)
>> @@ -104,7 +104,10 @@
>>     ThreadContext *c;
>>     uint32_t threadid;
>>
>> +    if(s->thread_opaque) return 0;
>> +
>
> why is this needed?

avcodec_thread_init needs to see the codec capabilities field, so I  
call it from avcodec_open. This means it's called twice, so it needs  
to handle that. Maybe this could be removed when slice+frame threading  
is supported, but I've never understood why clients need to call  
avcodec_thread_init instead of just setting thread_count.

>>     s->thread_count= thread_count;
>> +    s->thread_algorithm= FF_THREAD_MULTISLICE;
>
> I suspect this should rather be an if( != ) error, at least when the
> user explicitly requested frame threading

Done.

> [...]
>> @@ -847,7 +852,17 @@
>>     avctx->codec = codec;
>>     avctx->codec_id = codec->id;
>>     avctx->frame_number = 0;
>> -    if(avctx->codec->init){
>> +
>> +    if (ENABLE_THREADS) {
>> +        ret = avcodec_thread_init(avctx, avctx->thread_count);
>> +        if (ret < 0) {
>> +            av_freep(&avctx->priv_data);
>> +            avctx->codec= NULL;
>> +            goto end;
>> +        }
>> +    }
>> +
>
>> +    if(avctx->codec->init && !USE_FRAME_THREADING(avctx)){
>
> I am against the use of weird macros; exceptions to this are only
> when it improves readability or speed. (It cannot improve speed in
> functions that are irrelevant to speed.)
> And considering that this macro is not defined anywhere, or at least a
> simple grep is not enough to find it, as is it significantly reduces
> readability as well.

This expands to (ENABLE_THREADS && avctx->active_thread_algorithm ==
FF_THREAD_MULTIFRAME). The first part isn't needed, but leads to a
very small optimization, since we can put any codec threading code
under if (0) when threads aren't compiled in. This is only about a 3%
size reduction for h264.o, so it's not really important - the only
real reason for the macro is that the expanded expression is a bit
long. I think the name is pretty clear, but I can remove it if you
want.
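In other words, the whole definition is just this (written out here for
review; it may not match the patch byte for byte):

    #define USE_FRAME_THREADING(avctx) \
        (ENABLE_THREADS && (avctx)->active_thread_algorithm == FF_THREAD_MULTIFRAME)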

> [...]
>> @@ -980,6 +998,7 @@
>>
>> int avcodec_close(AVCodecContext *avctx)
>> {
>> +    int threaded = USE_FRAME_THREADING(avctx);
>>     entangled_thread_counter++;
>>     if(entangled_thread_counter != 1){
>>         av_log(avctx, AV_LOG_ERROR, "insufficient thread locking  
>> around avcodec_open/close()\n");
>> @@ -989,7 +1008,7 @@
>>
>>     if (ENABLE_THREADS && avctx->thread_opaque)
>>         avcodec_thread_free(avctx);
>> -    if (avctx->codec->close)
>> +    if (avctx->codec->close && !threaded)
>>         avctx->codec->close(avctx);
>>     avcodec_default_free_buffers(avctx);
>>     av_freep(&avctx->priv_data);
>
> why?

The codec's close() has to be called inside the frame thread close. If
it's called before, another thread might still be accessing the codec;
if it's called after, it might call some threading support function
after the threading context has been freed. This is avoidable, but I
think the current way is simplest.
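Concretely, the teardown order is roughly this (a sketch, not the patch
code; signal_thread_to_exit stands in for the actual wakeup logic):

    static void frame_thread_free(AVCodecContext *avctx, int thread_count)
    {
        FrameThreadContext *fctx = avctx->thread_opaque;
        int i;

        for (i = 0; i < thread_count; i++) {
            PerThreadContext *p = &fctx->threads[i];

            signal_thread_to_exit(p);      /* set fctx->die and wake the worker */
            pthread_join(p->thread, NULL); /* no decoding can be in flight now */

            if (avctx->codec->close)
                avctx->codec->close(p->context); /* may still use fctx, which is alive */
        }

        av_freep(&fctx->threads);
        av_freep(&avctx->thread_opaque);   /* threading context goes last */
    }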

> and the threaded var is redundant

It was needed there (not obvious because of the missing header); I  
simplified it.

>
>> @@ -1251,7 +1270,9 @@
>>
>> void avcodec_flush_buffers(AVCodecContext *avctx)
>> {
>> -    if(avctx->codec->flush)
>> +    if(USE_FRAME_THREADING(avctx))
>> +        ff_frame_thread_flush(avctx);
>> +    else if(avctx->codec->flush)
>>         avctx->codec->flush(avctx);
>> }
>
> why not
> if(frame_threading)
>    ff_frame_thread_flush(avctx);
> if(avctx->codec->flush)
>    avctx->codec->flush(avctx);
> ?
>
> having ff_frame_thread_flush call avctx->codec->flush behind the
> scenes is not good for the sake of readability

Same reason as above; doing it that way would work, but because of how
release_buffer works with threading, the frames won't actually end up
being released for a while, which is surprising. Now that I look at
it, maybe I should add a comment about that.
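What ff_frame_thread_flush does is roughly this (a sketch;
wait_until_idle stands in for the real synchronization, and
handle_delayed_releases is the helper quoted further down):

    static void frame_thread_flush(AVCodecContext *avctx)
    {
        FrameThreadContext *fctx = avctx->thread_opaque;
        int i;

        for (i = 0; i < avctx->thread_count; i++) {
            PerThreadContext *p = &fctx->threads[i];

            wait_until_idle(p);         /* let any in-flight decode finish */
            handle_delayed_releases(p); /* actually free the buffers queued by release_buffer */

            if (avctx->codec->flush)
                avctx->codec->flush(p->context);
        }
    }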

>
>
>
> [...]
>
>> @@ -767,7 +771,21 @@
>>      * - encoding: Set by user.\
>>      * - decoding: Set by libavcodec.\
>>      */\
>> -    int8_t *ref_index[2];
>> +    int8_t *ref_index[2];\
>> +\
>> +    /**\
>> +     * context owning the internal buffers\
>> +     * - encoding: Set by libavcodec.\
>> +     * - decoding: Set by libavcodec.\
>> +     */\
>> +    struct AVCodecContext *avctx;\
>
> I'd say this needs some more extensive documentation and likely a
> better variable name as well.
> When there are several AVCodecContexts then avctx just isn't a good
> name anymore.

Renamed + better comment.

>> +\
>> +    /**\
>> +     * Can be used by multithreading to track frames\
>> +     * - encoding: Set by libavcodec.\
>> +     * - decoding: Set by libavcodec.\
>> +     */\
>> +    void *thread_opaque;
>>
>> #define FF_QSCALE_TYPE_MPEG1 0
>> #define FF_QSCALE_TYPE_MPEG2 1
>
>> @@ -923,7 +941,7 @@
>>      * If non NULL, 'draw_horiz_band' is called by the libavcodec
>>      * decoder to draw a horizontal band. It improves cache usage.  
>> Not
>>      * all codecs can do that. You must check the codec capabilities
>> -     * beforehand.
>> +     * beforehand. May be called by different threads at the same  
>> time.
>
> ... when frame threading is enabled

draw_horiz_band is already called from separate threads if slice
threading is used, so this comment is unrelated to this patch. I'll
submit it separately.
Added to the other comments.

>
>>      * - encoding: unused
>>      * - decoding: Set by user.
>>      * @param height the height of the slice
>> @@ -951,7 +969,7 @@
>>      * Samples per packet, initialized when calling 'init'.
>>      */
>>     int frame_size;
>> -    int frame_number;   ///< audio or video frame number
>> +    int frame_number;   ///< Number of audio or video frames  
>> returned so far
>>     int real_pict_num;  ///< Returns the real picture number of  
>> previous encoded frame.
>>
>>     /**
>
>> @@ -1166,6 +1184,7 @@
>>      * If pic.reference is set then the frame will be read later by  
>> libavcodec.
>>      * avcodec_align_dimensions() should be used to find the  
>> required width and
>>      * height, as they normally need to be rounded up to the next  
>> multiple of 16.
>> +     * May be called by different threads, but not more than one  
>> at the same time.
>>      * - encoding: unused
>>      * - decoding: Set by libavcodec., user can override.
>>      */
>> @@ -1175,6 +1194,7 @@
>>      * Called to release buffers which were allocated with  
>> get_buffer.
>>      * A released buffer can be reused in get_buffer().
>>      * pic.data[*] must be set to NULL.
>> +     * May be called by different threads, but not more than one  
>> at the same time.
>>      * - encoding: unused
>>      * - decoding: Set by libavcodec., user can override.
>>      */
>
> did i miss the similar change to reget_buffer() ?

It seems like the purpose of reget_buffer() is to reuse the buffer  
with the data from the last frame, for codecs that draw on top of it?  
If that's the case, it's not compatible with frame threading.

>
>> @@ -2230,6 +2250,25 @@
>>      * - decoding: Set by user.
>>      */
>>     float drc_scale;
>> +
>> +    /**
>> +     * Whether this is a copy of the context which had init()  
>> called on it.
>> +     * This is used by multithreading - shared tables and picture  
>> pointers
>> +     * should be freed from the original context only.
>> +     * - encoding: Set by libavcodec.
>> +     * - decoding: Set by libavcodec.
>> +     */
>> +    int is_copy;
>> +
>
>> +    /**
>> +     * Which multithreading method to use, for codecs that support  
>> more than one.
>> +     * - encoding: Set by user, otherwise the default is used.
>> +     * - decoding: Set by user, otherwise the default is used.
>> +     */
>> +    int thread_algorithm;
>
>> +#define FF_THREAD_AUTO       0 //< Use the default. Will be  
>> changed by libavcodec after init.
>
> the "changed by libavcodec" part is not so great ...
> having lavc change variables the user set is something exceptional, not
> something to handle "defaults" IMO
> besides, DEFAULT is a better term than AUTO I think

Changed to DEFAULT and split into two variables - having it as one  
variable with strange changes was causing problems anyway.
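The new pair looks roughly like this (the second name follows the
active_thread_algorithm I used above; the exact declarations and doxygen
in the updated patch may differ):

    /**
     * Which multithreading method to use, for codecs that support more than one.
     * Never changed by libavcodec.
     * - encoding: Set by user, otherwise the default is used.
     * - decoding: Set by user, otherwise the default is used.
     */
    int thread_algorithm;

    /**
     * Which multithreading method is actually in use.
     * - encoding: Set by libavcodec during init.
     * - decoding: Set by libavcodec during init.
     */
    int active_thread_algorithm;

    #define FF_THREAD_DEFAULT    0 //< Let libavcodec pick one
    #define FF_THREAD_MULTIFRAME 1 //< Decode more than one frame at once
    #define FF_THREAD_MULTISLICE 2 //< Decode more than one part of a single frame at once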

>> +#define FF_THREAD_MULTIFRAME 1 //< Decode more than one frame at  
>> once
>> +#define FF_THREAD_MULTISLICE 2 //< Decode more than one part of a  
>> single frame at once
>
> with
> #define FF_THREAD_MULTIFRAME 1
> #define FF_THREAD_MULTISLICE 2
> #define FF_THREAD_AUTO       3
>
> a simple thread_algorithm & FF_THREAD_MULTIFRAME could be used
>
>
>> } AVCodecContext;
>>
>> /**
>> @@ -2271,6 +2310,14 @@
>>     const char *long_name;
>>     const int *supported_samplerates;       ///< array of supported  
>> audio samplerates, or NULL if unknown, array is terminated by 0
>>     const enum SampleFormat *sample_fmts;   ///< array of supported  
>> sample formats, or NULL if unknown, array is terminated by -1
>> +
>> +    /**
>> +     * @defgroup multithreading Multithreading support functions.
>> +     * @{
>> +     */
>> +    int (*init_copy)(AVCodecContext *);     ///< called after  
>> copying initially, re-allocate all writable tables
>> +    int (*update_context)(AVCodecContext *, AVCodecContext  
>> *from); ///< copy everything needed from the last thread before  
>> every new frame
>> +    /** @} */
>> } AVCodec;
>
> I do think they need more precise and elaborate descriptions
> besides, something tells me a
> clone(AVCodecContext *new, AVCodecContext *old)
> would be better than a memcpy() ; init_copy(dst)

Commented further. init_copy() is actually not very useful for most
codecs, since they usually allocate everything after getting the first
frame. For MPEG codecs the memcpy could be removed and the copying
done in the first update_context() for each thread instead (actually,
it already is), but for simpler codecs I think it would be an
unnecessary special case. One memcpy per thread at codec init costs
practically nothing.
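To illustrate how a codec is expected to fill in the two callbacks
(ExampleContext and everything in it is made up, not from the patch):

    typedef struct ExampleContext {
        AVFrame  last_frame;   /* inter-frame state the next frame depends on */
        uint8_t *scratch;      /* writable per-thread table */
        int      scratch_size;
    } ExampleContext;

    /* Called once per extra thread, right after the generic code has
       memcpy'd the original private context into the copy. */
    static int example_init_copy(AVCodecContext *avctx)
    {
        ExampleContext *s = avctx->priv_data;

        /* the memcpy left us sharing the source thread's buffer;
           allocate our own */
        s->scratch = av_malloc(s->scratch_size);
        if (!s->scratch)
            return AVERROR(ENOMEM);
        return 0;
    }

    /* Called before each new frame; copy only what that frame depends on
       from the previous thread's context. */
    static int example_update_context(AVCodecContext *dst, AVCodecContext *from)
    {
        ExampleContext *d = dst->priv_data, *s = from->priv_data;

        d->last_frame = s->last_frame;
        return 0;
    }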

> [...]
>> +typedef struct PerThreadContext {
>> +    pthread_t thread;
>> +    pthread_cond_t input_cond, progress_cond, output_cond;
>> +    pthread_mutex_t mutex, progress_mutex, buffer_mutex;
>> +
>
>> +    AVCodecContext *context;
>
> and here i think avctx would be better than context
>
>> +
>> +    uint8_t *buf;
>> +    int buf_size, allocated_buf_size;
>> +
>> +    AVFrame picture;
>> +    int got_picture, output_res;
>> +
>> +    struct FrameThreadContext *parent;
>> +
>> +    int decode_progress;
>> +
>> +    enum {
>> +        STATE_INPUT_READY,
>> +        STATE_PREDECODING,
>> +        STATE_DECODING
>> +    } state;
>> +
>> +    AVFrame released_buffers[MAX_DELAYED_RELEASED_BUFFERS];
>> +    int nb_released_buffers;
>> +} PerThreadContext;
>> +
>> +typedef struct FrameThreadContext {
>> +    PerThreadContext *threads;
>> +    PerThreadContext *prev_thread;
>> +
>> +    int next_available, next_ready;
>> +    int delaying;
>> +
>> +    int die;
>> +} FrameThreadContext;
>
> please document all structure members and all structures

Done, and some of the members renamed.

> [...]
>> @@ -105,6 +146,9 @@
>>     ThreadContext *c= avctx->thread_opaque;
>>     int dummy_ret;
>>
>> +    if (!USE_AVCODEC_EXECUTE(avctx) || avctx->thread_count <= 1)
>> +        return avcodec_default_execute(avctx, func, arg, ret,  
>> job_count);
>> +
>
> another one of these opaque macros :/
> if only I would remember the address of the git repo to find out if it
> maybe is defined there ...
>
>
> [...]
>> +
>> +static void handle_delayed_releases(PerThreadContext *p)
>> +{
>> +    while (p->nb_released_buffers > 0)
>> +        ff_release_buffer(&p->released_buffers[--p->nb_released_buffers]);
>> +}
>> +
>
> please document all functions

Done.

>
>> +static int submit_frame(PerThreadContext * volatile p, const  
>> uint8_t *buf, int buf_size)
>> +{
>> +    FrameThreadContext *fctx = p->parent;
>> +    PerThreadContext *prev_thread = fctx->prev_thread;
>> +    AVCodec *codec = p->context->codec;
>> +    int err = 0;
>> +
>> +    if (!buf_size && !(codec->capabilities & CODEC_CAP_DELAY))  
>> return 0;
>> +
>> +    pthread_mutex_lock(&p->mutex);
>> +    if (prev_thread) err = update_context_from_copy(p->context,  
>> prev_thread->context, 0);
>> +    if (err) return err;
>> +
>> +    p->buf = av_fast_realloc(p->buf, &p->allocated_buf_size,  
>> buf_size);
>> +    memcpy(p->buf, buf, buf_size);
>
> copying every frame does not seem particularly reasonable to me,
> considering that it should be easy for most user apps to keep a few
> buffers around, or if they can't they can still copy ...
>
> This also reminds me that the lavc API should be changed to use
> AVPackets as input to the decode functions. This would allow
> av_packet_dup() to be used, which would just copy packets when needed ...

Well, with the current API I can't guarantee anything about the input
buffers, so I did it the safe way. CODEC_FLAG_INPUT_PRESERVED exists
but doesn't cover quite enough, and I don't see many users of it (only
VLC) in Google Code Search. I can change this if the API improves or
if more clients can guarantee not to free their buffers.

>
>
>
> [...]
>> +int ff_decode_frame_threaded(AVCodecContext *avctx,
>> +                             void *data, int *data_size,
>> +                             const uint8_t *buf, int buf_size)
>> +{
>> +    FrameThreadContext *fctx;
>> +    PerThreadContext * volatile p;
>> +    int thread_count = avctx->thread_count, err = 0;
>> +
>> +    if (!avctx->thread_opaque) ff_frame_thread_init(avctx);
>> +
>> +    fctx = avctx->thread_opaque;
>> +    p = &fctx->threads[fctx->next_available];
>> +    update_context_from_user(p->context, avctx);
>> +    err = submit_frame(p, buf, buf_size);
>> +
>> +    if (err) return err;
>> +
>> +    fctx->next_available++;
>> +
>> +    if (fctx->delaying) {
>> +        if (fctx->next_available >= (thread_count-1)) fctx->delaying = 0;
>> +
>> +        *data_size=0;
>> +        return 0;
>> +    }
>> +
>> +    p = &fctx->threads[fctx->next_ready++];
>> +
>> +    pthread_mutex_lock(&p->progress_mutex);
>
>> +    while (p->state != STATE_INPUT_READY) pthread_cond_wait(&p->output_cond, &p->progress_mutex);
>
> too much on one line ...

Changed (and similar lines).


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1-frame-threading.diff
Type: text/x-diff
Size: 33257 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080826/c575f5ea/attachment.diff>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: multithreading.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080826/c575f5ea/attachment.txt>


