[FFmpeg-devel] [PATCH 1/6] Frame-based multithreading framework using pthreads

Alexander Strange astrange
Fri Jan 21 12:30:45 CET 2011


On Jan 21, 2011, at 5:29 AM, Alexander Strange wrote:

> 
> On Mon, Dec 27, 2010 at 10:12 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Mon, Nov 15, 2010 at 08:37:01AM -0500, Alexander Strange wrote:
>>> See doc/multithreading.txt for details on use in codecs.
>>> ---
>>> doc/APIchanges         |    5 +
>>> doc/multithreading.txt |   63 +++++
>>> libavcodec/avcodec.h   |   80 ++++++-
>>> libavcodec/options.c   |    4 +
>>> libavcodec/pthread.c   |  656 +++++++++++++++++++++++++++++++++++++++++++++++-
>>> libavcodec/thread.h    |   98 +++++++
>>> libavcodec/utils.c     |   64 +++++-
>>> libavcodec/w32thread.c |    6 +
>>> libavformat/utils.c    |    5 +
>>> libavutil/internal.h   |   11 +
>>> 10 files changed, 979 insertions(+), 13 deletions(-)
>>> create mode 100644 doc/multithreading.txt
>>> create mode 100644 libavcodec/thread.h
>>> 
>>> diff --git a/doc/APIchanges b/doc/APIchanges
>>> index b6806f8..c8892d9 100644
>>> --- a/doc/APIchanges
>>> +++ b/doc/APIchanges
>>> @@ -13,6 +13,11 @@ libavutil:   2009-03-08
>>> 
>>> API changes, most recent first:
>>> 
>>> +2010-11-XX - rX - lavc 52.xx.0 - threading API
>>> +  Add CODEC_CAP_FRAME_THREADS with new restrictions on get_buffer()/
>>> +  release_buffer()/draw_horiz_band() callbacks for appropriate codecs.
>>> +  Add thread_type and active_thread_type fields to AVCodecContext.
>>> +
>>> 2010-11-13 - r25745 - lavc 52.95.0 - AVCodecContext
>>>   Add AVCodecContext.subtitle_header and AVCodecContext.subtitle_header_size
>>>   fields.
>>> diff --git a/doc/multithreading.txt b/doc/multithreading.txt
>>> new file mode 100644
>>> index 0000000..65bdfb1
>>> --- /dev/null
>>> +++ b/doc/multithreading.txt
>>> @@ -0,0 +1,63 @@
>>> +FFmpeg multithreading methods
>>> +==============================================
>>> +
>>> +FFmpeg provides two methods for multithreading codecs.
>>> +
>>> +Slice threading decodes multiple parts of a frame at the same time, using
>>> +AVCodecContext execute() and execute2().
>>> +
>>> +Frame threading decodes multiple frames at the same time.
>>> +It accepts N future frames and delays decoded pictures by N-1 frames.
>>> +The later frames are decoded in separate threads while the user is
>>> +displaying the current one.
>>> +
>>> +Restrictions on clients
>>> +==============================================
>>> +
>>> +Slice threading -
>>> +* The client's draw_horiz_band() must be thread-safe according to the comment
>>> +  in avcodec.h.
>>> +
>>> +Frame threading -
>>> +* Restrictions with slice threading also apply.
>>> +* The client's get_buffer() and release_buffer() must be thread-safe as well.
>>> +* There is one frame of delay added for every thread beyond the first one.
>>> +  Clients using dts must account for the delay; pts sent through reordered_opaque
>>> +  will work as usual.
>>> +
>>> +Restrictions on codec implementations
>>> +==============================================
>>> +
>>> +Slice threading -
>>> + None except that there must be something worth executing in parallel.
>>> +
>>> +Frame threading -
>>> +* Codecs can only accept entire pictures per packet.
>>> +* Codecs similar to ffv1, whose streams don't reset across frames,
>>> +  will not work because their bitstreams cannot be decoded in parallel.
>>> +
>> 
>>> +* The contents of buffers must not be read before ff_thread_await_progress()
>>> +  has been called on them. reget_buffer() and buffer age optimizations no longer work.
>> 
>> Why does the buffer age optimization not work anymore?
>> If a MB is skipped in 100 frames and you have 10 frame-decoding threads, then
>> the optimization should still allow some skipping. And actually this should be easy.
>> And it does happen in reality (the black bars on the top and bottom of films).
> 
> The problem is that each decoding thread has its own copy of the AVCodecContext, and when get_buffer() is called it uses that copy. So each thread ends up with its own internal_buffer, and the ages no longer make sense once the AVFrames travel across threads.
> 
> This can be fixed by sharing the same internal_buffer between threads. But I tried it and everything broke; I spent too much time looking at it so I've punted it into a branch:
> 
> http://gitorious.org/~astrange/ffmpeg/ffmpeg-mt/commit/e5622e99c9608e612e9f386c1711d76ce08268e1
> 
> The nice thing is that mt adds so much more speed (clock time) than skipping does that I found it didn't really matter on the mpeg4 files I tried, so I think this can be put off. VP3 and H264 don't use it at all.
> 
> The new patch works with mplayer dr + custom get_buffer() by default, because it calls get_buffer() on the client's thread by default.
> Setting thread_safe_callbacks restores the old behavior; ffplay uses this.
> !thread_safe_callbacks is slower, and dr is not any faster than !dr for me, so it can probably be improved a little by disabling dr.
> 
>> [...]
>>> @@ -1203,7 +1219,8 @@ typedef struct AVCodecContext {
>>>      * If non NULL, 'draw_horiz_band' is called by the libavcodec
>>>      * decoder to draw a horizontal band. It improves cache usage. Not
>>>      * all codecs can do that. You must check the codec capabilities
>>> -     * beforehand.
>>> +     * beforehand. May be called by different threads at the same time,
>>> +     * so implementations must be reentrant.
>> 
>> Thread-safe or reentrant?
>> For many video filters, slice N has to be done before slice N+1 starts, and they
>> have to be called in order even if you mix calls from several frames.
> 
> Unfortunately the patch doesn't enforce this. I looked at it and I believe this was already not enforced (just decode mpeg2 with threads on and it will be called out of order), but nothing ever seemed to use draw_horiz_band() with 4:2:0 video, so nobody noticed.
> 
> I wrote a longer comment about what you can expect (which is that anything at all can happen).
> 
> Note that playing 4:2:2 Theora video with this patch series will break in mplayer because of this, but IMO it's better to turn it off in mplayer than to fix it here.
> 
>>> 
>>>      * The function is also used by hardware acceleration APIs.
>>>      * It is called at least once during frame decoding to pass
>>>      * the data needed for hardware render.
>>> @@ -1457,6 +1474,9 @@ typedef struct AVCodecContext {
>>>      * if CODEC_CAP_DR1 is not set then get_buffer() must call
>>>      * avcodec_default_get_buffer() instead of providing buffers allocated by
>>>      * some other means.
>>> +     * May be called from a different thread if thread_type==FF_THREAD_FRAME
>>> +     * is set, but not by more than one thread at once, so does not need to be
>>> +     * reentrant.
>> 
>> Isn't it thread_type&FF_THREAD_FRAME?
> 
> Yes.
> 
>> And it can be called from more than one thread at once, I think.
> 
> There's a mutex around it (buffer_mutex) so it can't be.
> 
>> [...]
>>> @@ -2811,6 +2863,26 @@ typedef struct AVCodec {
>>>     const int64_t *channel_layouts;         ///< array of support channel layouts, or NULL if unknown. array is terminated by 0
>>>     uint8_t max_lowres;                     ///< maximum value for lowres supported by the decoder
>>>     AVClass *priv_class;                    ///< AVClass for the private context
>>> +
>>> +    /**
>>> +     * @defgroup framethreading Frame-level threading support functions.
>>> +     * @{
>>> +     */
>>> +    /**
>>> +     * If defined, called on thread contexts when they are created.
>>> +     * If the codec allocates writable tables in init(), re-allocate them here.
>>> +     * priv_data will be set to a copy of the original.
>>> +     */
>>> +    int (*init_thread_copy)(AVCodecContext *);
>> 
>>> +    /**
>>> +     * Copy necessary context variables from a previous thread context to the current one.
>>> +     * If not defined, the next thread will start automatically; otherwise, the codec
>>> +     * must call ff_thread_finish_setup().
>>> +     *
>>> +     * dst and src will (rarely) point to the same context, in which case memcpy should be skipped.
>>> +     */
>>> +    int (*update_thread_context)(AVCodecContext *dst, AVCodecContext *src);
>> 
>> src could be const
> 
> Done.
> 
>> 
>> 
>> [...]
>>> +int ff_thread_decode_frame(AVCodecContext *avctx,
>>> +                           AVFrame *picture, int *got_picture_ptr,
>>> +                           AVPacket *avpkt)
>>> +{
>>> +    FrameThreadContext *fctx = avctx->thread_opaque;
>>> +    int finished = fctx->next_finished;
>>> +    PerThreadContext *p;
>>> +    int err;
>>> +
>>> +    /*
>>> +     * Submit a frame to the next decoding thread.
>>> +     */
>>> +
>>> +    p = &fctx->threads[fctx->next_decoding];
>>> +    update_context_from_user(p->avctx, avctx);
>>> +    err = submit_frame(p, avpkt);
>>> +    if (err) return err;
>>> +
>>> +    fctx->next_decoding++;
>>> +
>>> +    /*
>>> +     * If we're still receiving the initial frames, don't return a picture.
>>> +     */
>>> +
>>> +    if (fctx->delaying && avpkt->size) {
>>> +        if (fctx->next_decoding >= (avctx->thread_count-1)) fctx->delaying = 0;
>>> +
>>> +        *got_picture_ptr=0;
>>> +        return 0;
>>> +    }
>>> +
>>> +    /*
>>> +     * Return the next available picture from the oldest thread.
>>> +     * If we're at the end of the stream, then we have to skip threads that
>>> +     * didn't output a picture, because we don't want to accidentally signal
>>> +     * EOF (avpkt->size == 0 && *got_picture_ptr == 0).
>>> +     */
>>> +
>>> +    do {
>>> +        p = &fctx->threads[finished++];
>>> +
>>> +        if (p->state != STATE_INPUT_READY) {
>>> +            pthread_mutex_lock(&p->progress_mutex);
>>> +            while (p->state != STATE_INPUT_READY)
>>> +                pthread_cond_wait(&p->output_cond, &p->progress_mutex);
>>> +            pthread_mutex_unlock(&p->progress_mutex);
>>> +        }
>>> +
>>> +        *picture = p->picture;
>>> +        *got_picture_ptr = p->got_picture;
>>> +
>>> +        avcodec_get_frame_defaults(&p->picture);
>>> +        p->got_picture = 0;
>>> +
>>> +        if (finished >= avctx->thread_count) finished = 0;
>>> +    } while (!avpkt->size && !*got_picture_ptr && finished != fctx->next_finished);
>>> +
>>> +    update_context_from_thread(avctx, p->avctx, 1);
>>> +
>>> +    if (fctx->next_decoding >= avctx->thread_count) fctx->next_decoding = 0;
>>> +
>>> +    fctx->next_finished = finished;
>>> +
>>> +    return p->result;
>>> +}
>> 
>> I think this design has some issues.
>> 1. (minor) I don't see why you don't return frames when they are ready at the
>>  beginning.
>> 2. Frames can have variable duration if the picture doesn't change. A timespan
>>  of 2 seconds without changes is quite realistic; 32 frames make one minute of
>>  delay, which will kill A/V sync with some players and could also underflow
>>  video buffers if they are time-based (I'm thinking of RT*P here).
>> Thus I think frames should be returned when they are ready.
> 
> 
> 
>> 3. This code blocks when no frame is available to be returned; that's OK, but
>>  we should try to also support a non-blocking, EAGAIN-returning form. It's
>>  useful for players that try to also do other things in the decoding thread
>>  and don't want to be blocked for half a second in unlucky cases.
> 
> 
> 
> 
>> [...]
>>> +    /*
>>> +     * Buffer age is difficult to keep track of between
>>> +     * multiple threads, and the optimizations it allows
>>> +     * are not worth the effort. It is disabled for now.
>>> +     */
>>> +    f->age = INT_MAX;
>> 
>> I see nothing hard in the relation between age and threads.
>> Please elaborate on what is difficult.
> 
> Commented up there.
> 
>> [...]
>>> +/**
>>> + * Set the threading algorithms used.
>>> + *
>>> + * Threading requires more than one thread.
>>> + * Frame threading requires entire frames to be passed to the codec,
>>> + * and is incompatible with low_delay.
>>> + *
>>> + * @param avctx The context.
>>> + */
>>> +static void validate_thread_parameters(AVCodecContext *avctx)
>>> +{
>>> +    int frame_threading_supported = (avctx->codec->capabilities & CODEC_CAP_FRAME_THREADS)
>>> +                                && !(avctx->flags & CODEC_FLAG_TRUNCATED)
>>> +                                && !(avctx->flags & CODEC_FLAG_LOW_DELAY)
>>> +                                && !(avctx->flags2 & CODEC_FLAG2_CHUNKS);
>>> +    if (avctx->thread_count == 1)
>>> +        avctx->active_thread_type = 0;
>>> +    else if (frame_threading_supported && (avctx->thread_type & FF_THREAD_FRAME))
>>> +        avctx->active_thread_type = FF_THREAD_FRAME;
>>> +    else
>>> +        avctx->active_thread_type = FF_THREAD_SLICE;
>> 
>> style nitpick if{}else if{}else
> 
> Done.
> 
> 
>> [...]
>>> +/**
>>> + * Waits for decoding threads to finish and resets internal
>>> + * state. Called by avcodec_flush_buffers().
>>> + *
>>> + * @param avctx The context.
>>> + */
>>> +void ff_thread_flush(AVCodecContext *avctx);
>>> +
>>> +/**
>>> + * Submits a new frame to a decoding thread.
>>> + * Returns the next available frame in picture. *got_picture_ptr
>>> + * will be 0 if none is available.
>>> + *
>>> + * Parameters are the same as avcodec_decode_video2().
>>> + */
>>> +int ff_thread_decode_frame(AVCodecContext *avctx, AVFrame *picture,
>>> +                           int *got_picture_ptr, AVPacket *avpkt);
>>> +
>>> +/**
>>> + * Codecs which define update_thread_context should call this
>>> + * when they are ready for the next thread to start decoding
>>> + * the next frame. After calling it, do not change any variables
>>> + * read by the update_thread_context method.
>>> + *
>>> + * @param avctx The context.
>>> + */
>>> +void ff_thread_finish_setup(AVCodecContext *avctx);
>>> +
>> 
>> 
>> 
>>> +/**
>>> + * Call this when some part of the picture is finished decoding.
>>> + * Later calls with lower values of progress have no effect.
>>> + *
>>> + * @param f The picture being decoded.
>>> + * @param progress Value, in arbitrary units, of how much of the picture has decoded.
>>> + * @param field The field being decoded, for field-picture codecs.
>>> + * 0 for top field or frame pictures, 1 for bottom field.
>>> + */
>>> +void ff_thread_report_progress(AVFrame *f, int progress, int field);
>>> +
>>> +/**
>>> + * Call this before accessing some part of a picture.
>>> + *
>>> + * @param f The picture being referenced.
>>> + * @param progress Value, in arbitrary units, to wait for.
>>> + * @param field The field being referenced, for field-picture codecs.
>>> + * 0 for top field or frame pictures, 1 for bottom field.
>>> + */
>>> +void ff_thread_await_progress(AVFrame *f, int progress, int field);
>>> +
>>> +/**
>>> + * Call this function instead of avctx->get_buffer(f).
>>> + *
>>> + * @param avctx The current context.
>>> + * @param f The frame to write into.
>>> + */
>>> +int ff_thread_get_buffer(AVCodecContext *avctx, AVFrame *f);
>>> +
>>> +/**
>>> + * Call this function instead of avctx->release_buffer(f).
>>> + *
>>> + * @param avctx The current context.
>>> + * @param f The picture being released.
>>> + */
>>> +void ff_thread_release_buffer(AVCodecContext *avctx, AVFrame *f);
>> 
>> You should document what the functions "do", not just when to call them.
>> Well, you should document when to call them too, but what they do is more
>> important IMHO. And I mean from a high-level view, not low-level implementation
>> docs, of course.
> 
> Wrote more comments.
> 
>>> 
>>> +
>>> +#endif /* AVCODEC_THREAD_H */
>>> diff --git a/libavcodec/utils.c b/libavcodec/utils.c
>>> index 3875140..c931484 100644
>>> --- a/libavcodec/utils.c
>>> +++ b/libavcodec/utils.c
>>> @@ -36,6 +36,7 @@
>>> #include "dsputil.h"
>>> #include "libavutil/opt.h"
>>> #include "imgconvert.h"
>>> +#include "thread.h"
>>> #include "audioconvert.h"
>>> #include "internal.h"
>>> #include <stdlib.h>
>>> @@ -256,6 +257,11 @@ int avcodec_default_get_buffer(AVCodecContext *s, AVFrame *pic){
>>>     (*picture_number)++;
>>> 
>>>     if(buf->base[0] && (buf->width != w || buf->height != h || buf->pix_fmt != s->pix_fmt)){
>>> +        if(s->active_thread_type&FF_THREAD_FRAME) {
>>> +            av_log_missing_feature(s, "Width/height changing with frame threads is", 0);
>>> +            return -1;
>>> +        }
>>> +
>>>         for(i=0; i<4; i++){
>>>             av_freep(&buf->base[i]);
>>>             buf->data[i]= NULL;
>>> @@ -517,18 +523,29 @@ int attribute_align_arg avcodec_open(AVCodecContext *avctx, AVCodec *codec)
>>>         goto free_and_end;
>>>     }
>>>     avctx->frame_number = 0;
>>> +
>>> +    if (HAVE_THREADS && !avctx->thread_opaque) {
>>> +        ret = avcodec_thread_init(avctx, avctx->thread_count);
>>> +        if (ret < 0) {
>>> +            goto free_and_end;
>>> +        }
>>> +    }
>>> +
>>>     if (avctx->codec->max_lowres < avctx->lowres) {
>>>         av_log(avctx, AV_LOG_ERROR, "The maximum value for lowres supported by the decoder is %d\n",
>>>                avctx->codec->max_lowres);
>>>         goto free_and_end;
>>>     }
>>> 
>>> -    if(avctx->codec->init){
>>> -        ret = avctx->codec->init(avctx);
>>> -        if (ret < 0) {
>>> -            goto free_and_end;
>>> +    if(avctx->codec->init && !(avctx->active_thread_type&FF_THREAD_FRAME)){
>>> +        if(avctx->codec->init){
>>> +            ret = avctx->codec->init(avctx);
>>> +            if (ret < 0) {
>>> +                goto free_and_end;
>>> +            }
>>>         }
>> 
>>>     }
>>> +
>>>     ret=0;
>> 
>> cosmetic
> 
> Removed. (there was a funny merge glitch there too)
> 
>> 
>> 
>> [...]
>>> @@ -737,7 +757,7 @@ av_cold int avcodec_close(AVCodecContext *avctx)
>>> 
>>>     if (HAVE_THREADS && avctx->thread_opaque)
>>>         avcodec_thread_free(avctx);
>>> -    if (avctx->codec && avctx->codec->close)
>>> +    if (avctx->codec && avctx->codec->close && !(avctx->active_thread_type&FF_THREAD_FRAME))
>>>         avctx->codec->close(avctx);
>>>     avcodec_default_free_buffers(avctx);
>>>     avctx->coded_frame = NULL;
>> 
>> Moving init/close elsewhere is a bit ugly; is this unavoidable?
> 
> close: avoidable; removed.
> init: not avoidable. I just rediscovered why it isn't, and then forgot because it's 5 AM.
> 
> If it is avoidable, the change would involve more than one line.
> 
> 
>>> 
>>> @@ -745,6 +765,7 @@ av_cold int avcodec_close(AVCodecContext *avctx)
>>>     if(avctx->codec && avctx->codec->encode)
>>>         av_freep(&avctx->extradata);
>>>     avctx->codec = NULL;
>>> +    avctx->active_thread_type = 0;
>>>     entangled_thread_counter--;
>>> 
>>>     /* Release any user-supplied mutex. */
>>> @@ -985,6 +1006,8 @@ void avcodec_init(void)
>>> 
>>> void avcodec_flush_buffers(AVCodecContext *avctx)
>>> {
>>> +    if(HAVE_PTHREADS && avctx->active_thread_type&FF_THREAD_FRAME)
>>> +        ff_thread_flush(avctx);
>>>     if(avctx->codec->flush)
>>>         avctx->codec->flush(avctx);
>>> }
>>> @@ -1185,3 +1208,30 @@ unsigned int ff_toupper4(unsigned int x)
>>>             + (toupper((x>>16)&0xFF)<<16)
>>>             + (toupper((x>>24)&0xFF)<<24);
>>> }
>>> +
>> [...]
>> 
>>> diff --git a/libavformat/utils.c b/libavformat/utils.c
>>> index 8fcbe96..0b5d89c 100644
>>> --- a/libavformat/utils.c
>>> +++ b/libavformat/utils.c
>>> @@ -913,6 +913,11 @@ static void compute_pkt_fields(AVFormatContext *s, AVStream *st,
>>>     /* do we have a video B-frame ? */
>>>     delay= st->codec->has_b_frames;
>>>     presentation_delayed = 0;
>>> +
>>> +    // ignore delay caused by frame threading
>>> +    if (delay && st->codec->active_thread_type&FF_THREAD_FRAME)
>>> +        delay -= st->codec->thread_count-1;
>>> +
>>>     /* XXX: need has_b_frame, but cannot get it if the codec is
>>>         not initialized */
>>>     if (delay &&
>> 
>> This looks wrong.
>> Please elaborate on what you are trying to do here.
> 
> I removed the change and everything still appears to work.
> I think it may have been fixed by using pkt_dts??
> 
> The problem before was that threading increased has_b_frames, which caused the "invalid dts/pts combination" warning to be printed for codecs that always have dts==pts, like vp3. But I don't see it now.

Turns out it only happened with some ogg files: big_buck_bunny_720p_stereo.ogg still has the problem, whereas controlled_60.ogv doesn't. I've added the change back with a slightly longer comment.

This adjustment only affects the dts/pts combination warning, nothing else in that function, so the warning could be turned off instead. But I do think setting has_b_frames is the right thing for threading to do.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Frame-based-multithreading-framework-using-pthreads.patch
Type: application/octet-stream
Size: 47606 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110121/2253412f/attachment.obj>