[FFmpeg-devel] [PATCH 2/3] avformat: reject FFmpeg-style merged side data in raw packets

Wed Mar 8 17:06:20 EET 2017

On Wed, 8 Mar 2017 15:36:25 +0100
Michael Niedermayer <michael at niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:
> > It looks like this could lead to security issues, as side data readers  
> 
> anything could, but side data wasnt excluded from fuzzing, in fact
> ive seen fuzzers find and trigger the split code

The split code runs on _any_ input data, so this is not surprising. My
intention is to remove merging/splitting completely (the patches don't
go that far yet).

> also a demuxer and user app could always set side data to something
> bad.
> All uses of side data must assume the data to be untrusted, because
> it is even without all the split code.

This depends. For example, AV_PKT_DATA_NEW_EXTRADATA is of course 100%
defined by untrusted data (except that the size fits into memory). On
the other hand, I expect something like AV_PKT_DATA_DISPLAYMATRIX to
have at least the correct size. FFmpeg code also does this sometimes,
for example dump_sidedata() doesn't check the size of this particular
side data type.

In my own FFmpeg API using code I assume that side data has the correct
minimum size for a given type, and I will blame any potentially
security-relevant deviations on FFmpeg. I'm sure most other API users
also will, and they're probably not even aware that this us supposedly
possible.

Besides, fewer ways for untrusted input data to leak into
complex/obscure data structures is an improvement for security.

> 
> 
> > will for example rely on side data allocation sizes to be as large as
> > needed for correct operation.  
> 
> Is this really so ?
> we allocate at least AV_INPUT_BUFFER_PADDING_SIZE, accesses beyond
> that require a check on the size.
> 

Except av_frame_new_side_data() doesn't do that, and often side data is
copied from packets to frames without additional checks (see
ff_init_buffer_info() for e.g. the DISPLAYMATRIX case).

> also size checks are needed for the case where a lib gets replaced by
> one with newer version that has a bigger struct.

Oh really? We never do this. Normal API structs are also considered
appendable, so compiling against a newer API and then linking against
an older version doesn't work. This is exactly the same case.

> 
> > If such files exist at all, they also
> > should be brought out of circulation, so fully reject them. Under normal
> > circumstances, nothing creates such files.
> > 
> > To avoid problems for now, we let the concat and hls demuxers do this
> > (they merely return previously-demuxed packets, whose side data might
> > have been merged by libavformat itself after demuxing). The
> > special-cases will be removed in the next commit.
> > ---
> >  libavformat/utils.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/libavformat/utils.c b/libavformat/utils.c
> > index 37d7024465..68f3c977d6 100644
> > --- a/libavformat/utils.c
> > +++ b/libavformat/utils.c
> > @@ -840,6 +840,15 @@ int ff_read_packet(AVFormatContext *s, AVPacket *pkt)
> >              *pkt = tmp;
> >          }
> >  
> > +        if (strcmp(s->iformat->name, "concat") && strcmp(s->iformat->name, "hls,applehttp")  
> 
> can strcmp() be avoided here ?
> it would be run per packet which is rather ugly

We could add a flag, but considering the next commit removes these
strcmp()s again, and that strcmp() on very short strings is very cheap,
I don't think it's worth doing this.

> 
> > +            && av_packet_split_side_data(pkt) == 1) {  
> 
> this could fail with ENOMEM which complicates things

I can add a check for ENOMEM too. This should be the only thing that
looks like a failure, but could work in a separate call (like the one
on the libavcodec side).

> 
> if we do block such packets, its prbobably better to add a new
> static inline function to a header that checks if a packet has
> splitable side data and use that in av_packet_split_side_data() and
> here

Not sure what you mean here.

> Iam suggesting "static inline" here to make backporting easier so no
> new symbol is added to libavcodec that is needed by libavformat

What's the purpose?

> also it may be interresting to disable this check for fuzzing so
> side data can be fuzzed in a wider range of cases and any past
> testcases that happen to use this can still be used for regression
> testing

I think what you want is fault injection for memory errors, seems out
of scope here.