[FFmpeg-devel] [PATCH 2/3] avformat: reject FFmpeg-style merged side data in raw packets

Wed Mar 8 21:16:32 EET 2017

On Wed, 8 Mar 2017 19:56:42 +0100
Michael Niedermayer <michael at niedermayer.cc> wrote:

> On Wed, Mar 08, 2017 at 07:31:27PM +0100, wm4 wrote:
> > On Wed, 8 Mar 2017 19:03:21 +0100
> > Michael Niedermayer <michael at niedermayer.cc> wrote:
> >   
> > > On Wed, Mar 08, 2017 at 05:26:57PM +0100, wm4 wrote:  
> > > > On Wed, 8 Mar 2017 17:11:12 +0100
> > > > Michael Niedermayer <michael at niedermayer.cc> wrote:
> > > >     
> > > > > On Wed, Mar 08, 2017 at 04:06:20PM +0100, wm4 wrote:    
> > > > > > On Wed, 8 Mar 2017 15:36:25 +0100
> > > > > > Michael Niedermayer <michael at niedermayer.cc> wrote:
> > > > > >       
> > > > > > > On Wed, Mar 08, 2017 at 01:40:11PM +0100, wm4 wrote:      
> > > > > > > > It looks like this could lead to security issues, as side data readers        
> > > > > [...]    
> > > > > > > also size checks are needed for the case where a lib gets replaced by
> > > > > > > one with newer version that has a bigger struct.      
> > > > > > 
> > > > > > Oh really? We never do this. Normal API structs are also considered
> > > > > > appendable, so compiling against a newer API and then linking against
> > > > > > an older version doesn't work. This is exactly the same case.      
> > > > > 
> > > > > no its not
> > > > > 
> > > > > what you call normal structs are allocated by an allocator that is
> > > > > part of the lib that defines it, the struct, and lib dependancies
> > > > > ensure that its new. Any allocated struct as a result is large
> > > > > enough though possibly not every field is set
> > > > > 
> > > > > with side data the code using it sets the size explicitly that makes
> > > > > the size generally hardcoded in the lib using the code theres no
> > > > > longer a common allocator (some exceptions exist).
> > > > > The size a lib allocates that way is the
> > > > > compile time sizeof() which may differ from another lib
> > > > > and side data can be passed in both directions between libs not just
> > > > > in the direction of their dependancy
> > > > > so you can end up with a smaller side data and that means you have to
> > > > > do checks.    
> > > > 
> > > > This is wrong.    
> > >   
> > > > side data which has structs have corresponding functions
> > > > to get their allocation size. Of course that's all very error prone and
> > > > hard to use correctly and some were added only recently because the
> > > > API had holes, but that's how the libav* APIs are for now.    
> > > 
> > > you talk about something else here.
> > > 
> > > fact is the allocated side data uses hardcoded size values often
> > > anyone can look at
> > > git grep -A3 new_side_data
> > > 
> > > theres is sizeof() use and there are litteral numbers also  
> > 
> > You have to use whatever is correct in each specific case. Using a
> > number or sizeof() argument for new_side_data is simply an API
> > violation in some cases, similar to e.g. using
> > av_malloc(sizeof(AVFrame)). There are a few.
> >   
> 
> > I don't know why you want to "check" these uses with FATE. As I've said
> > in the other thread, that's like letting FATE check sizeof(AVFrame).
> > 
> > The right way is to check it in new_side_data, or have an API that is
> > not so hard to use incorrectly. This has been discussed before, when
> > we added new functions to add display mastering data or something
> > similar in an ABI-safe way.
> >   
> > > if these ever change, checks on the size become needed
> > > which was the original thing i meant in this sub argumentation.
> > > checks are needed, or something else in their place
> > > is needed in that case  
> > 
> > And FATE-checking the sizes is the wrong thing to do. At least for the
> > side data types whose byte layouts are defined by the C ABI like
> > MASTERING_DISPLAY_METADATA, not by something wire-like as for
> > example the SKIP_SAMPLES ones.  
> 
> wrong thread or you totally misunderstand me
> 
> what you originally said:
> > It looks like this could lead to security issues, as side data readers
> > will for example rely on side data allocation sizes to be as large as
> > needed for correct operation.  
> 
> And what i replied:
>     [...]
> 
>     also size checks are needed for the case where a lib gets replaced by
>     one with newer version that has a bigger struct.

Yes, and that's why I said somewhere that this is the same as other
"appendable" structs in the libav* API. For example, AVFrame is
supposed to be extendable, which is why everything outside of libavutil
(the lib that defines the struct) must use av_frame_alloc() to allocate
AVFrame. sizeof(AVFrame) is not part of the ABI.

There is _no_ intention that an API user compiled against, say,
libavutil version 1.2.100 can link to libavutil version 1.1.100,
because 1.2.100 might have extended some structs (like AVFrame).

So where does your argument for size checks come from? We don't do
anything like this. There is no av_get_avframe_size(), and there's no
code that would do "if(av_get_avframe_size() < sizeof(AVFRame))". But
for side data we're suddenly supposed to do this?

For side data we have e.g. av_mastering_display_metadata_alloc() and 
av_mastering_display_metadata_create_side_data(). Nowhere are you
supposed to know or check the side data size.

(I thought some side data types have an API function that returns the
size of the corresponding struct, but I didn't find one. My mistake.)

And finally, my claim that it's a security problem: if some code
for example accesses mastering display side data, and the side data is
too small, that code will obviously read out of bounds. There isn't
even a way to check this, short of using sizeof(AVMastering...), which
is not part of the ABI. This is why arbitrary side data read from files
should be prevented.

Some side-data types are defined like a file format - these generally
use explicit size (as in, numbers in the code), and you accessed them
with AV_WB32/AV_RB32 etc. - these are different.

> Now fact is that for cases where you link to a lib with a differently
> sized struct or more generally any side data (which is what was meant)
> if its not using an allocator it needs a check in the code or something
> else, thats what i meant and thats from where this bizare sub
> discussion started where you seem to keep disagreeing about things i
> did not say
> 
> Iam not talking about fate here, thats a seperate thing

OK, then I don't know what you're arguing about. Isn't the ABI
situation pretty clear?

> 
> >   
> > > 
> > >   
> > > > 
> > > > Besides, manually checking struct sizes/allocations makes for an even
> > > > worse ABI compatibility concept than FFmpeg currently follows. (Worse
> > > > as in ease of use and accidental incompatibilities and UB triggered as
> > > > consequence.)
> > > >     
> > >   
> > > > > 
> > > > > 
> > > > > [...]
> > > > >     
> > > > > > >       
> > > > > > > > +            && av_packet_split_side_data(pkt) == 1) {        
> > > > > > > 
> > > > > > > this could fail with ENOMEM which complicates things      
> > > > > > 
> > > > > > I can add a check for ENOMEM too. This should be the only thing that
> > > > > > looks like a failure, but could work in a separate call (like the one
> > > > > > on the libavcodec side).
> > > > > >       
> > > > > > >       
> > > > >     
> > > > > > > if we do block such packets, its prbobably better to add a new
> > > > > > > static inline function to a header that checks if a packet has
> > > > > > > splitable side data and use that in av_packet_split_side_data() and
> > > > > > > here      
> > > > > > 
> > > > > > Not sure what you mean here.
> > > > > >       
> > > > > > > Iam suggesting "static inline" here to make backporting easier so no
> > > > > > > new symbol is added to libavcodec that is needed by libavformat      
> > > > > > 
> > > > > > What's the purpose?      
> > > > > 
> > > > > Its simpler, cleaner and faster    
> > > > 
> > > > I mean of that function, or why we should care about symbols present.    
> > > 
> > > i think i explained this but ill try again all more verbosly
> > > 
> > > using a functiom to check for splitable sidedata instead of
> > > spliting the side data is cleaner as we dont want to split it we just
> > > want to check if theres any.
> > > 
> > > Its simpler as we dont have to deal with errors from the split code
> > > and also dont need to deal with the case that split happened.  
> > 
> > I don't see much simplicity in code duplication, putting code into
> > public headers (which also means you have to make sure this
> > compiles with C++), reducing compile times, and potentially exposing
> > implementation details.  
> 
> It can be in a private header, it doesnt need to be public

OK, that could probably be reduced to the magic footer.