[FFmpeg-devel] [PATCH] Set correct frame_size for Speex decoding

Sun Aug 16 18:26:32 CEST 2009

Michael Niedermayer wrote:

> On Sun, Aug 16, 2009 at 05:07:18PM +0200, Michael Niedermayer wrote:
>> On Sat, Aug 15, 2009 at 07:55:46PM -0400, Justin Ruggles wrote:
>>> Michael Niedermayer wrote:
> [...]
>>>>>> what exactly is the argument you have that speex should not be handled like
>>>>>> every other codec?!
>>>>>> split it in a parser, the muxer muxes ONLY a single speex packet per
>>>>>> container packet. Any extension from that is low priority and "patch welcome"
>>>>>> IMHO ...
>>>>> The downside for Speex is the container overhead since individual frames
>>>>> are very small.
>>>> this is true for many (most?) speech codecs
>>>>
>>>> also if we write our own speex encoder, it will only return one frame at a
>>>> time.
>>> Why would it have to?  
>> because the API requires it, both encoders and decoders have to implement the
>> API, a video encoder also cannot return 5 frames in one packet.
>> APIs can be changed but you aren't arguing for an API change, you argue for
>> ignoring the API and just doing something else.
>>
>>
>>> If the user sets frame_size before initialization
>>> to a number of samples that is a multiple of a single frame, it could
>>> return multiple speex frames at once, properly joined together and
>>> padded at the end.  With libspeex this is very easy to do because the
>>> functionality is built into the API.
>>>
>>> I understand the desire to keep what are called frames as separate
>>> entities, but in the case of Speex I see it as more advantageous to allow
>>> the user more control when encoding.  If frames are always split up,
>>> this limits the user's options for both stream copy *and* for encoding to
>>> just 1 frame per packet.
>>>
>>> If you're dead-set against this idea, then I will finish the parser that
>>> splits packets in order to continue with my other Speex patches, but I
>>> don't like how limiting it would be for the user.
>> i am against speex handling things differently than other speech codecs
>> based on arguments that apply to other speech codecs as well.
> 
>> Also I am against passing data between muxer and codec layers in a way
>> that violates the API.
> 
> ffmpeg separates muxer and codec layers, writing a demuxer & decoder
> that depend on things beyond the API (frames per frame) is going to
> break things. We had similar great code (passing structs in AVPacket.data
> because it was convenient) that also didn't turn out to be that great
> and required a complete rewrite ...
> 
> 
> I've already said nut doesn't allow multiple frames per packet, but it's
> not just nut, avi as well doesn't allow multiple frames per packet
> for VBR and either way avi needs to have its headers set up properly,
> not with fake frame size and such, and flv as we already know has an
> issue with >8 frames per frame. All that is just what we know of that will
> break if you implement your hack, what else will break is something we
> only would learn after some time.
> 
> IMHO, demuxer->parser->split frames [unless it is not possible to split]
> if a muxer can store multiple frames it can merge several depending on its
> abilities and user provided parameters, that merging could also be done
> as a bitstream filter.
> But just skipping the splitting and merging and hoping that every container
> would allow anything that any other container allowed is just not going to
> work. And even more so as we already know of many combinations that would
> not work

I do understand your point.  There is a subtle difference with speex
though.  The process of merging frames into groups of frames is
something that is specified by the codec itself, not the container.  To
the container, it would be as transparent as for an audio codec that
allows different numbers of samples/duration for a frame.  Nut would
support it just fine, as it does with FLAC with different numbers of
samples per frame.  As for FLV, it would be the same as a limit on the
number of samples per frame, beyond which playback gets choppy and/or
crashes.

That said, I think a bitstream filter could work too, but I don't know
how options are passed to it, if that is even possible.

Doing it in lavf in the muxer(s) would require it to reuse a lot of the
parser code to essentially re-parse each frame to determine the number
of non-padding bits.  I don't think I could personally come up with a
good generic system for combining frames, but some shared code just for
speex would not be too difficult.  Do you think an ok approach would be
for the muxers to pass speex AVPackets to this function, which would
spit out either an empty packet or a combined packet?
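
To make the idea concrete, here is a rough sketch of what such a shared
helper might look like.  Everything in it is hypothetical: SpeexCombiner,
combiner_init() and combiner_push() are made-up names, not existing lavf
API, and it uses plain buffers instead of AVPacket.  It assumes the caller
(i.e. the re-used parser code) already knows the number of non-padding
bits in each frame, and it assumes the end-of-packet padding is a 0 bit
followed by 1 bits up to the byte boundary, which should be checked
against libspeex before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COMBINER_MAX_BYTES 1024

/* Hypothetical helper state; not part of libavformat. */
typedef struct SpeexCombiner {
    uint8_t buf[COMBINER_MAX_BYTES]; /* bit-packed packet under construction */
    size_t  bit_pos;                 /* number of valid bits in buf */
    int     frames_queued;           /* frames appended so far */
    int     frames_per_packet;       /* user-chosen grouping factor */
} SpeexCombiner;

static void combiner_init(SpeexCombiner *c, int frames_per_packet)
{
    assert(frames_per_packet > 0);
    memset(c, 0, sizeof(*c));
    c->frames_per_packet = frames_per_packet;
}

/* Append a single bit, MSB first; buf is zeroed so only 1 bits are set. */
static void put_bit(SpeexCombiner *c, int bit)
{
    assert(c->bit_pos < (size_t)COMBINER_MAX_BYTES * 8);
    if (bit)
        c->buf[c->bit_pos >> 3] |= (uint8_t)(0x80 >> (c->bit_pos & 7));
    c->bit_pos++;
}

/*
 * Feed one frame.  payload_bits is the number of non-padding bits in data,
 * as determined by the caller.  Returns 0 ("empty packet") while more
 * frames are needed, otherwise the size in bytes of the combined packet
 * copied to out.
 */
static size_t combiner_push(SpeexCombiner *c, const uint8_t *data,
                            size_t payload_bits,
                            uint8_t *out, size_t out_size)
{
    size_t i, size;

    /* Join the frame directly after the previous one, with no byte gap. */
    for (i = 0; i < payload_bits; i++)
        put_bit(c, (data[i >> 3] >> (7 - (i & 7))) & 1);

    if (++c->frames_queued < c->frames_per_packet)
        return 0;

    /* Assumed padding pattern: one 0 bit, then 1 bits to the byte boundary. */
    if (c->bit_pos & 7) {
        put_bit(c, 0);
        while (c->bit_pos & 7)
            put_bit(c, 1);
    }

    size = c->bit_pos >> 3;
    assert(size <= out_size);
    memcpy(out, c->buf, size);
    combiner_init(c, c->frames_per_packet); /* reset for the next group */
    return size;
}
```

A muxer would call combiner_push() once per incoming frame and only write
something when the return value is non-zero.  The frames-per-packet value
would have to come from a user option, which is the part I am unsure how
to pass down.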

-Justin