[FFmpeg-devel] [PATCH] Set correct frame_size for Speex decoding

Sun Aug 16 01:55:46 CEST 2009

Michael Niedermayer wrote:

> On Sat, Aug 15, 2009 at 07:20:29PM -0400, Justin Ruggles wrote:
>> Michael Niedermayer wrote:
>>
>>> On Sat, Aug 15, 2009 at 01:30:10PM -0400, Justin Ruggles wrote:
>>>> Justin Ruggles wrote:
>>>>
>>>>> Michael Niedermayer wrote:
>>>>>> On Sat, Aug 01, 2009 at 06:53:11AM -0400, Justin Ruggles wrote:
>>>>>>> Michael Niedermayer wrote:
>>>>>>>> On Fri, Jul 31, 2009 at 07:34:00PM -0400, Justin Ruggles wrote:
>>>>>>>>> Michael Niedermayer wrote:
>>>>>>>>>> On Fri, Jul 31, 2009 at 06:48:59PM -0400, Justin Ruggles wrote:
>>>>>>>>>>> Michael Niedermayer wrote:
>>>>>>>>>>>> On Thu, Jul 30, 2009 at 07:25:17PM -0400, Justin Ruggles wrote:
>>>>>>>>>>>>> Justin Ruggles wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Currently AVCodecContext.frame_size is not set correctly for Speex.
>>>>>>>>>>>>>> Since the Ogg and FLV demuxers and the libspeex decoder handle a full
>>>>>>>>>>>>>> packet as a single frame, frame_size should be set to the Speex
>>>>>>>>>>>>>> frame_size * frames_per_packet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If frames_per_packet is not specified in the Speex header, or if there
>>>>>>>>>>>>>> is no header, it can be determined after decoding the first packet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stream copy is not implemented yet for Speex, but once it is, a parser
>>>>>>>>>>>>>> will be able to set all the stream parameters instead of the decoder
>>>>>>>>>>>>>> when the header is missing or incomplete.
>>>>>>>>>>>>> ping.
>>>>>>>>>>>> it might be helpfull if you say who you expect to review this
>>>>>>>>>>> In general, I'm looking for an ok on having lavf and lavc treat a whole
>>>>>>>>>>> Speex packet as a single frame.  After considering and trying to code
>>>>>>>>>>> the split-then-join idea it did not seem like a very clean solution, and
>>>>>>>>>>> it is not really necessary.  This is my general plan:
>>>>>>>>>> which values of frames_per_packet does each container allow?
>>>>>>>>>> that is each container that supports speex
>>>>>>>>> AFAIK, Ogg is only limited by the Speex header, which supports up to
>>>>>>>>> INT32_MAX. (libspeex reads the 4-byte value in the header as a signed int32)
>>>>>>>>>
>>>>>>>>> For FLV, it's unclear.  Here is a quote from Art:
>>>>>>>>>> Here's what I found.  Set the speex frames per packet all the way from 1 up
>>>>>>>>>> to 8, and it appears they all now work with Flash Player (I erroneously
>>>>>>>>>> reported that 1 would not work before, but at least with the latest version
>>>>>>>>>> that is not the case).  Setting 9 frames per packet causes flash player to
>>>>>>>>>> start stuttering.  Set 10 or more frames per packet causes flash player to
>>>>>>>>>> crash, bringing down the browser with it.
>>>>>>>>> The speexenc commandline program allows encoding between 1 and 10 frames
>>>>>>>>> per packet.
>>>>>>>> so if you have a ogg with 11 -acodec copy to flv will not work, IMO thats a
>>>>>>>> bug. How should that be fixed?
>>>>>>>> if the parser splits the packets as it should, it should work ...
>>>>>>> -acodec copy from Ogg to FLV with 11 frames per packet would work just
>>>>>>> fine in FFmpeg, but Flash Player would crash when playing the file.
>>>>>>> That seems like a bug in Flash Player to me.
>>>>>> sure a crash is a bug but with propriatary formats the official tools are
>>>>>> more or less what define the format. One can argue about it of course but
>>>>>> de facto ffmpeg generating flv files that flash player cant play is a
>>>>>> disadvantage for ffmpeg and would be preceived as a bug in ffmpeg by most.
>>>>>> also even if fixed in the next version there still will be many people
>>>>>> around who use the old player that doesnt handle it, ffmpeg definitly
>>>>>> should not create files that are known to be problematic to 99% of the
>>>>>> used players, at least not by default.
>>>>> Well, we could not allow creation of FLV files with more than 8 frames
>>>>> per packet.  The only downside would be that stream copy would fail when
>>>>> the source has more than 8 frames per packet and the destination is FLV.
>>> its not the only downside
>>> having a random number of frames packed into one frame (thats actually
>>> alraedy odd) means the internal format of speex is pretty much undefined.
>>> what meaning do frame size or duration variables have then?
>> It's not random.  The header defines how many frames (and also how many
>> samples) are in each packet.  In the absence of a header, a parser can
>> determine the same information easily.  It is constant for the entire
>> stream.
>>
>> If a whole speex packet is treated as a single frame, it would behave
>> just the same as other codecs.  For FLV there would be a limit on the
>> number of samples in a frame, but for other containers there would not be.
> 
> i dont think the speex decoder will succeed as treating it like that

It already does.  It just has the wrong value set for frame_size.

>>> what exactly is the argument you have that speex should not be handled like
>>> every other codec?!
>>> split it in a parser, the muxer muxes ONLY a single speex packet per
>>> container packet. Any extension from that is low priority and "patch welcome"
>>> IMHO ...
>> The downside for Speex is the container overhead since individual frames
>> are very small.
> 
> this is true for many (most?) speech codecs
> 
> also if we write our own speex encoder, it will only return one frame at a
> time.

Why would it have to?  If the user sets frame_size before initialization
to a number of samples that is a multiple of a single frame, it could
return multiple speex frames at once, properly joined together and
padded at the end.  With libspeex this is very easy to do because the
functionality is built into the API.

I understand the desire to keep what are called frames as separate
entities, but in the case of Speex I see it as more advantagous to allow
the user more control when encoding.  If frames are always split up,
this limits the users options for both stream copy *and* for encoding to
 just 1 frame per packet.

If you're dead-set against this idea, then I will finish the parser that
splits packets in order to continue with my other Speex patches, but I
don't like how limiting it would be for the user.

-Justin