[FFmpeg-devel] [PATCH] Set correct frame_size for Speex decoding

Mon Aug 17 23:48:11 CEST 2009

On Sun, Aug 16, 2009 at 10:01:53PM -0400, Justin Ruggles wrote:
> Michael Niedermayer wrote:
> 
> > On Sun, Aug 16, 2009 at 12:26:32PM -0400, Justin Ruggles wrote:
> >> Michael Niedermayer wrote:
> >>
> >>> On Sun, Aug 16, 2009 at 05:07:18PM +0200, Michael Niedermayer wrote:
> >>>> On Sat, Aug 15, 2009 at 07:55:46PM -0400, Justin Ruggles wrote:
> >>>>> Michael Niedermayer wrote:
> >>> [...]
> >>>>>>>> what exactly is the argument you have that speex should not be handled like
> >>>>>>>> every other codec?!
> >>>>>>>> split it in a parser, the muxer muxes ONLY a single speex packet per
> >>>>>>>> container packet. Any extension from that is low priority and "patch welcome"
> >>>>>>>> IMHO ...
> >>>>>>> The downside for Speex is the container overhead since individual frames
> >>>>>>> are very small.
> >>>>>> this is true for many (most?) speech codecs
> >>>>>>
> >>>>>> also if we write our own speex encoder, it will only return one frame at a
> >>>>>> time.
> >>>>> Why would it have to?  
> >>>> because the API requires it, both encoders and decoders have to implement the
> >>>> API, a video encoder also cannot return 5 frames in one packet.
> >>>> APIs can be changed but you arent arguing for a API change you argue for
> >>>> ignoring the API and just doing something else.
> >>>>
> >>>>
> >>>>> If the user sets frame_size before initialization
> >>>>> to a number of samples that is a multiple of a single frame, it could
> >>>>> return multiple speex frames at once, properly joined together and
> >>>>> padded at the end.  With libspeex this is very easy to do because the
> >>>>> functionality is built into the API.
> >>>>>
> >>>>> I understand the desire to keep what are called frames as separate
> >>>>> entities, but in the case of Speex I see it as more advantagous to allow
> >>>>> the user more control when encoding.  If frames are always split up,
> >>>>> this limits the users options for both stream copy *and* for encoding to
> >>>>>  just 1 frame per packet.
> >>>>>
> >>>>> If you're dead-set against this idea, then I will finish the parser that
> >>>>> splits packets in order to continue with my other Speex patches, but I
> >>>>> don't like how limiting it would be for the user.
> >>>> i am againt speex handling things different than other speech codecs
> >>>> based on arguments that apply to other speech codecs as well.
> >>>> Also iam against passing data between muxer and codec layers in a way
> >>>> that violates the API.
> >>> ffmpeg seperates muxer and codec layers, writing a demuxer & decoder
> >>> that depend on things beyond the API (frames per frame) is going to
> >>> break things. We had similar great code (passing structs in AVPacket.data
> >>> because it was convenient) that also didnt turn out to be that great
> >>> and required a complete rewrite ...
> >>>
> >>>
> >>> ive alraedy said nut doesnt allow multiple frames per packet, but its
> >>> not just nut, avi as well doesnt allow multiple frames per packet
> >>> for VBR and either way avi needs to have its headers set up properly,
> >>> not with fake frame size and such and flv as we alaredy know has a 
> >>> issue with >8 frames per frame. All that is just what we know of will
> >>> break if you implement your hack, what else will break is something we
> >>> only would learn after some time.
> >>>
> >>> IMHO, demuxer->parser->splited frames [unless it is not possible to split]
> >>> if a muxer can store multiple frames it can merge several depending on its
> >>> abilities and user provided parameters, that merging could also be done
> >>> as a bitstream filter.
> >>> But just skiping the spliting and merging and hoping that every container
> >>> would allow anyting that any other container allowed is just not going to
> >>> work. And even more so as we already know of many combinations that would
> >>> noz work
> >> I do understand your point.  There is a subtle difference with speex
> >> though.  The process of merging of frames into groups of frames is
> >> something that is specified by the codec itself, not the container.  To
> >> the container, it would be as transparent as for an audio codec that
> >> allows different numbers of samples/duration for a frame.  Nut would
> >> support it just fine, as it does with FLAC with different numbers of
> >> samples per frame.  As for FLV, it would be the same as if it doesn't
> >> allow over a certain number of samples per frame before getting choppy
> >> and/or crashing.
> > 
> > hmmm, could you quote the part of the speex spec that supports your view?
> > and provide a link to that spec. (iam asking for a quote instead of just
> > url cuz of the ML archive ...)
> 
> unfortunately there is not a speex specification per-se, but a speex
> manual that gives some hints and guidance.  the libspeex code is
> definitive reference.
> 
> http://speex.org/docs/manual/speex-manual/node7.html
> 
> "Sometimes it is desirable to pack more than one frame per packet (or
> other basic unit of storage). The proper way to do it is to call
> speex_encode $ N$  times before writing the stream with speex_bits_write.
> 
> In cases where the number of frames is not determined by an out-of-band
> mechanism, it is possible to include a terminator code. That terminator
> consists of the code 15 (decimal) encoded with 5 bits, as shown in Table
> 4. Note that as of version 1.0.2, calling speex_bits_write automatically
> inserts the terminator so as to fill the last byte. This doesn't
> involves any overhead and makes sure Speex can always detect when there
> is no more frame in a packet."
> 
> The end of the packet is able detected because it looks like the start
> of another frame, but the mode id is the terminator code.  This is
> needed because a full valid speex frame can be only 5 bits.
> 
> What happens when libspeex writes frames is this:
> If all the combined frames end on a byte boundary, no terminator code is
> used because the packet data will end at the end of a frame.  If the
> combined frames do not end on a byte boundary, a 0-bit followed by
> enough 1's to pad to the next byte are appended.  The result is that if
> there are 5 or more bits, the terminator code will be detected.

iam not truely happy about all that but i agree that its overall alot
more convenient and simpler to keep these bit packed pieces of bull shit
together but in that case it has to be considered that this is a whole
frame and all variables have to be set accordingly.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090817/a3921e18/attachment.pgp>