[FFmpeg-devel] [PATCH] speex in ogg muxer

David Conrad lessen42
Thu Oct 1 08:14:02 CEST 2009


On Sep 5, 2009, at 8:20 PM, Justin Ruggles wrote:

> Justin Ruggles wrote:
>
>> Justin Ruggles wrote:
>>
>>> Justin Ruggles wrote:
>>>
>>>> Justin Ruggles wrote:
>>>>
>>>>> Justin Ruggles wrote:
>>>>>
>>>>> Now I think I know what is going wrong, and there is nothing we  
>>>>> can do
>>>>> about it I think.  speexenc does some weird things with granule
>>>>> positions.  It starts out for a long time with granulepos=0 even  
>>>>> though
>>>>> it is encoding audio, then when it starts writing granule  
>>>>> positions it
>>>>> is not always in sync with the start of the stream.  Below is a  
>>>>> little
>>>>> snippet from a comparison of an original spx file to a copied  
>>>>> spx file.
>>>>> Each packet should be 320 samples.
>>>>>
>>>>> [...]
>>>> So... I figured it out, but you may not want to know the answer. ;)
>>>>
>>>> The granulepos of the first packet is supposed to be interpreted as
>>>> smaller than the full frame size by calculating what the  
>>>> granulepos of
>>>> the first page would normally be, then subtracting it from what it
>>>> really is to get the delay.
>>>>
>>>>>> From above, this is the last packet in the first page. There  
>>>>>> are 59
>>>> packets per page in this stream (the first 2 packets are headers,  
>>>> hence
>>>> the packetno of 60).
>>>>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>>>>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>>>> speexdec interprets the first packet as having a delay of
>>>> 18880-18737=143 samples.  So the first packet should be 320-143=177
>>>> samples long, and the decoder discards the first 143 samples of the
>>>> first frame.
>>>>
>>>> None of this is documented except for in the speexenc and speexdec
>>>> source code.  From analyzing a Speex-in-FLV sample, it appears  
>>>> that the
>>>> way Adobe handles this in Flash Media Server is to do like our ogg
>>>> demuxer does and interpret the first page as if each frame is 320
>>>> samples, then resync timestamps with the source after the first  
>>>> page,
>>>> causing a skip in timestamps after the first page instead of at the
>>>> beginning of the stream.
>>>>
>>>> I'm still not sure what to do about this though...
>>> This patch makes it so that all the pts and durations are correct  
>>> for
>>> Ogg/Speex.  It basically just changes the durations of the first and
>>> last packets.
>>
>> nevermind. this doesn't quite work. i'm still working on it. damn ogg
>> and its craziness!
>
> Ok, now this patch should work correctly.

After some discussion with xiph people, apparently vorbis does this  
exact same thing. The reasoning behind it is that libvorbis/libspeex  
generate additional samples to prime the lapped transform. There is  
apparently nothing in the vorbis/speex bitstream to indicate how many  
samples this is, so instead ogg granulepos is used to figure out how  
many samples to skip at the beginning.

This is probably why there's such a high PSNR difference between our  
decoder and libvorbis despite the output sounding fine. However, I'm  
not sure how to fix this: since vorbis has no fixed frame_size it  
requires decoding every packet in the first ogg page, then subtracting  
the page's granulepos from how many samples were decoded. Which would  
break other containers.


> diff --git a/libavformat/oggparsespeex.c b/libavformat/oggparsespeex.c
> index cc00dd2..c295970 100644
> --- a/libavformat/oggparsespeex.c
> +++ b/libavformat/oggparsespeex.c
> @@ -53,6 +64,7 @@ static int speex_header(AVFormatContext *s, int  
> idx) {
>            byte-aligned. */
>         st->codec->frame_size = AV_RL32(p + 56);
>         frames_per_packet     = AV_RL32(p + 64);
> +        spxp->frame_size      = st->codec->frame_size;
>         if (frames_per_packet)
>             st->codec->frame_size *= frames_per_packet;

[...]

> +static int speex_packet(AVFormatContext *s, int idx)
> +{
> +    struct ogg *ogg = s->priv_data;
> +    struct ogg_stream *os = ogg->streams + idx;
> +    struct speex_params *spxp = os->private;
> +    int frames_per_packet = s->streams[idx]->codec->frame_size /  
> spxp->frame_size;

This frame_size / frame_size is confusing; one should be renamed. spxp- 
 >single_frame_size maybe with a comment reminding that the codec- 
 >frame_size is per packet with multiple frames?

> +
> +    if (os->flags & OGG_FLAG_EOS && os->lastgp != -1 && os->granule  
> > 0) {
> +        /* first packet of last page. we have to calculate the last  
> packet
> +           duration here because it is the only place we know the  
> next-to-last
> +           granule position. */
> +        spxp->last_packet_duration = os->granule - os->lastgp +
> +                                     spxp->frame_size *  
> frames_per_packet *
> +                                     (1 - ogg_page_packets(os));
> +    }
> +
> +    if (!os->lastgp && os->granule > 0)

This will set this duration for all speex packets in the first page;  
shouldn't it only be the first (os->seq == 2)?

> +        /* first packet */
> +        os->pduration = os->granule + spxp->frame_size *  
> frames_per_packet *
> +                        (1 - ogg_page_packets(os));

granule - frame_size * frames_per_packet * (ogg_page_packets(os) - 1)  
and likewise for last_packet_duration is more clear IMO, since it's  
equivalent to (stream duration including packet) - (stream duration  
excluding packet).



More information about the ffmpeg-devel mailing list