[FFmpeg-devel] [PATCH 3/4] oggdec: add support for Opus codec.

Gregory Maxwell gmaxwell at gmail.com
Wed Jul 11 06:28:48 CEST 2012


On Tue, Jul 10, 2012 at 5:11 PM, Nicolas George
<nicolas.george at normalesup.org> wrote:
> My intention was to ignore the version field until a version 2 is defined,
> but I can error out on version mismatch too.

There are versions zero and one defined, though the change between them was
not of great consequence: it's what defined the interpretation of the
version field.

The specification splits the version into major and minor fields.
Implementations are supposed to accept everything with a major of 0. Other
fields may be added in later minor versions, but they're guaranteed to be
compatible, though the header obviously may not have the same size.

If the major version is different, all bets are off and you might just be
producing gibberish output. The spec recommends you reject files with a
major != 0.

> Concerning timestamps, I am completely unsure what to make of them. If the
> packets decode to 960 samples, and the first 350 samples must be discarded,
> what timestamp should have the second packet? 960 or 960-350=610? And the
> first? 0 or -350?

Yes, the first samples out of the decoder will have negative time.

Quoting from the specification:

   The PCM sample position is determined from the granule position using
   the formula

         'PCM sample position' = 'granule position' - 'pre-skip' .

   For example, if the granule position of the first audio data page is
   59,971, and the pre-skip is 11,971, then the PCM sample position of
   the last decoded sample from that page is 48,000.  This can be
   converted into a playback time using the formula

                                   'PCM sample position'
                 'playback time' = --------------------- .
                                          48000.0

> I am also a bit uncertain about that requirement:
> "When seeking [...] the decoder should start decoding (and discarding the
> output) at least 3840 samples (80 ms) prior to the seek point"

The seeking offset doesn't impact the timestamping, it's just a requirement
that the decoder start ahead in order to get correct output after a seek.

Opus is asymptotically convergent, like rolling intra-refresh in video, so
there are no keyframes. You start decoding early and eventually get output
that isn't corrupted. Often this happens quite fast; sometimes it takes
longer. If you're _really_ unlucky you can get a loud squeal in the first
samples out after you seek.

> I am not sure lavf is equipped for that. Michael's patches for padding
> should be able to handle the 3840 samples, but it is likely to cause A-V
> desync with the current code.

If it causes AV desync then it sounds like the timestamps aren't correct.

> I must say I am quite baffled by that gain field, it looks like exposing
> part of the codec's internal workings, and I see no good reason for it. I
> have half a mind of purposefully ignoring it until either all encoders
> decide not to use it or until the API grows some kind of
> OPUS_SET_DECODE_GAIN or something.

The header is supported in all existing decoders that I'm aware of
(Opusdec, gstreamer, Firefox).

The support requires on the order of two lines of code in typical software.

FFmpeg would be the odd man out in not supporting it, and you'd hurt
users: they wouldn't be able to safely do recodeless gain tweaks (like
they can with mp3gain and aacgain) because they couldn't trust that the
modification would be applied in all the popular software.

Please don't do this.

I'll be happy to make whatever concession I can to avoid the
gratuitous incompatibility.

I can't adequately express how disappointed I am to see this kind of
response. We've had years of open development of Opus with basically
no feedback from this community, though we extended invitations a few
times— although I do understand everyone is busy with their own
projects. This header was initially specified over a year ago, and
there have been pre-release tools implementing it for just as long.
We've spent the last five years writing tens of thousands of lines of
code, spending countless hours in negotiation with traditionally less
open organizations, performing seemingly endless listening tests, and
authoring a whole codec design and specification that we've shared with
you freely. And at the same time I'm gladly offering to provide review
and QA for your implementation and I took the time to seek you out in
order to do so… and you'd break compatibility just to make a point
over a couple lines of caller implementation complexity?

If you would like a decoder CTL you're welcome to one— and you don't
have to threaten to spread incompatible software, you could have just
asked.

As you requested, I have added OPUS_SET_GAIN CTL to libopus:

https://git.xiph.org/?p=opus.git;a=commit;h=28b41ae5ae1acf6eb222858567952564838d48f5

> But I can listen to reason if someone explains why it should be the client
> application's task to handle the gain.

One motivation is that in integrated applications which have a volume
control, it makes sense to combine the gain with that volume control
instead of always doing two layers of scaling. That's not a reason to not
also have a CTL for it, however.

I suspected it might be easier to add a CTL for it and discussed doing so
in the past, as it could also have some minor precision benefits for the
fixed-point decoder and the integer interface, but it missed the last code
freeze for the standardization process (there was a span of time where we
couldn't change the Opus implementation in any way, in order to avoid "but
it's not done yet!" FUD from people trying to disrupt the standards
process). But ultimately the real answer is simply that no one cared to
ask for anything else: since no one implementing Opus support had asked,
it hadn't come up again.

If you need anything else, please don't hesitate to ask.
