[FFmpeg-devel] [PATCH 5/9] sbc: implement SBC encoder (low-complexity subband codec)

Wed Feb 28 01:56:13 EET 2018

On Sat, Feb 24, 2018 at 09:31:30PM +0000, Rostislav Pehlivanov wrote:
> On 24 February 2018 at 12:01, Aurelien Jacobs <aurel at gnuage.org> wrote:
> 
> > On Thu, Feb 22, 2018 at 06:18:48PM +0000, Rostislav Pehlivanov wrote:
> > > On 21 February 2018 at 22:37, Aurelien Jacobs <aurel at gnuage.org> wrote:
> > >
> > > > This was originally based on libsbc, and was fully integrated into
> > ffmpeg.
> > > > ---
> > > >  doc/general.texi         |   2 +-
> > > >  libavcodec/Makefile      |   1 +
> > > >  libavcodec/allcodecs.c   |   1 +
> > > >  libavcodec/sbcdsp.c      | 382 ++++++++++++++++++++++++++++++
> > > > +++++++++++++
> > > >  libavcodec/sbcdsp.h      |  83 ++++++++++
> > > >  libavcodec/sbcdsp_data.c | 329 +++++++++++++++++++++++++++++++++++++
> > > >  libavcodec/sbcdsp_data.h |  55 +++++++
> > > >  libavcodec/sbcenc.c      | 411 ++++++++++++++++++++++++++++++
> > > > +++++++++++++++++
> > > >  8 files changed, 1263 insertions(+), 1 deletion(-)
> > > >  create mode 100644 libavcodec/sbcdsp.c
> > > >  create mode 100644 libavcodec/sbcdsp.h
> > > >  create mode 100644 libavcodec/sbcdsp_data.c
> > > >  create mode 100644 libavcodec/sbcdsp_data.h
> > > >  create mode 100644 libavcodec/sbcenc.c
> > > >
> > > > +
> > > > +#define OFFSET(x) offsetof(SBCEncContext, x)
> > > > +#define AE AV_OPT_FLAG_AUDIO_PARAM | AV_OPT_FLAG_ENCODING_PARAM
> > > > +static const AVOption options[] = {
> > > > +    { "joint_stereo", "use joint stereo",
> > > > +      OFFSET(joint_stereo), AV_OPT_TYPE_BOOL, { .i64 =  0 }, 0,   1,
> > AE },
> > > >
> > > +    { "dual_channel", "use dual channel",
> > > > +      OFFSET(dual_channel), AV_OPT_TYPE_BOOL, { .i64 =  0 }, 0,   1,
> > AE },
> > > >
> > >
> > > Erm those 2 things should be decided by the encoder, not by exposing them
> > > to the user. The encoder should decide which mode has lower distortion
> > for
> > > a given signal.
> >
> > See bellow.
> >
> > > > +    { "subbands",     "number of subbands (4 or 8)",
> > > > +      OFFSET(subbands),     AV_OPT_TYPE_INT,  { .i64 =  8 }, 4,   8,
> > AE },
> > > >
> > >
> > > The encoder doesn't check if the value isn't 4 or 8 so 5, 6 and 7 are all
> > > accepted. Similar issue to the previous option too.
> >
> > OK, fixed.
> >
> > > > +    { "bitpool",      "bitpool value",
> > > > +      OFFSET(bitpool),      AV_OPT_TYPE_INT,  { .i64 = 32 }, 0, 255,
> > AE },
> > > >
> > >
> > > This should be controlled by the bitrate setting. Either have a function
> > to
> > > translate bitrate to bitpool value or a table which approximately maps
> > > bitrate values supplied to bitpools. You could expose it directly as well
> > > as mapping it to a bitrate value by using the global_quality setting so
> > it
> > > shouldn't be a custom encoder option.
> >
> > Indeed, this is better this way, thanks for the suggestion.
> >
> > > > +    { "blocks",       "number of blocks (4, 8, 12 or 16)",
> > > > +      OFFSET(blocks),       AV_OPT_TYPE_INT,  { .i64 = 16 }, 4,  16,
> > AE },
> > > > +    { "snr",          "use SNR mode (instead of loudness)",
> > > > +      OFFSET(allocation),   AV_OPT_TYPE_BOOL, { .i64 =  0 }, 0,   1,
> > AE },
> > > >
> > >
> > > SNR mode too needs to be decided by the encoder rather than exposing it
> > as
> > > a setting.
> >
> > See bellow.
> >
> > > > +    { "msbc",         "use mSBC mode (wideband speech mono SBC)",
> > > >
> > >
> > > Add a profile fallback setting for this as well, like in aac where
> > -aac_ltp
> > > turns LTP mode on and -profile:a aac_ltp does the same.
> >
> > Not sure of the benefits of having 2 redundant way to do this, but OK.
> >
> > > You don't have to make the encoder decide which stereo coupling mode or
> > > snr/loudness setting to use, you can implement that with a later patch.
> > > I think you should remove the "blocks" and "subbands" settings as well
> > and
> > > instead replace those with a single "latency" setting like the native
> > Opus
> > > encoder in milliseconds which would adjust both of them on init to set
> > the
> > > frame size. This would also allow the encoder to change them. Again, you
> > > don't have to do this now, you can send a patch which adds a "latency"
> > > option later.
> >
> > I can see the value in what you propose, and I think that indeed, it
> > would be great for the encoder to choose by itself the most appropriate
> > parameters by default.
> > But on the other hand, we are talking about a codec targetting some
> > hardware decoders (blutetooth speakers, headsets...), and all those
> > parameters have to be negotiated between the encoder and the hardware
> > decoder. So I think it is mandatory to keep all those parameters
> > accessible to be able to setup the encoder according to the target
> > hardware capabilities.
> >
> >
> Hardware isn't as limited as you might think it is, and there's also no
> negotiation happening between encoders and decoders IRL (except for modern
> streaming protocols which suck). While its true that we've had some issues
> with hardware decoders not supporting rarely used features we wait for
> users to report them and don't assume that all hardware violates the specs.

It is not really about violating the specs, but more about very cheap or
old hardware that may only support some limited combination of settings.
And the bluetooth stack that will negotiate those settings also
according to the actual RF link quality.

> And looking at the code, only 2 settings strike out as being able to cause
> issues: the frame size, controlled by subbands and blocks and the msbc
> mode.

I don't think msbc is allowed for bluetooth, but indeed, I think
subbands and blocks are the one that may be troublesome, along with
the bitpool.

> There's no way in hell that dual stereo and joint coding are
> unsupported (except for mono streams obviously but we can detect that)

Agreed.

> like I said, you can add a setting to control the frame size later on, and
> the SNR setting is far too trivial to not be implemented.
> Hence I think that only the msbc setting and the latency need to be exposed.
> If someone reports that a device doesn't support e.g. non-SNR mode we'll
> add an override. We can easily add options, but removing them requires a 2
> year deprecation period. In all cases, the encoder should decide the
> parameters unless told otherwise, rather than having the user do so.

OK. It is indeed easier to add options later on when the need arise,
rather than removing them (even though I can't imagine why we would
remove them).
So I've updated the patch with only a msbc and a delay option and
I've added some sane parameters decisions, mainly based on the following
study :
https://pdfs.semanticscholar.org/1f19/561d03bc88b67728375566c95bbf77e730d5.pdf
This seems to give pretty good results.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-sbc-implement-SBC-encoder-low-complexity-subband-cod.patch
Type: text/x-diff
Size: 55137 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20180228/981f67b4/attachment.patch>