[FFmpeg-trac] #8066(avcodec:open): Bad quality encoding of high compressed audio by AAC encoder
FFmpeg
trac at avcodec.org
Tue May 5 18:13:59 EEST 2020
#8066: Bad quality encoding of high compressed audio by AAC encoder
------------------------------------+-----------------------------------
Reporter: Lirk | Owner: Lynne
Type: defect | Status: open
Priority: normal | Component: avcodec
Version: git-master | Resolution:
Keywords: aac | Blocked By:
Blocking: | Reproduced by developer: 1
Analyzed by developer: 1 |
------------------------------------+-----------------------------------
Comment (by Lynne):
I didn't say a cascaded encode wasn't useful or relevant to any scenario,
I pointed out it stresses a single component of the encoder, and that
specific part hasn't been touched since at least 1996 (it's a line-by-line
copy of the 3gpp example encoder aac spec). As for why it got so far: no
one really noticed until the encoder started getting used by OBS, which
took 3-4 years after most of the work on the encoder was done.
I'm working on a replacement psychoacoustic system for both the Opus
encoder and the AAC encoder, but I need raw samples at both 44.1 and
48 kHz that display artifacts. Obviously, the higher the bitrate and the
lower the amount of cascading needed, the better. I'd prefer 48 kHz.
(Could someone tell the OBS people to make that the default? Most streams
use 44.1 kHz which, aside from the obvious reasons, doesn't produce nice
Bark bands and makes writing a psychoacoustic model less general and more
samplerate-specific.)
Having said that, I would like to point out that, in the real world, where
mixing and sub-frame-sample-offsets happen, sterile cascading tests could
potentially give highly misleading results, especially with good encoders
like libopus.
The performance of cascading encoders depends highly on whether each
decoded frame is given sample-aligned to the encoder. Even a small
alignment difference for each successive encode can ruin the result. For
example, if a transient AAC frame (1024 samples, split into 8 smaller
transforms) is given to an encoder with a 64-sample offset, the block
boundary of each smaller decoded transform, where most MDCT codec
artifacts happen, will be in the middle of the encoder's frame. If the
encoder then decides to encode that frame as a transient (very possible,
given that the artifacts increase the energy), the result will be very
annoying after no more than 5-6 encodes, regardless of the bitrate.
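
To make the alignment example above concrete, here is a small sketch of
where the decoded short-block boundaries land inside the re-encoder's
frame. The function names and the simple "decoded audio arrives N samples
late" model are assumptions made for illustration; this is not code from
the FFmpeg encoder.

```python
# Illustrative only: the names and the delay model are assumptions.
FRAME_SIZE = 1024      # AAC frame length in samples (figure from the comment)
SHORT_BLOCKS = 8       # a transient frame is split into 8 smaller transforms
SHORT_SIZE = FRAME_SIZE // SHORT_BLOCKS  # 128 samples per short transform


def boundaries_in_encoder_frame(offset):
    """Positions, inside the re-encoder's frame, of the decoded short-block
    boundaries when the decoded audio arrives `offset` samples late."""
    return sorted((k * SHORT_SIZE + offset) % FRAME_SIZE
                  for k in range(SHORT_BLOCKS))


def distance_from_encoder_grid(offset):
    """How far each decoded boundary sits past the start of the encoder's
    own short block (0 means the two grids coincide)."""
    return offset % SHORT_SIZE


if __name__ == "__main__":
    for offset in (0, 64):
        print(offset,
              boundaries_in_encoder_frame(offset),
              distance_from_encoder_grid(offset))
```

With a 64-sample offset every decoded boundary sits 64 samples into a
128-sample short block, i.e. exactly in the middle of each of the encoder's
smaller transforms rather than on their edges.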
Coincidences like that happen and are somewhat out of your control, unless
you like to inject discontinuities and latency into your stream and assume
frame sizes.
As for Opus, it uses 120-sample overlaps on 960-sample frames, rather than
AAC's 512-sample overlaps on 1024-sample frames. With such a low amount of
overlap (roughly a quarter of AAC's), there are even higher artifacts at
the frame boundary, even with non-transient frames (Opus too splits
transient frames into 8 smaller transforms), and even worse, it does TF
switching (recombines/uncombines smaller transforms) which is highly
sensitive to the signal (and artifacts), so Opus really benefits from
"cheating". Thankfully, some of this can be kept under control due to its
lossless signalling of band energy levels (so low-frequency artifacts can
overwhelm the signal acceptably).
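
The overlap comparison above is plain arithmetic and can be checked with a
short sketch. The overlap and frame sizes are the figures quoted in this
comment, not values read from either specification, and the helper names
are hypothetical.

```python
from fractions import Fraction

# (overlap, frame size) in samples, as quoted in the comment above.
# These are assumptions carried over from the text, not spec values.
CODECS = {"opus": (120, 960), "aac": (512, 1024)}


def overlap_fraction(name):
    """Overlap expressed as a fraction of the codec's frame size."""
    overlap, frame = CODECS[name]
    return Fraction(overlap, frame)


def overlap_ratio(a, b):
    """Overlap length of codec `a` relative to the overlap length of `b`."""
    return Fraction(CODECS[a][0], CODECS[b][0])


if __name__ == "__main__":
    print("opus overlap / frame:", float(overlap_fraction("opus")))
    print("aac overlap / frame: ", float(overlap_fraction("aac")))
    print("opus overlap / aac overlap:", float(overlap_ratio("opus", "aac")))
```

On these figures the Opus overlap is 1/8 of its frame and a little under a
quarter of the AAC overlap length.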
In conclusion, while cascading does give you a good idea of how the
encoder deals with codec artifacts, don't assume it won't spazz out on you
in the field. Not saying it isn't useful, just not very useful for that
exact case where your frames aren't aligned.
Meta:
ffmpeg is hardly an organized, focused, single-entity project where each
contributor forms a part of a swarm mind and can work and judge anything,
or is responsible for the actions of another. You shouldn't take some
random contributor's word for much, let alone a bug tracker janitor. I
don't even read the bug tracker unless something in the title strikes me
as relevant to my field and I by chance read it.
Certainly, while people who do research on encoders for fun (and for free)
are few and far between, everyone knows everyone in this field, so you
only needed to ask literally anywhere (especially on freenode) to find the
most appropriate person who would pay attention to it, rather than do an
angry broadcast and hope someone listens.
As it turns out, some motivated people did exactly that, and somehow I got
very unwelcome and demotivating private messages. While perhaps such
things are not entirely responsible for ffmpeg developers' overall
reputation of being unwelcoming, they '''really''' don't help.
Especially when it becomes a personal attack like now, since there's
really only a single person who would do this work. Shaming companies
where people share duties and responsibilities (or even lack such, since
they're paid) is one thing, but when it comes to open source, there's
usually just a single person behind a given feature.
I spoke with 2 "leaders" behind 2 other large projects and was told to
just ignore such messages, as every project this size gets random "you
suck" complaints on a daily basis (seriously).
Regarding the "best h264 encoder and worst aac encoder" comment, x264
literally got millions from huge companies, still does to this day, and
had several full time developers. Can't say the latter got anything.
Regarding the "developers must be held responsible for not maintaining
such a widely used project" comment: I can name addresses and address
names of who to send glitter bombs to. Unfortunately, one of those
resolves as 'NULL', at '0xffffffffffffffff', which is dependent on
~~undefined behavior~~, metaphysical interpretation.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/8066#comment:14>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker