[Ffmpeg-devel] ac3enc.c modifications

Sun May 15 05:55:25 CEST 2005

Simone Karin Lehmann wrote:
> Ah, the scaling thingy again...
> 
> It's not the mdct itself that is incorrect, it's the kind of scaling that 
> is done in there. There's a right bitshift in the pre-rotation code, but no 
> left bitshift in the post-rotation.
> 
> Just take a look at the file mdct.c. This file implements just the same mdct 
> function, but without any bitshifting.
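The argument here rests on linearity: a one-bit right shift in the pre-rotation that is never undone in the post-rotation halves every output value. The toy illustration below is a sketch, not the ac3enc.c code. `butterfly` and `scaled_transform` are hypothetical stand-ins, and a 2-point butterfly replaces the real FFT.

```c
#include <stdint.h>

/* Hypothetical stand-in for the FFT core: a 2-point butterfly.
   It is linear, so any scale applied to its inputs carries straight
   through to its outputs. */
static void butterfly(int32_t *a, int32_t *b)
{
    int32_t sum = *a + *b;
    int32_t diff = *a - *b;
    *a = sum;
    *b = diff;
}

/* "Pre-rotation" scales the input down by pre_shift bits (headroom
   against overflow), the core runs, then "post-rotation" scales back
   up by post_shift bits. When pre_shift != post_shift the output is
   off by a factor of 2^(pre_shift - post_shift).
   Inputs are assumed non-negative: >> on negative values is
   implementation-defined in C. */
static void scaled_transform(const int16_t in[2], int32_t out[2],
                             int pre_shift, int post_shift)
{
    int32_t a = (int32_t)in[0] >> pre_shift;
    int32_t b = (int32_t)in[1] >> pre_shift;
    butterfly(&a, &b);
    /* multiply rather than << so negative outputs are well-defined */
    out[0] = a * (1 << post_shift);
    out[1] = b * (1 << post_shift);
}
```

With {1000, 200} as input, shifts (1, 1) reproduce the unscaled result {1200, 800}, while (1, 0) gives {600, 400}: exactly half, about 6 dB quieter. In real fixed-point code the balanced version also drops the low bit of odd inputs, which is the price paid for the headroom.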

The only part of this that doesn't add up to me is that when the formula
from the spec is plugged into ffmpeg as-is, with no pre-rotation, FFT,
or post-rotation, the audio still comes out too quiet when played with
liba52.  That is what makes me think that the problem has nothing to do
with scaling within the mdct, but rather with the type of input that is
used versus the range that ac3 uses internally.  Also, I suspect that
the right shift has something to do with the 2/N scaling in the
spec, which, from what I can tell, is there because a window function is
applied.  In mdct.c, no window function is used.

> And another point is, as the specs mention, that in the decoder described in 
> the specs, after the imdct is done, the pcm has to be multiplied by 2, which 
> "undoes headroom scaling performed in the encoder" (page 102 of the 
> ATSC/A52a spec). Maybe this is how the A52 standard tries to prevent 
> clipping. As I've found out, ac3 tends to reproduce signals at a slightly 
> higher level than the original. Depending on the source and the encoded 
> bitrate, this is in most cases somewhere between 0.25 dB and 0.5 dB on peak 
> levels (still enough to cause audible clipping if the original has 0 dBFS 
> peaks). Dialog normalisation will solve this problem, but that's another 
> endless story...

So does liba52 do the factor of 2 "headroom unscaling" as shown in the
spec?  If it does, then shouldn't the output level be closer to the
original than a whole factor of 2?  Also, the spec continues to say on
page 102 that "it is possible for the output signal to exceed 100
percent level even though the original input signal was less than or
equal to 100 percent level".  So maybe the possible clipping is just
inherent to the format, and like you said, is taken care of by dialog
normalization.

> Back again to the scaling... I still think that just dropping the right 
> bitshift is the best way to solve this problem.

After looking at it again, I still feel that the mdct should be left
alone and that the issue lies in the bit-depth conversion.  But I'm not
trying to be argumentative. :)
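A bit-depth mismatch alone can produce exactly this symptom. If the code that converts the 16-bit input and the encoder disagree by one bit about where full scale sits, every sample, and therefore the decoded output, is off by a factor of 2 (about 6 dB) no matter how the mdct is scaled. The sketch below assumes a generic fixed-point format with `frac_bits` fractional bits; `pcm16_to_fixed`, `fixed_to_unit` and that convention are hypothetical and do not describe ac3enc.c's actual internal format.

```c
#include <stdint.h>

/* Hypothetical internal format: fixed point with frac_bits fractional
   bits, so full scale (1.0) is 1 << frac_bits.
   Valid for 0 <= frac_bits <= 30. */
static int32_t pcm16_to_fixed(int16_t s, int frac_bits)
{
    /* Full scale for 16-bit PCM is 32768 = 1 << 15. */
    int shift = frac_bits - 15;
    int32_t v = s;
    if (shift >= 0)
        return v * (1 << shift);   /* multiply: avoids left-shifting negatives */
    return v / (1 << -shift);      /* divide: truncates toward zero */
}

/* Interpret an internal value as a fraction of full scale. */
static double fixed_to_unit(int32_t v, int frac_bits)
{
    return (double)v / (double)(1 << frac_bits);
}
```

Converting a half-scale sample (16384) with frac_bits = 16 and reading it back with frac_bits = 16 gives 0.5; converting with frac_bits = 15 but reading it back with frac_bits = 16 gives 0.25, the same factor-of-2 loss.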

-Justin