[Ffmpeg-devel] Google Summer of Code 2007

Derk-Jan Hartman
Sun Mar 4 00:01:19 CET 2007


On 3-Mar-2007, at 21:33, Mike Melanson wrote:
> Derk-Jan Hartman wrote:
>> On 3-Mar-2007, at 2:25, Justin Ruggles wrote:
>>> Harikrishnan Varma wrote:
>>>> Audio:
>>>> * MPEG-1 Layer 3 and MPEG-2 Layer 3. Both CBR & ABR are supported.
>>>> * AC-3
>>>> * MP3 Surround (In drffmpeg we use a dll supplied by Thomson)
>>>
>>> Now that the AC-3 decoder is hopefully close to being merged (waiting
>>> patiently), I think E-AC-3 would be a great SoC project. There are
>>> clear decoding specs, and this year there are samples to work with.
>>> It's also something that could easily be split into incremental steps.
>>>
>>> If nobody does it through SoC, I'll probably try to implement it
>>> myself. Personally, I really want a good E-AC-3 decoder so I can work
>>> on an encoder without having to install Windows in order to test it. :)
>>
>> I have another idea for ffmpeg's GSoC: improving the multichannel
>> (analog) audio framework. As I understand it, multichannel audio is
>> really weak in ffmpeg right now, and the concept of audio channel
>> order in particular is completely lost. I think this is one of the
>> last big missing pieces in avcodec, and that it is really important
>> for future versions of ffmpeg. It might also be something that Google
>> would see as an important element for such a project. A possible
>> additional element could be channel reordering.
>>
>> I have worked on this for VLC's OS X audio module and have realized
>> the importance of proper channel designations in applications that
>> handle multichannel audio.
>>
>> Of course it would require heavy mentoring, because it would touch a
>> lot of API.
>
> Can you break this into a number of quantifiable goals and possibly
> suggest a mentor?

Goals:
1: Each channel within a sample should be identifiable as a certain
speaker location (see the sketch after this list).
2: Each stream/track should identify the order of the channels (goal 1)
within its samples.
3: ffmpeg needs to report back the exact format it will use, and give
the application the possibility to fall back to stereo, accept the
reported format, or install a "channel-reordering" filter.
4: The client application needs to be able to inform ffmpeg of its
speaker layout, in case channel reordering is one of the goals.
5: Channel-reordering filters, with at least a 5.1 -> stereo fallback
option.
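
To make goals 1 and 2 a bit more concrete, here is a rough sketch of
what the public types could look like. Every name here is made up for
illustration; nothing like this exists in avcodec today:

/* Goal 1: every channel is tagged with a speaker location.
 * Hypothetical names, for illustration only. */
typedef enum AVChannelTag {
    AV_CHAN_UNKNOWN = 0,
    AV_CHAN_FRONT_LEFT,
    AV_CHAN_FRONT_RIGHT,
    AV_CHAN_FRONT_CENTER,
    AV_CHAN_LFE,
    AV_CHAN_BACK_LEFT,
    AV_CHAN_BACK_RIGHT,
    /* ... a real list needs many more positions ... */
} AVChannelTag;

/* Goal 2: a stream/track declares the order of its channels.
 * A fixed array keeps the sketch short; a real API would allocate. */
typedef struct AVChannelOrder {
    int          nb_channels;
    AVChannelTag tags[8];    /* tags[i] describes interleaved channel i */
} AVChannelOrder;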


At first sight, audio channel ordering does not look like a big
problem; however, working with CoreAudio I have realized how many
different systems are in use, from authoring to codecs and drivers.

My personal opinion and warnings:
For 1: At least in tags, but keep the possibility of future use of
coordinates open. See also AudioChannelDescription in CoreAudio. Note
that Apple fills out the other parameters automatically if you use
tags. Special designations for packed/passthrough audio might be an
idea. Apple has designations such as Headphones, alternate language,
etc. These can be handy sometimes, but more in "speaker layout" terms
and less in "channel layout" terms.
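
For the coordinate idea, a sketch loosely modeled on CoreAudio's
AudioChannelDescription (only the CoreAudio type is real; the names
below are hypothetical): a channel carries a tag plus optional
coordinates, and a known tag implies the coordinates, which matches
Apple's "fills out the other params" behavior:

/* Hypothetical per-channel description with optional coordinates,
 * loosely following CoreAudio's AudioChannelDescription. Uses the
 * AVChannelTag enum from the sketch above. */
typedef struct AVChannelDescription {
    AVChannelTag tag;          /* AV_CHAN_UNKNOWN if only coords are known */
    int          use_coords;   /* nonzero if coords[] is valid */
    float        coords[3];    /* x (left/right), y (back/front), z (up/down) */
} AVChannelDescription;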

For 2: Tags can be used, but the system should not be limited to
them. Alternate orderings and arbitrary orderings should both be
possible. See also AudioChannelLayout in CoreAudio.
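
CoreAudio's AudioChannelLayout handles exactly this by letting a
layout be either a predefined tag (a well-known ordering) or an
explicit per-channel list; a hypothetical avcodec analogue could do
the same:

/* Hypothetical: a layout is either a predefined, well-known ordering
 * or an explicit, arbitrary list of channel descriptions. */
typedef struct AVAudioLayout {
    unsigned layout_tag;       /* e.g. a "5.1, SMPTE order" constant,
                                  or 0 meaning "use descriptions[]" */
    int nb_descriptions;
    AVChannelDescription *descriptions;  /* one per channel, any order */
} AVAudioLayout;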

For 3/4: It is important that the client application can know what
the original format is, but it must also be able to request another
type of channel layout. Not all applications will support all layouts.
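
From the client side, goals 3 and 4 could look roughly like this
(every function and variable below is made up to illustrate the flow;
only AVCodecContext is a real avcodec type):

/* Hypothetical negotiation, using the types sketched above. */
void negotiate_layout(AVCodecContext *ctx, const AVAudioLayout *app_layout)
{
    /* Goal 3: ffmpeg reports the exact format it will produce. */
    const AVAudioLayout *native = avcodec_get_native_layout(ctx);

    if (app_supports_layout(app_layout, native)) {
        /* Take the stream as-is. */
    } else if (avcodec_request_layout(ctx, app_layout) == 0) {
        /* Goal 4: a channel-reordering filter converts to the
         * application's speaker layout. */
    } else {
        /* Last resort: plain stereo. */
        avcodec_request_layout(ctx, &av_layout_stereo);
    }
}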

For 5: Proper channel down-/up-mixing can be quite involved and, where
possible, should in my opinion be handled in hardware/drivers. However,
downmixing to stereo, with the Center and LFE channels mixed into the
front channels, is a filter that you simply cannot do without.
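
To show how little is actually needed for that one unavoidable case,
a minimal 5.1 -> stereo downmix over interleaved float samples,
assuming channel order L R C LFE Ls Rs and the common -3 dB (~0.7071)
mix coefficients; dropping the LFE entirely, as done here, is a
frequent choice:

#include <stddef.h>

/* Downmix interleaved 5.1 float frames (order: L R C LFE Ls Rs) to
 * interleaved stereo. Center and surrounds are mixed in at -3 dB;
 * the LFE is dropped. A real filter would also guard against
 * clipping, e.g. by scaling the result. */
static void downmix_51_to_stereo(const float *in, float *out,
                                 size_t nb_frames)
{
    static const float c = 0.70710678f;   /* -3 dB */
    size_t i;
    for (i = 0; i < nb_frames; i++) {
        const float *s = in + 6 * i;
        out[2 * i]     = s[0] + c * s[2] + c * s[4];  /* L + C + Ls */
        out[2 * i + 1] = s[1] + c * s[2] + c * s[5];  /* R + C + Rs */
    }
}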


Pitfalls:
1: Channel layouts can change during the stream these days (AAC).
2: Digital audio passthrough: this is a special beast, and keeping it
in mind during the design is always a good idea.
3: The MP4 file format can define "reordering" tables for audio
formats.
4: Channels might not be "active". If you look at this in terms of
speaker layout instead of channel layout, a device might have 5.1
channels but only L&R with speakers connected (quite a common
situation).
5: Allowing at least 32 discrete channels seems like a good idea; that
is not uncommon in the authoring world. This is also where
coordinate-based positioning might come up, as opposed to predefined
speaker positions (see the sketch after this list).
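
On pitfall 5: if the well-known positions are also kept in a compact
bitmask (handy for capability checks), a 32-bit mask is already
exhausted at exactly 32 channels, so a wider type plus an escape to
the explicit description list seems safer. Again a hypothetical
sketch:

#include <stdint.h>

/* Hypothetical: one bit per well-known speaker position. A 64-bit
 * mask leaves headroom beyond 32 positions; anything that cannot be
 * expressed as a mask (arbitrary order, duplicates, coordinates)
 * falls back to the explicit AVChannelDescription list sketched
 * earlier. */
typedef uint64_t AVChannelMask;

#define AV_CH_MASK_FRONT_LEFT    (UINT64_C(1) << 0)
#define AV_CH_MASK_FRONT_RIGHT   (UINT64_C(1) << 1)
#define AV_CH_MASK_FRONT_CENTER  (UINT64_C(1) << 2)
#define AV_CH_MASK_LFE           (UINT64_C(1) << 3)
/* ... */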

As for a mentor, well, I don't really know. I know a bit about the
audio side, but I know very little about the ffmpeg API. And I don't
know the various ffmpeg developers very well either.

DJ



