[FFmpeg-devel] ASS/SSA discussions

Tue Sep 23 23:16:19 CEST 2008

On Sun, Sep 21, 2008 at 01:06:00AM +0300, Ivan Kalvachev wrote:
> On 9/18/08, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Mon, Sep 15, 2008 at 11:32:59PM +0300, Uoti Urpala wrote:
> >> On Mon, 2008-09-15 at 16:52 +0300, Ivan Kalvachev wrote:
[...]
> > conversion between .ass and .mkv will need conversion of the format
> > wherever it is done, doing it in the mkv demuxer means nut and other
> > formats like avi could use the packets directly.
> 
> Hum, see below.
> 
> >> > 3. If "Block's timecode" is stored in AVPacket.pts and "Block's
> >> > duration" is stored in AVPacket.convergence_duration, doesn't that mean
> >> > we have everything we need for muxing?
> >> 
> >> Aurelien's use of convergence_duration was mistaken and he should have
> >> used another field (probably waited for a patch adding display_duration
> >> to be applied); but yes, all the information should be available in the
> >> packet fields, and having it in the bitstream is completely redundant
> >> for an internal format. 
> >
> > So you propose that a .ass demuxer removes these fields?
> > The fact that .ass and .mkv use a different format leaves no choice:
> > either one converts or the internal format is ambiguous.
> > Now if one must convert, why should that be the .ass and not the mkv demuxer?
> 
> I see no problem if the .ass demuxer removes fields that it has parsed and stored in 
> AVPacket structures (pts,duration,dts=readorder). I would say that this should be 
> mandatory.

Do you suggest that all duplicate fields should be removed?
If so, mpeg1, mpeg2, mpeg4, h263, mp3, mp2, mp1, h264, ...
(half of all codecs, I suspect) do have redundant things: duration,
width/height, some kind of timestamps for direct mode b frames, ...
And all demuxers that support any of them would have to edit the bitstream,
which effectively is half of all codecs * nearly all demuxers.

Now if you didn't mean to suggest that every duplicate field should be
removed, then I am not sure what you suggest or why it should be done to
ass/ssa but not others.

besides readorder and dts (you say "dts=readorder" above) have completely
different semantics and storing read order in dts would very badly break
things. DTS are not "a random user specified second timestamp", but have clear
meaning that is not even related to readorder. actually dts=pts for ASS.
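
To make the dts point concrete, here is a minimal C sketch. SubPacket, its
fields and both helper functions are hypothetical stand-ins, not the
libavformat API. The only assumptions are the ones stated above: subtitles
are not reordered, so dts equals pts, and code downstream of the demuxer
expects dts not to go backwards within a stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a demuxed subtitle packet (not AVPacket). */
typedef struct SubPacket {
    int64_t pts;        /* presentation time, in ms */
    int64_t dts;        /* decode time; no reordering, so dts == pts */
    int64_t duration;   /* display duration, in ms */
    int     read_order; /* ASS ReadOrder: position of the line in the script */
} SubPacket;

/* Fill the timestamps as described above: dts = pts, ReadOrder kept apart. */
void sub_packet_set_times(SubPacket *p, int64_t start_ms, int64_t end_ms,
                          int read_order)
{
    assert(p != NULL);
    p->pts        = start_ms;
    p->dts        = start_ms;
    p->duration   = end_ms - start_ms;
    p->read_order = read_order;
}

/* Returns 1 if dts never decreases across the packets, 0 otherwise. */
int dts_is_monotonic(const SubPacket *pkts, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (pkts[i].dts < pkts[i - 1].dts)
            return 0;
    return 1;
}
```

If two lines are delivered in pts order but appear in the script in the
opposite order, copying read_order into dts makes dts_is_monotonic() return
0, which is the kind of breakage meant above.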

besides #2, the ass demuxer in libavformat will not parse, nor set the
display_duration. It's the ass AVParser that might; it would be nasty
code duplication if demuxers would do it ...

> 
> It would make processing of subtitles a lot more generic and flexible.
> 
> For example, there is no need to manually mess with the strings if we need to move or 
> reorder subtitles (e.g. copying the second half of a file). The code that is processing 
> the subtitles could work with mkv.ass and mkv.srt without need to distinguish or 
> know their internal representation. It could also work with more obscure subtitle 
> types. The (de)muxers would take care of the way they are stored or/and their 
> external representation.

see the 2nd comment below

> 
> Take this as an example. The muxer gets an AVPacket containing a subtitle in the
> full {order,start_time,duration,string} text format. What is the muxer supposed to
> do if the corresponding AVPacket fields are entirely different? Should it parse the
> text format, check for differences (optional) and then recreate the subtitle packet
> if necessary?
> It should be simpler to avoid parsing text that has already been parsed once ;)

This question is silly because it assumes that the input data to a muxer is
corrupted.
It's identical to asking what the muxer should do with h.263 that has 352x288
in its header but 720x480 in AVCodecContext. Such cases simply have undefined
behavior.
It's basically the user's fault if he feeds the muxer with wrong data, one
can't mux /dev/random either and expect to have a playable video ...

> 
> 
> Indeed, the next big problem that could arise is when the duration is changed and the
> subtitles have karaoke/script/animation in them that contain their own timing
> information.
> 
> This could be solved in two ways. First is special handling by the main program that
> changes the duration. The second is to store the original duration with the subtitle,
> and make demuxer or libass correct it to the packet duration.
> 
> None of these are simple, but in the second case we would need to create
> non-standard mkv.ass format. I don't think this is acceptable.

It seems you missed my past comments ...
What ffmpeg is heading toward is
* the demuxers return subtitle packets like any other packet 
* the subtitle decoder decodes these packets to a common subtitle structure
  (AVSubtitle) containing utf-8 text, timestamps/durations, positions,
  effects, bitmaps, font references, ...
* A common subtitle renderer renders these so they can be displayed or
  a subtitle encoder encodes them to a possibly different format again.

Now this is not so much different from video and audio
the decoder converts a codec specific bitstream into a common and simple
representation (a bitmap or a bunch of PCM samples).

Within this framework, subtitles are trivially editable, not only the
duration or timestamps but also the actual text (unless the format uses
bitmaps). Also nothing in it is duplicated so there is no problem with
any possible ambiguities, as there cannot be any.
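
A rough C sketch of that flow, for illustration only. SubEvent and all the
functions below are hypothetical and far simpler than the real AVSubtitle;
the sketch assumes non-negative times and plain text without styling. It
only shows that once an event is decoded, an edit touches numeric fields,
and the same event can then be encoded as ASS or as SRT.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded subtitle event (not the real AVSubtitle layout). */
typedef struct SubEvent {
    int64_t     start_ms;
    int64_t     duration_ms;
    const char *text;        /* UTF-8 */
} SubEvent;

/* Editing happens on the decoded form: no format-specific string surgery. */
void sub_event_shift(SubEvent *ev, int64_t delta_ms)
{
    assert(ev != NULL);
    ev->start_ms += delta_ms;
}

/* Split a non-negative millisecond count into h, m, s and leftover ms. */
static void split_ms(int64_t ms, int *h, int *m, int *s, int *frac)
{
    *h    = (int)(ms / 3600000);
    *m    = (int)(ms / 60000 % 60);
    *s    = (int)(ms / 1000 % 60);
    *frac = (int)(ms % 1000);
}

/* Encode as an ASS "Dialogue:" line; ASS times are H:MM:SS.cc. */
int encode_ass(const SubEvent *ev, char *buf, size_t size)
{
    int h1, m1, s1, f1, h2, m2, s2, f2;
    split_ms(ev->start_ms, &h1, &m1, &s1, &f1);
    split_ms(ev->start_ms + ev->duration_ms, &h2, &m2, &s2, &f2);
    return snprintf(buf, size,
                    "Dialogue: 0,%d:%02d:%02d.%02d,%d:%02d:%02d.%02d,"
                    "Default,,0,0,0,,%s",
                    h1, m1, s1, f1 / 10, h2, m2, s2, f2 / 10, ev->text);
}

/* Encode as an SRT cue; SRT times are HH:MM:SS,mmm. */
int encode_srt(const SubEvent *ev, int index, char *buf, size_t size)
{
    int h1, m1, s1, f1, h2, m2, s2, f2;
    split_ms(ev->start_ms, &h1, &m1, &s1, &f1);
    split_ms(ev->start_ms + ev->duration_ms, &h2, &m2, &s2, &f2);
    return snprintf(buf, size,
                    "%d\n%02d:%02d:%02d,%03d --> %02d:%02d:%02d,%03d\n%s\n",
                    index, h1, m1, s1, f1, h2, m2, s2, f2, ev->text);
}
```

Neither encoder needs to know which format the event came from, and the
shift never parses a timestamp string.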

You wouldn't want to apply video filters to packets prior to the video
decoder, so why do you base your arguments on changing things prior to
the subtitle decoder?

> 
> 
> Summary:
> Demuxer must remove fields that it has parsed and stored in AVPacket structure.

see above

> Anything else would lead to increased code duplication and special handling.

This is certainly not true, as it is not done currently by any (de)muxer
and doing it would add very significant and complex code to every
demuxer. And yes, I am speaking about the general case here, not just ass. If you
mean just ass, then I honestly do not understand why it should be a special
case.

> 
> 
> P.S.
> Please don't involve personal insults in the discussion.

I assume this reply was intended for Uoti because I did not call anyone
anything.

> I don't mind moving the thread to ffmpeg-dev under new name, 
> if this is the way to get purely technical discussion.

I am cc-ing ffmpeg-dev

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 