[FFmpeg-devel] [PATCH] lavc: support subtitles charset conversion.

Nicolas George nicolas.george at normalesup.org
Thu Jan 3 16:53:02 CET 2013

On quartidi 14 Nivôse, year CCXXI, Clement Boesch wrote:
> User configurable?

User, probably not. Caller, probably: lavf should be allowed to set it to

> >     AV_TEXT_ENCODING_MODE_DEFAULT, //< let lavc decide
> Detection based on what?

Based on the codec. Probably something like this:

static int foobarsub_dec_init(AVCodecContext *avc)
{
    /* 0 is AV_TEXT_ENCODING_MODE_DEFAULT: the caller left the choice to lavc */
    if (!avc->text_encoding_mode)
        avc->text_encoding_mode = AV_TEXT_ENCODING_MODE_SOMETHING;
    return 0;
}

with SOMETHING depending on the codec.

> >     AV_TEXT_ENCODING_MODE_MANUAL,  //< the decoder does the work
> Internally to the decoder, using the helper you're talking below?

Internally to the decoder, using any practical method. If I understand you
correctly, the helper I am talking about below would be for demuxers.

> >     AV_TEXT_ENCODING_MODE_DONE,    //< the demuxer did the work
> Internally to the demuxer, using the helper you're talking below?

Yes if relevant: it is permitted to use lavc without lavf.

> >     AV_TEXT_ENCODING_MODE_PRE,     //< lavc must recode the packet
> Since lavc is not really supposed to modify the AVPacket (AFAIK), this
> might be a bit painful (buf copy before decoding callback).

I do not see why copying the AVPacket would be more painful than copying the
rectangle texts like you already do.

> >     AV_TEXT_ENCODING_MODE_POST,    //< lavc must recode the decoded text
> That sounds like the perfect place ;)

Actually, no: the more I think about it, the more I believe that PRE is way
better than POST. Unless I am mistaken, recoding before the decoder would
work just as well for all current codecs and ASCII-compatible encodings, and
it would also work with non-ASCII-compatible encodings.
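A minimal sketch of what PRE amounts to: the packet payload is copied into a
new buffer in UTF-8 before the decoder sees it, so the original AVPacket is
never modified. The helper name recode_latin1_to_utf8 is hypothetical and not
an FFmpeg function, and the hard-coded ISO-8859-1 source charset is only an
assumption to keep the example free of dependencies; a real implementation
would presumably go through iconv or similar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of FFmpeg: recode an ISO-8859-1 payload
 * to UTF-8 into a freshly allocated buffer, leaving the input untouched.
 * Returns NULL on allocation failure; the caller frees the result. */
static uint8_t *recode_latin1_to_utf8(const uint8_t *in, size_t in_size,
                                      size_t *out_size)
{
    assert(in && out_size);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    uint8_t *out = malloc(2 * in_size + 1);
    size_t n = 0;

    if (!out)
        return NULL;
    for (size_t i = 0; i < in_size; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                  /* ASCII passes through */
        } else {
            out[n++] = 0xC0 | (in[i] >> 6);    /* lead byte */
            out[n++] = 0x80 | (in[i] & 0x3F);  /* continuation byte */
        }
    }
    out[n] = 0;   /* convenience terminator, not counted in *out_size */
    *out_size = n;
    return out;
}
```

ASCII bytes, including any markup the decoder looks for, come out unchanged,
which is why the decoder does not need to know that recoding took place.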

The only situation where POST would be better than PRE seems to be if the
codec mixes binary and text data (for example: U16BE line_length;
U8 line[line_length]; repeat). And we do not have those currently.
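To illustrate why such a layout would not fit PRE, here is a sketch of a
walker for that hypothetical record format (no current codec uses it, as said
above). Recoding the whole packet up front would change the byte count of each
line without updating the length fields, and could alter the length bytes
themselves; only the spans handed to the callback are text. The names
walk_lines, line_cb and sum_text_bytes are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Called once per text line found in the payload. */
typedef void (*line_cb)(const uint8_t *line, size_t len, void *opaque);

/* Example callback: accumulate the number of text bytes in *opaque. */
static void sum_text_bytes(const uint8_t *line, size_t len, void *opaque)
{
    (void)line;
    *(size_t *)opaque += len;
}

/* Walk a payload laid out as: U16BE line_length; U8 line[line_length];
 * repeated. Returns the number of lines visited, or -1 if a record is
 * truncated. */
static int walk_lines(const uint8_t *buf, size_t size, line_cb cb, void *opaque)
{
    size_t pos = 0;
    int count = 0;

    assert(buf || !size);
    while (pos < size) {
        size_t len;

        if (size - pos < 2)
            return -1;                       /* incomplete length field */
        len = (size_t)buf[pos] << 8 | buf[pos + 1];
        pos += 2;
        if (size - pos < len)
            return -1;                       /* incomplete text line */
        if (cb)
            cb(buf + pos, len, opaque);      /* only this span is text */
        pos += len;
        count++;
    }
    return count;
}
```

With this kind of codec, the recoding step would have to run inside or after
the walk, on each line separately.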

> Except that it doesn't contain the buffer size, so it can only do ASCII
> compliant charset conversions.

The problem is not just the buffer size: the decoders produce ASS markup in
ASCII, so, for example, a line break, "\N", would be read as U+5C4E (an
unrelated CJK ideograph) in UTF-16BE (depending on the parity of the text
before).
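The byte-level effect can be checked with a short sketch. It assumes a buffer
holding UTF-16BE text into which the two ASCII bytes of "\N" were inserted
as-is; utf16be_unit is a made-up helper that reads one big-endian 16-bit code
unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 16-bit code unit starting at byte offset pos. */
static uint16_t utf16be_unit(const uint8_t *buf, size_t size, size_t pos)
{
    assert(buf && pos + 1 < size);
    return (uint16_t)(buf[pos] << 8 | buf[pos + 1]);
}
```

For the bytes { 0x00, 'A', '\\', 'N' } the unit at offset 2 is 0x5C4E, so the
markup is no longer recognizable. If the same two bytes land on an odd offset,
as in { 0x00, '\\', 'N', 0x00 }, they are split across two units, 0x005C and
0x4E00, which is the parity dependence mentioned above.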

> Note: inside the demuxer, you don't have access to the codec charset
> (options are not yet populated). Inside the decoder that's possible.

The same problem happens for the rawvideo or pcm demuxers: the frame size or
sample rate is not available. This is solved using a private option with a
standardized name.

> I must say I have a hard time following what you actually want me to do.
> Can you tell me more about what you want to expose to the user
> first?

Do you mean API user or command-line tool user?


  Nicolas George