[FFmpeg-devel] [PATCH] lavc: support subtitles charset conversion.
nicolas.george at normalesup.org
Thu Jan 3 16:53:02 CET 2013
On quartidi 14 nivôse, year CCXXI, Clement Boesch wrote:
> User configurable?
User, probably no. Caller, probably: lavf should be allowed to set it to
AV_TEXT_ENCODING_MODE_DONE, for example.
> > AV_TEXT_ENCODING_MODE_DEFAULT, //< let lavc decide
> Detection based on what?
Based on the codec. Probably something like this:
    int foobarsub_dec_init(AVCodecContext *avc) {
        avc->text_encoding_mode = AV_TEXT_ENCODING_MODE_SOMETHING;
        return 0;
    }
with SOMETHING depending on the codec.
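A minimal sketch of how a per-codec default could coexist with a value set by the caller, assuming the init function only overrides the mode when it is still DEFAULT. The enum and `MockCodecContext` are hypothetical stand-ins for the proposal in this thread, not existing lavc API, and picking PRE as this codec's default is purely illustrative.

```c
/* Hypothetical stand-ins for the proposal in this thread; none of this is
 * existing lavc API. */
enum TextEncodingMode {
    TEXT_ENCODING_MODE_DEFAULT, /* let lavc decide */
    TEXT_ENCODING_MODE_MANUAL,  /* the decoder does the work */
    TEXT_ENCODING_MODE_DONE,    /* the demuxer did the work */
    TEXT_ENCODING_MODE_PRE,     /* lavc must recode the packet */
    TEXT_ENCODING_MODE_POST     /* lavc must recode the decoded text */
};

typedef struct MockCodecContext {
    enum TextEncodingMode text_encoding_mode;
} MockCodecContext;

/* Assumption: the codec picks its own mode only if the caller left the
 * field at DEFAULT, so a value such as DONE set by lavf is preserved. */
static int foobarsub_dec_init(MockCodecContext *avc)
{
    if (avc->text_encoding_mode == TEXT_ENCODING_MODE_DEFAULT)
        avc->text_encoding_mode = TEXT_ENCODING_MODE_PRE;
    return 0;
}
```

With this shape, a caller that already recoded the text sets DONE before opening the codec and the init function leaves it alone.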
> > AV_TEXT_ENCODING_MODE_MANUAL, //< the decoder does the work
> Internally to the decoder, using the helper you're talking about below?
Internally to the decoder, using any practical method. If I understand you
correctly, the helper I am talking about below would be for demuxers.
> > AV_TEXT_ENCODING_MODE_DONE, //< the demuxer did the work
> Internally to the demuxer, using the helper you're talking about below?
Yes, if relevant: it is permitted to use lavc without lavf.
> > AV_TEXT_ENCODING_MODE_PRE, //< lavc must recode the packet
> Since lavc is not really supposed to modify the AVPacket (AFAIK), this
> might be a bit painful (buf copy before decoding callback).
I do not see why copying the AVPacket would be more painful than copying the
rectangle texts like you already do.
> > AV_TEXT_ENCODING_MODE_POST, //< lavc must recode the decoded text
> That sounds like the perfect place ;)
Actually, no: the more I think about it, the more I believe that PRE is way
better than POST. Unless I am mistaken, recoding before the decoder would
work just as well for all current codecs and ASCII-compatible encodings, plus
it will work with non-ASCII-compatible encodings just as well.
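To make the PRE idea concrete, here is a rough sketch, assuming a UTF-16BE packet and a toy byte-oriented decoder that emits ASS "\N" for line breaks. All function names are hypothetical, and the hand-written converter (BMP only, no surrogate pairs) stands in for whatever recoding library lavc would really use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: recode UTF-16BE (BMP only) to UTF-8.
 * Returns the number of bytes written, or 0 on failure or empty input. */
static size_t recode_utf16be_to_utf8(const uint8_t *in, size_t in_size,
                                     uint8_t *out, size_t out_size)
{
    size_t o = 0;

    if (in_size % 2)
        return 0;                          /* odd length: not UTF-16 */
    for (size_t i = 0; i < in_size; i += 2) {
        unsigned cp = ((unsigned)in[i] << 8) | in[i + 1];

        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates not handled here */
        if (cp < 0x80) {
            if (o + 1 > out_size) return 0;
            out[o++] = (uint8_t)cp;
        } else if (cp < 0x800) {
            if (o + 2 > out_size) return 0;
            out[o++] = (uint8_t)(0xC0 | (cp >> 6));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            if (o + 3 > out_size) return 0;
            out[o++] = (uint8_t)(0xE0 | (cp >> 12));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}

/* Toy decoder: copies bytes and turns '\n' into the ASS markup "\N".
 * It only works on ASCII-compatible input. */
static size_t toy_decode(const uint8_t *in, size_t in_size,
                         uint8_t *out, size_t out_size)
{
    size_t o = 0;

    for (size_t i = 0; i < in_size; i++) {
        if (in[i] == '\n') {
            if (o + 2 > out_size) return 0;
            out[o++] = '\\';
            out[o++] = 'N';
        } else {
            if (o + 1 > out_size) return 0;
            out[o++] = in[i];
        }
    }
    return o;
}

/* PRE: recode the whole packet first, then run the unmodified decoder. */
static size_t decode_pre(const uint8_t *pkt, size_t pkt_size,
                         uint8_t *out, size_t out_size)
{
    uint8_t tmp[256];
    size_t n = recode_utf16be_to_utf8(pkt, pkt_size, tmp, sizeof(tmp));

    return n ? toy_decode(tmp, n, out, out_size) : 0;
}
```

The decoder itself never sees anything but UTF-8, which is why it does not need to know about the original encoding at all.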
The only situation where POST would be better than PRE seems to be if the
codec mixes binary and text data (for example: U16BE line_length;
U8 line[line_length]; repeat). And we do not have those currently.
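A small sketch of a parser for the hypothetical mixed layout in that example (U16BE line_length followed by line_length bytes of text, repeated). It is only meant to show why PRE would not fit such a codec: the length fields are binary and must not be recoded, so only the extracted lines can be. The function name and signature are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet layout from the example: repeated records of
 * U16BE line_length followed by U8 line[line_length].
 * Returns 1 and advances *pos if a complete record was read, 0 otherwise. */
static int next_line(const uint8_t **pos, const uint8_t *end,
                     const uint8_t **line, size_t *line_length)
{
    const uint8_t *p = *pos;
    size_t len;

    if (end - p < 2)
        return 0;                      /* no room for a length field */
    len = ((size_t)p[0] << 8) | p[1];  /* binary field: must stay as-is */
    p += 2;
    if ((size_t)(end - p) < len)
        return 0;                      /* truncated record */
    *line        = p;                  /* only these bytes are text */
    *line_length = len;
    *pos         = p + len;
    return 1;
}
```

Recoding would have to happen on each `line` after this step, that is, after the decoder has separated text from binary data.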
> Except that it doesn't contain the buffer size, so it can only do ASCII
> compliant charset conversions.
The problem is not just the buffer size: the decoders produce ASS markup in
ASCII, so, for example, a line break, "\N", would be read as U+5C4E (an
unrelated CJK ideograph) in UTF-16-BE (depending on the parity of the byte
offset of the text before).
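The arithmetic can be checked with a few lines of C: the ASCII bytes of "\N" are 0x5C 0x4E, and read as one big-endian 16-bit unit they give 0x5C4E. The helper name below is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the big-endian 16-bit code unit that starts at the given byte offset. */
static unsigned utf16be_unit_at(const uint8_t *buf, size_t offset)
{
    return ((unsigned)buf[offset] << 8) | buf[offset + 1];
}
```

For the ASCII bytes of `a\Nb`, the unit at offset 1 is 0x5C4E, while the units at offsets 0 and 2 are 0x615C and 0x4E62: which garbage comes out depends on where the markup falls relative to the 2-byte grid.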
> Note: inside the demuxer, you don't have access to the codec charset
> (options are not yet populated). Inside the decoder that's possible.
The same problem happens for the rawvideo or pcm demuxers: the frame size or
sample rate is not available. This is solved using a private option with a
> I must say I have a hard time following what you actually want me to do.
> Can you tell me more about what you want to expose to the user
Do you mean API user or command-line tool user?