[FFmpeg-devel] [PATCH 2/2] lavc: support subtitles charset conversion.

Clément Bœsch ubitux at gmail.com
Fri Jan 11 20:18:04 CET 2013


On Tue, Jan 08, 2013 at 12:29:36PM +0100, Nicolas George wrote:
[...]
> > +    /**
> > +     * Character encoding of the input subtitles file.
> > +     * - decoding: set by user, libavformat or libavcodec
> > +     * - encoding: unused
> > +     */
> > +    char *sub_charenc;
> 
> Needs a not about not accessing it directly from outside lavc (and possibly
> an accessor).
> 

Just kept "set by user". On top of the struct: "Please use AVOptions
(av_opt* / av_set/get*()) to access these fields from user applications."

This is consistent with the other options.

> > +
> > +    /**
> > +     * Subtitles character encoding mode.
> 
> > +     * - decoding: set by libavformat or libavcodec, not intended to be used by user apps
> 
> Do we consider lavf mandatory for subtitles demuxing? IMHO, from lavc's
> point of view, lavf is a "user app".
> 

I kept this lavc internal for now.

> > +     * - encoding: unused
> > +     */
> > +    int sub_charenc_mode;
> 
> > +#define FF_SUB_CHARENC_MODE_DO_NOTHING  -1  ///< do nothing (demuxer outputs a stream supposed to be already in UTF-8, or the codec is bitmap for instance)
> > +#define FF_SUB_CHARENC_MODE_AUTOMATIC    0  ///< libavcodec will select the mode itself
> > +#define FF_SUB_CHARENC_MODE_DECODER_PRE  2  ///< the AVPacket data needs to be recoded to UTF-8 before being fed to the decoder, requires iconv
> > +//#define FF_SUB_CHARENC_MODE_DECODER_POST 3  ///< the AVSubitle data needs to be recoded to UTF-8 after the decoder pass, requires iconv
> 
> Nit: enum?
> 

I prefer define in these case, at least consistent with the other fields.

[...]
> > diff --git a/libavformat/assdec.c b/libavformat/assdec.c
> > index 35fcb51..710a171 100644
> > --- a/libavformat/assdec.c
> > +++ b/libavformat/assdec.c
> > @@ -93,6 +93,7 @@ static int ass_read_header(AVFormatContext *s)
> >      avpriv_set_pts_info(st, 64, 1, 100);
> >      st->codec->codec_type = AVMEDIA_TYPE_SUBTITLE;
> 
> >      st->codec->codec_id= AV_CODEC_ID_SSA;
> > +    st->codec->sub_charenc_mode = FF_SUB_CHARENC_MODE_DO_NOTHING;
> 
> lavf should not access the final AVCodecContext fields directly. Also,
> usually, FF_* macros are for intra-use only (but there are exceptions).
> 
> Another problem: at this point, sub_charenc is still NULL, and therefore
> DO_NOTHING is redundant. If this code path leads to sub_charenc being
> non-NULL, that means it has been set by the user: ignoring it seems wrong.
> IMHO, the demuxer should only set the mode to DO_NOTHING if it actually does
> something with the encoding.
> 
> (ASS files in something else than UTF-8 exist; I am sure malformed webftt
> files exist too somewhere.)
> 

For this one and the following, sub_charenc_mode setting removed for now.

[...]

I won't be available for about one month, but since Alexander raised some
interest to be working on this, here is my current work:

  The two current patches for character encoding support, rebased on
  master:
    https://github.com/ubitux/FFmpeg/compare/master...sub-recode-nofilter

  The additional lavfi patches, rebased on sub-recode-nofilter:
    https://github.com/ubitux/FFmpeg/compare/master...sub-recode

These patches are ready from my PoV (just be careful about the TODO
version bump).

Of course, this may be discussed because of the UTF-16 insanity. My
opinion is that for standalone text-subtitles formats we should just make
ff_subtitles_read_chunk() and ff_get_line() support an utf16-to-utf8
convert and be it. Yes we would loose the utf-16 encoding when
transcoding, but I think that's really not important.

Another thing is that we don't support at all utf-16 currently in these
demuxers, and I'm relatively against reworking them to change their
c-string logic into a binary one.

After the sub-recode-nofilter patches, utf-16 is supported for binary
demuxers.

Anyone is welcome to work on this, and/or push when ready.

See you in one month,

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20130111/e6fcb2f4/attachment.asc>


More information about the ffmpeg-devel mailing list