[FFmpeg-devel] [PATCH v2] Add SRV3 decoder/demuxer

Sat Dec 28 04:34:47 EET 2024

On Fri, Dec 27, 2024 at 12:16:04AM +0100, fishhh wrote:
> > instructions on how to obtain a srv3 sample file would be helpful too.
> > this is so we can actually test your patch.
> 
> downloading the file from YouTube is probably the easiest way,
> you can achieve this by the means of yt-dlp like this:
> yt-dlp --skip-download --write-subs --sub-langs en --sub-format srv3 <URL>
> --skip-download makes it so the video itself is not downloaded
> 
> if you need a video with complex subtitles for testing, this
> one exists: https://www.youtube.com/watch?v=gxWAM5mgH_o
> 
> also I just realized adding `ctx->q.keep_duplicates = 1;`
> somewhere in srv3_read_header significantly improves the quality
> of some conversions because it keeps the queue from removing
> events with the same content but different styles
> this also prompted me to make EDGE_GLOW emit a smaller glow
> (`{\\bord1\\blur1}`) which looks way better
> you can test with those changes if you care about the result's quality

ok got working.

> > usually we leave the version.h modifications out of patches posted to
> > the
> > mailining list, and only make those changes at git push time. this makes
> > it
> > easier to apply your patch in the future when those files may have
> > changed.
> > also for new codec we bump avcodec/version.h.
> 
> right, makes perfect sense, the documentation at
> https://ffmpeg.org/developer.html#New-codecs-or-formats-checklist
> should probably be updated to not tell you to change it yourself
> 
> > this patch sets adds a demuxer and a decoder. imho, it should be split
> > into
> > two patches. the decoder patch comes first, then the demuxer.
> > ...
> > allcodecs.c sections are sorted by name.
> 
> okay, I'll do those things for v3.
> 
> > > +enum SRV3PenAttrs {
> > > +    SRV3_PEN_ATTR_ITALIC = 1,
> > > +    SRV3_PEN_ATTR_BOLD = 2,
> > > +};
> > 
> > you use 0 elsewhere in the code.
> > should there be a SRV3_PEN_ATTR_NONE too?
> 
> the difference between PenAttrs and the other enums is that
> PenAttrs is a bitfield because the pen can be both italic and bold
> of course.
> now that I've looked into this it does seem ffmpeg uses macros
> for defining flags so maybe I should just make these
> #define SRV3_PEN_ATTR_ITALIC (1 << 0)
> #define SRV3_PEN_ATTR_BOLD (1 << 1)
> 
> > > +typedef struct SRV3Pen {
> > > +    int id;
> > > +
> > > +    int font_size, font_style;
> > > +    int attrs;
> > > +
> > > +    int edge_type, edge_color;
> > > +
> > > +    int ruby_part;
> > 
> > these should be enum SRV3PenAttrs, enum SRV3EdgeType, enum SRV3RubyPart,
> > instead of int. this helps static analysis. HOWEVER, as part of reading
> > in the values you assume they are int and this approach has merrit too.
> 
> yeah I didn't do that specifically so I could just pass these as
> int* to the parsing functions, it would be nice if we could just do
> enum SRV3EdgeType : int {}
> but that's a C23 thing so not really acceptable.
> 
> > = SRV3_EDGE_NONE
> 
> > > +#define FREE_LIST(type, list, until)                     \
> > > +do {                                                                \
> > > +    for (void *current = list; current && current != until; current
> > > = next) {  \
> > > +        next = ((type*)current)->next;                              \
> > > +        av_free(current);                                           \
> > > +    }                                                               \
> > > +} while(0)
> > 
> > syyle: the \'s should all line up
> 
> will fix
> 
> > > +
> > > +static int srv3_probe(const AVProbeData *p)
> > > +{
> > > +    if (strstr(p->buf, "<timedtext format=\"3\">"))
> > > +        return AVPROBE_SCORE_MAX;
> > > +
> > 
> > use strstr(p->buf, "<timedtext") && strstr(p->buf, "format=\"3\"");
> > 
> > this handle the situaton wehere an extra tag is inserted between the
> > two, e.g.
> >      <timedtext x="y" format="3" ....>
> > 
> 
> okay
> 
> > > +
> > > +static int srv3_parse_numeric_attr(SRV3Context *ctx, const char
> > > *parent, xmlAttrPtr attr, int *out, int min, int max)
> > > +{
> > > +    return srv3_parse_numeric_value(ctx, parent, attr->name,
> > > attr->children->content, 10, out, min, max) == 0;
> > > +}
> > > +
> > > +static void srv3_parse_color_attr(SRV3Context *ctx, const char
> > > *parent, xmlAttrPtr attr, int *out)
> > > +{
> > > +    srv3_parse_numeric_value(ctx, parent, attr->name,
> > > attr->children->content + (*attr->children->content == '#'), 16,
> > > out, 0, 0xFFFFFF);
> > > +}
> > > +
> > 
> > for both functions, should probably return the
> > srv3_parse_numeric_value() error value
> 
> well actually the only reason srv3_parse_numeric_value returns an
> error value is because attributes like "p" and "wp" have to
> process the number if it is parsed successfully to find the
> structure with the right id. I can make parse_color_attr return an error
> but it'd be ignored in all cases.
> 
> > srv3_parse_numeric_value() returns an error, but you are discarding it
> > everywhere here.
> 
> the error value from srv3_parse_numeric_value is discarded to allow
> parsing slightly incorrect subtitles since a single event's invalid
> opacity really shouldn't abort parsing and srv3_parse_numeric_value
> already logs an error so its return value is only necessary in the
> special cases I mentioned above.
> 
> > i would probably convert these attributes ("sz", "fs", "et", etc.), the
> > pen struct offsets, and
> > min/max values into an array, and then iterate through that array
> > searching for the attribute name.
> > something like AVOption.
> 
> nice idea I'll do that
> 
> > > +static int srv3_clean_segment_text(char *text) {
> > > +    char *out = text;
> > > +    const char *start = text;
> > > +
> > > +    while (1) {
> > > +        const char *end = strstr(start, ZERO_WIDTH_SPACE);
> > > +        size_t cnt = end ? (size_t)(end - start) :
> > > (size_t)strlen(start);
> > 
> > strlen returns size_t, should be no need to cast it.
> right thanks
> 
> > ditto for end-start;
> I think pointer subtraction returns ptrdiff_t which is supposed
> to be signed but it appears compilers don't complain about that
> without -Wconversion (gcc doesn't even with -Wconversion which
> I find amusing) since I don't see any mention of -Wconversion
> in ffmpeg I can remove that cast too.
> 
> > > +                } else if (!strcmp(attr->name, "ws")) {
> > > +                    // TODO: Handle window styles
> > 
> > use avpriv_report_missing_feature()?
> > 
> 
> I can do that but it'll trigger on many subtitle files as it's
> pretty commonly used, just doesn't usually change all that much
> which is why it's unimplemented.
> 
> > > +                if(textlen == 0)
> > > +                    continue;
> > 
> > if (textlen...
> I assume you mean if(!textlen) and aren't suggesting to put
> everything into an if block?

actually just need to insert a space after the 'if'

> > > +static const AVOption options[] = {
> > > +    { NULL }
> > > +};
> > > +
> > > +static const AVClass srv3_demuxer_class = {
> > > +    .class_name  = "SRV3 demuxer",
> > > +    .option      = options,
> > > +    .version     = LIBAVUTIL_VERSION_INT,
> > > +};
> > 
> > are you planning to use options?
> > if not, options and srv3_demuxer_class can be removed.
> 
> when I wrote this code I did have in mind a hypothetical
> option that would enable using non-standard libass functionality
> like BorderStyle = 3 so that's a possibility but I probably won't
> implement that myself anytime soon.
> can a class be added retroactively? if yes then yeah this
> should probably be removed.

yes class & options can be added later

-- Peter
(A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20241228/798822ae/attachment.sig>