[FFmpeg-devel] [PATCH v2] Add SRV3 decoder/demuxer

Fri Dec 27 01:16:04 EET 2024

> instructions on how to obtain a srv3 sample file would be helpful too.
> this is so we can actually test your patch.

downloading the file from YouTube is probably the easiest way,
you can achieve this by the means of yt-dlp like this:
yt-dlp --skip-download --write-subs --sub-langs en --sub-format srv3 
<URL>
--skip-download makes it so the video itself is not downloaded

if you need a video with complex subtitles for testing, this
one exists: https://www.youtube.com/watch?v=gxWAM5mgH_o

also I just realized adding `ctx->q.keep_duplicates = 1;`
somewhere in srv3_read_header significantly improves the quality
of some conversions because it keeps the queue from removing
events with the same content but different styles
this also prompted me to make EDGE_GLOW emit a smaller glow
(`{\\bord1\\blur1}`) which looks way better
you can test with those changes if you care about the result's quality

> usually we leave the version.h modifications out of patches posted to 
> the
> mailining list, and only make those changes at git push time. this 
> makes it
> easier to apply your patch in the future when those files may have 
> changed.
> also for new codec we bump avcodec/version.h.

right, makes perfect sense, the documentation at
https://ffmpeg.org/developer.html#New-codecs-or-formats-checklist
should probably be updated to not tell you to change it yourself

> this patch sets adds a demuxer and a decoder. imho, it should be split 
> into
> two patches. the decoder patch comes first, then the demuxer.
> ...
> allcodecs.c sections are sorted by name.

okay, I'll do those things for v3.

>> +enum SRV3PenAttrs {
>> +    SRV3_PEN_ATTR_ITALIC = 1,
>> +    SRV3_PEN_ATTR_BOLD = 2,
>> +};
> 
> you use 0 elsewhere in the code.
> should there be a SRV3_PEN_ATTR_NONE too?

the difference between PenAttrs and the other enums is that
PenAttrs is a bitfield because the pen can be both italic and bold
of course.
now that I've looked into this it does seem ffmpeg uses macros
for defining flags so maybe I should just make these
#define SRV3_PEN_ATTR_ITALIC (1 << 0)
#define SRV3_PEN_ATTR_BOLD (1 << 1)

>> +typedef struct SRV3Pen {
>> +    int id;
>> +
>> +    int font_size, font_style;
>> +    int attrs;
>> +
>> +    int edge_type, edge_color;
>> +
>> +    int ruby_part;
> 
> these should be enum SRV3PenAttrs, enum SRV3EdgeType, enum 
> SRV3RubyPart,
> instead of int. this helps static analysis. HOWEVER, as part of reading
> in the values you assume they are int and this approach has merrit too.

yeah I didn't do that specifically so I could just pass these as
int* to the parsing functions, it would be nice if we could just do
enum SRV3EdgeType : int {}
but that's a C23 thing so not really acceptable.

> = SRV3_EDGE_NONE

>> +#define FREE_LIST(type, list, until)                     \
>> +do {                                                                \
>> +    for (void *current = list; current && current != until; current = 
>> next) {  \
>> +        next = ((type*)current)->next;                              \
>> +        av_free(current);                                           \
>> +    }                                                               \
>> +} while(0)
> 
> syyle: the \'s should all line up

will fix

>> +
>> +static int srv3_probe(const AVProbeData *p)
>> +{
>> +    if (strstr(p->buf, "<timedtext format=\"3\">"))
>> +        return AVPROBE_SCORE_MAX;
>> +
> 
> use strstr(p->buf, "<timedtext") && strstr(p->buf, "format=\"3\"");
> 
> this handle the situaton wehere an extra tag is inserted between the
> two, e.g.
>      <timedtext x="y" format="3" ....>
> 

okay

>> +
>> +static int srv3_parse_numeric_attr(SRV3Context *ctx, const char 
>> *parent, xmlAttrPtr attr, int *out, int min, int max)
>> +{
>> +    return srv3_parse_numeric_value(ctx, parent, attr->name, 
>> attr->children->content, 10, out, min, max) == 0;
>> +}
>> +
>> +static void srv3_parse_color_attr(SRV3Context *ctx, const char 
>> *parent, xmlAttrPtr attr, int *out)
>> +{
>> +    srv3_parse_numeric_value(ctx, parent, attr->name, 
>> attr->children->content + (*attr->children->content == '#'), 16, out, 
>> 0, 0xFFFFFF);
>> +}
>> +
> 
> for both functions, should probably return the 
> srv3_parse_numeric_value() error value

well actually the only reason srv3_parse_numeric_value returns an
error value is because attributes like "p" and "wp" have to
process the number if it is parsed successfully to find the
structure with the right id. I can make parse_color_attr return an error
but it'd be ignored in all cases.

> srv3_parse_numeric_value() returns an error, but you are discarding it
> everywhere here.

the error value from srv3_parse_numeric_value is discarded to allow
parsing slightly incorrect subtitles since a single event's invalid
opacity really shouldn't abort parsing and srv3_parse_numeric_value
already logs an error so its return value is only necessary in the
special cases I mentioned above.

> i would probably convert these attributes ("sz", "fs", "et", etc.), the 
> pen struct offsets, and
> min/max values into an array, and then iterate through that array 
> searching for the attribute name.
> something like AVOption.

nice idea I'll do that

>> +static int srv3_clean_segment_text(char *text) {
>> +    char *out = text;
>> +    const char *start = text;
>> +
>> +    while (1) {
>> +        const char *end = strstr(start, ZERO_WIDTH_SPACE);
>> +        size_t cnt = end ? (size_t)(end - start) : 
>> (size_t)strlen(start);
> 
> strlen returns size_t, should be no need to cast it.
right thanks

> ditto for end-start;
I think pointer subtraction returns ptrdiff_t which is supposed
to be signed but it appears compilers don't complain about that
without -Wconversion (gcc doesn't even with -Wconversion which
I find amusing) since I don't see any mention of -Wconversion
in ffmpeg I can remove that cast too.

>> +                } else if (!strcmp(attr->name, "ws")) {
>> +                    // TODO: Handle window styles
> 
> use avpriv_report_missing_feature()?
> 

I can do that but it'll trigger on many subtitle files as it's
pretty commonly used, just doesn't usually change all that much
which is why it's unimplemented.

>> +                if(textlen == 0)
>> +                    continue;
> 
> if (textlen...
I assume you mean if(!textlen) and aren't suggesting to put
everything into an if block?

>> +
>> +static const AVOption options[] = {
>> +    { NULL }
>> +};
>> +
>> +static const AVClass srv3_demuxer_class = {
>> +    .class_name  = "SRV3 demuxer",
>> +    .option      = options,
>> +    .version     = LIBAVUTIL_VERSION_INT,
>> +};
> 
> are you planning to use options?
> if not, options and srv3_demuxer_class can be removed.

when I wrote this code I did have in mind a hypothetical
option that would enable using non-standard libass functionality
like BorderStyle = 3 so that's a possibility but I probably won't
implement that myself anytime soon.
can a class be added retroactively? if yes then yeah this
should probably be removed.