[FFmpeg-devel] [PATCH] support for UTF-16 encoding in id3v2 tags

Reimar Döffinger Reimar.Doeffinger
Sat Sep 12 14:46:54 CEST 2009


On Fri, Sep 11, 2009 at 10:33:37AM +0200, Anton Khirnov wrote:
> diff --git a/libavformat/id3v2.c b/libavformat/id3v2.c
> index 0cf2cb1..6b7efa2 100644
> --- a/libavformat/id3v2.c
> +++ b/libavformat/id3v2.c
> @@ -81,6 +81,7 @@ static void read_ttag(AVFormatContext *s, int taglen, const char *key)
>      char *q, dst[512];
>      int len, dstlen = sizeof(dst) - 1;
>      unsigned genre;
> +    unsigned int (*get)(ByteIOContext*) = get_be16;
>  
>      dst[0] = 0;
>      if (taglen < 1)
> @@ -99,11 +100,38 @@ static void read_ttag(AVFormatContext *s, int taglen, const char *key)
>          *q = '\0';
>          break;
>  
> +    case 1:  /* UTF-16 with BOM */
> +        taglen -= 2;
> +        switch (get_be16(s->pb)) {
> +        case 0xfffe:
> +            get = get_le16;
> +        case 0xfeff:
> +            break;
> +        default:
> +            av_log(s, AV_LOG_ERROR, "Incorrect BOM value in tag %s.\n", key);
> +            return;
> +        }
> +        // fall-through
> +
> +    case 2:  /* UTF-16BE without BOM */
> +        q = dst;
> +        while (taglen > 1) {
> +            uint32_t ch;
> +            uint8_t tmp;
> +
> +            GET_UTF16(ch, ((taglen -= 2) >= 0 ? get(s->pb) : 0), break;)
> +            PUT_UTF8(ch, tmp, if (q - dst < dstlen -1) *q++ = tmp;)

This should probably be done in a separate patch, but IMO the if(...) in
PUT_UTF8 should go and instead the while condition should check we have
at least space for 7+1 bytes in dst.
Also since dstlen == sizeof(dst) - 1 both this and the existing code are
off by 1 I think.

> +        *q = '\0';

Again more an issue with the existing code, but I am allergic to that
'\0' clutter instead of a simple 0.

> + * \param GET_16BIT gets UTF-16 encoded bytes from any proper source. It can be
> + * a function or a statement whose return value or evaluated value is of type
> + * uint16_t. It will be executed up to 2 times.

Should probably have a comment on endianness (e.g. "gets two
bytes of UTF-16 encoded data converted to native endianness").

> +            if (val > 0x3FFU || hi > 0x3FFU) {\
> +                ERROR\
> +            }\

GET_UTF8 does not have {} around ERROR, IMO having this consistent is a
must.
And since we're at consistency, changing
> +    {\
> +        unsigned int hi;\
> +        val = GET_16BIT;\
> +        hi = val - 0xD800;\

to

> +    val = GET_16BIT;\
> +    {\
> +        unsigned int hi = val - 0xD800;\

is both one line shorter and more similar to GET_UTF8.



More information about the ffmpeg-devel mailing list