[FFmpeg-devel] [PATCH 1/2] avutil/avstring: do not loose ascii characters when decoding non utf-8 with av_utf8_decode()

Michael Niedermayer michaelni at gmx.at
Sun Apr 13 03:26:04 CEST 2014


On Sun, Apr 13, 2014 at 12:10:59AM +0200, Nicolas George wrote:
> On Tridi 23 Germinal, year CCXXII, Michael Niedermayer wrote:
> > Subject: [FFmpeg-devel] [PATCH 1/2] avutil/avstring: do not loose ascii
> >  characters when decoding non utf-8 with av_utf8_decode()
> 
> Spelling mistake: "to loose" means "to set free", as in "the Forsaken are
> loose" (sorry, re-reading WoT). The correct spelling would be "lose". This
> applies to the next patch too, of course.
> 
> > 
> > Fixes Ticket3363
> > 
> > Signed-off-by: Michael Niedermayer <michaelni at gmx.at>
> > ---
> >  libavutil/avstring.c |    8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libavutil/avstring.c b/libavutil/avstring.c
> > index f4374fd..e75cdc6 100644
> > --- a/libavutil/avstring.c
> > +++ b/libavutil/avstring.c
> > @@ -331,15 +331,15 @@ int av_utf8_decode(int32_t *codep, const uint8_t **bufp, const uint8_t *buf_end,
> >      while (code & top) {
> >          int tmp;
> >          if (p >= buf_end) {
> > -            ret = AVERROR(EILSEQ); /* incomplete sequence */
> > -            goto end;
> > +            (*bufp) ++;
> > +            return AVERROR(EILSEQ); /* incomplete sequence */
> >          }
> >  
> >          /* we assume the byte to be in the form 10xx-xxxx */
> >          tmp = *p++ - 128;   /* strip leading 1 */
> >          if (tmp>>6) {
> > -            ret = AVERROR(EILSEQ);
> > -            goto end;
> > +            (*bufp) ++;
> > +            return AVERROR(EILSEQ);
> 
> With this form, every byte of an invalid sequence will trigger its own
> EILSEQ, instead of the whole sequence being treated as invalid. I do not
> know whether this is better or not.

The problem is that while bytes n..m might form an invalid sequence,
the bytes starting at n+1 (where n+1 <= m) might very well form a valid
sequence. So skipping the whole sequence is problematic, as it could easily
lose the start of the next valid sequence.
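
To make the difference concrete, here is a rough standalone sketch. It is
not the code from libavutil/avstring.c: decode_one() and decode_all() are
hypothetical helpers written only for this illustration, and the decoder is
deliberately simplified (no overlong, surrogate or range checks). The
resync_one_byte flag selects between consuming every byte examined on error
(roughly the old behaviour) and advancing by a single byte (roughly what the
patch does).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified UTF-8 decoder for illustration only.
 * Returns 0 and stores the code point on success, -1 on an invalid
 * sequence. On error, *bufp advances by one byte if resync_one_byte is
 * set, otherwise past every byte that was examined. */
static int decode_one(uint32_t *codep, const uint8_t **bufp,
                      const uint8_t *end, int resync_one_byte)
{
    const uint8_t *p = *bufp;
    uint32_t code;
    int extra;

    assert(p < end);
    code = *p++;

    if (code < 0x80)                { extra = 0; }
    else if ((code & 0xE0) == 0xC0) { extra = 1; code &= 0x1F; }
    else if ((code & 0xF0) == 0xE0) { extra = 2; code &= 0x0F; }
    else if ((code & 0xF8) == 0xF0) { extra = 3; code &= 0x07; }
    else {
        /* stray continuation byte or invalid lead byte: one byte either way */
        *bufp = p;
        return -1;
    }

    while (extra--) {
        if (p >= end || (*p & 0xC0) != 0x80) {
            if (p < end)
                p++; /* the "skip" policy also swallows the offending byte */
            *bufp = resync_one_byte ? *bufp + 1 : p;
            return -1;
        }
        code = (code << 6) | (*p++ & 0x3F);
    }

    *codep = code;
    *bufp  = p;
    return 0;
}

/* Decode a whole buffer, storing only the successfully decoded code
 * points in out[]; returns how many were stored. */
static size_t decode_all(const uint8_t *buf, size_t len, int resync_one_byte,
                         uint32_t *out, size_t out_cap)
{
    const uint8_t *p = buf, *end = buf + len;
    size_t n = 0;

    while (p < end) {
        uint32_t cp;
        if (decode_one(&cp, &p, end, resync_one_byte) == 0 && n < out_cap)
            out[n++] = cp;
    }
    return n;
}
```

With the input bytes C3 'A' 'B', the skip policy in this sketch yields only
'B' (the 'A' is swallowed together with the bad lead byte), while the
one-byte policy yields 'A' and 'B'. With E2 C3 A9, the skip policy yields
nothing, while the one-byte policy recovers U+00E9 from C3 A9.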

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The real ebay dictionary, page 2
"100% positive feedback" - "All either got their money back or didnt complain"
"Best seller ever, very honest" - "Seller refunded buyer after failed scam"