[FFmpeg-devel] [PATCH] FLAC parser

Fri Oct 22 12:11:57 CEST 2010

On Fri, Oct 22, 2010 at 04:54:36AM +0200, Michael Chinen wrote:
> On Fri, Oct 22, 2010 at 1:15 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Thu, Oct 21, 2010 at 06:39:45PM +0200, Michael Chinen wrote:
> >> On Thu, Oct 21, 2010 at 1:05 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> > On Wed, Oct 20, 2010 at 03:38:14PM +0200, Michael Chinen wrote:
> >> >> Hi,
> >> >>
> >> >> On Tue, Oct 19, 2010 at 2:48 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> >> >[...]
> >> >> >> I did profiling again and it turns out I missed one exit point for the
> >> >> >> function the last time. ?The non-flat wrap buffer version is about
> >> >> >> 2-4% faster overall. ?I've squashed it into the 0003.
> >> >> >
> >> >> >what is the speed difference between current svn and after this patch ?
> >> >>
> >> >> I used the -benchmark flag for 'ffmpeg -i fourminsong.flac a.wav' and
> >> >> five runs and got
> >> >> without patch: utime = 2.044-2.042s
> >> >> with patch: ? ?utime = 2.363-2.379s
> >> >>
> >> >> So flac demuxing with the parser is slower.
> >> >
> >> > its not a problem when the parser is needed, like for -acodec copy but when
> >> > it is not needed then a 15% slowdown is a problem. That said it of course
> >> > would be nicer if it was faster than that even when needed
> >> >
> >> >
> >> >
> >> > [...]
> >> >> >
> >> >> >
> >> >> > [...]
> >> >> >> +static int find_headers_search(FLACParseContext *fpc, uint8_t *buf, int buf_size,
> >> >> >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int search_start)
> >> >> >> +
> >> >> >> +{
> >> >> >> + ? ?FLACFrameInfo fi;
> >> >> >> + ? ?int size = 0, i;
> >> >> >> + ? ?uint8_t *header_buf;
> >> >> >> +
> >> >> >> + ? ?for (i = 0; i < buf_size - 1; i++) {
> >> >> >> + ? ? ? ?if ((AV_RB16(buf + i) & 0xFFFE) == 0xFFF8) {
> >> >> >
> >> >> > something based on testing several positions at once is likely faster
> >> >> > like
> >> >> > x= AV_RB32()
> >> >> > (x & ~(x+0x01010101))&0x80808080
> >> >> > that will detect 0xFF bytes and only after that testing the 4 positions for
> >> >> > FFF8
> >> >>
> >> >> Hmm. ?Since in both cases (header there/header not there) this will
> >> >> require more masks on a 2 byte int how will it be faster?
> >> >> Also since it is 15 bits that we are looking for is the 32 bit
> >> >> handling a mistake?
> >> >
> >> > the code is executed 4 times less often than your 2 byte masking
> >> > see ff_avc_find_startcode_internal() for something quite similar
> >>
> >> Thanks I now see the light - I didn't see at first you meant to
> >> process in 4 byte chunks.
> >> It is about 2-3x faster with the multiple byte processing:
> >> fastest without multibyte processing:
> >> 357748 dezicycles in with more pos testing, 16337 runs, 47 skips
> >> slowest with:
> >> 127551 dezicycles in with more pos testing, 15364 runs, 1020 skips
> >>
> >> (of course there are more skips so it is harder to profile)
> >>
> >> The -benchmark utime dropped down to a range of
> >> utime = 2.232-2.236
> >> vs the prepatch:
> >> utime = 2.049-2.058
> >>
> >> so now it is a slowdown of about 10%.
> >
> > what amount of that is the startcode search and what amount is the crc16 check?
> > note START/STOP_TIMER can easily be used to test this
> 
> 
> Since the startcode search function contains two calls two the
> function that eventually calls the crc functions, and since they don't
> get called every time I had to modify STOP_TIMER to omit the skips and
> have them print out at the same time (not just on the powers of two).
> I don't think omitting the skips makes the profiling too inaccurate,
> but if anything I would guess it makes the startcode+crc profile
> longer (since skips over long running times would occur more often
> with fast calls.)
> Attaching patch for timer just in case anyone wants to verify its
> doing the right stuff.
> 
> 1952876 dezicycles in crc, 2048 runs, 0 skipsts/s
> 298719 dezicycles in startcode+crc search, 18698 runs, 0 skips
> 307619 dezicycles in flac_parse, 20330 runs, 0 skips
> 
> the total running time
> crc                = 1952876 * 2048 = 3999490048
> startcode+crc      = 298719 * 18698 = 5585447862
> flac_parse         = 307619 * 20330 = 6253894270
> 
> so
> startcode = 5585447862 - 3999490048 = 1585957814
> 
> so startcode search is 28%, crc is 72%.
> in the larger picture, profiling the entire flac_parse, startcode is
> 25%, crc is 64% and the other parsing stuff/overhead is 11%.  "other
> parsing stuff/overhead" includes writing to the fifo, occasionally
> copying to a wrap buffer to return a complete frame when it falls
> across the wrap, and other small stuff like scoring a frame.

i see
Can you make the CRC16 check only be run in ambigous cases?
I mean if there are frames with all parameters matching and frame numbers
monotinously increasing and doing so by a equal amount then the crc16 check
maybe isnt needed?
example:
valid looking frames with numbers:
n=0   n=10 n=20 n=30
(no crc check needed)

n=0 n=5 n=10 n=20 n=30
(crc16 check should be run)

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In fact, the RIAA has been known to suggest that students drop out
of college or go to community college in order to be able to afford
settlements. -- The RIAA
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20101022/811ed953/attachment.pgp>