[FFmpeg-cvslog] r9504 - trunk/libavcodec/bitstream.c
Michael Niedermayer
michaelni
Fri Jul 6 17:06:22 CEST 2007
Hi
On Fri, Jul 06, 2007 at 04:30:50PM +0200, Aurelien Jacobs wrote:
> On Fri, 6 Jul 2007 16:14:41 +0200 (CEST)
> aurel <subversion at mplayerhq.hu> wrote:
>
> > Author: aurel
> > Date: Fri Jul 6 16:14:41 2007
> > New Revision: 9504
> >
> > Log:
> > simplify ff_copy_bits: merge 2 test branches
>
> It seems ff_copy_bits was written this way for speed reasons.
> It would be easy to simplify it, but this could hurt speed.
>
> I placed some START/STOP TIMER at the beginning/end of ff_copy_bits.
> Here are the numbers I get:
>
> 15600 dezicycles in ff_copy_bits, 1 runs, 0 skips
> 11815 dezicycles in ff_copy_bits, 2 runs, 0 skips
> 9695 dezicycles in ff_copy_bits, 4 runs, 0 skips
> 9357 dezicycles in ff_copy_bits, 8 runs, 0 skips
> 9465 dezicycles in ff_copy_bits, 16 runs, 0 skips
> 8726 dezicycles in ff_copy_bits, 32 runs, 0 skips
> 10435 dezicycles in ff_copy_bits, 61 runs, 3 skips
> 10420 dezicycles in ff_copy_bits, 124 runs, 4 skips
> 10612 dezicycles in ff_copy_bits, 249 runs, 7 skips
> 10358 dezicycles in ff_copy_bits, 502 runs, 10 skips
> 9204 dezicycles in ff_copy_bits, 1011 runs, 13 skips
> 10244 dezicycles in ff_copy_bits, 2034 runs, 14 skips
> 9484 dezicycles in ff_copy_bits, 4081 runs, 15 skips
> 8250 dezicycles in ff_copy_bits, 8175 runs, 17 skips
> 6108 dezicycles in ff_copy_bits, 16367 runs, 17 skips
>
> Now if I simplify the function further using the attached patch:
>
> 15750 dezicycles in ff_copy_bits, 1 runs, 0 skips
> 12415 dezicycles in ff_copy_bits, 2 runs, 0 skips
> 10635 dezicycles in ff_copy_bits, 4 runs, 0 skips
> 10417 dezicycles in ff_copy_bits, 8 runs, 0 skips
> 9914 dezicycles in ff_copy_bits, 16 runs, 0 skips
> 9368 dezicycles in ff_copy_bits, 32 runs, 0 skips
> 9491 dezicycles in ff_copy_bits, 63 runs, 1 skips
> 12255 dezicycles in ff_copy_bits, 126 runs, 2 skips
> 12125 dezicycles in ff_copy_bits, 251 runs, 5 skips
> 11608 dezicycles in ff_copy_bits, 504 runs, 8 skips
> 13245 dezicycles in ff_copy_bits, 1014 runs, 10 skips
> 12574 dezicycles in ff_copy_bits, 2038 runs, 10 skips
> 11837 dezicycles in ff_copy_bits, 4085 runs, 11 skips
> 9908 dezicycles in ff_copy_bits, 8178 runs, 14 skips
> 7013 dezicycles in ff_copy_bits, 16370 runs, 14 skips
>
> The difference doesn't seem very significant, but it's slightly slower.
>
> Now if I try to write bytes instead of words to simplify a bit more:
>
> 26470 dezicycles in ff_copy_bits, 1 runs, 0 skips
> 21670 dezicycles in ff_copy_bits, 2 runs, 0 skips
> 19535 dezicycles in ff_copy_bits, 4 runs, 0 skips
> 19517 dezicycles in ff_copy_bits, 8 runs, 0 skips
> 18604 dezicycles in ff_copy_bits, 16 runs, 0 skips
> 17125 dezicycles in ff_copy_bits, 32 runs, 0 skips
> 19126 dezicycles in ff_copy_bits, 63 runs, 1 skips
> 20532 dezicycles in ff_copy_bits, 126 runs, 2 skips
> 20529 dezicycles in ff_copy_bits, 252 runs, 4 skips
> 21121 dezicycles in ff_copy_bits, 506 runs, 6 skips
> 20705 dezicycles in ff_copy_bits, 1015 runs, 9 skips
> 20402 dezicycles in ff_copy_bits, 2039 runs, 9 skips
> 18575 dezicycles in ff_copy_bits, 4085 runs, 11 skips
> 15700 dezicycles in ff_copy_bits, 8180 runs, 12 skips
> 10729 dezicycles in ff_copy_bits, 16372 runs, 12 skips
>
> The difference seems too big for such a small simplification.
>
> So I would personally use the simplification in the proposed patch.
> What do you think about it?
>
> PS: note that in my tests, the length parameter varied between 6
> and 700. The speed difference would probably be larger
> if ff_copy_bits() were used with bigger lengths.
Use a higher bitrate, big slices, data partitioning and 2 threads,
or some combination of those ...
[...]
> Index: libavcodec/bitstream.c
> ===================================================================
> --- libavcodec/bitstream.c (révision 9504)
> +++ libavcodec/bitstream.c (copie de travail)
> @@ -69,16 +69,7 @@
>
> if(length==0) return;
>
> - if(words < 16 || put_bits_count(pb)&7){
> for(i=0; i<words; i++) put_bits(pb, 16, be2me_16(srcw[i]));
> - }else{
> - for(i=0; put_bits_count(pb)&31; i++)
> - put_bits(pb, 8, src[i]);
> - flush_put_bits(pb);
> - memcpy(pbBufPtr(pb), src+i, 2*words-i);
> - skip_put_bytes(pb, 2*words-i);
> - }
> -
> put_bits(pb, bits, be2me_16(srcw[words])>>(16-bits));
Hmm, what about placing the simplification under #ifdef CONFIG_SMALL
or putting something like || ENABLE_SMALL in the if()?
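
To illustrate the second variant, here is a rough, self-contained sketch.
It is not the real libavcodec code: ToyBitWriter, toy_put_bits and
toy_copy_bits are hypothetical stand-ins for PutBitContext, put_bits and
ff_copy_bits, the threshold of 32 bytes is arbitrary, and the fast path
only handles the byte-aligned case (the real one first aligns to 32 bits).
ENABLE_SMALL is assumed to be a 0/1 compile-time constant, so the compiler
can drop the memcpy branch entirely when it is 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed build-time switch (0 or 1); defined here only so the sketch compiles. */
#ifndef ENABLE_SMALL
#define ENABLE_SMALL 0
#endif

/* Toy MSB-first bit writer, a stand-in for PutBitContext. */
typedef struct {
    uint8_t *buf;     /* output buffer, must be large enough */
    size_t   bit_pos; /* number of bits written so far */
} ToyBitWriter;

/* Write the low n bits of value, most significant bit first. */
static void toy_put_bits(ToyBitWriter *w, int n, unsigned value)
{
    for (int i = n - 1; i >= 0; i--) {
        unsigned bit   = (value >> i) & 1u;
        size_t   byte  = w->bit_pos >> 3;
        int      shift = 7 - (int)(w->bit_pos & 7);
        w->buf[byte] = (uint8_t)((w->buf[byte] & ~(1u << shift)) | (bit << shift));
        w->bit_pos++;
    }
}

/* Copy length bits from src into the writer. */
static void toy_copy_bits(ToyBitWriter *w, const uint8_t *src, size_t length)
{
    size_t bytes = length >> 3;       /* whole bytes to copy */
    int    bits  = (int)(length & 7); /* leftover bits after the whole bytes */
    size_t i;

    if (length == 0) return;

    /* With ENABLE_SMALL == 1 the condition is constant-true, so only the
     * simple loop remains; otherwise the memcpy path is kept for large,
     * byte-aligned copies. */
    if (ENABLE_SMALL || bytes < 32 || (w->bit_pos & 7)) {
        for (i = 0; i < bytes; i++) toy_put_bits(w, 8, src[i]);
    } else {
        memcpy(w->buf + (w->bit_pos >> 3), src, bytes);
        w->bit_pos += bytes * 8;
    }

    if (bits) toy_put_bits(w, bits, (unsigned)src[bytes] >> (8 - bits));
}
```

Building with -DENABLE_SMALL=1 should leave only the loop, while the
default build keeps both paths; the output bytes are the same either way.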
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes