[Ffmpeg-devel] Re: [PATCH] Machine endian bytestream functions

Sun Mar 11 21:36:58 CET 2007

Hi

On Sat, Mar 10, 2007 at 05:15:44PM -0300, Ramiro Polla wrote:
> Hello,
> 
> Reimar D?ffinger escreveu:
> >Hello,
> >On Sat, Mar 10, 2007 at 11:06:41PM -0300, ramiro at lisha.ufsc.br wrote:
> >  
> >>Attached patch makes the AV_{R,W}{L,B}xx macros have a machine endian for
> >>the simple 16 and 32 bit types. Those macros are then #ifdef'd for the
> >>correct endianess. 24 bit remains the same, as it would be more complex.
> >>    
> >
> >They completely ignore alignment issues...
> >
> >  
> You're right.
> 
> Attached patch makes use of machine endianess where unaligned data 
> accesses are possible, and faster than what gcc is currently doing.
> 
> I have only tested this on a p4, but the following program should detect 
> this on any architecture. Compile bytes.c and main.c with the same 
> options FFmpeg gives to libavcodec files, link them, and test the speed 
> both for patched and unpatched FFmpeg. bytes.c should be changed to 'be' 
> on big-endian architectures.
> 
> Regression tests pass.
> 
> Ramiro Polla

> Index: configure
> ===================================================================
> --- configure	(revis?o 8316)
> +++ configure	(c?pia de trabalho)
> @@ -602,6 +602,7 @@
>      dlopen
>      fast_64bit
>      fast_cmov
> +    fast_unaligned
>      freetype2
>      imlib2
>      inet_aton
> @@ -737,6 +738,7 @@
>  mmx="default"
>  cmov="no"
>  fast_cmov="no"
> +fast_unaligned="no"
>  armv5te="default"
>  armv6="default"
>  iwmmxt="default"
> @@ -951,9 +953,11 @@
>  case "$arch" in
>    i386|i486|i586|i686|i86pc|BePC)
>      arch="x86_32"
> +    enable fast_unaligned
>    ;;
>    x86_64|amd64)
>      arch="x86_32"
> +    enable fast_unaligned
>      canon_arch="`$cc -dumpmachine | sed -e 's,\([^-]*\)-.*,\1,'`"
>      if [ x"$canon_arch" = x"x86_64" -o x"$canon_arch" = x"amd64" ]; then
>        if [ -z "`echo $CFLAGS | grep -- -m32`"  ]; then

maybe configure should rather have a generic test which checks which version
is faster? (it would be much easier to maintain instead of keeping track what
is faster for which cpu ...)

> Index: libavutil/intreadwrite.h
> ===================================================================
> --- libavutil/intreadwrite.h	(revis?o 8316)
> +++ libavutil/intreadwrite.h	(c?pia de trabalho)
> @@ -47,8 +47,8 @@
>  #define AV_RB8(x)  (((uint8_t*)(x))[0])
>  #define AV_WB8(p, d)  { ((uint8_t*)(p))[0] = (d); }
>  
> -#define AV_RB16(x) ((((uint8_t*)(x))[0] << 8) | ((uint8_t*)(x))[1])
> -#define AV_WB16(p, d) { \
> +#define AV_RBE16(x) ((((uint8_t*)(x))[0] << 8) | ((uint8_t*)(x))[1])
> +#define AV_WBE16(p, d) { \

cosmetic, rejected

[...]
>  
> +#ifdef HAVE_FAST_UNALIGNED
> +# ifdef WORDS_BIGENDIAN
> +#  define AV_RL16(x)    AV_RLE16(x)
> +#  define AV_RL32(x)    AV_RLE32(x)
> +#  define AV_RB16(x)    LD16(x)
> +#  define AV_RB32(x)    LD32(x)
> +#  define AV_WL16(p, d) AV_WLE16(p, d)
> +#  define AV_WL32(p, d) AV_WLE32(p, d)
> +#  define AV_WB16(p, d) ST16(p, d)
> +#  define AV_WB32(p, d) ST32(p, d)
> +# else
> +#  define AV_RL16(x)    LD16(x)
> +#  define AV_RL32(x)    LD32(x)
> +#  define AV_RB16(x)    AV_RBE16(x)
> +#  define AV_RB32(x)    AV_RBE32(x)
> +#  define AV_WL16(p, d) ST16(p, d)
> +#  define AV_WL32(p, d) ST32(p, d)
> +#  define AV_WB16(p, d) AV_WBE16(p, d)
> +#  define AV_WB32(p, d) AV_WBE32(p, d)
> +# endif
> +#else
> +# define AV_RL16(x)    AV_RLE16(x)
> +# define AV_RL32(x)    AV_RLE32(x)
> +# define AV_RB16(x)    AV_RBE16(x)
> +# define AV_RB32(x)    AV_RBE32(x)
> +# define AV_WL16(p, d) AV_WLE16(p, d)
> +# define AV_WL32(p, d) AV_WLE32(p, d)
> +# define AV_WB16(p, d) AV_WBE16(p, d)
> +# define AV_WB32(p, d) AV_WBE32(p, d)
> +#endif

#ifdef HAVE_FAST_UNALIGNED
#   ifdef WORDS_BIGENDIAN
        define AV_RB16(x)    LD16(x)
        define AV_RB32(x)    LD32(x)
        define AV_WB16(p, d) ST16(p, d)
        define AV_WB32(p, d) ST32(p, d)
#   else
        define AV_RL16(x)    LD16(x)
        define AV_RL32(x)    LD32(x)
        define AV_WL16(p, d) ST16(p, d)
        define AV_WL32(p, d) ST32(p, d)
#   endif
#endif

also it might be worth to check bswap(LD32(x)) style too ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070311/08d0027a/attachment.pgp>