[FFmpeg-devel] [RFC] Loop unrolling in C code for 'vector_fmul_*' functions

mark cox melbournemark+ffmpeg
Fri Jan 11 00:44:07 CET 2008


On Jan 11, 2008 7:39 AM, Michael Niedermayer <michaelni at gmx.at> wrote:

> On Tue, Jan 08, 2008 at 02:20:07AM +0200, Siarhei Siamashka wrote:
> [...]
> > But at least for ARM, it looks like the compiler is quite stupid and can't
> > schedule instructions properly, as seen from the benchmark results (just
> > unrolling the loop is not enough, and some extra tweaks are needed in
> > 'vector_fmul_c_other_unrolled'). The VFP coprocessor has a high result
> > latency (8 cycles), though its throughput is quite good (1 cycle), and it
> > has some other nice features that can improve performance (documentation
> > for VFP can be found at http://www.arm.com). The compiler (gcc) does not
> > even try to reorder instructions, and the pipeline is just stalled most of
> > the time. I would not be surprised if the compiler screwed up and generated
> > something suboptimal on more complicated floating point stuff as well (fft
> > and imdct).
>
> Please submit reports to the gcc developers for every case of suboptimal code
> generated by gcc you stumble across!
> It's much better if gcc is improved instead of everyone having to hand
> schedule C code.
>
>
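A minimal sketch of the kind of hand-scheduling described above, assuming `vector_fmul` computes `dst[i] *= src[i]`, that `len` is a multiple of 8, and that `dst` and `src` do not partially overlap. The name `vector_fmul_unrolled8` is hypothetical; this is not the 'vector_fmul_c_other_unrolled' code from the benchmark. Grouping the loads, then eight independent multiplies, then the stores lets the multiplies issue back to back, so each store sits several instructions after the multiply that feeds it instead of directly behind it. That is the kind of latency hiding an 8-cycle VFP result latency calls for.

```c
#include <assert.h>

/*
 * Hypothetical hand-scheduled variant of vector_fmul.
 * Assumptions: dst[i] *= src[i] semantics, len is a multiple of 8,
 * and dst/src do not partially overlap.
 */
void vector_fmul_unrolled8(float *dst, const float *src, int len)
{
    int i;

    assert((len & 7) == 0);
    for (i = 0; i < len; i += 8) {
        /* Issue all loads for this block of eight elements first. */
        float d0 = dst[i + 0], d1 = dst[i + 1], d2 = dst[i + 2], d3 = dst[i + 3];
        float d4 = dst[i + 4], d5 = dst[i + 5], d6 = dst[i + 6], d7 = dst[i + 7];
        float s0 = src[i + 0], s1 = src[i + 1], s2 = src[i + 2], s3 = src[i + 3];
        float s4 = src[i + 4], s5 = src[i + 5], s6 = src[i + 6], s7 = src[i + 7];

        /* Eight independent multiplies that can issue back to back. */
        d0 *= s0; d1 *= s1; d2 *= s2; d3 *= s3;
        d4 *= s4; d5 *= s5; d6 *= s6; d7 *= s7;

        /* Stores come last in source order, after all the multiplies. */
        dst[i + 0] = d0; dst[i + 1] = d1; dst[i + 2] = d2; dst[i + 3] = d3;
        dst[i + 4] = d4; dst[i + 5] = d5; dst[i + 6] = d6; dst[i + 7] = d7;
    }
}
```

Nothing forces gcc to keep this order, so whether it helps on a particular target has to be confirmed with a benchmark and a look at the generated code.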
> >
> > By tweaking the C code, performance can be improved quite a lot
> > ('vector_fmul_c_other_unrolled' vs. 'vector_fmul_c_unrolled').
> > But unnecessarily cluttering code because of inefficient compilers is not
> > a good option. Anyway, perhaps at least the loops can be unrolled to help
> > the compiler do its job? The compiler itself does not know that 'len is a
> > multiple of 8', so manual loop unrolling seems reasonable.
>
> Add an assert((len & 7) == 0); and the compiler can know it.
>

That is a really interesting statement. Are you saying that gcc will
generate better code if such an assert is added? This is the first I have
heard of this. Such code annotations could probably help in many places.

mark
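A minimal sketch of Michael's suggestion, assuming the same `dst[i] *= src[i]` semantics; `vector_fmul_assert` is a hypothetical name. Because the failure path of `assert()` does not return, code after the assert is only reached when `(len & 7) == 0`, so a compiler is in principle free to use that fact when unrolling the loop. Whether a given gcc version actually does so is not established in this thread and is worth checking in the generated assembly.

```c
#include <assert.h>

/*
 * Hypothetical plain-loop vector_fmul that states its precondition.
 * Past the assert, (len & 7) == 0 holds, which a compiler may (but is
 * not required to) exploit when unrolling by 8.
 */
void vector_fmul_assert(float *dst, const float *src, int len)
{
    int i;

    assert((len & 7) == 0); /* expands to nothing under -DNDEBUG */
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}
```

One caveat: when built with `-DNDEBUG`, `assert()` expands to nothing, so the hint disappears together with the check.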

>
>
> >
> > Well, I will do the rest of ARM VFP optimizations for all
> > these 'vector_fmul_*' functions anyway :)
>
> good
>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> In a rich man's house there is no place to spit but his face.
> -- Diogenes of Sinope
>
