[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM

Siarhei Siamashka siarhei.siamashka
Sat Jun 28 09:31:42 CEST 2008


On Saturday 28 June 2008, Måns Rullgård wrote:
> Michael Niedermayer <michaelni at gmx.at> writes:
> > Do we have someone who has an ARM CPU and can look into the above issue?
>
> I know exactly why it's different.  In simple_idct.c, the column
> transform contains these lines:
>
>         /* XXX: I did that only to give same values as previous code */
>         a0 = W4 * (col[8*0] + ((1<<(COL_SHIFT-1))/W4));
>
> It's simpler to code that as a0 = W4 * col[0] + (1 << (COL_SHIFT-1)).
> Thinking about it, it only takes one more instruction on NEON, and
> I've fixed that in my tree.  With a little luck, the extra instruction
> can be dual-issued with something else.
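To make the difference between the two forms concrete, here is a minimal C sketch comparing their rounding biases. It assumes the 8-bit simple_idct.c constants W4 = 16383 and COL_SHIFT = 20, which are not stated in this thread. It also collapses the rest of the column sum into one hypothetical accumulator `acc`. `col_out` and `count_mismatches` are illustrative names, not FFmpeg functions.

```c
#include <assert.h>

/* Assumed 8-bit simple_idct.c constants (not quoted in this thread). */
#define W4        16383
#define COL_SHIFT 20

/* Bias the C code effectively adds: W4 * ((1 << (COL_SHIFT-1)) / W4). */
#define BIAS_C      (((1 << (COL_SHIFT - 1)) / W4) * W4)
/* Bias of the simplified form: W4 * col[0] + (1 << (COL_SHIFT-1)). */
#define BIAS_SIMPLE (1 << (COL_SHIFT - 1))

/* Final rounding shift of one column output; `acc` is a hypothetical
   stand-in for W4 * col[0] plus all the other column products. */
static int col_out(int acc, int bias)
{
    return (acc + bias) >> COL_SHIFT;
}

/* Count accumulator values in [0, 1 << COL_SHIFT) for which the two
   biases produce different outputs. */
static int count_mismatches(void)
{
    /* Integer division truncates, so the C bias can only be smaller. */
    assert(BIAS_C <= BIAS_SIMPLE);
    int mismatches = 0;
    for (int acc = 0; acc < (1 << COL_SHIFT); acc++)
        if (col_out(acc, BIAS_C) != col_out(acc, BIAS_SIMPLE))
            mismatches++;
    return mismatches;
}
```

Under those assumptions the C form adds 524256 instead of 524288, and `count_mismatches()` returns 32. Only 32 out of every 2^20 accumulator values round differently, so the outputs match almost everywhere, but not bit-exactly.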

This part has no extra overhead in my fine-tuned version of the
ARMv5TE IDCT:

  ldr    v1, xxx         /* v1 = (((1<<(COL_SHIFT-1))/W4)*W4) */
  [some unrelated instructions to hide load latency]
  smlatt v2, a2, v4, v1  /* A0t = W4 * (col_t[0] + ((1<<(COL_SHIFT-1))/W4)) */

There is no reason why the ARMv6 or NEON versions should have any overhead
either. So getting bit-identical results to the C simple_idct is possible
without sacrificing performance.
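The folded constant costs nothing because W4 * (col[0] + k) and W4 * col[0] + k * W4 are the same integer whenever nothing overflows. The product k * W4 can therefore be preloaded into the accumulator operand of a multiply-accumulate. The minimal C sketch below checks this over the full signed 16-bit range that `smlatt` multiplies. It uses the same assumed constants (W4 = 16383, COL_SHIFT = 20). It also assumes the top halfwords of `a2` and `v4` hold W4 and col[0]. `a0_c`, `a0_folded` and `count_fold_mismatches` are hypothetical names.

```c
#include <assert.h>

/* Assumed 8-bit simple_idct.c constants (not quoted in this thread). */
#define W4        16383
#define COL_SHIFT 20

/* a0 exactly as the C column transform writes it. */
static int a0_c(int col0)
{
    return W4 * (col0 + ((1 << (COL_SHIFT - 1)) / W4));
}

/* a0 with the bias precomputed once, mirroring the quoted sequence:
   v1 holds ((1<<(COL_SHIFT-1))/W4)*W4, and smlatt adds the product of two
   signed top halfwords (assumed to be W4 and col[0]) to v1. */
static int a0_folded(int col0)
{
    static const int bias = ((1 << (COL_SHIFT - 1)) / W4) * W4; /* ldr v1, xxx */
    assert(col0 >= -32768 && col0 <= 32767); /* smlatt takes signed 16-bit halves */
    return bias + W4 * col0;                 /* smlatt v2, a2, v4, v1 */
}

/* Count 16-bit inputs for which the two forms disagree. */
static int count_fold_mismatches(void)
{
    int mismatches = 0;
    for (int col0 = -32768; col0 <= 32767; col0++)
        if (a0_c(col0) != a0_folded(col0))
            mismatches++;
    return mismatches;
}
```

Under these assumptions `count_fold_mismatches()` returns 0. No product in this range exceeds 32 bits, so distributivity holds exactly and the precomputed accumulator reproduces the C result bit for bit.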

> > Ideally, that would be the authors who claimed the code was identical to
> > the C code ...
>
> I wrote the ARMv6 version, but I never made any such claim.  In fact,
> I believe I mentioned at the time that there was a slight difference.
>
> > If we have no one, then we will likely have to disable these IDCTs. I do
> > not want to create files that turn green and pink unless they are played
> > on an ARM CPU ...
>
> I think the ARM CPUs where these apply will be used mostly for
> playback, not encoding, and on those machines every cycle counts.

Yes, that was one of the reasons why I did not strongly insist on disabling
j_rev_dct_ARM at that time (people could get severe performance regressions
and complain about it) :)

In any case, the ARMv6 IDCT still needs heavy optimization; it is not very
fast (on its target devices with ARM11 CPUs, of course).

-- 
Best regards,
Siarhei Siamashka



