[FFmpeg-devel] [PATCH] ARM: remove useless stack push/pop

Måns Rullgård mans
Wed Jun 9 01:43:54 CEST 2010

Rafaël Carré <rafael.carre at gmail.com> writes:

> Hi,
> r12 doesn't need to be saved in called functions because it's a scratch
> register.
> While I'm here, has anyone tried to build FFmpeg with -mthumb yet?

Yes, gcc generated invalid asm.  For that reason, and others, we force
-marm.  There is no gain from using thumb with ffmpeg.

> "grep -Er '(pop|ldm).*pc' libavcodec/arm" shows that there are a lot of
> functions which can't be called from Thumb on ARMv4T: using ldm ...,pc
> will not perform the switch from ARM to Thumb on these CPUs.

So use interworking if you need to.  Any decent linker supports that.

> If you want to support both thumb code and armv4t this needs changing
> to use 1 more instruction (without speed cost on anything but arm7tdmi
> where it would take 1 more cycle to return).

Thumb doesn't work anyway, so there's no point.

See also a blog post I did some time ago on the topic.  Perhaps I
should revisit that.

BTW, many, if not most, Cortex-A8 chips in the field have hardware
bugs rendering any mixing of Thumb and ARM code unreliable.  Older
cores work, but most of those are pre-Thumb2 and the speed penalty
there is too great for FFmpeg.

> diff --git a/libavcodec/arm/jrevdct_arm.S b/libavcodec/arm/jrevdct_arm.S
> index 4fcf351..4ce37d0 100644
> --- a/libavcodec/arm/jrevdct_arm.S
> +++ b/libavcodec/arm/jrevdct_arm.S
> @@ -58,7 +58,7 @@
>          .align
>  function ff_j_rev_dct_arm, export=1
> -        stmdb   sp!, { r4 - r12, lr }   @ all callee saved regs
> +        stmdb   sp!, { r4 - r11, lr }   @ all callee saved regs
>          sub sp, sp, #4                  @ reserve some space on the stack
>          str r0, [ sp ]                  @ save the DCT pointer to the stack
> @@ -369,7 +369,7 @@ empty_odd_column:
>  the_end:
>          @ The end....
>          add sp, sp, #4
> -        ldmia   sp!, { r4 - r12, pc }   @ restore callee saved regs and return
> +        ldmia   sp!, { r4 - r11, pc }   @ restore callee saved regs and return

Does this function call any other functions?  If so, the stack must
maintain 8-byte alignment, and this is the easiest way to accomplish
that.  Not that you'd want to use that DCT implementation anyway.
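
The alignment question can be checked with simple arithmetic.  The
following is only a sketch, not part of the patch: it assumes sp is
8-byte aligned on entry (as the ARM procedure call standard requires at
public interfaces), that each pushed register takes 4 bytes, and that
the only other stack adjustment is the "sub sp, sp, #4" visible in the
hunk above.  The helper names are made up for illustration.

```python
WORD = 4  # bytes per 32-bit ARM core register


def frame_bytes(num_pushed_regs, extra_reserved=0):
    """Bytes by which sp has dropped after the prologue."""
    return num_pushed_regs * WORD + extra_reserved


def is_8_byte_aligned(offset):
    """True if sp, 8-byte aligned on entry, is still aligned after dropping by offset."""
    return offset % 8 == 0


# Original prologue: r4-r12 plus lr is 10 registers, then 4 bytes reserved.
original = frame_bytes(10, extra_reserved=4)
# Patched prologue: r4-r11 plus lr is 9 registers, then 4 bytes reserved.
patched = frame_bytes(9, extra_reserved=4)

for name, size in (("original", original), ("patched", patched)):
    print(name, size, "aligned" if is_8_byte_aligned(size) else "misaligned")
```

Under those assumptions the push alone is 8-byte aligned only with an
even number of registers, and the extra 4-byte reservation shifts the
result, so the whole prologue has to be counted, not just the stmdb.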

Måns Rullgård
mans at mansr.com
