[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2
Dmitry Antipov
dmantipov
Mon May 19 11:46:46 CEST 2008
Siarhei Siamashka wrote:
> But your unrolled code still "sucks" :) It has a lot of pipeline stalls
> that could be eliminated. Please read optimization manual and find
> a definition of instruction latency. That will help a lot in optimizing
> code and understanding how CPU works. ARM pipeline is quite simple to
> comprehend and you will immediately spot potential stalls after you
> get more practice with assembly code.
Which docs are you using? As I understand it, this is the main XScale core specification,
apart from the WMMX-specific stuff: http://www.intel.com/design/intelxscale/273473.htm
(I'm slightly confused about the relationship between XScale cores and the ARM{5,7,9,11}
ones).
> Let's try the following. We can start with getting a perfect version of
> 'vsad_intra16_iwmmxt' function first. Once it is done, you can focus on
> optimizing 'pix_sum' function yourself without getting any assistance or
> further hints. Once you manage to get an implementation that does not have
> any pipeline stalls, you should have enough experience and can move on to
> optimizing the rest of functions. Is this plan acceptable for you?
OK, sure. Thank you very much for your assistance and hints.
Dmitry