[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions
Loren Merritt
lorenm
Fri Nov 16 03:25:08 CET 2007
On Thu, 15 Nov 2007, Christophe GISQUET wrote:
> Allow me a rather long explanation (which should prove I'm wrong anyway)...
>
> This is the horizontal pass of the bicubic filter when there is an
> initial vertical pass beforehand. From the code, you have probably
> noticed the filter coefficients are [-1 9 9 -1] or [-3 18 53 -4], with a
> scaling depending on the pair of filters used.
>
> Let's start from the first pass then, the vertical one. Imagine the
> input values:
> 255 255 255 255
> 255 255 255 255
> 255 255 255 255
> 255 255 255 255
> The filters applied can be [-3 18 53 -4]/2^n or [-1 9 9 -1]/2^n, so
> ignoring rounding, output would be: 2040 2040 2040 2040
>
> Now if we apply the filter in the above function, considering input:
> p1 p2 p3 p4 which are int16_t values
> If we were to use a multiplication, that would give:
> -(p1+p4)+(p2+p3)*9
> The intermediate result (p2+p3)*9 with the above test case is
> (2040+2040)*9=36720, which doesn't fit in an int16_t value. Other values
> in the expression are also signed, so we can't use uint16_t arithmetic
> either. My code was a feeble attempt at mitigating this. I made an
> identical and equally wrong comment for the other functions.
>
> Discovering this, I tried to prescale the input values, which obviously
> fails. To sum it up, it seems to me 17-bit arithmetic is needed here
> (contrary to what the document I have claims). I don't have the
> SMPTE 421M official documentation, but from what I saw, it now makes me
> think I can scrap my whole code for the horizontal pass. I don't see how
> to overcome this except by using pmaddwd.
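
To make the overflow in the quoted example concrete, here is a minimal C
sketch. The helper names are hypothetical and not from the patch.
mul9_lane16 models a 16-bit SIMD lane by truncating the product modulo
2^16, and assumes the usual two's-complement wrap when converting to
int16_t.

```c
#include <stdint.h>

/* Hypothetical helper: 9*(p2+p3) computed exactly, in 32-bit arithmetic. */
static int32_t mul9_exact(int16_t p2, int16_t p3)
{
    return 9 * ((int32_t)p2 + p3);
}

/* Hypothetical helper: the same product as a 16-bit lane would hold it,
 * i.e. truncated modulo 2^16. The final conversion to int16_t is
 * implementation-defined in C; two's-complement wrap is assumed here. */
static int16_t mul9_lane16(int16_t p2, int16_t p3)
{
    uint16_t sum = (uint16_t)(p2 + p3);
    return (int16_t)(uint16_t)(9u * sum);
}
```

With p2 = p3 = 2040, mul9_exact gives 36720, while mul9_lane16 gives
36720 - 65536 = -28816.
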
(a) Bias it. From your explanation, the possible values should be in
[-4080,36720], which is 16 bits of dynamic range; it's just not nicely
centered on 0.
(b) Instead of (-(p1+p4)+(p2+p3)+((p2+p3)<<3))>>1, do
(((p2+p3)-(p1+p4))>>1)+((p2+p3)<<2)
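
Both suggestions can be sketched in scalar C as below. This is only an
illustration under stated assumptions, not the patch's MMX code: the
function names and the BIAS value of 16384 are hypothetical, inputs are
assumed to lie in [0,2040] as in the example above, >> on negative values
is assumed to be an arithmetic shift, and conversion to int16_t is assumed
to wrap in two's complement. The uint16_t arithmetic stands in for wrapping
paddw/psubw/pmullw.

```c
#include <stdint.h>

#define BIAS 16384 /* assumption: an even constant that recenters [-4080,36720] */

/* Reference: the [-1 9 9 -1]/2 filter evaluated in 32-bit arithmetic. */
static int32_t filter_ref(int16_t p1, int16_t p2, int16_t p3, int16_t p4)
{
    return (9 * (p2 + p3) - (p1 + p4)) >> 1;
}

/* (a) Bias: do everything modulo 2^16 and subtract BIAS, so the value that
 * is finally read as signed lies in [-20464,20336]. Add BIAS/2 back after
 * the shift; this is exact because BIAS is even. */
static int16_t filter_biased(int16_t p1, int16_t p2, int16_t p3, int16_t p4)
{
    uint16_t t = (uint16_t)(9u * (uint16_t)(p2 + p3)
                            - (uint16_t)(p1 + p4) - BIAS);
    int16_t centered = (int16_t)t;
    return (int16_t)((centered >> 1) + (BIAS >> 1));
}

/* (b) Rearranged: shift the small difference first, then add 4*(p2+p3).
 * Every intermediate stays within [-4080,18360]. */
static int16_t filter_rearranged(int16_t p1, int16_t p2, int16_t p3, int16_t p4)
{
    int16_t s = (int16_t)(p2 + p3);        /* [0,4080]     */
    int16_t d = (int16_t)(s - (p1 + p4));  /* [-4080,4080] */
    return (int16_t)((d >> 1) + (s << 2)); /* s<<2 <= 16320 */
}
```

For the all-2040 input, all three return 16320. They also agree at the
extremes (2040,0,0,2040) and (0,2040,2040,0), which give -2040 and 18360.
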
--Loren Merritt