[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Michael Niedermayer michaelni
Wed Jan 16 04:02:47 CET 2008


On Wed, Jan 16, 2008 at 04:05:54AM +0200, Ivan Kalvachev wrote:
[...]
> On Jan 14, 2008 11:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Mon, Jan 14, 2008 at 09:12:40PM +0100, Christophe GISQUET wrote:
> > [...]
> > > > - Am I wrong, or do you do all the math in 16-bit signed saturation mode?
> > > > According to the VC-1 draft, in the first stage the input is in the range
> > > > [-2048;2047] and the multiply constants are in the range [-16;16]; this
> > > > gives a range of [-32768;32768] per multiply, and you can have 8 of them.
> > > > Or, with multiply constants in the range [-22;22], that gives a range of
> > > > [-45056;45056] per multiply, and you can have 4 of them.
> >
> > you are missing a detail here:
> > 45056 >> 3 would be > 4096 and thus could violate the limit for the 2nd stage
> > input. Still, with the 512 limit on the output and the >>7 before it, it does
> > not look like the naive implementation will work with 16 bits
> 
> It's not my fault that M$ cannot do math ;)
> The draft actually says that the intermediate results have to be
> saturated to that range. So it is possible that the C variant also
> doesn't work according to the specs. I wonder what the reference
> source does?

Probably something else entirely :)
And I wouldn't be surprised if their own binary encoder + their binary
decoder had overflows that badly trash the image for the same inputs.
This happened with their msmpeg4v3 codec (e.g. white text over a black
background, low QP, P-frames); they added a bit in one of the WMV versions,
IIRC. But it still wasn't enough to handle the max DCT coeff range :)

So I wouldn't take it too seriously if MS says something about the max ranges
of values in the IDCT.


> 
> 
> > > > In the second phase the input range is doubled to [-4096,4095]
> > > >
> > > > Are you sure your transforms produce the same result as their _c equivalents?
> > >
> > > I did test bit exactness (against the win32 dll output), albeit on only a
> > > few sequences. Everything was perfect.
> > >
> > > The reference I found said it could be done with 16-bit math.  Maybe it
> > > needs a bias to correct, but as the output is usually in the range
> > > [-128;127], it's pretty symmetrical. However, it would indeed be better
> > > if a proof could be given.
> >
> > there's a difference between "can be done" and "it works with the naive
> > implementation"
> >
> > as a random example:
> >
> > naive:
> > (22*X+17*Y) >> 3 will not work with 16 bits when X and/or Y = 2048
> >
> > alternative:
> > ((X + ((X + (Y>>1))>>1))>>1) + 2*(X+Y) should work fine
> >
> >
> > there are of course many intermediate variants
> > the key point to keep in mind is that (2*x + y)>>1 == x + (y>>1)
> 
> I wonder if there is some collection of nasty mmx tricks, like the above one.

FFmpeg's source ;)
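
As a rough illustration of the trick quoted above (this is not code from
FFmpeg or from the patch), here is a minimal C sketch that compares the naive
(22*X+17*Y)>>3 with the shift-and-add form and records whether every
intermediate would fit a signed 16-bit lane. All function names are made up
for this example, inputs are assumed to lie in [-2048;2047], and >> on
negative ints is assumed to be an arithmetic shift (as with gcc, and as the
MMX psraw instruction does).

```c
#include <stdint.h>

/* Hypothetical helper: does v fit in one signed 16-bit lane? */
int fits_s16(int v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Reference result, computed in plain int so it cannot overflow here. */
int ref_shr3(int x, int y)
{
    return (22 * x + 17 * y) >> 3;
}

/* Would the naive form survive in 16 bits? Checks both products and the sum. */
int naive_fits_s16(int x, int y)
{
    return fits_s16(22 * x) && fits_s16(17 * y) && fits_s16(22 * x + 17 * y);
}

/* Shift-and-add form from the mail; *ok is cleared if any intermediate
 * value leaves the signed 16-bit range. */
int alt_shr3(int x, int y, int *ok)
{
    int t1 = x + (y >> 1);          /* X + Y/2               */
    int t2 = x + (t1 >> 1);         /* 3X/2 + Y/4            */
    int t3 = x + y;                 /* X + Y                 */
    int r  = (t2 >> 1) + 2 * t3;    /* 11X/4 + 17Y/8         */

    *ok = fits_s16(t1) && fits_s16(t2) && fits_s16(t3) &&
          fits_s16(2 * t3) && fits_s16(r);
    return r;
}

/* Exhaustive check over the assumed input range; returns the number of
 * (x, y) pairs where the result differs from the reference or where an
 * intermediate does not fit in 16 bits. */
long count_failures(void)
{
    long bad = 0;
    for (int x = -2048; x <= 2047; x++)
        for (int y = -2048; y <= 2047; y++) {
            int ok;
            if (alt_shr3(x, y, &ok) != ref_shr3(x, y) || !ok)
                bad++;
        }
    return bad;
}
```

Under these assumptions count_failures() should return 0 when called from a
small main, while naive_fits_s16(2047, 0) is false because 22*2047 = 45034
already exceeds 32767. Each step is an application of
(2*x + y)>>1 == x + (y>>1), which is why the rewritten form stays bit exact.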

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have never wished to cater to the crowd; for what I know they do not
approve, and what they approve I do not know. -- Epicurus