[FFmpeg-devel] [PATCH] correct make test failure from 15261 release until now (15899)

Fri Nov 21 23:21:14 CET 2008

On Fri, Nov 21, 2008 at 11:52:05PM +0200, Siarhei Siamashka wrote:
> On Friday 21 November 2008, Siarhei Siamashka wrote:
> > On Friday 21 November 2008, Michael Niedermayer wrote:
> > > On Fri, Nov 21, 2008 at 09:15:30PM +0100, Vitor Sessak wrote:
> > > > David Geldreich wrote:
> > > > > Hello Guillaume,
> > > > >
> > > > > On 21 Nov. 08 at 17:25, Guillaume POIRIER wrote:
> > > > >> This is not the way to go. Reg tests pass on AMD64/Linux, so the
> > > > >> code must be fixed to work the same on any platform. The md5sum
> > > > >> should not match the results of one platform X, but those of all
> > > > >> platforms.
> > > > >
> > > > > That's why I made another post to tell to ignore my proposed patch.
> > > > >
> > > > > I found no way of making sin/sinf work the same way on all
> > > > > platforms. In my case, OS X PPC and Intel give different results.
> > > > >
> > > > > So changes r15261 and r14982 are incomplete ... they correct the
> > > > > problem for AMD64 but break it on Intel32.
> > > > >
> > > > > We must iterate to find a "stable" sine window generating function.
> > > >
> > > > Even if we find a way to generate a sine window in an arch-independent
> > > > way, the codec still uses floating point in other places, so even if it
> > > > were bit-identical across PPC and I32, I see no reason not to expect a
> > > > different output when testing on ARM or SH or GCC 6.4 or whatever
> > > > we'll encounter in the future. Unless someone tells me why it is supposed
> > > > to work as is, I think that this test should be removed...
> > >
> > > Rate control in video uses floats, and other parts do too; we aren't
> > > seeing problems with these and aren't disabling them.
> > >
> > > If you argue that the wma test should be disabled because it does not
> > > match between some important systems, that's something I can understand.
> > >
> > > But arguing that it should be disabled because it might theoretically
> > > not work on some architecture or some not-yet-existing compiler is, well ...
> > >
> > > And last, I'd say disable the float code for the regression tests;
> > > replacing the whole FFT by a memcpy() if it doesn't match between archs
> > > is a lot better than removing a regression test.
> > > These tests are important to catch bugs early ...
> >
> > Still, what about trying to make regression tests resistant to minor
> > acceptable differences in the generated output?
> 
> OK, let's start brainstorming. Here is the first idea.
> 
> What about comparing the sum of all the samples in the file generated by the
> reference decoder with the sum of the samples generated by the tested decoder?
> If PSNR is good, the difference between these two sums will be reasonably
> small (zero for identical files).
> 
> Of course, just a single check is not enough, so this test can be extended by
> calculating not just a simple sum, but also sums in which each sample is added
> with a + or - sign chosen according to some (semirandom) pattern. The set of
> such values would represent a fingerprint of the decoding result, which is to
> be compared to a fingerprint generated by the reference decoder.
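
One way to sketch such a fingerprint in C, assuming 16-bit PCM input; `fingerprint()`, `FP_PATTERNS` and the xorshift32 sign generator are illustrative choices, not something specified in the thread. Pattern 0 is the plain sum, and every further pattern draws its +1/-1 signs from an integer-only pseudo-random sequence, so the patterns themselves are identical on every platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FP_PATTERNS 8   /* hypothetical: number of sign patterns */

/* xorshift32 step: integer-only, so the sign sequence is the same everywhere.
 * A nonzero state never becomes zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Hypothetical helper: fp[0] is the plain sum, fp[k] (k > 0) a sum with
 * pseudo-random +1/-1 weights derived from pattern seed k. */
static void fingerprint(const int16_t *s, size_t n, int64_t fp[FP_PATTERNS])
{
    assert(s != NULL || n == 0);
    fp[0] = 0;
    for (size_t i = 0; i < n; i++)
        fp[0] += s[i];
    for (int k = 1; k < FP_PATTERNS; k++) {
        /* odd multiplier times nonzero k is nonzero mod 2^32 */
        uint32_t state = 0x9E3779B9u * (uint32_t)k;
        int64_t acc = 0;
        for (size_t i = 0; i < n; i++) {
            int sign = (xorshift32(&state) & 1) ? 1 : -1;
            acc += sign * s[i];
        }
        fp[k] = acc;
    }
}
```

Because every fingerprint value is a signed linear combination of the samples, changing one sample by +1 moves each value by exactly +1 or -1.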
> 
> The differences between the values of the fingerprints of two files will
> probably have a Gaussian distribution according to the central limit theorem
> (at least in the case of minor differences). This set of differences can be
> analyzed by some statistical method. I would suggest having a look at
> http://en.wikipedia.org/wiki/Chi-square_distribution for a start.
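
A sketch of the chi-square idea, under loudly stated assumptions: the per-sample differences are taken to be independent and zero-mean with a known variance `sigma2`, so each +1/-1-weighted sum over `n` samples has variance `n * sigma2`, and the sum of the squared normalized fingerprint differences is then approximately chi-square distributed with `k` degrees of freedom. `chi_square_stat()` and `fingerprint_matches()` are hypothetical names, and the critical value has to come from a chi-square table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: d[j] = fp_test[j] - fp_ref[j] for k patterns over n
 * samples. Assumes independent, zero-mean per-sample errors of variance
 * sigma2, so each d[j] has variance n * sigma2. */
static double chi_square_stat(const int64_t *d, size_t k, size_t n, double sigma2)
{
    assert(k > 0 && n > 0 && sigma2 > 0.0);
    double var = (double)n * sigma2;
    double stat = 0.0;
    for (size_t j = 0; j < k; j++)
        stat += (double)d[j] * (double)d[j] / var;
    return stat;
}

/* Hypothetical decision rule: accept when the statistic stays below a
 * critical value looked up for k degrees of freedom. */
static int fingerprint_matches(const int64_t *d, size_t k, size_t n,
                               double sigma2, double critical)
{
    return chi_square_stat(d, k, n, sigma2) < critical;
}
```

For instance, with k = 8 patterns the 99% critical value is about 20.09. A systematic bias (e.g. a decoder that is consistently off by +1) inflates the plain-sum term and pushes the statistic far above such a threshold.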
> 
> But any kind of empirical test might be useful if it is reliable enough to
> detect bugs in practical cases. One might also try searching the Internet;
> maybe some algorithms for doing such comparisons already exist and there is
> no need to reinvent the wheel :)

Some examples of bugs:
- a flipped bit in the header: it has no effect on the output samples but is
  wrong
- different interleaving of audio and video packets: again no effect on the
  output as such, but AV sync breaks
- some invalid timestamps: again no effect on the output samples
- some missing metadata: again no effect on the output samples
- an optimization causing slight +-1 rounding errors: this, of course, by
  definition cannot be detected by anything that ignores small changes
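
To make the first and last examples concrete, a small C sketch contrasting a tolerant sample comparison with an exact checksum over the raw output bytes. FNV-1a is used here only as a self-contained stand-in for the md5 checksums the regression tests compare, and `samples_close()` is a hypothetical tolerant check with a per-sample tolerance `tol`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over raw bytes, a stand-in for md5. A single changed byte
 * always changes the result, because each XOR and each multiplication by
 * the odd FNV prime is a bijection on 64-bit values. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= UINT64_C(0x100000001b3);             /* FNV prime */
    }
    return h;
}

/* Hypothetical tolerant check: accept if no decoded sample differs by more
 * than tol. It never sees header bytes, timestamps or metadata. */
static int samples_close(const int16_t *a, const int16_t *b, size_t n, int tol)
{
    assert(tol >= 0);
    for (size_t i = 0; i < n; i++) {
        int d = a[i] - b[i];
        if (d > tol || d < -tol)
            return 0;
    }
    return 1;
}
```

With `tol = 1`, `samples_close()` accepts a +1 rounding change in a sample and cannot see a flipped header bit at all, whereas the byte-exact hash catches both.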

IMO there are things that are useful:
1. adding integer support for things (especially ones that cause problems
   in the regression tests); as a side effect, this will make some non-desktop
   CPUs happy ...
2. disabling the part that causes problems between systems (fft->memcpy)
   or even disabling the wma test, as suggested by some
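
As a sketch of item 1, assuming the window has the form w[i] = sin(pi * (i + 0.5) / (2n)) for i = 0..n-1 (whether that is exactly the window the wma decoder needs is an assumption here), the following generates it in Q15 using only integer arithmetic: a Taylor series up to x^11 in Q30 fixed point, evaluated in Horner form. Only 64-bit integer operations are involved and none of them overflow for n <= 4096, so the output is bit-identical on any platform, unlike results that depend on the libm's sin()/sinf(). `sin_q30()` and `sine_window_q15()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_Q30 ((int64_t)1 << 30)
#define PI_Q30  ((int64_t)0xC90FDAA2)   /* round(pi * 2^30) */

/* sin(x) for 0 <= x <= pi/2, argument and result in Q30. Taylor series up to
 * x^11 in Horner form: x(1 - x^2/6(1 - x^2/20(1 - x^2/42(1 - x^2/72(1 - x^2/110))))).
 * All intermediates are positive and fit in int64_t; the approximation error
 * stays far below one Q15 step. */
static int64_t sin_q30(int64_t x)
{
    assert(x >= 0 && x <= PI_Q30 / 2 + 1);
    int64_t x2 = (x * x) >> 30;
    int64_t t = ONE_Q30;
    t = ONE_Q30 - ((x2 * t) >> 30) / 110;
    t = ONE_Q30 - ((x2 * t) >> 30) / 72;
    t = ONE_Q30 - ((x2 * t) >> 30) / 42;
    t = ONE_Q30 - ((x2 * t) >> 30) / 20;
    t = ONE_Q30 - ((x2 * t) >> 30) / 6;
    return (x * t) >> 30;
}

/* Hypothetical helper: w[i] = sin(pi * (i + 0.5) / (2n)) in Q15, i = 0..n-1. */
static void sine_window_q15(int16_t *w, int n)
{
    assert(n > 0 && n <= 4096);
    for (int i = 0; i < n; i++) {
        int64_t x = PI_Q30 * (2 * i + 1) / (4 * (int64_t)n);
        int64_t s = (sin_q30(x) + (1 << 14)) >> 15;   /* Q30 -> Q15, rounded */
        w[i] = (int16_t)(s > 32767 ? 32767 : s);
    }
}
```

The design trades a little accuracy (each entry lands within one Q15 step of the exact value) for determinism; a table built this way is the same on every platform, at least as far as the window is concerned.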

and there are things that IMHO make little sense:
1. trying to distinguish bugs from platform-dependent rounding in the
   encoded bitstream
2. trying to distinguish bugs from platform-dependent rounding in the
   decoded samples

The first is like testing whether two gzip files differ by a typo or by
whitespace changes without decompressing them.

Basically, IMO you are trying to find a complex solution that, even in the
best case, would only half work, while ignoring the obvious and simple
solutions.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes