[Ffmpeg-devel] New Video Codec for low grunt embedded CPU's

Mon Mar 20 11:02:52 CET 2006

Hi,

I need a Codec for Low Power (as in Grunt) CPU's (Im talking less than
100Mhz here.)

Im thinking nothing (other than FLC/FLX which we already use) really
fits the bill, here are the requirements:

1. Animations able to be created using "off the shelf" (which means
windows based) tools.
2. Easy and efficient to implement a conforming decoder.
3. Encoder can be slow and hard to implement, because the devices im
thinking of, dont encode. Encoding is done offline (on a PC).
4. Decoding must be able to be done with simple integer only
mathematics. (32bpp operations OK).
5. Minimum Frame size of 8 x 8 pixels. (Yes its tiny). Frame size
increments in 1 pixel after that (eg, 9x13 is OK).
6. RGB Colour Space, so that the decoder doesnt need to do any colour
space conversion (and waste cycles doing it).
7. Must be able to support 8bpp paletised animations, as well as
15/16bpp (24bpp as well, even though i dont need it).
8. Lossless, at small resolutions I cant afford any encoding artifacts,
as they are visible and look really crappy.

Im sure there are other things, but I think this gives you the idea.

The current thought for encoding is as follows (Without worrying about
the boundary conditions at this stage):

1. Take a frame to encode (call it raw frame), If its an 8bpp palette
image, the frame is stored as the 256 palette entries, followed by the
palette image data.
2. Subtract it from the previous frame (unsigned integer subtraction),
for 8bpp and 24bpp, subtract each byte. For 15/16Bpp subtract each word.
(call this delta frame)
3. Scan through the raw-frame and the delta frame, creating re-ordered
versions, according to fixed patterns (8x8, 16x16 blocks, standard
zig-zag).  If its an 8bpp frame, the palette is skipped for the
re-ordering process.
    ie
    1,2,3,4,5,6,7,8
    2,3,4,5,6,7,8,9
    3,4,5,6,7,8,9,A
    4,5,6,7,8,9,A,B
    5,6,7,8,9,A,B,C
    6,7,8,9,A,B,C,D
    7,8,9,A,B,C,D,E
    8,9,A,B,C,D,E,F

   Start a 1, then do 2,2 then do 3,3,3 then do 4,4,4,4, etc until you
hit F, going back and forth in a zig zag type pattern.
   Similar to how i understand MPEG does it. (Which is probably wrong).

   Do this for both 8x8 and 16x16, and call them raw-reordered-8,
delta-reordered-8, raw-reordered-16 and delta-reordered-16

4. Now we have 6 different versions of the frame. Raw-frame,
Delta-frame, raw-reordered-8, delta-reordered-8, raw-reordered-16 and
delta-reordered-16.  Take each of these, and attempt to compress them
with NRV Compression each of types B,D and E, at level 99.  Smallest
frame size wins, and is stored in the file. Encoded data is preceded by
a 1 byte field which indicates what sort of frame is compressed (of the
6, and what NRV algorithm to use (B,D,E) (18 possible choices).  If none
of the results are smaller than the original raw frame size, the raw
frame is stored (the 19th possible result).
5. The container for the animation is AVI, so a Windows compliant Codec
can be created (satisfying the need to be able to work with standard
windows tools).  And AVI is low complexity.

My Theories are:
a) step 2 produces a delta image that should be well compressed in step
4, because NRV does a good job with runs of the same data pattern, so if
the same pixels appear in 2 subsequent frames, they subtract to 0, even
if adjacent pixels are different, areas that are the same will turn into
big regions of 0.
b) Step 3 clumps data that is spatially common together, to try and
maximize the ability of the NRV compressor
 to find runs of similar data.
c) Step 4 might take a little while to run all 18 permutations and pick
the best, but its worthwhile, because the resulting file will be as
small as it could possibly be with the given algorithm.

Anyway thats the current thought.  The decoder would be easy, because it
is easy to implement hand coded NRV decompressors in Assembler (and they
are very fast on low end processors, as opposed to gzip or bzip type
algorithms, and my experience with them is they are almost as good, and
in some cases, depending on the data better).  And the rest of the
manipulations on decode are trivial.

My intention is to Specify this into a formal spec, and write a
"Reference" Windows Codec and FFMPEG Encoder/Decoder for it.  All of
this I would do under the GPL, and would be submitted for inclusion into
FFMPEG if its of interest.

Are there any comments, or thoughts on this?  Any suggestions on how it
can be improved, without getting too complex on decode?  Is this
something your interested in being put into FFMPEG?  Do you think I
should use a standard codec that already exists, and not re-invent the
wheel?

Steven J