[FFmpeg-devel] a64 encoder 7th round

Michael Niedermayer michaelni
Sun Feb 1 17:07:48 CET 2009


On Sun, Feb 01, 2009 at 12:11:15PM +0100, Bitbreaker/METALVOTZE wrote:
[...]
> > you don't load them
> > i did mention this, didn't i? Reuse data from the previous frame or 4th previous
> > depending on mem layout, like P frames.
> > you have 1000 byte of this colorram thing storing the 5th color, why do you
> > change it every frame?
> > change it every 2 or 4, and ideally let the encoder decide when to update
> > within some limit instead of hardcoding an every 4 frame update.
> >   
> All those solutions also put quite some overhead on the displayer 
> and make the framesize vary, which makes the usage of the buffers kind of 
> complex. I request the desired packet size in advance and then get as 
> many bytes sent as requested. It is all such a pain for no gain. I have 
> already made quite some attempts to speed things up; here larger frames 
> brought an improvement (before, i had a maximum framesize of 0x100), but all 
> the rest didn't result in a better framerate (e.g. the RLE attempt).
> 
> Loading happens roughly like this (lifetime is 4, so a frame is 0x600 + 
> colorram):
> - request another packet with 0x200 bytes
> - receive a packet with 0x400 bytes data (request for that already sent 
> in last round).
> - now i read the data from the received packet while the pc is already 
> sending the next data, as i requested it beforehand to save time
> - request 0xxxx for colorram
> - read the 0x200 bytes left from previous request.
> - request next frame data 0x400
> - wait for 2nd vsync
> - read colorram data and write it immediately to colorram
> - loop
> 

> > also what's this 18700 cycles thing?
> > 4col mode can be done in 2vsync
> > 5col needs 3vsync
> > the difference is 25*40=1000 byte per frame, for which our 8 entry LUT
> > would need ~10 cycles per byte, that's 10k cycles not 18.7k
> > and as you said your normal copy is not optimal either so there must
> > be more headroom if 4col works at full speed currently.
> >   
> Updating must start synced to vsync and be at most as fast as the 
> rasterbeam, or else you will see the update on the screen.

I really don't see why.
IMO, with
rv = raster beam velocity
wv = write velocity
it is enough that
wv > rv/2
if you are faster than the rasterbeam you of course must start sufficiently
behind the rasterbeam so as not to cross it during the copy.
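
One way to read the wv > rv/2 bound above, as a rough sketch only. The
assumptions are mine and not from the thread: velocities are in bytes per
cycle, vertical blanking is ignored, the copy runs top to bottom over
`screen` bytes, and both function names are made up for illustration.

```python
def slow_copy_is_safe(wv, rv, screen=1000):
    """Copy slower than the beam (wv < rv), started right behind it at the top.

    The beam runs ahead, wraps around and then chases the copy from behind.
    The copy is safe if it finishes before the beam has caught up with it.
    """
    if not 0 < wv < rv:
        raise ValueError("this case needs 0 < wv < rv")
    finish = screen / wv        # time the copy needs for the whole screen
    lap = screen / (rv - wv)    # time until the beam has gained a full screen
    return finish < lap         # algebraically the same as wv > rv/2


def min_head_start(wv, rv, screen=1000):
    """Copy faster than the beam (wv > rv): bytes the beam must be ahead at start.

    The copy must not catch the beam before the beam leaves the bottom:
    gap / (wv - rv) >= (screen - gap) / rv, solved for gap.
    """
    if not wv > rv > 0:
        raise ValueError("this case needs wv > rv > 0")
    return screen * (1 - rv / wv)
```

With rv = 1.0, a copy at wv = 0.6 passes the first check and one at
wv = 0.4 does not; a copy twice as fast as the beam needs the beam to be
half a screen ahead when it starts.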

also there are
1. the 1000 byte of chars
2. the 2048 bytes of the charset
3. the 1000 byte of the colorram

you write 2. split in 512 byte blocks so each frame gets
1000 + 512 + 1000 to copy and charset is updated once every 4 frames

also, we know that your not perfectly optimized 1000+512 byte 4col code
can do 2vsync

so at least the following would be possible:
each frame contains six 256 byte blocks, each with a type byte in front

types 0-7  could point to the 8 256 byte parts of the charset
types 8-15 could point to the 8 256 byte parts of the charset with a
flip of the charset (assuming this can be triggered separately)
types 16-19 could point to the 4 256 byte chars
types 20-23 could point to the 4 256 byte chars with a page flip if
this is possible
types 24-27 could point to the 4 256 byte parts of the colorram (stored
compressed to 32 byte)

* this would not increase the amount of copying by a byte (you had padding
  that can be used for the 6 control bytes)
* it would be equivalent to your 4col mode if the colorram update were not
  used
* it would give the encoder a lot more flexibility in what to update

and of course if you could squeeze another 256 byte copy in per frame that
would improve the choices the encoder had. Similarly if blocks were 128
byte that would mean more flexibility.
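
The type numbering proposed above could be decoded roughly like this. It is
only a sketch: the function names and the tuple layout are hypothetical, and
the byte layout (one type byte directly followed by its payload, 32 bytes
for a colorram block and 256 bytes otherwise) is an assumption, since the
exact placement of the control bytes is not fixed in this mail.

```python
PAGE = 256            # payload bytes of a charset or chars block
COLORRAM_PACKED = 32  # a colorram block is stored compressed to 32 bytes


def decode_block_type(t):
    """Map a type byte to (target, index, flip) following the proposed ranges."""
    if 0 <= t <= 7:
        return ("charset", t, False)
    if 8 <= t <= 15:
        return ("charset", t - 8, True)    # same part, plus a charset flip
    if 16 <= t <= 19:
        return ("chars", t - 16, False)
    if 20 <= t <= 23:
        return ("chars", t - 20, True)     # same part, plus a page flip
    if 24 <= t <= 27:
        return ("colorram", t - 24, False)
    raise ValueError(f"unknown block type {t}")


def parse_frame(data, blocks=6):
    """Split a frame into (target, index, flip, payload) tuples (assumed layout)."""
    out = []
    pos = 0
    for _ in range(blocks):
        target, index, flip = decode_block_type(data[pos])
        size = COLORRAM_PACKED if target == "colorram" else PAGE
        payload = bytes(data[pos + 1:pos + 1 + size])
        if len(payload) != size:
            raise ValueError("truncated frame")
        out.append((target, index, flip, payload))
        pos += 1 + size
    return out
```

The point of the table is that the displayer stays a fixed loop over six
blocks while the encoder picks per frame which parts are worth updating.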

and a step further,
splitting the 25x40 screen in 50 5x4 areas, each could be coded as skip
or intra (aka do nothing vs. copy from the net)
similarly the colorram could be updated
now because many blocks will be skipped in a typical video the constraint
between chars and colorram update should be lifted.
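
A small sketch of the skip/intra idea for the chars, under assumptions of my
own: the areas are taken as 5 rows high and 4 columns wide (which gives
5 * 10 = 50 areas on a 25x40 screen), an area is skipped only if it is
byte-identical to the previous frame, and all names are hypothetical. A real
encoder would more likely skip on a distortion threshold.

```python
ROWS, COLS = 25, 40
AREA_H, AREA_W = 5, 4                             # assumed orientation
NUM_AREAS = (ROWS // AREA_H) * (COLS // AREA_W)   # 50


def area_cells(area):
    """Screen offsets (row * 40 + column) of the 20 cells in one area."""
    ay, ax = divmod(area, COLS // AREA_W)
    return [(ay * AREA_H + y) * COLS + (ax * AREA_W + x)
            for y in range(AREA_H) for x in range(AREA_W)]


def encode_areas(prev, cur):
    """Return (skip_flags, payload); payload holds the intra areas only."""
    flags = []
    payload = bytearray()
    for area in range(NUM_AREAS):
        cells = area_cells(area)
        skip = all(prev[c] == cur[c] for c in cells)
        flags.append(skip)
        if not skip:
            payload.extend(cur[c] for c in cells)
    return flags, bytes(payload)


def decode_areas(prev, flags, payload):
    """Rebuild the current screen from the previous one plus the intra areas."""
    out = bytearray(prev)
    pos = 0
    for area, skip in enumerate(flags):
        if skip:
            continue                # skip: keep what is already on screen
        for c in area_cells(area):
            out[c] = payload[pos]
            pos += 1
    return bytes(out)
```

The same flags-plus-payload scheme could be run a second time for the
colorram, independently of the chars.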

and yet another step forward to a normal video codec,
you have 4? independent copies of the chars, one of them being
displayed, the others being off display and then flipped at vsync.
you can write to more than 1 of these pages, thus you can write up
to 3 frames ahead. this allows (for example) the 4th frame to use
more than the time of 2 vsyncs and still never drop below 2vsync
per frame.
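
To illustrate the write-ahead argument, here is a toy timing model. It is a
sketch under assumptions that are not from the thread: time is counted in
vsyncs, a page becomes writable again at the flip that takes it off display,
frames are written strictly one after another, and the function name and
its parameters are made up.

```python
def first_late_frame(costs, pages=4, period=2.0, first_flip=2.0):
    """Return the index of the first frame not ready at its flip, or None.

    costs[i]    vsyncs needed to load and write frame i
    pages       number of screen pages (one displayed, the rest writable)
    period      vsyncs between two flips
    first_flip  vsync at which frame 0 is flipped in
    """
    done = 0.0
    for i, cost in enumerate(costs):
        start = done
        if i >= pages:
            # page i % pages still shows frame i - pages until the next flip
            start = max(start, first_flip + period * (i - pages + 1))
        done = start + cost
        if done > first_flip + period * i:
            return i
    return None
```

With costs of 1.5, 1.5, 1.5 and 3.5 vsyncs, four pages let the slack of the
first three frames pay for the expensive 4th one, while with only two pages
the writer has to wait for a free page and the 4th frame comes late.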

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 