[FFmpeg-devel] a64 encoder 7th round

Fri Jan 30 15:18:41 CET 2009

On Fri, Jan 30, 2009 at 12:38:47PM +0100, Bitbreaker/METALVOTZE wrote:
[...]
>> so if your input flickers your decoder will remove it
>> and if it doesnt flicker it needs a hardcoded list of color pairs to
>> avoid generating flicker
>
> The decoder?! You mean i should avoid flicker on c64 side? How is that 

i meant the encoder of course, sorry

[...]
>> you consider gamma?
>> flicker?
>> luma chroma crosstalk from PAL?
>> lowpass filter effects the probably somewhat primitive electronics used to
>> modulate PAL likely cause?
>
> How big is the improvement you expect when all this is implemented? I guess 
> keeping it easy to achieve nearly the same result would also be a good 
> option?

I don't know at all by how much things can be improved with these ...
but not considering something isn't going to make the result better

> Btw.: The reason why black pixels consume a part of the following pixel is 
> due to the type of transistors used in the video chip. There are 
> NMOS/HMOS-transistors that need too long to reach the desired output level. 
> So actually that phenomenon happens on every pixel, but as black has the 
> way biggest luminance distance to all other colors used on that machine, 
> here it gets very visible. So that more a c64 thing than a PAL thing.

C64 or PAL doesn't matter, these things should be considered IMHO

>
>
>> You did not show an example of a properly done comparison on the C64
>> nor did you provide any theoretic explanation why it would be worse
>
> See attached .gif. I expanded it to double width, just as it would be 
> displayed in multicolor mode. There are lots of vertical structures that 
> build lines/blocks and those get very visible and distracting, even on my 
> TFT. Even more when you zoom in a bit (remember, 320x200 on a 14" screen, 

Well, I see the problem, but it's not a problem because you can
* change the coefficients used in the ED dither
* do the dither in columns instead of rows (would produce horizontal lines)
* do it diagonally (might produce diagonal lines)
* use another form of dither; ED is not the only one. From the page I provided,
  the anticorrelation dither is free of line artifacts, but I don't know what
  this algorithm does, so I can't speak for it.
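
The first two options in the list above can be sketched in C as follows. This is
only an illustration, not code from the a64 encoder: it quantizes 8-bit
grayscale to two levels instead of C64 colors, and the names EDKernel, idx()
and ed_dither() as well as the 64x64 size limit are made up for the example.
The kernel weights are in 1/16 units, so {7, 3, 5, 1} gives the usual
Floyd-Steinberg coefficients.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel, weights in 1/16 units, named relative to the scan:
 * "ahead" is the next pixel on the current scan line, "below*" are on the
 * next scan line. */
typedef struct { int ahead, below_behind, below, below_ahead; } EDKernel;

/* Map (scan line o, position i) to a pixel index. With column_major set,
 * scan lines are image columns instead of image rows. */
static int idx(int o, int i, int w, int column_major)
{
    return column_major ? i * w + o : o * w + i;
}

/* Quantize an 8-bit grayscale image in place to 0/255 by error diffusion. */
static void ed_dither(uint8_t *img, int w, int h, EDKernel k, int column_major)
{
    enum { MAX_PIXELS = 64 * 64 };          /* arbitrary limit for the sketch */
    int acc[MAX_PIXELS];                    /* pixel value plus diffused error */
    int outer = column_major ? w : h;
    int inner = column_major ? h : w;

    assert(w > 0 && h > 0 && w * h <= MAX_PIXELS);
    for (int n = 0; n < w * h; n++)
        acc[n] = img[n];

    for (int o = 0; o < outer; o++) {
        for (int i = 0; i < inner; i++) {
            int p   = idx(o, i, w, column_major);
            int out = acc[p] >= 128 ? 255 : 0;
            int e   = acc[p] - out;         /* quantization error to spread */

            img[p] = (uint8_t)out;
            if (i + 1 < inner)
                acc[idx(o, i + 1, w, column_major)] += e * k.ahead / 16;
            if (o + 1 < outer) {
                if (i > 0)
                    acc[idx(o + 1, i - 1, w, column_major)] += e * k.below_behind / 16;
                acc[idx(o + 1, i, w, column_major)] += e * k.below / 16;
                if (i + 1 < inner)
                    acc[idx(o + 1, i + 1, w, column_major)] += e * k.below_ahead / 16;
            }
        }
    }
}
```

Calling ed_dither(img, w, h, (EDKernel){7, 3, 5, 1}, 1) runs the same diffusion
down the columns; other weights can be tried by changing the kernel only.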

> so pixels are BIG). Having a pattern of alternating vertical lines is no 
> good idea for another reason. Due to the PAL signal on every colorchange a 
> bit of distortion (depending on the color combination) is added, so the 
> lines appear even more pronounced. It is a bit like a sharpening filter 
> (btw, this looks even worse in hires mode).
> Also, if doing error diffusion beforehand, that would mean a waste of 
> blocks, if i do it afterwards, it would mean a lot of blocks would not fit 
> in regards to their dither pattern. There might of course be algorithms to 
> work around these problems, but alas, why should i bother if it looks 
> worse?

I think dither should be done during ELBG; that way the dither pattern of
blocks would be known and other blocks could be dithered in a way that
considers their surroundings

[...]
>> just to clarify i meant
>> dst= get_from_net()<<8;
>> x= get_from_net();
>> do{
>>     a= get_from_net();
>>     dst[x+0x0] = a;
>>     dst[x+0x4] = a;
>>     dst[x+0x8] = a;
>>     dst[x+0xC] = a;
>>     x= get_from_net();
>> }while(x);
>> i dont think your code is equivalent to that
>
> Then let's give it another try:
>
>    lda $de00
>    sta a1+1
>    sta a2+1
>    sta a3+1
>    sta a4+1
>    ldx $de01
> ;24 cycles setup
> loop
>    lda $de00
> a1 sta $0000,x
> a2 sta $0004,x
> a3 sta $0008,x
> a4 sta $000c,x
>    ldx $de01
>    bne loop
> ;+ 31 cycles per loop
>
> And what shall i do with that kind of loop now? It can most of all fill 
> bigger areas (minimum 16 bytes) with the same value/pattern, yes. As for 
> the charset you will most likely have 8 consecutive bytes like $00 or $ff, 
> in lucky cases maybe 15, as due to optimal charset usage the same block 
> will not appear more than once. Same goes usually for the charmap, and as 
> for the colorram (if used at all) i tried RLE already, by using the unused 
> highnibble for the amount. It did not perform better.
> In addition, you still need a decision whether your loop or normal loading 
> shall be used. How will that look like? Comparing every byte to find an 
> escape sequence? (so that is expensive). So your loading routine must best 
> be able to decode a complete block (whole charset/whole charmap/whole 
> colorram) at once to save such overhead.
> All that would only be worth the effort if it saves ~1/50s in time. That is 
> where i could switch to a higher framerate and wait for one less vsync.
> So i don't know what you want to achieve. Frames are already small, loading 
> is already very fast and more than sufficient for the modes i use. In case 
> of bigger frames (for e.g. when sending whole bitmaps) loading as well as 
> decoding will be too slow anyway to obtain a good framerate. It is just the 
> wrong direction to search for improvements. It simply is, that not a few 
> more bytes bring better quality, but huge steps would. The next quality 
> step after multicolor charset would be sending a whole bitmap instead, but 
> that would be 8kb. If you are able to load+decode it in the time i would 
> normally load 3kb, that would be awesome. But i'd just say, it is 
> unrealistic :-)

Well, I don't believe it is unrealistic :)
The code I provided above is supposed to write 4 equal elements (like one of
the colors of 4 consecutive chars).
Depending on the internal organization of bytes, this can maybe even be changed
to write to 2x2 chars, that is, a square.
A similar routine could write 4 different colors or indexes or other stuff to
4 consecutive chars (my terminology is crap, I know, and I know I don't know the
mem layout ...)

what is it good for?
well
you have 4 frames, as you said; these are in memory somewhere, and when your
IRQ wakes you, you flip to display another frame and then memcpy() from the net
over a frame that is not currently displayed.
My code sets 4 chars to the same information; these are 4 chars whose
position is also read from the net. Thus for every 256-byte area both a
4-equal and a 4-different chars loop is run. That's just 2 loops, not 1 loop
with an expensive check at each 4-byte group.
This allows you 3 things
1. you can set any group of 4 bytes to arbitrary values (relatively fast but
   slower than currently)
2. you can set any group of 4 bytes to equal values (likely a little faster
   than currently and needs 1/2 the bitrate)
3. you can leave any group of 4 bytes equal to what it was 4 frames ago (VERY
   important because in normal videos things likely don't change that much from
   frame to frame)
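
To make the two-loop idea concrete, here is a rough C model of it. It is a
sketch under assumptions, not a tested C64 routine: the stream layout (page
byte, then "4 equal" records, then "4 different" records, each run ended by an
offset of 0) is my guess at one possible encoding, NetStream and decode_area()
are made-up names, and the stride of 4 is taken from the pseudocode above
rather than from the real memory layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte source standing in for get_from_net(). */
typedef struct { const uint8_t *data; size_t len, pos; } NetStream;

static uint8_t get_from_net(NetStream *s)
{
    assert(s->pos < s->len);
    return s->data[s->pos++];
}

/* Update one 256-byte area of a frame buffer that is not being displayed.
 * Assumed stream layout: page, then (x, a) records ended by x == 0, then
 * (x, a0, a1, a2, a3) records ended by x == 0.
 * As in the do/while pseudocode, each loop handles at least one record, and
 * only the first record of a loop can use offset 0.
 * Bytes that no record touches keep their old contents. */
static void decode_area(uint8_t *mem, size_t mem_size, NetStream *s)
{
    size_t base = (size_t)get_from_net(s) << 8;
    uint8_t x;

    /* x + 0xC can reach past the 256-byte area, so check the worst case. */
    assert(base + 255 + 0xC < mem_size);

    x = get_from_net(s);
    do {                                    /* loop 1: 4 equal values */
        uint8_t a = get_from_net(s);
        for (int j = 0; j < 4; j++)
            mem[base + x + 4 * j] = a;
        x = get_from_net(s);
    } while (x);

    x = get_from_net(s);
    do {                                    /* loop 2: 4 different values */
        for (int j = 0; j < 4; j++)
            mem[base + x + 4 * j] = get_from_net(s);
        x = get_from_net(s);
    } while (x);
}
```

A "4 equal" record costs 2 bytes on the wire and a "4 different" record costs
5, while unchanged groups cost nothing, which is where the bitrate saving in
points 2 and 3 would come from.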

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin