[Ffmpeg-devel] [PATCH] cbc decoding for aes

Mon Jan 15 10:02:47 CET 2007

Hello,
On Sun, Jan 14, 2007 at 11:58:31PM +0100, Michael Niedermayer wrote:
> On Sun, Jan 14, 2007 at 10:27:37PM +0100, Reimar Döffinger wrote:
> [...]
> >  
> > +static inline void copyblock(uint64_t dst[2], const uint64_t src[2]){
> > +    dst[0] = src[0];
> > +    dst[1] = src[1];
> > +}
> 
> whats the problem with memcpy() ? gcc does replace constant memcpy() with
> not that bad code IIRC

I didn't trust it. But I'll give it another try.
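
For comparison, a minimal sketch of both variants side by side. The name
copyblock_memcpy is hypothetical and not part of the patch; whether the
compiler really expands the constant-size memcpy() inline depends on the
compiler version and flags, so the generated code would have to be checked.

```c
#include <stdint.h>
#include <string.h>

/* Element-wise copy of one 16-byte block, as in the patch. */
static inline void copyblock(uint64_t dst[2], const uint64_t src[2])
{
    dst[0] = src[0];
    dst[1] = src[1];
}

/* Hypothetical alternative: the same copy through memcpy() with a
 * compile-time constant size, which the compiler is free to expand
 * inline. Unlike copyblock(), it places no alignment requirement on
 * the two pointers. */
static inline void copyblock_memcpy(void *dst, const void *src)
{
    memcpy(dst, src, 16);
}
```

Both produce the same bytes in dst; only the generated code may differ.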

> > +
> >  #define SUBSHIFT0(s, box)         s[0]=box[s[ 0]]; s[ 4]=box[s[ 4]];          s[ 8]=box[s[ 8]]; s[12]=box[s[12]];
> >  #define SUBSHIFT1(s, box) t=s[0]; s[0]=box[s[ 4]]; s[ 4]=box[s[ 8]];          s[ 8]=box[s[12]]; s[12]=box[t];
> >  #define SUBSHIFT2(s, box) t=s[0]; s[0]=box[s[ 8]]; s[ 8]=box[    t]; t=s[ 4]; s[ 4]=box[s[12]]; s[12]=box[t];
> > @@ -95,6 +100,17 @@
> >      crypt(a, 0, inv_sbox, dec_multbl);
> >  }
> >  
> > +void av_aes_cbc_decrypt(AVAES *a, uint8_t *mem, int blockcnt, uint8_t *iv) {
> 
> why not have a src and dst?
> is it slower?

Well, at least in my use case I would have to allocate another buffer
for that. The decryption itself might actually be faster, since we would
have to copy iv only once per function call, though with cache effects it
could easily be slower overall. Speaking of speed, I have been wondering
whether we really have to use that state var in the context: it means an
additional copy in and copy out if we provide a function with src != dst.
Of course, if we do not use it, the calling application must align the
buffers suitably, which even in my use case could be a problem (AES in
MXF, in case someone didn't guess *g*)...
Anyway, my priority was just being able to test, leaving optimization for
later *g*
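
To make the iv copying difference concrete, here is a rough sketch of CBC
decryption in both styles. Everything in it is an assumption for
illustration: toy_block_decrypt is a trivial stand-in and NOT AES, and the
function names and the key parameter are hypothetical, not the API from the
patch. The chaining rule itself is the usual CBC one: plaintext block =
decrypt(ciphertext block) XOR previous ciphertext block, with iv in place
of the previous block for the first one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a real 16-byte block decryption. NOT AES. */
static void toy_block_decrypt(uint8_t out[16], const uint8_t in[16],
                              uint8_t key)
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((in[i] ^ key) - i);
}

/* In-place variant: each ciphertext block must be saved before it is
 * overwritten, so iv is rewritten once per block. */
static void cbc_decrypt_inplace(uint8_t *mem, int blockcnt,
                                uint8_t iv[16], uint8_t key)
{
    uint8_t saved[16];

    assert(blockcnt >= 0);
    while (blockcnt-- > 0) {
        memcpy(saved, mem, 16);            /* keep ciphertext for chaining */
        toy_block_decrypt(mem, saved, key);
        for (int i = 0; i < 16; i++)
            mem[i] ^= iv[i];
        memcpy(iv, saved, 16);             /* per-block iv update */
        mem += 16;
    }
}

/* src != dst variant (buffers must not overlap): the previous ciphertext
 * block is still intact in src, so iv is written only once at the end. */
static void cbc_decrypt_src_dst(uint8_t *dst, const uint8_t *src,
                                int blockcnt, uint8_t iv[16], uint8_t key)
{
    const uint8_t *prev = iv;

    assert(blockcnt >= 0);
    for (int b = 0; b < blockcnt; b++) {
        toy_block_decrypt(dst, src, key);
        for (int i = 0; i < 16; i++)
            dst[i] ^= prev[i];
        prev = src;                        /* chain to this ciphertext block */
        src += 16;
        dst += 16;
    }
    if (blockcnt > 0)
        memcpy(iv, prev, 16);              /* single iv update per call */
}
```

Both variants leave the last ciphertext block in iv, so a following call
can continue the chain. Neither addresses the alignment question.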

> > +    while (blockcnt-- > 0) {
> > +        copyblock(a->state, mem);
> > +        crypt(a, 0, inv_sbox, dec_multbl);
> 
> is it slower with av_aes_decrypt()?

Probably not, but to be honest I find av_aes_decrypt pretty useless
as an exported function, since I can hardly imagine a use case
where the application would want to decrypt only 16 bytes...

> > +        mem += 16;
> > +    }
> > +}
> 
> isnt a for(;mem < end; mem+=16) faster?

To be honest I don't have any code to test such "minor" speed
differences yet *g*
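
Something along these lines might do as a starting point for such a test.
It is only a sketch: process_block is a hypothetical stand-in for the real
per-block decryption, the function names are made up, and clock() is far
too coarse to resolve differences this small without a large repetition
count.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-block work, standing in for the real decryption. */
static void process_block(uint8_t *block)
{
    for (int i = 0; i < 16; i++)
        block[i] ^= 0xff;
}

/* Counter-based loop, as in the patch. */
static void loop_counter(uint8_t *mem, int blockcnt)
{
    while (blockcnt-- > 0) {
        process_block(mem);
        mem += 16;
    }
}

/* End-pointer loop, as suggested in the review. */
static void loop_endptr(uint8_t *mem, int blockcnt)
{
    uint8_t *end = mem + 16 * (size_t)(blockcnt > 0 ? blockcnt : 0);

    for (; mem < end; mem += 16)
        process_block(mem);
}

/* CPU seconds spent running fn over the same buffer reps times. */
static double time_loop(void (*fn)(uint8_t *, int), uint8_t *mem,
                        int blockcnt, int reps)
{
    clock_t start = clock();

    for (int r = 0; r < reps; r++)
        fn(mem, blockcnt);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop() once with loop_counter and once with loop_endptr on the
same buffer gives two numbers to compare, but the result is only meaningful
with the real crypt() in place of process_block.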

Greetings,
Reimar Döffinger