[FFmpeg-devel] libavutil: added camellia block cipher

Sat Dec 27 18:34:26 CET 2014

2014-12-26 20:01 GMT+01:00 Michael Niedermayer <michaelni at gmx.at>:
> [...]
>
>> +static uint64_t F(uint64_t F_IN, uint64_t KE)
>> +{
>> +    uint32_t Zl, Zr;
>
>> +    Zl = (F_IN >> 32) ^ (KE >> 32);
>> +    Zr = (F_IN & MASK32) ^ (KE & MASK32);
>
> KE ^= F_IN;
> Zl = KE >> 32;
> Zr = KE & MASK32;
>
>
>> +    Zl = ((SBOX1[(Zl >> 24) & MASK8] << 24) | (SBOX2[(Zl >> 16) & MASK8] << 16) |(SBOX3[(Zl >> 8) & MASK8] << 8) |(SBOX4[Zl & MASK8]));
>> +    Zr = ((SBOX2[(Zr >> 24) & MASK8] << 24) | (SBOX3[(Zr >> 16) & MASK8] << 16) |(SBOX4[(Zr >> 8) & MASK8] << 8) |(SBOX1[Zr & MASK8]));
>
> (Zl >> 24) and (Zr >> 24) are limited to 8bit they should not need
> & MASK8
>
> ((uint32_t)SBOX1[Zl >> 24]) << 24)

Maybe this will be useful later: on 64-bit processors, if MASK8 is a
64-bit constant, this may be faster:

KE ^= F_IN;
Zl = ((uint32_t)SBOX1[KE >> 56] << 24) |
     ((uint32_t)SBOX2[(KE >> 48) & MASK8] << 16) | ...
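
To make the idea concrete, here is a minimal sketch that indexes all eight
S-box lookups directly from the 64-bit word and compares the result with the
split-into-halves form from the patch. The tables are placeholders filled by a
made-up formula (they are NOT the Camellia S-boxes), and the helper names
init_placeholder_sboxes, s_layer_split, s_layer_direct and s_layers_agree are
hypothetical. Whether this is actually faster has to be measured.

```c
#include <stdint.h>

#define MASK8  0xffULL          /* 64-bit constant, as assumed above */
#define MASK32 0xffffffffULL

/* Placeholder tables for the sketch only; not the real Camellia S-boxes. */
static uint8_t SBOX1[256], SBOX2[256], SBOX3[256], SBOX4[256];

static void init_placeholder_sboxes(void)
{
    for (int i = 0; i < 256; i++) {
        SBOX1[i] = (uint8_t)(i * 167 + 13);
        SBOX2[i] = (uint8_t)(i * 59 + 101);
        SBOX3[i] = (uint8_t)(i * 201 + 7);
        SBOX4[i] = (uint8_t)(i * 91 + 55);
    }
}

/* S layer as in the patch: split into two 32-bit halves first. */
static void s_layer_split(uint64_t x, uint32_t *zl, uint32_t *zr)
{
    uint32_t l = (uint32_t)(x >> 32);
    uint32_t r = (uint32_t)(x & MASK32);

    *zl = ((uint32_t)SBOX1[l >> 24] << 24) |
          ((uint32_t)SBOX2[(l >> 16) & 0xff] << 16) |
          ((uint32_t)SBOX3[(l >> 8) & 0xff] << 8) |
           (uint32_t)SBOX4[l & 0xff];
    *zr = ((uint32_t)SBOX2[r >> 24] << 24) |
          ((uint32_t)SBOX3[(r >> 16) & 0xff] << 16) |
          ((uint32_t)SBOX4[(r >> 8) & 0xff] << 8) |
           (uint32_t)SBOX1[r & 0xff];
}

/* S layer indexing straight from the 64-bit word, no split needed. */
static void s_layer_direct(uint64_t x, uint32_t *zl, uint32_t *zr)
{
    *zl = ((uint32_t)SBOX1[x >> 56] << 24) |
          ((uint32_t)SBOX2[(x >> 48) & MASK8] << 16) |
          ((uint32_t)SBOX3[(x >> 40) & MASK8] << 8) |
           (uint32_t)SBOX4[(x >> 32) & MASK8];
    *zr = ((uint32_t)SBOX2[(x >> 24) & MASK8] << 24) |
          ((uint32_t)SBOX3[(x >> 16) & MASK8] << 16) |
          ((uint32_t)SBOX4[(x >> 8) & MASK8] << 8) |
           (uint32_t)SBOX1[x & MASK8];
}

/* Returns 1 if both forms produce the same halves for input x. */
static int s_layers_agree(uint64_t x)
{
    uint32_t al, ar, bl, br;
    s_layer_split(x, &al, &ar);
    s_layer_direct(x, &bl, &br);
    return al == bl && ar == br;
}
```

Here x stands for KE after the "KE ^= F_IN;" step. Both forms do the same
lookups; the direct one only avoids materializing the two 32-bit halves
before indexing.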

> +    Zl ^= LR32(Zr, 8);
> +    Zr ^= LR32(Zl, 16);
> +    Zl ^= RR32(Zr, 8);
> +    Zr ^= RR32(Zl, 8);

The four statements above form one long dependency chain (each one
needs the result of the previous one), and this is probably where we
lose the most speed at the moment.

> it would also be possible to reduce the number of operations at the
> expense of larger tables but iam not sure that would be a good idea

On 64-bit processors, a big speedup can be obtained by computing the S
and P operations together, using eight 8x64-bit S-boxes (256 entries of
64 bits each, 16 kB of data in total) that can be computed in the
initialization phase from SBOX1...SBOX4.
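
A minimal sketch of how such combined tables could be built and used follows.
It relies on the P step being linear over XOR (it consists only of XORs and
rotations), so P applied to the whole S-layer output equals the XOR of P
applied to each substituted byte in its position. Assumptions: the S-box
contents are placeholders (NOT the Camellia tables), the names SP, sbox_order,
init_sp_tables, f_reference and f_tables are hypothetical, and the final
packing of Zr/Zl into the 64-bit result is my guess since that line is not
quoted above.

```c
#include <stdint.h>

#define LR32(x, c) (((x) << (c)) | ((x) >> (32 - (c))))
#define RR32(x, c) (((x) >> (c)) | ((x) << (32 - (c))))

/* Placeholder tables for the sketch only; not the real Camellia S-boxes. */
static uint8_t SBOX1[256], SBOX2[256], SBOX3[256], SBOX4[256];

/* S-box used for each byte, most significant byte first (as in the patch). */
static const uint8_t *const sbox_order[8] = {
    SBOX1, SBOX2, SBOX3, SBOX4, SBOX2, SBOX3, SBOX4, SBOX1
};

/* Eight combined S+P tables: 8 * 256 * 8 bytes = 16 kB. */
static uint64_t SP[8][256];

static void init_placeholder_sboxes(void)
{
    for (int i = 0; i < 256; i++) {
        SBOX1[i] = (uint8_t)(i * 167 + 13);
        SBOX2[i] = (uint8_t)(i * 59 + 101);
        SBOX3[i] = (uint8_t)(i * 201 + 7);
        SBOX4[i] = (uint8_t)(i * 91 + 55);
    }
}

/* P step from the patch; the output packing is an assumption. */
static uint64_t p_layer(uint32_t Zl, uint32_t Zr)
{
    Zl ^= LR32(Zr, 8);
    Zr ^= LR32(Zl, 16);
    Zl ^= RR32(Zr, 8);
    Zr ^= RR32(Zl, 8);
    return ((uint64_t)Zr << 32) | Zl;
}

/* Straightforward F: S layer byte by byte, then the P step. */
static uint64_t f_reference(uint64_t f_in, uint64_t ke)
{
    uint64_t x = f_in ^ ke, s = 0;
    for (int i = 0; i < 8; i++) {
        int shift = 56 - 8 * i;
        s |= (uint64_t)sbox_order[i][(x >> shift) & 0xff] << shift;
    }
    return p_layer((uint32_t)(s >> 32), (uint32_t)s);
}

/* Precompute P(S_i(b) placed at byte position i) for every i and b. */
static void init_sp_tables(void)
{
    for (int i = 0; i < 8; i++) {
        for (int b = 0; b < 256; b++) {
            uint64_t s = (uint64_t)sbox_order[i][b] << (56 - 8 * i);
            SP[i][b] = p_layer((uint32_t)(s >> 32), (uint32_t)s);
        }
    }
}

/* Table-driven F: eight independent lookups combined with XOR. */
static uint64_t f_tables(uint64_t f_in, uint64_t ke)
{
    uint64_t x = f_in ^ ke;
    return SP[0][x >> 56]          ^ SP[1][(x >> 48) & 0xff] ^
           SP[2][(x >> 40) & 0xff] ^ SP[3][(x >> 32) & 0xff] ^
           SP[4][(x >> 24) & 0xff] ^ SP[5][(x >> 16) & 0xff] ^
           SP[6][(x >> 8) & 0xff]  ^ SP[7][x & 0xff];
}
```

With this layout the eight lookups do not depend on each other, so the serial
XOR/rotate chain disappears from the per-block path and only runs once per
table entry at initialization.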

But all these suggestions can be implemented later. My main objection
to this patch is its use of one big array for all subkeys.

>
>
> [...]
>
>> +static const int shift1[2][6] = {
>> +    {0, 15, 30, 17, 17, 17},
>> +    {0, 15, 15, 15, 34, 17}
>> +};
>> +static const int pos1[2][6] = {
>> +    {0, 4, 10, 16, 18, 22},
>> +    {2, 6, 8, 14, 20, 24}
>> +};
>> +static const int pos2[4][4]= {
>> +    {0, 12, 16, 22},
>> +    {6, 14, 24, 28},
>> +    {2, 10, 20, 32},
>> +    {4, 8, 18, 26}
>> +};
>> +static const int shift2[4][5]= {
>> +    {0, 45, 15, 17},
>> +    {15, 30, 32, 17},
>> +    {0, 30, 30, 51},
>> +    {15, 15, 30, 34}
>> +};
>
> these could be made uint8_t
>
> [...]