field | value | date |
---|---|---|
author | Ard Biesheuvel <ardb@kernel.org> | 2024-11-05 17:09:06 +0100 |
committer | Herbert Xu <herbert@gondor.apana.org.au> | 2024-11-15 19:52:51 +0800 |
commit | e7c1d1c9b2023decb855ec4c921a7d78abbf64eb | |
tree | 34837564a16ba4a5787a3b09b712012401f2d287 /lib/crypto/mpi/mpi-bit.c | |
parent | 802d8d110ce2b3ae979221551f4cb168e2f5e464 | |
crypto: arm/crct10dif - Implement plain NEON variant
The CRC-T10DIF algorithm produces a 16-bit CRC, and this is reflected in
the folding coefficients, which are also only 16 bits wide.
This means that the polynomial multiplications involving these
coefficients can be performed in only a few steps using 8-bit long
polynomial multiplication (8x8 -> 16), an instruction that is part of
the base NEON ISA, which is all that most real ARMv7 cores implement.
(The 64-bit PMULL instruction is part of the crypto extensions, which
are only implemented by 64-bit cores.)
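To make the 8x8 -> 16 decomposition concrete, here is a scalar C model. It is a sketch, not the kernel's NEON code: `clmul8` stands in for one lane of the NEON `vmull.p8` polynomial multiply, `clmul16` assembles a 16x16 -> 32 carry-less product from four such 8-bit products, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8x8 -> 16 carry-less (GF(2)[x]) multiply: models one lane of vmull.p8. */
static uint16_t clmul8(uint8_t a, uint8_t b)
{
	uint16_t r = 0;

	for (int i = 0; i < 8; i++)
		if (b & (1u << i))
			r ^= (uint16_t)(a << i);
	return r;
}

/*
 * 16x16 -> 32 carry-less multiply built from four 8x8 products:
 * a schoolbook split into high and low bytes, with XOR in place of
 * addition because there are no carries in GF(2)[x].
 */
static uint32_t clmul16(uint16_t a, uint16_t b)
{
	uint8_t a0 = a & 0xff, a1 = a >> 8;
	uint8_t b0 = b & 0xff, b1 = b >> 8;
	uint32_t lo = clmul8(a0, b0);
	uint32_t mid = (uint32_t)clmul8(a0, b1) ^ clmul8(a1, b0);
	uint32_t hi = clmul8(a1, b1);

	return lo ^ (mid << 8) ^ (hi << 16);
}

/* Bit-serial reference for the same 16x16 -> 32 product. */
static uint32_t clmul16_ref(uint16_t a, uint16_t b)
{
	uint32_t r = 0;

	for (int i = 0; i < 16; i++)
		if (b & (1u << i))
			r ^= (uint32_t)a << i;
	return r;
}

/* Compare the composed product against the reference on a grid of inputs. */
static void check_clmul16(void)
{
	for (uint32_t a = 0; a <= 0xffff; a += 251)
		for (uint32_t b = 0; b <= 0xffff; b += 257)
			assert(clmul16((uint16_t)a, (uint16_t)b) ==
			       clmul16_ref((uint16_t)a, (uint16_t)b));
}
```

Calling `check_clmul16()` cross-checks the composed product against the bit-serial reference. The same schoolbook split extends to wider operands, such as a 64-bit data word times a 16-bit coefficient, at the cost of more 8-bit products.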
The final reduction is a bit more involved, but we can delegate that to
the generic CRC-T10DIF implementation after folding the entire input
into a 16-byte vector.
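The fold-then-delegate structure can be modeled in scalar C. This is a sketch under stated assumptions, not the kernel's code: the function names are hypothetical, and it folds 16 bits at a time rather than whole 128-bit NEON registers. It relies on CRC-T10DIF (generator 0x8BB7 with an implied x^16 term, MSB-first, no final XOR) with a zero initial value computing M(x) * x^16 mod P(x), so any message congruent to M modulo P has the same CRC. Each fold replaces the leading 16-bit chunk A, sitting at x^e, with the 32-bit product A * (x^32 mod P) placed at x^(e-32). The shrunken residue is then handed to a bitwise CRC routine that stands in for the generic implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 16 bits of the CRC-T10DIF generator; the x^16 term is implied. */
#define T10DIF_POLY 0x8BB7u

/* Multiply a 16-bit remainder by x, reducing modulo P. */
static uint16_t mulx_mod(uint16_t r)
{
	return (r & 0x8000) ? (uint16_t)((r << 1) ^ T10DIF_POLY)
			    : (uint16_t)(r << 1);
}

/* Bit-serial MSB-first CRC-T10DIF, standing in for the generic code. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int i = 0; i < 8; i++)
			crc = mulx_mod(crc);
	}
	return crc;
}

/* x^k mod P, used as a folding constant. */
static uint16_t xpow_mod(unsigned int k)
{
	uint16_t r = 1;

	while (k--)
		r = mulx_mod(r);
	return r;
}

/* Bit-serial 16x16 -> 32 carry-less multiply. */
static uint32_t clmul16(uint16_t a, uint16_t b)
{
	uint32_t r = 0;

	for (int i = 0; i < 16; i++)
		if (b & (1u << i))
			r ^= (uint32_t)a << i;
	return r;
}

/*
 * Fold buf (destroying it) until at most 'target' bytes remain, then
 * finish with the bitwise CRC. The leading 16-bit chunk A at x^e is
 * replaced by A * (x^32 mod P) at x^(e-32), i.e. XORed into the next
 * four bytes; this is congruent modulo P, so the CRC does not change.
 */
static uint16_t crc_t10dif_fold(uint16_t crc, uint8_t *buf, size_t len,
				size_t target)
{
	const uint16_t k32 = xpow_mod(32);
	size_t off = 0;

	/* Every fold needs A plus four following bytes, i.e. >= 6 bytes. */
	assert(target >= 5);
	if (len < 2)
		return crc_t10dif_bitwise(crc, buf, len);

	/* Absorb the starting CRC into the first two message bytes. */
	buf[0] ^= (uint8_t)(crc >> 8);
	buf[1] ^= (uint8_t)crc;

	while (len - off > target) {
		uint16_t a = (uint16_t)(buf[off] << 8 | buf[off + 1]);
		uint32_t prod = clmul16(a, k32);

		buf[off + 2] ^= (uint8_t)(prod >> 24);
		buf[off + 3] ^= (uint8_t)(prod >> 16);
		buf[off + 4] ^= (uint8_t)(prod >> 8);
		buf[off + 5] ^= (uint8_t)prod;
		off += 2;
	}
	return crc_t10dif_bitwise(0, buf + off, len - off);
}
```

Passing `target = 16` folds the input down to at most 16 bytes before the bitwise routine takes over. A nonzero starting CRC is handled by XORing it into the first two message bytes, which is equivalent for MSB-first CRCs without a final XOR.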
This results in a speedup of around 6.6x on Cortex-A72 running in 32-bit
mode. On Cortex-A8 (BeagleBone White), the results are substantially
better than that, but not sufficiently reproducible (with tcrypt) to
quote a number here.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>