crypto: arm/crct10dif - Implement plain NEON variant - Linux


author	Ard Biesheuvel <ardb@kernel.org>	2024-11-05 17:09:06 +0100
committer	Herbert Xu <herbert@gondor.apana.org.au>	2024-11-15 19:52:51 +0800
commit	e7c1d1c9b2023decb855ec4c921a7d78abbf64eb (patch)
tree	34837564a16ba4a5787a3b09b712012401f2d287 /lib/crypto/mpi/mpi-bit.c
parent	802d8d110ce2b3ae979221551f4cb168e2f5e464 (diff)

crypto: arm/crct10dif - Implement plain NEON variant

The CRC-T10DIF algorithm produces a 16-bit CRC, and this is reflected in the folding coefficients, which are also only 16 bits wide.

This means that the polynomial multiplications involving these coefficients can be performed using 8-bit long polynomial multiplication (8x8 -> 16) in only a few steps, and this is an instruction that is part of the base NEON ISA, which is all most real ARMv7 cores implement. (The 64-bit PMULL instruction is part of the crypto extensions, which are only implemented by 64-bit cores.)

The final reduction is a bit more involved, but we can delegate that to the generic CRC-T10DIF implementation after folding the entire input into a 16 byte vector.

This results in a speedup of around 6.6x on Cortex-A72 running in 32-bit mode. On Cortex-A8 (BeagleBone White), the results are substantially better than that, but not sufficiently reproducible (with tcrypt) to quote a number here.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
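
The idea described above can be illustrated with a small Python model. This is only a sketch and not the kernel's assembly: the helper names (clmul, pmul8, clmul_64x16, xpow_mod, fold_32_to_16) are hypothetical, and the folding constants are derived for this toy 32-byte layout rather than taken from the patch. It shows a 64x16 carry-less multiply assembled from 8x8 -> 16 products (the operation NEON provides per lane as vmull.p8), followed by folding into a 16-byte value whose final reduction is left to a plain bitwise CRC-T10DIF.

```python
POLY = 0x18BB7  # CRC-T10DIF generator polynomial, including the x^16 term


def clmul(a, b):
    """Reference carry-less (GF(2)[x]) multiply of arbitrary-width ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmul8(a, b):
    """8x8 -> 16 bit polynomial multiply: the only primitive assumed here."""
    assert 0 <= a < 256 and 0 <= b < 256
    return clmul(a, b)


def clmul_64x16(data, coeff):
    """Multiply a 64-bit value by a 16-bit coefficient using only pmul8."""
    r = 0
    for i in range(8):              # each byte of the data word
        d = (data >> (8 * i)) & 0xFF
        for j in range(2):          # each byte of the 16-bit coefficient
            c = (coeff >> (8 * j)) & 0xFF
            r ^= pmul8(d, c) << (8 * (i + j))
    return r


def xpow_mod(n):
    """Return x^n mod POLY as a 16-bit value (hypothetical helper)."""
    r = 1
    for _ in range(n):
        r <<= 1
        if r & 0x10000:
            r ^= POLY
    return r


def crc_t10dif(data, crc=0):
    """Bitwise CRC-T10DIF (poly 0x8BB7, init 0, no reflection)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def fold_32_to_16(msg):
    """Fold a 32-byte message into 16 bytes with the same CRC-T10DIF."""
    assert len(msg) == 32
    a = int.from_bytes(msg[:16], "big")     # leading block, to be folded
    b = int.from_bytes(msg[16:], "big")     # trailing block, kept as-is
    a_hi, a_lo = a >> 64, a & (2**64 - 1)
    # a_hi sits 192 bits above the end of the message, a_lo 128 bits above
    # it, so each half is multiplied by the matching power of x mod POLY.
    folded = (clmul_64x16(a_hi, xpow_mod(192))
              ^ clmul_64x16(a_lo, xpow_mod(128))
              ^ b)
    return folded.to_bytes(16, "big")
```

Because both coefficients fit in 16 bits, each product is at most 79 bits wide, so the folded value still fits in 16 bytes, and crc_t10dif(fold_32_to_16(msg)) equals crc_t10dif(msg) for any 32-byte msg.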


