summaryrefslogtreecommitdiffstats
path: root/net/tls
Commit message (Collapse)AuthorAgeFilesLines
* net: tls, fix WARNIING in __sk_msg_freeJohn Fastabend2024-01-141-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A splice with MSG_SPLICE_PAGES will cause tls code to use the tls_sw_sendmsg_splice path in the TLS sendmsg code to move the user provided pages from the msg into the msg_pl. This will loop over the msg until msg_pl is full, checked by sk_msg_full(msg_pl). The user can also set the MORE flag to hint stack to delay sending until receiving more pages and ideally a full buffer. If the user adds more pages to the msg than can fit in the msg_pl scatterlist (MAX_MSG_FRAGS) we should ignore the MORE flag and send the buffer anyways. What actually happens though is we abort the msg to msg_pl scatterlist setup and then because we forget to set 'full record' indicating we can no longer consume data without a send we fallthrough to the 'continue' path which will check if msg_data_left(msg) has more bytes to send and then attempts to fit them in the already full msg_pl. Then next iteration of sender doing send will encounter a full msg_pl and throw the warning in the syzbot report. To fix simply check if we have a full_record in splice code path and if not send the msg regardless of MORE flag. Reported-and-tested-by: syzbot+f2977222e0e95cec15c8@syzkaller.appspotmail.com Reported-by: Edward Adam Davis <eadavis@qq.com> Fixes: fe1e81d4f73b ("tls/sw: Support MSG_SPLICE_PAGES") Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: tls, update curr on splice as wellJohn Fastabend2023-12-071-0/+2
| | | | | | | | | | | The curr pointer must also be updated on the splice similar to how we do this for other copy types. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: fix NULL deref on tls_sw_splice_eof() with empty recordJann Horn2023-11-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | syzkaller discovered that if tls_sw_splice_eof() is executed as part of sendfile() when the plaintext/ciphertext sk_msg are empty, the send path gets confused because the empty ciphertext buffer does not have enough space for the encryption overhead. This causes tls_push_record() to go on the `split = true` path (which is only supposed to be used when interacting with an attached BPF program), and then get further confused and hit the tls_merge_open_record() path, which then assumes that there must be at least one populated buffer element, leading to a NULL deref. It is possible to have empty plaintext/ciphertext buffers if we previously bailed from tls_sw_sendmsg_locked() via the tls_trim_both_msgs() path. tls_sw_push_pending_record() already handles this case correctly; let's do the same check in tls_sw_splice_eof(). Fixes: df720d288dbb ("tls/sw: Use splice_eof() to flush") Cc: stable@vger.kernel.org Reported-by: syzbot+40d43509a099ea756317@syzkaller.appspotmail.com Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231122214447.675768-1-jannh@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: don't reset prot->aad_size and prot->tail_size for TLS_HWSabrina Dubroca2023-10-233-11/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Prior to commit 1a074f7618e8 ("tls: also use init_prot_info in tls_set_device_offload"), setting TLS_HW on TX didn't touch prot->aad_size and prot->tail_size. They are set to 0 during context allocation (tls_prot_info is embedded in tls_context, kzalloc'd by tls_ctx_create). When the RX key is configured, tls_set_sw_offload is called (for both TLS_SW and TLS_HW). If the TX key is configured in TLS_HW mode after the RX key has been installed, init_prot_info will now overwrite the correct values of aad_size and tail_size, breaking SW decryption and causing -EBADMSG errors to be returned to userspace. Since TLS_HW doesn't use aad_size and tail_size at all (for TLS1.2, tail_size is always 0, and aad_size is equal to TLS_HEADER_SIZE + rec_seq_size), we can simply drop this hunk. Fixes: 1a074f7618e8 ("tls: also use init_prot_info in tls_set_device_offload") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Ran Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/979d2f89a6a994d5bb49cae49a80be54150d094d.1697653889.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2023-10-192-9/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. net/mac80211/key.c 02e0e426a2fb ("wifi: mac80211: fix error path key leak") 2a8b665e6bcc ("wifi: mac80211: remove key_mtx") 7d6904bf26b9 ("Merge wireless into wireless-next") https://lore.kernel.org/all/20231012113648.46eea5ec@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/ti/Kconfig a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object") 98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * tcp: allow again tcp_disconnect() when threads are waitingPaolo Abeni2023-10-132-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Tom, .NET and applications build on top of it rely on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP socket. The blamed commit below caused a regression, as such cancellation can now fail. As suggested by Eric, this change addresses the problem explicitly causing blocking I/O operation to terminate immediately (with an error) when a concurrent disconnect() is executed. Instead of tracking the number of threads blocked on a given socket, track the number of disconnect() issued on such socket. If such counter changes after a blocking operation releasing and re-acquiring the socket lock, error out the current operation. Fixes: 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting") Reported-by: Tom Deseyn <tdeseyn@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | tls: use fixed size for tls_offload_context_{tx,rx}.driver_stateSabrina Dubroca2023-10-131-2/+2
| | | | | | | | | | | | | | | | | | | | driver_state is a flex array, but is always allocated by the tls core to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by making that size explicit so that sizeof(struct tls_offload_context_{tx,rx}) works. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: validate crypto_info in a separate helperSabrina Dubroca2023-10-131-24/+27
| | | | | | | | | | | | | | Simplify do_tls_setsockopt_conf a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: remove tls_context argument from tls_set_device_offloadSabrina Dubroca2023-10-133-10/+10
| | | | | | | | | | | | | | | | | | | | | | It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. While at it, fix up the reverse xmas tree ordering. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: remove tls_context argument from tls_set_sw_offloadSabrina Dubroca2023-10-134-14/+12
| | | | | | | | | | | | | | | | | | It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: add a helper to allocate/initialize offload_ctx_txSabrina Dubroca2023-10-131-14/+25
| | | | | | | | | | | | | | Simplify tls_set_device_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: also use init_prot_info in tls_set_device_offloadSabrina Dubroca2023-10-133-14/+18
| | | | | | | | | | | | | | | | Most values are shared. Nonce size turns out to be equal to IV size for all offloadable ciphers. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: move tls_prot_info initialization out of tls_set_sw_offloadSabrina Dubroca2023-10-131-28/+34
| | | | | | | | | | | | | | Simplify tls_set_sw_offload, and allow reuse for the tls_device code. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: extract context alloc/initialization out of tls_set_sw_offloadSabrina Dubroca2023-10-131-35/+51
| | | | | | | | | | | | | | Simplify tls_set_sw_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: store iv directly within cipher_contextSabrina Dubroca2023-10-133-23/+5
| | | | | | | | | | | | | | | | TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B, we don't get much benefit in cipher_context's size and can simplify the init code a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: rename MAX_IV_SIZE to TLS_MAX_IV_SIZESabrina Dubroca2023-10-134-6/+6
| | | | | | | | | | | | | | It's defined in include/net/tls.h, avoid using an overly generic name. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: store rec_seq directly within cipher_contextSabrina Dubroca2023-10-133-21/+4
| | | | | | | | | | | | | | TLS_MAX_REC_SEQ_SIZE is 8B, we don't get anything by using kmalloc. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: drop unnecessary cipher_type checks in tls offloadSabrina Dubroca2023-10-132-21/+4
| | | | | | | | | | | | | | | | | | | | We should never reach tls_device_reencrypt, tls_enc_record, or tls_enc_skb with a cipher_type that can't be offloaded. Replace those checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of hard-coding offloadable cipher types. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: get salt using crypto_info_salt in tls_enc_skbSabrina Dubroca2023-10-131-3/+1
| | | | | | | | | | | | | | I skipped this conversion in my previous series. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: Use size_add() in call to struct_size()Gustavo A. R. Silva2023-09-181-1/+1
|/ | | | | | | | | | | | If, for any reason, the open-coded arithmetic causes a wraparound, the protection that `struct_size()` adds against potential integer overflows is defeated. Fix this by hardening call to `struct_size()` with `size_add()`. Fixes: b89fec54fd61 ("tls: rx: wrap decrypt params in a struct") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()Liu Jian2023-09-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got the below warning when do fuzzing test: BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470 Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9 CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G OE Hardware name: linux,dummy-virt (DT) Workqueue: pencrypt_parallel padata_parallel_worker Call trace: dump_backtrace+0x0/0x420 show_stack+0x34/0x44 dump_stack+0x1d0/0x248 __kasan_report+0x138/0x140 kasan_report+0x44/0x6c __asan_load4+0x94/0xd0 scatterwalk_copychunks+0x320/0x470 skcipher_next_slow+0x14c/0x290 skcipher_walk_next+0x2fc/0x480 skcipher_walk_first+0x9c/0x110 skcipher_walk_aead_common+0x380/0x440 skcipher_walk_aead_encrypt+0x54/0x70 ccm_encrypt+0x13c/0x4d0 crypto_aead_encrypt+0x7c/0xfc pcrypt_aead_enc+0x28/0x84 padata_parallel_worker+0xd0/0x2dc process_one_work+0x49c/0xbdc worker_thread+0x124/0x880 kthread+0x210/0x260 ret_from_fork+0x10/0x18 This is because the value of rec_seq of tls_crypto_info configured by the user program is too large, for example, 0xffffffffffffff. In addition, TLS is asynchronously accelerated. When tls_do_encryption() returns -EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow, skmsg is released before the asynchronous encryption process ends. As a result, the UAF problem occurs during the asynchronous processing of the encryption module. If the operation is asynchronous and the encryption module returns EINPROGRESS, do not free the record information. Fixes: 635d93981786 ("net/tls: free record only on encryption error") Signed-off-by: Liu Jian <liujian56@huawei.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* tls: get cipher_name from cipher_desc in tls_set_sw_offloadSabrina Dubroca2023-08-271-25/+4
| | | | | | | | | | | | tls_cipher_desc also contains the algorithm name needed by crypto_alloc_aead, use it. Finally, use get_cipher_desc to check if the cipher_type coming from userspace is valid, and remove the cipher_type switch. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/53d021d80138aa125a9cef4468aa5ce531975a7b.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: use tls_cipher_desc to access per-cipher crypto_info in tls_set_sw_offloadSabrina Dubroca2023-08-271-76/+13
| | | | | | | | | The crypto_info_* helpers allow us to fetch pointers into the per-cipher crypto_info's data. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/c23af110caf0af6b68de2f86c58064913e2e902a.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: use tls_cipher_desc to get per-cipher sizes in tls_set_sw_offloadSabrina Dubroca2023-08-271-63/+16
| | | | | | | | | | | | We can get rid of some local variables, but we have to keep nonce_size because tls1.3 uses nonce_size = 0 for all ciphers. We can also drop the runtime sanity checks on iv/rec_seq/tag size, since we have compile time checks on those values. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/deed9c4430a62c31751a72b8c03ad66ffe710717.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: use tls_cipher_desc to simplify do_tls_getsockopt_confSabrina Dubroca2023-08-271-163/+11
| | | | | | | | | | | Every cipher uses the same code to update its crypto_info struct based on the values contained in the cctx, with only the struct type and size/offset changing. We can get those from tls_cipher_desc, and use a single pair of memcpy and final copy_to_user. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/c21a904b91e972bdbbf9d1c6d2731ccfa1eedf72.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: get crypto_info size from tls_cipher_desc in do_tls_setsockopt_confSabrina Dubroca2023-08-271-31/+8
| | | | | | | | | | We can simplify do_tls_setsockopt_conf using tls_cipher_desc. Also use get_cipher_desc's result to check if the cipher_type coming from userspace is valid. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/e97658eb4c6a5832f8ba20a06c4f36a77763c59e.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: expand use of tls_cipher_desc in tls_sw_fallback_initSabrina Dubroca2023-08-271-14/+6
| | | | | | | | | | tls_sw_fallback_init already gets the key and tag size from tls_cipher_desc. We can now also check that the cipher type is valid, and stop hard-coding the algorithm name passed to crypto_alloc_aead. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/c8c94b8fcafbfb558e09589c1f1ad48dbdf92f76.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: allocate the fallback aead after checking that the cipher is validSabrina Dubroca2023-08-271-10/+10
| | | | | | | | No need to allocate the aead if we're going to fail afterwards. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/335e32511ed55a0b30f3f81a78fa8f323b3bdf8f.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: expand use of tls_cipher_desc in tls_set_device_offloadSabrina Dubroca2023-08-271-18/+4
| | | | | | | | | | | | | tls_set_device_offload is already getting iv and rec_seq sizes from tls_cipher_desc. We can now also check if the cipher_type coming from userspace is valid and can be offloaded. We can also remove the runtime check on rec_seq, since we validate it at compile time. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/8ab71b8eca856c7aaf981a45fe91ac649eb0e2e9.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: validate cipher descriptions at compile timeSabrina Dubroca2023-08-271-0/+18
| | | | | | Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/b38fb8cf60e099e82ae9979c3c9c92421042417c.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: extend tls_cipher_desc to fully describe the ciphersSabrina Dubroca2023-08-272-9/+64
| | | | | | | | | | | | | | - add nonce, usually equal to iv_size but not for chacha - add offsets into the crypto_info for each field - add algorithm name - add offloadable flag Also add helpers to access each field of a crypto_info struct described by a tls_cipher_desc. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/39d5f476d63c171097764e8d38f6f158b7c109ae.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: rename tls_cipher_size_desc to tls_cipher_descSabrina Dubroca2023-08-274-52/+52
| | | | | | | | | We're going to add other fields to it to fully describe a cipher, so the "_size" name won't match the contents. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/76ca6c7686bd6d1534dfa188fb0f1f6fabebc791.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: reduce size of tls_cipher_size_descSabrina Dubroca2023-08-274-9/+20
| | | | | | | | | | | | | | | | tls_cipher_size_desc indexes ciphers by their type, but we're not using indices 0..50 of the array. Each struct tls_cipher_size_desc is 20B, so that's a lot of unused memory. We can reindex the array starting at the lowest used cipher_type. Introduce the get_cipher_size_desc helper to find the right item and avoid out-of-bounds accesses, and make tls_cipher_size_desc's size explicit so that gcc reminds us to update TLS_CIPHER_MIN/MAX when we add a new cipher. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/5e054e370e240247a5d37881a1cd93a67c15f4ca.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: add TLS_CIPHER_ARIA_GCM_* to tls_cipher_size_descSabrina Dubroca2023-08-271-0/+2
| | | | | | Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/b2e0fb79e6d0a4478be9bf33781dc9c9281c9d56.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tls: move tls_cipher_size_desc to net/tls/tls.hSabrina Dubroca2023-08-271-0/+10
| | | | | | | | It's only used in net/tls/*, no need to bloat include/net/tls.h. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/dd9fad80415e5b3575b41f56b331871038362eab.1692977948.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2023-08-102-34/+33
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/intel/igc/igc_main.c 06b412589eef ("igc: Add lock to safeguard global Qbv variables") d3750076d464 ("igc: Add TransmissionOverrun counter") drivers/net/ethernet/microsoft/mana/mana_en.c a7dfeda6fdec ("net: mana: Fix MANA VF unload when hardware is unresponsive") a9ca9f9ceff3 ("page_pool: split types and declarations from page_pool.h") 92272ec4107e ("eth: add missing xdp.h includes in drivers") net/mptcp/protocol.h 511b90e39250 ("mptcp: fix disconnect vs accept race") b8dc6d6ce931 ("mptcp: fix rcv buffer auto-tuning") tools/testing/selftests/net/mptcp/mptcp_join.sh c8c101ae390a ("selftests: mptcp: join: fix 'implicit EP' test") 03668c65d153 ("selftests: mptcp: join: rework detailed report") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: tls: set MSG_SPLICE_PAGES consistentlyJakub Kicinski2023-08-101-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to change the flags for the last segment, because non-last segments had the MSG_SENDPAGE_NOTLAST flag set. That flag is no longer a thing so remove the setting. Since flags most likely don't have MSG_SPLICE_PAGES set this avoids passing parts of the sg as splice and parts as non-splice. Before commit under Fixes we'd have called tcp_sendpage() which would add the MSG_SPLICE_PAGES. Why this leads to trouble remains unclear but Tariq reports hitting the WARN_ON(!sendpage_ok()) due to page refcount of 0. Fixes: e117dcfd646e ("tls: Inline do_tcp_sendpages()") Reported-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/all/4c49176f-147a-4283-f1b1-32aac7b4b996@gmail.com/ Tested-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20230808180917.1243540-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * net: tls: avoid discarding data on record closeJakub Kicinski2023-08-061-31/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TLS records end with a 16B tag. For TLS device offload we only need to make space for this tag in the stream, the device will generate and replace it with the actual calculated tag. Long time ago the code would just re-reference the head frag which mostly worked but was suboptimal because it prevented TCP from combining the record into a single skb frag. I'm not sure if it was correct as the first frag may be shorter than the tag. The commit under fixes tried to replace that with using the page frag and if the allocation failed rolling back the data, if record was long enough. It achieves better fragment coalescing but is also buggy. We don't roll back the iterator, so unless we're at the end of send we'll skip the data we designated as tag and start the next record as if the rollback never happened. There's also the possibility that the record was constructed with MSG_MORE and the data came from a different syscall and we already told the user space that we "got it". Allocate a single dummy page and use it as fallback. Found by code inspection, and proven by forcing allocation failures. Fixes: e7b159a48ba6 ("net/tls: remove the record tail optimization") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/tls: avoid TCP window full during ->read_sock()Hannes Reinecke2023-08-081-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When flushing the backlog after decoding a record we don't really know how much data the caller want us to evaluate, so use INT_MAX and 0 as arguments to tls_read_flush_backlog() to ensure we flush at 128k of data. Otherwise we might be reading too much data and trigger a TCP window full. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20230807071022.10091-1-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/tls: Remove unused function declarationsYue Haibing2023-08-071-4/+0
| | | | | | | | | | | | | | | | Commit 3c4d7559159b ("tls: kernel TLS support") declared but never implemented these functions. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'in-kernel-support-for-the-tls-alert-protocol'Jakub Kicinski2023-07-281-0/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chuck Lever says: ==================== In-kernel support for the TLS Alert protocol IMO the kernel doesn't need user space (ie, tlshd) to handle the TLS Alert protocol. Instead, a set of small helper functions can be used to handle sending and receiving TLS Alerts for in-kernel TLS consumers. ==================== Merged on top of a tag in case it's needed in the NFS tree. Link: https://lore.kernel.org/r/169047923706.5241.1181144206068116926.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/tls: Move TLS protocol elements to a separate headerChuck Lever2023-07-281-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | Kernel TLS consumers will need definitions of various parts of the TLS protocol, but often do not need the function declarations and other infrastructure provided in <net/tls.h>. Break out existing standardized protocol elements into a separate header, and make room for a few more elements in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/tls: implement ->read_sock()Hannes Reinecke2023-07-273-0/+103
| | | | | | | | | | | | | | | | | | | | | | Implement ->read_sock() function for use with nvme-tcp. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Cc: Boris Pismenny <boris.pismenny@gmail.com> Link: https://lore.kernel.org/r/20230726191556.41714-7-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/tls: split tls_rx_reader_lockHannes Reinecke2023-07-271-16/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | Split tls_rx_reader_{lock,unlock} into an 'acquire/release' and the actual locking part. With that we can use the tls_rx_reader_lock in situations where the socket is already locked. Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230726191556.41714-6-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/tls: Use tcp_read_sock() instead of ops->read_sock()Hannes Reinecke2023-07-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TLS resets the protocol operations, so the read_sock() callback might be changed, too. In this case using sock->ops->readsock() in tls_strp_read_copyin() will enter an infinite recursion if the read_sock() callback is calling tls_rx_rec_wait() which will call into sock->ops->readsock() via tls_strp_read_copyin(). But as tls_strp_read_copyin() is supposed to produce data from the consumed socket and that socket is always a TCP socket we can call tcp_read_sock() directly without having to deal with callbacks. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230726191556.41714-5-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/tls: handle MSG_EOR for tls_device TX flowHannes Reinecke2023-07-271-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | tls_push_data() MSG_MORE, but bails out on MSG_EOR. Seeing that MSG_EOR is basically the opposite of MSG_MORE this patch adds handling MSG_EOR by treating it as the absence of MSG_MORE. Consequently we should return an error when both are set. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230726191556.41714-3-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | net/tls: handle MSG_EOR for tls_sw TX flowHannes Reinecke2023-07-271-1/+4
|/ | | | | | | | | | | | | | tls_sw_sendmsg() already handles MSG_MORE, but bails out on MSG_EOR. Seeing that MSG_EOR is basically the opposite of MSG_MORE this patch adds handling MSG_EOR by treating it as the negation of MSG_MORE. And erroring out if MSG_EOR is specified with MSG_MORE. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230726191556.41714-2-hare@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: Kill MSG_SENDPAGE_NOTLASTDavid Howells2023-06-243-4/+3
| | | | | | | | | | | | | | | | | | Now that ->sendpage() has been removed, MSG_SENDPAGE_NOTLAST can be cleaned up. Things were converted to use MSG_MORE instead, but the protocol sendpage stubs still convert MSG_SENDPAGE_NOTLAST to MSG_MORE, which is now unnecessary. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-17-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)David Howells2023-06-244-65/+0
| | | | | | | | | | | | | | | | | | Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usageDavid Howells2023-06-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(), don't use it further in than the sendpage methods, but rather translate it to MSG_MORE and use that instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: Bernard Metzler <bmt@zurich.ibm.com> cc: Jason Gunthorpe <jgg@ziepe.ca> cc: Leon Romanovsky <leon@kernel.org> cc: John Fastabend <john.fastabend@gmail.com> cc: Jakub Sitnicki <jakub@cloudflare.com> cc: David Ahern <dsahern@kernel.org> cc: Karsten Graul <kgraul@linux.ibm.com> cc: Wenjia Zhang <wenjia@linux.ibm.com> cc: Jan Karcher <jaka@linux.ibm.com> cc: "D. Wythe" <alibuda@linux.alibaba.com> cc: Tony Lu <tonylu@linux.alibaba.com> cc: Wen Gu <guwen@linux.alibaba.com> cc: Boris Pismenny <borisp@nvidia.com> cc: Steffen Klassert <steffen.klassert@secunet.com> cc: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20230623225513.2732256-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>