summaryrefslogtreecommitdiffstats
path: root/net/tls/tls_sw.c
Commit message (Collapse)AuthorAgeFilesLines
* tls: fix currently broken MSG_PEEK behaviorDaniel Borkmann2018-09-171-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In kTLS MSG_PEEK behavior is currently failing, strace example: [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] listen(4, 10) = 0 [pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] close(4) = 0 [pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64 As can be seen from strace, there are two TLS records sent, i) 'test_read_peek' and ii) '_mult_recs\0' where we end up peeking 'test_read_peektest_read_peektest'. This is clearly wrong, and what happens is that given peek cannot call into tls_sw_advance_skb() to unpause strparser and proceed with the next skb, we end up looping over the current one, copying the 'test_read_peek' over and over into the user provided buffer. Here, we can only peek into the currently held skb (current, full TLS record) as otherwise we would end up having to hold all the original skb(s) (depending on the peek depth) in a separate queue when unpausing strparser to process next records, minimally intrusive is to return only up to the current record's size (which likely was what c46234ebb4d1 ("tls: RX path for ktls") originally intended as well). Thus, after patch we properly peek the first record: [pid 2046] wait4(2075, <unfinished ...> [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] listen(4, 10) = 0 [pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0 [pid 2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] close(4) = 0 [pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14 Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: zero the crypto information from tls_context before freeingSabrina Dubroca2018-09-131-4/+4
| | | | | | | | | | | | | | This contains key material in crypto_send_aes_gcm_128 and crypto_recv_aes_gcm_128. Introduce union tls_crypto_context, and replace the two identical unions directly embedded in struct tls_context with it. We can then use this union to clean up the memory in the new tls_ctx_free() function. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: don't copy the key out of tls12_crypto_info_aes_gcm_128Sabrina Dubroca2018-09-131-4/+1
| | | | | | | | | There's no need to copy the key to an on-stack buffer before calling crypto_aead_setkey(). Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPCVakul Garg2018-09-091-0/+6
| | | | | | | | | | | | | | | tls_sw_sendmsg() allocates plaintext and encrypted SG entries using function sk_alloc_sg(). In case the number of SG entries hit MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for current SG index to '0'. This leads to calling of function tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes kernel crash. To fix this, set the number of SG elements to the number of elements in plaintext/encrypted SG arrays in case sk_alloc_sg() returns -ENOSPC. Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Combined memory allocation for decryption requestVakul Garg2018-08-131-96/+142
| | | | | | | | | | | | | For preparing decryption request, several memory chunks are required (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to an accelerator, it is required that the buffers which are read by the accelerator must be dma-able and not come from stack. The buffers for aad and iv can be separately kmalloced each, but it is inefficient. This patch does a combined allocation for preparing decryption request and then segments into aead_req || sgin || sgout || iv || aad. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Mark the end in scatterlist tableVakul Garg2018-08-051-0/+3
| | | | | | | | | | | | Function zerocopy_from_iter() unmarks the 'end' in input sgtable while adding new entries in it. The last entry in sgtable remained unmarked. This results in KASAN error report on using apis like sg_nents(). Before returning, the function needs to mark the 'end' in the last entry it adds. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Use kmemdup to simplify the codezhong jiang2018-08-011-2/+1
| | | | | | | Kmemdup is better than kmalloc+memcpy. So replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Use socket data_ready callback on record availabilityVakul Garg2018-07-301-1/+1
| | | | | | | | | | | | | | | | | On receipt of a complete tls record, use socket's saved data_ready callback instead of state_change callback. In function tls_queue(), the TLS record is queued in encrypted state. But the decryption happen inline when tls_sw_recvmsg() or tls_sw_splice_read() get invoked. So it should be ok to notify the waiting context about the availability of data as soon as we could collect a full TLS record. For new data availability notification, sk_data_ready callback is more appropriate. It points to sock_def_readable() which wakes up specifically for EPOLLIN event. This is in contrast to the socket callback sk_state_change which points to sock_def_wakeup() which issues a wakeup unconditionally (without event mask). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Fix improper revert in zerocopy_from_iterDoron Roberts-Kedes2018-07-281-7/+5
| | | | | | | | | | | | | The current code is problematic because the iov_iter is reverted and never advanced in the non-error case. This patch skips the revert in the non-error case. This patch also fixes the amount by which the iov_iter is reverted. Currently, iov_iter is reverted by size, which can be greater than the amount by which the iter was actually advanced. Instead, only revert by the amount that the iter was advanced. Fixes: 4718799817c5 ("tls: Fix zerocopy_from_iter iov handling") Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Remove dead code in tls_sw_sendmsgDoron Roberts-Kedes2018-07-281-4/+2
| | | | | | | | | tls_push_record either returns 0 on success or a negative value on failure. This patch removes code that would only be executed if tls_push_record were to return a positive value. Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Skip zerocopy path for ITER_KVECDoron Roberts-Kedes2018-07-261-3/+4
| | | | | | | | | | | | | | | The zerocopy path ultimately calls iov_iter_get_pages, which defines the step function for ITER_KVECs as simply, return -EFAULT. Taking the non-zerocopy path for ITER_KVECs avoids the unnecessary fallback. See https://lore.kernel.org/lkml/20150401023311.GL29656@ZenIV.linux.org.uk/T/#u for a discussion of why zerocopy for vmalloc data is not a good idea. Discovered while testing NBD traffic encrypted with ktls. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Removed redundant checks for non-NULLVakul Garg2018-07-261-6/+3
| | | | | | | | | | Removed checks against non-NULL before calling kfree_skb() and crypto_free_aead(). These functions are safe to be called with NULL as an argument. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-07-241-0/+3
|\
| * tls: check RCV_SHUTDOWN in tls_wait_dataDoron Roberts-Kedes2018-07-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code does not check sk->sk_shutdown & RCV_SHUTDOWN. tls_sw_recvmsg may return a positive value in the case where bytes have already been copied when the socket is shutdown. sk->sk_err has been cleared, causing the tls_wait_data to hang forever on a subsequent invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg, fixes this problem. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linuxDavid S. Miller2018-07-201-1/+6
|\| | | | | | | | | | | | | All conflicts were trivial overlapping changes, so reasonably easy to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
| * tls: Stricter error checking in zerocopy sendmsg pathDave Watson2018-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the zerocopy sendmsg() path, there are error checks to revert the zerocopy if we get any error code. syzkaller has discovered that tls_push_record can return -ECONNRESET, which is fatal, and happens after the point at which it is safe to revert the iter, as we've already passed the memory to do_tcp_sendpages. Previously this code could return -ENOMEM and we would want to revert the iter, but AFAIK this no longer returns ENOMEM after a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"), so we fail for all error codes. Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Dave Watson <davejwatson@fb.com> Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: David S. Miller <davem@davemloft.net>
| * tls: fix skb_to_sgvec returning unhandled error.Doron Roberts-Kedes2018-07-031-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code does not inspect the return value of skb_to_sgvec. This can cause a nullptr kernel panic when the malformed sgvec is passed into the crypto request. Checking the return value of skb_to_sgvec and skipping decryption if it is negative fixes this problem. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: Fix zerocopy_from_iter iov handlingBoris Pismenny2018-07-161-3/+5
| | | | | | | | | | | | | | | | | | | | zerocopy_from_iter iterates over the message, but it doesn't revert the updates made by the iov iteration. This patch fixes it. Now, the iov can be used after calling zerocopy_from_iter. Fixes: 3c4d75591 ("tls: kernel TLS support") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: Add rx inline crypto offloadBoris Pismenny2018-07-161-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch completes the generic infrastructure to offload TLS crypto to a network device. It enables the kernel to skip decryption and authentication of some skbs marked as decrypted by the NIC. In the fast path, all packets received are decrypted by the NIC and the performance is comparable to plain TCP. This infrastructure doesn't require a TCP offload engine. Instead, the NIC only decrypts packets that contain the expected TCP sequence number. Out-Of-Order TCP packets are provided unmodified. As a result, at the worst case a received TLS record consists of both plaintext and ciphertext packets. These partially decrypted records must be reencrypted, only to be decrypted. The notable differences between SW KTLS Rx and this offload are as follows: 1. Partial decryption - Software must handle the case of a TLS record that was only partially decrypted by HW. This can happen due to packet reordering. 2. Resynchronization - tls_read_size calls the device driver to resynchronize HW after HW lost track of TLS record framing in the TCP stream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: Fill software context without allocationBoris Pismenny2018-07-161-12/+22
| | | | | | | | | | | | | | | | | | | | This patch allows tls_set_sw_offload to fill the context in case it was already allocated previously. We will use it in TLS_DEVICE to fill the RX software context. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: Split tls_sw_release_resources_rxBoris Pismenny2018-07-161-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | This patch splits tls_sw_release_resources_rx into two functions one which releases all inner software tls structures and another that also frees the containing structure. In TLS_DEVICE we will need to release the software structures without freeeing the containing structure, which contains other information. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: Split decrypt_skb to two functionsBoris Pismenny2018-07-161-18/+26
| | | | | | | | | | | | | | | | | | | | | | Previously, decrypt_skb also updated the TLS context. Now, decrypt_skb only decrypts the payload using the current context, while decrypt_skb_update also updates the state. Later, in the tls_device Rx flow, we will use decrypt_skb directly. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/tls: Use aead_request_alloc/free for request alloc/freeVakul Garg2018-07-121-8/+4
| | | | | | | | | | | | | | | | | | | | | | Instead of kzalloc/free for aead_request allocation and free, use functions aead_request_alloc(), aead_request_free(). It ensures that any sensitive crypto material held in crypto transforms is securely erased from memory. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-07-031-9/+10
|\| | | | | | | | | | | | | | | | | | | Simple overlapping changes in stmmac driver. Adjust skb_gro_flush_final_remcsum function signature to make GRO list changes in net-next, as per Stephen Rothwell's example merge resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLLLinus Torvalds2018-06-281-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | net/tls: Remove VLA usage on nonceKees Cook2018-06-271-2/+8
| | | | | | | | | | | | | | | | | | | | It looks like the prior VLA removal, commit b16520f7493d ("net/tls: Remove VLA usage"), and a new VLA addition, commit c46234ebb4d1e ("tls: RX path for ktls"), passed in the night. This removes the newly added VLA, which happens to have its bounds based on the same max value. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tls: Removed unused variableVakul Garg2018-06-241-3/+0
|/ | | | | | | Removed unused variable 'rxm' from tls_queue(). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: fix waitall behavior in tls_sw_recvmsgDaniel Borkmann2018-06-151-1/+5
| | | | | | | | | | | | | | | | | | | | Current behavior in tls_sw_recvmsg() is to wait for incoming tls messages and copy up to exactly len bytes of data that the user provided. This is problematic in the sense that i) if no packet is currently queued in strparser we keep waiting until one has been processed and pushed into tls receive layer for tls_wait_data() to wake up and push the decrypted bits to user space. Given after tls decryption, we're back at streaming data, use sock_rcvlowat() hint from tcp socket instead. Retain current behavior with MSG_WAITALL flag and otherwise use the hint target for breaking the loop and returning to application. This is done if currently no ctx->recv_pkt is ready, otherwise continue to process it from our strparser backlog. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: fix use-after-free in tls_push_recordDaniel Borkmann2018-06-151-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller managed to trigger a use-after-free in tls like the following: BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls] Write of size 1 at addr ffff88037aa08000 by task a.out/2317 CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 Call Trace: dump_stack+0x71/0xab print_address_description+0x6a/0x280 kasan_report+0x258/0x380 ? tls_push_record.constprop.15+0x6a2/0x810 [tls] tls_push_record.constprop.15+0x6a2/0x810 [tls] tls_sw_push_pending_record+0x2e/0x40 [tls] tls_sk_proto_close+0x3fe/0x710 [tls] ? tcp_check_oom+0x4c0/0x4c0 ? tls_write_space+0x260/0x260 [tls] ? kmem_cache_free+0x88/0x1f0 inet_release+0xd6/0x1b0 __sock_release+0xc0/0x240 sock_close+0x11/0x20 __fput+0x22d/0x660 task_work_run+0x114/0x1a0 do_exit+0x71a/0x2780 ? mm_update_next_owner+0x650/0x650 ? handle_mm_fault+0x2f5/0x5f0 ? __do_page_fault+0x44f/0xa50 ? mm_fault_error+0x2d0/0x2d0 do_group_exit+0xde/0x300 __x64_sys_exit_group+0x3a/0x50 do_syscall_64+0x9a/0x300 ? page_fault+0x8/0x30 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happened through fault injection where aead_req allocation in tls_do_encryption() eventually failed and we returned -ENOMEM from the function. Turns out that the use-after-free is triggered from tls_sw_sendmsg() in the second tls_push_record(). The error then triggers a jump to waiting for memory in sk_stream_wait_memory() resp. returning immediately in case of MSG_DONTWAIT. What follows is the trim_both_sgl(sk, orig_size), which drops elements from the sg list added via tls_sw_sendmsg(). Now the use-after-free gets triggered when the socket is being closed, where tls_sk_proto_close() callback is invoked. The tls_complete_pending_work() will figure that there's a pending closed tls record to be flushed and thus calls into the tls_push_pending_closed_record() from there. ctx->push_pending_record() is called from the latter, which is the tls_sw_push_pending_record() from sw path. This again calls into tls_push_record(). And here the tls_fill_prepend() will panic since the buffer address has been freed earlier via trim_both_sgl(). One way to fix it is to move the aead request allocation out of tls_do_encryption() early into tls_push_record(). This means we don't prep the tls header and advance state to the TLS_PENDING_CLOSED_RECORD before allocation which could potentially fail happened. That fixes the issue on my side. Fixes: 3c4d7559159b ("tls: kernel TLS support") Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: fix NULL pointer dereference on pollDaniel Borkmann2018-06-111-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While hacking on kTLS, I ran into the following panic from an unprivileged netserver / netperf TCP session: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0 Oops: 0010 [#1] SMP KASAN PTI CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffff88036abcf740 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26 RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280 RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48 R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280 R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350 FS: 00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? tls_sw_poll+0xa4/0x160 [tls] ? sock_poll+0x20a/0x680 ? do_select+0x77b/0x11a0 ? poll_schedule_timeout.constprop.12+0x130/0x130 ? pick_link+0xb00/0xb00 ? read_word_at_a_time+0x13/0x20 ? vfs_poll+0x270/0x270 ? deref_stack_reg+0xad/0xe0 ? __read_once_size_nocheck.constprop.6+0x10/0x10 [...] Debugging further, it turns out that calling into ctx->sk_poll() is invalid since sk_poll itself is NULL which was saved from the original TCP socket in order for tls_sw_poll() to invoke it. Looks like the recent conversion from poll to poll_mask callback started in 152524231023 ("net: add support for ->poll_mask in proto_ops") missed to eventually convert kTLS, too: TCP's ->poll was converted over to the ->poll_mask in commit 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") and therefore kTLS wrongly saved the ->poll old one which is now NULL. Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN | POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in tcp_poll_mask() as well that is mangled here. Fixes: 2c7d3dacebd4 ("net/tcp: convert to ->poll_mask") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Watson <davejwatson@fb.com> Tested-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* strparser: Add __strp_unpause and use it in ktls.Doron Roberts-Kedes2018-06-061-1/+1
| | | | | | | | | | | | | | | | | strp_unpause queues strp_work in order to parse any messages that arrived while the strparser was paused. However, the process invoking strp_unpause could eagerly parse a buffered message itself if it held the sock lock. __strp_unpause is an alternative to strp_pause that avoids the scheduling overhead that results when a receiving thread unpauses the strparser and waits for the next message to be delivered by the workqueue thread. This patch more than doubled the IOPS achieved in a benchmark of NBD traffic encrypted using ktls. Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-05-211-5/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | S390 bpf_jit.S is removed in net-next and had changes in 'net', since that code isn't used any more take the removal. TLS data structures split the TX and RX components in 'net-next', put the new struct members from the bug fix in 'net' into the RX part. The 'net-next' tree had some reworking of how the ERSPAN code works in the GRE tunneling code, overlapping with a one-line headroom calculation fix in 'net'. Overlapping changes in __sock_map_ctx_update_elem(), keep the bits that read the prog members via READ_ONCE() into local variables before using them. Signed-off-by: David S. Miller <davem@davemloft.net>
| * tls: don't use stack memory in a scatterlistMatt Mullins2018-05-171-5/+4
| | | | | | | | | | | | | | | | | | scatterlist code expects virt_to_page() to work, which fails with CONFIG_VMAP_STACK=y. Fixes: c46234ebb4d1e ("tls: RX path for ktls") Signed-off-by: Matt Mullins <mmullins@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/tls: Split conf to rx + txBoris Pismenny2018-05-011-59/+79
| | | | | | | | | | | | | | | | | | In TLS inline crypto, we can have one direction in software and another in hardware. Thus, we split the TLS configuration to separate structures for receive and transmit. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/tls: remove redundant second null check on sgoutColin Ian King2018-04-241-2/+1
|/ | | | | | | | | | | | A duplicated null check on sgout is redundant as it is known to be already true because of the identical earlier check. Remove it. Detected by cppcheck: net/tls/tls_sw.c:696: (warning) Identical inner 'if' condition is always true. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Remove VLA usageKees Cook2018-04-121-1/+9
| | | | | | | | | | | | In the quest to remove VLAs from the kernel[1], this replaces the VLA size with the only possible size used in the code, and adds a mechanism to double-check future IV sizes. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: RX path for ktlsDave Watson2018-03-231-58/+529
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add rx path for tls software implementation. recvmsg, splice_read, and poll implemented. An additional sockopt TLS_RX is added, with the same interface as TLS_TX. Either TLX_RX or TLX_TX may be provided separately, or together (with two different setsockopt calls with appropriate keys). Control messages are passed via CMSG in a similar way to transmit. If no cmsg buffer is passed, then only application data records will be passed to userspace, and EIO is returned for other types of alerts. EBADMSG is passed for decryption errors, and EMSGSIZE is passed for framing too big, and EBADMSG for framing too small (matching openssl semantics). EINVAL is returned for TLS versions that do not match the original setsockopt call. All are unrecoverable. strparser is used to parse TLS framing. Decryption is done directly in to userspace buffers if they are large enough to support it, otherwise sk_cow_data is called (similar to ipsec), and buffers are decrypted in place and copied. splice_read always decrypts in place, since no buffers are provided to decrypt in to. sk_poll is overridden, and only returns POLLIN if a full TLS message is received. Otherwise we wait for strparser to finish reading a full frame. Actual decryption is only done during recvmsg or splice_read calls. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Pass error code explicitly to tls_err_abortDave Watson2018-03-231-1/+1
| | | | | | | | | Pass EBADMSG explicitly to tls_err_abort. Receive path will pass additional codes - EMSGSIZE if framing is larger than max TLS record size, EINVAL if TLS version mismatch. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Move cipher info to a separate structDave Watson2018-03-231-28/+30
| | | | | | | | | | Separate tx crypto parameters to a separate cipher_context struct. The same parameters will be used for rx using the same struct. tls_advance_record_sn is modified to only take the cipher info. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Generalize zerocopy_from_iterDave Watson2018-03-231-12/+19
| | | | | | | | | | Refactor zerocopy_from_iter to take arguments for pages and size, such that it can be used for both tx and rx. RX will also support zerocopy direct to output iter, as long as the full message can be copied at once (a large enough userspace buffer was provided). Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: generalize sk_alloc_sg to work with scatterlist ringsJohn Fastabend2018-03-191-2/+2
| | | | | | | | | | | | | The current implementation of sk_alloc_sg expects scatterlist to always start at entry 0 and complete at entry MAX_SKB_FRAGS. Future patches will want to support starting at arbitrary offset into scatterlist so add an additional sg_start parameters and then default to the current values in TLS code paths. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* sock: make static tls function alloc_sg generic sock helperJohn Fastabend2018-03-191-62/+7
| | | | | | | | | | | | The TLS ULP module builds scatterlists from a sock using page_frag_refill(). This is going to be useful for other ULPs so move it into sock file for more general use. In the process remove useless goto at end of while loop. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* tls: Add support for encryption using async offload acceleratorVakul Garg2018-01-311-1/+7
| | | | | | | | | | | Async crypto accelerators (e.g. drivers/crypto/caam) support offloading GCM operation. If they are enabled, crypto_aead_encrypt() return error code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a completion till the time the response for crypto offload request is received. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Correct length of scatterlist in tls_sw_sendpageDave Watson2018-01-221-0/+2
| | | | | | | | | | | | | | | The scatterlist is reused by both sendmsg and sendfile. If a sendmsg of smaller number of pages is followed by a sendfile of larger number of pages, the scatterlist may be too short, resulting in a crash in gcm_encrypt. Add sg_unmark_end to make the list the correct length. tls_sw_sendmsg already calls sg_unmark_end correctly when it allocates memory in alloc_sg, or in zerocopy_from_iter. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: fix sw_ctx leakSabrina Dubroca2018-01-171-5/+7
| | | | | | | | | | | During setsockopt(SOL_TCP, TLS_TX), if initialization of the software context fails in tls_set_sw_offload(), we leak sw_ctx. We also don't reassign ctx->priv_ctx to NULL, so we can't even do another attempt to set it up on the same socket, as it will fail with -EEXIST. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Fix inverted error codes to avoid endless loopr.hering@avm.de2018-01-151-2/+2
| | | | | | | | | | | sendfile() calls can hang endless with using Kernel TLS if a socket error occurs. Socket error codes must be inverted by Kernel TLS before returning because they are stored with positive sign. If returned non-inverted they are interpreted as number of bytes sent, causing endless looping of the splice mechanic behind sendfile(). Signed-off-by: Robert Hering <r.hering@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Move tls_make_aad to header to allow sharingIlya Lesokhin2017-11-141-17/+1
| | | | | | | | | move tls_make_aad as it is going to be reused by the device offload code and rx path. Remove unused recv parameter. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Fix TLS ulp context leak, when TLS_TX setsockopt is not used.Ilya Lesokhin2017-11-141-2/+2
| | | | | | | | | | | | | | | Previously the TLS ulp context would leak if we attached a TLS ulp to a socket but did not use the TLS_TX setsockopt, or did use it but it failed. This patch solves the issue by overriding prot[TLS_BASE_TX].close and fixing tls_sk_proto_close to work properly when its called with ctx->tx_conf == TLS_BASE_TX. This patch also removes ctx->free_resources as we can use ctx->tx_conf to obtain the relevant information. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: Use kzalloc for aead_request allocationIlya Lesokhin2017-11-141-1/+1
| | | | | | | | | Use kzalloc for aead_request allocation as we don't set all the bits in the request. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tls: make tls_sw_free_resources staticTobias Klauser2017-09-141-1/+1
| | | | | | | | Make the needlessly global function tls_sw_free_resources static to fix a gcc/sparse warning. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>