summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core
Commit message (Collapse)AuthorAgeFilesLines
* RDMA/ucma: Ensure that CM_ID exists prior to access itLeon Romanovsky2018-03-201-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to access UCMA commands, the context should be initialized and connected to CM_ID with ucma_create_id(). In case user skips this step, he can provide non-valid ctx without CM_ID and cause to multiple NULL dereferences. Also there are situations where the create_id can be raced with other user access, ensure that the context is only shared to other threads once it is fully initialized to avoid the races. [ 109.088108] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 109.090315] IP: ucma_connect+0x138/0x1d0 [ 109.092595] PGD 80000001dc02d067 P4D 80000001dc02d067 PUD 1da9ef067 PMD 0 [ 109.095384] Oops: 0000 [#1] SMP KASAN PTI [ 109.097834] CPU: 0 PID: 663 Comm: uclose Tainted: G B 4.16.0-rc1-00062-g2975d5de6428 #45 [ 109.100816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 [ 109.105943] RIP: 0010:ucma_connect+0x138/0x1d0 [ 109.108850] RSP: 0018:ffff8801c8567a80 EFLAGS: 00010246 [ 109.111484] RAX: 0000000000000000 RBX: 1ffff100390acf50 RCX: ffffffff9d7812e2 [ 109.114496] RDX: 1ffffffff3f507a5 RSI: 0000000000000297 RDI: 0000000000000297 [ 109.117490] RBP: ffff8801daa15600 R08: 0000000000000000 R09: ffffed00390aceeb [ 109.120429] R10: 0000000000000001 R11: ffffed00390aceea R12: 0000000000000000 [ 109.123318] R13: 0000000000000120 R14: ffff8801de6459c0 R15: 0000000000000118 [ 109.126221] FS: 00007fabb68d6700(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000 [ 109.129468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 109.132523] CR2: 0000000000000020 CR3: 00000001d45d8003 CR4: 00000000003606b0 [ 109.135573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 109.138716] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 109.142057] Call Trace: [ 109.144160] ? ucma_listen+0x110/0x110 [ 109.146386] ? wake_up_q+0x59/0x90 [ 109.148853] ? futex_wake+0x10b/0x2a0 [ 109.151297] ? save_stack+0x89/0xb0 [ 109.153489] ? _copy_from_user+0x5e/0x90 [ 109.155500] ucma_write+0x174/0x1f0 [ 109.157933] ? ucma_resolve_route+0xf0/0xf0 [ 109.160389] ? __mod_node_page_state+0x1d/0x80 [ 109.162706] __vfs_write+0xc4/0x350 [ 109.164911] ? kernel_read+0xa0/0xa0 [ 109.167121] ? path_openat+0x1b10/0x1b10 [ 109.169355] ? fsnotify+0x899/0x8f0 [ 109.171567] ? fsnotify_unmount_inodes+0x170/0x170 [ 109.174145] ? __fget+0xa8/0xf0 [ 109.177110] vfs_write+0xf7/0x280 [ 109.179532] SyS_write+0xa1/0x120 [ 109.181885] ? SyS_read+0x120/0x120 [ 109.184482] ? compat_start_thread+0x60/0x60 [ 109.187124] ? SyS_read+0x120/0x120 [ 109.189548] do_syscall_64+0xeb/0x250 [ 109.192178] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 109.194725] RIP: 0033:0x7fabb61ebe99 [ 109.197040] RSP: 002b:00007fabb68d5e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 109.200294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fabb61ebe99 [ 109.203399] RDX: 0000000000000120 RSI: 00000000200001c0 RDI: 0000000000000004 [ 109.206548] RBP: 00007fabb68d5ec0 R08: 0000000000000000 R09: 0000000000000000 [ 109.209902] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fabb68d5fc0 [ 109.213327] R13: 0000000000000000 R14: 00007fff40ab2430 R15: 00007fabb68d69c0 [ 109.216613] Code: 88 44 24 2c 0f b6 84 24 6e 01 00 00 88 44 24 2d 0f b6 84 24 69 01 00 00 88 44 24 2e 8b 44 24 60 89 44 24 30 e8 da f6 06 ff 31 c0 <66> 41 83 7c 24 20 1b 75 04 8b 44 24 64 48 8d 74 24 20 4c 89 e7 [ 109.223602] RIP: ucma_connect+0x138/0x1d0 RSP: ffff8801c8567a80 [ 109.226256] CR2: 0000000000000020 Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Reported-by: <syzbot+36712f50b0552615bf59@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/ucma: Fix use-after-free access in ucma_closeLeon Romanovsky2018-03-191-0/+3
| | | | | | | | | | | | The error in ucma_create_id() left ctx in the list of contexts belong to ucma file descriptor. The attempt to close this file descriptor causes to use-after-free accesses while iterating over such list. Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Reported-by: <syzbot+dcfd344365a56fbebd0f@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/ucma: Check AF family prior resolving addressLeon Romanovsky2018-03-151-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Garbage supplied by user will cause to UCMA module provide zero memory size for memcpy(), because it wasn't checked, it will produce unpredictable results in rdma_resolve_addr(). [ 42.873814] BUG: KASAN: null-ptr-deref in rdma_resolve_addr+0xc8/0xfb0 [ 42.874816] Write of size 28 at addr 00000000000000a0 by task resaddr/1044 [ 42.876765] [ 42.876960] CPU: 1 PID: 1044 Comm: resaddr Not tainted 4.16.0-rc1-00057-gaa56a5293d7e #34 [ 42.877840] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 [ 42.879691] Call Trace: [ 42.880236] dump_stack+0x5c/0x77 [ 42.880664] kasan_report+0x163/0x380 [ 42.881354] ? rdma_resolve_addr+0xc8/0xfb0 [ 42.881864] memcpy+0x34/0x50 [ 42.882692] rdma_resolve_addr+0xc8/0xfb0 [ 42.883366] ? deref_stack_reg+0x88/0xd0 [ 42.883856] ? vsnprintf+0x31a/0x770 [ 42.884686] ? rdma_bind_addr+0xc40/0xc40 [ 42.885327] ? num_to_str+0x130/0x130 [ 42.885773] ? deref_stack_reg+0x88/0xd0 [ 42.886217] ? __read_once_size_nocheck.constprop.6+0x10/0x10 [ 42.887698] ? unwind_get_return_address_ptr+0x50/0x50 [ 42.888302] ? replace_slot+0x147/0x170 [ 42.889176] ? delete_node+0x12c/0x340 [ 42.890223] ? __radix_tree_lookup+0xa9/0x160 [ 42.891196] ? ucma_resolve_ip+0xb7/0x110 [ 42.891917] ucma_resolve_ip+0xb7/0x110 [ 42.893003] ? ucma_resolve_addr+0x190/0x190 [ 42.893531] ? _copy_from_user+0x5e/0x90 [ 42.894204] ucma_write+0x174/0x1f0 [ 42.895162] ? ucma_resolve_route+0xf0/0xf0 [ 42.896309] ? dequeue_task_fair+0x67e/0xd90 [ 42.897192] ? put_prev_entity+0x7d/0x170 [ 42.897870] ? ring_buffer_record_is_on+0xd/0x20 [ 42.898439] ? tracing_record_taskinfo_skip+0x20/0x50 [ 42.899686] __vfs_write+0xc4/0x350 [ 42.900142] ? kernel_read+0xa0/0xa0 [ 42.900602] ? firmware_map_remove+0xdf/0xdf [ 42.901135] ? do_task_dead+0x5d/0x60 [ 42.901598] ? do_exit+0xcc6/0x1220 [ 42.902789] ? __fget+0xa8/0xf0 [ 42.903190] vfs_write+0xf7/0x280 [ 42.903600] SyS_write+0xa1/0x120 [ 42.904206] ? SyS_read+0x120/0x120 [ 42.905710] ? compat_start_thread+0x60/0x60 [ 42.906423] ? SyS_read+0x120/0x120 [ 42.908716] do_syscall_64+0xeb/0x250 [ 42.910760] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 42.912735] RIP: 0033:0x7f138b0afe99 [ 42.914734] RSP: 002b:00007f138b799e98 EFLAGS: 00000287 ORIG_RAX: 0000000000000001 [ 42.917134] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f138b0afe99 [ 42.919487] RDX: 000000000000002e RSI: 0000000020000c40 RDI: 0000000000000004 [ 42.922393] RBP: 00007f138b799ec0 R08: 00007f138b79a700 R09: 0000000000000000 [ 42.925266] R10: 00007f138b79a700 R11: 0000000000000287 R12: 00007f138b799fc0 [ 42.927570] R13: 0000000000000000 R14: 00007ffdbae757c0 R15: 00007f138b79a9c0 [ 42.930047] [ 42.932681] Disabling lock debugging due to kernel taint [ 42.934795] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 [ 42.936939] IP: memcpy_erms+0x6/0x10 [ 42.938864] PGD 80000001bea92067 P4D 80000001bea92067 PUD 1bea96067 PMD 0 [ 42.941576] Oops: 0002 [#1] SMP KASAN PTI [ 42.943952] CPU: 1 PID: 1044 Comm: resaddr Tainted: G B 4.16.0-rc1-00057-gaa56a5293d7e #34 [ 42.946964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 [ 42.952336] RIP: 0010:memcpy_erms+0x6/0x10 [ 42.954707] RSP: 0018:ffff8801c8b479c8 EFLAGS: 00010286 [ 42.957227] RAX: 00000000000000a0 RBX: ffff8801c8b47ba0 RCX: 000000000000001c [ 42.960543] RDX: 000000000000001c RSI: ffff8801c8b47bbc RDI: 00000000000000a0 [ 42.963867] RBP: ffff8801c8b47b60 R08: 0000000000000000 R09: ffffed0039168ed1 [ 42.967303] R10: 0000000000000001 R11: ffffed0039168ed0 R12: ffff8801c8b47bbc [ 42.970685] R13: 00000000000000a0 R14: 1ffff10039168f4a R15: 0000000000000000 [ 42.973631] FS: 00007f138b79a700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000 [ 42.976831] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.979239] CR2: 00000000000000a0 CR3: 00000001be908002 CR4: 00000000003606a0 [ 42.982060] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 42.984877] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 42.988033] Call Trace: [ 42.990487] rdma_resolve_addr+0xc8/0xfb0 [ 42.993202] ? deref_stack_reg+0x88/0xd0 [ 42.996055] ? vsnprintf+0x31a/0x770 [ 42.998707] ? rdma_bind_addr+0xc40/0xc40 [ 43.000985] ? num_to_str+0x130/0x130 [ 43.003410] ? deref_stack_reg+0x88/0xd0 [ 43.006302] ? __read_once_size_nocheck.constprop.6+0x10/0x10 [ 43.008780] ? unwind_get_return_address_ptr+0x50/0x50 [ 43.011178] ? replace_slot+0x147/0x170 [ 43.013517] ? delete_node+0x12c/0x340 [ 43.016019] ? __radix_tree_lookup+0xa9/0x160 [ 43.018755] ? ucma_resolve_ip+0xb7/0x110 [ 43.021270] ucma_resolve_ip+0xb7/0x110 [ 43.023968] ? ucma_resolve_addr+0x190/0x190 [ 43.026312] ? _copy_from_user+0x5e/0x90 [ 43.029384] ucma_write+0x174/0x1f0 [ 43.031861] ? ucma_resolve_route+0xf0/0xf0 [ 43.034782] ? dequeue_task_fair+0x67e/0xd90 [ 43.037483] ? put_prev_entity+0x7d/0x170 [ 43.040215] ? ring_buffer_record_is_on+0xd/0x20 [ 43.042990] ? tracing_record_taskinfo_skip+0x20/0x50 [ 43.045595] __vfs_write+0xc4/0x350 [ 43.048624] ? kernel_read+0xa0/0xa0 [ 43.051604] ? firmware_map_remove+0xdf/0xdf [ 43.055379] ? do_task_dead+0x5d/0x60 [ 43.058000] ? do_exit+0xcc6/0x1220 [ 43.060783] ? __fget+0xa8/0xf0 [ 43.063133] vfs_write+0xf7/0x280 [ 43.065677] SyS_write+0xa1/0x120 [ 43.068647] ? SyS_read+0x120/0x120 [ 43.071179] ? compat_start_thread+0x60/0x60 [ 43.074025] ? SyS_read+0x120/0x120 [ 43.076705] do_syscall_64+0xeb/0x250 [ 43.079006] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 43.081606] RIP: 0033:0x7f138b0afe99 [ 43.083679] RSP: 002b:00007f138b799e98 EFLAGS: 00000287 ORIG_RAX: 0000000000000001 [ 43.086802] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f138b0afe99 [ 43.089989] RDX: 000000000000002e RSI: 0000000020000c40 RDI: 0000000000000004 [ 43.092866] RBP: 00007f138b799ec0 R08: 00007f138b79a700 R09: 0000000000000000 [ 43.096233] R10: 00007f138b79a700 R11: 0000000000000287 R12: 00007f138b799fc0 [ 43.098913] R13: 0000000000000000 R14: 00007ffdbae757c0 R15: 00007f138b79a9c0 [ 43.101809] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 [ 43.107950] RIP: memcpy_erms+0x6/0x10 RSP: ffff8801c8b479c8 Reported-by: <syzbot+1d8c43206853b369d00c@syzkaller.appspotmail.com> Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/ucma: Don't allow join attempts for unsupported AF familyLeon Romanovsky2018-03-141-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users can provide garbage while calling to ucma_join_ip_multicast(), it will indirectly cause to rdma_addr_size() return 0, making the call to ucma_process_join(), which had the right checks, but it is better to check the input as early as possible. The following crash from syzkaller revealed it. kernel BUG at lib/string.c:1052! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051 RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286 RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000 RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12 RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998 R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00 FS: 0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: memcpy include/linux/string.h:344 [inline] ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633 __vfs_write+0xef/0x970 fs/read_write.c:480 vfs_write+0x189/0x510 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0xef/0x220 fs/read_write.c:581 do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline] do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f9ec99 RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100 RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de 55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56 RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0 Fixes: 5bc2b7b397b0 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast") Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/ucma: Fix access to non-initialized CM_ID objectLeon Romanovsky2018-03-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The attempt to join multicast group without ensuring that CMA device exists will lead to the following crash reported by syzkaller. [ 64.076794] BUG: KASAN: null-ptr-deref in rdma_join_multicast+0x26e/0x12c0 [ 64.076797] Read of size 8 at addr 00000000000000b0 by task join/691 [ 64.076797] [ 64.076800] CPU: 1 PID: 691 Comm: join Not tainted 4.16.0-rc1-00219-gb97853b65b93 #23 [ 64.076802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4 [ 64.076803] Call Trace: [ 64.076809] dump_stack+0x5c/0x77 [ 64.076817] kasan_report+0x163/0x380 [ 64.085859] ? rdma_join_multicast+0x26e/0x12c0 [ 64.086634] rdma_join_multicast+0x26e/0x12c0 [ 64.087370] ? rdma_disconnect+0xf0/0xf0 [ 64.088579] ? __radix_tree_replace+0xc3/0x110 [ 64.089132] ? node_tag_clear+0x81/0xb0 [ 64.089606] ? idr_alloc_u32+0x12e/0x1a0 [ 64.090517] ? __fprop_inc_percpu_max+0x150/0x150 [ 64.091768] ? tracing_record_taskinfo+0x10/0xc0 [ 64.092340] ? idr_alloc+0x76/0xc0 [ 64.092951] ? idr_alloc_u32+0x1a0/0x1a0 [ 64.093632] ? ucma_process_join+0x23d/0x460 [ 64.094510] ucma_process_join+0x23d/0x460 [ 64.095199] ? ucma_migrate_id+0x440/0x440 [ 64.095696] ? futex_wake+0x10b/0x2a0 [ 64.096159] ucma_join_multicast+0x88/0xe0 [ 64.096660] ? ucma_process_join+0x460/0x460 [ 64.097540] ? _copy_from_user+0x5e/0x90 [ 64.098017] ucma_write+0x174/0x1f0 [ 64.098640] ? ucma_resolve_route+0xf0/0xf0 [ 64.099343] ? rb_erase_cached+0x6c7/0x7f0 [ 64.099839] __vfs_write+0xc4/0x350 [ 64.100622] ? perf_syscall_enter+0xe4/0x5f0 [ 64.101335] ? kernel_read+0xa0/0xa0 [ 64.103525] ? perf_sched_cb_inc+0xc0/0xc0 [ 64.105510] ? syscall_exit_register+0x2a0/0x2a0 [ 64.107359] ? __switch_to+0x351/0x640 [ 64.109285] ? fsnotify+0x899/0x8f0 [ 64.111610] ? fsnotify_unmount_inodes+0x170/0x170 [ 64.113876] ? __fsnotify_update_child_dentry_flags+0x30/0x30 [ 64.115813] ? ring_buffer_record_is_on+0xd/0x20 [ 64.117824] ? __fget+0xa8/0xf0 [ 64.119869] vfs_write+0xf7/0x280 [ 64.122001] SyS_write+0xa1/0x120 [ 64.124213] ? SyS_read+0x120/0x120 [ 64.126644] ? SyS_read+0x120/0x120 [ 64.128563] do_syscall_64+0xeb/0x250 [ 64.130732] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 64.132984] RIP: 0033:0x7f5c994ade99 [ 64.135699] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 64.138740] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99 [ 64.141056] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015 [ 64.143536] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000 [ 64.146017] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0 [ 64.148608] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0 [ 64.151060] [ 64.153703] Disabling lock debugging due to kernel taint [ 64.156032] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0 [ 64.159066] IP: rdma_join_multicast+0x26e/0x12c0 [ 64.161451] PGD 80000001d0298067 P4D 80000001d0298067 PUD 1dea39067 PMD 0 [ 64.164442] Oops: 0000 [#1] SMP KASAN PTI [ 64.166817] CPU: 1 PID: 691 Comm: join Tainted: G B 4.16.0-rc1-00219-gb97853b65b93 #23 [ 64.170004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4 [ 64.174985] RIP: 0010:rdma_join_multicast+0x26e/0x12c0 [ 64.177246] RSP: 0018:ffff8801c8207860 EFLAGS: 00010282 [ 64.179901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff94789522 [ 64.183344] RDX: 1ffffffff2d50fa5 RSI: 0000000000000297 RDI: 0000000000000297 [ 64.186237] RBP: ffff8801c8207a50 R08: 0000000000000000 R09: ffffed0039040ea7 [ 64.189328] R10: 0000000000000001 R11: ffffed0039040ea6 R12: 0000000000000000 [ 64.192634] R13: 0000000000000000 R14: ffff8801e2022800 R15: ffff8801d4ac2400 [ 64.196105] FS: 00007f5c99b98700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000 [ 64.199211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 64.202046] CR2: 00000000000000b0 CR3: 00000001d1c48004 CR4: 00000000003606a0 [ 64.205032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 64.208221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 64.211554] Call Trace: [ 64.213464] ? rdma_disconnect+0xf0/0xf0 [ 64.216124] ? __radix_tree_replace+0xc3/0x110 [ 64.219337] ? node_tag_clear+0x81/0xb0 [ 64.222140] ? idr_alloc_u32+0x12e/0x1a0 [ 64.224422] ? __fprop_inc_percpu_max+0x150/0x150 [ 64.226588] ? tracing_record_taskinfo+0x10/0xc0 [ 64.229763] ? idr_alloc+0x76/0xc0 [ 64.232186] ? idr_alloc_u32+0x1a0/0x1a0 [ 64.234505] ? ucma_process_join+0x23d/0x460 [ 64.237024] ucma_process_join+0x23d/0x460 [ 64.240076] ? ucma_migrate_id+0x440/0x440 [ 64.243284] ? futex_wake+0x10b/0x2a0 [ 64.245302] ucma_join_multicast+0x88/0xe0 [ 64.247783] ? ucma_process_join+0x460/0x460 [ 64.250841] ? _copy_from_user+0x5e/0x90 [ 64.253878] ucma_write+0x174/0x1f0 [ 64.257008] ? ucma_resolve_route+0xf0/0xf0 [ 64.259877] ? rb_erase_cached+0x6c7/0x7f0 [ 64.262746] __vfs_write+0xc4/0x350 [ 64.265537] ? perf_syscall_enter+0xe4/0x5f0 [ 64.267792] ? kernel_read+0xa0/0xa0 [ 64.270358] ? perf_sched_cb_inc+0xc0/0xc0 [ 64.272575] ? syscall_exit_register+0x2a0/0x2a0 [ 64.275367] ? __switch_to+0x351/0x640 [ 64.277700] ? fsnotify+0x899/0x8f0 [ 64.280530] ? fsnotify_unmount_inodes+0x170/0x170 [ 64.283156] ? __fsnotify_update_child_dentry_flags+0x30/0x30 [ 64.286182] ? ring_buffer_record_is_on+0xd/0x20 [ 64.288749] ? __fget+0xa8/0xf0 [ 64.291136] vfs_write+0xf7/0x280 [ 64.292972] SyS_write+0xa1/0x120 [ 64.294965] ? SyS_read+0x120/0x120 [ 64.297474] ? SyS_read+0x120/0x120 [ 64.299751] do_syscall_64+0xeb/0x250 [ 64.301826] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 64.304352] RIP: 0033:0x7f5c994ade99 [ 64.306711] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 64.309577] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99 [ 64.312334] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015 [ 64.315783] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000 [ 64.318365] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0 [ 64.320980] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0 [ 64.323515] Code: e8 e8 79 08 ff 4c 89 ff 45 0f b6 a7 b8 01 00 00 e8 68 7c 08 ff 49 8b 1f 4d 89 e5 49 c1 e4 04 48 8 [ 64.330753] RIP: rdma_join_multicast+0x26e/0x12c0 RSP: ffff8801c8207860 [ 64.332979] CR2: 00000000000000b0 [ 64.335550] ---[ end trace 0c00c17a408849c1 ]--- Reported-by: <syzbot+e6aba77967bd72cbc9d6@syzkaller.appspotmail.com> Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/core: Do not use invalid destination in determining port reuseTatyana Nikolova2018-03-141-5/+7
| | | | | | | | | | | | | | | | | | | | cma_port_is_unique() allows local port reuse if the quad (source address and port, destination address and port) for this connection is unique. However, if the destination info is zero or unspecified, it can't make a correct decision but still allows port reuse. For example, sometimes rdma_bind_addr() is called with unspecified destination and reusing the port can lead to creating a connection with a duplicate quad, after the destination is resolved. The issue manifests when MPI scale-up tests hang after the duplicate quad is used. Set the destination address family and add checks for zero destination address and port to prevent source port reuse based on invalid destination. Fixes: 19b752a19dce ("IB/cma: Allow port reuse for rdma_id") Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/ucma: Check that user doesn't overflow QP stateLeon Romanovsky2018-03-071-0/+3
| | | | | | | | | | The QP state is limited and declared in enum ib_qp_state, but ucma user was able to supply any possible (u32) value. Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/ucma: Limit possible option sizeLeon Romanovsky2018-03-071-0/+3
| | | | | | | | | | | | | | | Users of ucma are supposed to provide size of option level, in most paths it is supposed to be equal to u8 or u16, but it is not the case for the IB path record, where it can be multiple of struct ib_path_rec_data. This patch takes simplest possible approach and prevents providing values more than possible to allocate. Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com Fixes: 7ce86409adcd ("RDMA/ucma: Allow user space to set service type") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/core: Fix possible crash to access NULL netdevParav Pandit2018-03-071-4/+3
| | | | | | | | | | | | | | | | | resolved_dev returned might be NULL as ifindex is transient number. Ignoring NULL check of resolved_dev might crash the kernel. Therefore perform NULL check before accessing resolved_dev. Additionally rdma_resolve_ip_route() invokes addr_resolve() which performs check and address translation for loopback ifindex. Therefore, checking it again in rdma_resolve_ip_route() is not helpful. Therefore, the code is simplified to avoid IFF_LOOPBACK check. Fixes: 200298326b27 ("IB/core: Validate route when we init ah") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/core: Reduce poll batch for direct cq pollingMax Gurtovoy2018-03-061-10/+11
| | | | | | | | | | | | | | | | | | | | Fix warning limit for kernel stack consumption: drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct': drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Using smaller ib_wc array on the stack brings us comfortably below that limit again. Fixes: 246d8b184c10 ("IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct") Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sergey Gorenko <sergeygo@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/core : Add null pointer check in addr_resolveMuneendra Kumar M2018-02-281-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_get_by_index is being called in addr_resolve function which returns NULL and NULL pointer access leads to kernel crash. Following call trace is observed while running rdma_lat test application [ 146.173149] BUG: unable to handle kernel NULL pointer dereference at 00000000000004a0 [ 146.173198] IP: addr_resolve+0x9e/0x3e0 [ib_core] [ 146.173221] PGD 0 P4D 0 [ 146.173869] Oops: 0000 [#1] SMP PTI [ 146.182859] CPU: 8 PID: 127 Comm: kworker/8:1 Tainted: G O 4.15.0-rc6+ #18 [ 146.183758] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01KN179, BIOS-[TCE132H-2.50]- 10/11/2017 [ 146.184691] Workqueue: ib_cm cm_work_handler [ib_cm] [ 146.185632] RIP: 0010:addr_resolve+0x9e/0x3e0 [ib_core] [ 146.186584] RSP: 0018:ffffc9000362faa0 EFLAGS: 00010246 [ 146.187521] RAX: 000000000000001b RBX: ffffc9000362fc08 RCX: 0000000000000006 [ 146.188472] RDX: 0000000000000000 RSI: 0000000000000096 RDI : ffff88087fc16990 [ 146.189427] RBP: ffffc9000362fb18 R08: 00000000ffffff9d R09: 00000000000004ac [ 146.190392] R10: 00000000000001e7 R11: 0000000000000001 R12: ffff88086af2e090 [ 146.191361] R13: 0000000000000000 R14: 0000000000000001 R15: 00000000ffffff9d [ 146.192327] FS: 0000000000000000(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000 [ 146.193301] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 146.194274] CR2: 00000000000004a0 CR3: 000000000220a002 CR4: 00000000003606e0 [ 146.195258] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 146.196256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 146.197231] Call Trace: [ 146.198209] ? rdma_addr_register_client+0x30/0x30 [ib_core] [ 146.199199] rdma_resolve_ip+0x1af/0x280 [ib_core] [ 146.200196] rdma_addr_find_l2_eth_by_grh+0x154/0x2b0 [ib_core] The below patch adds the missing NULL pointer check returned by dev_get_by_index before accessing the netdev to avoid kernel crash. We observed the below crash when we try to do the below test. server client --------- --------- |1.1.1.1|<----rxe-channel--->|1.1.1.2| --------- --------- On server: rdma_lat -c -n 2 -s 1024 On client:rdma_lat 1.1.1.1 -c -n 2 -s 1024 Fixes: 200298326b27 ("IB/core: Validate route when we init ah") Signed-off-by: Muneendra <muneendra.kumar@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/core: Fix missing RDMA cgroups release in case of failure to register deviceParav Pandit2018-02-281-2/+4
| | | | | | | | | | | | | During IB device registration process, if query_device() fails or if ib_core fails to registers sysfs entries, rdma cgroup cleanup is skipped. Cc: <stable@vger.kernel.org> # v4.2+ Fixes: 4be3a4fa51f4 ("IB/core: Fix kernel crash during fail to initialize device") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/uverbs: Fix kernel panic while using XRC_TGT QP typeLeon Romanovsky2018-02-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempt to modify XRC_TGT QP type from the user space (ibv_xsrq_pingpong invocation) will trigger the following kernel panic. It is caused by the fact that such QPs missed uobject initialization. [ 17.408845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 17.412645] IP: rdma_lookup_put_uobject+0x9/0x50 [ 17.416567] PGD 0 P4D 0 [ 17.419262] Oops: 0000 [#1] SMP PTI [ 17.422915] CPU: 0 PID: 455 Comm: ibv_xsrq_pingpo Not tainted 4.16.0-rc1+ #86 [ 17.424765] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 17.427399] RIP: 0010:rdma_lookup_put_uobject+0x9/0x50 [ 17.428445] RSP: 0018:ffffb8c7401e7c90 EFLAGS: 00010246 [ 17.429543] RAX: 0000000000000000 RBX: ffffb8c7401e7cf8 RCX: 0000000000000000 [ 17.432426] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000 [ 17.437448] RBP: 0000000000000000 R08: 00000000000218f0 R09: ffffffff8ebc4cac [ 17.440223] R10: fffff6038052cd80 R11: ffff967694b36400 R12: ffff96769391f800 [ 17.442184] R13: ffffb8c7401e7cd8 R14: 0000000000000000 R15: ffff967699f60000 [ 17.443971] FS: 00007fc29207d700(0000) GS:ffff96769fc00000(0000) knlGS:0000000000000000 [ 17.446623] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.448059] CR2: 0000000000000048 CR3: 000000001397a000 CR4: 00000000000006b0 [ 17.449677] Call Trace: [ 17.450247] modify_qp.isra.20+0x219/0x2f0 [ 17.451151] ib_uverbs_modify_qp+0x90/0xe0 [ 17.452126] ib_uverbs_write+0x1d2/0x3c0 [ 17.453897] ? __handle_mm_fault+0x93c/0xe40 [ 17.454938] __vfs_write+0x36/0x180 [ 17.455875] vfs_write+0xad/0x1e0 [ 17.456766] SyS_write+0x52/0xc0 [ 17.457632] do_syscall_64+0x75/0x180 [ 17.458631] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 17.460004] RIP: 0033:0x7fc29198f5a0 [ 17.460982] RSP: 002b:00007ffccc71f018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 17.463043] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007fc29198f5a0 [ 17.464581] RDX: 0000000000000078 RSI: 00007ffccc71f050 RDI: 0000000000000003 [ 17.466148] RBP: 0000000000000000 R08: 0000000000000078 R09: 00007ffccc71f050 [ 17.467750] R10: 000055b6cf87c248 R11: 0000000000000246 R12: 00007ffccc71f300 [ 17.469541] R13: 000055b6cf8733a0 R14: 0000000000000000 R15: 0000000000000000 [ 17.471151] Code: 00 00 0f 1f 44 00 00 48 8b 47 48 48 8b 00 48 8b 40 10 e9 0b 8b 68 00 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 89 f5 <48> 8b 47 48 48 89 fb 40 0f b6 f6 48 8b 00 48 8b 40 20 e8 e0 8a [ 17.475185] RIP: rdma_lookup_put_uobject+0x9/0x50 RSP: ffffb8c7401e7c90 [ 17.476841] CR2: 0000000000000048 [ 17.477764] ---[ end trace 1dbcc5354071a712 ]--- [ 17.478880] Kernel panic - not syncing: Fatal exception [ 17.480277] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 2f08ee363fe0 ("RDMA/restrack: don't use uaccess_kernel()") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/restrack: don't use uaccess_kernel()Steve Wise2018-02-164-7/+22
| | | | | | | | | | | | | | | | | | uaccess_kernel() isn't sufficient to determine if an rdma resource is user-mode or not. For example, resources allocated in the add_one() function of an ib_client get falsely labeled as user mode, when they are kernel mode allocations. EG: mad qps. The result is that these qps are skipped over during a nldev query because of an erroneous namespace mismatch. So now we determine if the resource is user-mode by looking at the object struct's uobject or similar pointer to know if it was allocated for user mode applications. Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/verbs: Check existence of function prior to accessing itLeon Romanovsky2018-02-162-0/+24
| | | | | | | | | | | | | | Update all the flows to ensure that function pointer exists prior to accessing it. This is much safer than checking the uverbs_ex_mask variable, especially since we know that test isn't working properly and will be removed in -next. This prevents a user triggereable oops. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/uverbs: Sanitize user entered port numbers prior to access itLeon Romanovsky2018-02-151-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ================================================================== BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs+0x6f2/0x8c0 Read of size 4 at addr ffff88006476a198 by task syzkaller697701/265 CPU: 0 PID: 265 Comm: syzkaller697701 Not tainted 4.15.0+ #90 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ? show_regs_print_info+0x17/0x17 ? lock_contended+0x11a0/0x11a0 print_address_description+0x83/0x3e0 kasan_report+0x18c/0x4b0 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0 ? lookup_get_idr_uobject+0x120/0x200 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0 copy_ah_attr_from_uverbs+0x6f2/0x8c0 ? modify_qp+0xd0e/0x1350 modify_qp+0xd0e/0x1350 ib_uverbs_modify_qp+0xf9/0x170 ? ib_uverbs_query_qp+0xa70/0xa70 ib_uverbs_write+0x7f9/0xef0 ? attach_entity_load_avg+0x8b0/0x8b0 ? ib_uverbs_query_qp+0xa70/0xa70 ? uverbs_devnode+0x110/0x110 ? cyc2ns_read_end+0x10/0x10 ? print_irqtrace_events+0x280/0x280 ? sched_clock_cpu+0x18/0x200 ? _raw_spin_unlock_irq+0x29/0x40 ? _raw_spin_unlock_irq+0x29/0x40 ? _raw_spin_unlock_irq+0x29/0x40 ? time_hardirqs_on+0x27/0x670 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x1bd/0x7a0 ? finish_task_switch+0x194/0x7a0 ? prandom_u32_state+0xe/0x180 ? rcu_read_unlock+0x80/0x80 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x433c29 RSP: 002b:00007ffcf2be82a8 EFLAGS: 00000217 Allocated by task 62: kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc+0x141/0x480 dup_fd+0x101/0xcc0 copy_process.part.62+0x166f/0x4390 _do_fork+0x1cb/0xe90 kernel_thread+0x34/0x40 call_usermodehelper_exec_work+0x112/0x260 process_one_work+0x929/0x1aa0 worker_thread+0x5c6/0x12a0 kthread+0x346/0x510 ret_from_fork+0x3a/0x50 Freed by task 259: kasan_slab_free+0x71/0xc0 kmem_cache_free+0xf3/0x4c0 put_files_struct+0x225/0x2c0 exit_files+0x88/0xc0 do_exit+0x67c/0x1520 do_group_exit+0xe8/0x380 SyS_exit_group+0x1e/0x20 entry_SYSCALL_64_fastpath+0x1e/0x8b The buggy address belongs to the object at ffff88006476a000 which belongs to the cache files_cache of size 832 The buggy address is located 408 bytes inside of 832-byte region [ffff88006476a000, ffff88006476a340) The buggy address belongs to the page: page:ffffea000191da80 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 0000000000000000 0000000000000000 0000000100080008 raw: 0000000000000000 0000000100000001 ffff88006bcf7a80 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88006476a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88006476a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88006476a180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88006476a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88006476a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/uverbs: Fix circular locking dependencyLeon Romanovsky2018-02-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid circular locking dependency by calling to uobj_alloc_commit() outside of xrcd_tree_mutex lock. ====================================================== WARNING: possible circular locking dependency detected 4.15.0+ #87 Not tainted ------------------------------------------------------ syzkaller401056/269 is trying to acquire lock: (&uverbs_dev->xrcd_tree_mutex){+.+.}, at: [<000000006c12d2cd>] uverbs_free_xrcd+0xd2/0x360 but task is already holding lock: (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ucontext->uobjects_lock){+.+.}: __mutex_lock+0x111/0x1720 rdma_alloc_commit_uobject+0x22c/0x600 ib_uverbs_open_xrcd+0x61a/0xdd0 ib_uverbs_write+0x7f9/0xef0 __vfs_write+0x10d/0x700 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 entry_SYSCALL_64_fastpath+0x1e/0x8b -> #0 (&uverbs_dev->xrcd_tree_mutex){+.+.}: lock_acquire+0x19d/0x440 __mutex_lock+0x111/0x1720 uverbs_free_xrcd+0xd2/0x360 remove_commit_idr_uobject+0x6d/0x110 uverbs_cleanup_ucontext+0x2f0/0x730 ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120 ib_uverbs_close+0xf2/0x570 __fput+0x2cd/0x8d0 task_work_run+0xec/0x1d0 do_exit+0x6a1/0x1520 do_group_exit+0xe8/0x380 SyS_exit_group+0x1e/0x20 entry_SYSCALL_64_fastpath+0x1e/0x8b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ucontext->uobjects_lock); lock(&uverbs_dev->xrcd_tree_mutex); lock(&ucontext->uobjects_lock); lock(&uverbs_dev->xrcd_tree_mutex); *** DEADLOCK *** 3 locks held by syzkaller401056/269: #0: (&file->cleanup_mutex){+.+.}, at: [<00000000c9f0c252>] ib_uverbs_close+0xac/0x570 #1: (&ucontext->cleanup_rwsem){++++}, at: [<00000000b6994d49>] uverbs_cleanup_ucontext+0xf6/0x730 #2: (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730 stack backtrace: CPU: 0 PID: 269 Comm: syzkaller401056 Not tainted 4.15.0+ #87 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ? uverbs_cleanup_ucontext+0x168/0x730 ? console_unlock+0x502/0xbd0 print_circular_bug.isra.24+0x35e/0x396 ? print_circular_bug_header+0x12e/0x12e ? find_usage_backwards+0x30/0x30 ? entry_SYSCALL_64_fastpath+0x1e/0x8b validate_chain.isra.28+0x25d1/0x40c0 ? check_usage+0xb70/0xb70 ? graph_lock+0x160/0x160 ? find_usage_backwards+0x30/0x30 ? cyc2ns_read_end+0x10/0x10 ? print_irqtrace_events+0x280/0x280 ? __lock_acquire+0x93d/0x1630 __lock_acquire+0x93d/0x1630 lock_acquire+0x19d/0x440 ? uverbs_free_xrcd+0xd2/0x360 __mutex_lock+0x111/0x1720 ? uverbs_free_xrcd+0xd2/0x360 ? uverbs_free_xrcd+0xd2/0x360 ? __mutex_lock+0x828/0x1720 ? mutex_lock_io_nested+0x1550/0x1550 ? uverbs_cleanup_ucontext+0x168/0x730 ? __lock_acquire+0x9a9/0x1630 ? mutex_lock_io_nested+0x1550/0x1550 ? uverbs_cleanup_ucontext+0xf6/0x730 ? lock_contended+0x11a0/0x11a0 ? uverbs_free_xrcd+0xd2/0x360 uverbs_free_xrcd+0xd2/0x360 remove_commit_idr_uobject+0x6d/0x110 uverbs_cleanup_ucontext+0x2f0/0x730 ? sched_clock_cpu+0x18/0x200 ? uverbs_close_fd+0x1c0/0x1c0 ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120 ib_uverbs_close+0xf2/0x570 ? ib_uverbs_remove_one+0xb50/0xb50 ? ib_uverbs_remove_one+0xb50/0xb50 __fput+0x2cd/0x8d0 task_work_run+0xec/0x1d0 do_exit+0x6a1/0x1520 ? fsnotify_first_mark+0x220/0x220 ? exit_notify+0x9f0/0x9f0 ? entry_SYSCALL_64_fastpath+0x5/0x8b ? entry_SYSCALL_64_fastpath+0x5/0x8b ? trace_hardirqs_on_thunk+0x1a/0x1c ? time_hardirqs_on+0x27/0x670 ? time_hardirqs_off+0x27/0x490 ? syscall_return_slowpath+0x6c/0x460 ? entry_SYSCALL_64_fastpath+0x5/0x8b do_group_exit+0xe8/0x380 SyS_exit_group+0x1e/0x20 entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x431ce9 Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: fd3c7904db6e ("IB/core: Change idr objects to use the new schema") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcdLeon Romanovsky2018-02-151-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no matching lock for this mutex. Git history suggests this is just a missed remnant from an earlier version of the function before this locking was moved into uverbs_free_xrcd. Originally this lock was protecting the xrcd_table_delete() ===================================== WARNING: bad unlock balance detected! 4.15.0+ #87 Not tainted ------------------------------------- syzkaller223405/269 is trying to release lock (&uverbs_dev->xrcd_tree_mutex) at: [<00000000b8703372>] ib_uverbs_close_xrcd+0x195/0x1f0 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller223405/269: #0: (&uverbs_dev->disassociate_srcu){....}, at: [<000000005af3b960>] ib_uverbs_write+0x265/0xef0 stack backtrace: CPU: 0 PID: 269 Comm: syzkaller223405 Not tainted 4.15.0+ #87 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ? ib_uverbs_write+0x265/0xef0 ? console_unlock+0x502/0xbd0 ? ib_uverbs_close_xrcd+0x195/0x1f0 print_unlock_imbalance_bug+0x131/0x160 lock_release+0x59d/0x1100 ? ib_uverbs_close_xrcd+0x195/0x1f0 ? lock_acquire+0x440/0x440 ? lock_acquire+0x440/0x440 __mutex_unlock_slowpath+0x88/0x670 ? wait_for_completion+0x4c0/0x4c0 ? rdma_lookup_get_uobject+0x145/0x2f0 ib_uverbs_close_xrcd+0x195/0x1f0 ? ib_uverbs_open_xrcd+0xdd0/0xdd0 ib_uverbs_write+0x7f9/0xef0 ? cyc2ns_read_end+0x10/0x10 ? ib_uverbs_open_xrcd+0xdd0/0xdd0 ? uverbs_devnode+0x110/0x110 ? cyc2ns_read_end+0x10/0x10 ? cyc2ns_read_end+0x10/0x10 ? sched_clock_cpu+0x18/0x200 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? __fget+0x358/0x5d0 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x4335c9 Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: fd3c7904db6e ("IB/core: Change idr objects to use the new schema") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/restrack: Increment CQ restrack object before committingLeon Romanovsky2018-02-151-3/+3
| | | | | | | | | | | | Once the uobj is committed it is immediately possible another thread could destroy it, which worst case, can result in a use-after-free of the restrack objects. Cc: syzkaller <syzkaller@googlegroups.com> Fixes: 08f294a1524b ("RDMA/core: Add resource tracking for create and destroy CQs") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/uverbs: Protect from command mask overflowLeon Romanovsky2018-02-151-7/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The command number is not bounds checked against the command mask before it is shifted, resulting in an ubsan hit. This does not cause malfunction since the command number is eventually bounds checked, but we can make this ubsan clean by moving the bounds check to before the mask check. ================================================================================ UBSAN: Undefined behaviour in drivers/infiniband/core/uverbs_main.c:647:21 shift exponent 207 is too large for 64-bit type 'long long unsigned int' CPU: 0 PID: 446 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #61 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ubsan_epilogue+0xe/0x81 __ubsan_handle_shift_out_of_bounds+0x293/0x2f7 ? debug_check_no_locks_freed+0x340/0x340 ? __ubsan_handle_load_invalid_value+0x19b/0x19b ? lock_acquire+0x440/0x440 ? lock_acquire+0x19d/0x440 ? __might_fault+0xf4/0x240 ? ib_uverbs_write+0x68d/0xe20 ib_uverbs_write+0x68d/0xe20 ? __lock_acquire+0xcf7/0x3940 ? uverbs_devnode+0x110/0x110 ? cyc2ns_read_end+0x10/0x10 ? sched_clock_cpu+0x18/0x200 ? sched_clock_cpu+0x18/0x200 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? __fget+0x35b/0x5d0 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0x85 RIP: 0033:0x448e29 RSP: 002b:00007f033f567c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f033f5686bc RCX: 0000000000448e29 RDX: 0000000000000060 RSI: 0000000020001000 RDI: 0000000000000012 RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000056a0 R14: 00000000006e8740 R15: 0000000000000000 ================================================================================ Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.5 Fixes: 2dbd5186a39c ("IB/core: IB/core: Allow legacy verbs through extended interfaces") Reported-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroyJason Gunthorpe2018-02-151-2/+3
| | | | | | | | | | | | | | | | If remove_commit fails then the lock is left locked while the uobj still exists. Eventually the kernel will deadlock. lockdep detects this and says: test/4221 is leaving the kernel with locks still held! 1 lock held by test/4221: #0: (&ucontext->cleanup_rwsem){.+.+}, at: [<000000001e5c7523>] rdma_explicit_destroy+0x37/0x120 [ib_uverbs] Fixes: 4da70da23e9b ("IB/core: Explicitly destroy an object while keeping uobject") Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Improve lockdep_checkJason Gunthorpe2018-02-151-7/+7
| | | | | | | | | | | | | | This is really being used as an assert that the expected usecnt is being held and implicitly that the usecnt is valid. Rename it to assert_uverbs_usecnt and tighten the checks to only accept valid values of usecnt (eg 0 and < -1 are invalid). The tigher checkes make the assertion cover more cases and is more likely to find bugs via syzkaller/etc. Fixes: 3832125624b7 ("IB/core: Add support for idr types") Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/uverbs: Protect from races between lookup and destroy of uobjectsLeon Romanovsky2018-02-151-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The race is between lookup_get_idr_uobject and uverbs_idr_remove_uobj -> uverbs_uobject_put. We deliberately do not call sychronize_rcu after the idr_remove in uverbs_idr_remove_uobj for performance reasons, instead we call kfree_rcu() during uverbs_uobject_put. However, this means we can obtain pointers to uobj's that have already been released and must protect against krefing them using kref_get_unless_zero. ================================================================== BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs.isra.2+0x860/0xa00 Read of size 4 at addr ffff88005fda1ac8 by task syz-executor2/441 CPU: 1 PID: 441 Comm: syz-executor2 Not tainted 4.15.0-rc2+ #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0x8d/0xd4 print_address_description+0x73/0x290 kasan_report+0x25c/0x370 ? copy_ah_attr_from_uverbs.isra.2+0x860/0xa00 copy_ah_attr_from_uverbs.isra.2+0x860/0xa00 ? uverbs_try_lock_object+0x68/0xc0 ? modify_qp.isra.7+0xdc4/0x10e0 modify_qp.isra.7+0xdc4/0x10e0 ib_uverbs_modify_qp+0xfe/0x170 ? ib_uverbs_query_qp+0x970/0x970 ? __lock_acquire+0xa11/0x1da0 ib_uverbs_write+0x55a/0xad0 ? ib_uverbs_query_qp+0x970/0x970 ? ib_uverbs_query_qp+0x970/0x970 ? ib_uverbs_open+0x760/0x760 ? futex_wake+0x147/0x410 ? sched_clock_cpu+0x18/0x180 ? check_prev_add+0x1680/0x1680 ? do_futex+0x3b6/0xa30 ? sched_clock_cpu+0x18/0x180 __vfs_write+0xf7/0x5c0 ? ib_uverbs_open+0x760/0x760 ? kernel_read+0x110/0x110 ? lock_acquire+0x370/0x370 ? __fget+0x264/0x3b0 vfs_write+0x18a/0x460 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0x85 RIP: 0033:0x448e29 RSP: 002b:00007f443fee0c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f443fee16bc RCX: 0000000000448e29 RDX: 0000000000000078 RSI: 00000000209f8000 RDI: 0000000000000012 RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000008e98 R14: 00000000006ebf38 R15: 0000000000000000 Allocated by task 1: kmem_cache_alloc_trace+0x16c/0x2f0 mlx5_alloc_cmd_msg+0x12e/0x670 cmd_exec+0x419/0x1810 mlx5_cmd_exec+0x40/0x70 mlx5_core_mad_ifc+0x187/0x220 mlx5_MAD_IFC+0xd7/0x1b0 mlx5_query_mad_ifc_gids+0x1f3/0x650 mlx5_ib_query_gid+0xa4/0xc0 ib_query_gid+0x152/0x1a0 ib_query_port+0x21e/0x290 mlx5_port_immutable+0x30f/0x490 ib_register_device+0x5dd/0x1130 mlx5_ib_add+0x3e7/0x700 mlx5_add_device+0x124/0x510 mlx5_register_interface+0x11f/0x1c0 mlx5_ib_init+0x56/0x61 do_one_initcall+0xa3/0x250 kernel_init_freeable+0x309/0x3b8 kernel_init+0x14/0x180 ret_from_fork+0x24/0x30 Freed by task 1: kfree+0xeb/0x2f0 mlx5_free_cmd_msg+0xcd/0x140 cmd_exec+0xeba/0x1810 mlx5_cmd_exec+0x40/0x70 mlx5_core_mad_ifc+0x187/0x220 mlx5_MAD_IFC+0xd7/0x1b0 mlx5_query_mad_ifc_gids+0x1f3/0x650 mlx5_ib_query_gid+0xa4/0xc0 ib_query_gid+0x152/0x1a0 ib_query_port+0x21e/0x290 mlx5_port_immutable+0x30f/0x490 ib_register_device+0x5dd/0x1130 mlx5_ib_add+0x3e7/0x700 mlx5_add_device+0x124/0x510 mlx5_register_interface+0x11f/0x1c0 mlx5_ib_init+0x56/0x61 do_one_initcall+0xa3/0x250 kernel_init_freeable+0x309/0x3b8 kernel_init+0x14/0x180 ret_from_fork+0x24/0x30 The buggy address belongs to the object at ffff88005fda1ab0 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 24 bytes inside of 32-byte region [ffff88005fda1ab0, ffff88005fda1ad0) The buggy address belongs to the page: page:00000000d5655c19 count:1 mapcount:0 mapping: (null) index:0xffff88005fda1fc0 flags: 0x4000000000000100(slab) raw: 4000000000000100 0000000000000000 ffff88005fda1fc0 0000000180550008 raw: ffffea00017f6780 0000000400000004 ffff88006c803980 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88005fda1980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb ffff88005fda1a00: fb fb fc fc fb fb fb fb fc fc 00 00 00 00 fc fc ffff88005fda1a80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb ffff88005fda1b00: fc fc 00 00 00 00 fc fc fb fb fb fb fc fc fb fb ffff88005fda1b80: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc ==================================================================@ Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: 3832125624b7 ("IB/core: Add support for idr types") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Hold the uobj write lock after allocateJason Gunthorpe2018-02-151-1/+10
| | | | | | | | | | | | | | This clarifies the design intention that time between allocate and commit has the uobj exclusive to the caller. We already guarantee this by delaying publishing the uobj pointer via idr_insert, fd_install, list_add, etc. Additionally holding the usecnt lock during this period provides extra clarity and more protection against future mistakes. Fixes: 3832125624b7 ("IB/core: Add support for idr types") Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Fix possible oops with duplicate ioctl attributesMatan Barak2018-02-151-0/+3
| | | | | | | | | | | | | | | | | | If the same attribute is listed twice by the user in the ioctl attribute list then error unwind can cause the kernel to deref garbage. This happens when an object with WRITE access is sent twice. The second parse properly fails but corrupts the state required for the error unwind it triggers. Fixing this by making duplicates in the attribute list invalid. This is not something we need to support. The ioctl interface is currently recommended to be disabled in kConfig. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Add ioctl support for 32bit processesMatan Barak2018-02-151-0/+2
| | | | | | | | | | | | | | | | 32 bit processes running on a 64 bit kernel call compat_ioctl so that implementations can revise any structure layout issues. Point compat_ioctl at our normal ioctl because: - All our structures are designed to be the same on 32 and 64 bit, ie we use __aligned_u64 when required and are careful to manage padding. - Any pointers are stored in u64's and userspace is expected to prepare them properly. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Fix method merging in uverbs_ioctl_mergeMatan Barak2018-02-151-9/+9
| | | | | | | | | | | | | Fix a bug in uverbs_ioctl_merge that looked at the object's iterator number instead of the method's iterator number when merging methods. While we're at it, make the uverbs_ioctl_merge code a bit more clear and faster. Fixes: 118620d3686b ('IB/core: Add uverbs merge trees functionality') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Use u64_to_user_ptr() not a unionJason Gunthorpe2018-02-151-2/+2
| | | | | | | | | | The union approach will get the endianness wrong sometimes if the kernel's pointer size is 32 bits resulting in EFAULTs when trying to copy to/from user. Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Use inline data transfer for UHW_INJason Gunthorpe2018-02-151-1/+4
| | | | | | | | | | | | | The rule for the API is pointers less than 8 bytes are inlined into the .data field of the attribute. Fix the creation of the driver udata struct to follow this rule and point to the .data itself when the size is less than 8 bytes. Otherwise if the UHW struct is less than 8 bytes the driver will get EFAULT during copy_from_user. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/uverbs: Always use the attribute size provided by the userMatan Barak2018-02-151-2/+3
| | | | | | | | | | | | | | | This fixes several bugs around the copy_to/from user path: - copy_to used the user provided size of the attribute and could copy data beyond the end of the kernel buffer into userspace. - copy_from didn't know the size of the kernel buffer and could have left kernel memory unexpectedly un-initialized. - copy_from did not use the user length to determine if the attribute data is inlined or not. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA/restrack: Remove unimplemented XRCD objectLeon Romanovsky2018-02-151-5/+0
| | | | | | | | | Resource tracking of XRCD objects is not implemented in current version of restrack and hence can be removed. Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds2018-02-114-5/+5
| | | | | | | | | | | | | | | | | | | | | | | This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2018-02-062-2/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more rdma updates from Doug Ledford: "Items of note: - two patches fix a regression in the 4.15 kernel. The 4.14 kernel worked fine with NVMe over Fabrics and mlx5 adapters. That broke in 4.15. The fix is here. - one of the patches (the endian notation patch from Lijun) looks like a lot of lines of change, but it's mostly mechanical in nature. It amounts to the biggest chunk of change in it (it's about 2/3rds of the overall pull request). Summary: - Clean up some function signatures in rxe for clarity - Tidy the RDMA netlink header to remove unimplemented constants - bnxt_re driver fixes, one is a regression this window. - Minor hns driver fixes - Various fixes from Dan Carpenter and his tool - Fix IRQ cleanup race in HFI1 - HF1 performance optimizations and a fix to report counters in the right units - Fix for an IPoIB startup sequence race with the external manager - Oops fix for the new kabi path - Endian cleanups for hns - Fix for mlx5 related to the new automatic affinity support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits) net/mlx5: increase async EQ to avoid EQ overrun mlx5: fix mlx5_get_vector_affinity to start from completion vector 0 RDMA/hns: Fix the endian problem for hns IB/uverbs: Use the standard kConfig format for experimental IB: Update references to libibverbs IB/hfi1: Add 16B rcvhdr trace support IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node IB/core: Avoid a potential OOPs for an unused optional parameter IB/core: Map iWarp AH type to undefined in rdma_ah_find_type IB/ipoib: Fix for potential no-carrier state IB/hfi1: Show fault stats in both TX and RX directions IB/hfi1: Remove blind constants from 16B update IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times IB/hfi1: Do not override given pcie_pset value IB/hfi1: Optimize process_receive_ib() IB/hfi1: Remove unnecessary fecn and becn fields IB/hfi1: Look up ibport using a pointer in receive path IB/hfi1: Optimize packet type comparison using 9B and bypass code paths IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet IB/hfi1: Remove dependence on qp->s_hdrwords ...
| * IB/core: Avoid a potential OOPs for an unused optional parameterMichael J. Ruhl2018-02-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ev_file is an optional parameter for CQ creation. If the parameter is not passed, the ev_file pointer will be NULL. Using that pointer to set the cq_context will result in an OOPs. Verify that ev_file is not NULL before using. Cc: <stable@vger.kernel.org> # 4.14.x Fixes: 9ee79fce3642 ("IB/core: Add completion queue (cq) object actions") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/nldev: missing error code in nldev_res_get_doit()Dan Carpenter2018-02-011-1/+3
| | | | | | | | | | | | | | | | | | | | We should return -ENOMEM if the allocation fails. The current code accidentally returns success. Fixes: bf3c5a93c523 ("RDMA/nldev: Provide global resource utilization") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2018-01-3128-661/+1327
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull RDMA subsystem updates from Jason Gunthorpe: "Overall this cycle did not have any major excitement, and did not require any shared branch with netdev. Lots of driver updates, particularly of the scale-up and performance variety. The largest body of core work was Parav's patches fixing and restructing some of the core code to make way for future RDMA containerization. Summary: - misc small driver fixups to bnxt_re/hfi1/qib/hns/ocrdma/rdmavt/vmw_pvrdma/nes - several major feature adds to bnxt_re driver: SRIOV VF RoCE support, HugePages support, extended hardware stats support, and SRQ support - a notable number of fixes to the i40iw driver from debugging scale up testing - more work to enable the new hip08 chip in the hns driver - misc small ULP fixups to srp/srpt//ipoib - preparation for srp initiator and target to support the RDMA-CM protocol for connections - add RDMA-CM support to srp initiator, srp target is still a WIP - fixes for a couple of places where ipoib could spam the dmesg log - fix encode/decode of FDR/EDR data rates in the core - many patches from Parav with ongoing work to clean up inconsistencies and bugs in RoCE support around the rdma_cm - mlx5 driver support for the userspace features 'thread domain', 'wallclock timestamps' and 'DV Direct Connected transport'. Support for the firmware dual port rocee capability - core support for more than 32 rdma devices in the char dev allocation - kernel doc updates from Randy Dunlap - new netlink uAPI for inspecting RDMA objects similar in spirit to 'ss' - one minor change to the kobject code acked by Greg KH" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (259 commits) RDMA/nldev: Provide detailed QP information RDMA/nldev: Provide global resource utilization RDMA/core: Add resource tracking for create and destroy PDs RDMA/core: Add resource tracking for create and destroy CQs RDMA/core: Add resource tracking for create and destroy QPs RDMA/restrack: Add general infrastructure to track RDMA resources RDMA/core: Save kernel caller name when creating PD and CQ objects RDMA/core: Use the MODNAME instead of the function name for pd callers RDMA: Move enum ib_cq_creation_flags to uapi headers IB/rxe: Change RDMA_RXE kconfig to use select IB/qib: remove qib_keys.c IB/mthca: remove mthca_user.h RDMA/cm: Fix access to uninitialized variable RDMA/cma: Use existing netif_is_bond_master function IB/core: Avoid SGID attributes query while converting GID from OPA to IB RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure IB/umad: Fix use of unprotected device pointer IB/iser: Combine substrings for three messages IB/iser: Delete an unnecessary variable initialisation in iser_send_data_out() IB/iser: Delete an error message for a failed memory allocation in iser_send_data_out() ...
| * RDMA/nldev: Provide detailed QP informationLeon Romanovsky2018-01-291-0/+227
| | | | | | | | | | | | | | | | | | | | | | | | | | Implement RDMA nldev netlink interface to get detailed information on each QP in the system. This includes the owning process or kernel ULP and detailed information from the qp_attrs. Currently only the dumpit variant is implemented. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/nldev: Provide global resource utilizationLeon Romanovsky2018-01-291-0/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expose through the netlink interface the global per-device utilization of the supported object types. Provide both dumpit and doit callbacks. As an example of possible output from rdmatool for system with 5 mlx5 cards: $ rdma res 1: mlx5_0: qp 4 cq 5 pd 3 2: mlx5_1: qp 4 cq 5 pd 3 3: mlx5_2: qp 4 cq 5 pd 3 4: mlx5_3: qp 2 cq 3 pd 2 5: mlx5_4: qp 4 cq 5 pd 3 Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/core: Add resource tracking for create and destroy PDsLeon Romanovsky2018-01-292-0/+7
| | | | | | | | | | | | | | | | | | Track create and destroy operations of PD objects. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/core: Add resource tracking for create and destroy CQsLeon Romanovsky2018-01-294-0/+14
| | | | | | | | | | | | | | | | | | Track create and destroy operations of CQ objects. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/core: Add resource tracking for create and destroy QPsLeon Romanovsky2018-01-293-5/+30
| | | | | | | | | | | | | | | | | | Track create and destroy operations of QP objects. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/restrack: Add general infrastructure to track RDMA resourcesLeon Romanovsky2018-01-294-1/+170
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RDMA subsystem has very strict set of objects to work with, but it completely lacks tracking facilities and has no visibility of resource utilization. The following patch adds such infrastructure to keep track of RDMA resources to help with debugging of user space applications. The primary user of this infrastructure is RDMA nldev netlink (following patches), to be exposed to userspace via rdmatool, but it is not limited too that. At this stage, the main three objects (PD, CQ and QP) are added, and more will be added later. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/core: Save kernel caller name when creating PD and CQ objectsLeon Romanovsky2018-01-292-6/+8
| | | | | | | | | | | | | | | | | | | | The KBUILD_MODNAME variable contains the module name and it is known for kernel users during compilation, so let's reuse it to track the owners. Followup patches will store this for resource tracking. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/cm: Fix access to uninitialized variableLeon Romanovsky2018-01-281-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | The ndev will be initialized and held only for successful ib_get_cached_gid(), otherwise it is garbage stack memory. Calling dev_put() in failure path is wrong. Fixes: 16c72e402867 ("IB/cm: Refactor to avoid setting path record software only fields") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/cma: Use existing netif_is_bond_master functionParav Pandit2018-01-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | When checking whatever the current netdev is the bond master interface, use kernel API netif_is_bond_master() instead of hardcoding the check. No functionality is changed. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/core: Avoid SGID attributes query while converting GID from OPA to IBParav Pandit2018-01-281-3/+1
| | | | | | | | | | | | | | | | | | | | SGID attributes are not used during OPA to IB GID conversion. Therefore don't query it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * IB/umad: Fix use of unprotected device pointerJack Morgenstein2018-01-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ib_write_umad() is protected by taking the umad file mutex. However, it accesses file->port->ib_dev -- which is protected only by the port's mutex (field file_mutex). The ib_umad_remove_one() calls ib_umad_kill_port() which sets port->ib_dev to NULL under the port mutex (NOT the file mutex). It then sets the mad agent to "dead" under the umad file mutex. This is a race condition -- because there is a window where port->ib_dev is NULL, while the agent is not "dead". As a result, we saw stack traces like: [16490.678059] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0 [16490.678246] IP: ib_umad_write+0x29c/0xa3a [ib_umad] [16490.678333] PGD 0 P4D 0 [16490.678404] Oops: 0000 [#1] SMP PTI [16490.678466] Modules linked in: rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx4_en(OE) ptp pps_core mlx4_ib(OE-) ib_core(OE) mlx4_core(OE) mlx_compat (OE) memtrack(OE) devlink mst_pciconf(OE) mst_pci(OE) netconsole nfsv3 nfs_acl nfs lockd grace fscache cfg80211 rfkill esp6_offload esp6 esp4_offload esp4 sunrpc kvm_intel kvm ppdev parport_pc irqbypass parport joydev i2c_piix4 virtio_balloon cirrus drm_kms_helper ttm drm e1000 serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg [last unloaded: mlxfw] [16490.679202] CPU: 4 PID: 3115 Comm: sminfo Tainted: G OE 4.14.13-300.fc27.x86_64 #1 [16490.679339] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014 [16490.679477] task: ffff9cf753890000 task.stack: ffffaf70c26b0000 [16490.679571] RIP: 0010:ib_umad_write+0x29c/0xa3a [ib_umad] [16490.679664] RSP: 0018:ffffaf70c26b3d90 EFLAGS: 00010202 [16490.679747] RAX: 0000000000000010 RBX: ffff9cf75610fd80 RCX: 0000000000000000 [16490.679856] RDX: 0000000000000001 RSI: 00007ffdf2bfd714 RDI: ffff9cf6bb2a9c00 In the above trace, ib_umad_write is trying to dereference the NULL file->port->ib_dev pointer. Fix this by using the agent's device pointer (the device field in struct ib_mad_agent) -- which IS protected by the umad file mutex. Cc: <stable@vger.kernel.org> # v4.11 Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/uverbs: Use an unambiguous errno for method not supportedJason Gunthorpe2018-01-251-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Returning EOPNOTSUPP is problematic because it can also be returned by the method function, and we use it in quite a few places in drivers these days. Instead, dedicate EPROTONOSUPPORT to indicate that the ioctl framework is enabled but the requested object and method are not supported by the kernel. No other case will return this code, and it lets userspace know to fall back to write(). grep says we do not use it today in drivers/infiniband subsystem. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/cma: Update RoCE multicast routines to use net namespaceParav Pandit2018-01-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | rdma_dev_addr contains the net namespace pointer, while referring bound_dev_if of the rdma_dev_addr, refer to the net namespace of rdma_cm_id stored in rdma_dev_addr. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/cma: Update cma_validate_port to honor net namespaceParav Pandit2018-01-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | cma_validate_port uses rdma_dev_addr to validate the port of the cm_id. It needs to honor the net namespace which is setup during cm_id creation when finding netdevice. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>