summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ipv6: add a wrapper for ip6_dst_store() with flowi6 checksAlexey Kodanev2018-04-043-8/+21
| | | | | | | | | | | | Move commonly used pattern of ip6_dst_store() usage to a separate function - ip6_sk_dst_store_flow(), which will check the addresses for equality using the flow information, before saving them. There is no functional changes in this patch. In addition, it will be used in the next patch, in ip6_sk_dst_lookup_flow(). Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: marvell10g: add thermal hwmon deviceRussell King2018-04-041-2/+182
| | | | | | | | | | Add a thermal monitoring device for the Marvell 88x3310, which updates once a second. We also need to hook into the suspend/resume mechanism to ensure that the thermal monitoring is reconfigured when we resume. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* pptp: remove a buggy dst release in pptp_connect()Eric Dumazet2018-04-041-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once dst has been cached in socket via sk_setup_caps(), it is illegal to call ip_rt_put() (or dst_release()), since sk_setup_caps() did not change dst refcount. We can still dereference it since we hold socket lock. Caugth by syzbot : BUG: KASAN: use-after-free in atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline] BUG: KASAN: use-after-free in dst_release+0x27/0xa0 net/core/dst.c:185 Write of size 4 at addr ffff8801c54dc040 by task syz-executor4/20088 CPU: 1 PID: 20088 Comm: syz-executor4 Not tainted 4.16.0+ #376 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x1a7/0x27d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23c/0x360 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x137/0x190 mm/kasan/kasan.c:267 kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278 atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline] dst_release+0x27/0xa0 net/core/dst.c:185 sk_dst_set include/net/sock.h:1812 [inline] sk_dst_reset include/net/sock.h:1824 [inline] sock_setbindtodevice net/core/sock.c:610 [inline] sock_setsockopt+0x431/0x1b20 net/core/sock.c:707 SYSC_setsockopt net/socket.c:1845 [inline] SyS_setsockopt+0x2ff/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4552d9 RSP: 002b:00007f4878126c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007f48781276d4 RCX: 00000000004552d9 RDX: 0000000000000019 RSI: 0000000000000001 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000010 R09: 0000000000000000 R10: 00000000200010c0 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000526 R14: 00000000006fac30 R15: 0000000000000000 Allocated by task 20088: save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3542 dst_alloc+0x11f/0x1a0 net/core/dst.c:104 rt_dst_alloc+0xe9/0x540 net/ipv4/route.c:1520 __mkroute_output net/ipv4/route.c:2265 [inline] ip_route_output_key_hash_rcu+0xa49/0x2c60 net/ipv4/route.c:2493 ip_route_output_key_hash+0x20b/0x370 net/ipv4/route.c:2322 __ip_route_output_key include/net/route.h:126 [inline] ip_route_output_flow+0x26/0xa0 net/ipv4/route.c:2577 ip_route_output_ports include/net/route.h:163 [inline] pptp_connect+0xa84/0x1170 drivers/net/ppp/pptp.c:453 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Freed by task 20082: save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527 __cache_free mm/slab.c:3486 [inline] kmem_cache_free+0x83/0x2a0 mm/slab.c:3744 dst_destroy+0x266/0x380 net/core/dst.c:140 dst_destroy_rcu+0x16/0x20 net/core/dst.c:153 __rcu_reclaim kernel/rcu/rcu.h:178 [inline] rcu_do_batch kernel/rcu/tree.c:2675 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2930 [inline] __rcu_process_callbacks kernel/rcu/tree.c:2897 [inline] rcu_process_callbacks+0xd6c/0x17b0 kernel/rcu/tree.c:2914 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 The buggy address belongs to the object at ffff8801c54dc000 which belongs to the cache ip_dst_cache of size 168 The buggy address is located 64 bytes inside of 168-byte region [ffff8801c54dc000, ffff8801c54dc0a8) The buggy address belongs to the page: page:ffffea0007153700 count:1 mapcount:0 mapping:ffff8801c54dc000 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffff8801c54dc000 0000000000000000 0000000100000010 raw: ffffea0006b34b20 ffffea0006b6c1e0 ffff8801d674a1c0 0000000000000000 page dumped because: kasan: bad access detected Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: mt7530: Use NULL instead of plain integerFlorian Fainelli2018-04-041-3/+3
| | | | | | | | | | We would be passing 0 instead of NULL as the rsp argument to mt7530_fdb_cmd(), fix that. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: b53: Fix sparse warnings in b53_mmap.cFlorian Fainelli2018-04-041-9/+24
| | | | | | | | | | | | | | | | | | | sparse complains about the following warnings: drivers/net/dsa/b53/b53_mmap.c:33:31: warning: incorrect type in initializer (different address spaces) drivers/net/dsa/b53/b53_mmap.c:33:31: expected unsigned char [noderef] [usertype] <asn:2>*regs drivers/net/dsa/b53/b53_mmap.c:33:31: got void *priv and indeed, while what we are doing is functional, we are dereferencing a void * pointer into a void __iomem * which is not great. Just use the defined b53_mmap_priv structure which holds our register base and use that. Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_unix: remove redundant lockdep classCong Wang2018-04-041-10/+0
| | | | | | | | | | | | | | | After commit 581319c58600 ("net/socket: use per af lockdep classes for sk queues") sock queue locks now have per-af lockdep classes, including unix socket. It is no longer necessary to workaround it. I noticed this while looking at a syzbot deadlock report, this patch itself doesn't fix it (this is why I don't add Reported-by). Fixes: 581319c58600 ("net/socket: use per af lockdep classes for sk queues") Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'net-Broadcom-drivers-sparse-fixes'David S. Miller2018-04-042-10/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Florian Fainelli says: ==================== net: Broadcom drivers sparse fixes This patch series fixes the same warning reported by sparse in bcmsysport and bcmgenet in the code that deals with inserting the TX checksum pointers: drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: cast from restricted __be16 drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: incorrect type in argument 1 (different base types) drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: expected unsigned short [unsigned] [usertype] val drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: got restricted __be16 [usertype] protocol This patch fixes both issues by using the same construct and not swapping skb->protocol but instead the values we are checking against. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()Florian Fainelli2018-04-041-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | skb->protocol is a __be16 which we would be calling htons() against, while this is not wrong per-se as it correctly results in swapping the value on LE hosts, this still upsets sparse. Adopt a similar pattern to what other drivers do and just assign ip_ver to skb->protocol, and then use htons() against the different constants such that the compiler can resolve the values at build time. Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()Florian Fainelli2018-04-041-5/+6
|/ | | | | | | | | | | | | skb->protocol is a __be16 which we would be calling htons() against, while this is not wrong per-se as it correctly results in swapping the value on LE hosts, this still upsets sparse. Adopt a similar pattern to what other drivers do and just assign ip_ver to skb->protocol, and then use htons() against the different constants such that the compiler can resolve the values at build time. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* rxrpc: Fix undefined packet handlingDavid Howells2018-04-042-0/+12
| | | | | | | | | | By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11 should just be discarded rather than being aborted like other undefined packet types. Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'userns-linus' of ↵Linus Torvalds2018-04-0315-455/+346
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "There was a lot of work this cycle fixing bugs that were discovered after the merge window and getting everything ready where we can reasonably support fully unprivileged fuse. The bug fixes you already have and much of the unprivileged fuse work is coming in via other trees. Still left for fully unprivileged fuse is figuring out how to cleanly handle .set_acl and .get_acl in the legacy case, and properly handling of evm xattrs on unprivileged mounts. Included in the tree is a cleanup from Alexely that replaced a linked list with a statically allocated fix sized array for the pid caches, which simplifies and speeds things up. Then there is are some cleanups and fixes for the ipc namespace. The motivation was that in reviewing other code it was discovered that access ipc objects from different pid namespaces recorded pids in such a way that when asked the wrong pids were returned. In the worst case there has been a measured 30% performance impact for sysvipc semaphores. Other test cases showed no measurable performance impact. Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc performance both gave the nod that this is good enough. Casey Schaufler and James Morris have given their approval to the LSM side of the changes. I simplified the types and the code dealing with sysvipc to pass just kern_ipc_perm for all three types of ipc. Which reduced the header dependencies throughout the kernel and simplified the lsm code. Which let me work on the pid fixes without having to worry about trivial changes causing complete kernel recompiles" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ipc/shm: Fix pid freeing. ipc/shm: fix up for struct file no longer being available in shm.h ipc/smack: Tidy up from the change in type of the ipc security hooks ipc: Directly call the security hook in ipc_ops.associate ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces. ipc/util: Helpers for making the sysvipc operations pid namespace aware ipc: Move IPCMNI from include/ipc.h into ipc/util.h msg: Move struct msg_queue into ipc/msg.c shm: Move struct shmid_kernel into ipc/shm.c sem: Move struct sem and struct sem_array into ipc/sem.c msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks pidns: simpler allocation of pid_* caches
| * ipc/shm: Fix pid freeing.Eric W. Biederman2018-03-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 0day kernel test build report reported an oops: > > IP: put_pid+0x22/0x5c > PGD 19efa067 P4D 19efa067 PUD 0 > Oops: 0000 [#1] > CPU: 0 PID: 727 Comm: trinity Not tainted 4.16.0-rc2-00010-g98f929b #1 > RIP: 0010:put_pid+0x22/0x5c > RSP: 0018:ffff986719f73e48 EFLAGS: 00010202 > RAX: 00000006d765f710 RBX: ffff98671a4fa4d0 RCX: ffff986719f73d40 > RDX: 000000006f6e6125 RSI: 0000000000000000 RDI: ffffffffa01e6d21 > RBP: ffffffffa0955fe0 R08: 0000000000000020 R09: 0000000000000000 > R10: 0000000000000078 R11: ffff986719f73e76 R12: 0000000000001000 > R13: 00000000ffffffea R14: 0000000054000fb0 R15: 0000000000000000 > FS: 00000000028c2880(0000) GS:ffffffffa06ad000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000677846439 CR3: 0000000019fc1005 CR4: 00000000000606b0 > Call Trace: > ? ipc_update_pid+0x36/0x3e > ? newseg+0x34c/0x3a6 > ? ipcget+0x5d/0x528 > ? entry_SYSCALL_64_after_hwframe+0x52/0xb7 > ? SyS_shmget+0x5a/0x84 > ? do_syscall_64+0x194/0x1b3 > ? entry_SYSCALL_64_after_hwframe+0x42/0xb7 > Code: ff 05 e7 20 9b 03 58 c9 c3 48 ff 05 85 21 9b 03 48 85 ff 74 4f 8b 47 04 8b 17 48 ff 05 7c 21 9b 03 48 83 c0 03 48 c1 e0 04 ff ca <48> 8b 44 07 08 74 1f 48 ff 05 6c 21 9b 03 ff 0f 0f 94 c2 48 ff > RIP: put_pid+0x22/0x5c RSP: ffff986719f73e48 > CR2: 0000000677846439 > ---[ end trace ab8c5cb4389d37c5 ]--- > Kernel panic - not syncing: Fatal exception In newseg when changing shm_cprid and shm_lprid from pid_t to struct pid* I misread the kvmalloc as kvzalloc and thought shp was initialized to 0. As that is not the case it is not safe to for the error handling to address shm_cprid and shm_lprid before they are initialized. Therefore move the cleanup of shm_cprid and shm_lprid from the no_file error cleanup path to the no_id error cleanup path. Ensuring that an early error exit won't cause the oops above. Reported-by: kernel test robot <fengguang.wu@intel.com> Reviewed-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * ipc/shm: fix up for struct file no longer being available in shm.hStephen Rothwell2018-03-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen Rothewell <sfr@canb.auug.org.au> wrote: > After merging the userns tree, today's linux-next build (powerpc > ppc64_defconfig) produced this warning: > > In file included from include/linux/sched.h:16:0, > from arch/powerpc/lib/xor_vmx_glue.c:14: > include/linux/shm.h:17:35: error: 'struct file' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] > bool is_file_shm_hugepages(struct file *file); > ^~~~ > > and many, many more (most warnings, but some errors - arch/powerpc is > mostly built with -Werror) I dug through this and I discovered that the error was caused by the removal of struct shmid_kernel from shm.h when building on powerpc. Except for observing the existence of "struct file *shm_file" in struct shmid_kernel I have no clue why the structure move would cause such a failure. I suspect shm.h always needed the forward declaration and someting had been confusing gcc into not issuing the warning. --EWB Fixes: a2e102cd3cdd ("shm: Move struct shmid_kernel into ipc/shm.c") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * ipc/smack: Tidy up from the change in type of the ipc security hooksEric W. Biederman2018-03-271-139/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the variables shp, sma, msq to isp. As that is how the code already refers to those variables. Collapse smack_of_shm, smack_of_sem, and smack_of_msq into smack_of_ipc, as the three functions had become completely identical. Collapse smack_shm_alloc_security, smack_sem_alloc_security and smack_msg_queue_alloc_security into smack_ipc_alloc_security as the three functions had become identical. Collapse smack_shm_free_security, smack_sem_free_security and smack_msg_queue_free_security into smack_ipc_free_security as the three functions had become identical. Requested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * ipc: Directly call the security hook in ipc_ops.associateEric W. Biederman2018-03-273-27/+3
| | | | | | | | | | | | | | | | | | After the last round of cleanups the shm, sem, and msg associate operations just became trivial wrappers around the appropriate security method. Simplify things further by just calling the security method directly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * ipc/sem: Fix semctl(..., GETPID, ...) between pid namespacesEric W. Biederman2018-03-271-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today the last process to update a semaphore is remembered and reported in the pid namespace of that process. If there are processes in any other pid namespace querying that process id with GETPID the result will be unusable nonsense as it does not make any sense in your own pid namespace. Due to ipc_update_pid I don't think you will be able to get System V ipc semaphores into a troublesome cache line ping-pong. Using struct pids from separate process are not a problem because they do not share a cache line. Using struct pid from different threads of the same process are unlikely to be a problem as the reference count update can be avoided. Further linux futexes are a much better tool for the job of mutual exclusion between processes than System V semaphores. So I expect programs that are performance limited by their interprocess mutual exclusion primitive will be using futexes. So while it is possible that enhancing the storage of the last rocess of a System V semaphore from an integer to a struct pid will cause a performance regression because of the effect of frequently updating the pid reference count. I don't expect that to happen in practice. This change updates semctl(..., GETPID, ...) to return the process id of the last process to update a semphore inthe pid namespace of the calling process. Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespacesEric W. Biederman2018-03-271-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today msg_lspid and msg_lrpid are remembered in the pid namespace of the creator and the processes that last send or received a sysvipc message. If you have processes in multiple pid namespaces that is just wrong. The process ids reported will not make the least bit of sense. This fix is slightly more susceptible to a performance problem than the related fix for System V shared memory. By definition the pids are updated by msgsnd and msgrcv, the fast path of System V message queues. The only concern over the previous implementation is the incrementing and decrementing of the pid reference count. As that is the only difference and multiple updates by of the task_tgid by threads in the same process have been shown in af_unix sockets to create a cache line ping-pong between cpus of the same processor. In this case I don't expect cache lines holding pid reference counts to ping pong between cpus. As senders and receivers update different pids there is a natural separation there. Further if multiple threads of the same process either send or receive messages the pid will be updated to the same value and ipc_update_pid will avoid the reference count update. Which means in the common case I expect msg_lspid and msg_lrpid to remain constant, and reference counts not to be updated when messages are sent. In rare cases it may be possible to trigger the issue which was observed for af_unix sockets, but it will require multiple processes with multiple threads to be either sending or receiving messages. It just does not feel likely that anyone would do that in practice. This change updates msgctl(..., IPC_STAT, ...) to return msg_lspid and msg_lrpid in the pid namespace of the process calling stat. This change also updates cat /proc/sysvipc/msg to return print msg_lspid and msg_lrpid in the pid namespace of the process that opened the proc file. Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user") Reviewed-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.Eric W. Biederman2018-03-271-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today shm_cpid and shm_lpid are remembered in the pid namespace of the creator and the processes that last touched a sysvipc shared memory segment. If you have processes in multiple pid namespaces that is just wrong, and I don't know how this has been over-looked for so long. As only creation and shared memory attach and shared memory detach update the pids I do not expect there to be a repeat of the issues when struct pid was attached to each af_unix skb, which in some notable cases cut the performance in half. The problem was threads of the same process updating same struct pid from different cpus causing the cache line to be highly contended and bounce between cpus. As creation, attach, and detach are expected to be rare operations for sysvipc shared memory segments I do not expect that kind of cache line ping pong to cause probems. In addition because the pid is at a fixed location in the structure instead of being dynamic on a skb, the reference count of the pid does not need to be updated on each operation if the pid is the same. This ability to simply skip the pid reference count changes if the pid is unchanging further reduces the likelihood of the a cache line holding a pid reference count ping-ponging between cpus. Fixes: b488893a390e ("pid namespaces: changes to show virtual ids to user") Reviewed-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * ipc/util: Helpers for making the sysvipc operations pid namespace awareEric W. Biederman2018-03-242-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Capture the pid namespace when /proc/sysvipc/msg /proc/sysvipc/shm and /proc/sysvipc/sem are opened, and make it available through the new helper ipc_seq_pid_ns. This makes it possible to report the pids in these files in the pid namespace of the opener of the files. Implement ipc_update_pid. A simple impline helper that will only update a struct pid pointer if the new value does not equal the old value. This removes the need for wordy code sequences like: old = object->pid; object->pid = new; put_pid(old); and old = object->pid; if (old != new) { object->pid = new; put_pid(old); } Allowing the following to be written instead: ipc_update_pid(&object->pid, new); Which is easier to read and ensures that the pid reference count is not touched the old and the new values are the same. Not touching the reference count in this case is important to help avoid issues like af_unix experienced, where multiple threads of the same process managed to bounce the struct pid between cpu cache lines, but updating the pids reference count. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * ipc: Move IPCMNI from include/ipc.h into ipc/util.hEric W. Biederman2018-03-242-2/+1
| | | | | | | | | | | | | | | | The definition IPCMNI is only used in ipc/util.h and ipc/util.c. So there is no reason to keep it in a header file that the whole kernel can see. Move it into util.h to simplify future maintenance. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * msg: Move struct msg_queue into ipc/msg.cEric W. Biederman2018-03-242-18/+17
| | | | | | | | | | | | | | | | All of the users are now in ipc/msg.c so make the definition local to that file to make code maintenance easier. AKA to prevent rebuilding the entire kernel when struct msg_queue changes. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * shm: Move struct shmid_kernel into ipc/shm.cEric W. Biederman2018-03-242-22/+22
| | | | | | | | | | | | | | | | All of the users are now in ipc/shm.c so make the definition local to that file to make code maintenance easier. AKA to prevent rebuilding the entire kernel when struct shmid_kernel changes. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * sem: Move struct sem and struct sem_array into ipc/sem.cEric W. Biederman2018-03-222-39/+35
| | | | | | | | | | | | | | | | | | All of the users are now in ipc/sem.c so make the definitions local to that file to make code maintenance easier. AKA to prevent rebuilding the entire kernel when one of these files is changed. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooksEric W. Biederman2018-03-226-65/+62
| | | | | | | | | | | | | | | | | | | | | | | | All of the implementations of security hooks that take msg_queue only access q_perm the struct kern_ipc_perm member. This means the dependencies of the msg_queue security hooks can be simplified by passing the kern_ipc_perm member of msg_queue. Making this change will allow struct msg_queue to become private to ipc/msg.c. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooksEric W. Biederman2018-03-226-56/+52
| | | | | | | | | | | | | | | | | | | | | | All of the implementations of security hooks that take shmid_kernel only access shm_perm the struct kern_ipc_perm member. This means the dependencies of the shm security hooks can be simplified by passing the kern_ipc_perm member of shmid_kernel.. Making this change will allow struct shmid_kernel to become private to ipc/shm.c. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * sem/security: Pass kern_ipc_perm not sem_array into the sem security hooksEric W. Biederman2018-03-226-57/+53
| | | | | | | | | | | | | | | | | | | | | | | | All of the implementations of security hooks that take sem_array only access sem_perm the struct kern_ipc_perm member. This means the dependencies of the sem security hooks can be simplified by passing the kern_ipc_perm member of sem_array. Making this change will allow struct sem and struct sem_array to become private to ipc/sem.c. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * pidns: simpler allocation of pid_* cachesAlexey Dobriyan2018-03-211-43/+24
| | | | | | | | | | | | | | | | | | | | | | | | Those pid_* caches are created on demand when a process advances to the new level of pid namespace. Which means pointers are stable, write only and thus can be packed into an array instead of spreading them over and using lists(!) to find them. Both first and subsequent clone/unshare(CLONE_NEWPID) become faster. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2018-04-035-34/+93
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull workqueue updates from Tejun Heo: "rcu_work addition and a couple trivial changes" * 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove the comment about the old manager_arb mutex workqueue: fix the comments of nr_idle fs/aio: Use rcu_work instead of explicit rcu and work item cgroup: Use rcu_work instead of explicit rcu and work item RCU, workqueue: Implement rcu_work
| * | workqueue: remove the comment about the old manager_arb mutexLai Jiangshan2018-03-201-1/+0
| | | | | | | | | | | | | | | | | | | | | The manager_arb mutex doesn't exist any more. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | workqueue: fix the comments of nr_idleLai Jiangshan2018-03-201-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the worker rebinding behavior was refactored, there is no idle worker off the idle_list now. The comment is outdated and can be just removed. It also groups nr_workers and nr_idle together. Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | fs/aio: Use rcu_work instead of explicit rcu and work itemTejun Heo2018-03-191-15/+6
| | | | | | | | | | | | | | | | | | | | | Workqueue now has rcu_work. Use it instead of open-coding rcu -> work item bouncing. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | cgroup: Use rcu_work instead of explicit rcu and work itemTejun Heo2018-03-192-15/+8
| | | | | | | | | | | | | | | | | | | | | Workqueue now has rcu_work. Use it instead of open-coding rcu -> work item bouncing. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | RCU, workqueue: Implement rcu_workTejun Heo2018-03-192-0/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are cases where RCU callback needs to be bounced to a sleepable context. This is currently done by the RCU callback queueing a work item, which can be cumbersome to write and confusing to read. This patch introduces rcu_work, a workqueue work variant which gets executed after a RCU grace period, and converts the open coded bouncing in fs/aio and kernel/cgroup. v3: Dropped queue_rcu_work_on(). Documented rcu grace period behavior after queue_rcu_work(). v2: Use rcu_barrier() instead of synchronize_rcu() to wait for completion of previously queued rcu callback as per Paul. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-4.17' of ↵Linus Torvalds2018-04-0323-135/+995
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: "Nothing too interesting. The biggest change is refcnting fix for ata_host - the bug is recent and can only be triggered on controller hotplug, so very few are hitting it. There also are a number of trivial license / error message changes and some hardware specific changes" * 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (23 commits) ahci: imx: add the imx8qm ahci sata support libata: ensure host is free'd on error exit paths ata: ahci-platform: add reset control support ahci: imx: fix the build warning ata: add Amiga Gayle PATA controller driver ahci: imx: add the imx6qp ahci sata support ata: change Tegra124 to Tegra ata: ahci_tegra: Add AHCI support for Tegra210 ata: ahci_tegra: disable DIPM ata: ahci_tegra: disable devslp for Tegra124 ata: ahci_tegra: initialize regulators from soc struct ata: ahci_tegra: Update initialization sequence dt-bindings: Tegra210: add binding documentation libata: add refcounting to ata_host pata_bk3710: clarify license version and use SPDX header pata_falcon: clarify license version and use SPDX header pata_it821x: Delete an error message for a failed memory allocation in it821x_firmware_command() pata_macio: Delete an error message for a failed memory allocation in two functions pata_mpc52xx: Delete an error message for a failed memory allocation in mpc52xx_ata_probe() sata_dwc_460ex: Delete an error message for a failed memory allocation in sata_dwc_port_start() ...
| * | | ahci: imx: add the imx8qm ahci sata supportRichard Zhu2018-03-291-0/+332
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - There are three PHY lanes on iMX8QM, and can be used in the following three cases 1. a two lanes PCIE_A, and a single lane SATA. 2. a single lane PCIE_A, a single lane PCIE_B and a single lane SATA. 3. a two lanes PCIE_A, and a single lane PCIE_B. The configuration of the iMX8QM AHCI SATA is relied on the usage of PCIE ports in the case 1 and 2. Use standalone iMX8 AHCI SATA probe and enable functions to enable iMX8QM AHCI SATA support. - To save power consumption, PHY CLKs can be gated off after the configurations are done. - The impedance ratio should be configured refer to differnet REXT values. 0x6c <--> REXT value is 85Ohms 0x80 (default value) <--> REXT value is 100Ohms. In general, REXT value should be 85ohms in standalone PCIE HW board design, and 100ohms in SATA standalone HW board design. When the PCIE and the SATA are enabled simultaneously in the HW board design. The REXT value would be set to 85ohms. Configure the SATA PHY impedance ratio to 0x6c in default. Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | libata: ensure host is free'd on error exit pathsColin Ian King2018-03-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The host structure is not being kfree'd on two error exit paths leading to memory leaks. Add in new err_free label and kfree host. Detected by CoverityScan, CID#1466103 ("Resource leak") Fixes: 2623c7a5f279 ("libata: add refcounting to ata_host") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: ahci-platform: add reset control supportKunihiko Hayashi2018-03-263-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support to get and control a list of resets for the device as optional and shared. These resets must be kept de-asserted until the device is enabled. This is specified as shared because some SoCs like UniPhier series have common reset controls with all ahci controller instances. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ahci: imx: fix the build warningRichard Zhu2018-03-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the default as the last entry to fix the following build warning introduced by commit. e5878732a521 ("ahci: imx: add the imx6qp ahci sata support") drivers/ata/ahci_imx.c: In function 'imx_sata_disable': drivers/ata/ahci_imx.c:478:2: warning: enumeration value 'AHCI_IMX53' not handled in switch [-Wswitch] switch (imxpriv->type) { ^~~~~~ Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: add Amiga Gayle PATA controller driverBartlomiej Zolnierkiewicz2018-03-193-0/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Amiga Gayle PATA controller driver. It enables libata support for the on-board IDE interfaces on some Amiga models (A600, A1200, A4000 and A4000T) and also for IDE interfaces on the Zorro expansion bus (M-Tech E-Matrix 530 expansion card). Thanks to John Paul Adrian Glaubitz and Michael Schmitz for help with testing the driver. Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Michael Schmitz <schmitzmic@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ahci: imx: add the imx6qp ahci sata supportRichard Zhu2018-03-193-4/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Regarding to imx6q ahci sata, imx6qp ahci sata has the reset mechanism. Add the imx6qp ahci sata support in this commit. - Use the specific reset callback for imx53 sata, and use the default ahci_ops.softreset for the others. Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: change Tegra124 to TegraPreetham Ramchandra2018-03-141-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ahci_tegra driver now supports Tegra124, Tegra132 and Tegra210, so change Tegra124 to Tegra. Signed-off-by: Preetham Chandru R <pchandru@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: ahci_tegra: Add AHCI support for Tegra210Preetham Ramchandra2018-03-141-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the AHCI-compliant Serial ATA host controller on the Tegra210 system-on-chip. Signed-off-by: Preetham Chandru R <pchandru@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: ahci_tegra: disable DIPMPreetham Ramchandra2018-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tegra does not support DIPM and it should be disabled. Signed-off-by: Preetham Chandru R <pchandru@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: ahci_tegra: disable devslp for Tegra124Preetham Ramchandra2018-03-141-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tegra124 does not support devslp and it should be disabled. Signed-off-by: Preetham Chandru R <pchandru@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: ahci_tegra: initialize regulators from soc structPreetham Ramchandra2018-03-141-10/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get the regulator names to be initialized from soc structure and initialize them. Signed-off-by: Preetham Chandru R <pchandru@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ata: ahci_tegra: Update initialization sequencePreetham Ramchandra2018-03-141-64/+224
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the controller initialization sequence and move Tegra124 specifics to tegra124_ahci_init. Signed-off-by: Preetham Chandru R <pchandru@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | dt-bindings: Tegra210: add binding documentationPreetham Ramchandra2018-03-141-12/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds bindings documentation for the AHCI controller on Tegra210. Signed-off-by: Preetham Chandru R <pchandru@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | libata: add refcounting to ata_hostTaras Kondratiuk2018-03-134-8/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 9a6d6a2ddabb ("ata: make ata port as parent device of scsi host") manual driver unbind/remove causes use-after-free. Unbind unconditionally invokes devres_release_all() which calls ata_host_release() and frees ata_host/ata_port memory while it is still being referenced as a parent of SCSI host. When SCSI host is finally released scsi_host_dev_release() calls put_device(parent) and accesses freed ata_port memory. Add reference counting to make sure that ata_host lives long enough. Bug report: https://lkml.org/lkml/2017/11/1/945 Fixes: 9a6d6a2ddabb ("ata: make ata port as parent device of scsi host") Cc: Tejun Heo <tj@kernel.org> Cc: Lin Ming <minggr@gmail.com> Cc: linux-ide@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Taras Kondratiuk <takondra@cisco.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | pata_bk3710: clarify license version and use SPDX headerBartlomiej Zolnierkiewicz2018-03-011-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - clarify license version (it should be GPL 2.0) - use SPDX header Acked-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | pata_falcon: clarify license version and use SPDX headerBartlomiej Zolnierkiewicz2018-03-011-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - clarify license version (it should be GPL 2.0) - use SPDX header Cc: Michael Schmitz <schmitzmic@gmail.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>