summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
...
* blk-mq: add callback of .cleanup_rqMing Lei2019-10-051-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 226b4fc75c78f9c497c5182d939101b260cfb9f3 ] SCSI maintains its own driver private data hooked off of each SCSI request, and the pridate data won't be freed after scsi_queue_rq() returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver (e.g. dm-rq) may need to retry these SCSI requests, before SCSI has fully dispatched them, due to a lower level SCSI driver's resource limitation identified in scsi_queue_rq(). Currently SCSI's per-request private data is leaked when the upper layer driver (dm-rq) frees and then retries these requests in response to BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE returns from scsi_queue_rq(). This usecase is so specialized that it doesn't warrant training an existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to account for freeing its driver private data -- doing so would add an extra branch for handling a special case that all other consumers of SCSI (and blk-mq) won't ever need to worry about. So the most pragmatic way forward is to delegate freeing SCSI driver private data to the upper layer driver (dm-rq). Do so by adding new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method from dm-rq. A following commit will implement the .cleanup_rq() hook in scsi_mq_ops. Cc: Ewan D. Milne <emilne@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: <stable@vger.kernel.org> Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* mmc: core: Add helper function to indicate if SDIO IRQs is enabledUlf Hansson2019-10-051-0/+9
| | | | | | | | | | | | | | | | | | | [ Upstream commit bd880b00697befb73eff7220ee20bdae4fdd487b ] To avoid each host driver supporting SDIO IRQs, from keeping track internally about if SDIO IRQs has been claimed, let's introduce a common helper function, sdio_irq_claimed(). The function returns true if SDIO IRQs are claimed, via using the information about the number of claimed irqs. This is safe, even without any locks, as long as the helper function is called only from runtime/system suspend callbacks of the host driver. Tested-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* kprobes: Prohibit probing on BUG() and WARN() addressMasami Hiramatsu2019-10-051-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e336b4027775cb458dc713745e526fa1a1996b2a ] Since BUG() and WARN() may use a trap (e.g. UD2 on x86) to get the address where the BUG() has occurred, kprobes can not do single-step out-of-line that instruction. So prohibit probing on such address. Without this fix, if someone put a kprobe on WARN(), the kernel will crash with invalid opcode error instead of outputing warning message, because kernel can not find correct bug address. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S . Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N . Rao <naveen.n.rao@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/156750890133.19112.3393666300746167111.stgit@devnote2 Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* IB/core: Add an unbound WQ type to the new CQ APIJack Morgenstein2019-10-011-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit f794809a7259dfaa3d47d90ef5a86007cf48b1ce upstream. The upstream kernel commit cited below modified the workqueue in the new CQ API to be bound to a specific CPU (instead of being unbound). This caused ALL users of the new CQ API to use the same bound WQ. Specifically, MAD handling was severely delayed when the CPU bound to the WQ was busy handling (higher priority) interrupts. This caused a delay in the MAD "heartbeat" response handling, which resulted in ports being incorrectly classified as "down". To fix this, add a new "unbound" WQ type to the new CQ API, so that users have the option to choose either a bound WQ or an unbound WQ. For MADs, choose the new "unbound" WQ. Fixes: b7363e67b23e ("IB/device: Convert ib-comp-wq to be CPU-bound") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.m> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_infoJuliana Rodrigueiro2019-09-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 89a26cd4b501e9511d3cd3d22327fc76a75a38b3 ] When running a 64-bit kernel with a 32-bit iptables binary, the size of the xt_nfacct_match_info struct diverges. kernel: sizeof(struct xt_nfacct_match_info) : 40 iptables: sizeof(struct xt_nfacct_match_info)) : 36 Trying to append nfacct related rules results in an unhelpful message. Although it is suggested to look for more information in dmesg, nothing can be found there. # iptables -A <chain> -m nfacct --nfacct-name <acct-object> iptables: Invalid argument. Run `dmesg' for more information. This patch fixes the memory misalignment by enforcing 8-byte alignment within the struct's first revision. This solution is often used in many other uapi netfilter headers. Signed-off-by: Juliana Rodrigueiro <juliana.rodrigueiro@intra2net.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* udp: correct reuseport selection with connected socketsWillem de Bruijn2019-09-211-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit acdcecc61285faed359f1a3568c32089cc3a8329 ] UDP reuseport groups can hold a mix unconnected and connected sockets. Ensure that connections only receive all traffic to their 4-tuple. Fast reuseport returns on the first reuseport match on the assumption that all matches are equal. Only if connections are present, return to the previous behavior of scoring all sockets. Record if connections are present and if so (1) treat such connected sockets as an independent match from the group, (2) only return 2-tuple matches from reuseport and (3) do not return on the first 2-tuple reuseport match to allow for a higher scoring match later. New field has_conns is set without locks. No other fields in the bitmap are modified at runtime and the field is only ever set unconditionally, so an RMW cannot miss a change. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Craig Gallek <kraig@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* isdn/capi: check message length in capi_write()Eric Biggers2019-09-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit fe163e534e5eecdfd7b5920b0dfd24c458ee85d6 ] syzbot reported: BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313 capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 do_loop_readv_writev fs/read_write.c:703 [inline] do_iter_write+0x83e/0xd80 fs/read_write.c:961 vfs_writev fs/read_write.c:1004 [inline] do_writev+0x397/0x840 fs/read_write.c:1039 __do_sys_writev fs/read_write.c:1112 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 [...] The problem is that capi_write() is reading past the end of the message. Fix it by checking the message's length in the needed places. Reported-and-tested-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* gpio: don't WARN() on NULL descs if gpiolib is disabledBartosz Golaszewski2019-09-161-31/+31
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ffe0bbabb0cffceceae07484fde1ec2a63b1537c ] If gpiolib is disabled, we use the inline stubs from gpio/consumer.h instead of regular definitions of GPIO API. The stubs for 'optional' variants of gpiod_get routines return NULL in this case as if the relevant GPIO wasn't found. This is correct so far. Calling other (non-gpio_get) stubs from this header triggers a warning because the GPIO descriptor couldn't have been requested. The warning however is unconditional (WARN_ON(1)) and is emitted even if the passed descriptor pointer is NULL. We don't want to force the users of 'optional' gpio_get to check the returned pointer before calling e.g. gpiod_set_value() so let's only WARN on non-NULL descriptors. Cc: stable@vger.kernel.org Reported-by: Claus H. Stovgaard <cst@phaseone.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* dm mpath: fix missing call of path selector type->end_ioYufen Yu2019-09-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b ] After commit 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback"), map_request() will requeue the tio when issued clone request return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. Thus, if device driver status is error, a tio may be requeued multiple times until the return value is not DM_MAPIO_REQUEUE. That means type->start_io may be called multiple times, while type->end_io is only called when IO complete. In fact, even without commit 396eaf21ee17, setup_clone() failure can also cause tio requeue and associated missed call to type->end_io. The service-time path selector selects path based on in_flight_size, which is increased by st_start_io() and decreased by st_end_io(). Missed calls to st_end_io() can lead to in_flight_size count error and will cause the selector to make the wrong choice. In addition, queue-length path selector will also be affected. To fix the problem, call type->end_io in ->release_clone_rq before tio requeue. map_info is passed to ->release_clone_rq() for map_request() error path that result in requeue. Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Cc: stable@vger.kernl.org Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* drm/vblank: Allow dynamic per-crtc max_vblank_countVille Syrjälä2019-09-162-1/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ed20151a7699bb2c77eba3610199789a126940c4 ] On i965gm we need to adjust max_vblank_count dynamically depending on whether the TV encoder is used or not. To that end add a per-crtc max_vblank_count that takes precedence over its device wide counterpart. The driver can now call drm_crtc_set_max_vblank_count() to configure the per-crtc value before calling drm_vblank_on(). Also looks like there was some discussion about exynos needing similar treatment. v2: Drop the extra max_vblank_count!=0 check for the WARN(last!=current), will take care of it in i915 code (Daniel) WARN_ON(!inmodeset) (Daniel) WARN_ON(dev->max_vblank_count) Pimp up the docs (Daniel) Cc: stable@vger.kernel.org Cc: Inki Dae <inki.dae@samsung.com> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181127182004.28885-1-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sasha Levin <sashal@kernel.org>
* keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.hDavid Howells2019-09-161-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2ecefa0a15fd0ef88b9cd5d15ceb813008136431 ] The keyctl_dh_params struct in uapi/linux/keyctl.h contains the symbol "private" which means that the header file will cause compilation failure if #included in to a C++ program. Further, the patch that added the same struct to the keyutils package named the symbol "priv", not "private". The previous attempt to fix this (commit 8a2336e549d3) did so by simply renaming the kernel's copy of the field to dh_private, but this then breaks existing userspace and as such has been reverted (commit 8c0f9f5b309d). [And note, to those who think that wrapping the struct in extern "C" {} will work: it won't; that only changes how symbol names are presented to the assembler and linker.]. Instead, insert an anonymous union around the "private" member and add a second member in there with the name "priv" to match the one in the keyutils package. The "private" member is then wrapped in !__cplusplus cpp-conditionals to hide it from C++. Fixes: ddbb41148724 ("KEYS: Add KEYCTL_DH_COMPUTE command") Fixes: 8a2336e549d3 ("uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name") Signed-off-by: David Howells <dhowells@redhat.com> cc: Randy Dunlap <rdunlap@infradead.org> cc: Lubomir Rintel <lkundrak@v3.sk> cc: James Morris <jmorris@namei.org> cc: Mat Martineau <mathew.j.martineau@linux.intel.com> cc: Stephan Mueller <smueller@chronox.de> cc: Andrew Morton <akpm@linux-foundation.org> cc: Linus Torvalds <torvalds@linux-foundation.org> cc: stable@vger.kernel.org Signed-off-by: James Morris <james.morris@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* media: cec/v4l2: move V4L2 specific CEC functions to V4L2Hans Verkuil2019-09-162-80/+6
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 9cfd2753f8f3923f89cbb15f940f3aa0e7202d3e ] Several CEC functions are actually specific for use with receivers, i.e. they should be part of the V4L2 subsystem, not CEC. These functions deal with validating and modifying EDIDs for (HDMI) receivers, and they do not actually have anything to do with the CEC subsystem and whether or not CEC is enabled. The problem was that if the CEC_CORE config option was not set, then these functions would become stubs, but that's not right: they should always be valid. So replace the cec_ prefix by v4l2_ and move them to v4l2-dv-timings.c. Update all drivers that call these accordingly. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Reported-by: Lars-Peter Clausen <lars@metafoo.de> Cc: <stable@vger.kernel.org> # for v4.17 and up Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* {nl,mac}80211: fix interface combinations on crypto controlled devicesManikanta Pubbisetty2019-09-161-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e6f4051123fd33901e9655a675b22aefcdc5d277 ] Commit 33d915d9e8ce ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices") has introduced a change which allows 4addr operation on crypto controlled devices (ex: ath10k). This change has inadvertently impacted the interface combinations logic on such devices. General rule is that software interfaces like AP/VLAN should not be listed under supported interface combinations and should not be considered during validation of these combinations; because of the aforementioned change, AP/VLAN interfaces(if present) will be checked against interfaces supported by the device and blocks valid interface combinations. Consider a case where an AP and AP/VLAN are up and running; when a second AP device is brought up on the same physical device, this AP will be checked against the AP/VLAN interface (which will not be part of supported interface combinations of the device) and blocks second AP to come up. Add a new API cfg80211_iftype_allowed() to fix the problem, this API works for all devices with/without SW crypto control. Signed-off-by: Manikanta Pubbisetty <mpubbise@codeaurora.org> Fixes: 33d915d9e8ce ("{nl,mac}80211: allow 4addr AP operation on crypto controlled devices") Link: https://lore.kernel.org/r/1563779690-9716-1-git-send-email-mpubbise@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* libceph: allow ceph_buffer_put() to receive a NULL ceph_bufferLuis Henriques2019-09-101-1/+2
| | | | | | | | | [ Upstream commit 5c498950f730aa17c5f8a2cdcb903524e4002ed2 ] Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* gpio: Fix build error of function redefinitionYueHaibing2019-09-101-24/+0
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 68e03b85474a51ec1921b4d13204782594ef7223 ] when do randbuilding, I got this error: In file included from drivers/hwmon/pmbus/ucd9000.c:19:0: ./include/linux/gpio/driver.h:576:1: error: redefinition of gpiochip_add_pin_range gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/hwmon/pmbus/ucd9000.c:18:0: ./include/linux/gpio.h:245:1: note: previous definition of gpiochip_add_pin_range was here gpiochip_add_pin_range(struct gpio_chip *chip, const char *pinctl_name, ^~~~~~~~~~~~~~~~~~~~~~ Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 964cb341882f ("gpio: move pincontrol calls to <linux/gpio/driver.h>") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20190731123814.46624-1-yuehaibing@huawei.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* netfilter: nf_tables: use-after-free in failing rule with bound setPablo Neira Ayuso2019-09-101-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6a0a8d10a3661a036b55af695542a714c429ab7c ] If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed. [ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]--- Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* net_sched: fix a NULL pointer deref in ipt actionCong Wang2019-09-101-1/+3
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 981471bd3abf4d572097645d765391533aac327d ] The net pointer in struct xt_tgdtor_param is not explicitly initialized therefore is still NULL when dereferencing it. So we have to find a way to pass the correct net pointer to ipt_destroy_target(). The best way I find is just saving the net pointer inside the per netns struct tcf_idrinfo, which could make this patch smaller. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Reported-and-tested-by: itugrok@yahoo.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: sched: act_sample: fix psample group handling on overwriteVlad Buslov2019-09-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit dbf47a2a094edf58983265e323ca4bdcdb58b5ee ] Action sample doesn't properly handle psample_group pointer in overwrite case. Following issues need to be fixed: - In tcf_sample_init() function RCU_INIT_POINTER() is used to set s->psample_group, even though we neither setting the pointer to NULL, nor preventing concurrent readers from accessing the pointer in some way. Use rcu_swap_protected() instead to safely reset the pointer. - Old value of s->psample_group is not released or deallocated in any way, which results resource leak. Use psample_group_put() on non-NULL value obtained with rcu_swap_protected(). - The function psample_group_put() that released reference to struct psample_group pointed by rcu-pointer s->psample_group doesn't respect rcu grace period when deallocating it. Extend struct psample_group with rcu head and use kfree_rcu when freeing it. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Pass error information to the pgio error cleanup routineTrond Myklebust2019-09-061-1/+1
| | | | | | | | | | | [ Upstream commit df3accb849607a86278a37c35e6b313635ccc48b ] Allow the caller to pass error information when cleaning up a failed I/O request so that we can conditionally take action to cancel the request altogether if the error turned out to be fatal. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* NFS: Clean up list moves of struct nfs_pageTrond Myklebust2019-09-061-0/+10
| | | | | | | | | | | [ Upstream commit 078b5fd92c4913dd367361db6c28568386077c89 ] In several places we're just moving the struct nfs_page from one list to another by first removing from the existing list, then adding to the new one. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* lib: logic_pio: Add logic_pio_unregister_range()John Garry2019-09-061-0/+1
| | | | | | | | | | | | | | | | | commit b884e2de2afc68ce30f7093747378ef972dde253 upstream. Add a function to unregister a logical PIO range. Logical PIO space can still be leaked when unregistering certain LOGIC_PIO_CPU_MMIO regions, but this acceptable for now since there are no callers to unregister LOGIC_PIO_CPU_MMIO regions, and the logical PIO region allocation scheme would need significant work to improve this. Cc: stable@vger.kernel.org Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Wei Xu <xuwei5@hisilicon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* rxrpc: Fix read-after-free in rxrpc_queue_local()David Howells2019-08-291-3/+3
| | | | | | | | | | | | | | | | | | | | commit 06d9532fa6b34f12a6d75711162d47c17c1add72 upstream. rxrpc_queue_local() attempts to queue the local endpoint it is given and then, if successful, prints a trace line. The trace line includes the current usage count - but we're not allowed to look at the local endpoint at this point as we passed our ref on it to the workqueue. Fix this by reading the usage count before queuing the work item. Also fix the reading of local->debug_id for trace lines, which must be done with the same consideration as reading the usage count. Fixes: 09d2bf595db4 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting") Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* drm/i915/cfl: Add a new CFL PCI ID.Rodrigo Vivi2019-08-251-0/+1
| | | | | | | | | | | | | | commit d0e062ebb3a44b56a7e672da568334c76f763552 upstream. One more CFL ID added to spec. Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180803232721.20038-1-rodrigo.vivi@intel.com Signed-off-by: Wan Yusof, Wan Fahim AsqalaniX <wan.fahim.asqalanix.wan.yusof@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to blockMarc Zyngier2019-08-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5eeaf10eec394b28fad2c58f1f5c3a5da0e87d1c upstream. Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* asm-generic: fix -Wtype-limits compiler warningsQian Cai2019-08-251-30/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit cbedfe11347fe418621bd188d58a206beb676218 ] Commit d66acc39c7ce ("bitops: Optimise get_order()") introduced a compilation warning because "rx_frag_size" is an "ushort" while PAGE_SHIFT here is 16. The commit changed the get_order() to be a multi-line macro where compilers insist to check all statements in the macro even when __builtin_constant_p(rx_frag_size) will return false as "rx_frag_size" is a module parameter. In file included from ./arch/powerpc/include/asm/page_64.h:107, from ./arch/powerpc/include/asm/page.h:242, from ./arch/powerpc/include/asm/mmu.h:132, from ./arch/powerpc/include/asm/lppaca.h:47, from ./arch/powerpc/include/asm/paca.h:17, from ./arch/powerpc/include/asm/current.h:13, from ./include/linux/thread_info.h:21, from ./arch/powerpc/include/asm/processor.h:39, from ./include/linux/prefetch.h:15, from drivers/net/ethernet/emulex/benet/be_main.c:14: drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create': ./include/asm-generic/getorder.h:54:9: warning: comparison is always true due to limited range of data type [-Wtype-limits] (((n) < (1UL << PAGE_SHIFT)) ? 0 : \ ^ drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion of macro 'get_order' adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE; ^~~~~~~~~ Fix it by moving all of this multi-line macro into a proper function, and killing __get_order() off. [akpm@linux-foundation.org: remove __get_order() altogether] [cai@lca.pw: v2] Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw Fixes: d66acc39c7ce ("bitops: Optimise get_order()") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Howells <dhowells@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Bill Wendling <morbo@google.com> Cc: James Y Knight <jyknight@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* KVM: Fix leak vCPU's VMCS value into other pCPUWanpeng Li2019-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 17e433b54393a6269acbcb792da97791fe1592d8 upstream. After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting in the VMs after stress testing: INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073) Call Trace: flush_tlb_mm_range+0x68/0x140 tlb_flush_mmu.part.75+0x37/0xe0 tlb_finish_mmu+0x55/0x60 zap_page_range+0x142/0x190 SyS_madvise+0x3cd/0x9c0 system_call_fastpath+0x1c/0x21 swait_active() sustains to be true before finish_swait() is called in kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin() loop greatly increases the probability condition kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv is enabled the yield-candidate vCPU's VMCS RVI field leaks(by vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current VMCS. This patch fixes it by checking conservatively a subset of events. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Marc Zyngier <Marc.Zyngier@arm.com> Cc: stable@vger.kernel.org Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop) Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: compress: Fix regression on compressed capture streamsCharles Keepax2019-08-161-4/+1
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 4475f8c4ab7b248991a60d9c02808dbb813d6be8 ] A previous fix to the stop handling on compressed capture streams causes some knock on issues. The previous fix updated snd_compr_drain_notify to set the state back to PREPARED for capture streams. This causes some issues however as the handling for snd_compr_poll differs between the two states and some user-space applications were relying on the poll failing after the stream had been stopped. To correct this regression whilst still fixing the original problem the patch was addressing, update the capture handling to skip the PREPARED state rather than skipping the SETUP state as it has done until now. Fixes: 4f2ab5e1d13d ("ALSA: compress: Fix stop handling on compressed capture streams") Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Acked-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nl80211: fix NL80211_HE_MAX_CAPABILITY_LENJohn Crispin2019-08-161-1/+1
| | | | | | | | | | | | [ Upstream commit 5edaac063bbf1267260ad2a5b9bb803399343e58 ] NL80211_HE_MAX_CAPABILITY_LEN has changed between D2.0 and D4.0. It is now MAC (6) + PHY (11) + MCS (12) + PPE (25) = 54. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20190627095832.19445-1-john@phrozen.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* crypto: ccp - Add support for valid authsize values less than 16Gary R Hook2019-08-161-0/+2
| | | | | | | | | | | | | | | commit 9f00baf74e4b6f79a3a3dfab44fb7bb2e797b551 upstream. AES GCM encryption allows for authsize values of 4, 8, and 12-16 bytes. Validate the requested authsize, and retain it to save in the request context. Fixes: 36cf515b9bbe2 ("crypto: ccp - Enable support for AES GCM on v5 CCPs") Cc: <stable@vger.kernel.org> Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cgroup: Include dying leaders with live threads in PROCS iterationsTejun Heo2019-08-092-0/+2
| | | | | | | | | | | | | | | | | | commit c03cd7738a83b13739f00546166969342c8ff014 upstream. CSS_TASK_ITER_PROCS currently iterates live group leaders; however, this means that a process with dying leader and live threads will be skipped. IOW, cgroup.procs might be empty while cgroup.threads isn't, which is confusing to say the least. Fix it by making cset track dying tasks and include dying leaders with live threads in PROCS iteration. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Topi Miettinen <toiwoton@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* cgroup: Implement css_task_iter_skip()Tejun Heo2019-08-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | commit b636fd38dc40113f853337a7d2a6885ad23b8811 upstream. When a task is moved out of a cset, task iterators pointing to the task are advanced using the normal css_task_iter_advance() call. This is fine but we'll be tracking dying tasks on csets and thus moving tasks from cset->tasks to (to be added) cset->dying_tasks. When we remove a task from cset->tasks, if we advance the iterators, they may move over to the next cset before we had the chance to add the task back on the dying list, which can allow the task to escape iteration. This patch separates out skipping from advancing. Skipping only moves the affected iterators to the next pointer rather than fully advancing it and the following advancing will recognize that the cursor has already been moved forward and do the rest of advancing. This ensures that when a task moves from one list to another in its cset, as long as it moves in the right direction, it's always visible to iteration. This doesn't cause any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* compat_ioctl: pppoe: fix PPPOEIOCSFWD handlingArnd Bergmann2019-08-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 055d88242a6046a1ceac3167290f054c72571cd9 ] Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures. Guillaume Nault adds: And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used. Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function. All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion. This should apply to all stable kernels. Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/mlx5e: Prevent encap flow counter update async to user queryAriel Levkovich2019-08-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 90bb769291161cf25a818d69cf608c181654473e ] This patch prevents a race between user invoked cached counters query and a neighbor last usage updater. The cached flow counter stats can be queried by calling "mlx5_fc_query_cached" which provides the number of bytes and packets that passed via this flow since the last time this counter was queried. It does so by reducting the last saved stats from the current, cached stats and then updating the last saved stats with the cached stats. It also provide the lastuse value for that flow. Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the last usage time of encapsulation flows, it calls the flow counter query method periodically and async to user queries of the flow counter using cls_flower. This call is causing the driver to update the last reported bytes and packets from the cache and therefore, future user queries of the flow stats will return lower than expected number for bytes and packets since the last saved stats in the driver was updated async to the last saved stats in cls_flower. This causes wrong stats presentation of encapsulation flows to user. Since the neighbor usage updater only needs the lastuse stats from the cached counter, the fix is to use a dedicated lastuse query call that returns the lastuse value without synching between the cached stats and the last saved stats. Fixes: f6dfb4c3f216 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters") Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/mlx5: Fix modify_cq_in alignmentEdward Srouji2019-08-091-1/+6
| | | | | | | | | | | | | | | [ Upstream commit 7a32f2962c56d9d8a836b4469855caeee8766bd4 ] Fix modify_cq_in alignment to match the device specification. After this fix the 'cq_umem_valid' field will be in the right offset. Cc: <stable@vger.kernel.org> # 4.19 Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits") Signed-off-by: Edward Srouji <edwards@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* drivers/base: Introduce kill_device()Dan Williams2019-08-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 00289cd87676e14913d2d8492d1ce05c4baafdae upstream. The libnvdimm subsystem arranges for devices to be destroyed as a result of a sysfs operation. Since device_unregister() cannot be called from an actively running sysfs attribute of the same device libnvdimm arranges for device_unregister() to be performed in an out-of-line async context. The driver core maintains a 'dead' state for coordinating its own racing async registration / de-registration requests. Rather than add local 'dead' state tracking infrastructure to libnvdimm device objects, export the existing state tracking via a new kill_device() helper. The kill_device() helper simply marks the device as dead, i.e. that it is on its way to device_del(), or returns that the device was already dead. This can be used in advance of calling device_unregister() for subsystems like libnvdimm that might need to handle multiple user threads racing to delete a device. This refactoring does not change any behavior, but it is a pre-requisite for follow-on fixes and therefore marked for -stable. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...") Cc: <stable@vger.kernel.org> Tested-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* scsi: fcoe: Embed fc_rport_priv in fcoe_rport structureHannes Reinecke2019-08-091-0/+1
| | | | | | | | | | | | | | | | commit 023358b136d490ca91735ac6490db3741af5a8bd upstream. Gcc-9 complains for a memset across pointer boundaries, which happens as the code tries to allocate a flexible array on the stack. Turns out we cannot do this without relying on gcc-isms, so with this patch we'll embed the fc_rport_priv structure into fcoe_rport, can use the normal 'container_of' outcast, and will only have to do a memset over one structure. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side ↵Mikko Rapeli2019-08-062-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | headers [ Upstream commit f90fb3c7e2c13ae829db2274b88b845a75038b8a ] Only users of upc_req in kernel side fs/coda/psdev.c and fs/coda/upcall.c already include linux/coda_psdev.h. Suggested by Jan Harkes <jaharkes@cs.cmu.edu> in https://lore.kernel.org/lkml/20150531111913.GA23377@cs.cmu.edu/ Fixes these include/uapi/linux/coda_psdev.h compilation errors in userspace: linux/coda_psdev.h:12:19: error: field `uc_chain' has incomplete type struct list_head uc_chain; ^ linux/coda_psdev.h:13:2: error: unknown type name `caddr_t' caddr_t uc_data; ^ linux/coda_psdev.h:14:2: error: unknown type name `u_short' u_short uc_flags; ^ linux/coda_psdev.h:15:2: error: unknown type name `u_short' u_short uc_inSize; /* Size is at most 5000 bytes */ ^ linux/coda_psdev.h:16:2: error: unknown type name `u_short' u_short uc_outSize; ^ linux/coda_psdev.h:17:2: error: unknown type name `u_short' u_short uc_opcode; /* copied from data to save lookup */ ^ linux/coda_psdev.h:19:2: error: unknown type name `wait_queue_head_t' wait_queue_head_t uc_sleep; /* process' wait queue */ ^ Link: http://lkml.kernel.org/r/9f99f5ce6a0563d5266e6cf7aa9585aac2cae971.1558117389.git.jaharkes@cs.cmu.edu Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Sam Protsenko <semen.protsenko@linaro.org> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Zhouyang Jia <jiazhouyang09@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* coda: fix build using bare-metal toolchainSam Protsenko2019-08-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b2a57e334086602be56b74958d9f29b955cd157f ] The kernel is self-contained project and can be built with bare-metal toolchain. But bare-metal toolchain doesn't define __linux__. Because of this u_quad_t type is not defined when using bare-metal toolchain and codafs build fails. This patch fixes it by defining u_quad_t type unconditionally. Link: http://lkml.kernel.org/r/3cbb40b0a57b6f9923a9d67b53473c0b691a3eaa.1558117389.git.jaharkes@cs.cmu.edu Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org> Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Colin Ian King <colin.king@canonical.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Mikko Rapeli <mikko.rapeli@iki.fi> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Zhouyang Jia <jiazhouyang09@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
* ACPI: fix false-positive -Wuninitialized warningArnd Bergmann2019-08-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit dfd6f9ad36368b8dbd5f5a2b2f0a4705ae69a323 ] clang gets confused by an uninitialized variable in what looks to it like a never executed code path: arch/x86/kernel/acpi/boot.c:618:13: error: variable 'polarity' is uninitialized when used here [-Werror,-Wuninitialized] polarity = polarity ? ACPI_ACTIVE_LOW : ACPI_ACTIVE_HIGH; ^~~~~~~~ arch/x86/kernel/acpi/boot.c:606:32: note: initialize the variable 'polarity' to silence this warning int rc, irq, trigger, polarity; ^ = 0 arch/x86/kernel/acpi/boot.c:617:12: error: variable 'trigger' is uninitialized when used here [-Werror,-Wuninitialized] trigger = trigger ? ACPI_LEVEL_SENSITIVE : ACPI_EDGE_SENSITIVE; ^~~~~~~ arch/x86/kernel/acpi/boot.c:606:22: note: initialize the variable 'trigger' to silence this warning int rc, irq, trigger, polarity; ^ = 0 This is unfortunately a design decision in clang and won't be fixed. Changing the acpi_get_override_irq() macro to an inline function reliably avoids the issue. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* block, scsi: Change the preempt-only flag into a counterBart Van Assche2019-08-041-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit cd84a62e0078dce09f4ed349bec84f86c9d54b30 upstream. The RQF_PREEMPT flag is used for three purposes: - In the SCSI core, for making sure that power management requests are executed even if a device is in the "quiesced" state. - For domain validation by SCSI drivers that use the parallel port. - In the IDE driver, for IDE preempt requests. Rename "preempt-only" into "pm-only" because the primary purpose of this mode is power management. Since the power management core may but does not have to resume a runtime suspended device before performing system-wide suspend and since a later patch will set "pm-only" mode as long as a block device is runtime suspended, make it possible to set "pm-only" mode from more than one context. Since with this change scsi_device_quiesce() is no longer idempotent, make that function return early if it is called for a quiesced queue. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sched/fair: Use RCU accessors consistently for ->numa_groupJann Horn2019-08-041-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | commit cb361d8cdef69990f6b4504dc1fd9a594d983c97 upstream. The old code used RCU annotations and accessors inconsistently for ->numa_group, which can lead to use-after-frees and NULL dereferences. Let all accesses to ->numa_group use proper RCU helpers to prevent such issues. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults") Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sched/fair: Don't free p->numa_faults with concurrent readersJann Horn2019-08-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 16d51a590a8ce3befb1308e0e7ab77f3b661af33 upstream. When going through execve(), zero out the NUMA fault statistics instead of freeing them. During execve, the task is reachable through procfs and the scheduler. A concurrent /proc/*/sched reader can read data from a freed ->numa_faults allocation (confirmed by KASAN) and write it back to userspace. I believe that it would also be possible for a use-after-free read to occur through a race between a NUMA fault and execve(): task_numa_fault() can lead to task_numa_compare(), which invokes task_weight() on the currently running task of a different CPU. Another way to fix this would be to make ->numa_faults RCU-managed or add extra locking, but it seems easier to wipe the NUMA fault statistics on execve. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Fixes: 82727018b0d3 ("sched/numa: Call task_numa_free() from do_execve()") Link: https://lkml.kernel.org/r/20190716152047.14424-1-jannh@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVAJoerg Roedel2019-08-041-1/+1
| | | | | | | | | | | | | commit 201c1db90cd643282185a00770f12f95da330eca upstream. The stub function for !CONFIG_IOMMU_IOVA needs to be 'static inline'. Fixes: effa467870c76 ('iommu/vt-d: Don't queue_iova() if there is no flush queue') Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* iommu/vt-d: Don't queue_iova() if there is no flush queueDmitry Safonov2019-08-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit effa467870c7612012885df4e246bdb8ffd8e44c upstream. Intel VT-d driver was reworked to use common deferred flushing implementation. Previously there was one global per-cpu flush queue, afterwards - one per domain. Before deferring a flush, the queue should be allocated and initialized. Currently only domains with IOMMU_DOMAIN_DMA type initialize their flush queue. It's probably worth to init it for static or unmanaged domains too, but it may be arguable - I'm leaving it to iommu folks. Prevent queuing an iova flush if the domain doesn't have a queue. The defensive check seems to be worth to keep even if queue would be initialized for all kinds of domains. And is easy backportable. On 4.19.43 stable kernel it has a user-visible effect: previously for devices in si domain there were crashes, on sata devices: BUG: spinlock bad magic on CPU#6, swapper/0/1 lock: 0xffff88844f582008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #1 Call Trace: <IRQ> dump_stack+0x61/0x7e spin_bug+0x9d/0xa3 do_raw_spin_lock+0x22/0x8e _raw_spin_lock_irqsave+0x32/0x3a queue_iova+0x45/0x115 intel_unmap+0x107/0x113 intel_unmap_sg+0x6b/0x76 __ata_qc_complete+0x7f/0x103 ata_qc_complete+0x9b/0x26a ata_qc_complete_multiple+0xd0/0xe3 ahci_handle_port_interrupt+0x3ee/0x48a ahci_handle_port_intr+0x73/0xa9 ahci_single_level_irq_intr+0x40/0x60 __handle_irq_event_percpu+0x7f/0x19a handle_irq_event_percpu+0x32/0x72 handle_irq_event+0x38/0x56 handle_edge_irq+0x102/0x121 handle_irq+0x147/0x15c do_IRQ+0x66/0xf2 common_interrupt+0xf/0xf RIP: 0010:__do_softirq+0x8c/0x2df The same for usb devices that use ehci-pci: BUG: spinlock bad magic on CPU#0, swapper/0/1 lock: 0xffff88844f402008, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.43 #4 Call Trace: <IRQ> dump_stack+0x61/0x7e spin_bug+0x9d/0xa3 do_raw_spin_lock+0x22/0x8e _raw_spin_lock_irqsave+0x32/0x3a queue_iova+0x77/0x145 intel_unmap+0x107/0x113 intel_unmap_page+0xe/0x10 usb_hcd_unmap_urb_setup_for_dma+0x53/0x9d usb_hcd_unmap_urb_for_dma+0x17/0x100 unmap_urb_for_dma+0x22/0x24 __usb_hcd_giveback_urb+0x51/0xc3 usb_giveback_urb_bh+0x97/0xde tasklet_action_common.isra.4+0x5f/0xa1 tasklet_action+0x2d/0x30 __do_softirq+0x138/0x2df irq_exit+0x7d/0x8b smp_apic_timer_interrupt+0x10f/0x151 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:_raw_spin_unlock_irqrestore+0x17/0x39 Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: iommu@lists.linux-foundation.org Cc: <stable@vger.kernel.org> # 4.14+ Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing") Signed-off-by: Dmitry Safonov <dima@arista.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> [v4.14-port notes: o minor conflict with untrusted IOMMU devices check under if-condition] Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* access: avoid the RCU grace period for the temporary subjective credentialsLinus Torvalds2019-07-311-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream. It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU work because it installs a temporary credential that gets allocated and freed for each system call. The allocation and freeing overhead is mostly benign, but because credentials can be accessed under the RCU read lock, the freeing involves a RCU grace period. Which is not a huge deal normally, but if you have a lot of access() calls, this causes a fair amount of seconday damage: instead of having a nice alloc/free patterns that hits in hot per-CPU slab caches, you have all those delayed free's, and on big machines with hundreds of cores, the RCU overhead can end up being enormous. But it turns out that all of this is entirely unnecessary. Exactly because access() only installs the credential as the thread-local subjective credential, the temporary cred pointer doesn't actually need to be RCU free'd at all. Once we're done using it, we can just free it synchronously and avoid all the RCU overhead. So add a 'non_rcu' flag to 'struct cred', which can be set by users that know they only use it in non-RCU context (there are other potential users for this). We can make it a union with the rcu freeing list head that we need for the RCU case, so this doesn't need any extra storage. Note that this also makes 'get_current_cred()' clear the new non_rcu flag, in case we have filesystems that take a long-term reference to the cred and then expect the RCU delayed freeing afterwards. It's not entirely clear that this is required, but it makes for clear semantics: the subjective cred remains non-RCU as long as you only access it synchronously using the thread-local accessors, but you _can_ use it as a generic cred if you want to. It is possible that we should just remove the whole RCU markings for ->cred entirely. Only ->real_cred is really supposed to be accessed through RCU, and the long-term cred copies that nfs uses might want to explicitly re-enable RCU freeing if required, rather than have get_current_cred() do it implicitly. But this is a "minimal semantic changes" change for the immediate problem. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Glauber <jglauber@marvell.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com> Cc: Greg KH <greg@kroah.com> Cc: Kees Cook <keescook@chromium.org> Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* gpu: host1x: Increase maximum DMA segment sizeThierry Reding2019-07-311-0/+2
| | | | | | | | | | | | | | | | | | | [ Upstream commit 1e390478cfb527e34c9ab89ba57212cb05c33c51 ] Recent versions of the DMA API debug code have started to warn about violations of the maximum DMA segment size. This is because the segment size defaults to 64 KiB, which can easily be exceeded in large buffer allocations such as used in DRM/KMS for framebuffers. Technically the Tegra SMMU and ARM SMMU don't have a maximum segment size (they map individual pages irrespective of whether they are contiguous or not), so the choice of 4 MiB is a bit arbitrary here. The maximum segment size is a 32-bit unsigned integer, though, so we can't set it to the correct maximum size, which would be the size of the aperture. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* jbd2: introduce jbd2_inode dirty range scopingRoss Zwisler2019-07-281-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6ba0e7dc64a5adcda2fbe65adc466891795d639e upstream. Currently both journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() operate on the entire address space of each of the inodes associated with a given journal entry. The consequence of this is that if we have an inode where we are constantly appending dirty pages we can end up waiting for an indefinite amount of time in journal_finish_inode_data_buffers() while we wait for all the pages under writeback to be written out. The easiest way to cause this type of workload is do just dd from /dev/zero to a file until it fills the entire filesystem. This can cause journal_finish_inode_data_buffers() to wait for the duration of the entire dd operation. We can improve this situation by scoping each of the inode dirty ranges associated with a given transaction. We do this via the jbd2_inode structure so that the scoping is contained within jbd2 and so that it follows the lifetime and locking rules for that structure. This allows us to limit the writeback & wait in journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() respectively to the dirty range for a given struct jdb2_inode, keeping us from waiting forever if the inode in question is still being appended to. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mm: add filemap_fdatawait_range_keep_errors()Ross Zwisler2019-07-281-0/+2
| | | | | | | | | | | | | | | | commit aa0bfcd939c30617385ffa28682c062d78050eba upstream. In the spirit of filemap_fdatawait_range() and filemap_fdatawait_keep_errors(), introduce filemap_fdatawait_range_keep_errors() which both takes a range upon which to wait and does not clear errors from the address space. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* perf/core: Fix exclusive events' groupingAlexander Shishkin2019-07-281-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8a58ddae23796c733c5dfbd717538d89d036c5bd upstream. So far, we tried to disallow grouping exclusive events for the fear of complications they would cause with moving between contexts. Specifically, moving a software group to a hardware context would violate the exclusivity rules if both groups contain matching exclusive events. This attempt was, however, unsuccessful: the check that we have in the perf_event_open() syscall is both wrong (looks at wrong PMU) and insufficient (group leader may still be exclusive), as can be illustrated by running: $ perf record -e '{intel_pt//,cycles}' uname $ perf record -e '{cycles,intel_pt//}' uname ultimately successfully. Furthermore, we are completely free to trigger the exclusivity violation by: perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}' even though the helpful perf record will not allow that, the ABI will. The warning later in the perf_event_open() path will also not trigger, because it's also wrong. Fix all this by validating the original group before moving, getting rid of broken safeguards and placing a useful one to perf_install_in_context(). Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: mathieu.poirier@linaro.org Cc: will.deacon@arm.com Fixes: bed5b25ad9c8a ("perf: Add a pmu capability for "exclusive" events") Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/tls: make sure offload also gets the keys wipedJakub Kicinski2019-07-281-0/+1
| | | | | | | | | | | | | | | [ Upstream commit acd3e96d53a24d219f720ed4012b62723ae05da1 ] Commit 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") added memzero_explicit() calls to clear the key material before freeing struct tls_context, but it missed tls_device.c has its own way of freeing this structure. Replace the missing free. Fixes: 86029d10af18 ("tls: zero the crypto information from tls_context before freeing") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>