Commit message (author, commit date, files changed, lines -removed/+added)
* powerpc/tm: Fix oops on sigreturn on systems without TM (Michael Neuling, 2019-07-31, 2 files changed, -0/+8)

commit f16d80b75a096c52354c6e0a574993f3b0dfbdfe upstream.

On systems like P9 powernv where we have no TM (or P8 booted with ppc_tm=off), userspace can construct a signal context which still has the MSR TS bits set. The kernel tries to restore this context which results in the following crash:

  Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
  NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
  REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
  MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
  CFAR: c0000000000022e0 IRQMASK: 0
  GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
  GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
  GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
  GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
  GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
  GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
  NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
  LR [00007fffb2d67e48] 0x7fffb2d67e48
  Call Trace:
  Instruction dump:
  e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
  e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18

The problem is the signal code assumes TM is enabled when CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as with P9 powernv or if `ppc_tm=off` is used on P8. This means any local user can crash the system.

Fix the problem by returning a bad stack frame to the user if they try to set the MSR TS bits with sigreturn() on systems where TM is not supported.

Found with sigfuz kernel selftest on P9.

This fixes CVE-2019-13648.

Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Reported-by: Praveen Pandey <Praveen.Pandey@in.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* powerpc/mm: Limit rma_size to 1TB when running without HV mode (Suraj Jitindar Singh, 2019-07-31, 1 file changed, -0/+9)

commit da0ef93310e67ae6902efded60b6724dab27a5d1 upstream.

The virtual real mode addressing (VRMA) mechanism is used when a partition is using HPT (Hash Page Table) translation and performs real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode effective address bits 0:23 are treated as zero (i.e. the access is aliased to 0) and the access is performed using an implicit 1TB SLB entry.

The size of the RMA (Real Memory Area) is communicated to the guest as the size of the first memory region in the device tree. Because of the mechanism described above, it can be expected not to exceed 1TB. In the event that the host erroneously represents the RMA as being larger than 1TB, guest accesses in real mode to memory addresses above 1TB will be aliased down to below 1TB. This means that a memory access performed in real mode may differ from one performed in virtual mode for the same memory address, which would likely have unintended consequences.

To avoid this outcome have the guest explicitly limit the size of the RMA to the current maximum, which is 1TB. This means that even if the first memory block is larger than 1TB, only the first 1TB should be accessed in real mode.

Fixes: c610d65c0ad0 ("powerpc/pseries: lift RTAS limit for hash")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190710052018.14628-1-sjitindarsingh@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
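For illustration, a minimal C sketch of the clamping the changelog describes; the helper name and call site are hypothetical (the real change lives in the powerpc hash MMU setup code), but the 1TB limit is the one stated above.

#include <linux/kernel.h>

/* 1TB: the implicit VRMA SLB entry size quoted in the changelog */
#define RMA_MAX_SIZE	(1UL << 40)

/*
 * Hypothetical helper: when not running with MSR[HV] (i.e. as a guest),
 * never use an RMA larger than 1TB, no matter how large the first memory
 * block in the device tree is.
 */
static unsigned long limit_rma_size(unsigned long first_memblock_size, bool hv_mode)
{
	if (hv_mode)
		return first_memblock_size;
	return min(first_memblock_size, RMA_MAX_SIZE);
}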
* powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() (Gautham R. Shenoy, 2019-07-31, 1 file changed, -4/+3)

commit 4d202c8c8ed3822327285747db1765967110b274 upstream.

xive_find_target_in_mask() has the following for(;;) loop which has a bug when @first == cpumask_first(@mask) and condition 1 fails to hold for every CPU in @mask. In this case we loop forever in the for-loop.

	first = cpu;
	for (;;) {
		if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
			return cpu;
		cpu = cpumask_next(cpu, mask);
		if (cpu == first) // condition 2
			break;
		if (cpu >= nr_cpu_ids) // condition 3
			cpu = cpumask_first(mask);
	}

This is because, when @first == cpumask_first(@mask), we never hit condition 2 (cpu == first) since prior to this check, we would have executed "cpu = cpumask_next(cpu, mask)" which will set the value of @cpu to a value greater than @first or to nr_cpu_ids. When this is coupled with the fact that condition 1 is not met, we will never exit this loop.

This was discovered by the hard-lockup detector while running LTP test concurrently with SMT switch tests.

  watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
  watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
  watchdog: CPU 68 Hard LOCKUP
  watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
  CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
  NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
  REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
  MSR:  9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 24028424  XER: 00000000
  CFAR: c0000000006f558c IRQMASK: 1
  GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
  GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
  GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
  GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
  GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
  GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
  GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
  NIP [c0000000006f5578] find_next_bit+0x38/0x90
  LR [c000000000cba9ec] cpumask_next+0x2c/0x50
  Call Trace:
  [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
  [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
  [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
  [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
  [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
  [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
  [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
  [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
  [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
  [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
  [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
  [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
  [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
  [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
  [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
  [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
  [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
  Instruction dump:
  78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
  4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040

To fix this, move the check for condition 2 after the check for condition 3, so that we are able to break out of the loop soon after iterating through all the CPUs in the @mask in the problem case. Use do..while() to achieve this.

Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Indira P. Joga <indira.priya@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
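A sketch of the restructured walk the changelog describes (the wrapper function and its -1 return are illustrative, not the verbatim upstream hunk): condition 3 is tested before the wrap-around test, and do..while() guarantees the loop terminates after every CPU in @mask has been tried exactly once.

#include <linux/cpumask.h>

static bool xive_try_pick_target(int cpu);	/* provided by the XIVE driver */

static int pick_target_in_mask(int cpu, const struct cpumask *mask)
{
	int first = cpu;

	do {
		if (cpu_online(cpu) && xive_try_pick_target(cpu))	/* condition 1 */
			return cpu;
		cpu = cpumask_next(cpu, mask);
		if (cpu >= nr_cpu_ids)					/* condition 3 */
			cpu = cpumask_first(mask);
	} while (cpu != first);						/* condition 2 */

	return -1;	/* every CPU in @mask was tried exactly once */
}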
* powerpc/dma: Fix invalid DMA mmap behavior (Shawn Anastasio, 2019-07-31, 3 files changed, -1/+20)

commit b4fc36e60f25cf22bf8b7b015a701015740c3743 upstream.

The refactor of powerpc DMA functions in commit 6666cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly changes the way DMA mappings are handled on powerpc. Since this change, all mapped pages are marked as cache-inhibited through the default implementation of arch_dma_mmap_pgprot. This differs from the previous behavior of only marking pages in noncoherent mappings as cache-inhibited and has resulted in sporadic system crashes in certain hardware configurations and workloads (see Bugzilla).

This commit restores the previous correct behavior by providing an implementation of arch_dma_mmap_pgprot that only marks pages in noncoherent mappings as cache-inhibited. As this behavior should be universal for all powerpc platforms a new file, dma-generic.c, was created to store it.

Fixes: 6666cc17d780 ("powerpc/dma: remove dma_nommu_mmap_coherent")
# NOTE: fixes commit 6666cc17d780 released in v5.1.
# Consider a stable tag:
# Cc: stable@vger.kernel.org # v5.1+
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: hda - Add a conexant codec entry to let mute led work (Hui Wang, 2019-07-31, 1 file changed, -0/+1)

commit 3f8809499bf02ef7874254c5e23fc764a47a21a0 upstream.

This Conexant codec isn't in the supported codec list yet. The HDA generic driver can drive this codec well, but on a Lenovo machine with mute/mic-mute LEDs, we need to apply CXT_FIXUP_THINKPAD_ACPI to make the LEDs work. After adding this codec to the list, the driver patch_conexant.c will apply THINKPAD_ACPI to this machine.

Cc: stable@vger.kernel.org
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips (Takashi Iwai, 2019-07-31, 1 file changed, -3/+2)

commit 2756d9143aa517b97961e85412882b8ce31371a6 upstream.

It turned out that the recent Intel HD-audio controller chips show a significant stall during the system PM resume intermittently. It doesn't happen so often and usually it may read back successfully after one or more seconds, but in some rare worst cases the driver went into fallback mode.

After trial-and-error, we found out that the communication stall seems covered by issuing the sync after each verb write, as already done for AMD and other chipsets. So this patch enables the write-sync flag for the recent Intel chips, Skylake and onward, as a workaround.

Also, since Broxton and co have the very same driver flags as Skylake, refer to the Skylake driver flags instead of defining the same contents again for simplification.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201901
Reported-and-tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: pcm: Fix refcount_inc() on zero usage (Takashi Iwai, 2019-07-31, 1 file changed, -4/+5)

commit 0e279dcea0ec897af1c979ebee4ec92b461793f5 upstream.

The recent rewrite of PCM link lock management introduced the refcount in snd_pcm_group object, managed by the kernel refcount_t API. This caused unexpected kernel warnings when the kernel is built with CONFIG_REFCOUNT_FULL=y. As the warning line indicates, the problem is obviously that we start with refcount=0 and do refcount_inc() for adding each PCM link, while refcount_t API doesn't like refcount_inc() performed on zero.

For adapting the proper refcount_t usage, this patch changes the logic slightly (see the sketch after this entry):
- The initial refcount is 1, assuming the single list entry
- The refcount is incremented / decremented at each PCM link addition and deletion
- ... which allows us concentrating only on the refcount as a release condition

Fixes: f57f3df03a8e ("ALSA: pcm: More fine-grained PCM link locking")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204221
Reported-and-tested-by: Duncan Overbruck <kernel@duncano.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
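A minimal sketch of the refcount_t pattern described above, with stand-in names (not the actual ALSA symbols): the count starts at 1 for the group itself, so refcount_inc() is never called on zero, and freeing hinges solely on the last decrement.

#include <linux/refcount.h>

struct pcm_group_sketch {		/* stand-in for snd_pcm_group */
	refcount_t refs;
};

static void group_init(struct pcm_group_sketch *grp)
{
	refcount_set(&grp->refs, 1);	/* the group itself is the first entry */
}

static void group_add_link(struct pcm_group_sketch *grp)
{
	refcount_inc(&grp->refs);	/* one more PCM link */
}

static bool group_del_link(struct pcm_group_sketch *grp)
{
	/* true when the last user is gone and the group can be freed */
	return refcount_dec_and_test(&grp->refs);
}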
* ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1 (Kai-Heng Feng, 2019-07-31, 1 file changed, -1/+1)

commit 70256b42caaf3e13c2932c2be7903a73fbe8bb8b upstream.

Commit 7b9584fa1c0b ("staging: line6: Move altsetting to properties") set a wrong altsetting for LINE6_PODHD500_1 during refactoring. Set the correct altsetting number to fix the issue.

BugLink: https://bugs.launchpad.net/bugs/1790595
Fixes: 7b9584fa1c0b ("staging: line6: Move altsetting to properties")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ALSA: ac97: Fix double free of ac97_codec_device (Ding Xiang, 2019-07-31, 1 file changed, -9/+4)

commit 607975b30db41aad6edc846ed567191aa6b7d893 upstream.

put_device will call ac97_codec_release to free ac97_codec_device and other resources, so remove the kfree and other redundant code.

Fixes: 74426fbff66e ("ALSA: ac97: add an ac97 bus")
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* drm/panel: Add support for Armadeus ST0700 Adapt (Sébastien Szymanski, 2019-07-31, 2 files changed, -0/+38)

commit c479450f61c7f1f248c9a54aedacd2a6ca521ff8 upstream.

This patch adds support for the Armadeus ST0700 Adapt. It comes with a Santek ST0700I5Y-RBSLW 7.0" WVGA (800x480) TFT and an adapter board so that it can be connected on the TFT header of Armadeus Dev boards.

Cc: stable@vger.kernel.org # v4.19
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sébastien Szymanski <sebastien.szymanski@armadeus.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507152713.27494-1-sebastien.szymanski@armadeus.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* hpet: Fix division by zero in hpet_time_div() (Kefeng Wang, 2019-07-31, 1 file changed, -2/+1)

commit 0c7d37f4d9b8446956e97b7c5e61173cdb7c8522 upstream.

The base value in do_div() called by hpet_time_div() is truncated from unsigned long to uint32_t, resulting in a divide-by-zero exception.

  UBSAN: Undefined behaviour in ../drivers/char/hpet.c:572:2
  division by zero
  CPU: 1 PID: 23682 Comm: syz-executor.3 Not tainted 4.4.184.x86_64+ #4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
   0000000000000000 b573382df1853d00 ffff8800a3287b98 ffffffff81ad7561
   ffff8800a3287c00 ffffffff838b35b0 ffffffff838b3860 ffff8800a3287c20
   0000000000000000 ffff8800a3287bb0 ffffffff81b8f25e ffffffff838b35a0
  Call Trace:
   [<ffffffff81ad7561>] __dump_stack lib/dump_stack.c:15 [inline]
   [<ffffffff81ad7561>] dump_stack+0xc1/0x120 lib/dump_stack.c:51
   [<ffffffff81b8f25e>] ubsan_epilogue+0x12/0x8d lib/ubsan.c:166
   [<ffffffff81b900cb>] __ubsan_handle_divrem_overflow+0x282/0x2c8 lib/ubsan.c:262
   [<ffffffff823560dd>] hpet_time_div drivers/char/hpet.c:572 [inline]
   [<ffffffff823560dd>] hpet_ioctl_common drivers/char/hpet.c:663 [inline]
   [<ffffffff823560dd>] hpet_ioctl_common.cold+0xa8/0xad drivers/char/hpet.c:577
   [<ffffffff81e63d56>] hpet_ioctl+0xc6/0x180 drivers/char/hpet.c:676
   [<ffffffff81711590>] vfs_ioctl fs/ioctl.c:43 [inline]
   [<ffffffff81711590>] file_ioctl fs/ioctl.c:470 [inline]
   [<ffffffff81711590>] do_vfs_ioctl+0x6e0/0xf70 fs/ioctl.c:605
   [<ffffffff81711eb4>] SYSC_ioctl fs/ioctl.c:622 [inline]
   [<ffffffff81711eb4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:613
   [<ffffffff82846003>] tracesys_phase2+0x90/0x95

The main C reproducer autogenerated by syzkaller:

  syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
  memcpy((void*)0x20000100, "/dev/hpet\000", 10);
  syscall(__NR_openat, 0xffffffffffffff9c, 0x20000100, 0, 0);
  syscall(__NR_ioctl, r[0], 0x40086806, 0x40000000000000);

Fix it by using div64_ul().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zhang HongJun <zhanghongjun2@huawei.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190711132757.130092-1-wangkefeng.wang@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
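For illustration, a hedged sketch of why div64_ul() avoids the trap (the function name and arguments below are placeholders, not the hpet.c code): do_div() takes a 32-bit divisor, so a divisor whose set bits all lie above bit 31 is truncated to zero; div64_ul() keeps the divisor's full unsigned long width.

#include <linux/math64.h>

static u64 safe_time_div(u64 period, unsigned long divisor)
{
	/* instead of do_div(period, divisor), which truncates the divisor */
	return div64_ul(period, divisor);
}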
* eeprom: make older eeprom drivers select NVMEM_SYSFS (Arseny Solokha, 2019-07-31, 1 file changed, -0/+3)

commit 1b5621832f9bd9899370ea6928462cd02ebe7dc0 upstream.

misc/eeprom/{at24,at25,eeprom_93xx46} drivers all register their corresponding devices in the nvmem framework in compat mode which requires nvmem sysfs interface to be present. The latter, however, has been split out from nvmem under a separate Kconfig in commit ae0c2d725512 ("nvmem: core: add NVMEM_SYSFS Kconfig"). As a result, probing certain I2C-attached EEPROMs now fails with

  at24: probe of 0-0050 failed with error -38

because of a stub implementation of nvmem_sysfs_setup_compat() in drivers/nvmem/nvmem.h. Update the nvmem dependency for these drivers so they could load again:

  at24 0-0050: 32768 byte 24c256 EEPROM, writable, 64 bytes/write

Cc: Adrian Bunk <bunk@kernel.org>
Cc: Bartosz Golaszewski <brgl@bgdev.pl>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Link: https://lore.kernel.org/r/20190716111236.27803-1-asolokha@kb.kras.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mei: me: add mule creek canyon (EHL) device ids (Alexander Usyskin, 2019-07-31, 2 files changed, -0/+6)

commit 1be8624a0cbef720e8da39a15971e01abffc865b upstream.

Add Mule Creek Canyon (PCH) MEI device ids for Elkhart Lake (EHL) Platform.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190712095814.20746-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fpga-manager: altera-ps-spi: Fix build error (YueHaibing, 2019-07-31, 1 file changed, -0/+1)

commit 3d139703d397f6281368047ba7ad1c8bf95aa8ab upstream.

If BITREVERSE is m and FPGA_MGR_ALTERA_PS_SPI is y, build fails:

  drivers/fpga/altera-ps-spi.o: In function `altera_ps_write':
  altera-ps-spi.c:(.text+0x4ec): undefined reference to `byte_rev_table'

Select BITREVERSE to fix this.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: fcfe18f885f6 ("fpga-manager: altera-ps-spi: use bitrev8x4")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Cc: stable <stable@vger.kernel.org>
Acked-by: Moritz Fischer <mdf@kernel.org>
Link: https://lore.kernel.org/r/20190708071356.50928-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* binder: prevent transactions to context manager from its own process. (Hridya Valsaraju, 2019-07-31, 1 file changed, -1/+1)

commit 49ed96943a8e0c62cc5a9b0a6cfc88be87d1fcec upstream.

Currently, a transaction to context manager from its own process is prevented by checking if its binder_proc struct is the same as that of the sender. However, this would not catch cases where the process opens the binder device again and uses the new fd to send a transaction to the context manager.

Reported-by: syzbot+8b3c354d33c4ac78bfad@syzkaller.appspotmail.com
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190715191804.112933-1-hridya@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* binder: Set end of SG buffer area properly. (Martijn Coenen, 2019-07-31, 1 file changed, -1/+2)

commit a56587065094fd96eb4c2b5ad65571daad32156d upstream.

In case the target node requests a security context, the extra_buffers_size is increased with the size of the security context. But, that size is not available for use by regular scatter-gather buffers; make sure the ending of that buffer is marked correctly.

Acked-by: Todd Kjos <tkjos@google.com>
Fixes: ec74136ded79 ("binder: create node flag to request sender's security context")
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable@vger.kernel.org # 5.1+
Link: https://lore.kernel.org/r/20190709110923.220736-1-maco@android.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/stacktrace: Prevent access_ok() warnings in arch_stack_walk_user() (Eiichi Tsukata, 2019-07-31, 1 file changed, -1/+1)

commit 2af7c85714d8cafadf925d55441458eae312cd6b upstream.

When arch_stack_walk_user() is called from atomic contexts, access_ok() can trigger the following warning if compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y.

Reproducer:

  // CONFIG_DEBUG_ATOMIC_SLEEP=y
  # cd /sys/kernel/debug/tracing
  # echo 1 > options/userstacktrace
  # echo 1 > events/irq/irq_handler_entry/enable

  WARNING: CPU: 0 PID: 2649 at arch/x86/kernel/stacktrace.c:103 arch_stack_walk_user+0x6e/0xf6
  CPU: 0 PID: 2649 Comm: bash Not tainted 5.3.0-rc1+ #99
  RIP: 0010:arch_stack_walk_user+0x6e/0xf6
  Call Trace:
   <IRQ>
   stack_trace_save_user+0x10a/0x16d
   trace_buffer_unlock_commit_regs+0x185/0x240
   trace_event_buffer_commit+0xec/0x330
   trace_event_raw_event_irq_handler_entry+0x159/0x1e0
   __handle_irq_event_percpu+0x22d/0x440
   handle_irq_event_percpu+0x70/0x100
   handle_irq_event+0x5a/0x8b
   handle_edge_irq+0x12f/0x3f0
   handle_irq+0x34/0x40
   do_IRQ+0xa6/0x1f0
   common_interrupt+0xf/0xf
   </IRQ>

Fix it by calling __range_not_ok() directly instead of access_ok() as copy_from_user_nmi() does. This is fine here because the actual copy is inside a pagefault disabled region.

Reported-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190722083216.16192-2-devel@etsukata.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* x86/speculation/mds: Apply more accurate check on hypervisor platform (Zhenzhong Duan, 2019-07-31, 1 file changed, -1/+1)

commit 517c3ba00916383af6411aec99442c307c23f684 upstream.

X86_HYPER_NATIVE isn't accurate for checking if running on native platform, e.g. CONFIG_HYPERVISOR_GUEST isn't set or "nopv" is enabled.

Checking the CPU feature bit X86_FEATURE_HYPERVISOR to determine if it's running on native platform is more accurate. This still doesn't cover the platforms on which X86_FEATURE_HYPERVISOR is unsupported, e.g. VMware, but there is nothing which can be done about this scenario.

Fixes: 8a4b06d391b0 ("x86/speculation/mds: Add sysfs reporting for MDS")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1564022349-17338-1-git-send-email-zhenzhong.duan@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
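A hedged sketch of the check the changelog prefers (helper name is illustrative): the CPUID-derived feature bit is set whenever a hypervisor announces itself, independent of CONFIG_HYPERVISOR_GUEST or the "nopv" command line option, so it is the more reliable native-platform test.

#include <asm/cpufeature.h>

static bool on_native_platform(void)
{
	return !boot_cpu_has(X86_FEATURE_HYPERVISOR);
}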
* x86/sysfb_efi: Add quirks for some devices with swapped width and height (Hans de Goede, 2019-07-31, 1 file changed, -0/+46)

commit d02f1aa39189e0619c3525d5cd03254e61bf606a upstream.

Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but advertise a landscape resolution and pitch, resulting in a messed up display if the kernel tries to show anything on the efifb (because of the wrong pitch).

Fix this by adding a new DMI match table for devices which need to have their width and height swapped.

At first it was tried to use the existing table for overriding some of the efifb parameters, but some of the affected devices have variants with different LCD resolutions which will not work with hardcoded override values.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* selinux: check sidtab limit before adding a new entry (Ondrej Mosnacek, 2019-07-31, 1 file changed, -0/+5)

commit acbc372e6109c803cbee4733769d02008381740f upstream.

We need to error out when trying to add an entry above SIDTAB_MAX in sidtab_reverse_lookup() to avoid overflow on the odd chance that this happens.

Cc: stable@vger.kernel.org
Fixes: ee1a84fdfeed ("selinux: overhaul sidtab to fix bug and improve performance")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* btrfs: inode: Don't compress if NODATASUM or NODATACOW set (Qu Wenruo, 2019-07-31, 1 file changed, -1/+23)

commit 42c16da6d684391db83788eb680accd84f6c2083 upstream.

As btrfs(5) specified:

  Note
  If nodatacow or nodatasum are enabled, compression is disabled.

If NODATASUM or NODATACOW is set, we should not compress the extent.

Normally NODATACOW is detected properly in run_delalloc_range() so compression won't happen for NODATACOW.

However for NODATASUM we don't have any check, and it can cause compressed extent without csum pretty easily, just by:

  mkfs.btrfs -f $dev
  mount $dev $mnt -o nodatasum
  touch $mnt/foobar
  mount -o remount,datasum,compress $mnt
  xfs_io -f -c "pwrite 0 128K" $mnt/foobar

And in fact, we have a bug report about corrupted compressed extent without proper data checksum so even RAID1 can't recover the corruption. (https://bugzilla.kernel.org/show_bug.cgi?id=199707)

Running compression without proper checksum could cause more damage when corruption happens, as compressed data could make the whole extent unreadable, so there is no need to allow compression for NODATASUM.

The fix will refactor the inode compression check into two parts:

- inode_can_compress()
  As the hard requirement, checked at btrfs_run_delalloc_range(), so no compression will happen for NODATASUM inode at all.

- inode_need_compress()
  As the soft requirement, checked at btrfs_run_delalloc_range() and compress_file_range().

Reported-by: James Harvey <jamespharvey20@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
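For illustration, a hedged sketch of the hard check (the inode_can_compress() part named above); flag and field names follow btrfs conventions but this is not the verbatim upstream hunk.

#include "btrfs_inode.h"	/* struct btrfs_inode, BTRFS_INODE_* flags */

static inline bool inode_can_compress(struct btrfs_inode *inode)
{
	/* NODATACOW or NODATASUM inodes must never reach the compression path */
	if (inode->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_NODATASUM))
		return false;
	return true;
}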
* media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use (Hans Verkuil, 2019-07-31, 1 file changed, -1/+7)

commit 22be8233b34f4f468934c5fefcbe6151766fb8f2 upstream.

The V4L2_PIX_FMT_BGRA444 define clashed with the pre-existing V4L2_PIX_FMT_SGRBG12 which strangely enough used the same fourcc, even though that fourcc made no sense for a Bayer format. In any case, you can't have duplicates, so change the fourcc of V4L2_PIX_FMT_BGRA444.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: <stable@vger.kernel.org> # for v5.2 and up
Fixes: 6c84f9b1d2900 ("media: v4l: Add definitions for missing 16-bit RGB4444 formats")
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: XIVE: fix rollback when kvmppc_xive_create fails (Cédric Le Goater, 2019-07-31, 2 files changed, -5/+3)

commit 9798f4ea71eaf8eaad7e688c5b298528089c7bf8 upstream.

The XIVE device structure is now allocated in kvmppc_xive_get_device() and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create() will result in a double free and corrupt the host memory.

Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6ea6998b-a890-2511-01d1-747d7621eb19@kaod.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: Save and restore guest visible PSSCR bits on pseries (Suraj Jitindar Singh, 2019-07-31, 1 file changed, -0/+11)

commit c8b4083db915dfe5a3b4a755ad2317e0509b43f1 upstream.

The Performance Stop Status and Control Register (PSSCR) is used to control the power saving facilities of the processor. This register has various fields, some of which can be modified only in hypervisor state, and others which can be modified in both hypervisor and privileged non-hypervisor state. The bits which can be modified in privileged non-hypervisor state are referred to as guest visible.

Currently the L0 hypervisor saves and restores both its own host value as well as the guest value of the PSSCR when context switching between the hypervisor and guest. However a nested hypervisor running its own nested guests (as indicated by kvmhv_on_pseries()) doesn't context switch the PSSCR register. That means if a nested (L2) guest modifies the PSSCR then the L1 guest hypervisor will run with that modified value, and if the L1 guest hypervisor modifies the PSSCR and then goes to run the nested (L2) guest again then the L2 PSSCR value will be lost.

Fix this by having the (L1) nested hypervisor save and restore both its host and the guest PSSCR value when entering and exiting a nested (L2) guest. Note that only the guest visible parts of the PSSCR are context switched since this is all the L1 nested hypervisor can access. This is fine however, as these are the only fields the L0 hypervisor provides guest control of anyway and so all other fields are ignored.

This could also have been implemented by adding the PSSCR register to the hv_regs passed to the L0 hypervisor as input to the H_ENTER_NESTED hcall, however this would have meant updating the structure layout and thus required modifications to both the L0 and L1 kernels. Whereas the approach used doesn't require L0 kernel modifications while achieving the same result.

Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190703012022.15644-3-sjitindarsingh@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: PPC: Book3S HV: Always save guest pmu for guest capable of nesting (Suraj Jitindar Singh, 2019-07-31, 1 file changed, -0/+2)

commit 63279eeb7f93abb1692573c26f1e038e1a87358b upstream.

The performance monitoring unit (PMU) registers are saved on guest exit when the guest has set the pmcregs_in_use flag in its lppaca, if it exists, or unconditionally if it doesn't. If a nested guest is being run then the hypervisor doesn't, and in most cases can't, know if the PMU registers are in use since it doesn't know the location of the lppaca for the nested guest, although it may have one for its immediate guest. This results in the values of these registers being lost across nested guest entry and exit in the case where the nested guest was making use of the performance monitoring facility while its nested guest hypervisor wasn't.

Furthermore the hypervisor could interrupt a guest hypervisor between when it has loaded up the PMU registers and it calling H_ENTER_NESTED, or between returning from the nested guest to the guest hypervisor and the guest hypervisor reading the PMU registers, in kvmhv_p9_guest_entry(). This means that it isn't sufficient to just save the PMU registers when entering or exiting a nested guest, but that it is necessary to always save the PMU registers whenever a guest is capable of running nested guests to ensure the register values aren't lost in the context switch.

Ensure the PMU register values are preserved by always saving their value into the vcpu struct when a guest is capable of running nested guests.

This should have minimal performance impact, and any impact can be avoided by booting a guest with "-machine pseries,cap-nested-hv=false" on the qemu commandline.

Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190703012022.15644-1-sjitindarsingh@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* KVM: X86: Fix fpu state crash in kvm guest (Wanpeng Li, 2019-07-31, 1 file changed, -3/+6)

commit e751732486eb3f159089a64d1901992b1357e7cc upstream.

The idea before commit 240c35a37 (which has just been reverted) was that we have the following FPU states:

               userspace (QEMU)             guest
  ---------------------------------------------------------------------------
               processor                    vcpu->arch.guest_fpu
  >>> KVM_RUN: kvm_load_guest_fpu
               vcpu->arch.user_fpu          processor
  >>> preempt out
               vcpu->arch.user_fpu          current->thread.fpu
  >>> preempt in
               vcpu->arch.user_fpu          processor
  >>> back to userspace
  >>> kvm_put_guest_fpu
               processor                    vcpu->arch.guest_fpu
  ---------------------------------------------------------------------------

With the new lazy model we want to get the state back to the processor when schedule in from current->thread.fpu.

Reported-by: Thomas Lambertz <mail@thomaslambertz.de>
Reported-by: anthony <antdev66@gmail.com>
Tested-by: anthony <antdev66@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Lambertz <mail@thomaslambertz.de>
Cc: anthony <antdev66@gmail.com>
Cc: stable@vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Add a comment in front of the warning. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: usb251xb: Reallow swap-dx-lanes to apply to the upstream port (Lucas Stach, 2019-07-31, 1 file changed, -6/+7)

commit 4849ee6129702dcb05d36f9c7c61b4661fcd751f upstream.

This is a partial revert of 73d31def1aab "usb: usb251xb: Create a ports field collector method", which broke an existing devicetree (arch/arm64/boot/dts/freescale/imx8mq.dtsi).

There is no reason why the swap-dx-lanes property should not apply to the upstream port. The reason given in the breaking commit was that it's inconsistent with respect to other port properties, but in fact it is not. All other properties which only apply to the downstream ports explicitly reject port 0, so there is pretty strong precedence that the driver referred to the upstream port as port 0. So there is no inconsistency in this property at all, other than the swapping being also applicable to the upstream port.

CC: stable@vger.kernel.org #5.2
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Link: https://lore.kernel.org/r/20190719084407.28041-3-l.stach@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "usb: usb251xb: Add US port lanes inversion property"Lucas Stach2019-07-311-2/+0
| | | | | | | | | | | | | | commit 79f6fafad4e2a874015cb67d735f9f87f1834367 upstream. This property isn't needed and not yet used anywhere. The swap-dx-lanes property is perfectly fine for doing the swap on the upstream port lanes. CC: stable@vger.kernel.org #5.2 Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20190719084407.28041-2-l.stach@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "usb: usb251xb: Add US lanes inversion dts-bindings"Lucas Stach2019-07-311-4/+2
| | | | | | | | | | | | | | | commit bafe64e5f0edaa689e72e2f8dc236641da37fed4 upstream. This reverts commit 3342ce35a1, as there is no need for this separate property and it breaks compatibility with existing devicetree files (arch/arm64/boot/dts/freescale/imx8mq.dtsi). CC: stable@vger.kernel.org #5.2 Fixes: 3342ce35a183 ("usb: usb251xb: Add US lanes inversion dts-bindings") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20190719084407.28041-1-l.stach@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: pci-quirks: Correct AMD PLL quirk detection (Ryan Kennedy, 2019-07-31, 1 file changed, -12/+19)

commit f3dccdaade4118070a3a47bef6b18321431f9ac6 upstream.

The AMD PLL USB quirk is incorrectly enabled on newer Ryzen chipsets. The logic in usb_amd_find_chipset_info currently checks for unaffected chipsets rather than affected ones. This broke once a new chipset was added in e788787ef. It makes more sense to reverse the logic so it won't need to be updated as new chipsets are added. Note that the core of the workaround in usb_amd_quirk_pll does correctly check the chipset.

Signed-off-by: Ryan Kennedy <ryan5544@gmail.com>
Fixes: e788787ef4f9 ("usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume")
Cc: stable <stable@vger.kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20190704153529.9429-2-ryan5544@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb: wusbcore: fix unbalanced get/put cluster_id (Phong Tran, 2019-07-31, 1 file changed, -1/+1)

commit f90bf1ece48a736097ea224430578fe586a9544c upstream.

syzbot reported: https://syzkaller.appspot.com/bug?extid=fd2bd7df88c606eea4ef

The cluster_id get/put calls are unbalanced: when getting the id fails, wusbhc->cluster_id is not updated, so that value cannot later be used for wusb_cluster_id_put().

Tested report: https://groups.google.com/d/msg/syzkaller-bugs/0znZopp3-9k/oxOrhLkLEgAJ

Reproduce and gdb got the details:

  139         addr = wusb_cluster_id_get();
  (gdb) n
  140         if (addr == 0)
  (gdb) print addr
  $1 = 254 '\376'
  (gdb) n
  142         result = __hwahc_set_cluster_id(hwahc, addr);
  (gdb) print result
  $2 = -71
  (gdb) break wusb_cluster_id_put
  Breakpoint 3 at 0xffffffff836e3f20: file drivers/usb/wusbcore/wusbhc.c, line 384.
  (gdb) s
  Thread 2 hit Breakpoint 3, wusb_cluster_id_put (id=0 '\000') at drivers/usb/wusbcore/wusbhc.c:384
  384         id = 0xff - id;
  (gdb) n
  385         BUG_ON(id >= CLUSTER_IDS);
  (gdb) print id
  $3 = 255 '\377'

Reported-by: syzbot+fd2bd7df88c606eea4ef@syzkaller.appspotmail.com
Signed-off-by: Phong Tran <tranmanphong@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190724020601.15257-1-tranmanphong@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* usb-storage: Add a limitation for blk_queue_max_hw_sectors() (Yoshihiro Shimoda, 2019-07-31, 1 file changed, -0/+11)

commit d74ffae8b8dd17eaa8b82fc163e6aa2076dc8fb1 upstream.

This patch fixes an issue where the following error happens on a swiotlb environment:

  xhci-hcd ee000000.usb: swiotlb buffer is full (sz: 524288 bytes), total 32768 (slots), used 1338 (slots)

On kernel v5.1, the block settings of a usb-storage device with SuperSpeed were the following, so the block layer would allocate buffers up to 64 KiB and the issue didn't happen:

  max_segment_size = 65536
  max_hw_sectors_kb = 1024

After commit 09324d32d2a0 ("block: force an unlimited segment size on queues with a virt boundary") is applied, the block settings are the following. So, the block layer will allocate buffers up to 1024 KiB, and then the issue happens:

  max_segment_size = 4294967295
  max_hw_sectors_kb = 1024

To fix the issue, the usb-storage driver checks the maximum size of a mapping for the device and then adjusts max_hw_sectors_kb if required. After this patch is applied, the block settings will be the following, and the issue doesn't happen:

  max_segment_size = 4294967295
  max_hw_sectors_kb = 256

Fixes: 09324d32d2a0 ("block: force an unlimited segment size on queues with a virt boundary")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1563793105-20597-1-git-send-email-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
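A hedged sketch of the adjustment described above (helper name is illustrative): query the largest single mapping the DMA layer can provide for the device and cap the queue's max_hw_sectors accordingly, so no request exceeds what swiotlb can bounce.

#include <linux/blkdev.h>
#include <linux/dma-mapping.h>
#include <scsi/scsi_device.h>

static void cap_hw_sectors(struct scsi_device *sdev, struct device *dma_dev)
{
	size_t max_bytes = dma_max_mapping_size(dma_dev);

	/* SIZE_MAX means "no limit"; otherwise shrink the per-request ceiling */
	if (max_bytes < SIZE_MAX)
		blk_queue_max_hw_sectors(sdev->request_queue,
					 max_bytes >> SECTOR_SHIFT);
}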
* xhci: Fix crash if scatter gather is used with Immediate Data Transfer (IDT). (Mathias Nyman, 2019-07-31, 1 file changed, -1/+2)

commit d39b5bad8658d6d94cb2d98a44a7e159db4f5030 upstream.

A second regression was found in the immediate data transfer (IDT) support which was added to the 5.2 kernel.

IDT is used to transfer small amounts of data (up to 8 bytes) in the field normally used for the data dma address, thus avoiding dma mapping.

If the data was not already dma mapped, then IDT support assumed data was in urb->transfer_buffer, and did not take into account that even small amounts of data (8 bytes) can be in a scatterlist instead.

This caused a NULL pointer dereference when sg_dma_len() was used with non-dma mapped data.

Solve this by not using IDT if a scatter gather buffer list is used.

Fixes: 33e39350ebd2 ("usb: xhci: add Immediate Data Transfer support")
Cc: <stable@vger.kernel.org> # v5.2
Reported-by: Maik Stohn <maik.stohn@seal-one.com>
Tested-by: Maik Stohn <maik.stohn@seal-one.com>
CC: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/1564044861-1445-1-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
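A hedged sketch of the eligibility test after the fix; the function name and the exact set of conditions are illustrative (not the driver's verbatim macro), but the key point from the changelog is the scatter-gather check.

#include <linux/usb.h>

static bool urb_ok_for_idt(struct urb *urb)
{
	if (urb->num_sgs)				/* data lives in an sg list: skip IDT */
		return false;
	if (urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP)	/* already DMA mapped by the caller */
		return false;
	if (usb_urb_dir_in(urb))			/* IDT only makes sense for OUT transfers */
		return false;
	return urb->transfer_buffer_length <= 8;	/* up to 8 bytes fit in the address field */
}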
* locking/lockdep: Hide unused 'class' variable (Arnd Bergmann, 2019-07-31, 1 file changed, -1/+2)

[ Upstream commit 68037aa78208f34bda4e5cd76c357f718b838cbb ]

The usage is now hidden in an #ifdef, so we need to move the variable itself in there as well to avoid this warning:

  kernel/locking/lockdep_proc.c:203:21: error: unused variable 'class' [-Werror,-Wunused-variable]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yuyang Du <duyuyang@gmail.com>
Cc: frederic@kernel.org
Fixes: 68d41d8c94a3 ("locking/lockdep: Fix lock used or unused stats error")
Link: https://lkml.kernel.org/r/20190715092809.736834-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* mm, swap: fix race between swapoff and some swap operations (Huang Ying, 2019-07-31, 4 files changed, -39/+146)

[ Upstream commit eb085574a7526c4375965c5fbf7e5b0c19cdd336 ]

When swapin is performed, after getting the swap entry information from the page table, the system will swap in the swap entry without any lock held to prevent the swap device from being swapoff. This may cause a race like below:

  CPU 1                                 CPU 2
  -----                                 -----
  do_swap_page
    swapin_readahead
      __read_swap_cache_async
                                        swapoff
        swapcache_prepare                 p->swap_map = NULL
          __swap_duplicate
            p->swap_map[?] /* !!! NULL pointer access */

Because swapoff is usually done when system shutdown only, the race may not hit many people in practice. But it is still a race that needs to be fixed.

To fix the race, get_swap_device() is added to check whether the specified swap entry is valid in its swap device. If so, it will keep the swap entry valid via preventing the swap device from being swapoff, until put_swap_device() is called.

Because swapoff() is a very rare code path, to make the normal path run as fast as possible, rcu_read_lock/unlock() and synchronize_rcu() instead of a reference count are used to implement get/put_swap_device(). From get_swap_device() to put_swap_device(), the RCU reader side is locked, so synchronize_rcu() in swapoff() will wait until put_swap_device() is called.

In addition to the swap_map, cluster_info, etc. data structures in struct swap_info_struct, the swap cache radix tree will be freed after swapoff, so this patch fixes the race between swap cache look-up and swapoff too.

Races between some other swap cache usages and swapoff are fixed too via calling synchronize_rcu() between clearing PageSwapCache() and freeing the swap cache data structure.

Another possible method to fix this is to use preempt_off() + stop_machine() to prevent the swap device from being swapoff when its data structure is being accessed. The overhead in the hot path of both methods is similar. The advantages of the RCU based method are:

1. stop_machine() may disturb the normal execution code path on other CPUs.

2. File cache uses RCU to protect its radix tree. If the similar mechanism is used for swap cache too, it is easier to share code between them.

3. RCU is used to protect swap cache in total_swapcache_pages() and exit_swap_address_space() already. The two mechanisms can be merged to simplify the logic.

Link: http://lkml.kernel.org/r/20190522015423.14418-1-ying.huang@intel.com
Fixes: 235b62176712 ("mm/swap: add cluster lock")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Not-nacked-by: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
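A hedged sketch of the RCU get/put pattern described above, using a simplified stand-in structure rather than the real struct swap_info_struct: readers pin the device inside an RCU read-side critical section, and swapoff clears a validity flag and then waits in synchronize_rcu() before tearing anything down.

#include <linux/rcupdate.h>

struct swapdev_sketch {
	bool valid;			/* cleared by swapoff before teardown */
	unsigned char *swap_map;	/* ... plus the other per-device data */
};

static struct swapdev_sketch *get_swap_device(struct swapdev_sketch *si)
{
	rcu_read_lock();
	if (!si || !READ_ONCE(si->valid)) {
		rcu_read_unlock();
		return NULL;
	}
	return si;	/* caller may now safely dereference si->swap_map etc. */
}

static void put_swap_device(struct swapdev_sketch *si)
{
	rcu_read_unlock();
}

static void swapoff_teardown(struct swapdev_sketch *si)
{
	WRITE_ONCE(si->valid, false);
	synchronize_rcu();	/* wait for every in-flight get/put pair */
	/* only now is it safe to free si->swap_map and friends */
}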
* mm: use down_read_killable for locking mmap_sem in access_remote_vm (Konstantin Khlebnikov, 2019-07-31, 2 files changed, -2/+5)

[ Upstream commit 1e426fe28261b03f297992e89da3320b42816f4e ]

This function is used by ptrace and proc files like /proc/pid/cmdline and /proc/pid/environ.

access_remote_vm never returns error codes; all errors are ignored and only the size of the successfully read data is returned. So, if the current task was killed we'll simply return 0 (bytes read).

mmap_sem could be locked for a long time or forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation.

Link: http://lkml.kernel.org/r/156007494202.3335.16782303099589302087.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
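The same killable-lock pattern is applied by this and the /proc patches further down; a minimal sketch (the wrapper is hypothetical, the primitive is the real down_read_killable()):

#include <linux/mm_types.h>
#include <linux/rwsem.h>

/*
 * Take mmap_sem for reading, but let a fatal signal break the wait instead
 * of leaving the task stuck; callers then bail out (here with -EINTR, or by
 * reporting 0 bytes read where errors are not propagated).
 */
static int lock_mmap_sem_killable(struct mm_struct *mm)
{
	if (down_read_killable(&mm->mmap_sem))
		return -EINTR;
	return 0;
}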
* locking/lockdep: Fix lock used or unused stats error (Yuyang Du, 2019-07-31, 1 file changed, -2/+3)

[ Upstream commit 68d41d8c94a31dfb8233ab90b9baf41a2ed2da68 ]

The stats variable nr_unused_locks is incremented every time a new lock class is registered and decremented when the lock is first used in __lock_acquire(). And after all, it is shown and checked in lockdep_stats.

However, under configurations where either CONFIG_TRACE_IRQFLAGS or CONFIG_PROVE_LOCKING is not defined:

The commit:

  091806515124b20 ("locking/lockdep: Consolidate lock usage bit initialization")

missed marking the LOCK_USED flag at IRQ usage initialization because mark_usage() is not called. And the commit:

  886532aee3cd42d ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")

further made mark_lock() not defined such that LOCK_USED cannot be marked at all when the lock is first acquired.

As a result, we fix this by not showing and checking the stats under such configurations for lockdep_stats.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Yuyang Du <duyuyang@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: arnd@arndb.de
Cc: frederic@kernel.org
Link: https://lkml.kernel.org/r/20190709101522.9117-1-duyuyang@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* proc: use down_read_killable mmap_sem for /proc/pid/maps (Konstantin Khlebnikov, 2019-07-31, 2 files changed, -2/+10)

[ Upstream commit 8a713e7df3352b8d9392476e9cf29e4e185dac32 ]

Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation.

This function is also used for /proc/pid/smaps.

Link: http://lkml.kernel.org/r/156007493160.3335.14447544314127417266.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* cxgb4: reduce kernel stack usage in cudbg_collect_mem_region() (Arnd Bergmann, 2019-07-31, 1 file changed, -6/+13)

[ Upstream commit 752c2ea2d8e7c23b0f64e2e7d4337f3604d44c9f ]

The cudbg_collect_mem_region() and cudbg_read_fw_mem() both use several hundred kilobytes of kernel stack space. One gets inlined into the other, which causes the stack usage to be combined beyond the warning limit when building with clang:

  drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c:1057:12: error: stack frame size of 1244 bytes in function 'cudbg_collect_mem_region' [-Werror,-Wframe-larger-than=]

Restructuring cudbg_collect_mem_region() lets clang do the same optimization that gcc does and reuse the stack slots as it can see that the large variables are never used together.

A better fix might be to avoid using cudbg_meminfo on the stack altogether, but that requires a larger rewrite.

Fixes: a1c69520f785 ("cxgb4: collect MC memory dump")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* proc: use down_read_killable mmap_sem for /proc/pid/map_files (Konstantin Khlebnikov, 2019-07-31, 1 file changed, -6/+22)

[ Upstream commit cd9e2bb8271c971d9f37c722be2616c7f8ba0664 ]

Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation.

It seems ->d_revalidate() could return any error (except ECHILD) to abort validation and pass the error as the result of the lookup sequence.

[akpm@linux-foundation.org: fix proc_map_files_lookup() return value, per Andrei]
Link: http://lkml.kernel.org/r/156007493995.3335.9595044802115356911.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* proc: use down_read_killable mmap_sem for /proc/pid/clear_refs (Konstantin Khlebnikov, 2019-07-31, 1 file changed, -1/+4)

[ Upstream commit c46038017fbdcac627b670c9d4176f1d0c2f5fa3 ]

Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation.

Replace the only unkillable mmap_sem lock in clear_refs_write().

Link: http://lkml.kernel.org/r/156007493826.3335.5424884725467456239.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* proc: use down_read_killable mmap_sem for /proc/pid/pagemap (Konstantin Khlebnikov, 2019-07-31, 1 file changed, -1/+3)

[ Upstream commit ad80b932c57d85fd6377f97f359b025baf179a87 ]

Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation.

Link: http://lkml.kernel.org/r/156007493638.3335.4872164955523928492.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollupKonstantin Khlebnikov2019-07-311-2/+6
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a26a97815548574213fd37f29b4b78ccc6d9ed20 ] Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation. Link: http://lkml.kernel.org/r/156007493429.3335.14666825072272692455.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Roman Gushchin <guro@fb.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
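The four /proc changes above all apply the same pattern: take mmap_sem with down_read_killable() and return an error if the task is killed while waiting, instead of sleeping unkillably. A hedged sketch of that pattern (a simplified walker, not the exact fs/proc code):

  #include <linux/mm_types.h>
  #include <linux/rwsem.h>
  #include <linux/errno.h>

  static int walk_mm_killable(struct mm_struct *mm)
  {
          int ret;

          /* Returns -EINTR if a fatal signal arrives while waiting, so a
           * wedged mmap_sem no longer leaves the reader stuck in an
           * unkillable D state. */
          ret = down_read_killable(&mm->mmap_sem);
          if (ret)
                  return ret;

          /* ... walk VMAs / page tables and emit the proc output ... */

          up_read(&mm->mmap_sem);
          return 0;
  }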
* mm/mmu_notifier: use hlist_add_head_rcu()Jean-Philippe Brucker2019-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 543bdb2d825fe2400d6e951f1786d92139a16931 ] Make mmu_notifier_register() safer by issuing a memory barrier before registering a new notifier. This fixes a theoretical bug on weakly ordered CPUs. For example, take this simplified use of notifiers by a driver: my_struct->mn.ops = &my_ops; /* (1) */ mmu_notifier_register(&my_struct->mn, mm) ... hlist_add_head(&mn->hlist, &mm->mmu_notifiers); /* (2) */ ... Once mmu_notifier_register() releases the mm locks, another thread can invalidate a range: mmu_notifier_invalidate_range() ... hlist_for_each_entry_rcu(mn, &mm->mmu_notifiers, hlist) { if (mn->ops->invalidate_range) The read side relies on the data dependency between mn and ops to ensure that the pointer is properly initialized. But the write side doesn't have any dependency between (1) and (2), so they could be reordered and the readers could dereference an invalid mn->ops. mmu_notifier_register() does take all the mm locks before adding to the hlist, but those have acquire semantics which isn't sufficient. By calling hlist_add_head_rcu() instead of hlist_add_head() we update the hlist using a store-release, ensuring that readers see prior initialization of my_struct. This situation is better illustrated by the litmus test MP+onceassign+derefonce. Link: http://lkml.kernel.org/r/20190502133532.24981-1-jean-philippe.brucker@arm.com Fixes: cddb8a5c14aa ("mmu-notifiers: core") Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
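The ordering requirement can be sketched as a classic RCU publish/read pair; the structure and callback names below are illustrative, not the real mmu_notifier types:

  #include <linux/rculist.h>
  #include <linux/rcupdate.h>

  struct my_notifier;

  struct my_ops {
          void (*invalidate)(struct my_notifier *mn);
  };

  struct my_notifier {
          const struct my_ops *ops;
          struct hlist_node hlist;
  };

  /* Writer: initialize first, then publish.  hlist_add_head_rcu() updates
   * the list head with a store-release, so (1) cannot be reordered after
   * (2). */
  static void my_register(struct my_notifier *mn, const struct my_ops *ops,
                          struct hlist_head *list)
  {
          mn->ops = ops;                          /* (1) */
          hlist_add_head_rcu(&mn->hlist, list);   /* (2) */
  }

  /* Reader: the RCU list walk carries the matching address dependency,
   * so a notifier found on the list is seen fully initialized. */
  static void my_invalidate_all(struct hlist_head *list)
  {
          struct my_notifier *mn;

          rcu_read_lock();
          hlist_for_each_entry_rcu(mn, list, hlist) {
                  if (mn->ops->invalidate)
                          mn->ops->invalidate(mn);
          }
          rcu_read_unlock();
  }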
* memcg, fsnotify: no oom-kill for remote memcg chargingShakeel Butt2019-07-312-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ec165450968b26298bd1c373de37b0ab6d826b33 ] Commit d46eb14b735b ("fs: fsnotify: account fsnotify metadata to kmemcg") added remote memcg charging for fanotify and inotify event objects. The aim was to charge the memory to the listener who is interested in the events but without triggering the OOM killer. Otherwise there would be security concerns for the listener. At the time, the oom-kill trigger was not in the charging path. Parallel work then added the oom-kill back to the charging path, i.e. commit 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path"). So, to avoid triggering the oom-killer in the remote memcg, explicitly add __GFP_RETRY_MAYFAIL to the fanotify and inotify event allocations. Link: http://lkml.kernel.org/r/20190514212259.156585-2-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
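The effect of the flag can be sketched as follows; the event structure is hypothetical and the remote-memcg charging context is assumed to be set up by the caller, as fsnotify does:

  #include <linux/slab.h>
  #include <linux/gfp.h>

  struct my_event {
          int fd;
  };

  /* __GFP_RETRY_MAYFAIL keeps retrying the allocation but returns NULL
   * rather than invoking the OOM killer, so a full listener memcg makes
   * the event allocation fail (e.g. degrade to an overflow/dropped event)
   * instead of killing tasks in that memcg. */
  static struct my_event *my_alloc_event(void)
  {
          return kmalloc(sizeof(struct my_event),
                         GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
  }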
* mm/gup.c: remove some BUG_ONs from get_gate_page()Andy Lutomirski2019-07-311-3/+6
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit b5d1c39f34d1c9bca0c4b9ae2e339fbbe264a9c7 ] If we end up without a PGD or PUD entry backing the gate area, don't BUG -- just fail gracefully. It's not entirely implausible that this could happen some day on x86. It doesn't right now even with an execute-only emulated vsyscall page because the fixmap shares the PUD, but the core mm code shouldn't rely on that particular detail to avoid OOPSing. Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
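The defensive pattern amounts to turning assertions about the gate area's page tables into ordinary error returns; a simplified sketch (not the exact get_gate_page() code):

  #include <linux/mm.h>
  #include <linux/errno.h>

  /* If any level of the gate mapping is unexpectedly absent, fail the
   * lookup with -EFAULT instead of BUG(), so the worst case is a failed
   * fault rather than a kernel oops. */
  static int check_gate_levels(struct mm_struct *mm, unsigned long addr)
  {
          pgd_t *pgd = pgd_offset(mm, addr);
          p4d_t *p4d;
          pud_t *pud;

          if (pgd_none(*pgd))
                  return -EFAULT;
          p4d = p4d_offset(pgd, addr);
          if (p4d_none(*p4d))
                  return -EFAULT;
          pud = pud_offset(p4d, addr);
          if (pud_none(*pud))
                  return -EFAULT;
          return 0;
  }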
* mm/gup.c: mark undo_dev_pagemap as __maybe_unusedGuenter Roeck2019-07-311-1/+2
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit 790c73690c2bbecb3f6f8becbdb11ddc9bcff8cc ] Several mips builds generate the following build warning. mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used The function is declared unconditionally but only called from behind various ifdefs. Mark it __maybe_unused. Link: http://lkml.kernel.org/r/1562072523-22311-1-git-send-email-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
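__maybe_unused marks a definition as intentionally allowed to go unreferenced, which silences the unused-function warning in configurations whose ifdefs never call it, without wrapping the definition in matching ifdefs. A minimal hypothetical example:

  #include <linux/compiler.h>

  /* Only called from code compiled under certain config options; the
   * attribute keeps other configurations from warning that this static
   * function is defined but not used. */
  static void __maybe_unused undo_partial_setup(void)
  {
          /* ... cleanup needed only by some configurations ... */
  }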
* mm/mincore.c: fix race between swapoff and mincoreHuang Ying2019-07-311-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit aeb309b81c6bada783c3695528a3e10748e97285 ] Via commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks"), after swapoff, the address_space associated with the swap device will be freed. So swap_address_space() users which touch the address_space need some kind of mechanism to prevent the address_space from being freed during accessing. When mincore processes an unmapped range for swapped shmem pages, it doesn't hold the lock to prevent swap device from being swapped off. So the following race is possible: CPU1 CPU2 do_mincore() swapoff() walk_page_range() mincore_unmapped_range() __mincore_unmapped_range mincore_page as = swap_address_space() ... exit_swap_address_space() ... kvfree(spaces) find_get_page(as) The address space may be accessed after being freed. To fix the race, get_swap_device()/put_swap_device() is used to enclose find_get_page() to check whether the swap entry is valid and prevent the swap device from being swapoff during accessing. Link: http://lkml.kernel.org/r/20190611020510.28251-1-ying.huang@intel.com Fixes: 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
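A sketch of the stabilization pattern the fix relies on, using the mainline helper names mentioned in the commit (simplified, not the exact mincore code):

  #include <linux/swap.h>
  #include <linux/swapops.h>
  #include <linux/pagemap.h>

  /* get_swap_device() both validates the swap entry and pins the swap
   * device, so exit_swap_address_space() cannot free the address_space
   * while we look up the page; put_swap_device() drops the pin. */
  static struct page *lookup_swap_cache_page(swp_entry_t entry)
  {
          struct swap_info_struct *si;
          struct page *page = NULL;

          si = get_swap_device(entry);
          if (si) {
                  page = find_get_page(swap_address_space(entry),
                                       swp_offset(entry));
                  put_swap_device(si);
          }
          return page;
  }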
* 9p: pass the correct prototype to read_cache_pageChristoph Hellwig2019-07-311-2/+4
| | | | | | | | | | | | | | | | | [ Upstream commit f053cbd4366051d7eb6ba1b8d529d20f719c2963 ] Fix the callback 9p passes to read_cache_page to actually have the proper type expected. Casting around function pointers can easily hide typing bugs, and defeats control flow protection. Link: http://lkml.kernel.org/r/20190520055731.24538-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
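The idea is to give the filler callback the exact prototype read_cache_page() expects instead of casting a function with a different first-argument type. A hedged sketch for a hypothetical filesystem, assuming the int (*)(void *, struct page *) filler prototype used by this kernel series:

  #include <linux/fs.h>
  #include <linux/pagemap.h>

  /* Correctly typed filler: the indirect call made by read_cache_page()
   * now matches at the type level, so CFI schemes can verify it and the
   * compiler can catch mismatches.  Real I/O and error handling omitted. */
  static int myfs_fill_page(void *data, struct page *page)
  {
          /* ... copy the page contents from the backing store ... */
          SetPageUptodate(page);
          unlock_page(page);
          return 0;
  }

  static struct page *myfs_read_page(struct address_space *mapping,
                                     pgoff_t index, void *data)
  {
          return read_cache_page(mapping, index, myfs_fill_page, data);
  }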
* mm/kmemleak.c: fix check for softirq contextDmitry Vyukov2019-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6ef9056952532c3b746de46aa10d45b4d7797bd8 ] in_softirq() is the wrong predicate to check if we are in a softirq context. It also returns true if we have BH disabled, so objects are falsely stamped with "softirq" comm. The correct predicate is in_serving_softirq(). Previously, a user doing cat on /sys/kernel/debug/kmemleak would see this, which is clearly wrong because this is system call context (see the comm): unreferenced object 0xffff88805bd661c0 (size 64): comm "softirq", pid 0, jiffies 4294942959 (age 12.400s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00 ................ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace: [<0000000007dcb30c>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<0000000007dcb30c>] slab_post_alloc_hook mm/slab.h:439 [inline] [<0000000007dcb30c>] slab_alloc mm/slab.c:3326 [inline] [<0000000007dcb30c>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<00000000969722b7>] kmalloc include/linux/slab.h:547 [inline] [<00000000969722b7>] kzalloc include/linux/slab.h:742 [inline] [<00000000969722b7>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline] [<00000000969722b7>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085 [<00000000a4134b5f>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475 [<00000000d20248ad>] do_ip_setsockopt.isra.0+0x19fe/0x1c00 net/ipv4/ip_sockglue.c:957 [<000000003d367be7>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246 [<000000003c7c76af>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616 [<000000000c1aeb23>] sock_common_setsockopt+0x3e/0x50 net/core/sock.c:3130 [<000000000157b92b>] __sys_setsockopt+0x9e/0x120 net/socket.c:2078 [<00000000a9f3d058>] __do_sys_setsockopt net/socket.c:2089 [inline] [<00000000a9f3d058>] __se_sys_setsockopt net/socket.c:2086 [inline] [<00000000a9f3d058>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086 [<000000001b8da885>] do_syscall_64+0x7c/0x1a0 arch/x86/entry/common.c:301 [<00000000ba770c62>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 now they will see this: unreferenced object 0xffff88805413c800 (size 64): comm "syz-executor.4", pid 8960, jiffies 4294994003 (age 14.350s) hex dump (first 32 bytes): 00 7a 8a 57 80 88 ff ff e0 00 00 01 00 00 00 00 .z.W............ 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ backtrace: [<00000000c5d3be64>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<00000000c5d3be64>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000c5d3be64>] slab_alloc mm/slab.c:3326 [inline] [<00000000c5d3be64>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<0000000023865be2>] kmalloc include/linux/slab.h:547 [inline] [<0000000023865be2>] kzalloc include/linux/slab.h:742 [inline] [<0000000023865be2>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline] [<0000000023865be2>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085 [<000000003029a9d4>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475 [<00000000ccd0a87c>] do_ip_setsockopt.isra.0+0x19fe/0x1c00 net/ipv4/ip_sockglue.c:957 [<00000000a85a3785>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246 [<00000000ec13c18d>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616 [<0000000052d748e3>] sock_common_setsockopt+0x3e/0x50 net/core/sock.c:3130 [<00000000512f1014>] __sys_setsockopt+0x9e/0x120 net/socket.c:2078 [<00000000181758bc>] __do_sys_setsockopt net/socket.c:2089 [inline] [<00000000181758bc>] __se_sys_setsockopt net/socket.c:2086 [inline] [<00000000181758bc>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086 [<00000000d4b73623>] do_syscall_64+0x7c/0x1a0 arch/x86/entry/common.c:301 [<00000000c1098bec>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: http://lkml.kernel.org/r/20190517171507.96046-1-dvyukov@gmail.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
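The distinction the fix depends on: in_softirq() is also true whenever bottom halves are merely disabled (local_bh_disable()), while in_serving_softirq() is true only while a softirq handler is actually executing. A small hypothetical classifier shows the intended use:

  #include <linux/preempt.h>

  /* Label the context an object was allocated from.  Using
   * in_serving_softirq() avoids tagging plain process-context
   * allocations made under local_bh_disable() as "softirq". */
  static const char *alloc_context_name(void)
  {
          if (in_irq())
                  return "hardirq";
          if (in_serving_softirq())
                  return "softirq";
          return "task";
  }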