summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-4.20' of ↵Linus Torvalds2018-10-251-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "All trivial changes - simplification, typo fix and adding cond_resched() in a netclassid update loop" * 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup, netclassid: add a preemption point to write_classid rdmacg: fix a typo in rdmacg documentation cgroup: Simplify cgroup_ancestor
| * cgroup, netclassid: add a preemption point to write_classidMichal Hocko2018-10-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have seen a customer complaining about soft lockups on !PREEMPT kernel config with 4.4 based kernel [1072141.435366] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [systemd:1] [1072141.444090] Modules linked in: mpt3sas raid_class binfmt_misc af_packet 8021q garp mrp stp llc xfs libcrc32c bonding iscsi_ibft iscsi_boot_sysfs msr ext4 crc16 jbd2 mbcache cdc_ether usbnet mii joydev hid_generic usbhid intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif mgag200 i2c_algo_bit ttm ipmi_devintf drbg ixgbe drm_kms_helper vxlan ansi_cprng ip6_udp_tunnel drm aesni_intel udp_tunnel aes_x86_64 iTCO_wdt syscopyarea ptp xhci_pci lrw iTCO_vendor_support pps_core gf128mul ehci_pci glue_helper sysfillrect mdio pcspkr sb_edac ablk_helper cryptd ehci_hcd sysimgblt xhci_hcd fb_sys_fops edac_core mei_me lpc_ich ses usbcore enclosure dca mfd_core ipmi_si mei i2c_i801 scsi_transport_sas usb_common ipmi_msghandler shpchp fjes wmi processor button acpi_pad btrfs xor raid6_pq sd_mod crc32c_intel megaraid_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod md_mod autofs4 [1072141.444146] Supported: Yes [1072141.444149] CPU: 21 PID: 1 Comm: systemd Not tainted 4.4.121-92.80-default #1 [1072141.444150] Hardware name: LENOVO Lenovo System x3650 M5 -[5462P4U]- -[5462P4U]-/01GR451, BIOS -[TCE136H-2.70]- 06/13/2018 [1072141.444151] task: ffff880191bd0040 ti: ffff880191bd4000 task.ti: ffff880191bd4000 [1072141.444153] RIP: 0010:[<ffffffff815229f9>] [<ffffffff815229f9>] update_classid_sock+0x29/0x40 [1072141.444157] RSP: 0018:ffff880191bd7d58 EFLAGS: 00000286 [1072141.444158] RAX: ffff883b177cb7c0 RBX: 0000000000000000 RCX: 0000000000000000 [1072141.444159] RDX: 00000000000009c7 RSI: ffff880191bd7d5c RDI: ffff8822e29bb200 [1072141.444160] RBP: ffff883a72230980 R08: 0000000000000101 R09: 0000000000000000 [1072141.444161] R10: 0000000000000008 R11: f000000000000000 R12: ffffffff815229d0 [1072141.444162] R13: 0000000000000000 R14: ffff881fd0a47ac0 R15: ffff880191bd7f28 [1072141.444163] FS: 00007f3e2f1eb8c0(0000) GS:ffff882000340000(0000) knlGS:0000000000000000 [1072141.444164] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1072141.444165] CR2: 00007f3e2f200000 CR3: 0000001ffea4e000 CR4: 00000000001606f0 [1072141.444166] Stack: [1072141.444166] ffffffa800000246 00000000000009c7 ffffffff8121d583 ffff8818312a05c0 [1072141.444168] ffff8818312a1100 ffff880197c3b280 ffff881861422858 ffffffffffffffea [1072141.444170] ffffffff81522b1c ffffffff81d0ca20 ffff8817fa17b950 ffff883fdd8121e0 [1072141.444171] Call Trace: [1072141.444179] [<ffffffff8121d583>] iterate_fd+0x53/0x80 [1072141.444182] [<ffffffff81522b1c>] write_classid+0x4c/0x80 [1072141.444187] [<ffffffff8111328b>] cgroup_file_write+0x9b/0x100 [1072141.444193] [<ffffffff81278bcb>] kernfs_fop_write+0x11b/0x150 [1072141.444198] [<ffffffff81201566>] __vfs_write+0x26/0x100 [1072141.444201] [<ffffffff81201bed>] vfs_write+0x9d/0x190 [1072141.444203] [<ffffffff812028c2>] SyS_write+0x42/0xa0 [1072141.444207] [<ffffffff815f58c3>] entry_SYSCALL_64_fastpath+0x1e/0xca [1072141.445490] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xca If a cgroup has many tasks with many open file descriptors then we would end up in a large loop without any rescheduling point throught the operation. Add cond_resched once per task. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | Merge branch 'linus' of ↵Linus Torvalds2018-10-2511-133/+132
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Remove VLA usage - Add cryptostat user-space interface - Add notifier for new crypto algorithms Algorithms: - Add OFB mode - Remove speck Drivers: - Remove x86/sha*-mb as they are buggy - Remove pcbc(aes) from x86/aesni - Improve performance of arm/ghash-ce by up to 85% - Implement CTS-CBC in arm64/aes-blk, faster by up to 50% - Remove PMULL based arm64/crc32 driver - Use PMULL in arm64/crct10dif - Add aes-ctr support in s5p-sss - Add caam/qi2 driver Others: - Pick better transform if one becomes available in crc-t10dif" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (124 commits) crypto: chelsio - Update ntx queue received from cxgb4 crypto: ccree - avoid implicit enum conversion crypto: caam - add SPDX license identifier to all files crypto: caam/qi - simplify CGR allocation, freeing crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static crypto: arm64/aes-blk - ensure XTS mask is always loaded crypto: testmgr - fix sizeof() on COMP_BUF_SIZE crypto: chtls - remove set but not used variable 'csk' crypto: axis - fix platform_no_drv_owner.cocci warnings crypto: x86/aes-ni - fix build error following fpu template removal crypto: arm64/aes - fix handling sub-block CTS-CBC inputs crypto: caam/qi2 - avoid double export crypto: mxs-dcp - Fix AES issues crypto: mxs-dcp - Fix SHA null hashes and output length crypto: mxs-dcp - Implement sha import/export crypto: aegis/generic - fix for big endian systems crypto: morus/generic - fix for big endian systems crypto: lrw - fix rebase error after out of bounds fix crypto: cavium/nitrox - use pci_alloc_irq_vectors() while enabling MSI-X. crypto: cavium/nitrox - NITROX command queue changes. ...
| * | rxrpc: Remove VLA usage of skcipherKees Cook2018-09-282-23/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: David Howells <dhowells@redhat.com> Cc: linux-afs@lists.infradead.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | libceph: Remove VLA usage of skcipherKees Cook2018-09-282-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: Ilya Dryomov <idryomov@gmail.com> Cc: "Yan, Zheng" <zyan@redhat.com> Cc: Sage Weil <sage@redhat.com> Cc: ceph-devel@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | mac802154: Remove VLA usage of skcipherKees Cook2018-09-282-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@datenfreihafen.org> Cc: linux-wpan@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | lib80211: Remove VLA usage of skcipherKees Cook2018-09-282-32/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-wireless@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | gss_krb5: Remove VLA usage of skcipherKees Cook2018-09-285-94/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this replaces struct crypto_skcipher and SKCIPHER_REQUEST_ON_STACK() usage with struct crypto_sync_skcipher and SYNC_SKCIPHER_REQUEST_ON_STACK(), which uses a fixed stack size. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: YueHaibing <yuehaibing@huawei.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | Merge branch 'work.compat' of ↵Linus Torvalds2018-10-255-49/+80
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull compat_ioctl fixes from Al Viro: "A bunch of compat_ioctl fixes, mostly in bluetooth. Hopefully, most of fs/compat_ioctl.c will get killed off over the next few cycles; between this, tty series already merged and Arnd's work this cycle ought to take a good chunk out of the damn thing..." * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: hidp: fix compat_ioctl hidp: constify hidp_connection_add() cmtp: fix compat_ioctl bnep: fix compat_ioctl compat_ioctl: trim the pointless includes
| * | | hidp: fix compat_ioctlAl Viro2018-09-101-29/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) no point putting it into fs/compat_ioctl.c when you handle it in your ->compat_ioctl() anyway. 2) HIDPCONNADD is *not* COMPATIBLE_IOCTL() stuff at all - it does layout massage (pointer-chasing there) 3) use compat_ptr() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | hidp: constify hidp_connection_add()Al Viro2018-09-102-6/+6
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | cmtp: fix compat_ioctlAl Viro2018-09-101-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | Use compat_ptr(). And don't mess with fs/compat_ioctl.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | bnep: fix compat_ioctlAl Viro2018-09-101-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | use compat_ptr() properly and don't bother with fs/compat_ioctl.c - it's all handled in ->compat_ioctl() anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2018-10-252-15/+13
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping updates from Thomas Gleixner: "The timers and timekeeping departement provides: - Another large y2038 update with further preparations for providing the y2038 safe timespecs closer to the syscalls. - An overhaul of the SHCMT clocksource driver - SPDX license identifier updates - Small cleanups and fixes all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) tick/sched : Remove redundant cpu_online() check clocksource/drivers/dw_apb: Add reset control clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE clocksource/drivers: Unify the names to timer-* format clocksource/drivers/sh_cmt: Add R-Car gen3 support dt-bindings: timer: renesas: cmt: document R-Car gen3 support clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines clocksource/drivers/sh_cmt: Fixup for 64-bit machines clocksource/drivers/sh_tmu: Convert to SPDX identifiers clocksource/drivers/sh_mtu2: Convert to SPDX identifiers clocksource/drivers/sh_cmt: Convert to SPDX identifiers clocksource/drivers/renesas-ostm: Convert to SPDX identifiers clocksource: Convert to using %pOFn instead of device_node.name tick/broadcast: Remove redundant check RISC-V: Request newstat syscalls y2038: signal: Change rt_sigtimedwait to use __kernel_timespec y2038: socket: Change recvmmsg to use __kernel_timespec y2038: sched: Change sched_rr_get_interval to use __kernel_timespec y2038: utimes: Rework #ifdef guards for compat syscalls ...
| * | | | y2038: socket: Change recvmmsg to use __kernel_timespecArnd Bergmann2018-08-292-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts the recvmmsg() system call in all its variations to use 'timespec64' internally for its timeout, and have a __kernel_timespec64 argument in the native entry point. This lets us change the type to use 64-bit time_t at a later point while using the 32-bit compat system call emulation for existing user space. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| * | | | y2038: globally rename compat_time to old_time32Arnd Bergmann2018-08-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christoph Hellwig suggested a slightly different path for handling backwards compatibility with the 32-bit time_t based system calls: Rather than simply reusing the compat_sys_* entry points on 32-bit architectures unchanged, we get rid of those entry points and the compat_time types by renaming them to something that makes more sense on 32-bit architectures (which don't have a compat mode otherwise), and then share the entry points under the new name with the 64-bit architectures that use them for implementing the compatibility. The following types and interfaces are renamed here, and moved from linux/compat_time.h to linux/time32.h: old new --- --- compat_time_t old_time32_t struct compat_timeval struct old_timeval32 struct compat_timespec struct old_timespec32 struct compat_itimerspec struct old_itimerspec32 ns_to_compat_timeval() ns_to_old_timeval32() get_compat_itimerspec64() get_old_itimerspec32() put_compat_itimerspec64() put_old_itimerspec32() compat_get_timespec64() get_old_timespec32() compat_put_timespec64() put_old_timespec32() As we already have aliases in place, this patch addresses only the instances that are relevant to the system call interface in particular, not those that occur in device drivers and other modules. Those will get handled separately, while providing the 64-bit version of the respective interfaces. I'm not renaming the timex, rusage and itimerval structures, as we are still debating what the new interface will look like, and whether we will need a replacement at all. This also doesn't change the names of the syscall entry points, which can be done more easily when we actually switch over the 32-bit architectures to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* | | | | Merge tag 'docs-4.20' of git://git.lwn.net/linuxLinus Torvalds2018-10-241-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull documentation updates from Jonathan Corbet: "This is a fairly typical cycle for documentation. There's some welcome readability improvements for the formatted output, some LICENSES updates including the addition of the ISC license, the removal of the unloved and unmaintained 00-INDEX files, the deprecated APIs document from Kees, more MM docs from Mike Rapoport, and the usual pile of typo fixes and corrections" * tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits) docs: Fix typos in histogram.rst docs: Introduce deprecated APIs list kernel-doc: fix declaration type determination doc: fix a typo in adding-syscalls.rst docs/admin-guide: memory-hotplug: remove table of contents doc: printk-formats: Remove bogus kobject references for device nodes Documentation: preempt-locking: Use better example dm flakey: Document "error_writes" feature docs/completion.txt: Fix a couple of punctuation nits LICENSES: Add ISC license text LICENSES: Add note to CDDL-1.0 license that it should not be used docs/core-api: memory-hotplug: add some details about locking internals docs/core-api: rename memory-hotplug-notifier to memory-hotplug docs: improve readability for people with poorer eyesight yama: clarify ptrace_scope=2 in Yama documentation docs/vm: split memory hotplug notifier description to Documentation/core-api docs: move memory hotplug description into admin-guide/mm doc: Fix acronym "FEKEK" in ecryptfs docs: fix some broken documentation references iommu: Fix passthrough option documentation ...
| * | | | | docs: fix some broken documentation referencesMauro Carvalho Chehab2018-09-201-1/+1
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some documentation files received recent changes and are pointing to wrong places. Those references can easily fixed with the help of a script: $ ./scripts/documentation-file-ref-check --fix Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
* | | | | Merge branch 'work.tty-ioctl' of ↵Linus Torvalds2018-10-242-12/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull tty ioctl updates from Al Viro: "This is the compat_ioctl work related to tty ioctls. Quite a bit of dead code taken out, all tty-related stuff gone from fs/compat_ioctl.c. A bunch of compat bugs fixed - some still remain, but all more or less generic tty-related ioctls should be covered (remaining issues are in things like driver-private ioctls in a pcmcia serial card driver not getting properly handled in 32bit processes on 64bit host, etc)" * 'work.tty-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (53 commits) kill TIOCSERGSTRUCT change semantics of ldisc ->compat_ioctl() kill TIOCSER[SG]WILD synclink_gt(): fix compat_ioctl() pty: fix compat ioctls compat_ioctl - kill keyboard ioctl handling gigaset: add ->compat_ioctl() vt_compat_ioctl(): clean up, use compat_ptr() properly gigaset: don't try to printk userland buffer contents dgnc: don't bother with (empty) stub for TCXONC dgnc: leave TIOC[GS]SOFTCAR to ldisc remove fallback to drivers for TIOCGICOUNT dgnc: break-related ioctls won't reach ->ioctl() kill the rest of tty COMPAT_IOCTL() entries dgnc: TIOCM... won't reach ->ioctl() isdn_tty: TCSBRK{,P} won't reach ->ioctl() kill capinc_tty_ioctl() take compat TIOC[SG]SERIAL treatment into tty_compat_ioctl() synclink: reduce pointless checks in ->ioctl() complete ->[sg]et_serial() switchover ...
| * | | | | kill TIOCSERGSTRUCTAl Viro2018-10-131-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once upon a time a bunch of serial drivers used to provide that; today it's only amiserial and it's FUBAR - the structure being copied to userland includes kernel pointers, fields with config-dependent size, etc. No userland code using it could possibly survive - e.g. enabling lockdep definitely changes the layout. Besides, it's a massive infoleak. Kill it. If somebody needs that data for debugging purposes, they can bloody well expose it saner ways. Assuming anyone does debugging of amiserial in the first place, that is. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | change semantics of ldisc ->compat_ioctl()Al Viro2018-10-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First of all, make it return int. Returning long when native method had never allowed that is ridiculous and inconvenient. More importantly, change the caller; if ldisc ->compat_ioctl() is NULL or returns -ENOIOCTLCMD, tty_compat_ioctl() will try to feed cmd and compat_ptr(arg) to ldisc's native ->ioctl(). That simplifies ->compat_ioctl() instances quite a bit - they only need to deal with ioctls that are neither generic tty ones (those would get shunted off to tty_ioctl()) nor simple compat pointer ones. Note that something like TCFLSH won't reach ->compat_ioctl(), even if ldisc ->ioctl() does handle it - it will be recognized earlier and passed to tty_ioctl() (and ultimately - ldisc ->ioctl()). For many ldiscs it means that NULL ->compat_ioctl() does the right thing. Those where it won't serve (see e.g. n_r3964.c) are also easily dealt with - we need to handle the numeric-argument ioctls (calling the native instance) and, if such would exist, the ioctls that need layout conversion, etc. All in-tree ldiscs dealt with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | rfcomm: get rid of mentioning TIOC[SG]SERIALAl Viro2018-10-131-8/+0
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | no support there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | tcp: add tcp_reset_xmit_timer() helperEric Dumazet2018-10-232-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With EDT model, SRTT no longer is inflated by pacing delays. This means that RTO and some other xmit timers might be setup incorrectly. This is particularly visible with either : - Very small enforced pacing rates (SO_MAX_PACING_RATE) - Reduced rto (from the default 200 ms) This can lead to TCP flows aborts in the worst case, or spurious retransmits in other cases. For example, this session gets far more throughput than the requested 80kbit : $ netperf -H 127.0.0.2 -l 100 -- -q 10000 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 540000 262144 262144 104.00 2.66 With the fix : $ netperf -H 127.0.0.2 -l 100 -- -q 10000 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 540000 262144 262144 104.00 0.12 EDT allows for better control of rtx timers, since TCP has a better idea of the earliest departure time of each skb in the rtx queue. We only have to eventually add to the timer the difference of the EDT time with current time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Revert "net: simplify sock_poll_wait"Karsten Graul2018-10-2311-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit dd979b4df817e9976f18fb6f9d134d6bc4a3c317. This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an internal TCP socket for the initial handshake with the remote peer. Whenever the SMC connection can not be established this TCP socket is used as a fallback. All socket operations on the SMC socket are then forwarded to the TCP socket. In case of poll, the file->private_data pointer references the SMC socket because the TCP socket has no file assigned. This causes tcp_poll to wait on the wrong socket. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2018-10-227-27/+107
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) rbtree lookup from control plane returns the left-hand side element of the range when the interval end flag is set on. 2) osf extension is not supported from the input path, reject this from the control plane, from Fernando Fernandez Mancera. 3) xt_TEE is leaving output interface unset due to a recent incorrect netns rework, from Taehee Yoo. 4) xt_TEE allows to select an interface which does not belong to this netnamespace, from Taehee Yoo. 5) Zero private extension area in nft_compat, just like we do in x_tables, otherwise we leak kernel memory to userspace. 6) Missing .checkentry and .destroy entries in new DNAT extensions breaks it since we never load nf_conntrack dependencies, from Paolo Abeni. 7) Do not remove flowtable hook from netns exit path, the netdevice handler already deals with this, also from Taehee Yoo. 8) Only cleanup flowtable entries that reside in this netnamespace, also from Taehee Yoo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | netfilter: nf_flow_table: do not remove offload when other netns's interface ↵Taehee Yoo2018-10-191-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is down When interface is down, offload cleanup function(nf_flow_table_do_cleanup) is called and that checks whether interface index of offload and index of link down interface is same. but only interface index checking is not enough because flowtable is not pernet list. So that, if other netns's interface that has index is same with offload is down, that offload will be removed. This patch adds netns checking code to the offload cleanup routine. Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit ↵Taehee Yoo2018-10-191-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | routine When device is unregistered, flowtable flush routine is called by notifier_call(nf_tables_flowtable_event). and exit callback of nftables pernet_operation(nf_tables_exit_net) also has flowtable flush routine. but when network namespace is destroyed, both notifier_call and pernet_operation are called. hence flowtable flush routine in pernet_operation is unnecessary. test commands: %ip netns add vm1 %ip netns exec vm1 nft add table ip filter %ip netns exec vm1 nft add flowtable ip filter w \ { hook ingress priority 0\; devices = { lo }\; } %ip netns del vm1 splat looks like: [ 265.187019] WARNING: CPU: 0 PID: 87 at net/netfilter/core.c:309 nf_hook_entry_head+0xc7/0xf0 [ 265.187112] Modules linked in: nf_flow_table_ipv4 nf_flow_table nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip_tables x_tables [ 265.187390] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc3+ #5 [ 265.187453] Workqueue: netns cleanup_net [ 265.187514] RIP: 0010:nf_hook_entry_head+0xc7/0xf0 [ 265.187546] Code: 8d 81 68 03 00 00 5b c3 89 d0 83 fa 04 48 8d 84 c7 e8 11 00 00 76 81 0f 0b 31 c0 e9 78 ff ff ff 0f 0b 48 83 c4 08 31 c0 5b c3 <0f> 0b 31 c0 e9 65 ff ff ff 0f 0b 31 c0 e9 5c ff ff ff 48 89 0c 24 [ 265.187573] RSP: 0018:ffff88011546f098 EFLAGS: 00010246 [ 265.187624] RAX: ffffffff8d90e135 RBX: 1ffff10022a8de1c RCX: 0000000000000000 [ 265.187645] RDX: 0000000000000000 RSI: 0000000000000005 RDI: ffff880116298040 [ 265.187645] RBP: ffff88010ea4c1a8 R08: 0000000000000000 R09: 0000000000000000 [ 265.187645] R10: ffff88011546f1d8 R11: ffffed0022c532c1 R12: ffff88010ea4c1d0 [ 265.187645] R13: 0000000000000005 R14: dffffc0000000000 R15: ffff88010ea4c1c4 [ 265.187645] FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000 [ 265.187645] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 265.187645] CR2: 00007fdfb8d00000 CR3: 0000000057a16000 CR4: 00000000001006f0 [ 265.187645] Call Trace: [ 265.187645] __nf_unregister_net_hook+0xca/0x5d0 [ 265.187645] ? nf_hook_entries_free.part.3+0x80/0x80 [ 265.187645] ? save_trace+0x300/0x300 [ 265.187645] nf_unregister_net_hooks+0x2e/0x40 [ 265.187645] nf_tables_exit_net+0x479/0x1340 [nf_tables] [ 265.187645] ? find_held_lock+0x39/0x1c0 [ 265.187645] ? nf_tables_abort+0x30/0x30 [nf_tables] [ 265.187645] ? inet_frag_destroy_rcu+0xd0/0xd0 [ 265.187645] ? trace_hardirqs_on+0x93/0x210 [ 265.187645] ? __bpf_trace_preemptirq_template+0x10/0x10 [ 265.187645] ? inet_frag_destroy_rcu+0xd0/0xd0 [ 265.187645] ? inet_frag_destroy_rcu+0xd0/0xd0 [ 265.187645] ? __mutex_unlock_slowpath+0x17f/0x740 [ 265.187645] ? wait_for_completion+0x710/0x710 [ 265.187645] ? bucket_table_free+0xb2/0x1f0 [ 265.187645] ? nested_table_free+0x130/0x130 [ 265.187645] ? __lock_is_held+0xb4/0x140 [ 265.187645] ops_exit_list.isra.10+0x94/0x140 [ 265.187645] cleanup_net+0x45b/0x900 [ ... ] This WARNING means that hook unregisteration is failed because all flowtables hooks are already unregistered by notifier_call. Network namespace exit routine guarantees that all devices will be unregistered first. then, other exit callbacks of pernet_operations are called. so that removing flowtable flush routine in exit callback of pernet_operation(nf_tables_exit_net) doesn't make flowtable leak. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | netfilter: xt_nat: fix DNAT target for shifted portmap rangesPaolo Abeni2018-10-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 2eb0f624b709 ("netfilter: add NAT support for shifted portmap ranges") did not set the checkentry/destroy callbacks for the newly added DNAT target. As a result, rulesets using only such nat targets are not effective, as the relevant conntrack hooks are not enabled. The above affect also nft_compat rulesets. Fix the issue adding the missing initializers. Fixes: 2eb0f624b709 ("netfilter: add NAT support for shifted portmap ranges") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | netfilter: nft_compat: do not dump private areaPablo Neira Ayuso2018-10-111-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Zero pad private area, otherwise we expose private kernel pointer to userspace. This patch also zeroes the tail area after the ->matchsize and ->targetsize that results from XT_ALIGN(). Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | netfilter: xt_TEE: add missing code to get interface index in checkentry.Taehee Yoo2018-10-111-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | checkentry(tee_tg_check) should initialize priv->oif from dev if possible. But only netdevice notifier handler can set that. Hence priv->oif is always -1 until notifier handler is called. Fixes: 9e2f6c5d78db ("netfilter: Rework xt_TEE netdevice notifier") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | netfilter: xt_TEE: fix wrong interface selectionTaehee Yoo2018-10-111-17/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TEE netdevice notifier handler checks only interface name. however each netns can have same interface name. hence other netns's interface could be selected. test commands: %ip netns add vm1 %iptables -I INPUT -p icmp -j TEE --gateway 192.168.1.1 --oif enp2s0 %ip link set enp2s0 netns vm1 Above rule is in the root netns. but that rule could get enp2s0 ifindex of vm1 by notifier handler. After this patch, TEE rule is added to the per-netns list. Fixes: 9e2f6c5d78db ("netfilter: Rework xt_TEE netdevice notifier") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | netfilter: nft_osf: usage from output path is not validFernando Fernandez Mancera2018-10-111-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nft_osf extension, like xt_osf, is not supported from the output path. Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | | netfilter: nft_set_rbtree: allow loose matching of closing element in intervalPablo Neira Ayuso2018-10-111-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow to find closest matching for the right side of an interval (end flag set on) so we allow lookups in inner ranges, eg. 10-20 in 5-25. Fixes: ba0e4d9917b4 ("netfilter: nf_tables: get set elements via netlink") Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | | llc: do not use sk_eat_skb()Eric Dumazet2018-10-221-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzkaller triggered a use-after-free [1], caused by a combination of skb_get() in llc_conn_state_process() and usage of sk_eat_skb() sk_eat_skb() is assuming the skb about to be freed is only used by the current thread. TCP/DCCP stacks enforce this because current thread holds the socket lock. llc_conn_state_process() wants to make sure skb does not disappear, and holds a reference on the skb it manipulates. But as soon as this skb is added to socket receive queue, another thread can consume it. This means that llc must use regular skb_unlink() and kfree_skb() so that both producer and consumer can safely work on the same skb. [1] BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: use-after-free in refcount_read include/linux/refcount.h:43 [inline] BUG: KASAN: use-after-free in skb_unref include/linux/skbuff.h:967 [inline] BUG: KASAN: use-after-free in kfree_skb+0xb7/0x580 net/core/skbuff.c:655 Read of size 4 at addr ffff8801d1f6fba4 by task ksoftirqd/1/18 CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.19.0-rc8+ #295 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b6 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] refcount_read include/linux/refcount.h:43 [inline] skb_unref include/linux/skbuff.h:967 [inline] kfree_skb+0xb7/0x580 net/core/skbuff.c:655 llc_sap_state_process+0x9b/0x550 net/llc/llc_sap.c:224 llc_sap_rcv+0x156/0x1f0 net/llc/llc_sap.c:297 llc_sap_handler+0x65e/0xf80 net/llc/llc_sap.c:438 llc_rcv+0x79e/0xe20 net/llc/llc_input.c:208 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023 process_backlog+0x218/0x6f0 net/core/dev.c:5829 napi_poll net/core/dev.c:6249 [inline] net_rx_action+0x7c5/0x1950 net/core/dev.c:6315 __do_softirq+0x30c/0xb03 kernel/softirq.c:292 run_ksoftirqd+0x94/0x100 kernel/softirq.c:653 smpboot_thread_fn+0x68b/0xa00 kernel/smpboot.c:164 kthread+0x35a/0x420 kernel/kthread.c:246 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413 Allocated by task 18: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644 __alloc_skb+0x119/0x770 net/core/skbuff.c:193 alloc_skb include/linux/skbuff.h:995 [inline] llc_alloc_frame+0xbc/0x370 net/llc/llc_sap.c:54 llc_station_ac_send_xid_r net/llc/llc_station.c:52 [inline] llc_station_rcv+0x1dc/0x1420 net/llc/llc_station.c:111 llc_rcv+0xc32/0xe20 net/llc/llc_input.c:220 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023 process_backlog+0x218/0x6f0 net/core/dev.c:5829 napi_poll net/core/dev.c:6249 [inline] net_rx_action+0x7c5/0x1950 net/core/dev.c:6315 __do_softirq+0x30c/0xb03 kernel/softirq.c:292 Freed by task 16383: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x83/0x290 mm/slab.c:3756 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582 __kfree_skb+0x1d/0x20 net/core/skbuff.c:642 sk_eat_skb include/net/sock.h:2366 [inline] llc_ui_recvmsg+0xec2/0x1610 net/llc/af_llc.c:882 sock_recvmsg_nosec net/socket.c:794 [inline] sock_recvmsg+0xd0/0x110 net/socket.c:801 ___sys_recvmsg+0x2b6/0x680 net/socket.c:2278 __sys_recvmmsg+0x303/0xb90 net/socket.c:2390 do_sys_recvmmsg+0x181/0x1a0 net/socket.c:2466 __do_sys_recvmmsg net/socket.c:2484 [inline] __se_sys_recvmmsg net/socket.c:2480 [inline] __x64_sys_recvmmsg+0xbe/0x150 net/socket.c:2480 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801d1f6fac0 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 228 bytes inside of 232-byte region [ffff8801d1f6fac0, ffff8801d1f6fba8) The buggy address belongs to the page: page:ffffea000747dbc0 count:1 mapcount:0 mapping:ffff8801d9be7680 index:0xffff8801d1f6fe80 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0007346e88 ffffea000705b108 ffff8801d9be7680 raw: ffff8801d1f6fe80 ffff8801d1f6f0c0 000000010000000b 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801d1f6fa80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8801d1f6fb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8801d1f6fb80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc ^ ffff8801d1f6fc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801d1f6fc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: dsa: legacy: simplify getting .driver_dataWolfram Sang2018-10-221-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should get 'driver_data' from 'struct device' directly. Going via platform_device is an unneeded step back and forth. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net/sched: act_police: disallow 'goto chain' on fallback control actionDavide Caratti2018-10-221-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in the following command: # tc action add action police rate <r> burst <b> conform-exceed <c1>/<c2> 'goto chain x' is allowed only for c1: setting it for c2 makes the kernel crash with NULL pointer dereference, since TC core doesn't initialize the chain handle. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net/sched: act_gact: disallow 'goto chain' on fallback control actionDavide Caratti2018-10-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in the following command: # tc action add action <c1> random <rand_type> <c2> <rand_param> 'goto chain x' is allowed only for c1: setting it for c2 makes the kernel crash with NULL pointer dereference, since TC core doesn't initialize the chain handle. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: bpfilter: Set user mode helper's command lineOlivier Brunel2018-10-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Olivier Brunel <jjk@jjacky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net/ipv6: Add support for dumping addresses for a specific deviceDavid Ahern2018-10-221-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an RTM_GETADDR dump request has ifa_index set in the ifaddrmsg header, then return only the addresses for that device. Since inet6_dump_addr is reused for multicast and anycast addresses, this adds support for device specfic dumps of RTM_GETMULTICAST and RTM_GETANYCAST as well. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net/ipv4: Add support for dumping addresses for a specific deviceDavid Ahern2018-10-221-5/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an RTM_GETADDR dump request has ifa_index set in the ifaddrmsg header, then return only the addresses for that device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net/ipv6: Remove ip_idx arg to in6_dump_addrsDavid Ahern2018-10-221-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip_idx is always 0 going into in6_dump_addrs; it is passed as a pointer to save the last good index into cb. Since cb is already argument to in6_dump_addrs, just save the value there. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net/ipv4: Move loop over addresses on a device into in_dev_dump_addrDavid Ahern2018-10-221-15/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to IPv6 move the logic that walks over the ipv4 address list for a device into a helper. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | tipc: eliminate message disordering during binding table updateJon Maloy2018-10-223-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have seen the following race scenario: 1) named_distribute() builds a "bulk" message, containing a PUBLISH item for a certain publication. This is based on the contents of the binding tables's 'cluster_scope' list. 2) tipc_named_withdraw() removes the same publication from the list, bulds a WITHDRAW message and distributes it to all cluster nodes. 3) tipc_named_node_up(), which was calling named_distribute(), sends out the bulk message built under 1) 4) The WITHDRAW message arrives at the just detected node, finds no corresponding publication, and is dropped. 5) The PUBLISH item arrives at the same node, is added to its binding table, and remains there forever. This arrival disordering was earlier taken care of by the backlog queue, originally added for a different purpose, which was removed in the commit referred to below, but we now need a different solution. In this commit, we replace the rcu lock protecting the 'cluster_scope' list with a regular RW lock which comprises even the sending of the bulk message. This both guarantees both the list integrity and the message sending order. We will later add a commit which cleans up this code further. Note that this commit needs recently added commit d3092b2efca1 ("tipc: fix unsafe rcu locking when accessing publication list") to apply cleanly. Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table") Reported-by: Tuong Lien Tong <tuong.t.lien@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | tipc: use destination length for copy stringGuoqing Jiang2018-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Got below warning with gcc 8.2 compiler. net/tipc/topsrv.c: In function ‘tipc_topsrv_start’: net/tipc/topsrv.c:660:2: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=] strncpy(srv->name, name, strlen(name) + 1); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/tipc/topsrv.c:660:27: note: length computed here strncpy(srv->name, name, strlen(name) + 1); ^~~~~~~~~~~~ So change it to correct length and use strscpy. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2018-10-216-23/+225
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf-next 2018-10-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Implement two new kind of BPF maps, that is, queue and stack map along with new peek, push and pop operations, from Mauricio. 2) Add support for MSG_PEEK flag when redirecting into an ingress psock sk_msg queue, and add a new helper bpf_msg_push_data() for insert data into the message, from John. 3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use direct packet access for __skb_buff, from Song. 4) Use more lightweight barriers for walking perf ring buffer for libbpf and perf tool as well. Also, various fixes and improvements from verifier side, from Daniel. 5) Add per-symbol visibility for DSO in libbpf and hide by default global symbols such as netlink related functions, from Andrey. 6) Two improvements to nfp's BPF offload to check vNIC capabilities in case prog is shared with multiple vNICs and to protect against mis-initializing atomic counters, from Jakub. 7) Fix for bpftool to use 4 context mode for the nfp disassembler, also from Jakub. 8) Fix a return value comparison in test_libbpf.sh and add several bpftool improvements in bash completion, documentation of bpf fs restrictions and batch mode summary print, from Quentin. 9) Fix a file resource leak in BPF selftest's load_kallsyms() helper, from Peng. 10) Fix an unused variable warning in map_lookup_and_delete_elem(), from Alexei. 11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc, from Nicolas. 12) Add missing executables to .gitignore in BPF selftests, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | ulp: remove uid and user_visible membersDaniel Borkmann2018-10-201-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They are not used anymore and therefore should be removed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | bpf: sk_msg program helper bpf_msg_push_dataJohn Fastabend2018-10-201-0/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows user to push data into a msg using sk_msg program types. The format is as follows, bpf_msg_push_data(msg, offset, len, flags) this will insert 'len' bytes at offset 'offset'. For example to prepend 10 bytes at the front of the message the user can, bpf_msg_push_data(msg, 0, 10, 0); This will invalidate data bounds so BPF user will have to then recheck data bounds after calling this. After this the msg size will have been updated and the user is free to write into the added bytes. We allow any offset/len as long as it is within the (data, data_end) range. However, a copy will be required if the ring is full and its possible for the helper to fail with ENOMEM or EINVAL errors which need to be handled by the BPF program. This can be used similar to XDP metadata to pass data between sk_msg layer and lower layers. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | | | bpf: skmsg, fix psock create on existing kcm/tls portJohn Fastabend2018-10-201-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before using the psock returned by sk_psock_get() when adding it to a sockmap we need to ensure it is actually a sockmap based psock. Previously we were only checking this after incrementing the reference counter which was an error. This resulted in a slab-out-of-bounds error when the psock was not actually a sockmap type. This moves the check up so the reference counter is only used if it is a sockmap psock. Eric reported the following KASAN BUG, BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387 CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 sk_psock_get include/linux/skmsg.h:379 [inline] sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178 sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669 sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738 map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | | | | bpf: add tests for direct packet access from CGROUP_SKBSong Liu2018-10-191-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tests are added to make sure CGROUP_SKB cannot access: tc_classid, data_meta, flow_keys and can read and write: mark, prority, and cb[0-4] and can read other fields. To make selftest with skb->sk work, a dummy sk is added in bpf_prog_test_run_skb(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
| * | | | | | bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKBSong Liu2018-10-191-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the skb. This patch enables direct access of skb for these programs. Two helper functions bpf_compute_and_save_data_end() and bpf_restore_data_end() are introduced. There are used in __cgroup_bpf_run_filter_skb(), to compute proper data_end for the BPF program, and restore original data afterwards. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>