Commit message; Author; Age; Files; Lines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
  Author: David S. Miller; Age: 2019-04-05; Files: 527; Lines: -6928/+4631

  Minor comment merge conflict in mlx5.

  Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'.

  Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux
  Author: Linus Torvalds; Age: 2019-04-05; Files: 1; Lines: -11/+18

  Pull mm/compaction fixes from Mel Gorman:

  "The merge window for 5.1 introduced a number of compaction-related patches, with intermittent reports of corruption and functional issues. The bugs are due to sloppy checking of zone boundaries and a corner case where invalid indexes are used to access the free lists.

  Reports are not common but at least two users and 0-day have tripped over them. There is a chance that one of the syzbot reports is related but it has not been confirmed properly.

  The normal submission path is with Andrew but there have been some delays and I consider them urgent enough that they should be picked up before RC4 to avoid duplicate reports.

  All of these have been successfully tested on older RC windows. This will make this branch look like a rebase but in fact, they've simply been lifted again from Andrew's tree and placed on a fresh branch. I've no reason to believe that this has invalidated the testing given the lack of change in compaction and the nature of the fixes"

  * tag 'mm-compaction-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux:
    mm/compaction.c: abort search if isolation fails
    mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
| | * mm/compaction.c: abort search if isolation fails
  Author: Qian Cai; Age: 2019-04-04; Files: 1; Lines: -1/+1

  Running LTP oom01 in a tight loop, or memory stress testing that puts the system in a low-memory situation, could trigger random memory corruption such as the page flag corruption below. In fast_isolate_freepages(), if isolation fails, next_search_order() does not abort the search immediately, which could lead to improper accesses.

    UBSAN: Undefined behaviour in ./include/linux/mm.h:1195:50
    index 7 is out of range for type 'zone [5]'
    Call Trace:
     dump_stack+0x62/0x9a
     ubsan_epilogue+0xd/0x7f
     __ubsan_handle_out_of_bounds+0x14d/0x192
     __isolate_free_page+0x52c/0x600
     compaction_alloc+0x886/0x25f0
     unmap_and_move+0x37/0x1e70
     migrate_pages+0x2ca/0xb20
     compact_zone+0x19cb/0x3620
     kcompactd_do_work+0x2df/0x680
     kcompactd+0x1d8/0x6c0
     kthread+0x32c/0x3f0
     ret_from_fork+0x35/0x40
    ------------[ cut here ]------------
    kernel BUG at mm/page_alloc.c:3124!
    invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
    RIP: 0010:__isolate_free_page+0x464/0x600
    RSP: 0000:ffff888b9e1af848 EFLAGS: 00010007
    RAX: 0000000030000000 RBX: ffff888c39fcf0f8 RCX: 0000000000000000
    RDX: 1ffff111873f9e25 RSI: 0000000000000004 RDI: ffffed1173c35ef6
    RBP: ffff888b9e1af898 R08: fffffbfff4fc2461 R09: fffffbfff4fc2460
    R10: fffffbfff4fc2460 R11: ffffffffa7e12303 R12: 0000000000000008
    R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000007
    FS: 0000000000000000(0000) GS:ffff888ba8e80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fc7abc00000 CR3: 0000000752416004 CR4: 00000000001606a0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     compaction_alloc+0x886/0x25f0
     unmap_and_move+0x37/0x1e70
     migrate_pages+0x2ca/0xb20
     compact_zone+0x19cb/0x3620
     kcompactd_do_work+0x2df/0x680
     kcompactd+0x1d8/0x6c0
     kthread+0x32c/0x3f0
     ret_from_fork+0x35/0x40

  Link: http://lkml.kernel.org/r/20190320192648.52499-1-cai@lca.pw
  Fixes: dbe2d4e4f12e ("mm, compaction: round-robin the order while searching the free lists for a target")
  Signed-off-by: Qian Cai <cai@lca.pw>
  Acked-by: Mel Gorman <mgorman@techsingularity.net>
  Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
  Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
  Cc: Vlastimil Babka <vbabka@suse.cz>
  Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
  Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
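The idea of stopping a round-robin search as soon as isolation fails can be illustrated outside the kernel. The following is only a minimal userspace sketch, not the code from mm/compaction.c: `free_lists_sketch`, `search_free_lists`, `isolate_ok` and `isolate_fail` are hypothetical names invented here, and the wrap-around order walk is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ORDERS_SKETCH 5 /* hypothetical number of free-list orders */

/* Hypothetical stand-in for per-order free lists: only a count per order. */
struct free_lists_sketch {
    int nr_free[NR_ORDERS_SKETCH];
};

/* Callback that tries to isolate a block of the given order. */
typedef bool (*isolate_fn)(int order);

bool isolate_ok(int order)   { (void)order; return true; }
bool isolate_fail(int order) { (void)order; return false; }

/*
 * Walk the orders round-robin, starting at start_order and wrapping
 * downwards. Returns the order that was isolated, or -1. The key point:
 * a failed isolation aborts the whole search instead of moving on to
 * another order with state that may no longer be valid.
 */
int search_free_lists(const struct free_lists_sketch *fl, int start_order,
                      isolate_fn try_isolate)
{
    int order = start_order;

    assert(start_order >= 0 && start_order < NR_ORDERS_SKETCH);

    for (int tries = 0; tries < NR_ORDERS_SKETCH; tries++) {
        if (fl->nr_free[order] > 0)
            return try_isolate(order) ? order : -1; /* abort on failure */
        order = (order == 0) ? NR_ORDERS_SKETCH - 1 : order - 1;
    }
    return -1; /* every list was empty */
}
```

A caller would pass its own isolation routine as `try_isolate`; the two stub callbacks exist only so the abort path can be exercised.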
| | * mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints
  Author: Mel Gorman; Age: 2019-04-04; Files: 1; Lines: -10/+17

  Mikhail Gavrilov reported the following bug being triggered in a Fedora kernel based on 5.1-rc1 but it is relevant to a vanilla kernel.

    kernel: page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    kernel: ------------[ cut here ]------------
    kernel: kernel BUG at include/linux/mm.h:1021!
    kernel: invalid opcode: 0000 [#1] SMP NOPTI
    kernel: CPU: 6 PID: 116 Comm: kswapd0 Tainted: G C 5.1.0-0.rc1.git1.3.fc31.x86_64 #1
    kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1201 12/07/2018
    kernel: RIP: 0010:__reset_isolation_pfn+0x244/0x2b0
    kernel: Code: fe 06 e8 0f 8e fc ff 44 0f b6 4c 24 04 48 85 c0 0f 85 dc fe ff ff e9 68 fe ff ff 48 c7 c6 58 b7 2e 8c 4c 89 ff e8 0c 75 00 00 <0f> 0b 48 c7 c6 58 b7 2e 8c e8 fe 74 00 00 0f 0b 48 89 fa 41 b8 01
    kernel: RSP: 0018:ffff9e2d03f0fde8 EFLAGS: 00010246
    kernel: RAX: 0000000000000034 RBX: 000000000081f380 RCX: ffff8cffbddd6c20
    kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8cffbddd6c20
    kernel: RBP: 0000000000000001 R08: 0000009898b94613 R09: 0000000000000000
    kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000100000
    kernel: R13: 0000000000100000 R14: 0000000000000001 R15: ffffca7de07ce000
    kernel: FS: 0000000000000000(0000) GS:ffff8cffbdc00000(0000) knlGS:0000000000000000
    kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    kernel: CR2: 00007fc1670e9000 CR3: 00000007f5276000 CR4: 00000000003406e0
    kernel: Call Trace:
    kernel: __reset_isolation_suitable+0x62/0x120
    kernel: reset_isolation_suitable+0x3b/0x40
    kernel: kswapd+0x147/0x540
    kernel: ? finish_wait+0x90/0x90
    kernel: kthread+0x108/0x140
    kernel: ? balance_pgdat+0x560/0x560
    kernel: ? kthread_park+0x90/0x90
    kernel: ret_from_fork+0x27/0x50

  He bisected it down to e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints"). The problem is that the patch in question was sloppy with respect to the handling of zone boundaries. In some instances, it was possible for PFNs outside of a zone to be examined and if those were not properly initialised or poisoned then it would trigger the VM_BUG_ON. This patch corrects the zone boundary issues when resetting pageblock skip hints and Mikhail reported that the bug did not trigger after 30 hours of testing.

  Link: http://lkml.kernel.org/r/20190327085424.GL3189@techsingularity.net
  Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints")
  Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
  Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
  Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
  Cc: Qian Cai <cai@lca.pw>
  Cc: Vlastimil Babka <vbabka@suse.cz>
  Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
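The class of bug described here, examining PFNs that lie outside the zone, is usually avoided by clamping a block-aligned range to the zone boundaries before walking it. The sketch below is an illustration under assumptions, not the patched kernel code: `zone_sketch`, `pfn_range`, `pageblock_range_in_zone` and the block size of 512 pages are all invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pageblock size in pages; must be a power of two here. */
#define PAGEBLOCK_NR_PAGES_SKETCH 512UL

/* Hypothetical zone description: [start_pfn, end_pfn), end exclusive. */
struct zone_sketch {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

struct pfn_range {
    unsigned long start; /* inclusive */
    unsigned long end;   /* exclusive */
};

/*
 * Compute the part of the pageblock containing pfn that lies inside the
 * zone. Returns false if pfn itself is outside the zone, in which case
 * nothing may be examined at all.
 */
bool pageblock_range_in_zone(const struct zone_sketch *z, unsigned long pfn,
                             struct pfn_range *out)
{
    unsigned long block_start, block_end;

    if (pfn < z->start_pfn || pfn >= z->end_pfn)
        return false;

    /* Round down to the block start, then clamp both ends to the zone. */
    block_start = pfn & ~(PAGEBLOCK_NR_PAGES_SKETCH - 1);
    block_end = block_start + PAGEBLOCK_NR_PAGES_SKETCH;

    out->start = block_start < z->start_pfn ? z->start_pfn : block_start;
    out->end = block_end > z->end_pfn ? z->end_pfn : block_end;
    return true;
}
```

With a zone that starts or ends in the middle of a pageblock, the unclamped block range would include PFNs belonging to a neighbouring zone or to a hole; the clamped range never does.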
| * | tty: mark Siemens R3964 line discipline as BROKEN
  Author: Greg Kroah-Hartman; Age: 2019-04-05; Files: 1; Lines: -1/+1

  The n_r3964 line discipline driver was written in a different time, when SMP machines were rare, and users were trusted to do the right thing. Since then, the world has moved on but not this code, it has stayed rooted in the past with its lovely hand-crafted list structures and loads of "interesting" race conditions all over the place.

  After attempting to clean up most of the issues, I just gave up and am now marking the driver as BROKEN so that hopefully someone who has this hardware will show up out of the woodwork (I know you are out there!) and will help with debugging a raft of changes that I had laying around for the code, but was too afraid to commit as odds are they would break things.

  Many thanks to Jann and Linus for pointing out the initial problems in this codebase, as well as many reviews of my attempts to fix the issues. It was a case of whack-a-mole, and as you can see, the mole won.

  Reported-by: Jann Horn <jannh@google.com>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | Merge tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm
  Author: Linus Torvalds; Age: 2019-04-04; Files: 11; Lines: -17/+44

  Pull drm fixes from Dave Airlie:

  "Pretty quiet week, just some amdgpu and i915 fixes.

  i915:
   - deadlock fix
   - gvt fixes

  amdgpu:
   - PCIE dpm feature fix
   - Powerplay fixes"

  * tag 'drm-fixes-2019-04-05' of git://anongit.freedesktop.org/drm/drm:
    drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
    drm/i915/gvt: Correct the calculation of plane size
    drm/amdgpu: remove unnecessary rlc reset function on gfx9
    drm/i915: Always backoff after a drm_modeset_lock() deadlock
    drm/i915/gvt: do not let pin count of shadow mm go negative
    drm/i915/gvt: do not deliver a workload if its creation fails
    drm/amd/display: VBIOS can't be light up HDMI when restart system
    drm/amd/powerplay: fix possible hang with 3+ 4K monitors
    drm/amd/powerplay: correct data type to avoid overflow
    drm/amd/powerplay: add ECC feature bit
    drm/amd/amdgpu: fix PCIe dpm feature issue (v3)
| | * \ Merge tag 'drm-intel-fixes-2019-04-04' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
  Author: Dave Airlie; Age: 2019-04-05; Files: 5; Lines: -11/+11

  Only one fix for DSC (backoff after drm_modeset_lock deadlock) and GVT's fixes including vGPU display plane size calculation, shadow mm pin count, error recovery path for workload create and one kerneldoc fix.

  Signed-off-by: Dave Airlie <airlied@redhat.com>
  From: Rodrigo Vivi <rodrigo.vivi@intel.com>
  Link: https://patchwork.freedesktop.org/patch/msgid/20190404161116.GA14522@intel.com
| | | * \ Merge tag 'gvt-fixes-2019-04-04' of https://github.com/intel/gvt-linux into drm-intel-fixes
  Author: Rodrigo Vivi; Age: 2019-04-03; Files: 4; Lines: -10/+7

  gvt-fixes-2019-04-04

  - Fix shadow mm pin count (Yan)
  - Fix cmd parser error path recover (Yan)
  - Fix vGPU display plane size calculation (Xiong)
  - Fix kerneldoc (Chris)

  Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
  From: Zhenyu Wang <zhenyuw@linux.intel.com>
  Link: https://patchwork.freedesktop.org/patch/msgid/20190404003957.GB8327@zhen-hp.sh.intel.com
| | | | * | drm/i915/gvt: Fix kerneldoc typo for intel_vgpu_emulate_hotplug
  Author: Chris Wilson; Age: 2019-04-04; Files: 1; Lines: -1/+1

    drivers/gpu/drm/i915/gvt/display.c:457: warning: Function parameter or member 'connected' not described in 'intel_vgpu_emulate_hotplug'
    drivers/gpu/drm/i915/gvt/display.c:457: warning: Excess function parameter 'conncted' description in 'intel_vgpu_emulate_hotplug'

  Fixes: 1ca20f33df42 ("drm/i915/gvt: add hotplug emulation")
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Cc: Hang Yuan <hang.yuan@linux.intel.com>
  Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
  Cc: Zhi Wang <zhi.a.wang@intel.com>
  Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
| | | | * | drm/i915/gvt: Correct the calculation of plane size
  Author: Xiong Zhang; Age: 2019-04-04; Files: 1; Lines: -6/+2

  The stride is in bytes, not in pixels, so the calculation of the plane size does not need to multiply by bpp.

  Fixes: e546e281d33d ("drm/i915/gvt: Dmabuf support for GVT-g")
  Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
  Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
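The arithmetic behind this fix can be shown with a small, hedged example. The helper names `plane_size_bytes` and `stride_from_width` are hypothetical and do not come from the GVT code; `bytes_per_pixel` and the 64-byte alignment are assumptions chosen for illustration. The point is that the pixel size is already folded into a byte stride, so it must not be applied a second time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive a row stride in bytes from a width in pixels. This is the one
 * place where the pixel size is applied. align must be non-zero.
 */
uint32_t stride_from_width(uint32_t width_px, uint32_t bytes_per_pixel,
                           uint32_t align)
{
    uint32_t raw = width_px * bytes_per_pixel;

    assert(align != 0);
    return (raw + align - 1) / align * align; /* round up to alignment */
}

/*
 * Plane size from a stride that is already in bytes: rows times bytes
 * per row. Multiplying by the pixel size again would overestimate the
 * size by that factor.
 */
uint64_t plane_size_bytes(uint32_t stride_bytes, uint32_t height)
{
    return (uint64_t)stride_bytes * height;
}
```

For a 1920x1080 plane at 4 bytes per pixel, the stride is 7680 bytes and the plane occupies 7680 * 1080 = 8294400 bytes; multiplying by the pixel size once more would report four times that.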
| | | | * | drm/i915/gvt: do not let pin count of shadow mm go negative
  Author: Yan Zhao; Age: 2019-03-29; Files: 1; Lines: -1/+1

  The shadow mm's pin count is increased in the workload preparation phase, which comes after workload scanning. It is decreased in complete_current_workload() anyway after workload completion. Sometimes, if a workload meets a scanning error, its shadow mm pin count will not be increased but will still be decreased at the end. This patch prevents the shadow mm's pin count from going below 0.

  Fixes: 2707e4446688 ("drm/i915/gvt: vGPU graphics memory virtualization")
  Cc: zhenyuw@linux.intel.com
  Cc: stable@vger.kernel.org #4.14+
  Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
  Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
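A counter that must never drop below zero, even when a decrement arrives without a matching increment, is typically handled with a "decrement if positive" operation. The kernel provides `atomic_dec_if_positive()` for this; whether the patch uses that helper is not stated above. The following is a userspace sketch with C11 atomics, and `pin_count_dec_if_positive` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Decrement *count only if it is currently greater than zero.
 * Returns true if a decrement happened, false if the count was
 * already zero (or negative) and was left untouched.
 */
bool pin_count_dec_if_positive(atomic_int *count)
{
    int old = atomic_load(count);

    while (old > 0) {
        /* On failure the CAS reloads 'old' with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return true;
    }
    return false;
}
```

An unpin path built this way tolerates the unbalanced case described in the commit message: an extra unpin is simply a no-op instead of driving the count to -1.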
| | | | * | drm/i915/gvt: do not deliver a workload if its creation fails
  Author: Yan Zhao; Age: 2019-03-29; Files: 1; Lines: -2/+3

  In the workload creation routine, if any failure occurs, do not queue the workload for delivery. If the failure is fatal, enter failsafe mode.

  Fixes: 6d76303553ba ("drm/i915/gvt: Move common vGPU workload creation into scheduler.c")
  Cc: stable@vger.kernel.org #4.19+
  Cc: zhenyuw@linux.intel.com
  Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
  Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
| | | * | | drm/i915: Always backoff after a drm_modeset_lock() deadlock
  Author: Chris Wilson; Age: 2019-04-01; Files: 1; Lines: -1/+4

  If drm_modeset_lock() reports a deadlock it sets the ctx->contended field and insists that the caller calls drm_modeset_backoff() or else it generates a WARN on cleanup.

    <4> [1601.870376] WARNING: CPU: 3 PID: 8445 at drivers/gpu/drm/drm_modeset_lock.c:228 drm_modeset_drop_locks+0x35/0x40
    <4> [1601.870395] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal i915 coretemp crct10dif_pclmul
    <6> [1601.870403] Console: switching
    <4> [1601.870403] snd_hda_intel
    <4> [1601.870406] to colour frame buffer device 320x90
    <4> [1601.870406] crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel e1000e snd_hda_core cdc_ether ptp usbnet mii pps_core snd_pcm i2c_i801 mei_me mei prime_numbers
    <4> [1601.870422] CPU: 3 PID: 8445 Comm: cat Tainted: G U 5.0.0-rc7-CI-CI_DRM_5650+ #1
    <4> [1601.870424] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.2402.AD3.1810170014 10/17/2018
    <4> [1601.870427] RIP: 0010:drm_modeset_drop_locks+0x35/0x40
    <4> [1601.870430] Code: 29 48 8b 43 60 48 8d 6b 60 48 39 c5 74 19 48 8b 43 60 48 8d b8 70 ff ff ff e8 87 ff ff ff 48 8b 43 60 48 39 c5 75 e7 5b 5d c3 <0f> 0b eb d3 0f 1f 80 00 00 00 00 41 56 41 55 41 54 55 53 48 8b 6f
    <4> [1601.870432] RSP: 0018:ffffc90000d67ce8 EFLAGS: 00010282
    <4> [1601.870435] RAX: 00000000ffffffdd RBX: ffffc90000d67d00 RCX: 5dbbe23d00000000
    <4> [1601.870437] RDX: 0000000000000000 RSI: 0000000093e6194a RDI: ffffc90000d67d00
    <4> [1601.870439] RBP: ffff88849e62e678 R08: 0000000003b7329a R09: 0000000000000001
    <4> [1601.870441] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888492100410
    <4> [1601.870442] R13: ffff88849ea50958 R14: ffff8884a67eb028 R15: ffff8884a67eb028
    <4> [1601.870445] FS: 00007fa7a27745c0(0000) GS:ffff8884aff80000(0000) knlGS:0000000000000000
    <4> [1601.870447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    <4> [1601.870449] CR2: 000055af07e66000 CR3: 00000004a8cc2006 CR4: 0000000000760ee0
    <4> [1601.870451] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    <4> [1601.870453] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    <4> [1601.870454] PKRU: 55555554
    <4> [1601.870456] Call Trace:
    <4> [1601.870505] i915_dsc_fec_support_show+0x91/0x190 [i915]
    <4> [1601.870522] seq_read+0xdb/0x3c0
    <4> [1601.870531] full_proxy_read+0x51/0x80
    <4> [1601.870538] __vfs_read+0x31/0x190
    <4> [1601.870546] ? __se_sys_newfstat+0x3c/0x60
    <4> [1601.870552] vfs_read+0x9e/0x150
    <4> [1601.870557] ksys_read+0x50/0xc0
    <4> [1601.870564] do_syscall_64+0x55/0x190
    <4> [1601.870569] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    <4> [1601.870572] RIP: 0033:0x7fa7a226d081
    <4> [1601.870574] Code: fe ff ff 48 8d 3d 67 9c 0a 00 48 83 ec 08 e8 a6 4c 02 00 66 0f 1f 44 00 00 48 8d 05 81 08 2e 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
    <4> [1601.870576] RSP: 002b:00007ffcc05140c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
    <4> [1601.870579] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fa7a226d081
    <4> [1601.870581] RDX: 0000000000020000 RSI: 000055af07e63000 RDI: 0000000000000007
    <4> [1601.870583] RBP: 0000000000020000 R08: 000000000000007b R09: 0000000000000000
    <4> [1601.870585] R10: 000055af07e60010 R11: 0000000000000246 R12: 000055af07e63000
    <4> [1601.870587] R13: 0000000000000007 R14: 000055af07e634bf R15: 0000000000020000

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109745
  Fixes: e845f099f1c6 ("drm/i915/dsc: Add Per connector debugfs node for DSC support/enable")
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
  Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
  Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
  Cc: Lyude Paul <lyude@redhat.com>
  Cc: Manasi Navare <manasi.d.navare@intel.com>
  Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
  Link: https://patchwork.freedesktop.org/patch/msgid/20190329165152.29259-1-chris@chris-wilson.co.uk
  (cherry picked from commit ee6df5694a9a2e30566ae05e9c145a0f6d5e087f)
  Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
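The contract described in this message (a deadlock result must be followed by a backoff before retrying or cleaning up) can be modelled in userspace. This is a sketch under assumptions, not the i915 patch: `acquire_ctx_sketch`, `sketch_lock`, `sketch_backoff`, `sketch_drop_locks` and `lock_with_backoff` are hypothetical stand-ins that only mimic the shape of the DRM modeset locking API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical acquire context; 'contended' mirrors the idea in the text. */
struct acquire_ctx_sketch {
    bool contended;     /* set when a lock attempt reported a deadlock */
    int deadlocks_left; /* test knob: how many attempts will deadlock */
    int backoffs;       /* how many times the caller backed off */
};

/* Stand-in for a lock attempt that may report -EDEADLK. */
int sketch_lock(struct acquire_ctx_sketch *ctx)
{
    if (ctx->deadlocks_left > 0) {
        ctx->deadlocks_left--;
        ctx->contended = true;
        return -EDEADLK;
    }
    return 0;
}

/* Stand-in for the backoff step: clears the contended state. */
int sketch_backoff(struct acquire_ctx_sketch *ctx)
{
    ctx->contended = false;
    ctx->backoffs++;
    return 0;
}

/* Cleanup check: returns false where the real code would WARN. */
bool sketch_drop_locks(const struct acquire_ctx_sketch *ctx)
{
    return !ctx->contended;
}

/* Retry loop: every -EDEADLK is followed by a backoff before retrying. */
int lock_with_backoff(struct acquire_ctx_sketch *ctx)
{
    for (;;) {
        int ret = sketch_lock(ctx);

        if (ret != -EDEADLK)
            return ret; /* success or a non-deadlock error */

        ret = sketch_backoff(ctx);
        if (ret)
            return ret; /* backoff itself failed */
    }
}
```

A caller that returns straight after `-EDEADLK` without backing off would leave `contended` set, which is the state the cleanup check complains about.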
| | * | | Merge branch 'drm-fixes-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
  Author: Dave Airlie; Age: 2019-04-05; Files: 6; Lines: -6/+33

  Fixes for 5.1:
  - Fix for pcie dpm
  - Powerplay fixes for vega20
  - Fix vbios display on reboot if driver display state is retained
  - Gfx9 resume robustness fix

  Signed-off-by: Dave Airlie <airlied@redhat.com>
  From: Alex Deucher <alexdeucher@gmail.com>
  Link: https://patchwork.freedesktop.org/patch/msgid/20190404042939.3386-1-alexander.deucher@amd.com
| | | * | drm/amdgpu: remove unnecessary rlc reset function on gfx9
  Author: Le Ma; Age: 2019-04-02; Files: 1; Lines: -2/+0

  The rlc reset function is not necessary during the gfx9 initialization/resume phase, and it can even cause rlc firmware loading to fail on some gfx9 ASICs. Remove the function; the change has been verified on Vega/Raven platforms.

  Signed-off-by: Le Ma <le.ma@amd.com>
  Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
  Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | | * | drm/amd/display: VBIOS can't be light up HDMI when restart system
  Author: Paul Hsieh; Age: 2019-03-27; Files: 1; Lines: -0/+6

  [Why]
  VBIOS will not post a pixel rate > 340MHz. If the driver sets a pixel rate > 340MHz and the system is restarted, VBIOS can't post to the HDMI monitor because the monitor stays in the HDMI2.0 state.

  [How]
  Program Scrambling_Enable and TMDS_Bit_Clock_Ratio when disabling the stream.

  Signed-off-by: Paul Hsieh <paul.hsieh@amd.com>
  Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
  Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
  Acked-by: Harry Wentland <Harry.Wentland@amd.com>
  Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | | * | drm/amd/powerplay: fix possible hang with 3+ 4K monitors
  Author: Evan Quan; Age: 2019-03-27; Files: 1; Lines: -1/+9

  If DAL requires MCLK to be forced high, FCLK will be forced high as well.

  Signed-off-by: Evan Quan <evan.quan@amd.com>
  Acked-by: Alex Deucher <alexander.deucher@amd.com>
  Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | | * | drm/amd/powerplay: correct data type to avoid overflow
  Author: Evan Quan; Age: 2019-03-27; Files: 1; Lines: -1/+1

  Avoid left shift overflow.

  Signed-off-by: Evan Quan <evan.quan@amd.com>
  Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
  Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
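A left shift overflows when it is evaluated in a type narrower than the result needs, even if the result is then stored in a wider variable. The example below is a generic illustration, not the powerplay code; `shift_truncating` and `shift_widened` are hypothetical names, and it assumes a platform where `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The shift is performed in 32 bits, so high bits are lost before the
 * result is widened to 64 bits for the return value.
 */
uint64_t shift_truncating(uint32_t value, unsigned int shift)
{
    assert(shift < 32); /* larger shifts are undefined for a 32-bit operand */
    return value << shift;
}

/* Widening the operand first keeps every bit of the result. */
uint64_t shift_widened(uint32_t value, unsigned int shift)
{
    assert(shift < 64);
    return (uint64_t)value << shift;
}
```

With `value = 0x10000` and `shift = 16`, the first function returns 0 while the second returns 0x100000000, which is why correcting the data type of the shifted operand is enough to fix this kind of bug.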
| | | * | drm/amd/powerplay: add ECC feature bit
  Author: Evan Quan; Age: 2019-03-27; Files: 3; Lines: -2/+12

  It's OK to have this feature bit with old SMU firmwares. But the feature should be disabled on them.

  Signed-off-by: Evan Quan <evan.quan@amd.com>
  Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
  Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| | | * | drm/amd/amdgpu: fix PCIe dpm feature issue (v3)
  Author: Chengming Gui; Age: 2019-03-27; Files: 1; Lines: -0/+5

  Use pcie_bandwidth_available to get the real link state when updating the pcie table.

  v2: fix incorrectly initialized return value
  v3: extend the method of fetching the link width to all asics.

  Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
  Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
  Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
  Author: Linus Torvalds; Age: 2019-04-04; Files: 122; Lines: -563/+1096

  Pull networking fixes from David Miller:

   1) Several hash table refcount fixes in batman-adv, from Sven Eckelmann.
   2) Use after free in bpf_evict_inode(), from Daniel Borkmann.
   3) Fix mdio bus registration in ixgbe, from Ivan Vecera.
   4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.
   5) ila rhashtable corruption fix from Herbert Xu.
   6) Don't allow upper-devices to be added to vrf devices, from Sabrina Dubroca.
   7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.
   8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype, from Alexander Lobakin.
   9) Missing IDR checks in mlx5 driver, from Aditya Pakki.
  10) Fix false connection termination in ktls, from Jakub Kicinski.
  11) Work around some ASPM issues with r8169 by disabling rx interrupt coalescing on certain chips. From Heiner Kallweit.
  12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo Abeni.
  13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.
  14) Various BPF flow dissector fixes from Stanislav Fomichev.
  15) Divide by zero in act_sample, from Davide Caratti.
  16) Fix bridging multicast regression introduced by rhashtable conversion, from Nikolay Aleksandrov.

  * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
    ibmvnic: Fix completion structure initialization
    ipv6: sit: reset ip header pointer in ipip6_rcv
    net: bridge: always clear mcast matching struct on reports and leaves
    libcxgb: fix incorrect ppmax calculation
    vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
    sch_cake: Make sure we can write the IP header before changing DSCP bits
    sch_cake: Use tc_skb_protocol() helper for getting packet protocol
    tcp: Ensure DCTCP reacts to losses
    net/sched: act_sample: fix divide by zero in the traffic path
    net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
    net: hns: Fix sparse: some warnings in HNS drivers
    net: hns: Fix WARNING when remove HNS driver with SMMU enabled
    net: hns: fix ICMP6 neighbor solicitation messages discard problem
    net: hns: Fix probabilistic memory overwrite when HNS driver initialized
    net: hns: Use NAPI_POLL_WEIGHT for hns driver
    net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
    flow_dissector: rst'ify documentation
    ipv6: Fix dangling pointer when ipv6 fragment
    net-gro: Fix GRO flush when receiving a GSO packet.
    flow_dissector: document BPF flow dissector environment
    ...
| | * | | | ibmvnic: Fix completion structure initialization
  Author: Thomas Falcon; Age: 2019-04-04; Files: 1; Lines: -2/+3

  Fix device initialization completion handling for vNIC adapters. Initialize the completion structure on probe and reinitialize when needed. This also fixes a race condition during kdump where the driver can attempt to access the completion struct before it is initialized:

    Unable to handle kernel paging request for data at address 0x00000000
    Faulting instruction address: 0xc0000000081acbe0
    Oops: Kernel access of bad area, sig: 11 [#1]
    LE SMP NR_CPUS=2048 NUMA pSeries
    Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop
    CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1
    NIP: c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900
    REGS: c000000027f3f990 TRAP: 0300 Not tainted (4.18.0-64.el8.ppc64le)
    MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288 XER: 00000006
    CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
    GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58
    GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4
    GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28
    GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100
    GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006
    GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000
    GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003
    GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58
    NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100
    LR [c0000000081ad964] complete+0x64/0xa0
    Call Trace:
    [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable)
    [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0
    [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic]
    [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic]
    [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0
    [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400
    [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0
    [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210
    [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24
    [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130
    [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120

  Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | ipv6: sit: reset ip header pointer in ipip6_rcv
  Author: Lorenzo Bianconi; Age: 2019-04-04; Files: 1; Lines: -0/+4

  ipip6 tunnels run iptunnel_pull_header on received skbs. This can trigger the following use-after-free when accessing the iph pointer, since the packet will be 'uncloned' by running pskb_expand_head if it is a cloned gso skb (e.g. if the packet has been sent through a veth device).

    [ 706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
    [ 706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
    [ 706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
    [ 706.771839] Call trace:
    [ 706.801159] dump_backtrace+0x0/0x2f8
    [ 706.845079] show_stack+0x24/0x30
    [ 706.884833] dump_stack+0xe0/0x11c
    [ 706.925629] print_address_description+0x68/0x260
    [ 706.982070] kasan_report+0x178/0x340
    [ 707.025995] __asan_report_load1_noabort+0x30/0x40
    [ 707.083481] ipip6_rcv+0x1678/0x16e0 [sit]
    [ 707.132623] tunnel64_rcv+0xd4/0x200 [tunnel4]
    [ 707.185940] ip_local_deliver_finish+0x3b8/0x988
    [ 707.241338] ip_local_deliver+0x144/0x470
    [ 707.289436] ip_rcv_finish+0x43c/0x14b0
    [ 707.335447] ip_rcv+0x628/0x1138
    [ 707.374151] __netif_receive_skb_core+0x1670/0x2600
    [ 707.432680] __netif_receive_skb+0x28/0x190
    [ 707.482859] process_backlog+0x1d0/0x610
    [ 707.529913] net_rx_action+0x37c/0xf68
    [ 707.574882] __do_softirq+0x288/0x1018
    [ 707.619852] run_ksoftirqd+0x70/0xa8
    [ 707.662734] smpboot_thread_fn+0x3a4/0x9e8
    [ 707.711875] kthread+0x2c8/0x350
    [ 707.750583] ret_from_fork+0x10/0x18
    [ 707.811302] Allocated by task 16982:
    [ 707.854182] kasan_kmalloc.part.1+0x40/0x108
    [ 707.905405] kasan_kmalloc+0xb4/0xc8
    [ 707.948291] kasan_slab_alloc+0x14/0x20
    [ 707.994309] __kmalloc_node_track_caller+0x158/0x5e0
    [ 708.053902] __kmalloc_reserve.isra.8+0x54/0xe0
    [ 708.108280] __alloc_skb+0xd8/0x400
    [ 708.150139] sk_stream_alloc_skb+0xa4/0x638
    [ 708.200346] tcp_sendmsg_locked+0x818/0x2b90
    [ 708.251581] tcp_sendmsg+0x40/0x60
    [ 708.292376] inet_sendmsg+0xf0/0x520
    [ 708.335259] sock_sendmsg+0xac/0xf8
    [ 708.377096] sock_write_iter+0x1c0/0x2c0
    [ 708.424154] new_sync_write+0x358/0x4a8
    [ 708.470162] __vfs_write+0xc4/0xf8
    [ 708.510950] vfs_write+0x12c/0x3d0
    [ 708.551739] ksys_write+0xcc/0x178
    [ 708.592533] __arm64_sys_write+0x70/0xa0
    [ 708.639593] el0_svc_handler+0x13c/0x298
    [ 708.686646] el0_svc+0x8/0xc
    [ 708.739019] Freed by task 17:
    [ 708.774597] __kasan_slab_free+0x114/0x228
    [ 708.823736] kasan_slab_free+0x10/0x18
    [ 708.868703] kfree+0x100/0x3d8
    [ 708.905320] skb_free_head+0x7c/0x98
    [ 708.948204] skb_release_data+0x320/0x490
    [ 708.996301] pskb_expand_head+0x60c/0x970
    [ 709.044399] __iptunnel_pull_header+0x3b8/0x5d0
    [ 709.098770] ipip6_rcv+0x41c/0x16e0 [sit]
    [ 709.146873] tunnel64_rcv+0xd4/0x200 [tunnel4]
    [ 709.200195] ip_local_deliver_finish+0x3b8/0x988
    [ 709.255596] ip_local_deliver+0x144/0x470
    [ 709.303692] ip_rcv_finish+0x43c/0x14b0
    [ 709.349705] ip_rcv+0x628/0x1138
    [ 709.388413] __netif_receive_skb_core+0x1670/0x2600
    [ 709.446943] __netif_receive_skb+0x28/0x190
    [ 709.497120] process_backlog+0x1d0/0x610
    [ 709.544169] net_rx_action+0x37c/0xf68
    [ 709.589131] __do_softirq+0x288/0x1018
    [ 709.651938] The buggy address belongs to the object at ffffe01b6bd85580 which belongs to the cache kmalloc-1024 of size 1024
    [ 709.804356] The buggy address is located 117 bytes inside of 1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
    [ 709.946340] The buggy address belongs to the page:
    [ 710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
    [ 710.099914] flags: 0xfffff8000000100(slab)
    [ 710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
    [ 710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
    [ 710.334966] page dumped because: kasan: bad access detected

  Fix it by resetting the iph pointer after iptunnel_pull_header.

  Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
  Tested-by: Jianlin Shi <jishi@redhat.com>
  Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | net: bridge: always clear mcast matching struct on reports and leavesNikolay Aleksandrov2019-04-041-0/+3
| | | | | | We need to be careful and always zero the whole br_ip struct when it is used for matching since the rhashtable change. This patch fixes all the places which didn't properly clear it, which in turn might have caused mismatches. Thanks for the great bug report with reproducing steps and bisection.
| | | | | |
| | | | | | Steps to reproduce (from the bug report):
| | | | | |   ip link add br0 type bridge mcast_querier 1
| | | | | |   ip link set br0 up
| | | | | |   ip link add v2 type veth peer name v3
| | | | | |   ip link set v2 master br0
| | | | | |   ip link set v2 up
| | | | | |   ip link set v3 up
| | | | | |   ip addr add 3.0.0.2/24 dev v3
| | | | | |   ip netns add test
| | | | | |   ip link add v1 type veth peer name v1 netns test
| | | | | |   ip link set v1 master br0
| | | | | |   ip link set v1 up
| | | | | |   ip -n test link set v1 up
| | | | | |   ip -n test addr add 3.0.0.1/24 dev v1
| | | | | |   # Multicast receiver
| | | | | |   ip netns exec test socat UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -
| | | | | |   # Multicast sender
| | | | | |   echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588
| | | | | |
| | | | | | Reported-by: liam.mcbirnie@boeing.com
| | | | | | Fixes: 19e3a9c90c53 ("net: bridge: convert multicast to generic rhashtable")
| | | | | | Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
| | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
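Why zeroing matters for a hash-table key can be sketched in a few lines. This is only an illustration: the 20-byte layout, the `fill_key` helper and the field offsets are hypothetical stand-ins and not the real br_ip definition. The point is that a key compared byte-for-byte must not carry stale bytes in the parts the caller did not set.

```python
import struct

KEY_SIZE = 20  # hypothetical layout: 16 address bytes + 2 proto + 2 vid


def fill_key(buf, ip4, proto, vid, zero_first):
    """Fill a raw key buffer the way a caller would fill a lookup struct."""
    if zero_first:
        buf[:] = bytes(KEY_SIZE)          # clear every byte, including unused ones
    buf[0:4] = ip4                        # IPv4 uses only 4 of the 16 address bytes
    struct.pack_into("<HH", buf, 16, proto, vid)
    return bytes(buf)


ip4 = bytes([224, 224, 224, 224])
clean = fill_key(bytearray(KEY_SIZE), ip4, 0x0800, 0, zero_first=True)

# A reused buffer holding stale bytes, filled without clearing it first.
stale = fill_key(bytearray(b"\xaa" * KEY_SIZE), ip4, 0x0800, 0, zero_first=False)

# The same reused buffer, but cleared before use.
fixed = fill_key(bytearray(b"\xaa" * KEY_SIZE), ip4, 0x0800, 0, zero_first=True)

print(stale == clean)  # False: bytes 4..15 differ, so a byte-wise match fails
print(fixed == clean)  # True
```

A byte-wise comparison sees the unused address bytes too, which is why setting only the relevant fields is not enough.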
| | * | | | libcxgb: fix incorrect ppmax calculationVarun Prakash2019-04-041-1/+8
| | | | | | BITS_TO_LONGS() uses DIV_ROUND_UP(); because of this, the ppmax value can be greater than the available per-CPU page pods. This patch removes BITS_TO_LONGS() to fix this issue.
| | | | | |
| | | | | | Signed-off-by: Varun Prakash <varun@chelsio.com>
| | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
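The rounding effect described above can be shown with plain arithmetic. This is a sketch only: it assumes a 64-bit long and assumes the value is converted back from longs to bits afterwards. The helper names are illustrative and are not taken from libcxgb.

```python
BITS_PER_LONG = 64  # assumption: 64-bit kernel


def div_round_up(n, d):
    """Integer division rounding up, like the kernel's DIV_ROUND_UP()."""
    return (n + d - 1) // d


def bits_to_longs(nbits):
    """Number of longs needed to hold nbits, like BITS_TO_LONGS()."""
    return div_round_up(nbits, BITS_PER_LONG)


def bits_after_round_up(nbits):
    # Converting to longs and back rounds UP to a multiple of BITS_PER_LONG.
    return bits_to_longs(nbits) * BITS_PER_LONG


def bits_after_round_down(nbits):
    # Rounding down never exceeds the number of bits actually available.
    return (nbits // BITS_PER_LONG) * BITS_PER_LONG


available = 100
print(bits_after_round_up(available))    # 128, more than the 100 available
print(bits_after_round_down(available))  # 64, never more than available
```

Any count that is not already a multiple of the long size ends up larger than the input when rounded up, which matches the symptom described in the commit message.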
| | * | | | vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2xChris Leech2019-04-041-11/+15
| | | | | | Way back in 3c9c36bcedd426f2be2826da43e5163de61735f7 the ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x driver and used by bnx2fc without including the generic software fcoe module.
| | | | | |
| | | | | | But, FCoE is generally used over an 802.1q VLAN, and the implementation of ndo_fcoe_get_wwn in the 8021q module was not similarly changed. The result is that if CONFIG_FCOE is disabled, then bnx2fc cannot make a call to ndo_fcoe_get_wwn through the 8021q interface to the underlying bnx2x interface. The bnx2fc driver then falls back to a potentially different mapping of Ethernet MAC to Fibre Channel WWN, creating an incompatibility with the fabric and target configurations when compared to the WWNs used by pre-boot firmware and differently-configured kernels.
| | | | | |
| | | | | | So make the conditional inclusion of FCoE code in 8021q match the conditional inclusion in netdevice.h
| | | | | |
| | | | | | Signed-off-by: Chris Leech <cleech@redhat.com>
| | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-04-046-26/+208
| | |\ \ \ \
| | | | | | | Daniel Borkmann says:
| | | | | | |
| | | | | | | ====================
| | | | | | | pull-request: bpf 2019-04-04
| | | | | | |
| | | | | | | The following pull-request contains BPF updates for your *net* tree.
| | | | | | |
| | | | | | | The main changes are:
| | | | | | |
| | | | | | | 1) Batch of fixes to the existing BPF flow dissector API to support calling BPF programs from the eth_get_headlen context (support for latter is planned to be added in bpf-next), from Stanislav.
| | | | | | | ====================
| | | | | | |
| | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | flow_dissector: rst'ify documentationStanislav Fomichev2019-04-043-115/+127
| | | | | | | Rename bpf_flow_dissector.txt to bpf_flow_dissector.rst and fix formatting. Also, link it from the Documentation/networking/index.rst.
| | | | | | |
| | | | | | | Tested with 'make htmldocs' to make sure it looks reasonable.
| | | | | | |
| | | | | | | Fixes: ae82899bbe92 ("flow_dissector: document BPF flow dissector environment")
| | | | | | | Signed-off-by: Stanislav Fomichev <sdf@google.com>
| | | | | | | Acked-by: Martin KaFai Lau <kafai@fb.com>
| | | | | | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | * | | | Merge branch 'bpf-flow-dissector-fixes'Daniel Borkmann2019-04-035-26/+196
| | | |\ \ \ \
| | | | | | | | Stanislav Fomichev says:
| | | | | | | |
| | | | | | | | ====================
| | | | | | | | This patch series fixes the existing BPF flow dissector API to support calling BPF progs from the eth_get_headlen context (the support itself will be added in bpf-next tree).
| | | | | | | |
| | | | | | | | The summary of the changes:
| | | | | | | | * fix VLAN handling in bpf_flow.c, we don't need to peek back and look at skb->vlan_present; add selftests
| | | | | | | | * pass and use flow_keys->n_proto instead of skb->protocol
| | | | | | | | * fix clamping of flow_keys->nhoff for packets with nhoff > 0
| | | | | | | | * prohibit access to most of the __sk_buff fields from BPF flow dissector progs; only data/data_end/flow_keys are allowed (all input is now passed via flow_keys)
| | | | | | | | * finally, document BPF flow dissector program environment
| | | | | | | | ====================
| | | | | | | |
| | | | | | | | Acked-by: Willem de Bruijn <willemb@google.com>
| | | | | | | | Acked-by: Petar Penkov <peterpenkov96@gmail.com>
| | | | | | | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | | * | | | flow_dissector: document BPF flow dissector environmentStanislav Fomichev2019-04-031-0/+115
| | | | | | | | Short doc on what a BPF flow dissector program should expect in the input __sk_buff and flow_keys.
| | | | | | | |
| | | | | | | | Signed-off-by: Stanislav Fomichev <sdf@google.com>
| | | | | | | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | | * | | | flow_dissector: allow access only to a subset of __sk_buff fieldsStanislav Fomichev2019-04-031-13/+3
| | | | | | | | Use a whitelist instead of a blacklist and allow only a small set of fields that might be relevant in the context of flow dissector:
| | | | | | | | * data
| | | | | | | | * data_end
| | | | | | | | * flow_keys
| | | | | | | |
| | | | | | | | This is required for the eth_get_headlen case where we have only a chunk of data to dissect (i.e. trying to read the other skb fields doesn't make sense).
| | | | | | | |
| | | | | | | | Note that this is a breaking API change! However, we've provided flow_keys->n_proto as a substitute for skb->protocol, and there is no need to manually handle skb->vlan_present. So even if we break somebody, the migration is trivial. Unfortunately, we can't support the eth_get_headlen use-case without those breaking changes.
| | | | | | | |
| | | | | | | | Signed-off-by: Stanislav Fomichev <sdf@google.com>
| | | | | | | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
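The difference between the two policies can be sketched as follows. This is a toy model, not the verifier code: the function names and the contents of the deny-list are hypothetical; only the three allowed field names come from the commit message.

```python
# Fields named in the commit message as the only ones a flow dissector
# program may access.
ALLOWED_FIELDS = frozenset({"data", "data_end", "flow_keys"})

# Hypothetical deny-list, standing in for the previous approach.
DENIED_FIELDS = frozenset({"tc_classid", "data_meta"})


def allowed_by_whitelist(field):
    """Allow-list policy: anything not explicitly listed is rejected."""
    return field in ALLOWED_FIELDS


def allowed_by_blacklist(field):
    """Deny-list policy: anything not explicitly listed is accepted."""
    return field not in DENIED_FIELDS


# A field nobody thought about is rejected by the allow-list by default,
# while a deny-list would silently accept it.
print(allowed_by_whitelist("protocol"))   # False
print(allowed_by_blacklist("protocol"))   # True
print(allowed_by_whitelist("flow_keys"))  # True
```

With an allow-list, fields added to __sk_buff later stay inaccessible until someone decides they make sense in this context.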
| | | | * | | | flow_dissector: fix clamping of BPF flow_keys for non-zero nhoffStanislav Fomichev2019-04-031-1/+2
| | | | | | | | Don't allow BPF program to set flow_keys->nhoff to less than initial value. We currently don't read the value afterwards in anything but the tests, but it's still a good practice to return consistent values to the test programs.
| | | | | | | |
| | | | | | | | Signed-off-by: Stanislav Fomichev <sdf@google.com>
| | | | | | | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
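The clamping rule can be sketched like this. It is a minimal model under stated assumptions: the upper bound is assumed to be the packet length, and `clamp_nhoff` is an illustrative name rather than a kernel function.

```python
def clamp(value, lo, hi):
    """Restrict value to the inclusive range [lo, hi]."""
    return max(lo, min(value, hi))


def clamp_nhoff(prog_nhoff, initial_nhoff, pkt_len):
    """Offset reported back after the BPF program ran.

    The lower bound is the offset the program started from, not zero, so a
    program cannot move the network header offset before its initial value.
    The upper bound (packet length) is an assumption of this sketch.
    """
    return clamp(prog_nhoff, initial_nhoff, pkt_len)


# Packet that starts with a 14-byte Ethernet header (initial nhoff = 14).
print(clamp_nhoff(0, 14, 100))    # 14: raised to the initial value
print(clamp_nhoff(34, 14, 100))   # 34: already in range, unchanged
print(clamp_nhoff(500, 14, 100))  # 100: capped at the packet length
```

Clamping against zero instead of the initial offset would only be correct for packets where the initial nhoff is zero.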
| | | | * | | | net/flow_dissector: pass flow_keys->n_proto to BPF programsStanislav Fomichev2019-04-032-2/+5
| | | | | | | | This is a preparation for the next commit that would prohibit access to most fields of __sk_buff from the BPF programs.
| | | | | | | |
| | | | | | | | Instead of requiring BPF flow dissector programs to look into skb, pass all input data in the flow_keys.
| | | | | | | |
| | | | | | | | Signed-off-by: Stanislav Fomichev <sdf@google.com>
| | | | | | | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | | | * | | | selftests/bpf: fix vlan handling in flow dissector programStanislav Fomichev2019-04-032-11/+72
| | | |/ / / /
| | | | | | | When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use flow_keys->n_proto, that we set in parse_eth_proto instead and properly increment nhoff as well.
| | | | | | |
| | | | | | | Also, always use skb->protocol and don't look at skb->vlan_present. skb->vlan_present indicates that vlan information is stored out-of-band in skb->vlan_{tci,proto} and vlan header is already pulled from skb. That means, skb->vlan_present == true is not relevant for BPF flow dissector.
| | | | | | |
| | | | | | | Add simple test cases with VLAN tagged frames:
| | | | | | | * single vlan for ipv4
| | | | | | | * double vlan for ipv6
| | | | | | |
| | | | | | | Signed-off-by: Stanislav Fomichev <sdf@google.com>
| | | | | | | Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| | * | | | | Merge branch 'sch_cake-fixes'David S. Miller2019-04-041-1/+12
| | |\ \ \ \ \
| | | | | | | | Toke Høiland-Jørgensen says:
| | | | | | | |
| | | | | | | | ====================
| | | | | | | | sched: A few small fixes for sch_cake
| | | | | | | |
| | | | | | | | Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP diffserv fields. This series fixes those two issues, and should probably go into 4.19 as well. However, the previous refactoring patch means they don't apply as-is; I can send a follow-up directly to stable if that's OK with you?
| | | | | | | | ====================
| | | | | | | |
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | sch_cake: Make sure we can write the IP header before changing DSCP bitsToke Høiland-Jørgensen2019-04-041-0/+11
| | | | | | | | There is not actually any guarantee that the IP headers are valid before we access the DSCP bits of the packets. Fix this using the same approach taken in sch_dsmark.
| | | | | | | |
| | | | | | | | Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
| | | | | | | | Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | sch_cake: Use tc_skb_protocol() helper for getting packet protocolToke Høiland-Jørgensen2019-04-041-1/+1
| | |/ / / / /
| | | | | | | We shouldn't be using skb->protocol directly as that will miss cases with hardware-accelerated VLAN tags. Use the helper instead to get the right protocol number.
| | | | | | |
| | | | | | | Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
| | | | | | | Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
| | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | tcp: Ensure DCTCP reacts to lossesKoen De Schepper2019-04-041-18/+18
| | | | | | | RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to loss episodes in the same way as conventional TCP".
| | | | | | |
| | | | | | | Currently, Linux DCTCP performs no cwnd reduction when losses are encountered. Optionally, the dctcp_clamp_alpha_on_loss parameter resets alpha to its maximal value if an RTO happens. This behavior is sub-optimal for at least two reasons: i) it ignores losses triggering fast retransmissions; and ii) it causes an unnecessarily large cwnd reduction in the future if the loss was isolated, as it resets the historical term of DCTCP's alpha EWMA to its maximal value (i.e., denoting total congestion). The second reason has an especially noticeable effect when using DCTCP in high BDP environments, where alpha normally stays at low values.
| | | | | | |
| | | | | | | This patch replaces the clamping of alpha by setting ssthresh to half of cwnd for both fast retransmissions and RTOs, at most once per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter has been removed.
| | | | | | |
| | | | | | | The table below shows experimental results where we measured the drop probability of a PIE AQM (not applying ECN marks) at a bottleneck in the presence of a single TCP flow with either the alpha-clamping option enabled or the cwnd halving proposed by this patch. Results using reno or cubic are given for comparison.
| | | | | | |
| | | | | | |                   | Link    | RTT      | Drop
| | | | | | |  TCP CC           | speed   | base+AQM | probability
| | | | | | | ==================|=========|==========|============
| | | | | | |  CUBIC            | 40Mbps  | 7+20ms   | 0.21%
| | | | | | |  RENO             |         |          | 0.19%
| | | | | | |  DCTCP-CLAMP-ALPHA|         |          | 25.80%
| | | | | | |  DCTCP-HALVE-CWND |         |          | 0.22%
| | | | | | | ------------------|---------|----------|------------
| | | | | | |  CUBIC            | 100Mbps | 7+20ms   | 0.03%
| | | | | | |  RENO             |         |          | 0.02%
| | | | | | |  DCTCP-CLAMP-ALPHA|         |          | 23.30%
| | | | | | |  DCTCP-HALVE-CWND |         |          | 0.04%
| | | | | | | ------------------|---------|----------|------------
| | | | | | |  CUBIC            | 800Mbps | 1+1ms    | 0.04%
| | | | | | |  RENO             |         |          | 0.05%
| | | | | | |  DCTCP-CLAMP-ALPHA|         |          | 18.70%
| | | | | | |  DCTCP-HALVE-CWND |         |          | 0.06%
| | | | | | |
| | | | | | | We see that, without halving its cwnd for all sources of losses, DCTCP drives the AQM to large drop probabilities in order to keep the queue length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP reacts to all sources of losses, it can then be controlled by the AQM using drop levels similar to those of cubic or reno.
| | | | | | |
| | | | | | | Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
| | | | | | | Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
| | | | | | | Cc: Bob Briscoe <research@bobbriscoe.net>
| | | | | | | Cc: Lawrence Brakmo <brakmo@fb.com>
| | | | | | | Cc: Florian Westphal <fw@strlen.de>
| | | | | | | Cc: Daniel Borkmann <borkmann@iogearbox.net>
| | | | | | | Cc: Yuchung Cheng <ycheng@google.com>
| | | | | | | Cc: Neal Cardwell <ncardwell@google.com>
| | | | | | | Cc: Eric Dumazet <edumazet@google.com>
| | | | | | | Cc: Andrew Shewmaker <agshew@gmail.com>
| | | | | | | Cc: Glenn Judd <glenn.judd@morganstanley.com>
| | | | | | | Acked-by: Florian Westphal <fw@strlen.de>
| | | | | | | Acked-by: Neal Cardwell <ncardwell@google.com>
| | | | | | | Acked-by: Daniel Borkmann <daniel@iogearbox.net>
| | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
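The loss reaction described in this commit message (set ssthresh to half of cwnd, at most once per RTT) can be modelled roughly as follows. This is a sketch under assumptions: the class, the per-RTT flag and the floor of two segments are illustrative choices, not a copy of the kernel's DCTCP code.

```python
class LossReactionModel:
    """Toy model: halve ssthresh on loss, but only once per RTT."""

    MIN_SSTHRESH = 2  # assumption: never drop below two segments

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.reacted_this_rtt = False  # hypothetical bookkeeping flag

    def on_loss(self):
        """Called for both fast retransmissions and RTOs."""
        if self.reacted_this_rtt:
            return  # a second loss in the same RTT causes no further cut
        self.ssthresh = max(self.cwnd // 2, self.MIN_SSTHRESH)
        self.reacted_this_rtt = True

    def on_new_rtt(self):
        """Called when a new round trip starts."""
        self.reacted_this_rtt = False


flow = LossReactionModel(cwnd=100)
flow.on_loss()
print(flow.ssthresh)  # 50
flow.cwnd = 50
flow.on_loss()        # same RTT: ignored
print(flow.ssthresh)  # 50
flow.on_new_rtt()
flow.on_loss()        # new RTT: reacts again
print(flow.ssthresh)  # 25
```

Unlike clamping alpha to its maximum, this reaction leaves the congestion estimate untouched, so an isolated loss does not inflate later reductions.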
| | * | | | | net/sched: act_sample: fix divide by zero in the traffic pathDavide Caratti2019-04-042-2/+32
| | | | | | | The control path of the 'sample' action does not validate the value of 'rate' provided by the user, but then it uses it as divisor in the traffic path. Validate it in tcf_sample_init(), and return -EINVAL with a proper extack message in case that value is zero, to fix a splat with the script below:
| | | | | | |
| | | | | | |  # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
| | | | | | |  # tc -s a s action sample
| | | | | | |  total acts 1
| | | | | | |
| | | | | | |          action order 0: sample rate 1/0 group 1 pipe
| | | | | | |           index 2 ref 1 bind 1 installed 19 sec used 19 sec
| | | | | | |          Action statistics:
| | | | | | |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| | | | | | |          backlog 0b 0p requeues 0
| | | | | | |  # ping 192.0.2.1 -I test0 -c1 -q
| | | | | | |
| | | | | | |  divide error: 0000 [#1] SMP PTI
| | | | | | |  CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
| | | | | | |  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
| | | | | | |  RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
| | | | | | |  Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
| | | | | | |  RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
| | | | | | |  RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
| | | | | | |  RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
| | | | | | |  RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
| | | | | | |  R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
| | | | | | |  R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
| | | | | | |  FS: 00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
| | | | | | |  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| | | | | | |  CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
| | | | | | |  Call Trace:
| | | | | | |   tcf_action_exec+0x7c/0x1c0
| | | | | | |   tcf_classify+0x57/0x160
| | | | | | |   __dev_queue_xmit+0x3dc/0xd10
| | | | | | |   ip_finish_output2+0x257/0x6d0
| | | | | | |   ip_output+0x75/0x280
| | | | | | |   ip_send_skb+0x15/0x40
| | | | | | |   raw_sendmsg+0xae3/0x1410
| | | | | | |   sock_sendmsg+0x36/0x40
| | | | | | |   __sys_sendto+0x10e/0x140
| | | | | | |   __x64_sys_sendto+0x24/0x30
| | | | | | |   do_syscall_64+0x60/0x210
| | | | | | |   entry_SYSCALL_64_after_hwframe+0x49/0xbe
| | | | | | |  [...]
| | | | | | |  Kernel panic - not syncing: Fatal exception in interrupt
| | | | | | |
| | | | | | | Add a TDC selftest to document that 'rate' is now being validated.
| | | | | | |
| | | | | | | Reported-by: Matteo Croce <mcroce@redhat.com>
| | | | | | | Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action")
| | | | | | | Signed-off-by: Davide Caratti <dcaratti@redhat.com>
| | | | | | | Acked-by: Yotam Gigi <yotam.gi@gmail.com>
| | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
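The pattern behind this fix, validating a divisor in the control path so the traffic path can use it unconditionally, can be sketched as below. The function names, the error string and the modulo-based sampling decision are assumptions made for illustration; they are not taken from act_sample.

```python
import errno


def sample_init(rate):
    """Control path: reject a zero rate before the action is installed.

    Returns (0, None) on success or (-EINVAL, message) on failure, loosely
    mirroring a kernel init function that reports through extack.
    """
    if rate == 0:
        return -errno.EINVAL, "invalid sample rate"  # hypothetical message
    return 0, None


def should_sample(random_value, rate):
    """Traffic path: divides by rate, so rate must already be non-zero."""
    return random_value % rate == 0


err, msg = sample_init(0)
print(err == -errno.EINVAL, msg)  # True invalid sample rate

err, msg = sample_init(4)
print(err, should_sample(8, 4), should_sample(9, 4))  # 0 True False
```

Rejecting the value at configuration time keeps the per-packet path free of extra checks, at the cost of one comparison when the action is created.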
| | * | | | | net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stopLorenzo Bianconi2019-04-041-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a bpf program is uploaded, the driver computes the number of xdp tx queues resulting in the allocation of additional qsets. 
Starting from commit '2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them")' the driver runs link state polling for each VF resulting in the following NULL pointer dereference: [ 56.169256] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [ 56.178032] Mem abort info: [ 56.180834] ESR = 0x96000005 [ 56.183877] Exception class = DABT (current EL), IL = 32 bits [ 56.189792] SET = 0, FnV = 0 [ 56.192834] EA = 0, S1PTW = 0 [ 56.195963] Data abort info: [ 56.198831] ISV = 0, ISS = 0x00000005 [ 56.202662] CM = 0, WnR = 0 [ 56.205619] user pgtable: 64k pages, 48-bit VAs, pgdp = 0000000021f0c7a0 [ 56.212315] [0000000000000020] pgd=0000000000000000, pud=0000000000000000 [ 56.219094] Internal error: Oops: 96000005 [#1] SMP [ 56.260459] CPU: 39 PID: 2034 Comm: ip Not tainted 5.1.0-rc3+ #3 [ 56.266452] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018 [ 56.273315] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 56.278098] pc : __ll_sc___cmpxchg_case_acq_64+0x4/0x20 [ 56.283312] lr : mutex_lock+0x2c/0x50 [ 56.286962] sp : ffff0000219af1b0 [ 56.290264] x29: ffff0000219af1b0 x28: ffff800f64de49a0 [ 56.295565] x27: 0000000000000000 x26: 0000000000000015 [ 56.300865] x25: 0000000000000000 x24: 0000000000000000 [ 56.306165] x23: 0000000000000000 x22: ffff000011117000 [ 56.311465] x21: ffff800f64dfc080 x20: 0000000000000020 [ 56.316766] x19: 0000000000000020 x18: 0000000000000001 [ 56.322066] x17: 0000000000000000 x16: ffff800f2e077080 [ 56.327367] x15: 0000000000000004 x14: 0000000000000000 [ 56.332667] x13: ffff000010964438 x12: 0000000000000002 [ 56.337967] x11: 0000000000000000 x10: 0000000000000c70 [ 56.343268] x9 : ffff0000219af120 x8 : ffff800f2e077d50 [ 56.348568] x7 : 0000000000000027 x6 : 000000062a9d6a84 [ 56.353869] x5 : 0000000000000000 x4 : ffff800f2e077480 [ 56.359169] x3 : 0000000000000008 x2 : ffff800f2e077080 [ 56.364469] x1 : 0000000000000000 x0 : 0000000000000020 
[ 56.369770] Process ip (pid: 2034, stack limit = 0x00000000c862da3a) [ 56.376110] Call trace: [ 56.378546] __ll_sc___cmpxchg_case_acq_64+0x4/0x20 [ 56.383414] drain_workqueue+0x34/0x198 [ 56.387247] nicvf_open+0x48/0x9e8 [nicvf] [ 56.391334] nicvf_open+0x898/0x9e8 [nicvf] [ 56.395507] nicvf_xdp+0x1bc/0x238 [nicvf] [ 56.399595] dev_xdp_install+0x68/0x90 [ 56.403333] dev_change_xdp_fd+0xc8/0x240 [ 56.407333] do_setlink+0x8e0/0xbe8 [ 56.410810] __rtnl_newlink+0x5b8/0x6d8 [ 56.414634] rtnl_newlink+0x54/0x80 [ 56.418112] rtnetlink_rcv_msg+0x22c/0x2f8 [ 56.422199] netlink_rcv_skb+0x60/0x120 [ 56.426023] rtnetlink_rcv+0x28/0x38 [ 56.429587] netlink_unicast+0x1c8/0x258 [ 56.433498] netlink_sendmsg+0x1b4/0x350 [ 56.437410] sock_sendmsg+0x4c/0x68 [ 56.440887] ___sys_sendmsg+0x240/0x280 [ 56.444711] __sys_sendmsg+0x68/0xb0 [ 56.448275] __arm64_sys_sendmsg+0x2c/0x38 [ 56.452361] el0_svc_handler+0x9c/0x128 [ 56.456186] el0_svc+0x8/0xc [ 56.459056] Code: 35ffff91 2a1003e0 d65f03c0 f9800011 (c85ffc10) [ 56.465166] ---[ end trace 4a57fdc27b0a572c ]--- [ 56.469772] Kernel panic - not syncing: Fatal exception Fix it by checking nicvf_rx_mode_wq pointer in nicvf_open and nicvf_stop Fixes: 2ecbe4f4a027 ("net: thunderx: replace global nicvf_rx_mode_wq work queue for all VFs to private for each of them") Fixes: 2c632ad8bc74 ("net: thunderx: move link state polling function to VF") Reported-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | Merge branch 'net-hns-bugfixes-for-HNS-Driver'David S. Miller2019-04-0414-58/+69
| | |\ \ \ \ \
| | | | | | | | Yonglong Liu says:
| | | | | | | |
| | | | | | | | ====================
| | | | | | | | net: hns: bugfixes for HNS Driver
| | | | | | | |
| | | | | | | | This patchset fixes some bugs that were found while testing various scenarios, or identified by KASAN/sparse.
| | | | | | | | ====================
| | | | | | | |
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | net: hns: Fix sparse: some warnings in HNS driversYonglong Liu2019-04-0411-43/+33
| | | | | | | | There are some sparse warnings in the HNS drivers:
| | | | | | | |
| | | | | | | | warning: incorrect type in assignment (different address spaces)
| | | | | | | |     expected void [noderef] <asn:2> *io_base
| | | | | | | |     got void *vaddr
| | | | | | | | warning: cast removes address space '<asn:2>' of expression
| | | | | | | | [...]
| | | | | | | |
| | | | | | | | Add __iomem and change all the u8 __iomem to void __iomem to fix these kinds of warnings.
| | | | | | | |
| | | | | | | | warning: incorrect type in argument 1 (different address spaces)
| | | | | | | |     expected void [noderef] <asn:2> *base
| | | | | | | |     got unsigned char [usertype] *base_addr
| | | | | | | | warning: cast to restricted __le16
| | | | | | | | warning: incorrect type in assignment (different base types)
| | | | | | | |     expected unsigned int [usertype] tbl_tcam_data_high
| | | | | | | |     got restricted __le32 [usertype]
| | | | | | | | warning: cast to restricted __le32
| | | | | | | | [...]
| | | | | | | |
| | | | | | | | These variables use u32/u16 as their type and are ultimately passed as a parameter to writel(). writel() does the cpu_to_le32 conversion itself, so remove the little-endian conversion code to fix these kinds of warnings.
| | | | | | | |
| | | | | | | | Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | net: hns: Fix WARNING when remove HNS driver with SMMU enabledYonglong Liu2019-04-041-1/+3
    Removing the HNS driver while the SMMU is enabled triggers a WARNING:

    [ 141.924177] WARNING: CPU: 36 PID: 2708 at drivers/iommu/dma-iommu.c:443 __iommu_dma_unmap+0xc0/0xc8
    [ 141.954673] Modules linked in: hns_enet_drv(-)
    [ 141.963615] CPU: 36 PID: 2708 Comm: rmmod Tainted: G W 5.0.0-rc1-28723-gb729c57de95c-dirty #32
    [ 141.983593] Hardware name: Huawei D05/D05, BIOS Hisilicon D05 UEFI Nemo 1.8 RC0 08/31/2017
    [ 142.000244] pstate: 60000005 (nZCv daif -PAN -UAO)
    [ 142.009886] pc : __iommu_dma_unmap+0xc0/0xc8
    [ 142.018476] lr : __iommu_dma_unmap+0xc0/0xc8
    [ 142.027066] sp : ffff000013533b90
    [ 142.033728] x29: ffff000013533b90 x28: ffff8013e6983600
    [ 142.044420] x27: 0000000000000000 x26: 0000000000000000
    [ 142.055113] x25: 0000000056000000 x24: 0000000000000015
    [ 142.065806] x23: 0000000000000028 x22: ffff8013e66eee68
    [ 142.076499] x21: ffff8013db919800 x20: 0000ffffefbff000
    [ 142.087192] x19: 0000000000001000 x18: 0000000000000007
    [ 142.097885] x17: 000000000000000e x16: 0000000000000001
    [ 142.108578] x15: 0000000000000019 x14: 363139343a70616d
    [ 142.119270] x13: 6e75656761705f67 x12: 0000000000000000
    [ 142.129963] x11: 00000000ffffffff x10: 0000000000000006
    [ 142.140656] x9 : 1346c1aa88093500 x8 : ffff0000114de4e0
    [ 142.151349] x7 : 6662666578303d72 x6 : ffff0000105ffec8
    [ 142.162042] x5 : 0000000000000000 x4 : 0000000000000000
    [ 142.172734] x3 : 00000000ffffffff x2 : ffff0000114de500
    [ 142.183427] x1 : 0000000000000000 x0 : 0000000000000035
    [ 142.194120] Call trace:
    [ 142.199030] __iommu_dma_unmap+0xc0/0xc8
    [ 142.206920] iommu_dma_unmap_page+0x20/0x28
    [ 142.215335] __iommu_unmap_page+0x40/0x60
    [ 142.223399] hnae_unmap_buffer+0x110/0x134
    [ 142.231639] hnae_free_desc+0x6c/0x10c
    [ 142.239177] hnae_fini_ring+0x14/0x34
    [ 142.246540] hnae_fini_queue+0x2c/0x40
    [ 142.254080] hnae_put_handle+0x38/0xcc
    [ 142.261619] hns_nic_dev_remove+0x54/0xfc [hns_enet_drv]
    [ 142.272312] platform_drv_remove+0x24/0x64
    [ 142.280552] device_release_driver_internal+0x17c/0x20c
    [ 142.291070] driver_detach+0x4c/0x90
    [ 142.298259] bus_remove_driver+0x5c/0xd8
    [ 142.306148] driver_unregister+0x2c/0x54
    [ 142.314037] platform_driver_unregister+0x10/0x18
    [ 142.323505] hns_nic_dev_driver_exit+0x14/0xf0c [hns_enet_drv]
    [ 142.335248] __arm64_sys_delete_module+0x214/0x25c
    [ 142.344891] el0_svc_common+0xb0/0x10c
    [ 142.352430] el0_svc_handler+0x24/0x80
    [ 142.359968] el0_svc+0x8/0x7c0
    [ 142.366104] ---[ end trace 60ad1cd58e63c407 ]---

    TX ring buffers are mapped at xmit time and unmapped when xmit
    completes, so hnae_init_ring() does not map them. hnae_fini_ring(),
    however, performs an unmap on the TX ring buffers, which were already
    unmapped when xmit completed, and that causes this WARNING.

    hnae_alloc_buffers() is called in hnae_init_ring(), so
    hnae_free_buffers() should be in hnae_fini_ring(), not in
    hnae_free_desc().

    In hnae_fini_ring(), add an is_rx_ring() check as in hnae_init_ring().
    When the ring is a TX ring, add a piece of code to ensure that the TX
    ring is unmapped.

    Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | net: hns: fix ICMP6 neighbor solicitation messages discard problem  Yonglong Liu  2019-04-04  1  -6/+27

    Hip06 chips discard ICMP6 neighbor solicitation messages because the
    forwarding pool is not set. Enabling promisc mode has the same problem.

    This patch fixes the wrong forwarding table configs for multicast
    vague matching when promisc mode is enabled, and adds a forwarding
    pool for the forwarding table.

    Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | net: hns: Fix probabilistic memory overwrite when HNS driver initialized  Yonglong Liu  2019-04-04  1  -1/+1

    Rebooting the system repeatedly may cause a memory overwrite.

    [ 15.638922] systemd[1]: Reached target Swap.
    [ 15.667561] tun: Universal TUN/TAP device driver, 1.6
    [ 15.676756] Bridge firewalling registered
    [ 17.344135] Unable to handle kernel paging request at virtual address 0000000200000040
    [ 17.352179] Mem abort info:
    [ 17.355007] ESR = 0x96000004
    [ 17.358105] Exception class = DABT (current EL), IL = 32 bits
    [ 17.364112] SET = 0, FnV = 0
    [ 17.367209] EA = 0, S1PTW = 0
    [ 17.370393] Data abort info:
    [ 17.373315] ISV = 0, ISS = 0x00000004
    [ 17.377206] CM = 0, WnR = 0
    [ 17.380214] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
    [ 17.386926] [0000000200000040] pgd=0000000000000000
    [ 17.391878] Internal error: Oops: 96000004 [#1] SMP
    [ 17.396824] CPU: 23 PID: 95 Comm: kworker/u130:0 Tainted: G E 4.19.25-1.2.78.aarch64 #1
    [ 17.414175] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.54 08/16/2018
    [ 17.425615] Workqueue: events_unbound async_run_entry_fn
    [ 17.435151] pstate: 00000005 (nzcv daif -PAN -UAO)
    [ 17.444139] pc : __mutex_lock.isra.1+0x74/0x540
    [ 17.453002] lr : __mutex_lock.isra.1+0x3c/0x540
    [ 17.461701] sp : ffff000100d9bb60
    [ 17.469146] x29: ffff000100d9bb60 x28: 0000000000000000
    [ 17.478547] x27: 0000000000000000 x26: ffff802fb8945000
    [ 17.488063] x25: 0000000000000000 x24: ffff802fa32081a8
    [ 17.497381] x23: 0000000000000002 x22: ffff801fa2b15220
    [ 17.506701] x21: ffff000009809000 x20: ffff802fa23a0888
    [ 17.515980] x19: ffff801fa2b15220 x18: 0000000000000000
    [ 17.525272] x17: 0000000200000000 x16: 0000000200000000
    [ 17.534511] x15: 0000000000000000 x14: 0000000000000000
    [ 17.543652] x13: ffff000008d95db8 x12: 000000000000000d
    [ 17.552780] x11: ffff000008d95d90 x10: 0000000000000b00
    [ 17.561819] x9 : ffff000100d9bb90 x8 : ffff802fb89d6560
    [ 17.570829] x7 : 0000000000000004 x6 : 00000004a1801d05
    [ 17.579839] x5 : 0000000000000000 x4 : 0000000000000000
    [ 17.588852] x3 : ffff802fb89d5a00 x2 : 0000000000000000
    [ 17.597734] x1 : 0000000200000000 x0 : 0000000200000000
    [ 17.606631] Process kworker/u130:0 (pid: 95, stack limit = 0x(____ptrval____))
    [ 17.617438] Call trace:
    [ 17.623349] __mutex_lock.isra.1+0x74/0x540
    [ 17.630927] __mutex_lock_slowpath+0x24/0x30
    [ 17.638602] mutex_lock+0x50/0x60
    [ 17.645295] drain_workqueue+0x34/0x198
    [ 17.652623] __sas_drain_work+0x7c/0x168
    [ 17.659903] sas_drain_work+0x60/0x68
    [ 17.666947] hisi_sas_scan_finished+0x30/0x40 [hisi_sas_main]
    [ 17.676129] do_scsi_scan_host+0x70/0xb0
    [ 17.683534] do_scan_async+0x20/0x228
    [ 17.690586] async_run_entry_fn+0x4c/0x1d0
    [ 17.697997] process_one_work+0x1b4/0x3f8
    [ 17.705296] worker_thread+0x54/0x470

    The call trace differs every time, but the overwritten address is
    always the same:

    Unable to handle kernel paging request at virtual address 0000000200000040

    The root cause is that the write to the register
    XGMAC_MAC_TX_LF_RF_CONTROL_REG did not use the io_base offset.

    Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
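
The root cause described above comes down to a general rule: a register offset is only meaningful relative to the device's mapped base. The following is a minimal user-space sketch of that rule, not the driver's code. `struct toy_dev`, `toy_reg_write()` and `TOY_CONTROL_REG` are hypothetical names, and a plain array stands in for the mapped register window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte offset of a control register inside one device's window. */
#define TOY_CONTROL_REG 0x10u

/* Hypothetical stand-in for a device: io_base points at its register window. */
struct toy_dev {
    uint32_t *io_base;
};

/*
 * Write a 32-bit register. The byte offset is always applied on top of
 * io_base, so the write cannot land outside this device's window as long
 * as the offset itself is valid for the window.
 */
static void toy_reg_write(struct toy_dev *dev, size_t reg, uint32_t val)
{
    dev->io_base[reg / sizeof(uint32_t)] = val;
}
```

Routing every access through one helper that takes the device makes it harder to use a bare offset as an address by mistake.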
| | | * | | | | net: hns: Use NAPI_POLL_WEIGHT for hns driver  Yonglong Liu  2019-04-04  1  -5/+2

    Loading the HNS driver always prints an error:

    "netif_napi_add() called with weight 256"

    This is because the kernel checks the NAPI polling weights requested
    by drivers and prints an error message if a driver requests a weight
    greater than 64.

    Use NAPI_POLL_WEIGHT to fix it.

    Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * | | | | net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()  Liubin Shu  2019-04-04  1  -2/+3
| | |/ / / / /

    This patch fixes the following issue:

    [27237.844750] BUG: KASAN: use-after-free in hns_nic_net_xmit_hw+0x708/0xa18[hns_enet_drv]

    After hnae_queue_xmit() in hns_nic_net_xmit_hw(), execution can be
    interrupted, and the interrupt path then calls hns_nic_tx_poll_one()
    to handle the new packets and free the skb. When control returns to
    hns_nic_net_xmit_hw(), reading skb->len is a use-after-free.

    This patch updates the tx ring statistics in hns_nic_tx_poll_one() to
    fix the bug.

    Signed-off-by: Liubin Shu <shuliubin@huawei.com>
    Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
    Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
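
The ownership rule behind this fix can be sketched in a few lines of user-space C. This is a toy model under stated assumptions, not the driver's code: `struct toy_pkt`, `struct toy_stats`, `toy_tx_complete()` and `toy_xmit()` are hypothetical names, and completion runs synchronously here, whereas in the driver it runs from the TX poll path. The point is that statistics are taken where the packet is still owned, and the transmit path never touches the packet after handing it off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and for per-ring counters. */
struct toy_pkt {
    size_t len;
};

struct toy_stats {
    unsigned long tx_pkts;
    unsigned long tx_bytes;
};

/* Completion path: account for the packet while it is still owned, then free it. */
static void toy_tx_complete(struct toy_pkt *pkt, struct toy_stats *stats)
{
    stats->tx_pkts++;
    stats->tx_bytes += pkt->len;
    free(pkt);
}

/*
 * Transmit path: once the packet is handed off, completion may free it at
 * any time. In this toy model completion runs immediately, so pkt is
 * already gone when toy_tx_complete() returns.
 */
static void toy_xmit(struct toy_pkt *pkt, struct toy_stats *stats)
{
    toy_tx_complete(pkt, stats);
    /* pkt must not be dereferenced past this point. */
}
```

An alternative with the same effect would be to copy the length into a local variable before the handoff. The sketch mirrors the commit's choice of doing the accounting in the completion path instead.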
| | * | | | | ipv6: Fix dangling pointer when ipv6 fragment  Junwei Hu  2019-04-03  1  -1/+3

    At the beginning of ip6_fragment(), the prevhdr pointer is obtained
    from ip6_find_1stfragopt(). However, all pointers into the skb header
    may change when skb_checksum_help() is called with
    skb->ip_summed = CHECKSUM_PARTIAL. The prevhdr pointer will be
    dangling if it is not reloaded after skb_checksum_help() calls
    __skb_linearize().

    Here, I add a variable, nexthdr_offset, to record the offset, which
    does not change even after __skb_linearize() is called.

    Fixes: 405c92f7a541 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
    Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
    Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
    Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
    Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
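
The offset-instead-of-pointer technique this commit describes can be illustrated outside the kernel. The sketch below is a user-space analogy only: `struct toy_buf`, `toy_relocate()` and `toy_patch_field()` are hypothetical names, and `toy_relocate()` stands in for any helper that may move the underlying storage, as linearizing an skb can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer whose storage may be reallocated. */
struct toy_buf {
    unsigned char *data;
    size_t len;
};

/* Hypothetical helper that always moves the data to a fresh allocation. */
static int toy_relocate(struct toy_buf *b)
{
    unsigned char *fresh = malloc(b->len);

    if (!fresh)
        return -1;
    memcpy(fresh, b->data, b->len);
    free(b->data);
    b->data = fresh;
    return 0;
}

/*
 * Overwrite the byte that `field` points at, even though the buffer is
 * relocated in between. The offset is recorded first because it stays
 * valid across the move, and the pointer is recomputed from it afterwards.
 */
static int toy_patch_field(struct toy_buf *b, unsigned char *field,
                           unsigned char value)
{
    size_t field_offset = (size_t)(field - b->data);

    if (toy_relocate(b) != 0)
        return -1;
    field = b->data + field_offset; /* reload: the old pointer is dangling */
    *field = value;
    return 0;
}
```

Writing through the original `field` pointer after `toy_relocate()` would touch freed memory, which is the same class of bug as the dangling prevhdr pointer.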
| | * | | | | net-gro: Fix GRO flush when receiving a GSO packet.  Steffen Klassert  2019-04-03  1  -1/+1

    Currently we may incorrectly merge a received GSO packet or a packet
    with frag_list into a packet sitting in the gro_hash list.
    skb_segment() may crash in that case because the assumptions on the
    skb layout are not met. The correct behaviour would be to flush the
    packet in the gro_hash list and send the received GSO packet directly
    afterwards. Commit d61d072e87c8e ("net-gro: avoid reorders") sets
    NAPI_GRO_CB(skb)->flush in this case, but this is not checked before
    merging. This patch makes sure to check this flag and to not merge in
    that case.

    Fixes: d61d072e87c8e ("net-gro: avoid reorders")
    Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | Merge branch '40GbE' of ↵  David S. Miller  2019-04-02  3  -14/+33
| | |\ \ \ \ \
| | | |/ / / /
| | |/| | | |

    git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

    Jeff Kirsher says:

    ====================
    Intel Wired LAN Driver Fixes 2019-04-01

    This series contains two fixes for XDP in the i40e driver.

    Björn provides both fixes, first moving a function out of the header
    and into the main.c file. Second fixes a regression introduced in an
    earlier patch that removed umem from the VSI. This caused an issue
    because the setup code would try to enable AF_XDP zero copy
    unconditionally, as long as there was a umem placed in the netdev
    receive structure.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>