summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Linux 2.6.32.70v2.6.32.70Willy Tarreau2016-01-291-1/+1
| | | | Signed-off-by: Willy Tarreau <w@1wt.eu>
* kvm: x86: only channel 0 of the i8254 is linked to the HPETPaolo Bonzini2016-01-292-1/+3
| | | | | | | | | | | | | | | | | commit e5e57e7a03b1cdcb98e4aed135def2a08cbf3257 upstream. While setting the KVM PIT counters in 'kvm_pit_load_count', if 'hpet_legacy_start' is set, the function disables the timer on channel[0], instead of the respective index 'channel'. This is because channels 1-3 are not linked to the HPET. Fix the caller to only activate the special HPET processing for channel 0. Reported-by: P J P <pjp@fedoraproject.org> Fixes: 0185604c2d82c560dab2f2933a18f797e74ab5a8 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit ef90cf3d0b59e3b1dcfe94d1a241107667e6e96a) Signed-off-by: Willy Tarreau <w@1wt.eu>
* KVM: x86: Reload pit counters for all channels when restoring stateAndrew Honig2016-01-291-3/+6
| | | | | | | | | | | | | | | | | | commit 0185604c2d82c560dab2f2933a18f797e74ab5a8 upstream. Currently if userspace restores the pit counters with a count of 0 on channels 1 or 2 and the guest attempts to read the count on those channels, then KVM will perform a mod of 0 and crash. This will ensure that 0 values are converted to 65536 as per the spec. This is CVE-2015-7513. Signed-off-by: Andy Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 08b8d1a6ccdefd3d517d04c472b7f42f51b3059b) Signed-off-by: Willy Tarreau <w@1wt.eu>
* mm/memory_hotplug.c: check for missing sections in test_pages_in_a_zone()Andrew Banman2016-01-291-12/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5f0f2887f4de9508dcf438deab28f1de8070c271 upstream. test_pages_in_a_zone() does not account for the possibility of missing sections in the given pfn range. pfn_valid_within always returns 1 when CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing sections to pass the test, leading to a kernel oops. Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check for missing sections before proceeding into the zone-check code. This also prevents a crash from offlining memory devices with missing sections. Despite this, it may be a good idea to keep the related patch '[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with missing sections' because missing sections in a memory block may lead to other problems not covered by the scope of this fix. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Alex Thorlton <athorlton@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 17f6a291c98199d7ce15a850ce5f548ceef628bc) Signed-off-by: Willy Tarreau <w@1wt.eu>
* mm: add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macroDaniel Kiper2016-01-291-0/+6
| | | | | | | | | | | | | | | | | commit a539f3533b78e39a22723d6d3e1e11b6c14454d9 upstream. Add SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macro which aligns given pfn to upper section and lower section boundary accordingly. Required for the latest memory hotplug support for the Xen balloon driver. Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: only needed for next patch] Signed-off-by: Willy Tarreau <w@1wt.eu>
* ipv6/addrlabel: fix ip6addrlbl_get()Andrey Ryabinin2016-01-291-1/+1
| | | | | | | | | | | | | | | | | | | commit e459dfeeb64008b2d23bdf600f03b3605dbb8152 upstream. ip6addrlbl_get() has never worked. If ip6addrlbl_hold() succeeded, ip6addrlbl_get() will exit with '-ESRCH'. If ip6addrlbl_hold() failed, ip6addrlbl_get() will use about to be free ip6addrlbl_entry pointer. Fix this by inverting ip6addrlbl_hold() check. Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Cong Wang <cwang@twopensource.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 39b214ba1a357359f9c0be6ef8d21f2e5187567a) Signed-off-by: Willy Tarreau <w@1wt.eu>
* parisc: Fix syscall restartsHelge Deller2016-01-291-12/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 71a71fb5374a23be36a91981b5614590b9e722c3 upstream. On parisc syscalls which are interrupted by signals sometimes failed to restart and instead returned -ENOSYS which in the worst case lead to userspace crashes. A similiar problem existed on MIPS and was fixed by commit e967ef02 ("MIPS: Fix restart of indirect syscalls"). On parisc the current syscall restart code assumes that all syscall callers load the syscall number in the delay slot of the ble instruction. That's how it is e.g. done in the unistd.h header file: ble 0x100(%sr2, %r0) ldi #syscall_nr, %r20 Because of that assumption the current code never restored %r20 before returning to userspace. This assumption is at least not true for code which uses the glibc syscall() function, which instead uses this syntax: ble 0x100(%sr2, %r0) copy regX, %r20 where regX depend on how the compiler optimizes the code and register usage. This patch fixes this problem by adding code to analyze how the syscall number is loaded in the delay branch and - if needed - copy the syscall number to regX prior returning to userspace for the syscall restart. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 9f2dcffefc599c424ad1dd402e3a96da60639308) Signed-off-by: Willy Tarreau <w@1wt.eu>
* MIPS: Fix restart of indirect syscallsEd Swierk2016-01-292-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e967ef022e00bb7c2e5b1a42007abfdd52055050 upstream. When 32-bit MIPS userspace invokes a syscall indirectly via syscall(number, arg1, ..., arg7), the kernel looks up the actual syscall based on the given number, shifts the other arguments to the left, and jumps to the syscall. If the syscall is interrupted by a signal and indicates it needs to be restarted by the kernel (by returning ERESTARTNOINTR for example), the syscall must be called directly, since the number is no longer the first argument, and the other arguments are now staged for a direct call. Before shifting the arguments, store the syscall number in pt_regs->regs[2]. This gets copied temporarily into pt_regs->regs[0] after the syscall returns. If the syscall needs to be restarted, handle_signal()/do_signal() copies the number back to pt_regs->reg[2], which ends up in $v0 once control returns to userspace. Signed-off-by: Ed Swierk <eswierk@skyportsystems.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/8929/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 08f865bba9c705aef95268a33393698e5385587e) Signed-off-by: Willy Tarreau <w@1wt.eu>
* USB: fix invalid memory access in hub_activate()Alan Stern2016-01-291-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e50293ef9775c5f1cf3fcc093037dd6a8c5684ea upstream. Commit 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") changed the hub_activate() routine to make part of it run in a workqueue. However, the commit failed to take a reference to the usb_hub structure or to lock the hub interface while doing so. As a result, if a hub is plugged in and quickly unplugged before the work routine can run, the routine will try to access memory that has been deallocated. Or, if the hub is unplugged while the routine is running, the memory may be deallocated while it is in active use. This patch fixes the problem by taking a reference to the usb_hub at the start of hub_activate() and releasing it at the end (when the work is finished), and by locking the hub interface while the work routine is running. It also adds a check at the start of the routine to see if the hub has already been disconnected, in which nothing should be done. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Alexandru Cornea <alexandru.cornea@intel.com> Tested-by: Alexandru Cornea <alexandru.cornea@intel.com> Fixes: 8520f38099cc ("USB: change hub initialization sleeps to delayed_work") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: add prototype for hub_release() before first use] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 10037421b529bc1fc18994e94e37d745184c4ea9) [wt: made a few changes : - adjusted context due to some autopm code being added only in 2.6.33 - no device_{lock,unlock}() in 2.6.32, use up/down(&->sem) instead] Signed-off-by: Willy Tarreau <w@1wt.eu>
* USB: ipaq.c: fix a timeout loopDan Carpenter2016-01-291-2/+2
| | | | | | | | | | | | | | | commit abdc9a3b4bac97add99e1d77dc6d28623afe682b upstream. The code expects the loop to end with "retries" set to zero but, because it is a post-op, it will end set to -1. I have fixed this by moving the decrement inside the loop. Fixes: 014aa2a3c32e ('USB: ipaq: minor ipaq_open() cleanup.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 53a68d3f1629de82ddeb4e0882b0727fc230a6f3) Signed-off-by: Willy Tarreau <w@1wt.eu>
* s390/dis: Fix handling of format specifiersMichael Holzheu2016-01-291-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 272fa59ccb4fc802af28b1d699c2463db6a71bf7 upstream. The print_insn() function returns strings like "lghi %r1,0". To escape the '%' character in sprintf() a second '%' is used. For example "lghi %%r1,0" is converted into "lghi %r1,0". After print_insn() the output string is passed to printk(). Because format specifiers like "%r" or "%f" are ignored by printk() this works by chance most of the time. But for instructions with control registers like "lctl %c6,%c6,780" this fails because printk() interprets "%c" as character format specifier. Fix this problem and escape the '%' characters twice. For example "lctl %%%%c6,%%%%c6,780" is then converted by sprintf() into "lctl %%c6,%%c6,780" and by printk() into "lctl %c6,%c6,780". Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [bwh: Backported to 3.2: drop the OPERAND_VR case] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 45f32e359d36dd045e4af0d1bea8bbbafd81eeb7) Signed-off-by: Willy Tarreau <w@1wt.eu>
* spi: fix parent-device reference leakJohan Hovold2016-01-291-1/+1
| | | | | | | | | | | | | | | | | | commit 157f38f993919b648187ba341bfb05d0e91ad2f6 upstream. Fix parent-device reference leak due to SPI-core taking an unnecessary reference to the parent when allocating the master structure, a reference that was never released. Note that driver core takes its own reference to the parent when the master device is registered. Fixes: 49dce689ad4e ("spi doesn't need class_device") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 40ddf30ca43b50019304303949c9b2676cfbd820) Signed-off-by: Willy Tarreau <w@1wt.eu>
* ser_gigaset: fix deallocation of platform device structureTilman Schmidt2016-01-291-3/+7
| | | | | | | | | | | | | | | | | | commit 4c5e354a974214dfb44cd23fa0429327693bc3ea upstream. When shutting down the device, the struct ser_cardstate must not be kfree()d immediately after the call to platform_device_unregister() since the embedded struct platform_device is still in use. Move the kfree() call to the release method instead. Signed-off-by: Tilman Schmidt <tilman@imap.cc> Fixes: 2869b23e4b95 ("drivers/isdn/gigaset: new M101 driver (v2)") Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 521e4101aaf46b2f37088453a1e935604cdd7f2a) Signed-off-by: Willy Tarreau <w@1wt.eu>
* mISDN: fix a loop countDan Carpenter2016-01-291-4/+3
| | | | | | | | | | | | | | | | | | commit 40d24c4d8a7430aa4dfd7a665fa3faf3b05b673f upstream. There are two issue here. 1) cnt starts as maxloop + 1 so all these loops iterate one more time than intended. 2) At the end of the loop we test for "if (maxloop && !cnt)" but for the first two loops, we end with cnt equal to -1. Changing this to a pre-op means we end with cnt set to 0. Fixes: cae86d4a4e56 ('mISDN: Add driver for Infineon ISDN chipset family') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 7eb2a0151e7c8a95d1be33a923718fb690452c61) Signed-off-by: Willy Tarreau <w@1wt.eu>
* tty: Fix GPF in flush_to_ldisc()Peter Hurley2016-01-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9ce119f318ba1a07c29149301f1544b6c4bea52a upstream. A line discipline which does not define a receive_buf() method can can cause a GPF if data is ever received [1]. Oddly, this was known to the author of n_tracesink in 2011, but never fixed. [1] GPF report BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 3752d067 PUD 37a7b067 PMD 0 Oops: 0010 [#1] SMP KASAN Modules linked in: CPU: 2 PID: 148 Comm: kworker/u10:2 Not tainted 4.4.0-rc2+ #51 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events_unbound flush_to_ldisc task: ffff88006da94440 ti: ffff88006db60000 task.ti: ffff88006db60000 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff88006db67b50 EFLAGS: 00010246 RAX: 0000000000000102 RBX: ffff88003ab32f88 RCX: 0000000000000102 RDX: 0000000000000000 RSI: ffff88003ab330a6 RDI: ffff88003aabd388 RBP: ffff88006db67c48 R08: ffff88003ab32f9c R09: ffff88003ab31fb0 R10: ffff88003ab32fa8 R11: 0000000000000000 R12: dffffc0000000000 R13: ffff88006db67c20 R14: ffffffff863df820 R15: ffff88003ab31fb8 FS: 0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 0000000037938000 CR4: 00000000000006e0 Stack: ffffffff829f46f1 ffff88006da94bf8 ffff88006da94bf8 0000000000000000 ffff88003ab31fb0 ffff88003aabd438 ffff88003ab31ff8 ffff88006430fd90 ffff88003ab32f9c ffffed0007557a87 1ffff1000db6cf78 ffff88003ab32078 Call Trace: [<ffffffff8127cf91>] process_one_work+0x8f1/0x17a0 kernel/workqueue.c:2030 [<ffffffff8127df14>] worker_thread+0xd4/0x1180 kernel/workqueue.c:2162 [<ffffffff8128faaf>] kthread+0x1cf/0x270 drivers/block/aoe/aoecmd.c:1302 [<ffffffff852a7c2f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468 Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff88006db67b50> CR2: 0000000000000000 ---[ end trace a587f8947e54d6ea ]--- Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit b23324ffa8ef8cc96865db76db938905d61d949a) [wt: applied to drivers/char/tty_buffer.c instead] Signed-off-by: Willy Tarreau <w@1wt.eu>
* ses: fix additional element traversal bugJames Bottomley2016-01-292-1/+13
| | | | | | | | | | | | | | | | | | | | commit 5e1033561da1152c57b97ee84371dba2b3d64c25 upstream. KASAN found that our additional element processing scripts drop off the end of the VPD page into unallocated space. The reason is that not every element has additional information but our traversal routines think they do, leading to them expecting far more additional information than is present. Fix this by adding a gate to the traversal routine so that it only processes elements that are expected to have additional information (list is in SES-2 section 6.1.13.1: Additional Element Status diagnostic page overview) Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 344d6d02690a650607e6372bce8fe40e647a51b5) Signed-off-by: Willy Tarreau <w@1wt.eu>
* ses: Fix problems with simple enclosuresJames Bottomley2016-01-291-1/+19
| | | | | | | | | | | | | | | | | | | | | | commit 3417c1b5cb1fdc10261dbed42b05cc93166a78fd upstream. Simple enclosure implementations (mostly USB) are allowed to return only page 8 to every diagnostic query. That really confuses our implementation because we assume the return is the page we asked for and end up doing incorrect offsets based on bogus information leading to accesses outside of allocated ranges. Fix that by checking the page code of the return and giving an error if it isn't the one we asked for. This should fix reported bugs with USB storage by simply refusing to attach to enclosures that behave like this. It's also good defensive practise now that we're starting to see more USB enclosures. Reported-by: Andrea Gelmini <andrea.gelmini@gelma.net> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 25ef938516d26c3db9cb1f5b927ad7ee95be1022) Signed-off-by: Willy Tarreau <w@1wt.eu>
* rfkill: copy the name into the rfkill structJohannes Berg2016-01-291-3/+3
| | | | | | | | | | | | | | | | | commit b7bb110008607a915298bf0f47d25886ecb94477 upstream. Some users of rfkill, like NFC and cfg80211, use a dynamic name when allocating rfkill, in those cases dev_name(). Therefore, the pointer passed to rfkill_alloc() might not be valid forever, I specifically found the case that the rfkill name was quite obviously an invalid pointer (or at least garbage) when the wiphy had been renamed. Fix this by making a copy of the rfkill name in rfkill_alloc(). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 6f23bc6f6be370267332a0278a4646126836baee) Signed-off-by: Willy Tarreau <w@1wt.eu>
* af_unix: fix a fatal race with bit fieldsEric Dumazet2016-01-292-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 60bc851ae59bfe99be6ee89d6bc50008c85ec75d upstream. Using bit fields is dangerous on ppc64/sparc64, as the compiler [1] uses 64bit instructions to manipulate them. If the 64bit word includes any atomic_t or spinlock_t, we can lose critical concurrent changes. This is happening in af_unix, where unix_sk(sk)->gc_candidate/ gc_maybe_cycle/lock share the same 64bit word. This leads to fatal deadlock, as one/several cpus spin forever on a spinlock that will never be available again. A safer way would be to use a long to store flags. This way we are sure compiler/arch wont do bad things. As we own unix_gc_lock spinlock when clearing or setting bits, we can use the non atomic __set_bit()/__clear_bit(). recursion_level can share the same 64bit location with the spinlock, as it is set only with this spinlock held. [1] bug fixed in gcc-4.8.0 : http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Reported-by: Ambrose Feinstein <ambrose@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 2ee9cbe7e7bfe2d36374288b818aa31b2c4981db) [wt: adjusted context] Signed-off-by: Willy Tarreau <w@1wt.eu>
* sctp: update the netstamp_needed counter when copying socketsMarcelo Ricardo Leitner2016-01-292-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 01ce63c90170283a9855d1db4fe81934dddce648 ] Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy related to disabling sock timestamp. When SCTP accepts an association or peel one off, it copies sock flags but forgot to call net_enable_timestamp() if a packet timestamping flag was copied, leading to extra calls to net_disable_timestamp() whenever such clones were closed. The fix is to call net_enable_timestamp() whenever we copy a sock with that flag on, like tcp does. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: SK_FLAGS_TIMESTAMP is newly defined] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit d85242d91610acbe4f905624a5758a01ae7bb32c) Signed-off-by: Willy Tarreau <w@1wt.eu>
* net, scm: fix PaX detected msg_controllen overflow in scm_detach_fdsDaniel Borkmann2016-01-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6900317f5eff0a7070c5936e5383f589e0de7a09 ] David and HacKurx reported a following/similar size overflow triggered in a grsecurity kernel, thanks to PaX's gcc size overflow plugin: (Already fixed in later grsecurity versions by Brad and PaX Team.) [ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314 cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr; [ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7 [ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...] [ 1002.296153] ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8 [ 1002.296162] ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8 [ 1002.296169] ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60 [ 1002.296176] Call Trace: [ 1002.296190] [<ffffffff818129ba>] dump_stack+0x45/0x57 [ 1002.296200] [<ffffffff8121f838>] report_size_overflow+0x38/0x60 [ 1002.296209] [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300 [ 1002.296220] [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930 [ 1002.296228] [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60 [ 1002.296236] [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50 [ 1002.296243] [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60 [ 1002.296248] [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0 [ 1002.296257] [<ffffffff81693496>] __sys_recvmsg+0x46/0x80 [ 1002.296263] [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40 [ 1002.296271] [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85 Further investigation showed that this can happen when an *odd* number of fds are being passed over AF_UNIX sockets. In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)), where i is the number of successfully passed fds, differ by 4 bytes due to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary on 64 bit. The padding is used to align subsequent cmsg headers in the control buffer. When the control buffer passed in from the receiver side *lacks* these 4 bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will overflow in scm_detach_fds(): int cmlen = CMSG_LEN(i * sizeof(int)); <--- cmlen w/o tail-padding err = put_user(SOL_SOCKET, &cm->cmsg_level); if (!err) err = put_user(SCM_RIGHTS, &cm->cmsg_type); if (!err) err = put_user(cmlen, &cm->cmsg_len); if (!err) { cmlen = CMSG_SPACE(i * sizeof(int)); <--- cmlen w/ 4 byte extra tail-padding msg->msg_control += cmlen; msg->msg_controllen -= cmlen; <--- iff no tail-padding space here ... } ... wrap-around F.e. it will wrap to a length of 18446744073709551612 bytes in case the receiver passed in msg->msg_controllen of 20 bytes, and the sender properly transferred 1 fd to the receiver, so that its CMSG_LEN results in 20 bytes and CMSG_SPACE in 24 bytes. In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an issue in my tests as alignment seems always on 4 byte boundary. Same should be in case of native 32 bit, where we end up with 4 byte boundaries as well. In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving a single fd would mean that on successful return, msg->msg_controllen is being set by the kernel to 24 bytes instead, thus more than the input buffer advertised. It could f.e. become an issue if such application later on zeroes or copies the control buffer based on the returned msg->msg_controllen elsewhere. Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253). Going over the code, it seems like msg->msg_controllen is not being read after scm_detach_fds() in scm_recv() anymore by the kernel, good! Relevant recvmsg() handler are unix_dgram_recvmsg() (unix_seqpacket_recvmsg()) and unix_stream_recvmsg(). Both return back to their recvmsg() caller, and ___sys_recvmsg() places the updated length, that is, new msg_control - old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen in the example). Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad24a ("[NET]: Fix function put_cmsg() which may cause usr application memory overflow"). RFC3542, section 20.2. says: The fields shown as "XX" are possible padding, between the cmsghdr structure and the data, and between the data and the next cmsghdr structure, if required by the implementation. While sending an application may or may not include padding at the end of last ancillary data in msg_controllen and implementations must accept both as valid. On receiving a portable application must provide space for padding at the end of the last ancillary data as implementations may copy out the padding at the end of the control message buffer and include it in the received msg_controllen. When recvmsg() is called if msg_controllen is too small for all the ancillary data items including any trailing padding after the last item an implementation may set MSG_CTRUNC. Since we didn't place MSG_CTRUNC for already quite a long time, just do the same as in 1ac70e7ad24a to avoid an overflow. Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix error in SCM_RIGHTS code sample"). Some people must have copied this (?), thus it got triggered in the wild (reported several times during boot by David and HacKurx). No Fixes tag this time as pre 2002 (that is, pre history tree). Reported-by: David Sterba <dave@jikos.cz> Reported-by: HacKurx <hackurx@gmail.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Emese Revfy <re.emese@gmail.com> Cc: Brad Spengler <spender@grsecurity.net> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 831a2a17da39d93cf68981ff99cce5d31551044b) Signed-off-by: Willy Tarreau <w@1wt.eu>
* tcp: initialize tp->copied_seq in case of cross SYN connectionEric Dumazet2016-01-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 142a2e7ece8d8ac0e818eb2c91f99ca894730e2a ] Dmitry provided a syzkaller (http://github.com/google/syzkaller) generated program that triggers the WARNING at net/ipv4/tcp.c:1729 in tcp_recvmsg() : WARN_ON(tp->copied_seq != tp->rcv_nxt && !(flags & (MSG_PEEK | MSG_TRUNC))); His program is specifically attempting a Cross SYN TCP exchange, that we support (for the pleasure of hackers ?), but it looks we lack proper tcp->copied_seq initialization. Thanks again Dmitry for your report and testings. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 6cfa9781d3bf950eed455369966bbdf9d05871c5) Signed-off-by: Willy Tarreau <w@1wt.eu>
* ipmi: move timer init to before irq is setupJan Stancek2016-01-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 27f972d3e00b50639deb4cc1392afaeb08d3cecc upstream. We encountered a panic on boot in ipmi_si on a dell per320 due to an uninitialized timer as follows. static int smi_start_processing(void *send_info, ipmi_smi_t intf) { /* Try to claim any interrupts. */ if (new_smi->irq_setup) new_smi->irq_setup(new_smi); --> IRQ arrives here and irq handler tries to modify uninitialized timer which triggers BUG_ON(!timer->function) in __mod_timer(). Call Trace: <IRQ> [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si] [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si] [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si] [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350 [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si] [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170 [<ffffffff810f245e>] handle_edge_irq+0xde/0x180 [<ffffffff8100fc59>] handle_irq+0x49/0xa0 [<ffffffff8154643c>] do_IRQ+0x6c/0xf0 [<ffffffff8100ba53>] ret_from_intr+0x0/0x11 /* Set up the timer that drives the interface. */ setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi); The following patch fixes the problem. To: Openipmi-developer@lists.sourceforge.net To: Corey Minyard <minyard@acm.org> CC: linux-kernel@vger.kernel.org Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Tony Camuso <tcamuso@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 207ffa8cbcf6e780d862316a94c95a7e70d1e72e) Signed-off-by: Willy Tarreau <w@1wt.eu>
* sched/core: Remove false-positive warning from wake_up_process()Sasha Levin2016-01-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | commit 119d6f6a3be8b424b200dcee56e74484d5445f7e upstream. Because wakeups can (fundamentally) be late, a task might not be in the expected state. Therefore testing against a task's state is racy, and can yield false positives. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: oleg@redhat.com Fixes: 9067ac85d533 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task") Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 0e796c1b57fdd97fc040b0b78ff7fea6c0a4a39d) Signed-off-by: Willy Tarreau <w@1wt.eu>
* ipv4: igmp: Allow removing groups from a removed interfaceAndrew Lunn2016-01-291-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4eba7bb1d72d9bde67d810d09bf62dc207b63c5c upstream. When a multicast group is joined on a socket, a struct ip_mc_socklist is appended to the sockets mc_list containing information about the joined group. If the interface is hot unplugged, this entry becomes stale. Prior to commit 52ad353a5344f ("igmp: fix the problem when mc leave group") it was possible to remove the stale entry by performing a IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on the interface. However, this fix enforces that the interface must still exist. Thus with time, the number of stale entries grows, until sysctl_igmp_max_memberships is reached and then it is not possible to join and more groups. The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is performed without specifying the interface, either by ifindex or ip address. However here we do supply one of these. So loosen the restriction on device existence to only apply when the interface has not been specified. This then restores the ability to clean up the stale entries. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 52ad353a5344f "(igmp: fix the problem when mc leave group") Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit b1fa852672fb1ea4b0e34bfa1136bb65dba66151) Signed-off-by: Willy Tarreau <w@1wt.eu>
* wan/x25: Fix use-after-free in x25_asy_open_tty()Peter Hurley2016-01-291-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ee9159ddce14bc1dec9435ae4e3bd3153e783706 upstream. The N_X25 line discipline may access the previous line discipline's closed and already-freed private data on open [1]. The tty->disc_data field _never_ refers to valid data on entry to the line discipline's open() method. Rather, the ldisc is expected to initialize that field for its own use for the lifetime of the instance (ie. from open() to close() only). [1] [ 634.336761] ================================================================== [ 634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0 [ 634.339558] Read of size 4 by task syzkaller_execu/8981 [ 634.340359] ============================================================================= [ 634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected ... [ 634.405018] Call Trace: [ 634.405277] dump_stack (lib/dump_stack.c:52) [ 634.405775] print_trailer (mm/slub.c:655) [ 634.406361] object_err (mm/slub.c:662) [ 634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236) [ 634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279) [ 634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1)) [ 634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447) [ 634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567) [ 634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879) [ 634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607) [ 634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613) [ 634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188) Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 485274cf9a69504cf4a7ef182c508f3541eee84e) Signed-off-by: Willy Tarreau <w@1wt.eu>
* nfs: if we have no valid attrs, then don't declare the attribute cache validJeff Layton2016-01-291-1/+5
| | | | | | | | | | | | | | | | | commit c812012f9ca7cf89c9e1a1cd512e6c3b5be04b85 upstream. If we pass in an empty nfs_fattr struct to nfs_update_inode, it will (correctly) not update any of the attributes, but it then clears the NFS_INO_INVALID_ATTR flag, which indicates that the attributes are up to date. Don't clear the flag if the fattr struct has no valid attrs to apply. Reviewed-by: Steve French <steve.french@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit ddab0155aa9589600861e3f65d505982b719496a) Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: Fix handling of extended tv_secDavid Turner2016-01-291-7/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a4dad1ae24f850410c4e60f22823cba1289b8d52 upstream. In ext4, the bottom two bits of {a,c,m}time_extra are used to extend the {a,c,m}time fields, deferring the year 2038 problem to the year 2446. When decoding these extended fields, for times whose bottom 32 bits would represent a negative number, sign extension causes the 64-bit extended timestamp to be negative as well, which is not what's intended. This patch corrects that issue, so that the only negative {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed timestamps). Some older kernels might have written pre-1970 dates with 1,1 in the extra bits. This patch treats those incorrectly-encoded dates as pre-1970, instead of post-2311, until kernel 4.20 is released. Hopefully by then e2fsck will have fixed up the bad data. Also add a comment explaining the encoding of ext4's extra {a,c,m}time bits. Signed-off-by: David Turner <novalis@novalis.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Mark Harris <mh8928@yahoo.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 6dfd5f6abf1825fc351f663bf630603f9b78251b) Signed-off-by: Willy Tarreau <w@1wt.eu>
* vfs: Avoid softlockups with sendfile(2)Jan Kara2016-01-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | commit c2489e07c0a71a56fb2c84bc0ee66cddfca7d068 upstream. The following test program from Dmitry can cause softlockups or RCU stalls as it copies 1GB from tmpfs into eventfd and we don't have any scheduling point at that path in sendfile(2) implementation: int r1 = eventfd(0, 0); int r2 = memfd_create("", 0); unsigned long n = 1<<30; fallocate(r2, 0, 0, n); sendfile(r1, r2, 0, n); Add cond_resched() into __splice_from_pipe() to fix the problem. CC: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 47ae562efbd0fd8440959304915291865b945c67) Signed-off-by: Willy Tarreau <w@1wt.eu>
* fix sysvfs symlinksAl Viro2016-01-293-29/+3
| | | | | | | | | | | | | | | | | | | | | | commit 0ebf7f10d67a70e120f365018f1c5fce9ddc567d upstream. The thing got broken back in 2002 - sysvfs does *not* have inline symlinks; even short ones have bodies stored in the first block of file. sysv_symlink() handles that correctly; unfortunately, attempting to look an existing symlink up will end up confusing them for inline symlinks, and interpret the block number containing the body as the body itself. Nobody has noticed until now, which says something about the level of testing sysvfs gets ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [bwh: Backported to 3.2: - Adjust context - Also delete unused sysv_fast_symlink_inode_operations] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 081c769778f7a41f4eb3d633df26ad572575c980) Signed-off-by: Willy Tarreau <w@1wt.eu>
* fuse: break infinite loop in fuse_fill_write_pages()Roman Gushchin2016-01-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3ca8138f014a913f98e6ef40e939868e1e9ea876 upstream. I got a report about unkillable task eating CPU. Further investigation shows, that the problem is in the fuse_fill_write_pages() function. If iov's first segment has zero length, we get an infinite loop, because we never reach iov_iter_advance() call. Fix this by calling iov_iter_advance() before repeating an attempt to copy data from userspace. A similar problem is described in 124d3b7041f ("fix writev regression: pan hanging unkillable and un-straceable"). If zero-length segmend is followed by segment with invalid address, iov_iter_fault_in_readable() checks only first segment (zero-length), iov_iter_copy_from_user_atomic() skips it, fails at second and returns zero -> goto again without skipping zero-length segment. Patch calls iov_iter_advance() before goto again: we'll skip zero-length segment at second iteraction and iov_iter_fault_in_readable() will detect invalid address. Special thanks to Konstantin Khlebnikov, who helped a lot with the commit description. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Maxim Patlasov <mpatlasov@parallels.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: ea9b9907b82a ("fuse: implement perform_write") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit a5b234167a1ff46f311f5835828eec2f971b9bb4) [wt: adjusted context, as commit 478e0841b from 3.1 was never backported to call mark_page_accessed() eventhough it seems it should have been] Signed-off-by: Willy Tarreau <w@1wt.eu>
* sctp: translate host order to network order when setting a hmacidlucien2016-01-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | commit ed5a377d87dc4c87fb3e1f7f698cba38cd893103 upstream. now sctp auth cannot work well when setting a hmacid manually, which is caused by that we didn't use the network order for hmacid, so fix it by adding the transformation in sctp_auth_ep_set_hmacs. even we set hmacid with the network order in userspace, it still can't work, because of this condition in sctp_auth_ep_set_hmacs(): if (id > SCTP_AUTH_HMAC_ID_MAX) return -EOPNOTSUPP; so this wasn't working before and thus it won't break compatibility. Fixes: 65b07e5d0d09 ("[SCTP]: API updates to suport SCTP-AUTH extensions.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 98d37e7fed9568f37d4a4a150a3773cfafef91ed) Signed-off-by: Willy Tarreau <w@1wt.eu>
* bluetooth: Validate socket address length in sco_sock_bind().David S. Miller2016-01-291-0/+3
| | | | | | | | commit 5233252fce714053f0151680933571a2da9cbfb4 upstream. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
* net: fix warnings in 'make htmldocs' by moving macro definition out of field ↵Hannes Frederic Sowa2016-01-291-1/+1
| | | | | | | | | | | | | | | | | | | declaration commit 7bbadd2d1009575dad675afc16650ebb5aa10612 upstream. Docbook does not like the definition of macros inside a field declaration and adds a warning. Move the definition out. Fixes: 79462ad02e86180 ("net: add validation for the socket syscall protocol argument") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: keep open-coding U8_MAX] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 35da6d62b608fd3beb576d72e48e2bc866d714c9) [wt: adjusted context] Signed-off-by: Willy Tarreau <w@1wt.eu>
* net: add validation for the socket syscall protocol argumentHannes Frederic Sowa2016-01-296-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 79462ad02e861803b3840cc782248c7359451cd9 upstream. 郭永刚 reported that one could simply crash the kernel as root by using a simple program: int socket_fd; struct sockaddr_in addr; addr.sin_port = 0; addr.sin_addr.s_addr = INADDR_ANY; addr.sin_family = 10; socket_fd = socket(10,3,0x40000000); connect(socket_fd , &addr,16); AF_INET, AF_INET6 sockets actually only support 8-bit protocol identifiers. inet_sock's skc_protocol field thus is sized accordingly, thus larger protocol identifiers simply cut off the higher bits and store a zero in the protocol fields. This could lead to e.g. NULL function pointer because as a result of the cut off inet_num is zero and we call down to inet_autobind, which is NULL for raw sockets. kernel: Call Trace: kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70 kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80 kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110 kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80 kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200 kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10 kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89 I found no particular commit which introduced this problem. CVE: CVE-2015-8543 Cc: Cong Wang <cwang@twopensource.com> Reported-by: 郭永刚 <guoyonggang@360.cn> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 2.6.32: open-code U8_MAX; adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
* KEYS: Fix race between read and revokeDavid Howells2016-01-291-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b4a1b4f5047e4f54e194681125c74c0aa64d637d upstream. This fixes CVE-2015-7550. There's a race between keyctl_read() and keyctl_revoke(). If the revoke happens between keyctl_read() checking the validity of a key and the key's semaphore being taken, then the key type read method will see a revoked key. This causes a problem for the user-defined key type because it assumes in its read method that there will always be a payload in a non-revoked key and doesn't check for a NULL pointer. Fix this by making keyctl_read() check the validity of a key after taking semaphore instead of before. I think the bug was introduced with the original keyrings code. This was discovered by a multithreaded test program generated by syzkaller (http://github.com/google/syzkaller). Here's a cleaned up version: #include <sys/types.h> #include <keyutils.h> #include <pthread.h> void *thr0(void *arg) { key_serial_t key = (unsigned long)arg; keyctl_revoke(key); return 0; } void *thr1(void *arg) { key_serial_t key = (unsigned long)arg; char buffer[16]; keyctl_read(key, buffer, 16); return 0; } int main() { key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING); pthread_t th[5]; pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key); pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key); pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key); pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key); pthread_join(th[0], 0); pthread_join(th[1], 0); pthread_join(th[2], 0); pthread_join(th[3], 0); return 0; } Build as: cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread Run as: while keyctl-race; do :; done as it may need several iterations to crash the kernel. The crash can be summarised as: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff81279b08>] user_read+0x56/0xa3 ... Call Trace: [<ffffffff81276aa9>] keyctl_read_key+0xb6/0xd7 [<ffffffff81277815>] SyS_keyctl+0x83/0xe0 [<ffffffff815dbb97>] entry_SYSCALL_64_fastpath+0x12/0x6f Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: James Morris <james.l.morris@oracle.com> [bwh: Backported to 2.6.32: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
* udp: properly support MSG_PEEK with truncated buffersEric Dumazet2016-01-292-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 197c949e7798fbf28cfadc69d9ca0c2abbf93191 upstream. Backport of this upstream commit into stable kernels : 89c22d8c3b27 ("net: Fix skb csum races when peeking") exposed a bug in udp stack vs MSG_PEEK support, when user provides a buffer smaller than skb payload. In this case, skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov); returns -EFAULT. This bug does not happen in upstream kernels since Al Viro did a great job to replace this into : skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg); This variant is safe vs short buffers. For the time being, instead reverting Herbert Xu patch and add back skb->ip_summed invalid changes, simply store the result of udp_lib_checksum_complete() so that we avoid computing the checksum a second time, and avoid the problematic skb_copy_and_csum_datagram_iovec() call. This patch can be applied on recent kernels as it avoids a double checksumming, then backported to stable kernels as a bug fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 18a6eba2eabbcb50a78210b16f7dd43d888a537b) Signed-off-by: Willy Tarreau <w@1wt.eu>
* Revert "net: add length argument to skb_copy_and_csum_datagram_iovec"Willy Tarreau2016-01-297-14/+7
| | | | | | | | This reverts commit c507639ba963bb47e3f515670a7cace76af76ab6. As reported by Michal Kubecek, this fix doesn't handle truncated reads correctly. Next patch from Eric fixes it better. Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: Fix null dereference in ext4_fill_super()Ben Hutchings2016-01-291-4/+4
| | | | | | | | | | | | | | | Fix failure paths in ext4_fill_super() that can lead to a null dereference. This was designated CVE-2015-8324. Mostly extracted from commit 744692dc0598 ("ext4: use ext4_get_block_write in buffer write"). However there's one more incorrect goto to fix, removed upstream in commit cf40db137cc2 ("ext4: remove failed journal checksum check"). Reference: https://bugs.openvz.org/browse/OVZ-6541 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
* unix: avoid use-after-free in ep_remove_wait_queueRainer Weikusat2016-01-292-21/+163
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7d267278a9ece963d77eefec61630223fce08c6c upstream. Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f9420 ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 2.6.32: - Access sk_sleep directly, not through sk_sleep() function - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
* RDS: fix race condition when sending a message on unbound socketQuentin Casasnovas2016-01-292-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8c7188b23474cca017b3ef354c4a58456f68303a upstream. Sasha's found a NULL pointer dereference in the RDS connection code when sending a message to an apparently unbound socket. The problem is caused by the code checking if the socket is bound in rds_sendmsg(), which checks the rs_bound_addr field without taking a lock on the socket. This opens a race where rs_bound_addr is temporarily set but where the transport is not in rds_bind(), leading to a NULL pointer dereference when trying to dereference 'trans' in __rds_conn_create(). Vegard wrote a reproducer for this issue, so kindly ask him to share if you're interested. I cannot reproduce the NULL pointer dereference using Vegard's reproducer with this patch, whereas I could without. Complete earlier incomplete fix to CVE-2015-6937: 74e98eb08588 ("RDS: verify the underlying transport exists before creating a connection") Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Reviewed-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ppp, slip: Validate VJ compression slot parameters completelyBen Hutchings2016-01-294-14/+15
| | | | | | | | | | | | | | | | | | | | commit 4ab42d78e37a294ac7bc56901d563c642e03c4ae upstream. Currently slhc_init() treats out-of-range values of rslots and tslots as equivalent to 0, except that if tslots is too large it will dereference a null pointer (CVE-2015-7799). Add a range-check at the top of the function and make it return an ERR_PTR() on error instead of NULL. Change the callers accordingly. Compile-tested only. Reported-by: 郭永刚 <guoyonggang@360.cn> References: http://article.gmane.org/gmane.comp.security.oss.general/17908 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 2.6.32: adjust filenames, context, indentation] Signed-off-by: Willy Tarreau <w@1wt.eu>
* isdn_ppp: Add checks for allocation failure in isdn_ppp_open()Ben Hutchings2016-01-291-0/+6
| | | | | | | | | | commit 0baa57d8dc32db78369d8b5176ef56c5e2e18ab3 upstream. Compile-tested only. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ip6mr: call del_timer_sync() in ip6mr_free_table()WANG Cong2016-01-291-1/+1
| | | | | | | | | | | | | | | | commit 7ba0c47c34a1ea5bc7a24ca67309996cce0569b5 upstream. We need to wait for the flying timers, since we are going to free the mrtable right after it. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> [ wt: 2.6.32 has a single table hence a single timer. ip6_mr_init() has the same del_timer() call on the error path, but we don't need to change it since at this point the timer hasn't been started yet ] Signed-off-by: Willy Tarreau <w@1wt.eu>
* Linux 2.6.32.69v2.6.32.69Willy Tarreau2015-12-061-1/+1
| | | | Signed-off-by: Willy Tarreau <w@1wt.eu>
* splice: sendfile() at once fails for big filesChristophe Leroy2015-12-061-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0ff28d9f4674d781e492bcff6f32f0fe48cf0fed upstream. Using sendfile with below small program to get MD5 sums of some files, it appear that big files (over 64kbytes with 4k pages system) get a wrong MD5 sum while small files get the correct sum. This program uses sendfile() to send a file to an AF_ALG socket for hashing. /* md5sum2.c */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <fcntl.h> #include <sys/socket.h> #include <sys/stat.h> #include <sys/types.h> #include <linux/if_alg.h> int main(int argc, char **argv) { int sk = socket(AF_ALG, SOCK_SEQPACKET, 0); struct stat st; struct sockaddr_alg sa = { .salg_family = AF_ALG, .salg_type = "hash", .salg_name = "md5", }; int n; bind(sk, (struct sockaddr*)&sa, sizeof(sa)); for (n = 1; n < argc; n++) { int size; int offset = 0; char buf[4096]; int fd; int sko; int i; fd = open(argv[n], O_RDONLY); sko = accept(sk, NULL, 0); fstat(fd, &st); size = st.st_size; sendfile(sko, fd, &offset, size); size = read(sko, buf, sizeof(buf)); for (i = 0; i < size; i++) printf("%2.2x", buf[i]); printf(" %s\n", argv[n]); close(fd); close(sko); } exit(0); } Test below is done using official linux patch files. First result is with a software based md5sum. Second result is with the program above. root@vgoip:~# ls -l patch-3.6.* -rw-r--r-- 1 root root 64011 Aug 24 12:01 patch-3.6.2.gz -rw-r--r-- 1 root root 94131 Aug 24 12:01 patch-3.6.3.gz root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz 5fd77b24e68bb24dcc72d6e57c64790e patch-3.6.3.gz After investivation, it appears that sendfile() sends the files by blocks of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each block, the SPLICE_F_MORE flag is missing, therefore the hashing operation is reset as if it was the end of the file. This patch adds SPLICE_F_MORE to the flags when more data is pending. With the patch applied, we get the correct sums: root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit fcb2781782b61631db4ed31e98757795eacd31cb) Signed-off-by: Willy Tarreau <w@1wt.eu>
* net: avoid NULL deref in inet_ctl_sock_destroy()Eric Dumazet2015-12-061-1/+2
| | | | | | | | | | | | | | | | [ Upstream commit 8fa677d2706d325d71dab91bf6e6512c05214e37 ] Under low memory conditions, tcp_sk_init() and icmp_sk_init() can both iterate on all possible cpus and call inet_ctl_sock_destroy(), with eventual NULL pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit f79c83d6c41930362bc66fc71489e92975a2facf) Signed-off-by: Willy Tarreau <w@1wt.eu>
* ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() ↵Ani Sinha2015-12-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in preemptible context. [ Upstream commit 44f49dd8b5a606870a1f21101522a0f9c4414784 ] Fixes the following kernel BUG : BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758 caller is __this_cpu_preempt_check+0x13/0x15 CPU: 0 PID: 2758 Comm: bash Tainted: P O 3.18.19 #2 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8 Call Trace: [<ffffffff81482b2a>] dump_stack+0x52/0x80 [<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1 [<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15 [<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c [<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e [<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810e6974>] ? pollwake+0x4d/0x51 [<ffffffff81058ac0>] ? default_wake_function+0x0/0xf [<ffffffff810553fc>] ? get_parent_ip+0x11/0x42 [<ffffffff810613d9>] ? __wake_up_common+0x45/0x77 [<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32 [<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53 [<ffffffff8139a519>] ? sock_def_readable+0x71/0x75 [<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55 [<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41 [<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86 [<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d [<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b [<ffffffff810d5738>] ? new_sync_read+0x82/0xaa [<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99 [<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32 [<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f [<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf [<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3 [<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e Signed-off-by: Ani Sinha <ani@arista.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: ipmr doesn't implement IPSTATS_MIB_OUTOCTETS] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 33cf84ba8c25b40c7de52029efc8d4372725c95f) Signed-off-by: Willy Tarreau <w@1wt.eu>
* RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in ↵Sowmini Varadhan2015-12-061-2/+9
| | | | | | | | | | | | | | | | | | | | | | | rds_tcp_data_recv [ Upstream commit 8ce675ff39b9958d1c10f86cf58e357efaafc856 ] Either of pskb_pull() or pskb_trim() may fail under low memory conditions. If rds_tcp_data_recv() ignores such failures, the application will receive corrupted data because the skb has not been correctly carved to the RDS datagram size. Avoid this by handling pskb_pull/pskb_trim failure in the same manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and retry via the deferred call to rds_send_worker() that gets set up on ENOMEM from rds_tcp_read_sock() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit f114d9374ba3e42c86b112c8b4dbcba50a7330e7) Signed-off-by: Willy Tarreau <w@1wt.eu>
* binfmt_elf: Don't clobber passed executable's file headerMaciej W. Rozycki2015-12-061-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b582ef5c53040c5feef4c96a8f9585b6831e2441 upstream. Do not clobber the buffer space passed from `search_binary_handler' and originally preloaded by `prepare_binprm' with the executable's file header by overwriting it with its interpreter's file header. Instead keep the buffer space intact and directly use the data structure locally allocated for the interpreter's file header, fixing a bug introduced in 2.1.14 with loadable module support (linux-mips.org commit beb11695 [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history). Adjust the amount of data read from the interpreter's file accordingly. This was not an issue before loadable module support, because back then `load_elf_binary' was executed only once for a given ELF executable, whether the function succeeded or failed. With loadable module support supported and enabled, upon a failure of `load_elf_binary' -- which may for example be caused by architecture code rejecting an executable due to a missing hardware feature requested in the file header -- a module load is attempted and then the function reexecuted by `search_binary_handler'. With the executable's file header replaced with its interpreter's file header the executable can then be erroneously accepted in this subsequent attempt. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit beebd9fa9d8aeb8f1a3028acc1987c808b601e7d) Signed-off-by: Willy Tarreau <w@1wt.eu>