summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* crypto: sha512 - use standard ror64()Alexey Dobriyan2012-02-201-0/+20
| | | | | | | | | | | | | | | | | | | commit f2ea0f5f04c97b48c88edccba52b0682fbe45087 upstream. Use standard ror64() instead of hand-written. There is no standard ror64, so create it. The difference is shift value being "unsigned int" instead of uint64_t (for which there is no reason). gcc starts to emit native ROR instructions which it doesn't do for some reason currently. This should make the code faster. Patch survives in-tree crypto test and ping flood with hmac(sha512) on. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* writeback: fix dereferencing NULL bdi->dev on trace_writeback_queueWu Fengguang2012-02-201-1/+4
| | | | | | | | | | | | | | | | | | | commit 977b7e3a52a7421ad33a393a38ece59f3d41c2fa upstream. When a SD card is hot removed without umount, del_gendisk() will call bdi_unregister() without destroying/freeing it. This leaves the bdi in the bdi->dev = NULL, bdi->wb.task = NULL, bdi->bdi_list removed state. When sync(2) gets the bdi before bdi_unregister() and calls bdi_queue_work() after the unregister, trace_writeback_queue will be dereferencing the NULL bdi->dev. Fix it with a simple test for NULL. LKML-reference: http://lkml.org/lkml/2012/1/18/346 Reported-by: Rabin Vincent <rabin@rab.in> Tested-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernelWu Fengguang2012-02-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | commit 3310225dfc71a35a2cc9340c15c0e08b14b3c754 upstream. PROP_MAX_SHIFT should be set to <=32 on 64-bit box. This fixes two bugs in the below lines of bdi_dirty_limit(): bdi_dirty *= numerator; do_div(bdi_dirty, denominator); 1) divide error: do_div() only uses the lower 32 bit of the denominator, which may trimmed to be 0 when PROP_MAX_SHIFT > 32. 2) overflow: (bdi_dirty * numerator) could easily overflow if numerator used up to 48 bits, leaving only 16 bits to bdi_dirty Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Ilya Tumaykin <librarian_rus@yahoo.com> Tested-by: Ilya Tumaykin <librarian_rus@yahoo.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net: fix NULL dereferences in check_peer_redir()Eric Dumazet2012-02-132-3/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d3aaeb38c40e5a6c08dd31a1b64da65c4352be36, along with dependent backports of commits: 69cce1d1404968f78b177a0314f5822d5afdbbfb 9de79c127cccecb11ae6a21ab1499e87aa222880 218fa90f072e4aeff9003d57e390857f4f35513e 580da35a31f91a594f3090b7a2c39b85cb051a12 f7e57044eeb1841847c24aa06766c8290c202583 e049f28883126c689cf95859480d9ee4ab23b7fa ] Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* PCI: Rework ASPM disable codeMatthew Garrett2012-02-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3c076351c4027a56d5005a39a0b518a4ba393ce2 upstream. Right now we forcibly clear ASPM state on all devices if the BIOS indicates that the feature isn't supported. Based on the Microsoft presentation "PCI Express In Depth for Windows Vista and Beyond", I'm starting to think that this may be an error. The implication is that unless the platform grants full control via _OSC, Windows will not touch any PCIe features - including ASPM. In that case clearing ASPM state would be an error unless the platform has granted us that control. This patch reworks the ASPM disabling code such that the actual clearing of state is triggered by a successful handoff of PCIe control to the OS. The general ASPM code undergoes some changes in order to ensure that the ability to clear the bits isn't overridden by ASPM having already been disabled. Further, this theoretically now allows for situations where only a subset of PCIe roots hand over control, leaving the others in the BIOS state. It's difficult to know for sure that this is the right thing to do - there's zero public documentation on the interaction between all of these components. But enough vendors enable ASPM on platforms and then set this bit that it seems likely that they're expecting the OS to leave them alone. Measured to save around 5W on an idle Thinkpad X220. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* netns: Fail conspicously if someone uses net_generic at an inappropriate time.Eric W. Biederman2012-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | [ Upstream commit 5ee4433efe99b9f39f6eff5052a177bbcfe72cea ] By definition net_generic should never be called when it can return NULL. Fail conspicously with a BUG_ON to make it clear when people mess up that a NULL return should never happen. Recently there was a bug in the CAIF subsystem where it was registered with register_pernet_device instead of register_pernet_subsys. It was erroneously concluded that net_generic could validly return NULL and that net_assign_generic was buggy (when it was just inefficient). Hopefully this BUG_ON will prevent people to coming to similar erroneous conclusions in the futrue. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* drm: Fix authentication kernel crashThomas Hellstrom2012-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | commit 598781d71119827b454fd75d46f84755bca6f0c6 upstream. If the master tries to authenticate a client using drm_authmagic and that client has already closed its drm file descriptor, either wilfully or because it was terminated, the call to drm_authmagic will dereference a stale pointer into kmalloc'ed memory and corrupt it. Typically this results in a hard system hang. This patch fixes that problem by removing any authentication tokens (struct drm_magic_entry) open for a file descriptor when that file descriptor is closed. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* target: Set additional sense length field in sense dataRoland Dreier2012-01-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 895f3022523361e9b383cf48f51feb1f7d5e7e53 upstream. The target code was not setting the additional sense length field in the sense data it returned, which meant that at least the Linux stack ignored the ASC/ASCQ fields. For example, without this patch, on a tcm_loop device: # sg_raw -v /dev/sda 2 0 0 0 0 0 gives cdb to send: 02 00 00 00 00 00 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Raw sense data (in hex): 70 00 05 00 00 00 00 00 while after the patch we correctly get the following (which matches what a regular disk returns): cdb to send: 02 00 00 00 00 00 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Additional sense: Invalid command operation code Raw sense data (in hex): 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ACPI: Store SRAT table revisionKurt Garloff2012-01-251-0/+1
| | | | | | | | | | | | | | | | | | commit 8df0eb7c9d96f9e82f233ee8b74e0f0c8471f868 upstream. In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides 32bits for these. The new fields were reserved before. According to the ACPI spec, the OS must disregrard reserved fields. In order to know whether or not, we must know what version the SRAT table has. This patch stores the SRAT table revision for later consumption by arch specific __init functions. Signed-off-by: Kurt Garloff <kurt@garloff.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: fail SCSI passthrough ioctls on partition devicesPaolo Bonzini2012-01-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0bfc96cb77224736dfa35c3c555d37b3646ef35e upstream. [ Changes with respect to 3.3: return -ENOTTY from scsi_verify_blk_ioctl and -ENOIOCTLCMD from sd_compat_ioctl. ] Linux allows executing the SG_IO ioctl on a partition or LVM volume, and will pass the command to the underlying block device. This is well-known, but it is also a large security problem when (via Unix permissions, ACLs, SELinux or a combination thereof) a program or user needs to be granted access only to part of the disk. This patch lets partitions forward a small set of harmless ioctls; others are logged with printk so that we can see which ioctls are actually sent. In my tests only CDROM_GET_CAPABILITY actually occurred. Of course it was being sent to a (partition on a) hard disk, so it would have failed with ENOTTY and the patch isn't changing anything in practice. Still, I'm treating it specially to avoid spamming the logs. In principle, this restriction should include programs running with CAP_SYS_RAWIO. If for example I let a program access /dev/sda2 and /dev/sdb, it still should not be able to read/write outside the boundaries of /dev/sda2 independent of the capabilities. However, for now programs with CAP_SYS_RAWIO will still be allowed to send the ioctls. Their actions will still be logged. This patch does not affect the non-libata IDE driver. That driver however already tests for bd != bd->bd_contains before issuing some ioctl; it could be restricted further to forbid these ioctls even for programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [ Make it also print the command name when warning - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: add and use scsi_blk_cmd_ioctlPaolo Bonzini2012-01-251-0/+2
| | | | | | | | | | | | | | | | | commit 577ebb374c78314ac4617242f509e2f5e7156649 upstream. Introduce a wrapper around scsi_cmd_ioctl that takes a block device. The function will then be enhanced to detect partition block devices and, in that case, subject the ioctls to whitelisting. Cc: linux-scsi@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: James Bottomley <JBottomley@parallels.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* fix shrink_dcache_parent() livelockMiklos Szeredi2012-01-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit eaf5f9073533cde21c7121c136f1c3f072d9cf59 upstream. Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may cause shrink_dcache_parent() to loop forever. Here's what appears to happen: 1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1 2 - CPU1: select_parent(P) locks P->d_lock 3 - CPU0: shrink_dentry_list() locks C->d_lock dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock 4 - CPU1: select_parent(P) locks C->d_lock, moves C from dispose list being processed on CPU0 to the new dispose list, returns 1 5 - CPU0: shrink_dentry_list() finds dispose list empty, returns 6 - Goto 2 with CPU0 and CPU1 switched Basically select_parent() steals the dentry from shrink_dentry_list() and thinks it found a new one, causing shrink_dentry_list() to think it's making progress and loop over and over. One way to trigger this is to make udev calls stat() on the sysfs file while it is going away. Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick: ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true" Then execute the following loop: while true; do echo -bond0 > /sys/class/net/bonding_masters echo +bond0 > /sys/class/net/bonding_masters echo -bond1 > /sys/class/net/bonding_masters echo +bond1 > /sys/class/net/bonding_masters done One fix would be to check all callers and prevent concurrent calls to shrink_dcache_parent(). But I think a better solution is to stop the stealing behavior. This patch adds a new dentry flag that is set when the dentry is added to the dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a new reference just before being pruned. If the dentry has this flag, select_parent() will skip it and let shrink_dentry_list() retry pruning it. With select_parent() skipping those dentries there will not be the appearance of progress (new dentries found) when there is none, hence shrink_dcache_parent() will not loop forever. Set the flag is also set in prune_dcache_sb() for consistency as suggested by Linus. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* svcrpc: destroy server sockets all at onceJ. Bruce Fields2012-01-251-1/+1
| | | | | | | | | | | commit 2fefb8a09e7ed251ae8996e0c69066e74c5aa560 upstream. There's no reason I can see that we need to call sv_shutdown between closing the two lists of sockets. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* V4L/DVB: v4l2-ioctl: integer overflow in video_usercopy()Dan Carpenter2012-01-251-0/+1
| | | | | | | | | | | | | | | | | commit 6c06108be53ca5e94d8b0e93883d534dd9079646 upstream. If ctrls->count is too high the multiplication could overflow and array_size would be lower than expected. Mauro and Hans Verkuil suggested that we cap it at 1024. That comes from the maximum number of controls with lots of room for expantion. $ grep V4L2_CID include/linux/videodev2.h | wc -l 211 Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* memcg: add mem_cgroup_replace_page_cache() to fix LRU issueKAMEZAWA Hiroyuki2012-01-251-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ab936cbcd02072a34b60d268f94440fd5cf1970b upstream. Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a function replace_page_cache_page(). This function replaces a page in the radix-tree with a new page. WHen doing this, memory cgroup needs to fix up the accounting information. memcg need to check PCG_USED bit etc. In some(many?) cases, 'newpage' is on LRU before calling replace_page_cache(). So, memcg's LRU accounting information should be fixed, too. This patch adds mem_cgroup_replace_page_cache() and removes the old hooks. In that function, old pages will be unaccounted without touching res_counter and new page will be accounted to the memcg (of old page). WHen overwriting pc->mem_cgroup of newpage, take zone->lru_lock and avoid races with LRU handling. Background: replace_page_cache_page() is called by FUSE code in its splice() handling. Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated page and may be on LRU. LRU mis-accounting will be critical for memory cgroup because rmdir() checks the whole LRU is empty and there is no account leak. If a page is on the other LRU than it should be, rmdir() will fail. This bug was added in March 2011, but no bug report yet. I guess there are not many people who use memcg and FUSE at the same time with upstream kernels. The result of this bug is that admin cannot destroy a memcg because of account leak. So, no panic, no deadlock. And, even if an active cgroup exist, umount can succseed. So no problem at shutdown. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* xen/xenbus: Reject replies with payload > XENSTORE_PAYLOAD_MAX.Ian Campbell2012-01-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9e7860cee18241633eddb36a4c34c7b61d8cecbc upstream. Haogang Chen found out that: There is a potential integer overflow in process_msg() that could result in cross-domain attack. body = kmalloc(msg->hdr.len + 1, GFP_NOIO | __GFP_HIGH); When a malicious guest passes 0xffffffff in msg->hdr.len, the subsequent call to xb_read() would write to a zero-length buffer. The other end of this connection is always the xenstore backend daemon so there is no guest (malicious or otherwise) which can do this. The xenstore daemon is a trusted component in the system. However this seem like a reasonable robustness improvement so we should have it. And Ian when read the API docs found that: The payload length (len field of the header) is limited to 4096 (XENSTORE_PAYLOAD_MAX) in both directions. If a client exceeds the limit, its xenstored connection will be immediately killed by xenstored, which is usually catastrophic from the client's point of view. Clients (particularly domains, which cannot just reconnect) should avoid this. so this patch checks against that instead. This also avoids a potential integer overflow pointed out by Haogang Chen. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* PCI: Fix PCI_EXP_TYPE_RC_EC valueAlex Williamson2012-01-251-1/+1
| | | | | | | | | | | commit 1830ea91c20b06608f7cdb2455ce05ba834b3214 upstream. Spec shows this as 1010b = 0xa Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* usb: ch9: fix up MaxStreams helperFelipe Balbi2012-01-121-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 18b7ede5f7ee2092aedcb578d3ac30bd5d4fc23c upstream. [ removed the dwc3 portion of the patch as it didn't apply to older kernels - gregkh] According to USB 3.0 Specification Table 9-22, if bmAttributes [4:0] are set to zero, it means "no streams supported", but the way this helper was defined on Linux, we will *always* have one stream which might cause several problems. For example on DWC3, we would tell the controller endpoint has streams enabled and yet start transfers with Stream ID set to 0, which would goof up the host side. While doing that, convert the macro to an inline function due to the different checks we now need. Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* usb: fix number of mapped SG DMA entriesClemens Ladisch2012-01-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bc677d5b64644c399cd3db6a905453e611f402ab upstream. Add a new field num_mapped_sgs to struct urb so that we have a place to store the number of mapped entries and can also retain the original value of entries in num_sgs. Previously, usb_hcd_map_urb_for_dma() would overwrite this with the number of mapped entries, which would break dma_unmap_sg() because it requires the original number of entries. This fixes warnings like the following when using USB storage devices: ------------[ cut here ]------------ WARNING: at lib/dma-debug.c:902 check_unmap+0x4e4/0x695() ehci_hcd 0000:00:12.2: DMA-API: device driver frees DMA sg list with different entry count [map count=4] [unmap count=1] Modules linked in: ohci_hcd ehci_hcd Pid: 0, comm: kworker/0:1 Not tainted 3.2.0-rc2+ #319 Call Trace: <IRQ> [<ffffffff81036d3b>] warn_slowpath_common+0x80/0x98 [<ffffffff81036de7>] warn_slowpath_fmt+0x41/0x43 [<ffffffff811fa5ae>] check_unmap+0x4e4/0x695 [<ffffffff8105e92c>] ? trace_hardirqs_off+0xd/0xf [<ffffffff8147208b>] ? _raw_spin_unlock_irqrestore+0x33/0x50 [<ffffffff811fa84a>] debug_dma_unmap_sg+0xeb/0x117 [<ffffffff8137b02f>] usb_hcd_unmap_urb_for_dma+0x71/0x188 [<ffffffff8137b166>] unmap_urb_for_dma+0x20/0x22 [<ffffffff8137b1c5>] usb_hcd_giveback_urb+0x5d/0xc0 [<ffffffffa0000d02>] ehci_urb_done+0xf7/0x10c [ehci_hcd] [<ffffffffa0001140>] qh_completions+0x429/0x4bd [ehci_hcd] [<ffffffffa000340a>] ehci_work+0x95/0x9c0 [ehci_hcd] ... ---[ end trace f29ac88a5a48c580 ]--- Mapped at: [<ffffffff811faac4>] debug_dma_map_sg+0x45/0x139 [<ffffffff8137bc0b>] usb_hcd_map_urb_for_dma+0x22e/0x478 [<ffffffff8137c494>] usb_hcd_submit_urb+0x63f/0x6fa [<ffffffff8137d01c>] usb_submit_urb+0x2c7/0x2de [<ffffffff8137dcd4>] usb_sg_wait+0x55/0x161 Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* mfd: Turn on the twl4030-madc MADC clockKyle Manna2012-01-061-0/+4
| | | | | | | | | | | | | | | commit 3d6271f92e98094584fd1e609a9969cd33e61122 upstream. Without turning the MADC clock on, no MADC conversions occur. $ cat /sys/class/hwmon/hwmon0/device/in8_input [ 53.428436] twl4030_madc twl4030_madc: conversion timeout! cat: read error: Resource temporarily unavailable Signed-off-by: Kyle Manna <kyle@kylemanna.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sctp: fix incorrect overflow check on autocloseXi Wang2012-01-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2692ba61a82203404abd7dd2a027bda962861f74 ] Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for limiting the autoclose value. If userspace passes in -1 on 32-bit platform, the overflow check didn't work and autoclose would be set to 0xffffffff. This patch defines a max_autoclose (in seconds) for limiting the value and exposes it through sysctl, with the following intentions. 1) Avoid overflowing autoclose * HZ. 2) Keep the default autoclose bound consistent across 32- and 64-bit platforms (INT_MAX / HZ in this patch). 3) Keep the autoclose value consistent between setsockopt() and getsockopt() calls. Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* VFS: Fix race between CPU hotplug and lglocksSrivatsa S. Bhat2012-01-061-4/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e30e2fdfe56288576ee9e04dbb06b4bd5f282203 upstream. Currently, the *_global_[un]lock_online() routines are not at all synchronized with CPU hotplug. Soft-lockups detected as a consequence of this race was reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng for finding out that the root-cause of this issue is the race condition between br_write_[un]lock() and CPU hotplug, which results in the lock states getting messed up). Fixing this race by just adding {get,put}_online_cpus() at appropriate places in *_global_[un]lock_online() is not a good option, because, then suddenly br_write_[un]lock() would become blocking, whereas they have been kept as non-blocking all this time, and we would want to keep them that way. So, overall, we want to ensure 3 things: 1. br_write_lock() and br_write_unlock() must remain as non-blocking. 2. The corresponding lock and unlock of the per-cpu spinlocks must not happen for different sets of CPUs. 3. Either prevent any new CPU online operation in between this lock-unlock, or ensure that the newly onlined CPU does not proceed with its corresponding per-cpu spinlock unlocked. To achieve all this: (a) We introduce a new spinlock that is taken by the *_global_lock_online() routine and released by the *_global_unlock_online() routine. (b) We register a callback for CPU hotplug notifications, and this callback takes the same spinlock as above. (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is initialized in the lock_init() code, all future updates to it are done in the callback, under the above spinlock. (d) The above bitmap is used (instead of cpu_online_mask) while locking and unlocking the per-cpu locks. The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the br_write_lock-unlock sequence is in progress, the callback keeps spinning, thus preventing the CPU online operation till the lock-unlock sequence is complete. This takes care of requirement (3). The bitmap that we maintain remains unmodified throughout the lock-unlock sequence, since all updates to it are managed by the callback, which takes the same spinlock as the one taken by the lock code and released only by the unlock routine. Combining this with (d) above, satisfies requirement (2). Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug operations from racing with br_write_lock-unlock, requirement (1) is also taken care of. By the way, it is to be noted that a CPU offline operation can actually run in parallel with our lock-unlock sequence, because our callback doesn't react to notifications earlier than CPU_DEAD (in order to maintain our bitmap properly). And this means, since we use our own bitmap (which is stale, on purpose) during the lock-unlock sequence, we could end up unlocking the per-cpu lock of an offline CPU (because we had locked it earlier, when the CPU was online), in order to satisfy requirement (2). But this is harmless, though it looks a bit awkward. Debugged-by: Cong Meng <mc@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* block: initialize request_queue's numa node duringMike Snitzer2012-01-061-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5151412dd4338b273afdb107c3772528e9e67d92 upstream. struct request_queue is allocated with __GFP_ZERO so its "node" field is zero before initialization. This causes an oops if node 0 is offline in the page allocator because its zonelists are not initialized. From Dave Young's dmesg: SRAT: Node 1 PXM 2 0-d0000000 SRAT: Node 1 PXM 2 100000000-330000000 SRAT: Node 0 PXM 1 330000000-630000000 Initmem setup node 1 0000000000000000-000000000affb000 ... Built 1 zonelists in Node order, mobility grouping on. ... BUG: unable to handle kernel paging request at 0000000000001c08 IP: [<ffffffff8111c355>] __alloc_pages_nodemask+0xb5/0x870 and __alloc_pages_nodemask+0xb5 translates to a NULL pointer on zonelist->_zonerefs. The fix is to initialize q->node at the time of allocation so the correct node is passed to the slab allocator later. Since blk_init_allocated_queue_node() is no longer needed, merge it with blk_init_allocated_queue(). [rientjes@google.com: changelog, initializing q->node] Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Dave Young <dyoung@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* drm/radeon/kms: add some new pci idsAlex Deucher2011-12-211-0/+10
| | | | | | | | | | | | commit cd5cfce856684e13b9b57d46b78bb827e9c4da3c upstream. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=43739 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* linux/log2.h: Fix rounddown_pow_of_two(1)Linus Torvalds2011-12-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 13c07b0286d340275f2d97adf085cecda37ede37 upstream. Exactly like roundup_pow_of_two(1), the rounddown version was buggy for the case of a compile-time constant '1' argument. Probably because it originated from the same code, sharing history with the roundup version from before the bugfix (for that one, see commit 1a06a52ee1b0: "Fix roundup_pow_of_two(1)"). However, unlike the roundup version, the fix for rounddown is to just remove the broken special case entirely. It's simply not needed - the generic code 1UL << ilog2(n) does the right thing for the constant '1' argment too. The only reason roundup needed that special case was because rounding up does so by subtracting one from the argument (and then adding one to the result) causing the obvious problems with "ilog2(0)". But rounddown doesn't do any of that, since ilog2() naturally truncates (ie "rounds down") to the right rounded down value. And without the ilog2(0) case, there's no reason for the special case that had the wrong value. tl;dr: rounddown_pow_of_two(1) should be 1, not 0. Acked-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* fix apparmor dereferencing potentially freed dentry, sanitize __d_path() APIAl Viro2011-12-212-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 02125a826459a6ad142f8d91c5b6357562f96615 upstream. __d_path() API is asking for trouble and in case of apparmor d_namespace_path() getting just that. The root cause is that when __d_path() misses the root it had been told to look for, it stores the location of the most remote ancestor in *root. Without grabbing references. Sure, at the moment of call it had been pinned down by what we have in *path. And if we raced with umount -l, we could have very well stopped at vfsmount/dentry that got freed as soon as prepend_path() dropped vfsmount_lock. It is safe to compare these pointers with pre-existing (and known to be still alive) vfsmount and dentry, as long as all we are asking is "is it the same address?". Dereferencing is not safe and apparmor ended up stepping into that. d_namespace_path() really wants to examine the place where we stopped, even if it's not connected to our namespace. As the result, it looked at ->d_sb->s_magic of a dentry that might've been already freed by that point. All other callers had been careful enough to avoid that, but it's really a bad interface - it invites that kind of trouble. The fix is fairly straightforward, even though it's bigger than I'd like: * prepend_path() root argument becomes const. * __d_path() is never called with NULL/NULL root. It was a kludge to start with. Instead, we have an explicit function - d_absolute_root(). Same as __d_path(), except that it doesn't get root passed and stops where it stops. apparmor and tomoyo are using it. * __d_path() returns NULL on path outside of root. The main caller is show_mountinfo() and that's precisely what we pass root for - to skip those outside chroot jail. Those who don't want that can (and do) use d_path(). * __d_path() root argument becomes const. Everyone agrees, I hope. * apparmor does *NOT* try to use __d_path() or any of its variants when it sees that path->mnt is an internal vfsmount. In that case it's definitely not mounted anywhere and dentry_path() is exactly what we want there. Handling of sysctl()-triggered weirdness is moved to that place. * if apparmor is asked to do pathname relative to chroot jail and __d_path() tells it we it's not in that jail, the sucker just calls d_absolute_path() instead. That's the other remaining caller of __d_path(), BTW. * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway - the normal seq_file logics will take care of growing the buffer and redoing the call of ->show() just fine). However, if it gets path not reachable from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped ignoring the return value as it used to do). Reviewed-by: John Johansen <john.johansen@canonical.com> ACKed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* firmware: Sigma: Fix endianess issuesLars-Peter Clausen2011-12-091-4/+4
| | | | | | | | | | | | | commit bda63586bc5929e97288cdb371bb6456504867ed upstream. Currently the SigmaDSP firmware loader only works correctly on little-endian systems. Fix this by using the proper endianess conversion functions. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* firmware: Sigma: Prevent out of bounds memory accessLars-Peter Clausen2011-12-091-5/+0
| | | | | | | | | | | | | | | | | commit 4f718a29fe4908c2cea782f751e9805319684e2b upstream. The SigmaDSP firmware loader currently does not perform enough boundary size checks when processing the firmware. As a result it is possible that a malformed firmware can cause an out of bounds memory access. This patch adds checks which ensure that both the action header and the payload are completely inside the firmware data boundaries before processing them. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* drm/radeon/kms: add some new pci idsAlex Deucher2011-12-091-0/+8
| | | | | | | | | commit 2ed4d9d648cbd4fb1c232a646dbdbdfdd373ca94 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* drm: integer overflow in drm_mode_dirtyfb_ioctl()Xi Wang2011-12-091-0/+2
| | | | | | | | | | | | | | | commit a5cd335165e31db9dbab636fd29895d41da55dd2 upstream. There is a potential integer overflow in drm_mode_dirtyfb_ioctl() if userspace passes in a large num_clips. The call to kmalloc would allocate a small buffer, and the call to fb->funcs->dirty may result in a memory corruption. Reported-by: Haogang Chen <haogangchen@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* nfs: when attempting to open a directory, fall back on normal lookup (try #5)Jeff Layton2011-11-262-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1788ea6e3b2a58cf4fb00206e362d9caff8d86a7 upstream. commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return from an OPEN call. Prior to that patch, that caused the client to fall back to doing a normal lookup. When that patch went in, the code began returning that error to userspace. The d_revalidate codepath however never had the corresponding change, so it was still possible to end up with a NULL ctx->state pointer after that. That patch caused a regression. When we attempt to open a directory that does not have a cached dentry, that open now errors out with EISDIR. If you attempt the same open with a cached dentry, it will succeed. Fix this by reverting the change in nfs_atomic_lookup and allowing attempts to open directories to fall back to a normal lookup Also, add a NFSv4-specific f_ops->open routine that just returns -ENOTDIR. This should never be called if things are working properly, but if it ever is, then the dprintk may help in debugging. To facilitate this, a new file_operations field is also added to the nfs_rpc_ops struct. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* drm/radeon: add some missing FireMV pci idsAlex Deucher2011-11-211-0/+2
| | | | | | | | | | | | commit b872a37437e93df9d112ce674752b3b3a0a17020 upstream. Noticed by Egbert. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Egbert Eich <eich@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* net: Handle different key sizes between address families in flow cachedpward2011-11-111-0/+19
| | | | | | | | | | | | | commit aa1c366e4febc7f5c2b84958a2dd7cd70e28f9d0 upstream. With the conversion of struct flowi to a union of AF-specific structs, some operations on the flow cache need to account for the exact size of the key. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* net: Align AF-specific flowi structs to longDavid Ward2011-11-111-3/+3
| | | | | | | | | | | | | commit 728871bc05afc8ff310b17dba3e57a2472792b13 upstream. AF-specific flowi structs are now passed to flow_key_compare, which must also be aligned to a long. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* drm/radeon/kms: properly set panel mode for eDPAlex Deucher2011-11-111-0/+3
| | | | | | | | | | | commit 00dfb8df5bf8c3afe4c0bb8361133156b06b7a2c upstream. This should make eDP more reliable. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* thp: share get_huge_page_tail()Andrea Arcangeli2011-11-111-0/+11
| | | | | | | | | | | | | | | | | | | | | | | commit b35a35b556f5e6b7993ad0baf20173e75c09ce8c upstream. This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodesTheodore Ts'o2011-11-112-4/+4
| | | | | | | | | | | | | | commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream. This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* mm: thp: tail page refcounting fixAndrea Arcangeli2011-11-112-38/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 70b50f94f1644e2aa7cb374819cfd93f3c28d725 upstream. Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* vfs pathname lookup: Add LOOKUP_AUTOMOUNT flagLinus Torvalds2011-11-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we've now turned around and made LOOKUP_FOLLOW *not* force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* readlinkat: ensure we return ENOENT for the empty pathname for normal lookupsAndy Whitcroft2011-11-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 1fa1e7f615f4d3ae436fa319af6e4eebdd4026a8 upstream. Since the commit below which added O_PATH support to the *at() calls, the error return for readlink/readlinkat for the empty pathname has switched from ENOENT to EINVAL: commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames This is both unexpected for userspace and makes readlink/readlinkat inconsistant with all other interfaces; and inconsistant with our stated return for these pathnames. As the readlinkat call does not have a flags parameter we cannot use the AT_EMPTY_PATH approach used in the other calls. Therefore expose whether the original path is infact entry via a new user_path_at_empty() path lookup function. Use this to determine whether to default to EINVAL or ENOENT for failures. Addresses http://bugs.launchpad.net/bugs/817187 [akpm@linux-foundation.org: remove unused getname_flags()] Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* mm: avoid null pointer access in vm_struct via /proc/vmallocinfoMitsuo Hayasaka2011-11-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f5252e009d5b87071a919221e4f6624184005368 upstream. The /proc/vmallocinfo shows information about vmalloc allocations in vmlist that is a linklist of vm_struct. It, however, may access pages field of vm_struct where a page was not allocated. This results in a null pointer access and leads to a kernel panic. Why this happens: In __vmalloc_node_range() called from vmalloc(), newly allocated vm_struct is added to vmlist at __get_vm_area_node() and then, some fields of vm_struct such as nr_pages and pages are set at __vmalloc_area_node(). In other words, it is added to vmlist before it is fully initialized. At the same time, when the /proc/vmallocinfo is read, it accesses the pages field of vm_struct according to the nr_pages field at show_numa_info(). Thus, a null pointer access happens. The patch adds the newly allocated vm_struct to the vmlist *after* it is fully initialized. So, it can avoid accessing the pages field with unallocated page when show_numa_info() is called. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* io-mapping: ensure io_mapping_map_atomic _is_ atomicDaniel Vetter2011-11-111-0/+4
| | | | | | | | | | | | | | | | | | | | | commit 24dd85ff723f142093f44244764b9b5c152235b8 upstream. For the !HAVE_ATOMIC_IOMAP case the stub functions did not call pagefault_disable/_enable. The i915 driver relies on the map actually being atomic, otherwise it can deadlock with it's own pagefault handler in the gtt pwrite fastpath. This is exercised by gem_mmap_gtt from the intel-gpu-toosl gem testsuite. v2: Chris Wilson noted the lack of an include. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlierIan Campbell2011-11-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | commit 9bab0b7fbaceec47d32db51cd9e59c82fb071f5a upstream. This adds a mechanism to resume selected IRQs during syscore_resume instead of dpm_resume_noirq. Under Xen we need to resume IRQs associated with IPIs early enough that the resched IPI is unmasked and we can therefore schedule ourselves out of the stop_machine where the suspend/resume takes place. This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME". Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* time: Change jiffies_to_clock_t() argument type to unsigned longhank2011-11-111-1/+1
| | | | | | | | | | | | | | | | | | | commit cbbc719fccdb8cbd87350a05c0d33167c9b79365 upstream. The parameter's origin type is long. On an i386 architecture, it can easily be larger than 0x80000000, causing this function to convert it to a sign-extended u64 type. Change the type to unsigned long so we get the correct result. Signed-off-by: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> [ build fix ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* net: hold sock reference while processing tx timestampsRichard Cochran2011-11-112-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit da92b194cc36b5dc1fbd85206aeeffd80bee0c39 upstream. The pair of functions, * skb_clone_tx_timestamp() * skb_complete_tx_timestamp() were designed to allow timestamping in PHY devices. The first function, called during the MAC driver's hard_xmit method, identifies PTP protocol packets, clones them, and gives them to the PHY device driver. The PHY driver may hold onto the packet and deliver it at a later time using the second function, which adds the packet to the socket's error queue. As pointed out by Johannes, nothing prevents the socket from disappearing while the cloned packet is sitting in the PHY driver awaiting a timestamp. This patch fixes the issue by taking a reference on the socket for each such packet. In addition, the comments regarding the usage of these function are expanded to highlight the rule that PHY drivers must use skb_complete_tx_timestamp() to release the packet, in order to release the socket reference, too. These functions first appeared in v2.6.36. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* USB: fix ehci alignment errorHarro Haan2011-11-111-1/+1
| | | | | | | | | | | | | commit 276532ba9666b36974cbe16f303fc8be99c9da17 upstream. The Kirkwood gave an unaligned memory access error on line 742 of drivers/usb/host/echi-hcd.c: "ehci->last_periodic_enable = ktime_get_real();" Signed-off-by: Harro Haan <hrhaan@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* TTY: make tty_add_file non-failingJiri Slaby2011-11-111-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit fa90e1c935472281de314e6d7c9a37db9cbc2e4e upstream. If tty_add_file fails at the point it is now, we have to revert all the changes we did to the tty. It means either decrease all refcounts if this was a tty reopen or delete the tty if it was newly allocated. There was a try to fix this in v3.0-rc2 using tty_release in 0259894c7 (TTY: fix fail path in tty_open). But instead it introduced a NULL dereference. It's because tty_release dereferences filp->private_data, but that one is set even in our tty_add_file. And when tty_add_file fails, it's still NULL/garbage. Hence tty_release cannot be called there. To circumvent the original leak (and the current NULL deref) we split tty_add_file into two functions, making the latter non-failing. In that case we may do the former early in open, where handling failures is easy. The latter stays as it is now. So there is no change in functionality. The original bug (leak) was introduced by f573bd176 (tty: Remove __GFP_NOFAIL from tty_add_file()). Thanks Dan for reporting this. Later, we may split tty_release into more functions and call only some of them in this fail path instead. (If at all possible.) Introduced-in: v2.6.37-rc2 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ftrace: Fix warning when CONFIG_FUNCTION_TRACER is not definedSteven Rostedt2011-10-161-2/+2
| | | | | | | | | | | commit 04da85b86188f224cc9b391b5bdd92a3ba20ffcf upstream. The struct ftrace_hash was declared within CONFIG_FUNCTION_TRACER but was referenced outside of it. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ftrace: Fix regression of :mod:module function enablingSteven Rostedt2011-10-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 43dd61c9a09bd413e837df829e6bfb42159be52a upstream. The new code that allows different utilities to pick and choose what functions they trace broke the :mod: hook that allows users to trace only functions of a particular module. The reason is that the :mod: hook bypasses the hash that is setup to allow individual users to trace their own functions and uses the global hash directly. But if the global hash has not been set up, it will cause a bug: echo '*:mod:radeon' > /sys/kernel/debug/set_ftrace_filter produces: [drm:drm_mode_getfb] *ERROR* invalid framebuffer id [drm:radeon_crtc_page_flip] *ERROR* failed to reserve new rbo buffer before flip BUG: unable to handle kernel paging request at ffffffff8160ec90 IP: [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 PGD 1a05067 PUD 1a09063 PMD 80000000016001e1 Oops: 0003 [#1] SMP Jul 7 04:02:28 phyllis kernel: [55303.858604] CPU 1 Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc rfcomm bnep ip6table_filter hid radeon r8169 ahci libahci mii ttm drm_kms_helper drm video i2c_algo_bit intel_agp intel_gtt Pid: 10344, comm: bash Tainted: G WC 3.0.0-rc5 #1 Dell Inc. Inspiron N5010/0YXXJJ RIP: 0010:[<ffffffff810d9136>] [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 RSP: 0018:ffff88003a96bda8 EFLAGS: 00010246 RAX: ffff8801301735c0 RBX: ffffffff8160ec80 RCX: 0000000000306ee0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880137c92940 RBP: ffff88003a96bdb8 R08: ffff880137c95680 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81c9df78 R13: ffff8801153d1000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f329c18a700(0000) GS:ffff880137c80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8160ec90 CR3: 000000003002b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 10344, threadinfo ffff88003a96a000, task ffff88012fcfc470) Stack: 0000000000000fd0 00000000000000fc ffff88003a96be38 ffffffff810d92f5 ffff88011c4c4e00 ffff880000000000 000000000b69f4d0 ffffffff8160ec80 ffff8800300e6f06 0000000081130295 0000000000000282 ffff8800300e6f00 Call Trace: [<ffffffff810d92f5>] match_records+0x155/0x1b0 [<ffffffff810d940c>] ftrace_mod_callback+0xbc/0x100 [<ffffffff810dafdf>] ftrace_regex_write+0x16f/0x210 [<ffffffff810db09f>] ftrace_filter_write+0xf/0x20 [<ffffffff81166e48>] vfs_write+0xc8/0x190 [<ffffffff81167001>] sys_write+0x51/0x90 [<ffffffff815c7e02>] system_call_fastpath+0x16/0x1b Code: 48 8b 33 31 d2 48 85 f6 75 33 49 89 d4 4c 03 63 08 49 8b 14 24 48 85 d2 48 89 10 74 04 48 89 42 08 49 89 04 24 4c 89 60 08 31 d2 RIP [<ffffffff810d9136>] add_hash_entry+0x66/0xd0 RSP <ffff88003a96bda8> CR2: ffffffff8160ec90 ---[ end trace a5d031828efdd88e ]--- Reported-by: Brian Marete <marete@toshnix.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* ipv6: fix NULL dereference in udp6_ufo_fragment()Jason Wang2011-10-161-1/+1
| | | | | | | | | | | | | | | | | | This patch fixes the issue caused by ef81bb40bf15f350fe865f31fa42f1082772a576 which is a backport of upstream 87c48fa3b4630905f98268dde838ee43626a060c. The problem does not exist in upstream. We do not check whether route is attached before trying to assign ip identification through route dest which lead NULL pointer dereference. This happens when host bridge transmit a packet from guest. This patch changes ipv6_select_ident() to accept in6_addr as its paramter and fix the issue by using the destination address in ipv6 header when no route is attached. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>