summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* [SCSI] async: introduce 'async_domain' typeDan Williams2012-07-201-4/+31
| | | | | | | | | | | | | | | | | | | This is in preparation for teaching async_synchronize_full() to sync all pending async work, and not just on the async_running domain. This conversion is functionally equivalent, just embedding the existing list in a new async_domain type. The .registered attribute is used in a later patch to distinguish between domains that want to be flushed by async_synchronize_full() versus those that only expect async_synchronize_{full|cookie}_domain to be used for flushing. [jejb: add async.h to scsi_priv.h for struct async_domain] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Tested-by: Eldad Zack <eldad@fogrefinery.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN ↵Nicholas Bellinger2012-07-201-0/+1
| | | | | | | | | | | | | | | | | | | | scanning This patch changes virtio-scsi to use a new virtio_driver->scan() callback so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring operation, instead of from within virtscsi_probe(). This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur. This fixes a bug with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs. Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] usb-storage: add support for write cache quirkNamjae Jeon2012-07-201-1/+3
| | | | | | | | | | | Add support for write cache quirk on usb hdd. scsi driver will be set to wce by detecting write cache quirk in quirk list when plugging usb hdd. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] set to WCE if usb cache quirk is present.Namjae Jeon2012-07-201-0/+1
| | | | | | | | | | | | Make use of USB quirk method to identify such HDD while reading the cache status in sd_probe(). If cache quirk is present for the HDD, lets assume that cache is enabled and make WCE bit equal to 1. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] virtio-scsi: hotplug support for virtio-scsiCong Meng2012-07-201-0/+9
| | | | | | | | | | | | This patch implements the hotplug support for virtio-scsi. When there is a device attached/detached, the virtio-scsi driver will be signaled via event virtual queue and it will add/remove the scsi device in question automatically. Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libsas: trim sas_task of slow path infrastructureDan Williams2012-07-201-5/+9
| | | | | | | | | | | The timer and the completion are only used for slow path tasks (smp, and lldd tmfs), yet we incur the allocation space and cpu setup time for every fast path task. Cc: Xiangliang Yu <yuxiangl@marvell.com> Acked-by: Jack Wang <jack_wang@usish.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libsas: drop sata port multiplier infrastructureDan Williams2012-07-201-1/+0
| | | | | | | | | | On the way to add a new sata_device field, noticed that libsas is carrying port multiplier infrastructure that is explicitly disabled by sas_discover_sata(). The aic94xx touches the unused port_no, so leave that field in case there was some use for it. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libsas: add sas_eh_abort_handlerDan Williams2012-07-201-0/+1
| | | | | | | | | When recovering failed eh-cmnds let the lldd attempt an abort via scsi_abort_eh_cmnd before escalating. Reviewed-by: Jacek Danecki <jacek.danecki@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libsas: enforce eh strategy handlers only in eh contextDan Williams2012-07-201-0/+10
| | | | | | | | | | | | | The strategy handlers may be called in places that are problematic for libsas (i.e. sata resets outside of domain revalidation filtering / libata link recovery), or problematic for userspace (non-blocking ioctl to sleeping reset functions). However, these routines are also called for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them as long as we are running in the host's error handler, otherwise arrange for them to be triggered in eh_context. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libata, libsas: introduce sched_eh and end_eh port opsDan Williams2012-07-203-1/+12
| | | | | | | | | | | | | | | | | | | | When managing shost->host_eh_scheduled libata assumes that there is a 1:1 shost-to-ata_port relationship. libsas creates a 1:N relationship so it needs to manage host_eh_scheduled cumulatively at the host level. The sched_eh and end_eh port port ops allow libsas to track when domain devices enter/leave the "eh-pending" state under ha->lock (previously named ha->state_lock, but it is no longer just a lock for ha->state changes). Since host_eh_scheduled indicates eh without backing commands pinning the device it can be deallocated at any time. Move the taking of the domain_device reference under the port_lock to guarantee that the ata_port stays around for the duration of eh. Reviewed-by: Jacek Danecki <jacek.danecki@intel.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] scsi_dh: add scsi_dh_attached_handler_nameMike Snitzer2012-07-201-0/+6
| | | | | | | | | | | | | | | Introduce scsi_dh_attached_handler_name() to retrieve the name of the scsi_dh that is attached to the scsi_device associated with the provided request queue. Returns NULL if a scsi_dh is not attached. Also, fix scsi_dh_{attach,detach} function header comments to document @q rather than @sdev. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Babu Moger <babu.moger@netapp.com> Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] scsi_dh_alua: implement 'implied transition timeout'Rob Evers2012-07-201-0/+2
| | | | | | | | | | | | | | | | | During alua transitions, an array can return transitioning status in response to rtpg requests. These requests get retried for a maximum of 60 seconds by default before timing out. Sometimes this timeout isn't sufficient to allow the array to complete the transition. T10-spc4 addresses this under 'Report Target Port Groups' command. This update retrieves the timeout value from the storage array if available and retries the transitioning rtpgs for up to the 'implied transitioning timeout' value Signed-off-by: Rob Evers <revers@redhat.com> Reviewed-by: Babu Moger <babu.moger@netapp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] core, classes, mpt2sas: have scsi_internal_device_unblock take new stateMike Christie2012-07-201-1/+1
| | | | | | | | | | | | | | | This has scsi_internal_device_unblock/scsi_target_unblock take the new state to set the devices as an argument instead of always setting to running. The patch also converts users of these functions. This allows the FC and iSCSI class to transition devices from blocked to transport-offline, so that when fast_io_fail/replacement_timeout has fired we do not set the devices back to running. Instead, we set them to SDEV_TRANSPORT_OFFLINE. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] add new SDEV_TRANSPORT_OFFLINE stateMike Christie2012-07-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | This patch adds a new state SDEV_TRANSPORT_OFFLINE. It will be used by transport classes to offline devices for cases like when the fast_io_fail/recovery_tmo fires. In those cases we want all IO to fail, and we have not yet escalated to dev_loss_tmo behavior where we are removing the devices. Currently to handle this state, transport classes are setting the scsi_device's state to running, setting their internal session/port structs state to something that indicates failed, and then failing IO from some transport check in the queuecommand. The reason for the new value is so that users can distinguish between a device failure that is a result of a transport problem vs the wide range of errors that devices get offlined for when a scsi command times out and we offline the devices there. It also fixes the confusion as to why the transport class is failing IO, but has set the device state from blocked to running. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libfc: update fcp and exch statsVasu Dev2012-07-201-0/+1
| | | | | | | | | | | | Updates newly added stats from fc_get_host_stats, added new function fc_exch_update_stats to update exches related stats from fc_exch.c by going thru internal ema_list elements. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Acked-by : Robert Love <robert.w.love@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libfc: adds FCP failures statsVasu Dev2012-07-201-0/+6
| | | | | | | | | | Adds stats to track FCP pkt and frame alloc failure. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Acked-by : Robert Love <robert.w.love@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] libfc, fcoe, bnx2fc: cleanup fcoe_dev_statsVasu Dev2012-07-201-9/+8
| | | | | | | | | | | | | | | The libfc is used by fcoe but fcoe agnostic, and therefore should not have any fcoe references. So renaming fcoe_dev_stats from libfc as its for fc_stats. After that libfc is fcoe string free except some strings for Open-FCoE.org. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Acked-by : Robert Love <robert.w.love@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Acked-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* [SCSI] fc: add some more FC specific stats to fc_hostVasu Dev2012-07-201-0/+12
| | | | | | | | | | | | | | | | | | | | The libfc provides more flexibility and with that we can monitor some more FC specific stats for FC exches or FCP error cases, this patch add such new FC stats. The patch adds *only* FC specific new stats to existing fc_host attribute container. Added stats names are self explanatory as existing FC stats already has, however anyway still added commentary along their definition to describe them. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Acked-by : Robert Love <robert.w.love@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-07-191-10/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull last minute Ceph fixes from Sage Weil: "The important one fixes a bug in the socket failure handling behavior that was turned up in some recent failure injection testing. The other two are minor bug fixes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: endian bug in rbd_req_cb() rbd: Fix ceph_snap_context size calculation libceph: fix messenger retry
| * libceph: fix messenger retrySage Weil2012-07-171-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <sage@inktank.com>
* | Make wait_for_device_probe() also do scsi_complete_async_scans()Linus Torvalds2012-07-181-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a7a20d103994 ("sd: limit the scope of the async probe domain") make the SCSI device probing run device discovery in it's own async domain. However, as a result, the partition detection was no longer synchronized by async_synchronize_full() (which, despite the name, only synchronizes the global async space, not all of them). Which in turn meant that "wait_for_device_probe()" would not wait for the SCSI partitions to be parsed. And "wait_for_device_probe()" was what the boot time init code relied on for mounting the root filesystem. Now, most people never noticed this, because not only is it timing-dependent, but modern distributions all use initrd. So the root filesystem isn't actually on a disk at all. And then before they actually mount the final disk filesystem, they will have loaded the scsi-wait-scan module, which not only does the expected wait_for_device_probe(), but also does scsi_complete_async_scans(). [ Side note: scsi_complete_async_scans() had also been partially broken, but that was fixed in commit 43a8d39d0137 ("fix async probe regression"), so that same commit a7a20d103994 had actually broken setups even if you used scsi-wait-scan explicitly ] Solve this problem by just moving the scsi_complete_async_scans() call into wait_for_device_probe(). Everybody who wants to wait for device probing to finish really wants the SCSI probing to complete, so there's no reason not to do this. So now "wait_for_device_probe()" really does what the name implies, and properly waits for device probing to finish. This also removes the now unnecessary extra calls to scsi_complete_async_scans(). Reported-and-tested-by: Artem S. Tashkinov <t.artem@mailcity.com> Cc: Dan Williams <dan.j.williams@gmail.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: James Bottomley <jbottomley@parallels.com> Cc: Borislav Petkov <bp@amd64.org> Cc: linux-scsi <linux-scsi@vger.kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'pm-post-3.5-rc7' of ↵Linus Torvalds2012-07-172-4/+4
|\ \ | |/ |/| | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull a last-minute PM update from Rafael J. Wysocki: "This renames CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND to encourage future reuse of the capability in question in related cases." * tag 'pm-post-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPEND
| * PM: Rename CAP_EPOLLWAKEUP to CAP_BLOCK_SUSPENDMichael Kerrisk2012-07-172-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990, the capability introduced in 4d7e30d98939a0340022ccd49325a3d70f7e0238 to govern EPOLLWAKEUP seems misnamed: this capability is about governing the ability to suspend the system, not using a particular API flag (EPOLLWAKEUP). We should make the name of the capability more general to encourage reuse in related cases. (Whether or not this capability should also be used to govern the use of /sys/power/wake_lock is a question that needs to be separately resolved.) This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure that the old capability name doesn't make it out into the wild, could you please apply and push up the tree to ensure that it is incorporated for the 3.5 release. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-07-172-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) IPVS oops'ers: a) Should not reset skb->nf_bridge in forwarding hook (Lin Ming) b) 3.4 commit can cause ip_vs_control_cleanup to be invoked after the ipvs_core_ops are unregistered during rmmod (Julian ANastasov) 2) ixgbevf bringup failure can crash in TX descriptor cleanup (Alexander Duyck) 3) AX25 switch missing break statement hoses ROSE sockets (Alan Cox) 4) CAIF accesses freed per-net memory (Sjur Brandeland) 5) Network cgroup code has out-or-bounds accesses (Eric DUmazet), and accesses freed memory (Gao Feng) 6) Fix a crash in SCTP reported by Dave Jones caused by freeing an association still on a list (Neil HOrman) 7) __netdev_alloc_skb() regresses on GFP_DMA using drivers because that GFP flag is not being retained for the allocation (Eric Dumazet). 8) Missing NULL hceck in sch_sfb netlink message parsing (Alan Cox) 9) bnx2 crashes because TX index iteration is not bounded correctly (Michael Chan) 10) IPoIB generates warnings in TCP queue collapsing (via skb_try_coalesce) because it does not set skb->truesize correctly (Eric Dumazet) 11) vlan_info objects leak for the implicit vlan with ID 0 (Amir Hanania) 12) A fix for TX time stamp handling in gianfar does not transfer socket ownership from one packet to another correctly, resulting in a socket write space imbalance (Eric Dumazet) 13) Julia Lawall found several cases where we do a list iteration, and then at the loop termination unconditionally assume we ended up with real list object, rather than the list head itself (CNIC, RXRPC, mISDN). 14) The bonding driver handles procfs moving incorrectly when a device it manages is moved from one namespace to another (Eric Biederman) 15) Missing memory barriers in stmmac descriptor accesses result in various crashes (Deepak Sikri) 16) Fix handling of broadcast packets in batman-adv (Simon Wunderlich) 17) Properly check the sanity of sendmsg() lengths in ieee802154's dgram_sendmsg(). Dave Jones and others have hit and reported this bug (Sasha Levin) 18) Some drivers (b44 and b43legacy) on 64-bit machines stopped working because of how netdev_alloc_skb() was adjusted. Such drivers should now use alloc_skb() for obtaining bounce buffers. (Eric Dumazet) 19) atl1c mis-managed it's link state in that it stops the queue by hand on link down. The generic networking takes care of that and this double stop locks the queue down. So simply removing the driver's queue stop call fixes the problem (Cloud Ren) 20) Fix out-of-memory due to mis-accounting in net_em packet scheduler (Eric Dumazet) 21) If DCB and SR-IOV are configured at the same time in IXGBE the chip will hang because this is not supported (Alexander Duyck) 22) A commit to stop drivers using netdev->base_addr broke the CNIC driver (Michael Chan) 23) Timeout regression in ipset caused by an attempt to fix an overflow bug (Jozsef Kadlecsik). 24) mac80211 minstrel code allocates memory using incorrect size (Thomas Huehn) 25) llcp_sock_getname() needs to check for a NULL device otherwise we OOPS (Sasha Levin) 26) mwifiex leaks memory (Bing Zhao) 27) Propagate iwlwifi fix to iwlegacy, even when we're not associated we need to monitor for stuck queues in the watchdog handler (Stanislaw Geuszka) * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) ipvs: fix oops in ip_vs_dst_event on rmmod ipvs: fix oops on NAT reply in br_nf context ixgbevf: Fix panic when loading driver ax25: Fix missing break MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership caif: Fix access to freed pernet memory net: cgroup: fix access the unallocated memory in netprio cgroup ixgbevf: Prevent RX/TX statistics getting reset to zero sctp: Fix list corruption resulting from freeing an association on a list net: respect GFP_DMA in __netdev_alloc_skb() e1000e: fix test for PHY being accessible on 82577/8/9 and I217 e1000e: Correct link check logic for 82571 serdes sch_sfb: Fix missing NULL check bnx2: Fix bug in bnx2_free_tx_skbs(). IPoIB: fix skb truesize underestimatiom net: Fix memory leak - vlan_info struct gianfar: fix potential sk_wmem_alloc imbalance drivers/net/ethernet/broadcom/cnic.c: remove invalid reference to list iterator variable net/rxrpc/ar-peer.c: remove invalid reference to list iterator variable drivers/isdn/mISDN/stack.c: remove invalid reference to list iterator variable ...
| * | ipvs: fix oops on NAT reply in br_nf contextLin Ming2012-07-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPVS should not reset skb->nf_bridge in FORWARD hook by calling nf_reset for NAT replies. It triggers oops in br_nf_forward_finish. [ 579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [ 579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112 [ 579.781792] PGD 218f9067 PUD 0 [ 579.781865] Oops: 0000 [#1] SMP [ 579.781945] CPU 0 [ 579.781983] Modules linked in: [ 579.782047] [ 579.782080] [ 579.782114] Pid: 4644, comm: qemu Tainted: G W 3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard /30E8 [ 579.782300] RIP: 0010:[<ffffffff817b1ca5>] [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112 [ 579.782455] RSP: 0018:ffff88007b003a98 EFLAGS: 00010287 [ 579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a [ 579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00 [ 579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90 [ 579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02 [ 579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000 [ 579.783177] FS: 0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70 [ 579.783306] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b [ 579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0 [ 579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760) [ 579.783919] Stack: [ 579.783959] ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00 [ 579.784110] ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7 [ 579.784260] ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0 [ 579.784477] Call Trace: [ 579.784523] <IRQ> [ 579.784562] [ 579.784603] [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8 [ 579.784707] [<ffffffff81704b58>] nf_iterate+0x47/0x7d [ 579.784797] [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae [ 579.784906] [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102 [ 579.784995] [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae [ 579.785175] [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b [ 579.785179] [<ffffffff817ac417>] __br_forward+0x97/0xa2 [ 579.785179] [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257 [ 579.785179] [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb [ 579.785179] [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1 [ 579.785179] [<ffffffff81704b58>] nf_iterate+0x47/0x7d [ 579.785179] [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44 [ 579.785179] [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102 [ 579.785179] [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44 [ 579.785179] [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54 [ 579.785179] [<ffffffff817ad62a>] br_handle_frame+0x213/0x229 [ 579.785179] [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257 [ 579.785179] [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1 [ 579.785179] [<ffffffff816e69fc>] process_backlog+0x99/0x1e2 [ 579.785179] [<ffffffff816e6800>] net_rx_action+0xdf/0x242 [ 579.785179] [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0 [ 579.785179] [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c [ 579.785179] [<ffffffff8188812c>] call_softirq+0x1c/0x30 The steps to reproduce as follow, 1. On Host1, setup brige br0(192.168.1.106) 2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd 3. Start IPVS service on Host1 ipvsadm -A -t 192.168.1.106:80 -s rr ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m 4. Run apache benchmark on Host2(192.168.1.101) ab -n 1000 http://192.168.1.106/ ip_vs_reply4 ip_vs_out handle_response ip_vs_notrack nf_reset() { skb->nf_bridge = NULL; } Actually, IPVS wants in this case just to replace nfct with untracked version. So replace the nf_reset(skb) call in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call. Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_ct_ecache: fix crash with multiple containers, one shutting downPablo Neira Ayuso2012-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hans reports that he's still hitting: BUG: unable to handle kernel NULL pointer dereference at 000000000000027c IP: [<ffffffff813615db>] netlink_has_listeners+0xb/0x60 PGD 0 Oops: 0000 [#3] PREEMPT SMP CPU 0 It happens when adding a number of containers with do: nfct_query(h, NFCT_Q_CREATE, ct); and most likely one namespace shuts down. this problem was supposed to be fixed by: 70e9942 netfilter: nf_conntrack: make event callback registration per-netns Still, it was missing one rcu_access_pointer to check if the callback is set or not. Reported-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | Merge branch 'fixes-for-linus' of ↵Linus Torvalds2012-07-171-1/+1
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull CMA and DMA-mapping fixes from Marek Szyprowski: "Another set of minor fixups for recently merged Contiguous Memory Allocator and ARM DMA-mapping changes. Those patches fix mysterious crashes on systems with CMA and Himem enabled as well as some corner cases caused by typical off-by-one bug." * 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: ARM: dma-mapping: modify condition check while freeing pages mm: cma: fix condition check when setting global cma area mm: cma: don't replace lowmem pages with highmem
| * | mm: cma: fix condition check when setting global cma areaMarek Szyprowski2012-07-061-1/+1
| | | | | | | | | | | | | | | | | | | | | dev_set_cma_area incorrectly assigned cma to global area on first call due to incorrect check. This patch fixes this issue. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
| | |
| \ \
| \ \
| \ \
*---. \ \ Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and ↵Linus Torvalds2012-07-143-11/+14
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU, perf, and scheduler fixes from Ingo Molnar. The RCU fix is a revert for an optimization that could cause deadlocks. One of the scheduler commits (164c33c6adee "sched: Fix fork() error path to not crash") is correct but not complete (some architectures like Tile are not covered yet) - the resulting additional fixes are still WIP and Ingo did not want to delay these pending fixes. See this thread on lkml: [PATCH] fork: fix error handling in dup_task() The perf fixes are just trivial oneliners. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix segfault with report and mixed guestmount use perf kvm: Fix regression with guest machine creation perf script: Fix format regression due to libtraceevent merge ring-buffer: Fix accounting of entries when removing pages ring-buffer: Fix crash due to uninitialized new_pages list head * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS/sched: Update scheduler file pattern sched/nohz: Rewrite and fix load-avg computation -- again sched: Fix fork() error path to not crash
| | | * | | sched/nohz: Rewrite and fix load-avg computation -- againPeter Zijlstra2012-07-051-0/+8
| | | | |/ | | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thanks to Charles Wang for spotting the defects in the current code: - If we go idle during the sample window -- after sampling, we get a negative bias because we can negate our own sample. - If we wake up during the sample window we get a positive bias because we push the sample to a known active period. So rewrite the entire nohz load-avg muck once again, now adding copious documentation to the code. Reported-and-tested-by: Doug Smythies <dsmythies@telus.net> Reported-and-tested-by: Charles Wang <muming.wq@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins [ minor edits ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | Merge branch 'rcu/urgent' of ↵Ingo Molnar2012-07-063-11/+6
| |\ \ \ \ | | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull low probability CONFIG_RCU_BOOST=y deadlock fix from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation"Paul E. McKenney2012-07-023-11/+6
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 616c310e83b872024271c915c1b9ab505b9efad9. (Move PREEMPT_RCU preemption to switch_to() invocation). Testing by Sasha Levin <levinsasha928@gmail.com> showed that this can result in deadlock due to invoking the scheduler when one of the runqueue locks is held. Because this commit was simply a performance optimization, revert it. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Sasha Levin <levinsasha928@gmail.com>
* | | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2012-07-131-1/+9
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull the leap second fixes from Thomas Gleixner: "It's a rather large series, but well discussed, refined and reviewed. It got a massive testing by John, Prarit and tip. In theory we could split it into two parts. The first two patches f55a6faa3843: hrtimer: Provide clock_was_set_delayed() 4873fa070ae8: timekeeping: Fix leapsecond triggered load spike issue are merely preventing the stuff loops forever issues, which people have observed. But there is no point in delaying the other 4 commits which achieve full correctness into 3.6 as they are tagged for stable anyway. And I rather prefer to have the full fixes merged in bulk than a "prevent the observable wreckage and deal with the hidden fallout later" approach." * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Update hrtimer base offsets each hrtimer_interrupt timekeeping: Provide hrtimer update function hrtimers: Move lock held region in hrtimer_interrupt() timekeeping: Maintain ktime_t based offsets for hrtimers timekeeping: Fix leapsecond triggered load spike issue hrtimer: Provide clock_was_set_delayed()
| * | | | timekeeping: Provide hrtimer update functionThomas Gleixner2012-07-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | hrtimer: Provide clock_was_set_delayed()John Stultz2012-07-111-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notificiation from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds2012-07-113-4/+7
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge random patches from Andrew Morton. * Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (32 commits) memblock: free allocated memblock_reserved_regions later mm: sparse: fix usemap allocation above node descriptor section mm: sparse: fix section usemap placement calculation xtensa: fix incorrect memset shmem: cleanup shmem_add_to_page_cache shmem: fix negative rss in memcg memory.stat tmpfs: revert SEEK_DATA and SEEK_HOLE drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT fat: fix non-atomic NFS i_pos read MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section sgi-xp: nested calls to spin_lock_irqsave() fs: ramfs: file-nommu: add SetPageUptodate() drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails h8300/uaccess: add mising __clear_user() h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user() h8300/time: add missing #include <asm/irq_regs.h> h8300/signal: fix typo "statis" h8300/pgtable: add missing #include <asm-generic/pgtable.h> drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled ...
| * | | | | memblock: free allocated memblock_reserved_regions laterYinghai Lu2012-07-111-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | memblock_free_reserved_regions() calls memblock_free(), but memblock_free() would double reserved.regions too, so we could free the old range for reserved.regions. Also tj said there is another bug which could be related to this. | I don't think we're saving any noticeable | amount by doing this "free - give it to page allocator - reserve | again" dancing. We should just allocate regions aligned to page | boundaries and free them later when memblock is no longer in use. in that case, when DEBUG_PAGEALLOC, will get panic: memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39 BUG: unable to handle kernel paging request at ffff88102febd948 IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155 PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU 0 Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447 RIP: 0010:[<ffffffff836a5774>] [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155 See the discussion at https://lkml.org/lkml/2012/6/13/469 So try to allocate with PAGE_SIZE alignment and free it later. Reported-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | mm: sparse: fix usemap allocation above node descriptor sectionYinghai Lu2012-07-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint in alloc_bootmem_section"), usemap allocations may easily be placed outside the optimal section that holds the node descriptor, even if there is space available in that section. This results in unnecessary hotplug dependencies that need to have the node unplugged before the section holding the usemap. The reason is that the bootmem allocator doesn't guarantee a linear search starting from the passed allocation goal but may start out at a much higher address absent an upper limit. Fix this by trying the allocation with the limit at the section end, then retry without if that fails. This keeps the fix from f5bf18fa22f8 of not panicking if the allocation does not fit in the section, but still makes sure to try to stay within the section at first. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | memory hotplug: fix invalid memory access caused by stale kswapd pointerJiang Liu2012-07-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kswapd_stop() is called to destroy the kswapd work thread when all memory of a NUMA node has been offlined. But kswapd_stop() only terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL. The stale pointer will prevent kswapd_run() from creating a new work thread when adding memory to the memory-less NUMA node again. Eventually the stale pointer may cause invalid memory access. An example stack dump as below. It's reproduced with 2.6.32, but latest kernel has the same issue. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81051a94>] exit_creds+0x12/0x78 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/memory/memory391/state CPU 11 Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285 RIP: 0010:exit_creds+0x12/0x78 RSP: 0018:ffff8806044f1d78 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000 RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10 R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000 R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001 FS: 00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600) Stack: ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1 0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003 Call Trace: __put_task_struct+0x5d/0x97 kthread_stop+0x50/0x58 offline_pages+0x324/0x3da memory_block_change_state+0x179/0x1db store_mem_state+0x9e/0xbb sysfs_write_file+0xd0/0x107 vfs_write+0xad/0x169 sys_write+0x45/0x6e system_call_fastpath+0x16/0x1b Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0 RIP exit_creds+0x12/0x78 RSP <ffff8806044f1d78> CR2: 0000000000000000 [akpm@linux-foundation.org: add pglist_data.kswapd locking comments] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2012-07-112-3/+11
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of three fixes for data corruption (libsas task file), oops causing (NULL in scsi_cmd_to_driver) and driver failure (bnx2i). The oops caused by the NULL in scsi_cmd_to_driver() manifests in scsi_eh_send_cmd() and has been seen by several people now. Signed-off-by: James Bottomley <JBottomley@Parallels.com>" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: [SCSI] bnx2i: Removed the reference to the netdev->base_addr [SCSI] libsas: fix taskfile corruption in sas_ata_qc_fill_rtf [SCSI] Fix NULL dereferences in scsi_cmd_to_driver
| * | | | | | [SCSI] libsas: fix taskfile corruption in sas_ata_qc_fill_rtfDan Williams2012-07-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fill_result_tf() grabs the taskfile flags from the originating qc which sas_ata_qc_fill_rtf() promptly overwrites. The presence of an ata_taskfile in the sata_device makes it tempting to just copy the full contents in sas_ata_qc_fill_rtf(). However, libata really only wants the fis contents and expects the other portions of the taskfile to not be touched by ->qc_fill_rtf. To that end store a fis buffer in the sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf() implementation. Cc: <stable@vger.kernel.org> Reported-by: Praveen Murali <pmurali@logicube.com> Tested-by: Praveen Murali <pmurali@logicube.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| * | | | | | [SCSI] Fix NULL dereferences in scsi_cmd_to_driverMark Rustad2012-07-081-1/+7
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid crashing if the private_data pointer happens to be NULL. This has been seen sometimes when a host reset happens, notably when there are many LUNs: host3: Assigned Port ID 0c1601 scsi host3: libfc: Host reset succeeded on port (0c1601) BUG: unable to handle kernel NULL pointer dereference at 0000000000000350 IP: [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0 <snip> Process scsi_eh_3 (pid: 4144, threadinfo ffff88030920c000, task ffff880326b160c0) Stack: 000000010372e6ba 0000000000000282 000027100920dca0 ffffffffa0038ee0 0000000000000000 0000000000030003 ffff88030920dc80 ffff88030920dc80 00000002000e0000 0000000a00004000 ffff8803242f7760 ffff88031326ed80 Call Trace: [<ffffffff8105b590>] ? lock_timer_base+0x70/0x70 [<ffffffff81352fbe>] scsi_eh_tur+0x3e/0xc0 [<ffffffff81353a36>] scsi_eh_test_devices+0x76/0x170 [<ffffffff81354125>] scsi_eh_host_reset+0x85/0x160 [<ffffffff81354291>] scsi_eh_ready_devs+0x91/0x110 [<ffffffff813543fd>] scsi_unjam_host+0xed/0x1f0 [<ffffffff813546a8>] scsi_error_handler+0x1a8/0x200 [<ffffffff81354500>] ? scsi_unjam_host+0x1f0/0x1f0 [<ffffffff8106ec3e>] kthread+0x9e/0xb0 [<ffffffff81509264>] kernel_thread_helper+0x4/0x10 [<ffffffff8106eba0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81509260>] ? gs_change+0x13/0x13 Code: 25 28 00 00 00 48 89 45 c8 31 c0 48 8b 87 80 00 00 00 48 8d b5 60 ff ff ff 89 d1 48 89 fb 41 89 d6 4c 89 fa 48 8b 80 b8 00 00 00 <48> 8b 80 50 03 00 00 48 8b 00 48 89 85 38 ff ff ff 48 8b 07 4c RIP [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0 RSP <ffff88030920dc50> CR2: 0000000000000350 Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Cc: <stable@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
* | | | | | Merge tag 'usb-3.5-rc6' of ↵Linus Torvalds2012-07-111-2/+0
|\ \ \ \ \ \ | |_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg Kroah-Hartman: "Here are a few fixes and new device ids for the 3.5-rc6 tree. The PCI changes resolve a long-standing issue with resuming some EHCI controllers. It has been acked by the PCI maintainer, and he asked for it to go through my USB tree instead of his. The xhci patches also resolve a number of reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'usb-3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: PCI: EHCI: fix crash during suspend on ASUS computers USB: cdc-wdm: fix lockup on error in wdm_read USB: metro-usb: fix tty_flip_buffer_push use USB: option: Add MEDIATEK product ids USB: option: add ZTE MF60 xhci: Fix hang on back-to-back Set TR Deq Ptr commands. usb: Add support for root hub port status CAS
| * | | | | PCI: EHCI: fix crash during suspend on ASUS computersAlan Stern2012-07-101-2/+0
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quite a few ASUS computers experience a nasty problem, related to the EHCI controllers, when going into system suspend. It was observed that the problem didn't occur if the controllers were not put into the D3 power state before starting the suspend, and commit 151b61284776be2d6f02d48c23c3625678960b97 (USB: EHCI: fix crash during suspend on ASUS computers) was created to do this. It turned out this approach messed up other computers that didn't have the problem -- it prevented USB wakeup from working. Consequently commit c2fb8a3fa25513de8fedb38509b1f15a5bbee47b (USB: add NO_D3_DURING_SLEEP flag and revert 151b61284776be2) was merged; it reverted the earlier commit and added a whitelist of known good board names. Now we know the actual cause of the problem. Thanks to AceLan Kao for tracking it down. According to him, an engineer at ASUS explained that some of their BIOSes contain a bug that was added in an attempt to work around a problem in early versions of Windows. When the computer goes into S3 suspend, the BIOS tries to verify that the EHCI controllers were first quiesced by the OS. Nothing's wrong with this, but the BIOS does it by checking that the PCI COMMAND registers contain 0 without checking the controllers' power state. If the register isn't 0, the BIOS assumes the controller needs to be quiesced and tries to do so. This involves making various MMIO accesses to the controller, which don't work very well if the controller is already in D3. The end result is a system hang or memory corruption. Since the value in the PCI COMMAND register doesn't matter once the controller has been suspended, and since the value will be restored anyway when the controller is resumed, we can work around the BIOS bug simply by setting the register to 0 during system suspend. This patch (as1590) does so and also reverts the second commit mentioned above, which is now unnecessary. In theory we could do this for every PCI device. However to avoid introducing new problems, the patch restricts itself to EHCI host controllers. Finally the affected systems can suspend with USB wakeup working properly. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728 Based-on-patch-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Dâniel Fraga <fragabr@gmail.com> Tested-by: Javier Marcet <jmarcet@gmail.com> Tested-by: Andrey Rahmatullin <wrar@wrar.name> Tested-by: Oleksij Rempel <bug-track@fisher-privat.net> Tested-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Cc: stable <stable@vger.kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | | Merge tag 'fixes-for-v3.5' of ↵Linus Torvalds2012-07-101-2/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "Yes, this is a *LATE* GPIO pull request with fixes for v3.5. Grant moved across the planet and accidentally fell off the grid, so he asked me to take over the GPIO merges for a while 10 days ago. Since then I went over the archives and collected this pile of fixes, and pulled two of them from the TI maintainer Kevin Hilman. Then waited for them to at least hit linux-next once or twice." GPIO fixes for v3.5: - Invalid context restore on bank 0 for OMAP driver in runtime suspend/resume cycle - Check for NULL platform data in sta-2x11 driver - Constrain selection of the V1 MSM GPIO driver to applicable platforms (Kconfig issue) - Make sure the correct output value is set in the wm8994 driver - Export devm_gpio_request_one() so it can be used in modules. Apparently some in-kernel modules can be configured to use this leading to breakage. - Check that the GPIO is valid in the lantiq driver - Fix the flag bits introduced for v3.5, so they don't overlap - Fix a device tree intialization bug for imx21-compatible devices - Carry over the OF node to the TPS65910 GPIO chip struct * tag 'fixes-for-v3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: tps65910: initialize of_node of gpio_chip gpio/mxc: make irqs work for fsl,imx21-gpio devices gpio: fix bits conflict for gpio flags mips: pci-lantiq: Fix check for valid gpio gpio: export devm_gpio_request_one gpiolib: wm8994: Pay attention to the value set when enabling as output gpio/msm_v1: CONFIG_GPIO_MSM_V1 is only available on three SoCs gpio-sta2x11: don't use pdata if null gpio/omap: fix invalid context restore of gpio bank-0 gpio/omap: fix irq loss while in idle with debounce on
| * | | | | gpio: fix bits conflict for gpio flagsLaxman Dewangan2012-07-051-2/+2
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bit 2 and 3 in GPIO flag are allocated for the flag OPEN_DRAIN/OPEN_SOURCE. These bits are reused for the flag EXPORT/EXPORT_CHANGEABLE and so creating conflict. Fix this conflict by assigning bit 4 and 5 for the flag EXPORT/EXPORT_CHANGEABLE. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
* | | | | Merge tag 'rpmsg-3.5-fixes' of ↵Linus Torvalds2012-07-091-0/+6
|\ \ \ \ \ | |_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg Pull rpmsg fixes from Ohad Ben-Cohen: "Fixing two (somewhat rare) endpoint-related race issues, both of which were reported by Fernando Guzman Lugo." * tag 'rpmsg-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg: rpmsg: make sure inflight messages don't invoke just-removed callbacks rpmsg: avoid premature deallocation of endpoints
| * | | | rpmsg: make sure inflight messages don't invoke just-removed callbacksOhad Ben-Cohen2012-07-041-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When inbound messages arrive, rpmsg core looks up their associated endpoint (by destination address) and then invokes their callback. We've made sure that endpoints will never be de-allocated after they were found by rpmsg core, but we also need to protect against the (rare) scenario where the rpmsg driver was just removed, and its callback function isn't available anymore. This is achieved by introducing a callback mutex, which must be taken before the callback is invoked, and, obviously, before it is removed. Cc: stable <stable@vger.kernel.org> Reported-by: Fernando Guzman Lugo <fernando.lugo@ti.com> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
| * | | | rpmsg: avoid premature deallocation of endpointsOhad Ben-Cohen2012-07-041-0/+3
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an inbound message arrives, the rpmsg core looks up its associated endpoint and invokes the registered callback. If a message arrives while its endpoint is being removed (because the rpmsg driver was removed, or a recovery of a remote processor has kicked in) we must ensure atomicity, i.e.: - Either the ept is removed before it is found or - The ept is found but will not be freed until the callback returns This is achieved by maintaining a per-ept reference count, which, when drops to zero, will trigger deallocation of the ept. With this in hand, it is now forbidden to directly deallocate epts once they have been added to the endpoints idr. Cc: stable <stable@vger.kernel.org> Reported-by: Fernando Guzman Lugo <fernando.lugo@ti.com> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
* | | | security: Minor improvements to no_new_privs documentationAndy Lutomirski2012-07-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The documentation didn't actually mention how to enable no_new_privs. This also adds a note about possible interactions between no_new_privs and LSMs (i.e. why teaching systemd to set no_new_privs is not necessarily a good idea), and it references the new docs from include/linux/prctl.h. Suggested-by: Rob Landley <rob@landley.net> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>