summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* PM / QoS: Add function dev_pm_qos_read_value() (v3)Rafael J. Wysocki2011-10-042-10/+12
| | | | | | | | | | | | | | | | | | To read the current PM QoS value for a given device we need to make sure that the device's power.constraints object won't be removed while we're doing that. For this reason, put the operation under dev->power.lock and acquire the lock around the initialization and removal of power.constraints. Moreover, since we're using the value of power.constraints to determine whether or not the object is present, the power.constraints_state field isn't necessary any more and may be removed. However, dev_pm_qos_add_request() needs to check if the device is being removed from the system before allocating a new PM QoS constraints object for it, so make it use the power.power_state field of struct device for this purpose. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM QoS: Add global notification mechanism for device constraintsJean Pihet2011-08-251-0/+11
| | | | | | | | | | | | Add a global notification chain that gets called upon changes to the aggregated constraint value for any device. The notification callbacks are passing the full constraint request data in order for the callees to have access to it. The current use is for the platform low-level code to access the target device of the constraint. Signed-off-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM QoS: Implement per-device PM QoS constraintsJean Pihet2011-08-252-0/+51
| | | | | | | | | | | | | | | | | | | | | | Implement the per-device PM QoS constraints by creating a device PM QoS API, which calls the PM QoS constraints management core code. The per-device latency constraints data strctures are stored in the device dev_pm_info struct. The device PM code calls the init and destroy of the per-device constraints data struct in order to support the dynamic insertion and removal of the devices in the system. To minimize the data usage by the per-device constraints, the data struct is only allocated at the first call to dev_pm_qos_add_request. The data is later free'd when the device is removed from the system. A global mutex protects the constraints users from the data being allocated and free'd. Signed-off-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM QoS: Generalize and export constraints management codeJean Pihet2011-08-251-0/+14
| | | | | | | | | | | | | | | | | In preparation for the per-device constratins support: - rename update_target to pm_qos_update_target - generalize and export pm_qos_update_target for usage by the upcoming per-device latency constraints framework: * operate on struct pm_qos_constraints for constraints management, * introduce an 'action' parameter for constraints add/update/remove, * the return value indicates if the aggregated constraint value has changed, - update the internal code to operate on struct pm_qos_constraints - add a NULL pointer check in the API functions Signed-off-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM QoS: Reorganize data structsJean Pihet2011-08-251-0/+19
| | | | | | | | | | | | | | In preparation for the per-device constratins support, re-organize the data strctures: - add a struct pm_qos_constraints which contains the constraints related data - update struct pm_qos_object contents to the PM QoS internal object data. Add a pointer to struct pm_qos_constraints - update the internal code to use the new data structs. Signed-off-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM QoS: Minor clean-upsJean Pihet2011-08-253-13/+13
| | | | | | | | | | | | | | - Misc fixes to improve code readability: * rename struct pm_qos_request_list to struct pm_qos_request, * rename pm_qos_req parameter to req in internal code, consistenly use req in the API parameters, * update the in-kernel API callers to the new parameters names, * rename of fields names (requests, list, node, constraints) Signed-off-by: Jean Pihet <j-pihet@ti.com> Acked-by: markgross <markgross@thegnar.org> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM QoS: Move and rename the implementation filesJean Pihet2011-08-253-6/+29
| | | | | | | | | | | | The PM QoS implementation files are better named kernel/power/qos.c and include/linux/pm_qos.h. The PM QoS support is compiled under the CONFIG_PM option. Signed-off-by: Jean Pihet <j-pihet@ti.com> Acked-by: markgross <markgross@thegnar.org> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM: Move clock-related definitions and headers to separate fileRafael J. Wysocki2011-08-252-56/+71
| | | | | | | | | | Since the PM clock management code in drivers/base/power/clock_ops.c is used for both runtime PM and system suspend/hibernation, the definitions of data structures and headers related to it should not be located in include/linux/pm_rumtime.h. Move them to a separate header file. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Domains: Use power.sybsys_data to reduce overheadRafael J. Wysocki2011-08-253-6/+19
| | | | | | | | | | | Currently pm_genpd_runtime_resume() has to walk the list of devices from the device's PM domain to find the corresponding device list object containing the need_restore field to check if the driver's .runtime_resume() callback should be executed for the device. This is suboptimal and can be simplified by using power.sybsys_data to store device information used by the generic PM domains code. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM: Reference counting of power.subsys_dataRafael J. Wysocki2011-08-251-0/+3
| | | | | | | | | | | Since the power.subsys_data device field will be used by multiple filesystems, introduce a reference counting mechanism for it to avoid freeing it prematurely or changing its value at a wrong time. Make the PM clocks management code that currently is the only user of power.subsys_data use the new reference counting. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM: Introduce struct pm_subsys_dataRafael J. Wysocki2011-08-253-3/+19
| | | | | | | | | Introduce struct pm_subsys_data that may be subclassed by subsystems to store subsystem-specific information related to the device. Move the clock management fields accessed through the power.subsys_data pointer in struct device to the new strucutre. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Domains: Rename GPD_STATE_WAIT_PARENT to GPD_STATE_WAIT_MASTERRafael J. Wysocki2011-08-251-1/+1
| | | | | | | | Since it is now possible for a PM domain to have multiple masters instead of one parent, rename the "wait for parent" status to reflect the new situation. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Domains: Allow generic PM domains to have multiple mastersRafael J. Wysocki2011-08-251-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, for a given generic PM domain there may be only one parent domain (i.e. a PM domain it depends on). However, there is at least one real-life case in which there should be two parents (masters) for one PM domain (the A3RV domain on SH7372 turns out to depend on the A4LC domain and it depends on the A4R domain and the same time). For this reason, allow a PM domain to have multiple parents (masters) by introducing objects representing links between PM domains. The (logical) links between PM domains represent relationships in which one domain is a master (i.e. it is depended on) and another domain is a slave (i.e. it depends on the master) with the rule that the slave cannot be powered on if the master is not powered on and the master cannot be powered off if the slave is not powered off. Each struct generic_pm_domain object representing a PM domain has two lists of links, a list of links in which it is a master and a list of links in which it is a slave. The first of these lists replaces the list of subdomains and the second one is used in place of the parent pointer. Each link is represented by struct gpd_link object containing pointers to the master and the slave and two struct list_head members allowing it to hook into two lists (the master's list of "master" links and the slave's list of "slave" links). This allows the code to get to the link from each side (either from the master or from the slave) and follow it in each direction. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Domains: Add "wait for parent" status for generic PM domainsRafael J. Wysocki2011-08-251-0/+1
| | | | | | | | | | | | | | | | | | | | The next patch will make it possible for a generic PM domain to have multiple parents (i.e. multiple PM domains it depends on). To prepare for that change it is necessary to change pm_genpd_poweron() so that it doesn't jump to the start label after running itself recursively for the parent domain. For this purpose, introduce a new PM domain status value GPD_STATE_WAIT_PARENT that will be set by pm_genpd_poweron() before calling itself recursively for the parent domain and modify the code in drivers/base/power/domain.c so that the GPD_STATE_WAIT_PARENT status is guaranteed to be preserved during the execution of pm_genpd_poweron() for the parent. This change also causes pm_genpd_add_subdomain() and pm_genpd_remove_subdomain() to wait for started pm_genpd_poweron() to complete and allows pm_genpd_runtime_resume() to avoid dropping the lock after powering on the PM domain. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Domains: Implement subdomain counters as atomic fieldsRafael J. Wysocki2011-08-251-1/+1
| | | | | | | | | | | | | | | | | | Currently, pm_genpd_poweron() and pm_genpd_poweroff() need to take the parent PM domain's lock in order to modify the parent's counter of active subdomains in a nonracy way. This causes the locking to be considerably complex and in fact is not necessary, because the subdomain counters may be implemented as atomic fields and they won't have to be modified under a lock. Replace the unsigned in sd_count field in struct generic_pm_domain by an atomic_t one and modify the code in drivers/base/power/domain.c to take this change into account. This patch doesn't change the locking yet, that is going to be done in a separate subsequent patch. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-08-241-1/+8
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message fuse: mark pages accessed when written to fuse: delete dead .write_begin and .write_end aops fuse: fix flock fuse: fix non-ANSI void function notation
| * fuse: fix flockMiklos Szeredi2011-08-081-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a number of issues with supporing flock locks over existing POSIX locking infrastructure: - it's not backward compatible, passing flock(2) calls to userspace unconditionally (if userspace sets FUSE_POSIX_LOCKS) - it doesn't cater for the fact that flock locks are automatically unlocked on file release - it doesn't take into account the fact that flock exclusive locks (write locks) don't need an fd opened for write. The last one invalidates the original premise of the patch that flock locks can be emulated with POSIX locks. This patch fixes the first two issues. The last one needs to be fixed in userspace if the filesystem assumed that a write lock will happen only on a file operned for write (as in the case of the current fuse library). Reported-by: Sebastian Pipping <webmaster@hartwork.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2011-08-197-15/+101
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-block: (23 commits) Revert "cfq: Remove special treatment for metadata rqs." block: fix flush machinery for stacking drivers with differring flush flags block: improve rq_affinity placement blktrace: add FLUSH/FUA support Move some REQ flags to the common bio/request area allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH xen/blkback: Make description more obvious. cfq-iosched: Add documentation about idling block: Make rq_affinity = 1 work as expected block: swim3: fix unterminated of_device_id table block/genhd.c: remove useless cast in diskstats_show() drivers/cdrom/cdrom.c: relax check on dvd manufacturer value drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse bsg-lib: add module.h include cfq-iosched: Reduce linked group count upon group destruction blk-throttle: correctly determine sync bio loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices loop: add management interface for on-demand device allocation loop: replace linked list of allocated devices with an idr index ...
| * | block: fix flush machinery for stacking drivers with differring flush flagsJeff Moyer2011-08-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement FLUSH/FUA to support merge, introduced a performance regression when running any sort of fsyncing workload using dm-multipath and certain storage (in our case, an HP EVA). The test I ran was fs_mark, and it dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out that dm-multipath always advertised flush+fua support, and passed commands on down the stack, where those flags used to get stripped off. The above commit changed that behavior: static inline struct request *__elv_next_request(struct request_queue *q) { struct request *rq; while (1) { - while (!list_empty(&q->queue_head)) { + if (!list_empty(&q->queue_head)) { rq = list_entry_rq(q->queue_head.next); - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) || - (rq->cmd_flags & REQ_FLUSH_SEQ)) - return rq; - rq = blk_do_flush(q, rq); - if (rq) - return rq; + return rq; } Note that previously, a command would come in here, have REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush: struct request *blk_do_flush(struct request_queue *q, struct request *rq) { unsigned int fflags = q->flush_flags; /* may change, cache it */ bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA; bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH); bool do_postflush = has_flush && !has_fua && (rq->cmd_flags & REQ_FUA); unsigned skip = 0; ... if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) { rq->cmd_flags &= ~REQ_FLUSH; if (!has_fua) rq->cmd_flags &= ~REQ_FUA; return rq; } So, the flush machinery was bypassed in such cases (q->flush_flags == 0 && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)). Now, however, we don't get into the flush machinery at all. Instead, __elv_next_request just hands a request with flush and fua bits set to the scsi_request_fn, even if the underlying request_queue does not support flush or fua. The agreed upon approach is to fix the flush machinery to allow stacking. While this isn't used in practice (since there is only one request-based dm target, and that target will now reflect the flush flags of the underlying device), it does future-proof the solution, and make it function as designed. In order to make this work, I had to add a field to the struct request, inside the flush structure (to store the original req->end_io). Shaohua had suggested overloading the union with rb_node and completion_data, but the completion data is used by device mapper and can also be used by other drivers. So, I didn't see a way around the additional field. I tested this patch on an HP EVA with both ext4 and xfs, and it recovers the lost performance. Comments and other testers, as always, are appreciated. Cheers, Jeff Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | blktrace: add FLUSH/FUA supportNamhyung Kim2011-08-112-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or FUA follows WRITE, use the same 'F' flag for both cases and distinguish them by their (relative) position. The end results look like (other flags might be shown also): - WRITE: W - WRITE_FLUSH: FW - WRITE_FUA: WF - WRITE_FLUSH_FUA: FWF Note that we reuse TC_BARRIER due to lack of bit space of act_mask so that the older versions of blktrace tools will report flush requests as barriers from now on. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | Move some REQ flags to the common bio/request areaMatthew Wilcox2011-08-111-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as on a request, so relocate them to the shared part of the enum. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@gmail.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | loop: add management interface for on-demand device allocationKay Sievers2011-07-312-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Loop devices today have a fixed pre-allocated number of usually 8. The number can only be changed at module init time. To find a free device to use, /dev/loop%i needs to be scanned, and all devices need to be opened until a free one is possibly found. This adds a new /dev/loop-control device node, that allows to dynamically find or allocate a free device, and to add and remove loop devices from the running system: LOOP_CTL_ADD adds a specific device. Arg is the number of the device. It returns the device i or a negative error code. LOOP_CTL_REMOVE removes a specific device, Arg is the number the device. It returns the device i or a negative error code. LOOP_CTL_GET_FREE finds the next unbound device or allocates a new one. No arg is given. It returns the device i or a negative error code. The loop kernel module gets automatically loaded when /dev/loop-control is accessed the first time. The alias specified in the module, instructs udev to create this 'dead' device node, even when the module is not loaded. Example: cfd = open("/dev/loop-control", O_RDWR); # add a new specific loop device err = ioctl(cfd, LOOP_CTL_ADD, devnr); # remove a specific loop device err = ioctl(cfd, LOOP_CTL_REMOVE, devnr); # find or allocate a free loop device to use devnr = ioctl(cfd, LOOP_CTL_GET_FREE); sprintf(loopname, "/dev/loop%i", devnr); ffd = open("backing-file", O_RDWR); lfd = open(loopname, O_RDWR); err = ioctl(lfd, LOOP_SET_FD, ffd); Cc: Tejun Heo <tj@kernel.org> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | loop: replace linked list of allocated devices with an idr indexKay Sievers2011-07-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the linked list, that keeps track of allocated devices, with an idr index to allow a more efficient lookup of devices. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | block: add bsg helper libraryMike Christie2011-07-312-0/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the FC classes bsg code to the block layer and makes it a lib so that other classes like iscsi and SAS can use it. It is helpful because working with the request queue, bios, creating scatterlists, etc are a pain that the LLD does not have to worry about with normal IOs and should not have to worry about for bsg requests. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-191-1/+15
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: OF: Don't crash when bridge parent is NULL. PCI: export pcie_bus_configure_settings symbol PCI: code and comments cleanup PCI: make cardbus-bridge resources optional PCI: make SRIOV resources optional PCI : ability to relocate assigned pci-resources PCI: honor child buses add_size in hot plug configuration PCI: Set PCI-E Max Payload Size on fabric
| * | | PCI : ability to relocate assigned pci-resourcesRam Pai2011-08-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently pci-bridges are allocated enough resources to satisfy their immediate requirements. Any additional resource-requests fail if additional free space, contiguous to the one already allocated, is not available. This behavior is not reasonable since sufficient contiguous resources, that can satisfy the request, are available at a different location. This patch provides the ability to expand and relocate a allocated resource. v2: Changelog: Fixed size calculation in pci_reassign_resource() v3: Changelog : Split this patch. The resource.c changes are already upstream. All the pci driver changes are in here. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | PCI: Set PCI-E Max Payload Size on fabricJon Mason2011-08-011-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a given PCI-E fabric, each device, bridge, and root port can have a different PCI-E maximum payload size. There is a sizable performance boost for having the largest possible maximum payload size on each PCI-E device. However, if improperly configured, fatal bus errors can occur. Thus, it is important to ensure that PCI-E payloads sends by a device are never larger than the MPS setting of all devices on the way to the destination. This can be achieved two ways: - A conservative approach is to use the smallest common denominator of the entire tree below a root complex for every device on that fabric. This means for example that having a 128 bytes MPS USB controller on one leg of a switch will dramatically reduce performances of a video card or 10GE adapter on another leg of that same switch. It also means that any hierarchy supporting hotplug slots (including expresscard or thunderbolt I suppose, dbl check that) will have to be entirely clamped to 128 bytes since we cannot predict what will be plugged into those slots, and we cannot change the MPS on a "live" system. - A more optimal way is possible, if it falls within a couple of constraints: * The top-level host bridge will never generate packets larger than the smallest TLP (or if it can be controlled independently from its MPS at least) * The device will never generate packets larger than MPS (which can be configured via MRRS) * No support of direct PCI-E <-> PCI-E transfers between devices without some additional code to specifically deal with that case Then we can use an approach that basically ignores downstream requests and focuses exclusively on upstream requests. In that case, all we need to care about is that a device MPS is no larger than its parent MPS, which allows us to keep all switches/bridges to the max MPS supported by their parent and eventually the PHB. In this case, your USB controller would no longer "starve" your 10GE Ethernet and your hotplug slots won't affect your global MPS. Additionally, the hotplugged devices themselves can be configured to a larger MPS up to the value configured in the hotplug bridge. To choose between the two available options, two PCI kernel boot args have been added to the PCI calls. "pcie_bus_safe" will provide the former behavior, while "pcie_bus_perf" will perform the latter behavior. By default, the latter behavior is used. NOTE: due to the location of the enablement, each arch will need to add calls to this function. This patch only enables x86. This patch includes a number of changes recommended by Benjamin Herrenschmidt. Tested-by: Jordan_Hargrave@dell.com Signed-off-by: Jon Mason <mason@myri.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | | | Merge branch 'pm-fixes' of ↵Linus Torvalds2011-08-171-3/+7
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
| * | | | PM / Domains: Fix build for CONFIG_PM_RUNTIME unsetRafael J. Wysocki2011-08-141-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function genpd_queue_power_off_work() is not defined for CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build error to happen in that case. Fix the problem by making pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* | | | | mm: fix __page_to_pfn for a const struct page argumentIan Campbell2011-08-172-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the cast in lowmem_page_address (introduced as a warning fixup to 33dd4e0ec911 "mm: make some struct page's const") to be removed. Propagate const'ness to page_to_section() as well since it is required by __page_to_pfn. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | mm: make HASHED_PAGE_VIRTUAL page_address' struct page argument const.Ian Campbell2011-08-172-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Followup to 33dd4e0ec911 "mm: make some struct page's const" which missed the HASHED_PAGE_VIRTUAL case. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2011-08-171-0/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rtc: Limit RTC PIE frequency rtc: Fix hrtimer deadlock rtc: Handle errors correctly in rtc_irq_set_state() Fixup trivial conflicts in drivers/rtc/interface.c due to slightly trivially versions of the same patch coming in two different ways.
| * | | | | rtc: Limit RTC PIE frequencyThomas Gleixner2011-07-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RTC pie hrtimer is self rearming. We really need to limit the frequency to something sensible. Thus limit it to the 8192Hz max value from the rtc man documentation Cc: Willy Tarreau <w@1wt.eu> Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [jstultz: slightly reworked to use RTC_MAX_FREQ value] Signed-off-by: John Stultz <john.stultz@linaro.org>
* | | | | | Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds2011-08-172-1/+11
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: Track the owner of irq descriptor irq: Always set IRQF_ONESHOT if no primary handler is specified genirq: Fix wrong bit operation
| * | | | | | irq: Track the owner of irq descriptorSebastian Andrzej Siewior2011-07-282-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Interrupt descriptors can be allocated from modules. The interrupts are used by other modules, but we have no refcount on the module which provides the interrupts and there is no way to establish one on the device level as the interrupt using module is agnostic to the fact that the interrupt is provided by a module rather than by some builtin interrupt controller. To prevent removal of the interrupt providing module, we can track the owner of the interrupt descriptor, which also provides the relevant irq chip functions in the irq descriptor. request/setup_irq() can now acquire a refcount on the owner module to prevent unloading. free_irq() drops the refcount. Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Link: http://lkml.kernel.org/r/20110711101731.GA13804@Chamillionaire.breakpoint.cc Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-141-2/+0
|\ \ \ \ \ \ \ | |_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: mmc: remove unused "ddr" parameter in struct mmc_ios mmc: dw_mmc: Fix DDR mode support. mmc: core: use defined R1_STATE_PRG macro for card status mmc: sdhci: use f_max instead of host->clock for timeouts mmc: sdhci: move timeout_clk calculation farther down mmc: sdhci: check host->clock before using it as a denominator mmc: Revert "mmc: sdhci: Fix SDHCI_QUIRK_TIMEOUT_USES_SDCLK" mmc: tmio: eliminate unused variable 'mmc' warning mmc: esdhc-imx: fix card interrupt loss on freescale eSDHC mmc: sdhci-s3c: Fix build for header change mmc: dw_mmc: Fix mask in IDMAC_SET_BUFFER1_SIZE macro mmc: cb710: fix possible pci_dev leak in cb710_pci_configure() mmc: core: Detect eMMC v4.5 ext_csd entries mmc: mmc_test: avoid stalled file in debugfs mmc: sdhci-s3c: add BROKEN_ADMA_ZEROLEN_DESC quirk mmc: sdhci: pxav3: controller needs 32 bit ADMA addressing mmc: sdhci: fix retuning timer wrongly deleted in sdhci_tasklet_finish
| * | | | | | mmc: remove unused "ddr" parameter in struct mmc_iosJaehoon Chung2011-08-131-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "mmc: dw_mmc: Fix DDR mode support" removed the last user. Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-131-1/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ASoC: Fix compile warning in wm8750.c ASoC: omap: Update e-mail address of Jarkko Nikula ASoC: SAMSUNG: Add I2S0 internal dma driver ASoC: Terminate WM8750 SPI device ID table ASoC: Add missing break in WM8994 probe ALSA: snd-usb-caiaq: Correct offset fields of outbound iso_frame_desc ALSA: azt3328 - adjust error handling code to include debugging code ALSA: hda - Add CONFIG_SND_HDA_POWER_SAVE to stac_vrefout_set() ALSA: usb-audio - Add quirk for BOSS Micro BR-80 ASoC: Fix typo in wm8750 spi_ids ASoC: Fix warning in Speyside WM8962 ASoC: Fix SPI driver binding for WM8987 ASoC: Fix binding of WM8750 on Jive ASoC: WM8903: Free IRQ on device removal ASoC: Tegra: wm8903 machine driver: Allow re-insertion of module ASoC: Tegra: tegra_pcm_deallocate_dma_buffer: Don't OOPS
| * | | | | | | ASoC: omap: Update e-mail address of Jarkko NikulaJarkko Nikula2011-08-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My gmail account got disabled and I'm not going to reopen it. Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
* | | | | | | | Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds2011-08-121-0/+7
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6: dt: add empty of_get_property for non-dt
| * | | | | | | | dt: add empty of_get_property for non-dtStephen Warren2011-08-091-0/+7
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds empty function of_get_property for non-dt build, so that drivers migrating to dt can save some '#ifdef CONFIG_OF'. This also fixes the current Tegra compile problem in linux-next. Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2011-08-123-4/+6
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) e1000e: increase driver version number e1000e: alternate MAC address update e1000e: do not disable receiver on 82574/82583 e1000e: alternate MAC address does not work on device id 0x1060 PCnet: Fix section mismatch bnx2x: disable dcb on 578xx since not supported yet bnx2x: properly clean indirect addresses bnx2x: prevent race between undi_unload and load flows bnx2x: fix select_queue when FCoE is disabled bnx2x: init FCOE FP only once ipv4: some rt_iif -> rt_route_iif conversions net/bridge/netfilter/ebtables.c: use available error handling code net/netlabel/netlabel_kapi.c: add missing cleanup code net/irda: sh_sir: tidyup compile warning net/irda: sh_sir: add missing header net/irda: sh_irda: add missing header slcan: ldisc generated skbs are received in softirq context scm: Capture the full credentials of the scm sender tcp: initialize variable ecn_ok in syncookies path drivers/net/wireless/wl1251: add missing kfree ...
| * | | | | | | | ipv4: route non-local sources for raw socketJulian Anastasov2011-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The raw sockets can provide source address for routing but their privileges are not considered. We can provide non-local source address, make sure the FLOWI_FLAG_ANYSRC flag is set if socket has privileges for this, i.e. based on hdrincl (IP_HDRINCL) and transparent flags. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: Make userland include of netlink.h more sane.David S. Miller2011-08-072-3/+5
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently userland will barf when including linux/netlink.h unless it precisely includes sys/socket.h first. The issue is where the definition of "sa_family_t" comes from. We've been back and forth on how to fix this issue in the past, see: http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621 http://thread.gmane.org/gmane.linux.network/143380 Ben Hutchings suggested we take a hint from how we handle the sockaddr_storage type. First we define a "__kernel_sa_family_t" to linux/socket.h that is always defined. Then if __KERNEL__ is defined, we also define "sa_family_t" as equal to "__kernel_sa_family_t". Then in places like linux/netlink.h we use __kernel_sa_family_t in user visible datastructures. Reported-by: Michel Machado <michel@digirati.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | move RLIMIT_NPROC check from set_user() to do_execve_common()Vasiliy Kulikov2011-08-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC check in set_user() to check for NPROC exceeding via setuid() and similar functions. Before the check there was a possibility to greatly exceed the allowed number of processes by an unprivileged user if the program relied on rlimit only. But the check created new security threat: many poorly written programs simply don't check setuid() return code and believe it cannot fail if executed with root privileges. So, the check is removed in this patch because of too often privilege escalations related to buggy programs. The NPROC can still be enforced in the common code flow of daemons spawning user processes. Most of daemons do fork()+setuid()+execve(). The check introduced in execve() (1) enforces the same limit as in setuid() and (2) doesn't create similar security issues. Neil Brown suggested to track what specific process has exceeded the limit by setting PF_NPROC_EXCEEDED process flag. With the change only this process would fail on execve(), and other processes' execve() behaviour is not changed. Solar Designer suggested to re-check whether NPROC limit is still exceeded at the moment of execve(). If the process was sleeping for days between set*uid() and execve(), and the NPROC counter step down under the limit, the defered execve() failure because NPROC limit was exceeded days ago would be unexpected. If the limit is not exceeded anymore, we clear the flag on successful calls to execve() and fork(). The flag is also cleared on successful calls to set_user() as the limit was exceeded for the previous user, not the current one. Similar check was introduced in -ow patches (without the process flag). v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user(). Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-091-14/+14
|\ \ \ \ \ \ \ \ | | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: sound: pss - don't use the deprecated function check_region ALSA: timer - Add NULL-check for invalid slave timer ALSA: timer - Fix Oops at closing slave timer ASoC: Acknowledge WM8996 interrupts before acting on them ASoC: Rename WM8915 to WM8996 ALSA: Fix dependency of CONFIG_SND_TEA575X ALSA: asihpi - use kzalloc() ALSA: snd-usb-caiaq: Fix keymap for RigKontrol3 ALSA: snd-usb: Fix uninitialized variable usage ALSA: hda - Fix a complile warning in patch_via.c ALSA: hdspm - Fix uninitialized compile warnings ALSA: usb-audio - add quirk for Keith McMillen StringPort ALSA: snd-usb: operate on given mixer interface only ALSA: snd-usb: avoid dividing by zero on invalid input ALSA: snd-usb: Accept UAC2 FORMAT_TYPE descriptors with bLength > 6 sound: oss/pas2: Remove CLOCK_TICK_RATE dependency from PAS16 driver ALSA: hda - Use auto-parser for ASUS UX50, Eee PC P901, S101 and P1005 ALSA: hda - Fix digital-mic mono recording on ASUS Eee PC ASoC: sgtl5000: fix cache handling ASoC: Disable wm_hubs periodic DC servo update
| * | | | | | | ASoC: Rename WM8915 to WM8996Mark Brown2011-08-081-14/+14
| |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For marketing reasons the part will be called WM8996. In order to avoid user confusion rename the driver to reflect this. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Kukjin Kim <kgene.kim@samsung.com> Acked-by: Liam Girdwood <lrg@ti.com>
* | | | | | | mm: Fix fixup_user_fault() for MMU=nPeter Zijlstra2011-08-081-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 2efaca927f5c ("mm/futex: fix futex writes on archs with SW tracking of dirty & young") we forgot about MMU=n. This patch fixes that. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/1311761831.24752.413.camel@twins Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | cred: use 'const' in get_current_{user,groups}Linus Torvalds2011-08-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid annoying warnings from these functions ("discards qualifiers") because they assign 'current_cred()' to a non-const pointer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | CRED: Restore const to current_cred()David Howells2011-08-081-1/+1
|/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3295514841c2 ("fix rcu annotations noise in cred.h") accidentally dropped the const of current->cred inside current_cred() by the insertion of a cast to deal with an RCU annotation loss warning from sparce. Use an appropriate RCU wrapper instead so as not to lose the const. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>