summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'idle-release' of ↵Linus Torvalds2011-01-1313-113/+145
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer intel_idle: open broadcast clock event cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOW cpuidle: delete NOP CPUIDLE_FLAG_POLL ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGs cpuidle: Rename X86 specific idle poll state[0] from C0 to POLL ACPI, intel_idle: Cleanup idle= internal variables cpuidle: Make cpuidle_enable_device() call poll_idle_init() intel_idle: update Sandy Bridge core C-state residency targets
| * Merge branch 'cpuidle-perf-events' into idle-testLen Brown2011-01-125-16/+12
| |\
| | * cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle ↵Thomas Renninger2011-01-125-16/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | events from the cpuidle layer Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle" events -> this patch fixes it and makes cpu_idle events throwing less complex. It also introduces cpu_idle events for all architectures which use the cpuidle subsystem, namely: - arch/arm/mach-at91/cpuidle.c - arch/arm/mach-davinci/cpuidle.c - arch/arm/mach-kirkwood/cpuidle.c - arch/arm/mach-omap2/cpuidle34xx.c - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait) - arch/x86/kernel/process.c (did throw events before, but was a mess) - drivers/idle/intel_idle.c (did throw events before) Convention should be: Fire cpu_idle events inside the current pm_idle function (not somewhere down the the callee tree) to keep things easy. Current possible pm_idle functions in X86: c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle -> this is really easy is now. This affects userspace: The type field of the cpu_idle power event can now direclty get mapped to: /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...} instead of throwing very CPU/mwait specific values. This change is not visible for the intel_idle driver. For the acpi_idle driver it should only be visible if the vendor misses out C-states in his BIOS. Another (perf timechart) patch reads out cpuidle info of cpu_idle events from: /sys/.../cpuidle/stateX/*, then the cpuidle events are mapped to the correct C-/cpuidle state again, even if e.g. vendors miss out C-states in their BIOS and for example only export C1 and C3. -> everything is fine. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Robert Schoene <robert.schoene@tu-dresden.de> CC: Jean Pihet <j-pihet@ti.com> CC: Arjan van de Ven <arjan@linux.intel.com> CC: Ingo Molnar <mingo@elte.hu> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: linux-pm@lists.linux-foundation.org CC: linux-acpi@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-perf-users@vger.kernel.org CC: linux-omap@vger.kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
| * | Merge branch 'linus' into idle-testLen Brown2011-01-125728-291602/+482888
| |\|
| * | intel_idle: open broadcast clock eventShaohua Li2011-01-121-1/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel_idle driver uses CLOCK_EVT_NOTIFY_BROADCAST_ENTER CLOCK_EVT_NOTIFY_BROADCAST_EXIT for broadcast clock events. The _ENTER/_EXIT doesn't really open broadcast clock events, please see processor_idle.c for an example. In some situation, this will cause boot hang, because some CPUs enters idle but local APIC timer stalls. Reported-and-tested-by: Yan Zheng <zheng.z.yan@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> cc: stable@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
| * | cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specificLen Brown2011-01-122-1/+2
| | | | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idleLen Brown2011-01-122-1/+8
| | | | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitionsLen Brown2011-01-121-3/+0
| | | | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | SH, cpuidle: delete use of NOP CPUIDLE_FLAGS_SHALLOWLen Brown2011-01-121-1/+0
| | | | | | | | | | | | | | | | | | set but not checked. Signed-off-by: Len Brown <len.brown@intel.com>
| * | cpuidle: delete NOP CPUIDLE_FLAG_POLLLen Brown2011-01-122-2/+1
| | | | | | | | | | | | | | | | | | it serves no purpose Signed-off-by: Len Brown <len.brown@intel.com>
| * | ACPI: processor_idle: delete use of NOP CPUIDLE_FLAGsLen Brown2011-01-121-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CPUIDLE_FLAG_SHALLOW CPUIDLE_FLAG_BALANCED CPUIDLE_FLAG_DEEP CPUIDLE_FLAG_CHECK_BM were set by acpi_processor_setup_cpuidle(), but never used by cpuidle or by acpi_idle. So stop setting them. Signed-off-by: Len Brown <len.brown@intel.com>
| * | cpuidle: Rename X86 specific idle poll state[0] from C0 to POLLThomas Renninger2011-01-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C0 means and is well know as "not idle". All documentation out there uses this term as "running"/"not idle" state. Also Linux userspace tools (e.g. cpufreq-aperf and turbostat) show C0 residency which there is correct, but means something totally else than cpuidle "POLL" state. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>
| * | ACPI, intel_idle: Cleanup idle= internal variablesThomas Renninger2011-01-127-40/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having four variables for the same thing: idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides is rather confusing and unnecessary complex. if idle= boot param is passed, only set up one variable: boot_option_idle_overrides Introduces following functional changes/fixes: - intel_idle driver does not register if any idle=xy boot param is passed. - processor_idle.c will also not register a cpuidle driver and get active if idle=halt is passed. Before a cpuidle driver with one (C1, halt) state got registered Now the default_idle function will be used which finally uses the same idle call to enter sleep state (safe_halt()), but without registering a whole cpuidle driver. That means idle= param will always avoid cpuidle drivers to register with one exception (same behavior as before): idle=nomwait may still register acpi_idle cpuidle driver, but C1 will not use mwait, but hlt. This can be a workaround for IO based deeper sleep states where C1 mwait causes problems. Signed-off-by: Thomas Renninger <trenn@suse.de> cc: x86@kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
| * | cpuidle: Make cpuidle_enable_device() call poll_idle_init()Rafael J. Wysocki2011-01-121-41/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following scenario is possible with the current cpuidle code and the ACPI cpuidle driver: (1) acpi_processor_cst_has_changed() is called, (2) cpuidle_disable_device() is called, (3) cpuidle_remove_state_sysfs() is called to remove the (presumably outdated) states info from sysfs, (3) acpi_processor_get_power_info() is called, the first entry in the pr->power.states[] table is filled with zeros, (4) acpi_processor_setup_cpuidle() is called and it doesn't fill the first entry in pr->power.states[], (5) cpuidle_enable_device() is called, (6) __cpuidle_register_device() is _not_ called, since the device has already been registered, (7) Consequently, poll_idle_init() is _not_ called either, (8) cpuidle_add_state_sysfs() is called to create the sysfs attributes for the new states and it uses the bogus first table entry from acpi_processor_get_power_info() for creating state0. This problem is avoided if cpuidle_enable_device() unconditionally calls poll_idle_init(). Reported-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com> cc: stable@kernel.org
| * | intel_idle: update Sandy Bridge core C-state residency targetsLen Brown2011-01-121-4/+4
| | | | | | | | | | | | Signed-off-by: Len Brown <len.brown@intel.com>
* | | Merge branch 'sfi-release' of ↵Linus Torvalds2011-01-131-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 * 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6: SFI: use ioremap_cache() instead of ioremap()
| * | | SFI: use ioremap_cache() instead of ioremap()Len Brown2011-01-111-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | We copied ACPI's oversight of using ioremap() and creating non-cached table mappings when we should have been using ioremap_cache(). Signed-off-by: Len Brown <len.brown@intel.com>
* | | Merge branch 'vfs-scale-working' of ↵Linus Torvalds2011-01-136-14/+40
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: fs: fix do_last error case when need_reval_dot nfs: add missing rcu-walk check fs: hlist UP debug fixup fs: fix dropping of rcu-walk from force_reval_path fs: force_reval_path drop rcu-walk before d_invalidate fs: small rcu-walk documentation fixes Fixed up trivial conflicts in Documentation/filesystems/porting
| * | | fs: fix do_last error case when need_reval_dotJ. R. Okajima2011-01-141-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When open(2) without O_DIRECTORY opens an existing dir, it should return EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it is changed by d_revalidate() which returns any positive to represent 'the target dir is valid.' Should we keep and return the initialized 'error' in this case. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| * | | nfs: add missing rcu-walk checkNick Piggin2011-01-141-1/+5
| | | | | | | | | | | | | | | | Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| * | | fs: hlist UP debug fixupNick Piggin2011-01-142-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Po-Yu Chuang <ratbert.chuang@gmail.com> noticed that hlist_bl_set_first could crash on a UP system when LIST_BL_LOCKMASK is 0, because LIST_BL_BUG_ON(!((unsigned long)h->first & LIST_BL_LOCKMASK)); always evaulates to true. Fix the expression, and also avoid a dependency between bit spinlock implementation and list bl code (list code shouldn't know anything except that bit 0 is set when adding and removing elements). Eventually if a good use case comes up, we might use this list to store 1 or more arbitrary bits of data, so it really shouldn't be tied to locking either, but for now they are helpful for debugging. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| * | | fs: fix dropping of rcu-walk from force_reval_pathNick Piggin2011-01-141-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As J. R. Okajima noted, force_reval_path passes in the same dentry to d_revalidate as the one in the nameidata structure (other callers pass in a child), so the locking breaks. This can oops with a chrooted nfs mount, for example. Similarly there can be other problems with revalidating a dentry which is already in nameidata of the path walk. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| * | | fs: force_reval_path drop rcu-walk before d_invalidateNick Piggin2011-01-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | d_revalidate can return in rcu-walk mode even when it returns 0. We can't just call any old dcache function on rcu-walk dentry (the dentry is unstable, so even through d_lock can safely be taken, the result may no longer be what we expect -- careful re-checks would be required). So just drop rcu in this case. (I missed this conversion when switching to the rcu-walk convention that Linus suggested) Signed-off-by: Nick Piggin <npiggin@kernel.dk>
| * | | fs: small rcu-walk documentation fixesNick Piggin2011-01-142-6/+6
| | | | | | | | | | | | | | | | Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* | | | Merge branch 'stable/gntdev' of ↵Linus Torvalds2011-01-1310-371/+1408
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/gntdev' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/p2m: Fix module linking error. xen p2m: clear the old pte when adding a page to m2p_override xen gntdev: use gnttab_map_refs and gnttab_unmap_refs xen: introduce gnttab_map_refs and gnttab_unmap_refs xen p2m: transparently change the p2m mappings in the m2p override xen/gntdev: Fix circular locking dependency xen/gntdev: stop using "token" argument xen: gntdev: move use of GNTMAP_contains_pte next to the map_op xen: add m2p override mechanism xen: move p2m handling to separate file xen/gntdev: add VM_PFNMAP to vma xen/gntdev: allow usermode to map granted pages xen: define gnttab_set_map_op/unmap_op Fix up trivial conflict in drivers/xen/Kconfig
| * | | | xen/p2m: Fix module linking error.Konrad Rzeszutek Wilk2011-01-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: ERROR: "m2p_find_override_pfn" [drivers/block/xen-blkfront.ko] undefined! Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen p2m: clear the old pte when adding a page to m2p_overrideJeremy Fitzhardinge2011-01-113-9/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When adding a page to m2p_override we change the p2m of the page so we need to also clear the old pte of the kernel linear mapping because it doesn't correspond anymore. When we remove the page from m2p_override we restore the original p2m of the page and we also restore the old pte of the kernel linear mapping. Before changing the p2m mappings in m2p_add_override and m2p_remove_override, check that the page passed as argument is valid and return an error if it is not. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen gntdev: use gnttab_map_refs and gnttab_unmap_refsStefano Stabellini2011-01-111-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use gnttab_map_refs and gnttab_unmap_refs to map and unmap the grant ref, so that we can have a corresponding struct page. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen: introduce gnttab_map_refs and gnttab_unmap_refsStefano Stabellini2011-01-112-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gnttab_map_refs maps some grant refs and uses the new m2p override to set a proper m2p mapping for the granted pages. gnttab_unmap_refs unmaps the granted refs and removes th mappings from the m2p override. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen p2m: transparently change the p2m mappings in the m2p overrideStefano Stabellini2011-01-111-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In m2p_add_override store the original mfn into page->index and then change the p2m mapping, setting mfns as FOREIGN_FRAME. In m2p_remove_override restore the original mapping. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/gntdev: Fix circular locking dependencyDaniel De Graaf2011-01-111-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | apply_to_page_range will acquire PTE lock while priv->lock is held, and mn_invl_range_start tries to acquire priv->lock with PTE already held. Fix by not holding priv->lock during the entire map operation. This is safe because map->vma is set nonzero while the lock is held, which will cause subsequent maps to fail and will cause the unmap ioctl (and other users of gntdev_del_map) to return -EBUSY until the area is unmapped. It is similarly impossible for gntdev_vma_close to be called while the vma is still being created. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/gntdev: stop using "token" argumentJeremy Fitzhardinge2011-01-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's the struct page of the L1 pte page. But we can get its mfn by simply doing an arbitrary_virt_to_machine() on it anyway (which is the safe conservative choice; since we no longer allow HIGHPTE pages, we would never expect to be operating on a mapped pte page). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen: gntdev: move use of GNTMAP_contains_pte next to the map_opIan Campbell2011-01-111-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This flag controls the meaning of gnttab_map_grant_ref.host_addr and specifies that the field contains a reference to the pte entry to be used to perform the mapping. Therefore move the use of this flag to the point at which we actually use a reference to the pte instead of something else, splitting up the usage of the flag in this way is confusing and potentially error prone. The other flags are all properties of the mapping itself as opposed to properties of the hypercall arguments and therefore it make sense to continue to pass them round in map->flags. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com> Cc: Derek G. Murray <Derek.Murray@cl.cam.ac.uk> Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen: add m2p override mechanismJeremy Fitzhardinge2011-01-112-3/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a simple hashtable based mechanism to override some portions of the m2p, so that we can find out the pfn corresponding to an mfn of a granted page. In fact entries corresponding to granted pages in the m2p hold the original pfn value of the page in the source domain that granted it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen: move p2m handling to separate fileJeremy Fitzhardinge2011-01-113-366/+378
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/gntdev: add VM_PFNMAP to vmaJeremy Fitzhardinge2011-01-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These pages are from other domains, so don't have any local PFN. VM_PFNMAP is the closest concept Linux has to this. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen/gntdev: allow usermode to map granted pagesGerd Hoffmann2011-01-114-0/+763
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The gntdev driver allows usermode to map granted pages from other domains. This is typically used to implement a Xen backend driver in user mode. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen: define gnttab_set_map_op/unmap_opIan Campbell2011-01-111-1/+38
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: hypercall definitions These functions populate the gnttab data structures used by the granttab map and unmap ops and are used in the backend drivers. Originally xen-unstable.hg 9625:c3bb51c443a7 [ Include Stefano's fix for phys_addr_t ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | Merge branch 'stable/platform-pci-fixes' of ↵Linus Torvalds2011-01-133-16/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/platform-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen-platform: Fix compile errors if CONFIG_PCI is not enabled. xen: rename platform-pci module to xen-platform-pci. xen-platform: use PCI interfaces to request IO and MEM resources.
| * | | | xen-platform: Fix compile errors if CONFIG_PCI is not enabled.Konrad Rzeszutek Wilk2011-01-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/xen/platform-pci.c:127: error: implicit declaration of function 'pci_request_region' drivers/xen/platform-pci.c:165: error: implicit declaration of function 'pci_release_region' Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen: rename platform-pci module to xen-platform-pci.Ian Campbell2011-01-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | platform-pci is rather generic for a modular distro style kernel. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | xen-platform: use PCI interfaces to request IO and MEM resources.Ian Campbell2011-01-121-14/+7
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the correct interface to use and something has broken the use of the previous incorrect interface (which fails because the request conflicts with the resources assigned for the PCI device itself instead of nesting like the PCI interfaces do). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@kernel.org # 2.6.37 only
* | | | memcg: fix memory migration of shmem swapcacheDaisuke Nishimura2011-01-133-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current implementation mem_cgroup_end_migration() decides whether the page migration has succeeded or not by checking "oldpage->mapping". But if we are tring to migrate a shmem swapcache, the page->mapping of it is NULL from the begining, so the check would be invalid. As a result, mem_cgroup_end_migration() assumes the migration has succeeded even if it's not, so "newpage" would be freed while it's not uncharged. This patch fixes it by passing mem_cgroup_end_migration() the result of the page migration. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | memcg: use [kv]zalloc[_node] rather than [kv]malloc+memsetJesper Juhl2011-01-131-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In mem_cgroup_alloc() we currently do either kmalloc() or vmalloc() then followed by memset() to zero the memory. This can be more efficiently achieved by using kzalloc() and vzalloc(). There's also one situation where we can use kzalloc_node() - this is what's new in this version of the patch. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | memcg: fix deadlock between cpuset and memcgDaisuke Nishimura2011-01-131-35/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b1dd693e ("memcg: avoid deadlock between move charge and try_charge()") can cause another deadlock about mmap_sem on task migration if cpuset and memcg are mounted onto the same mount point. After the commit, cgroup_attach_task() has sequence like: cgroup_attach_task() ss->can_attach() cpuset_can_attach() mem_cgroup_can_attach() down_read(&mmap_sem) (1) ss->attach() cpuset_attach() mpol_rebind_mm() down_write(&mmap_sem) (2) up_write(&mmap_sem) cpuset_migrate_mm() do_migrate_pages() down_read(&mmap_sem) up_read(&mmap_sem) mem_cgroup_move_task() mem_cgroup_clear_mc() up_read(&mmap_sem) We can cause deadlock at (2) because we've already aquire the mmap_sem at (1). But the commit itself is necessary to fix deadlocks which have existed before the commit like: Ex.1) move charge | try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() | down_write(&mmap_sem) mc.moving_task = current | .. mem_cgroup_precharge_mc() | __mem_cgroup_try_charge() mem_cgroup_count_precharge() | prepare_to_wait() down_read(&mmap_sem) | if (mc.moving_task) -> cannot aquire the lock | -> true | schedule() | -> move charge should wake it up Ex.2) move charge | try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() | mc.moving_task = current | mem_cgroup_precharge_mc() | mem_cgroup_count_precharge() | down_read(&mmap_sem) | .. | up_read(&mmap_sem) | | down_write(&mmap_sem) mem_cgroup_move_task() | .. mem_cgroup_move_charge() | __mem_cgroup_try_charge() down_read(&mmap_sem) | prepare_to_wait() -> cannot aquire the lock | if (mc.moving_task) | -> true | schedule() | -> move charge should wake it up This patch fixes all of these problems by: 1. revert the commit. 2. To fix the Ex.1, we set mc.moving_task after mem_cgroup_count_precharge() has released the mmap_sem. 3. To fix the Ex.2, we use down_read_trylock() instead of down_read() in mem_cgroup_move_charge() and, if it has failed to aquire the lock, cancel all extra charges, wake up all waiters, and retry trylock. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reported-by: Ben Blum <bblum@andrew.cmu.edu> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Menage <menage@google.com> Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | memcg: remove unnecessary return from void-returning mem_cgroup_del_lru_list()Minchan Kim2011-01-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | memcg: fix unit mismatch in memcg oom limit calculationJohannes Weiner2011-01-131-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding the number of swap pages to the byte limit of a memory control group makes no sense. Convert the pages to bytes before adding them. The only user of this code is the OOM killer, and the way it is used means that the error results in a higher OOM badness value. Since the cgroup limit is the same for all tasks in the cgroup, the error should have no practical impact at the moment. But let's not wait for future or changing users to trip over it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Greg Thelen <gthelen@google.com> Cc: David Rientjes <rientjes@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | memcg: add lock to synchronize page accounting and migrationKAMEZAWA Hiroyuki2011-01-132-5/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new bit spin lock, PCG_MOVE_LOCK, to synchronize the page accounting and migration code. This reworks the locking scheme of _update_stat() and _move_account() by adding new lock bit PCG_MOVE_LOCK, which is always taken under IRQ disable. 1. If pages are being migrated from a memcg, then updates to that memcg page statistics are protected by grabbing PCG_MOVE_LOCK using move_lock_page_cgroup(). In an upcoming commit, memcg dirty page accounting will be updating memcg page accounting (specifically: num writeback pages) from IRQ context (softirq). Avoid a deadlocking nested spin lock attempt by disabling irq on the local processor when grabbing the PCG_MOVE_LOCK. 2. lock for update_page_stat is used only for avoiding race with move_account(). So, IRQ awareness of lock_page_cgroup() itself is not a problem. The problem is between mem_cgroup_update_page_stat() and mem_cgroup_move_account_page(). Trade-off: * Changing lock_page_cgroup() to always disable IRQ (or local_bh) has some impacts on performance and I think it's bad to disable IRQ when it's not necessary. * adding a new lock makes move_account() slower. Score is here. Performance Impact: moving a 8G anon process. Before: real 0m0.792s user 0m0.000s sys 0m0.780s After: real 0m0.854s user 0m0.000s sys 0m0.842s This score is bad but planned patches for optimization can reduce this impact. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Greg Thelen <gthelen@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Andrea Righi <arighi@develer.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | memcg: create extensible page stat update routinesGreg Thelen2011-01-133-14/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace usage of the mem_cgroup_update_file_mapped() memcg statistic update routine with two new routines: * mem_cgroup_inc_page_stat() * mem_cgroup_dec_page_stat() As before, only the file_mapped statistic is managed. However, these more general interfaces allow for new statistics to be more easily added. New statistics are added with memcg dirty page accounting. Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Andrea Righi <arighi@develer.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | memcg: document cgroup dirty memory interfacesGreg Thelen2011-01-131-0/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document cgroup dirty memory interfaces and statistics. [akpm@linux-foundation.org: fix use_hierarchy description] Signed-off-by: Andrea Righi <arighi@develer.com> Signed-off-by: Greg Thelen <gthelen@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>