summaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'irq-core-for-linus' of ↵Linus Torvalds2015-11-0311-120/+426
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The irq departement delivers: - Rework the irqdomain core infrastructure to accomodate ACPI based systems. This is required to support ARM64 without creating artificial device tree nodes. - Sanitize the ACPI based ARM GIC initialization by making use of the new firmware independent irqdomain core - Further improvements to the generic MSI management - Generalize the irq migration on CPU hotplug - Improvements to the threaded interrupt infrastructure - Allow the migration of "chained" low level interrupt handlers - Allow optional force masking of interrupts in disable_irq[_nosysnc] - Support for two new interrupt chips - Sigh! - A larger set of errata fixes for ARM gicv3 - The usual pile of fixes, updates, improvements and cleanups all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) Document that IRQ_NONE should be returned when IRQ not actually handled PCI/MSI: Allow the MSI domain to be device-specific PCI: Add per-device MSI domain hook of/irq: Use the msi-map property to provide device-specific MSI domain of/irq: Split of_msi_map_rid to reuse msi-map lookup irqchip/gic-v3-its: Parse new version of msi-parent property PCI/MSI: Use of_msi_get_domain instead of open-coded "msi-parent" parsing of/irq: Use of_msi_get_domain instead of open-coded "msi-parent" parsing of/irq: Add support code for multi-parent version of "msi-parent" irqchip/gic-v3-its: Add handling of PCI requester id. PCI/MSI: Add helper function pci_msi_domain_get_msi_rid(). of/irq: Add new function of_msi_map_rid() Docs: dt: Add PCI MSI map bindings irqchip/gic-v2m: Add support for multiple MSI frames irqchip/gic-v3: Fix translation of LPIs after conversion to irq_fwspec irqchip/mxs: Add Alphascale ASM9260 support irqchip/mxs: Prepare driver for hardware with different offsets irqchip/mxs: Panic if ioremap or domain creation fails irqdomain: Documentation updates irqdomain/msi: Use fwnode instead of of_node ...
| * irqdomain/msi: Use fwnode instead of of_nodeMarc Zyngier2015-10-131-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we continue to push of_node towards the outskirts of irq domains, let's start tackling the case of msi_create_irq_domain and its little friends. This has limited impact in both PCI/MSI, platform MSI, and a few drivers. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-17-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Introduce irq_domain_create_hierarchyMarc Zyngier2015-10-131-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we're about to start converting the various MSI layers to use fwnode_handle instead of device_node, add irq_domain_create_hierarchy as a directly equivalent of irq_domain_add_hierarchy (which still exists as a compatibility interface). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-16-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Add a fwnode_handle allocatorMarc Zyngier2015-10-131-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to be able to reference an irqdomain from ACPI, we need to be able to create an identifier, which is usually a struct device_node. This device node does't really fit the ACPI infrastructure, so we cunningly allocate a new structure containing a fwnode_handle, and return that. This structure doesn't really point to a device (interrupt controllers are not "real" devices in Linux), but as we cannot really deny that they exist, we create them with a new fwnode_type (FWNODE_IRQCHIP). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-9-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Introduce irq_domain_create_{linear, tree}Marc Zyngier2015-10-131-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like we have irq_domain_add_{linear,tree} to create a irq domain identified by an of_node, introduce irq_domain_create_{linear,tree} that do the same thing, except that they take a struct fwnode_handle. Existing functions get rewritten in terms of the new ones so that everything keeps working as before (and __irq_domain_add is now fwnode_handle based as well). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-8-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Introduce irq_create_fwspec_mappingMarc Zyngier2015-10-131-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like we have irq_create_of_mapping, irq_create_fwspec_mapping creates a IRQ domain mapping for an interrupt described in a struct irq_fwspec. irq_create_of_mapping gets rewritten in terms of the new function, and the hack we introduced before gets removed (now that no stacked irqchip uses of_phandle_args anymore). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-7-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Introduce a firmware-specific IRQ specifier structureMarc Zyngier2015-10-131-11/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far the closest thing to a generic IRQ specifier structure is of_phandle_args, which happens to be pretty OF specific (the of_node pointer in there is quite annoying). Let's introduce 'struct irq_fwspec' that can be used in place of of_phandle_args for OF, but also for other firmware implementations (that'd be ACPI). This is used together with a new 'translate' method that is the pendent of 'xlate'. We convert irq_create_of_mapping to use this new structure (with a small hack that will be removed later). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-5-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Allow irq domain lookup by fwnodeMarc Zyngier2015-10-131-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, our irq domains are still looked up by device node. Let's change this and allow a domain to be looked up using a fwnode_handle pointer. The existing interfaces are preserved with a couple of helpers. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-4-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Convert irqdomain-%3Eof_node to fwnodeMarc Zyngier2015-10-131-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have everyone accessing the of_node field via the irq_domain_get_of_node accessor, it is pretty easy to swap it for a pointer to a fwnode_handle. This translates into a few limited changes in __irq_domain_add, and an updated irq_domain_get_of_node. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * irqdomain: Use irq_domain_get_of_node() instead of direct field accessMarc Zyngier2015-10-131-9/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct irq_domain contains a "struct device_node *" field (of_node) that is almost the only link between the irqdomain and the device tree infrastructure. In order to prepare for the removal of that field, convert all users to use irq_domain_get_of_node() instead. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: <linux-arm-kernel@lists.infradead.org> Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Graeme Gregory <graeme@xora.org.uk> Cc: Jake Oshins <jakeo@microsoft.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/1444737105-31573-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * Merge branch 'linus' into irq/coreThomas Gleixner2015-10-138-52/+129
| |\ | | | | | | | | | Bring in upstream updates for patches which depend on them
| * | genirq: Add flag to force mask in disable_irq[_nosync]()Thomas Gleixner2015-10-113-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an irq chip does not implement the irq_disable callback, then we use a lazy approach for disabling the interrupt. That means that the interrupt is marked disabled, but the interrupt line is not immediately masked in the interrupt chip. It only becomes masked if the interrupt is raised while it's marked disabled. We use this to avoid possibly expensive mask/unmask operations for common case operations. Unfortunately there are devices which do not allow the interrupt to be disabled easily at the device level. They are forced to use disable_irq_nosync(). This can result in taking each interrupt twice. Instead of enforcing the non lazy mode on all interrupts of a irq chip, provide a settings flag, which can be set by the driver for that particular interrupt line. Reported-and-tested-by: Duc Dang <dhdang@apm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1510092348370.6097@nanos
| * | genirq: Make irq_set_vcpu_affinity available for CONFIG_SMP=nFeng Wu2015-10-091-31/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irq_set_vcpu_affinity() is needed when CONFIG_SMP=n, so move the definition out of "#ifdef CONFIG_SMP" Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Feng Wu <feng.wu@intel.com> Cc: jiang.liu@linux.intel.com Cc: pbonzini@redhat.com Link: http://lkml.kernel.org/r/1443860438-144926-1-git-send-email-feng.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | genirq: Allow migration of chained interrupts by installing default actionMika Westerberg2015-10-093-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a CPU is offlined all interrupts that have an action are migrated to other still online CPUs. However, if the interrupt has chained handler installed this is not done. Chained handlers are used by GPIO drivers which support interrupts, for instance. When the affinity is not corrected properly we end up in situation where most interrupts are not arriving to the online CPUs anymore. For example on Intel Braswell system which has SD-card card detection signal connected to a GPIO the IO-APIC routing entries look like below after CPU1 is offlined: pin30, enabled , level, low , V(52), IRR(0), S(0), logical , D(03), M(1) pin31, enabled , level, low , V(42), IRR(0), S(0), logical , D(03), M(1) pin32, enabled , level, low , V(62), IRR(0), S(0), logical , D(03), M(1) pin5b, enabled , level, low , V(72), IRR(0), S(0), logical , D(03), M(1) The problem here is that the destination mask still contains both CPUs even if CPU1 is already offline. This means that the IO-APIC still routes interrupts to the other CPU as well. We solve the problem by providing a default action for chained interrupts. This action allows the migration code to correct affinity (as it finds desc->action != NULL). Also make the default action handler to emit a warning if for some reason a chained handler ends up calling it. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1444039935-30475-1-git-send-email-mika.westerberg@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | Merge branch 'irq/for-arm' into irq/coreThomas Gleixner2015-10-016-30/+178
| |\ \ | | | | | | | | | | | | | | | | Bring in the change which we offered arm[64] folks to pull into their trees.
| | * | genirq: Introduce generic irq migration for cpu hotunplugYang Yingliang2015-10-013-0/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM and ARM64 have almost identical code for migrating interrupts on cpu hotunplug. Provide a generic version which can be used by both. The new code addresses a shortcoming in the ARM[64] variants which fails to update the affinity change in some cases. The solution for this is to use the core function irq_do_set_affinity() instead of open coding it. [ tglx: Added copyright notice and license boilerplate. Rewrote subject and changelog. ] Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Russell King - ARM Linux <linux@arm.linux.org.uk> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: <linux-arm-kernel@lists.infradead.org> Link: http://lkml.kernel.org/r/1443087135-17044-2-git-send-email-yangyingliang@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | genirq: Remove the second parameter from handle_irq_event_percpu()Huang Shijie2015-09-223-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Actually, we always use the first irq action of the @desc->action chain, so remove the second parameter from handle_irq_event_percpu() which makes the code more tidy. Signed-off-by: Huang Shijie <shijie.huang@arm.com> Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: peterz@infradead.org Cc: marc.zyngier@arm.com Link: http://lkml.kernel.org/r/1441160695-19809-1-git-send-email-shijie.huang@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | genirq: Handle force threading of irqs with primary and thread handlerThomas Gleixner2015-09-221-41/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Force threading of interrupts does not really deal with interrupts which are requested with a primary and a threaded handler. The current policy is to leave them alone and let the primary handler run in interrupt context, but we set the ONESHOT flag for those interrupts as well. Kohji Okuno debugged a problem with the SDHCI driver where the interrupt thread waits for a hardware interrupt to trigger, which can't work well because the hardware interrupt is masked due to the ONESHOT flag being set. He proposed to set the ONESHOT flag only if the interrupt does not provide a thread handler. Though that does not work either because these interrupts can be shared. So the other interrupt would rightfully get the ONESHOT flag set and therefor the same situation would happen again. To deal with this proper, we need to force thread the primary handler of such interrupts as well. That means that the primary interrupt handler is treated as any other primary interrupt handler which is not marked IRQF_NO_THREAD. The threaded handler becomes a separate thread so the SDHCI flow logic can be handled gracefully. The same issue was reported against 4.1-rt. Reported-and-tested-by: Kohji Okuno <okuno.kohji@jp.panasonic.com> Reported-By: Michal Smucr <msmucr@gmail.com> Reported-and-tested-by: Nathan Sullivan <nathan.sullivan@ni.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509211058080.5606@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2015-11-039-46/+79
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer departement provides: - More y2038 work in the area of ntp and pps. - Optimization of posix cpu timers - New time related selftests - Some new clocksource drivers - The usual pile of fixes, cleanups and improvements" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) timeconst: Update path in comment timers/x86/hpet: Type adjustments clocksource/drivers/armada-370-xp: Implement ARM delay timer clocksource/drivers/tango_xtal: Add new timer for Tango SoCs clocksource/drivers/imx: Allow timer irq affinity change clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr() clocksource/drivers/h8300_*: Remove unneeded memset()s clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup() clocksource/drivers/em_sti: Remove unneeded memset()s clocksource/drivers/mediatek: Use GPT as sched clock source clockevents/drivers/mtk: Fix spurious interrupt leading to crash posix_cpu_timer: Reduce unnecessary sighand lock contention posix_cpu_timer: Convert cputimer->running to bool posix_cpu_timer: Check thread timers only when there are active thread timers posix_cpu_timer: Optimize fastpath_timer_check() timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments timers: Use __fls in apply_slack() clocksource: Remove return statement from void functions net: sfc: avoid using timespec ntp/pps: use y2038 safe types in pps_event_time ...
| * | | | timeconst: Update path in commentJason A. Donenfeld2015-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: hofrat@osadl.org Link: http://lkml.kernel.org/r/1436894685-5868-1-git-send-email-Jason@zx2c4.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | Merge branch 'fortglx/4.4/time' of ↵Thomas Gleixner2015-10-203-16/+16
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.linaro.org/people/john.stultz/linux into timers/core Time updates from John Stultz: - More 2038 work from Arnd Bergmann around ntp and pps
| | * | | | ntp: use timespec64 in sync_cmos_clockArnd Bergmann2015-10-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sync_cmos_clock has one use of struct timespec, which we want to eventually replace with timespec64 or similar in the kernel. There is no way this one can overflow, but the conversion to timespec64 is trivial and has no other dependencies. Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | ntp/pps: replace getnstime_raw_and_real with 64-bit versionArnd Bergmann2015-10-011-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is exactly one caller of getnstime_raw_and_real in the kernel, which is the pps_get_ts function. This changes the caller and the implementation to work on timespec64 types rather than timespec, to avoid the time_t overflow on 32-bit architectures. For consistency with the other new functions (ktime_get_seconds, ktime_get_real_*, ...), I'm renaming the function to ktime_get_raw_and_real_ts64. We still need to convert from the internal 64-bit type to 32 bit types in the caller, but this conversion is now pushed out from getnstime_raw_and_real to pps_get_ts. A follow-up patch changes the remaining pps code to completely avoid the conversion. Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | | | ntp/pps: use timespec64 for hardpps()Arnd Bergmann2015-10-013-8/+8
| | | |/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is only one user of the hardpps function in the kernel, so it makes sense to atomically change it over to using 64-bit timestamps for y2038 safety. In the hardpps implementation, we also need to change the pps_normtime structure, which is similar to struct timespec and also requires a 64-bit seconds portion. This introduces two temporary variables in pps_kc_event() to do the conversion, they will be removed again in the next step, which seemed preferable to having a larger patch changing it all at the same time. Acked-by: Richard Cochran <richardcochran@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | | | posix_cpu_timer: Reduce unnecessary sighand lock contentionJason Low2015-10-151-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was found while running a database workload on large systems that significant time was spent trying to acquire the sighand lock. The issue was that whenever an itimer expired, many threads ended up simultaneously trying to send the signal. Most of the time, nothing happened after acquiring the sighand lock because another thread had just already sent the signal and updated the "next expire" time. The fastpath_timer_check() didn't help much since the "next expire" time was updated after the threads exit fastpath_timer_check(). This patch addresses this by having the thread_group_cputimer structure maintain a boolean to signify when a thread in the group is already checking for process wide timers, and adds extra logic in the fastpath to check the boolean. Signed-off-by: Jason Low <jason.low2@hp.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: hideaki.kimura@hpe.com Cc: terry.rudd@hpe.com Cc: scott.norton@hpe.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1444849677-29330-5-git-send-email-jason.low2@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | posix_cpu_timer: Convert cputimer->running to boolJason Low2015-10-152-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the next patch in this series, a new field 'checking_timer' will be added to 'struct thread_group_cputimer'. Both this and the existing 'running' integer field are just used as boolean values. To save space in the structure, we can make both of these fields booleans. This is a preparatory patch to convert the existing running integer field to a boolean. Suggested-by: George Spelvin <linux@horizon.com> Signed-off-by: Jason Low <jason.low2@hp.com> Reviewed: George Spelvin <linux@horizon.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: hideaki.kimura@hpe.com Cc: terry.rudd@hpe.com Cc: scott.norton@hpe.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1444849677-29330-4-git-send-email-jason.low2@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | posix_cpu_timer: Check thread timers only when there are active thread timersJason Low2015-10-151-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fastpath_timer_check() contains logic to check for if any timers are set by checking if !task_cputime_zero(). Similarly, we can do this before calling check_thread_timers(). In the case where there are only process-wide timers, this will skip all of the computations for per-thread timers when there are no per-thread timers. As suggested by George, we can put the task_cputime_zero() check in check_thread_timers(), since that is more of an optization to the function. Similarly, we move the existing check of cputimer->running to check_process_timers(). Signed-off-by: Jason Low <jason.low2@hp.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: hideaki.kimura@hpe.com Cc: terry.rudd@hpe.com Cc: scott.norton@hpe.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1444849677-29330-3-git-send-email-jason.low2@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | posix_cpu_timer: Optimize fastpath_timer_check()Jason Low2015-10-151-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In fastpath_timer_check(), the task_cputime() function is always called to compute the utime and stime values. However, this is not necessary if there are no per-thread timers to check for. This patch modifies the code such that we compute the task_cputime values only when there are per-thread timers set. Signed-off-by: Jason Low <jason.low2@hp.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: hideaki.kimura@hpe.com Cc: terry.rudd@hpe.com Cc: scott.norton@hpe.com Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1444849677-29330-2-git-send-email-jason.low2@hp.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | Merge tag 'v4.3-rc5' into timers/core, to pick up fixes before applying new ↵Ingo Molnar2015-10-1211-82/+220
| |\ \ \ \ | | | |_|/ | | |/| | | | | | | | | | | | | | | | | changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | timers: Use __fls in apply_slack()Rasmus Villemoes2015-10-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In apply_slack(), find_last_bit() is applied to a bitmask consisting of precisely BITS_PER_LONG bits. Since mask is non-zero, we might as well eliminate the function call and use __fls() directly. On x86_64, this shaves 23 bytes of the only caller, mod_timer(). This also gets rid of Coverity CID 1192106, but that is a false positive: Coverity is not aware that mask != 0 implies that find_last_bit will not return BITS_PER_LONG. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1443771931-6284-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | clocksource: Remove return statement from void functionsGuillaume Gomez2015-10-111-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Guillaume Gomez <guillaume1.gomez@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/CAAOQCfSDgmqSWDBsetau%2ByF8x0%2BDagCF_pfFw0p5xH_BKkKEog@mail.gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | timers: Fix data race in timer_stats_account_timer()Dmitry Vyukov2015-09-221-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | timer_stats_account_timer() reads timer->start_site, then checks it for NULL and then re-reads it again, while timer_stats_timer_clear_start_info() can concurrently reset timer->start_site to NULL. This should not lead to crashes, but can double number of entries in timer stats as start_site is used during comparison, the doubled entries will have unuseful NULL start_site. Read timer->start_site only once in timer_stats_account_timer(). The data race was found with KernelThreadSanitizer (KTSAN). Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: andreyknvl@google.com Cc: glider@google.com Cc: kcc@google.com Cc: ktsan@googlegroups.com Cc: john.stultz@linaro.org Link: http://lkml.kernel.org/r/1442584463-69553-1-git-send-email-dvyukov@google.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | time: Fix spelling in commentsZhen Lei2015-09-223-4/+4
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Xinwei Hu <huxinwei@huawei.com> Cc: Xunlei Pang <pang.xunlei@linaro.org> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com [ Fixed yet another typo in one of the sentences fixed. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds2015-11-011-2/+12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull memremap fix from Dan Williams: "The new memremap() api introduced in the 4.3 cycle to unify/replace ioremap_cache() and ioremap_wt() is mishandling the highmem case. This patch has received a build success notification from a 0day-kbuild-robot run and has received an ack from Ard" From the commit message: "The impact of this bug is low for now since the pmem driver is the only user of memremap(), but this is important to fix before more conversions to memremap arrive in 4.4" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: memremap: fix highmem support
| * | | | memremap: fix highmem supportDan Williams2015-10-261-2/+12
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently memremap checks if the range is "System RAM" and returns the kernel linear address. This is broken for highmem platforms where a range may be "System RAM", but is not part of the kernel linear mapping. Fallback to ioremap_cache() in these cases, to let the arch code attempt to handle it. Note that ARM ioremap will WARN when attempting to remap ram, and in that case the caller needs to be fixed. For this reason, existing ioremap_cache() usages for ARM are already trained to avoid attempts to remap ram. The impact of this bug is low for now since the pmem driver is the only user of memremap(), but this is important to fix before more conversions to memremap arrive in 4.4. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | Merge tag 'fixes-for-linus' of ↵Linus Torvalds2015-10-281-2/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module preemption fix from Rusty Russell: "Turns out we should have always been disabling preemption here; someone finally caught it thanks to Peter Z's additional checks" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: Fix locking in symbol_put_addr()
| * | | | module: Fix locking in symbol_put_addr()Peter Zijlstra2015-08-241-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Poma (on the way to another bug) reported an assertion triggering: [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90 [<ffffffff81150822>] __module_address+0x32/0x150 [<ffffffff81150956>] __module_text_address+0x16/0x70 [<ffffffff81150f19>] symbol_put_addr+0x29/0x40 [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core] Laura Abbott <labbott@redhat.com> produced a patch which lead us to inspect symbol_put_addr(). This function has a comment claiming it doesn't need to disable preemption around the module lookup because it holds a reference to the module it wants to find, which therefore cannot go away. This is wrong (and a false optimization too, preempt_disable() is really rather cheap, and I doubt any of this is on uber critical paths, otherwise it would've retained a pointer to the actual module anyway and avoided the second lookup). While its true that the module cannot go away while we hold a reference on it, the data structure we do the lookup in very much _CAN_ change while we do the lookup. Therefore fix the comment and add the required preempt_disable(). Reported-by: poma <pomidorabelisima@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Fixes: a6e6abd575fc ("module: remove module_text_address()") Cc: stable@kernel.org
* | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2015-10-234-12/+28
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Misc fixes all around the map: an instrumentation fix, a nohz usability fix, a lockdep annotation fix and two task group scheduling fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Add missing lockdep_unpin() annotations sched/deadline: Fix migration of SCHED_DEADLINE tasks nohz: Revert "nohz: Set isolcpus when nohz_full is set" sched/fair: Update task group's load_avg after task migration sched/fair: Fix overly small weight for interactive group entities sched, tracing: Stop/start critical timings around the idle=poll idle loop
| * | | | | sched/core: Add missing lockdep_unpin() annotationsPeter Zijlstra2015-10-232-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Luca and Wanpeng reported two missing annotations that led to false lockdep complaints. Add the missing annotations. Reported-by: Luca Abeni <luca.abeni@unitn.it> Reported-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: cbce1a686700 ("sched,lockdep: Employ lock pinning") Link: http://lkml.kernel.org/r/20151023095008.GY17308@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | sched/deadline: Fix migration of SCHED_DEADLINE tasksLuca Abeni2015-10-201-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target") broke select_task_rq_dl() and find_lock_later_rq(), because it introduced a comparison between the local task's deadline and dl.earliest_dl.curr of the remote queue. However, if the remote runqueue does not contain any SCHED_DEADLINE task its earliest_dl.curr is 0 (always smaller than the deadline of the local task) and the remote runqueue is not selected for pushing. As a result, if an application creates multiple SCHED_DEADLINE threads, they will never be pushed to runqueues that do not already contain SCHED_DEADLINE tasks. This patch fixes the issue by checking if dl.dl_nr_running == 0. Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@linux.intel.com> Fixes: 9d5142624256 ("sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target") Link: http://lkml.kernel.org/r/1444982781-15608-1-git-send-email-luca.abeni@unitn.it Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | nohz: Revert "nohz: Set isolcpus when nohz_full is set"Frederic Weisbecker2015-10-201-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts: 8cb9764fc88b ("nohz: Set isolcpus when nohz_full is set") We assumed that full-nohz users always want scheduler isolation on full dynticks CPUs, therefore we included full-nohz CPUs on cpu_isolated_map. This means that tasks run by default on CPUs outside the nohz_full range unless their affinity is explicity overwritten. This suits pure isolation workloads but when the machine is needed to run common workloads, the available sets of CPUs to run common tasks becomes reduced. We reach an extreme case when CONFIG_NO_HZ_FULL_ALL is enabled as it leaves only CPU 0 for non-isolation tasks, which makes people think that their supercomputer regressed to 90's UP - which is true in a sense. Some full-nohz users appear to be interested in running normal workloads either before or after an isolation workload. Full-nohz isn't optimized toward normal workloads but it's still better than UP performance. We are reaching a limitation in kernel presets here. Lets revert this cpu_isolated_map inclusion and let userspace do its own scheduler isolation using cpusets or explicit affinity settings. Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1444663283-30068-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | sched/fair: Update task group's load_avg after task migrationYuyang Du2015-10-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When cfs_rq has cfs_rq->removed_load_avg set (when a task migrates from this cfs_rq), we need to update its contribution to the group's load_avg. This should not increase tg's update too much, because in most cases, the cfs_rq has already decayed its load_avg. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1444699103-20272-2-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | sched/fair: Fix overly small weight for interactive group entitiesYuyang Du2015-10-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") led to an overly small weight for interactive group entities. The bad case can be easily reproduced when a number of CPU hogs compete for the CPUs at the same time (thanks to Mike). This is largly because the task group's load average tracking cross CPUs lags behind the real changes. To fix this we accelerate the group share distribution process by using the load.weight of the cfs_rq. This may increase the entire group's share, but we have to do so to protect the (fragile) interactive tasks, especially from CPU hogs. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Yuyang Du <yuyang.du@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1444699103-20272-1-git-send-email-yuyang.du@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | sched, tracing: Stop/start critical timings around the idle=poll idle loopDaniel Bristot de Oliveira2015-10-121-0/+2
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using idle=poll, the preemptoff tracer is always showing the idle task as the culprit for long latencies. That happens because critical timings are not stopped before idle loop. This patch stops critical timings before entering the idle loop, starting it again after the idle loop. This problem does not affect the irqsoff tracer because interruptions are enabled before entering the idle loop. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/10fc3705874aef11dbe152a068b591a7be1899b4.1444314899.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2015-10-231-2/+6
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fixes from Andrew Morton: "9 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2/dlm: unlock lockres spinlock before dlm_lockres_put fault-inject: fix inverted interval/probability values in printk lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y mm: make sendfile(2) killable thp: use is_zero_pfn() only after pte_present() check mailmap: update Javier Martinez Canillas' email MAINTAINERS: add Sergey as zsmalloc reviewer mm: cma: fix incorrect type conversion for size during dma allocation kmod: don't run async usermode helper as a child of kworker thread
| * | | | | kmod: don't run async usermode helper as a child of kworker threadOleg Nesterov2015-10-231-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | call_usermodehelper_exec_sync() does fork() + wait() with "unignored" SIGCHLD. What we have missed is that this worker thread can have other children previously forked by call_usermodehelper_exec_work() without UMH_WAIT_PROC. If such a child exits in between it becomes a zombie because auto-reaping only works if SIGCHLD is ignored, and nobody can reap it (unless/until this worker thread exits too). Change the !UMH_WAIT_PROC case to use CLONE_PARENT. Note: this is only first step. All PF_KTHREAD tasks, even created by kernel_thread() should have ->parent == kthreadd by default. Fixes: bb304a5c6fc63d8506c ("kmod: handle UMH_WAIT_PROC from system unbound workqueue") Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | tracing: Do not allow stack_tracer to record stack in NMISteven Rostedt (Red Hat)2015-10-201-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code in stack tracer should not be executed within an NMI as it grabs spinlocks and stack tracing an NMI gives the possibility of causing a deadlock. Although this is safe on x86_64, because it does not perform stack traces when the task struct stack is not in use (interrupts and NMIs), it may be an issue for NMIs on i386 and other archs that use the same stack as the NMI. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | | tracing: Have stack tracer force RCU to be watchingSteven Rostedt (Red Hat)2015-10-201-0/+7
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stack tracer was triggering the WARN_ON() in module.c: static void module_assert_mutex_or_preempt(void) { #ifdef CONFIG_LOCKDEP if (unlikely(!debug_locks)) return; WARN_ON(!rcu_read_lock_sched_held() && !lockdep_is_held(&module_mutex)); #endif } The reason is that the stack tracer traces all function calls, and some of those calls happen while exiting or entering user space and idle. Some of these functions are called after RCU had already stopped watching, as RCU does not watch userspace or idle CPUs. If a max stack is hit, then the save_stack_trace() is called, which will check module addresses and call module_assert_mutex_or_preempt(), and then trigger the warning. Sad part is, the warning itself will also do a stack trace and tigger the same warning. That probably should be fixed. The warning was added by 0be964be0d45 "module: Sanitize RCU usage and locking" but this bug has probably been around longer. But it's unlikely to cause much harm, but the new warning causes the system to lock up. Cc: stable@vger.kernel.org # 4.2+ Cc: Peter Zijlstra <peterz@infradead.org> Cc:"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| | | | |
| \ \ \ \
*-. \ \ \ \ Merge branches 'irq-urgent-for-linus' and 'timers-urgent-for-linus' of ↵Linus Torvalds2015-10-172-6/+2
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq/timer fixes from Thomas Gleixner: "irq: a fix for the new hierarchical MSI interrupt handling which unbreaks PCI=n configurations. timers: a fix for the new hrtimer clock offset update mechanism to ensure that the boot time offset is respected" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi: Do not use pci_msi_[un]mask_irq as default methods * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Increment clock_was_set_seq in timekeeping_init()
| | * | | | | timekeeping: Increment clock_was_set_seq in timekeeping_init()Thomas Gleixner2015-10-161-1/+1
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | timekeeping_init() can set the wall time offset, so we need to increment the clock_was_set_seq counter. That way hrtimers will pick up the early offset immediately. Otherwise on a machine which does not set wall time later in the boot process the hrtimer offset is stale at 0 and wall time timers are going to expire with a delay of 45 years. Fixes: 868a3e915f7f "hrtimer: Make offset update smarter" Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stefan Liebler <stli@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>