linux-stable.git - Linux kernel stable tree

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	time: Add y2038 safe read_boot_clock64()	Xunlei Pang	2015-04-03	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of addressing in-kernel y2038 issues, this patch adds read_boot_clock64() and replaces all the call sites of read_boot_clock() with this function. This is a __weak implementation, which simply calls the existing y2038 unsafe read_boot_clock(). This allows architecture specific implementations to be converted independently, and eventually the y2038 unsafe read_boot_clock() can be removed after all its architecture specific implementations have been converted to read_boot_clock64(). Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1427945681-29972-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	clockevents: Make suspend/resume calls explicit	Thomas Gleixner	2015-04-01	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the suspend/resume() calls and invoke them directly from the call sites. No locking required at this point because these calls happen with interrupts disabled and a single cpu online. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebased on top of 4.0-rc5. ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan [ Rebased on top of latest timers/core. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	time: Introduce tk_fast_raw	Peter Zijlstra	2015-03-27	1	-0/+13
\| \| \| \| \| \| \| \| \| \|	Add the NMI safe CLOCK_MONOTONIC_RAW accessor.. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.562746929@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	time: Parametrize all tk_fast_mono users	Peter Zijlstra	2015-03-27	1	-10/+15
\| \| \| \| \| \| \| \| \| \| \|	In preparation for more tk_fast instances, remove all hard-coded tk_fast_mono references. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.484279927@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	time: Add timerkeeper::tkr_raw	Peter Zijlstra	2015-03-27	1	-22/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce tkr_raw and make use of it. base_raw -> tkr_raw.base clock->{mult,shift} -> tkr_raw.{mult.shift} Kill timekeeping_get_ns_raw() in favour of timekeeping_get_ns(&tkr_raw), this removes all mono_raw special casing. Duplicate the updates to tkr_mono.cycle_last into tkr_raw.cycle_last, both need the same value. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.422589590@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	time: Rename timekeeper::tkr to timekeeper::tkr_mono	Peter Zijlstra	2015-03-27	1	-75/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation of adding another tkr field, rename this one to tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base, since the structure is not specific to CLOCK_MONOTONIC and the mono name got added to the tk_read_base instance. Lots of trivial churn. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	timekeeping: Add warnings when overflows or underflows are observed	John Stultz	2015-03-13	1	-7/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It was suggested that the underflow/overflow protection should probably throw some sort of warning out, rather than just silently fixing the issue. So this patch adds some warnings here. The flag variables used are not protected by locks, but since we can't print from the reading functions, just being able to say we saw an issue in the update interval is useful enough, and can be slightly racy without real consequence. The big complication is that we're only under a read seqlock, so the data could shift under us during our calculation to see if there was a problem. This patch avoids this issue by nesting another seqlock which allows us to snapshot the just required values atomically. So we shouldn't see false positives. I also added some basic rate-limiting here, since on one build machine w/ skewed TSCs it was fairly noisy at bootup. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-8-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	timekeeping: Try to catch clocksource delta underflows	John Stultz	2015-03-13	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the case where there is a broken clocksource where there are multiple actual clocks that aren't perfectly aligned, we may see small "negative" deltas when we subtract 'now' from 'cycle_last'. The values are actually negative with respect to the clocksource mask value, not necessarily negative if cast to a s64, but we can check by checking the delta to see if it is a small (relative to the mask) negative value (again negative relative to the mask). If so, we assume we jumped backwards somehow and instead use zero for our delta. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-7-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	timekeeping: Add checks to cap clocksource reads to the 'max_cycles' value	John Stultz	2015-03-13	1	-14/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When calculating the current delta since the last tick, we currently have no hard protections to prevent a multiplication overflow from occuring. This patch introduces infrastructure to allow a cap that limits the clocksource read delta value to the 'max_cycles' value, which is where an overflow would occur. Since this is in the hotpath, it adds the extra checking under CONFIG_DEBUG_TIMEKEEPING=y. There was some concern that capping time like this could cause problems as we may stop expiring timers, which could go circular if the timer that triggers time accumulation were mis-scheduled too far in the future, which would cause time to stop. However, since the mult overflow would result in a smaller time value, we would effectively have the same problem there. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-6-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	timekeeping: Add debugging checks to warn if we see delays	John Stultz	2015-03-13	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recently there's been requests for better sanity checking in the time code, so that it's more clear when something is going wrong, since timekeeping issues could manifest in a large number of strange ways in various subsystems. Thus, this patch adds some extra infrastructure to add a check to update_wall_time() to print two new warnings: 1) if we see the call delayed beyond the 'max_cycles' overflow point, 2) or if we see the call delayed beyond the clocksource's 'max_idle_ns' value, which is currently 50% of the overflow point. This extra infrastructure is conditional on a new CONFIG_DEBUG_TIMEKEEPING option, also added in this patch - default off. Tested this a bit by halting qemu for specified lengths of time to trigger the warnings. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org [ Improved the changelog and the messages a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
*	PM / sleep: Make it possible to quiesce timers during suspend-to-idle	Rafael J. Wysocki	2015-02-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The efficiency of suspend-to-idle depends on being able to keep CPUs in the deepest available idle states for as much time as possible. Ideally, they should only be brought out of idle by system wakeup interrupts. However, timer interrupts occurring periodically prevent that from happening and it is not practical to chase all of the "misbehaving" timers in a whack-a-mole fashion. A much more effective approach is to suspend the local ticks for all CPUs and the entire timekeeping along the lines of what is done during full suspend, which also helps to keep suspend-to-idle and full suspend reasonably similar. The idea is to suspend the local tick on each CPU executing cpuidle_enter_freeze() and to make the last of them suspend the entire timekeeping. That should prevent timer interrupts from triggering until an IO interrupt wakes up one of the CPUs. It needs to be done with interrupts disabled on all of the CPUs, though, because otherwise the suspended clocksource might be accessed by an interrupt handler which might lead to fatal consequences. Unfortunately, the existing ->enter callbacks provided by cpuidle drivers generally cannot be used for implementing that, because some of them re-enable interrupts temporarily and some idle entry methods cause interrupts to be re-enabled automatically on exit. Also some of these callbacks manipulate local clock event devices of the CPUs which really shouldn't be done after suspending their ticks. To overcome that difficulty, introduce a new cpuidle state callback, ->enter_freeze, that will be guaranteed (1) to keep interrupts disabled all the time (and return with interrupts disabled) and (2) not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to look for the deepest available idle state with ->enter_freeze present and to make the CPU execute that callback with suspended tick (and the last of the online CPUs to execute it with suspended timekeeping). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
*	timekeeping: Make it safe to use the fast timekeeper while suspended	Rafael J. Wysocki	2015-02-15	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Theoretically, ktime_get_mono_fast_ns() may be executed after timekeeping has been suspended (or before it is resumed) which in turn may lead to undefined behavior, for example, when the clocksource read from timekeeping_get_ns() called by it is not accessible at that time. Prevent that from happening by setting up a dummy readout base for the fast timekeeper during timekeeping_suspend() such that it will always return the same number of cycles. After the last timekeeping_update() in timekeeping_suspend() the clocksource is read and the result is stored as cycles_at_suspend. The readout base from the current timekeeper is copied onto the dummy and the ->read pointer of the dummy is set to a routine unconditionally returning cycles_at_suspend. Next, the dummy is passed to update_fast_timekeeper(). Then, ktime_get_mono_fast_ns() will work until the subsequent timekeeping_resume() and the proper readout base for the fast timekeeper will be restored by the timekeeping_update() called right after clearing timekeeping_suspended. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
*	timekeeping: Pass readout base to update_fast_timekeeper()	Rafael J. Wysocki	2015-02-13	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Modify update_fast_timekeeper() to take a struct tk_read_base pointer as its argument (instead of a struct timekeeper pointer) and update its kerneldoc comment to reflect that. That will allow a struct tk_read_base that is not part of a struct timekeeper to be passed to it in the next patch. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org>
*	time: Expose getboottime64 for in-kernel uses	John Stultz	2015-01-23	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Adds a timespec64 based getboottime64() implementation that can be used as we convert internal users of getboottime away from using timespecs. Cc: pang.xunlei <pang.xunlei@linaro.org> Cc: Arnd Bergmann <arnd.bergmann@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	Merge branch 'timers-2038-for-linus' of ↵	Linus Torvalds	2014-12-10	1	-5/+63
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more 2038 timer work from Thomas Gleixner: "Two more patches for the ongoing 2038 work: - New accessors to clock MONOTONIC and REALTIME seconds This is a seperate branch as Arnd has follow up work depending on this" * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Provide y2038 safe accessor to the seconds portion of CLOCK_REALTIME timekeeping: Provide fast accessor to the seconds part of CLOCK_MONOTONIC
\| *	timekeeping: Provide y2038 safe accessor to the seconds portion of ↵	Heena Sirwani	2014-10-29	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CLOCK_REALTIME ktime_get_real_seconds() is the replacement function for get_seconds() returning the seconds portion of CLOCK_REALTIME in a time64_t. For 64bit the function is equivivalent to get_seconds(), but for 32bit it protects the readout with the timekeeper sequence count. This is required because 32-bit machines cannot access 64-bit tk->xtime_sec variable atomically. [tglx: Massaged changelog and added docbook comment ] Signed-off-by: Heena Sirwani <heenasirwani@gmail.com> Reviewed-by: Arnd Bergman <arnd@arndb.de> Cc: John Stultz <john.stultz@linaro.org> Cc: opw-kernel@googlegroups.com Link: http://lkml.kernel.org/r/7adcfaa8962b8ad58785d9a2456c3f77d93c0ffb.1414578445.git.heenasirwani@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| *	timekeeping: Provide fast accessor to the seconds part of CLOCK_MONOTONIC	Heena Sirwani	2014-10-29	1	-5/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the counterpart to get_seconds() based on CLOCK_MONOTONIC. The use case for this interface are kernel internal coarse grained timestamps which do neither require the nanoseconds fraction of current time nor the CLOCK_REALTIME properties. Such timestamps can currently only retrieved by calling ktime_get_ts64() and using the tv_sec field of the returned timespec64. That's inefficient as it involves the read of the clocksource, math operations and must be protected by the timekeeper sequence counter. To avoid the sequence counter protection we restrict the return value to unsigned 32bit on 32bit machines. This covers ~136 years of uptime and therefor an overflow is not expected to hit anytime soon. To avoid math in the function we calculate the current seconds portion of CLOCK_MONOTONIC when the timekeeper gets updated in tk_update_ktime_data() similar to the CLOCK_REALTIME counterpart xtime_sec. [ tglx: Massaged changelog, simplified and commented the update function, added docbook comment ] Signed-off-by: Heena Sirwani <heenasirwani@gmail.com> Reviewed-by: Arnd Bergman <arnd@arndb.de> Cc: John Stultz <john.stultz@linaro.org> Cc: opw-kernel@googlegroups.com Link: http://lkml.kernel.org/r/da0b63f4bdf3478909f92becb35861197da3a905.1414578445.git.heenasirwani@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* \|	time: Fix sign bug in NTP mult overflow warning	John Stultz	2014-11-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 6067dc5a8c2b ("time: Avoid possible NTP adjustment mult overflow") a new check was added to watch for adjustments that could cause a mult overflow. Unfortunately the check compares a signed with unsigned value and ignored the case where the adjustment was negative, which causes spurious warn-ons on some systems (and seems like it would result in problematic time adjustments there as well, due to the early return). Thus this patch adds a check to make sure the adjustment is positive before we check for an overflow, and resovles the issue in my testing. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Debugged-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1416890145-30048-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* \|	time: Fixup comments to reflect usage of timespec64	John Stultz	2014-11-21	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix up a few comments that weren't updated when the functions were converted to use timespec64 structures. Acked-by: Arnd Bergmann <arnd.bergmann@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
* \|	time: Expose get_monotonic_coarse64() for in-kernel uses	John Stultz	2014-11-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds a timespec64 based get_monotonic_coarse64() implementation that can be used as we convert internal users of get_monotonic_coarse away from using timespecs. Signed-off-by: John Stultz <john.stultz@linaro.org>
* \|	time: Expose getrawmonotonic64 for in-kernel uses	John Stultz	2014-11-21	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adds a timespec64 based getrawmonotonic64() implementation that can be used as we convert internal users of getrawmonotonic away from using timespecs. Signed-off-by: John Stultz <john.stultz@linaro.org>
* \|	time: Provide y2038 safe timekeeping_inject_sleeptime() replacement	pang.xunlei	2014-11-21	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of addressing "y2038 problem" for in-kernel uses, this patch adds timekeeping_inject_sleeptime64() using timespec64. After this patch, timekeeping_inject_sleeptime() is deprecated and all its call sites will be fixed using the new interface, after that it can be removed. NOTE: timekeeping_inject_sleeptime() is safe actually, but we want to eliminate timespec eventually, so comes this patch. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
* \|	time: Provide y2038 safe do_settimeofday() replacement	pang.xunlei	2014-11-21	1	-10/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kernel uses 32-bit signed value(time_t) for seconds elapsed 1970-01-01:00:00:00, thus it will overflow at 2038-01-19 03:14:08 on 32-bit systems. This is widely known as the y2038 problem. As part of addressing "y2038 problem" for in-kernel uses, this patch adds safe do_settimeofday64() using timespec64. After this patch, do_settimeofday() is deprecated and all its call sites will be fixed using do_settimeofday64(), after that it can be removed. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org>
* \|	time: Complete NTP adjustment threshold judging conditions	pang.xunlei	2014-11-21	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The clocksource mult-adjustment threshold is [mult-maxadj, mult+maxadj], timekeeping_adjust() only deals with the upper threshold, but misses the lower threshold. This patch adds the lower threshold judging condition. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> [jstultz: Minor fix for > 80 char line] Signed-off-by: John Stultz <john.stultz@linaro.org>
* \|	time: Avoid possible NTP adjustment mult overflow.	pang.xunlei	2014-11-21	1	-0/+6
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ideally, __clocksource_updatefreq_scale, selects the largest shift value possible for a clocksource. This results in the mult memember of struct clocksource being particularly large, although not so large that NTP would adjust the clock to cause it to overflow. That said, nothing actually prohibits an overflow from occuring, its just that it "shouldn't" occur. So while very unlikely, and so far never observed, the value of (cs->mult+cs->maxadj) may have a chance to reach very near 0xFFFFFFFF, so there is a possibility it may overflow when doing NTP positive adjustment See the following detail: When NTP slewes the clock, kernel goes through update_wall_time()->...->timekeeping_apply_adjustment(): tk->tkr.mult += mult_adj; Since there is no guard against it, its possible tk->tkr.mult may overflow during this operation. This patch avoids any possible mult overflow by judging the overflow case before adding mult_adj to mult, also adds the WARNING message when capturing such case. Signed-off-by: pang.xunlei <pang.xunlei@linaro.org> [jstultz: Reworded commit message] Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Update timekeeper before updating vsyscall and pvclock	Thomas Gleixner	2014-09-06	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The update_walltime() code works on the shadow timekeeper to make the seqcount protected region as short as possible. But that update to the shadow timekeeper does not update all timekeeper fields because it's sufficient to do that once before it becomes life. One of these fields is tkr.base_mono. That stays stale in the shadow timekeeper unless an operation happens which copies the real timekeeper to the shadow. The update function is called after the update calls to vsyscall and pvclock. While not correct, it did not cause any problems because none of the invoked update functions used base_mono. commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based) changed that in the kvm pvclock update function, so the stale mono_base value got used and caused kvm-clock to malfunction. Put the update where it belongs and fix the issue. Reported-by: Chris J Arges <chris.j.arges@canonical.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	timekeeping: Another fix to the VSYSCALL_OLD update_vsyscall	John Stultz	2014-08-14	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Benjamin Herrenschmidt pointed out that I further missed modifying update_vsyscall after the wall_to_mono value was changed to a timespec64. This causes issues on powerpc32, which expects a 32bit timespec. This patch fixes the problem by properly converting from a timespec64 to a timespec before passing the value on to the arch-specific vsyscall logic. [ Thomas is currently on vacation, but reviewed it and wanted me to send this fix on to you directly. ] Cc: LKML <linux-kernel@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	timekeeping: Use cached ntp_tick_length when accumulating error	John Stultz	2014-07-23	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By caching the ntp_tick_length() when we correct the frequency error, and then using that cached value to accumulate error, we avoid large initial errors when the tick length is changed. This makes convergence happen much faster in the simulator, since the initial error doesn't have to be slowly whittled away. This initially seems like an accounting error, but Miroslav pointed out that ntp_tick_length() can change mid-tick, so when we apply it in the error accumulation, we are applying any recent change to the entire tick. This approach chooses to apply changes in the ntp_tick_length() only to the next tick, which allows us to calculate the freq correction before using the new tick length, which avoids accummulating error. Credit to Miroslav for pointing this out and providing the original patch this functionality has been pulled out from, along with the rational. Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Rework frequency adjustments to work better w/ nohz	John Stultz	2014-07-23	1	-110/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existing timekeeping_adjust logic has always been complicated to understand. Further, since it was developed prior to NOHZ becoming common, its not surprising it performs poorly when NOHZ is enabled. Since Miroslav pointed out the problematic nature of the existing code in the NOHZ case, I've tried to refactor the code to perform better. The problem with the previous approach was that it tried to adjust for the total cumulative error using a scaled dampening factor. This resulted in large errors to be corrected slowly, while small errors were corrected quickly. With NOHZ the timekeeping code doesn't know how far out the next tick will be, so this results in bad over-correction to small errors, and insufficient correction to large errors. Inspired by Miroslav's patch, I've refactored the code to try to address the correction in two steps. 1) Check the future freq error for the next tick, and if the frequency error is large, try to make sure we correct it so it doesn't cause much accumulated error. 2) Then make a small single unit adjustment to correct any cumulative error that has collected over time. This method performs fairly well in the simulator Miroslav created. Major credit to Miroslav for pointing out the issue, providing the original patch to resolve this, a simulator for testing, as well as helping debug and resolve issues in my implementation so that it performed closer to his original implementation. Cc: Miroslav Lichvar <mlichvar@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reported-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Minor fixup for timespec64->timespec assignment	John Stultz	2014-07-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the GENERIC_TIME_VSYSCALL_OLD update_vsyscall implementation, we take the tk_xtime() value, which returns a timespec64, and store it in a timespec. This luckily is ok, since the only architectures that use GENERIC_TIME_VSYSCALL_OLD are ia64 and ppc64, which are both 64 bit systems where timespec64 is the same as a timespec. Even so, for cleanliness reasons, use the conversion function to assign the proper type. Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC	Thomas Gleixner	2014-07-23	1	-0/+124
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tracers want a correlated time between the kernel instrumentation and user space. We really do not want to export sched_clock() to user space, so we need to provide something sensible for this. Using separate data structures with an non blocking sequence count based update mechanism allows us to do that. The data structure required for the readout has a sequence counter and two copies of the timekeeping data. On the update side: smp_wmb(); tkf->seq++; smp_wmb(); update(tkf->base[0], tk); smp_wmb(); tkf->seq++; smp_wmb(); update(tkf->base[1], tk); On the reader side: do { seq = tkf->seq; smp_rmb(); idx = seq & 0x01; now = now(tkf->base[idx]); smp_rmb(); } while (seq != tkf->seq) So if a NMI hits the update of base[0] it will use base[1] which is still consistent, but this timestamp is not guaranteed to be monotonic across an update. The timestamp is calculated by: now = base_mono + clock_delta * slope So if the update lowers the slope, readers who are forced to the not yet updated second array are still using the old steeper slope. tmono ^ \| o n \| o n \| u \| o \|o \|12345678---> reader order o = old slope u = update n = new slope So reader 6 will observe time going backwards versus reader 5. While other CPUs are likely to be able observe that, the only way for a CPU local observation is when an NMI hits in the middle of the update. Timestamps taken from that NMI context might be ahead of the following timestamps. Callers need to be aware of that and deal with it. V2: Got rid of clock monotonic raw and reorganized the data structures. Folded in the barrier fix from Mathieu. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Use tk_read_base as argument for timekeeping_get_ns()	Thomas Gleixner	2014-07-23	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \|	All the function needs is in the tk_read_base struct. No functional change for the current code, just a preparatory patch for the NMI safe accessor to clock monotonic which will use struct tk_read_base as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Create struct tk_read_base and use it in struct timekeeper	Thomas Gleixner	2014-07-23	1	-66/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The members of the new struct are the required ones for the new NMI safe accessor to clcok monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI safe timekeepers a simple memcpy use the struct for the timekeeper as well and convert all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Restructure the timekeeper some more	Thomas Gleixner	2014-07-23	1	-20/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Access to time requires to touch two cachelines at minimum 1) The timekeeper data structure 2) The clocksource data structure The access to the clocksource data structure can be avoided as almost all clocksource implementations ignore the argument to the read callback, which is a pointer to the clocksource. But the core needs to touch it to access the members @read and @mask. So we are better off by copying the @read function pointer and the @mask from the clocksource to the core data structure itself. For the most used ktime_get() access all required data including the @read and @mask copies fits together with the sequence counter into a single 64 byte cacheline. For the other time access functions we touch in the current code three cache lines in the worst case. But with the clocksource data copies we can reduce that to two adjacent cachelines, which is more efficient than disjunct cache lines. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	clocksource: Get rid of cycle_last	Thomas Gleixner	2014-07-23	1	-12/+11
\| \| \| \| \| \| \| \| \|	cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	clocksource: Make delta calculation a function	Thomas Gleixner	2014-07-23	1	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \|	We want to move the TSC sanity check into core code to make NMI safe accessors to clock monotonic[_raw] possible. For this we need to sanity check the delta calculation. Create a helper function and convert all sites to use it. [ Build fix from jstultz ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Provide ktime_get_raw()	Thomas Gleixner	2014-07-23	1	-0/+25
\| \| \| \| \| \| \|	Provide a ktime_t based interface for raw monotonic time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Simplify timekeeping_clocktai()	Thomas Gleixner	2014-07-23	1	-31/+0
\| \| \| \| \| \| \| \|	timekeeping_clocktai() is not used in fast pathes, so the extra timespec conversion is not problematic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Remove timekeeper.total_sleep_time	Thomas Gleixner	2014-07-23	1	-11/+3
\| \| \| \| \| \| \|	No more users. Remove it Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Simplify getboottime()	Thomas Gleixner	2014-07-23	1	-8/+3
\| \| \| \| \| \| \| \| \|	Subtracting plain nsec values and converting to timespec is simpler than the whole timespec math. Not really fastpath code, so the division is not an issue. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Use ktime_get_boottime() for get_monotonic_boottime()	Thomas Gleixner	2014-07-23	1	-34/+0
\| \| \| \| \| \| \| \|	get_monotonic_boottime() is not used in fast pathes, so the extra timespec conversion is not problematic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Remove monotonic_to_bootbased	Thomas Gleixner	2014-07-23	1	-15/+0
\| \| \| \| \| \| \|	No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Remove ktime_get_monotonic_offset()	Thomas Gleixner	2014-07-23	1	-18/+0
\| \| \| \| \| \| \|	No more users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Provide ktime_mono_to_any()	Thomas Gleixner	2014-07-23	1	-0/+20
\| \| \| \| \| \| \| \|	ktime based conversion function to map a monotonic time stamp to a different CLOCK. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping; Use ktime based data for ktime_get_update_offsets_tick()	Thomas Gleixner	2014-07-23	1	-6/+6
\| \| \| \| \| \| \|	No need to juggle with timespecs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Use ktime_t data for ktime_get_update_offsets_now()	Thomas Gleixner	2014-07-23	1	-6/+4
\| \| \| \| \| \| \|	No need to juggle with timespecs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Use ktime_t based data for ktime_get_clocktai()	Thomas Gleixner	2014-07-23	1	-15/+0
\| \| \| \| \|	Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping; Use ktime_t based data for ktime_get_boottime()	Thomas Gleixner	2014-07-23	1	-17/+0
\| \| \| \| \|	Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Use ktime_t based data for ktime_get_real()	Thomas Gleixner	2014-07-23	1	-15/+0
\| \| \| \| \| \| \|	Speed up the readout. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
*	timekeeping: Provide ktime_get_with_offset()	Thomas Gleixner	2014-07-23	1	-0/+27
\| \| \| \| \| \| \| \|	Provide a helper function which lets us implement ktime_t based interfaces for real, boot and tai clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>