summaryrefslogtreecommitdiffstats
path: root/drivers/cpufreq/cpufreq_ondemand.c
Commit message (Collapse)AuthorAgeFilesLines
* cpufreq: governor: Move per-CPU data to the common codeRafael J. Wysocki2016-03-091-16/+10
| | | | | | | | | | | | | | | | | | | | After previous changes there is only one piece of code in the ondemand governor making references to per-CPU data structures, but it can be easily modified to avoid doing that, so modify it accordingly and move the definition of per-CPU data used by the ondemand and conservative governors to the common code. Next, change that code to access the per-CPU data structures directly rather than via a governor callback. This causes the ->get_cpu_cdbs governor callback to become unnecessary, so drop it along with the macro and function definitions related to it. Finally, drop the definitions of struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s that aren't necessary any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Make governor private data per-policyRafael J. Wysocki2016-03-091-10/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s are only used for a limited set of CPUs. Namely, if a policy is shared between multiple CPUs, those fields will only be used for one of them (policy->cpu). This means that they really are per-policy rather than per-CPU and holding room for them in per-CPU data structures is generally wasteful. Also moving those fields into per-policy data structures will allow some significant simplifications to be made going forward. For this reason, introduce struct cs_policy_dbs_info and struct od_policy_dbs_info to hold those fields. Define each of the new structures as an extension of struct policy_dbs_info (such that struct policy_dbs_info is embedded in each of them) and introduce new ->alloc and ->free governor callbacks to allocate and free those structures, respectively, such that ->alloc() will return a pointer to the struct policy_dbs_info embedded in the allocated data structure and ->free() will take that pointer as its argument. With that, modify the code accessing the data fields in question in per-CPU data objects to look for them in the new structures via the struct policy_dbs_info pointer available to it and drop them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: ondemand: Rework the handling of powersave bias updatesRafael J. Wysocki2016-03-091-17/+13
| | | | | | | | | | | | | | | | | | | The ondemand_powersave_bias_init() function used for resetting data fields related to the powersave bias tunable of the ondemand governor works by walking all of the online CPUs in the system and updating the od_cpu_dbs_info_s structures for all of them. However, if governor tunables are per policy, the update should not touch the CPUs that are not associated with the given dbs_data. Moreover, since the data fields in question are only ever used for policy->cpu in each policy governed by ondemand, the update can be limited to those specific CPUs. Rework the code to take the above observations into account. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Fix CPU load information updates via ->storeRafael J. Wysocki2016-03-091-18/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ->store() callbacks of some tunable sysfs attributes of the ondemand and conservative governors trigger immediate updates of the CPU load information for all CPUs "governed" by the given dbs_data by walking the cpu_dbs_info structures for all online CPUs in the system and updating them. This is questionable for two reasons. First, it may lead to a lot of extra overhead on a system with many CPUs if the given dbs_data is only associated with a few of them. Second, if governor tunables are per-policy, the CPUs associated with the other sets of governor tunables should not be updated. To address this issue, use the observation that in all of the places in question the update operation may be carried out in the same way (because all of the tunables involved are now located in struct dbs_data and readily available to the common code) and make the code in those places invoke the same (new) helper function that will carry out the update correctly. That new function always checks the ignore_nice_load tunable value and updates the CPUs' prev_cpu_nice data fields if that's set, which wasn't done by the original code in store_io_is_busy(), but it should have been done in there too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: ondemand: Drop one more callback from struct od_opsRafael J. Wysocki2016-03-091-2/+1
| | | | | | | | | | The ->powersave_bias_init_cpu callback in struct od_ops is only used in one place and that invocation may be replaced with a direct call to the function pointed to by that callback, so change the code accordingly and drop the callback. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Drop unused governor callback and data fieldsRafael J. Wysocki2016-03-091-3/+0
| | | | | | | | | | | | | | After some previous changes, the ->get_cpu_dbs_info_s governor callback and the "governor" field in struct dbs_governor (whose value represents the governor type) are not used any more, so drop them. Also drop the unused gov_ops field from struct dbs_governor. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Add a ->start callback for governorsRafael J. Wysocki2016-03-091-0/+10
| | | | | | | | | | | To avoid having to check the governor type explicitly in the common code in order to initialize data structures specific to the governor type properly, add a ->start callback to struct dbs_governor and use it to initialize those data structures for the ondemand and conservative governors. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Move io_is_busy to struct dbs_dataRafael J. Wysocki2016-03-091-7/+5
| | | | | | | | | | | | | | | The io_is_busy governor tunable is only used by the ondemand governor and is located in the ondemand-specific data structure, but it is looked at by the common governor code that has to do ugly things to get to that value, so move it to struct dbs_data and modify ondemand accordingly. Since the conservative governor never touches that field, it will be always 0 for that governor and it won't have any effect on the results of computations in that case. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: ondemand: Drop unused callback from struct od_opsRafael J. Wysocki2016-03-091-1/+0
| | | | | | | | | | The ->freq_increase callback in struct od_ops is never invoked, so drop it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: ondemand: Simplify od_update() slightlyRafael J. Wysocki2016-03-091-8/+5
| | | | | | | | | | Drop some lines of code from od_update() by arranging the statements in there in a more logical way. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Use microseconds in sample delay computationsRafael J. Wysocki2016-03-091-15/+13
| | | | | | | | | | Do not convert microseconds to jiffies and the other way around in governor computations related to the sampling rate and sample delay and drop delay_for_sampling_rate() which isn't of any use then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: ondemand: Simplify conditionals in od_dbs_timer()Rafael J. Wysocki2016-03-091-13/+11
| | | | | | | | | | Reduce the indentation level in the conditionals in od_dbs_timer() and drop the delay variable from it. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Move rate_mult to struct policy_dbsRafael J. Wysocki2016-03-091-8/+15
| | | | | | | | | | | | | | | | The rate_mult field in struct od_cpu_dbs_info_s is used by the code shared with the conservative governor and to access it that code has to do an ugly governor type check. However, first of all it is ever only used for policy->cpu, so it is per-policy rather than per-CPU and second, it is initialized to 1 by cpufreq_governor_start(), so if the conservative governor never modifies it, it will have no effect on the results of any computations. For these reasons, move rate_mult to struct policy_dbs_info (as a common field). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Get rid of the ->gov_check_cpu callbackRafael J. Wysocki2016-03-091-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | The way the ->gov_check_cpu governor callback is used by the ondemand and conservative governors is not really straightforward. Namely, the governor calls dbs_check_cpu() that updates the load information for the policy and the invokes ->gov_check_cpu() for the governor. To get rid of that entanglement, notice that cpufreq_governor_limits() doesn't need to call dbs_check_cpu() directly. Instead, it can simply reset the sample delay to 0 which will cause a sample to be taken immediately. The result of that is practically equivalent to calling dbs_check_cpu() except that it will trigger a full update of governor internal state and not just the ->gov_check_cpu() part. Following that observation, make cpufreq_governor_limits() reset the sample delay and turn dbs_check_cpu() into a function that will simply evaluate the load and return the result called dbs_update(). That function can now be called by governors from the routines that previously were pointed to by ->gov_check_cpu and those routines can be called directly by each governor instead of dbs_check_cpu(). This way ->gov_check_cpu becomes unnecessary, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: ondemand: Rearrange od_dbs_timer() to avoid updating delayViresh Kumar2016-03-091-5/+4
| | | | | | | | | Avoid extra checks in od_dbs_timer() by rearranging updates to the local delay variable in it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: conservative: Update sample_delay_ns immediatelyViresh Kumar2016-03-091-69/+0
| | | | | | | | | | | | | | | | | | | | | The ondemand governor already updates sample_delay_ns immediately on updates to the sampling rate, but conservative doesn't do that. It was left out earlier as the code was really too complex to get that done easily. Things are sorted out very well now, however, and the conservative governor can be modified to follow ondemand in that respect. Moreover, since the code needed to implement that in the conservative governor would be identical to the corresponding ondemand governor's code, make that code common and change both governors to use it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Create and traverse list of policy_dbs to avoid deadlockViresh Kumar2016-03-091-59/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | The dbs_data_mutex lock is currently used in two places. First, cpufreq_governor_dbs() uses it to guarantee mutual exclusion between invocations of governor operations from the core. Second, it is used by ondemand governor's update_sampling_rate() to ensure the stability of data structures walked by it. The second usage is quite problematic, because update_sampling_rate() is called from a governor sysfs attribute's ->store callback and that leads to a deadlock scenario involving cpufreq_governor_exit() which runs under dbs_data_mutex. Thus it is better to rework the code so update_sampling_rate() doesn't need to acquire dbs_data_mutex. To that end, rework update_sampling_rate() to walk a list of policy_dbs objects supported by the dbs_data one it has been called for (instead of walking cpu_dbs_info object for all CPUs). The list manipulation is protected with dbs_data->mutex which also is held around the execution of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in that function any more. Reported-by: Juri Lelli <juri.lelli@arm.com> Reported-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: New sysfs show/store callbacks for governor tunablesViresh Kumar2016-03-091-47/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use the same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs). Unfortunately, in the freq-attr case, the standard cpufreq show/store sysfs attribute callbacks are applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That may lead to an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. Therefore policy->rwsem cannot be dropped in cpufreq_set_policy() at any point, but the deadlock situation described above must be avoided too. To that end, use the observation that in principle governor tunables may be represented by the same data type regardless of whether the governor is system-wide or per-policy and introduce a new structure, struct governor_attr, for representing them and new corresponding macros for creating show/store sysfs callbacks for them. Also make their parent kobject use a new kobject type whose default show/store callbacks are not related to the standard core cpufreq ones in any way (and they don't acquire policy->rwsem in particular). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Subject & changelog + rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Move common tunables to 'struct dbs_data'Viresh Kumar2016-03-091-23/+18
| | | | | | | | | | | | There are a few common tunables shared between the ondemand and conservative governors. Move them to struct dbs_data to simplify code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Create generic macro for common tunablesViresh Kumar2016-03-091-4/+4
| | | | | | | | | | | | | | | | Some tunables are present in governor-specific structures, whereas one (min_sampling_rate) is located directly in struct dbs_data. There is a special macro for creating its sysfs attribute and the show/store callbacks, but since more tunables are going to be moved to struct dbs_data, a new generic macro for such cases will be useful, so add it and use it for min_sampling_rate. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Rearrange governor data structuresRafael J. Wysocki2016-03-091-7/+11
| | | | | | | | | | | | | | | | The struct policy_dbs_info objects representing per-policy governor data are not accessible directly from the corresponding policy objects. To access them, one has to get a pointer to the struct cpu_dbs_info of policy->cpu and use the policy_dbs field of that which isn't really straightforward. To address that rearrange the governor data structures so the governor_data pointer in struct cpufreq_policy will point to struct policy_dbs_info (instead of struct dbs_data) and that will contain a pointer to struct dbs_data. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Drop cpu argument from dbs_check_cpu()Rafael J. Wysocki2016-03-091-4/+2
| | | | | | | | | Since policy->cpu is always passed as the second argument to dbs_check_cpu(), it is not really necessary to pass it, because the function can obtain that value via its first argument just fine. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Rename cpu_common_dbs_info to policy_dbs_infoRafael J. Wysocki2016-03-091-14/+14
| | | | | | | | | | | | | | The struct cpu_common_dbs_info structure represents the per-policy part of the governor data (for the ondemand and conservative governors), but its name doesn't reflect its purpose. Rename it to struct policy_dbs_info and rename variables related to it accordingly. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Drop the gov pointer from struct dbs_dataRafael J. Wysocki2016-03-091-1/+1
| | | | | | | | | | | | | Since it is possible to obtain a pointer to struct dbs_governor from a pointer to the struct governor embedded in it with the help of container_of(), the additional gov pointer in struct dbs_data isn't really necessary. Drop that pointer and make the code using it reach the dbs_governor object via policy->governor. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Rework cpufreq_governor_dbs()Rafael J. Wysocki2016-03-091-10/+1
| | | | | | | | | | | | | | Since it is possible to obtain a pointer to struct dbs_governor from a pointer to the struct governor embedded in it via container_of(), the second argument of cpufreq_governor_init() is not necessary. Accordingly, cpufreq_governor_dbs() doesn't need its second argument either and the ->governor callbacks for both the ondemand and conservative governors may be set to cpufreq_governor_dbs() directly. Make that happen. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Rename some data types and variablesRafael J. Wysocki2016-03-091-4/+4
| | | | | | | | | | | | The ondemand and conservative governors are represented by struct common_dbs_data whose name doesn't reflect the purpose it is used for, so rename it to struct dbs_governor and rename variables of that type accordingly. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Put governor structure into common_dbs_dataRafael J. Wysocki2016-03-091-13/+15
| | | | | | | | | | | | | | | | | | | For the ondemand and conservative governors (generally, governors that use the common code in cpufreq_governor.c), there are two static data structures representing the governor, the struct governor structure (the interface to the cpufreq core) and the struct common_dbs_data one (the interface to the cpufreq_governor.c code). There's no fundamental reason why those two structures have to be separate. Moreover, if the struct governor one is included into struct common_dbs_data, it will be possible to reach the latter from the policy via its policy->governor pointer, so it won't be necessary to pass a separate pointer to it around. For this reason, embed struct governor in struct common_dbs_data. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Use common mutex for dbs_data protectionRafael J. Wysocki2016-03-091-3/+2
| | | | | | | | | | | | | | | | | | | | | | Every governor relying on the common code in cpufreq_governor.c has to provide its own mutex in struct common_dbs_data. However, there actually is no need to have a separate mutex per governor for this purpose, they may be using the same global mutex just fine. Accordingly, introduce a single common mutex for that and drop the mutex field from struct common_dbs_data. That at least will ensure that the mutex is always present and initialized regardless of what the particular governors do. Another benefit is that the common code does not need a pointer to a governor-related structure to get to the mutex which sometimes helps. Finally, it makes the code generally easier to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: governor: Replace timers with utilization update callbacksRafael J. Wysocki2016-03-091-20/+23
| | | | | | | | | | | | | | | Instead of using a per-CPU deferrable timer for queuing up governor work items, register a utilization update callback that will be invoked from the scheduler on utilization changes. The sampling rate is still the same as what was used for the deferrable timers and the added irq_work overhead should be offset by the eliminated timers overhead, so in theory the functional impact of this patch should not be significant. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
* cpufreq: Clean up default and fallback governor setupRafael J. Wysocki2016-02-051-18/+18
| | | | | | | | | | | The preprocessor magic used for setting the default cpufreq governor (and for using the performance governor as a fallback one for that matter) is really nasty, so replace it with __weak functions and overrides. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* cpufreq: ondemand: update update_sampling_rate() to make it more efficientViresh Kumar2015-12-091-1/+11
| | | | | | | | | | | | | Currently update_sampling_rate() runs over each online CPU and cancels/queues timers on all policy->cpus every time. This should be done just once for any cpu belonging to a policy. Create a cpumask and keep on clearing it as and when we process policies, so that we don't have to traverse through all CPUs of the same policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: replace per-CPU delayed work with timersViresh Kumar2015-12-091-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpufreq governors evaluate load at sampling rate and based on that they update frequency for a group of CPUs belonging to the same cpufreq policy. This is required to be done in a single thread for all policy->cpus, but because we don't want to wakeup idle CPUs to do just that, we use deferrable work for this. If we would have used a single delayed deferrable work for the entire policy, there were chances that the CPU required to run the handler can be in idle and we might end up not changing the frequency for the entire group with load variations. And so we were forced to keep per-cpu works, and only the one that expires first need to do the real work and others are rescheduled for next sampling time. We have been using the more complex solution until now, where we used a delayed deferrable work for this, which is a combination of a timer and a work. This could be made lightweight by keeping per-cpu deferred timers with a single work item, which is scheduled by the first timer that expires. This patch does just that and here are important changes: - The timer handler will run in irq context and so we need to use a spin_lock instead of the timer_mutex. And so a separate timer_lock is created. This also makes the use of the mutex and lock quite clear, as we know what exactly they are protecting. - A new field 'skip_work' is added to track when the timer handlers can queue a work. More comments present in code. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Pass policy as argument to ->gov_dbs_timer()Viresh Kumar2015-12-071-3/+2
| | | | | | | | Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and dbs_data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: ondemand: Work is guaranteed to be pendingViresh Kumar2015-12-071-3/+0
| | | | | | | | | We are guaranteed to have works scheduled for policy->cpus, as the policy isn't stopped yet. And so there is no need to check that again. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: ondemand: Update sampling rate only for concerned policiesViresh Kumar2015-12-071-7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | We are comparing policy->governor against cpufreq_gov_ondemand to make sure that we update sampling rate only for the concerned CPUs. But that isn't enough. In case of governor_per_policy, there can be multiple instances of ondemand governor and we will always end up updating all of them with current code. What we rather need to do, is to compare dbs_data with poilcy->governor_data, which will match only for the policies governed by dbs_data. This code is also racy as the governor might be getting stopped at that time and we may end up scheduling work for a policy, which we have just disabled. Fix that by protecting the entire function with &od_dbs_cdata.mutex, which will prevent against races with policy START/STOP/etc. After these locks are in place, we can safely get the policy via per-cpu dbs_info. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()Viresh Kumar2015-10-281-9/+1
| | | | | | | | | | | | | | | | 'timer_mutex' is required to sync work-handlers of policy->cpus. update_sampling_rate() is just canceling the works and queuing them again. This isn't protecting anything at all in update_sampling_rate() and is not gonna be of any use. Even if a work-handler is already running for a CPU, cancel_delayed_work_sync() will wait for it to finish. Drop these unnecessary locks. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: split out common part of {cs|od}_dbs_timer()Viresh Kumar2015-07-211-22/+14
| | | | | | | | | | | | | | Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is unnecessarily duplicated. Create the real work-handler in cpufreq_governor.c and put the common code in this routine (dbs_timer()). Shouldn't make any functional change. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Keep single copy of information common to policy->cpusViresh Kumar2015-07-211-17/+21
| | | | | | | | | | | | | | | | Some information is common to all CPUs belonging to a policy, but are kept on per-cpu basis. Lets keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone. The memory for cpu_common_dbs_info is allocated/freed at INIT/EXIT, so that it we don't reallocate it for STOP/START sequence. It will be also be used (in next patch) while the governor is stopped and so must not be freed that early. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: rename cur_policy as policyViresh Kumar2015-07-171-9/+10
| | | | | | | | | Just call it 'policy', cur_policy is unnecessarily long and doesn't have any special meaning. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Name delayed-work as dworkViresh Kumar2015-07-171-4/+4
| | | | | | | | | Delayed work was named as 'work' and to access work within it we do work.work. Not much readable. Rename delayed_work as 'dwork'. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Serialize governor callbacksViresh Kumar2015-06-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several races reported in cpufreq core around governors (only ondemand and conservative) by different people. There are at least two race scenarios present in governor code: (a) Concurrent access/updates of governor internal structures. It is possible that fields such as 'dbs_data->usage_count', etc. are accessed simultaneously for different policies using same governor structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And because of this we can dereference bad pointers. For example consider a system with two CPUs with separate 'struct cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave. CPU0 switching to powersave and CPU1 to ondemand: CPU0 CPU1 store* store* cpufreq_governor_exit() cpufreq_governor_init() dbs_data = cdata->gdbs_data; if (!--dbs_data->usage_count) kfree(dbs_data); dbs_data->usage_count++; *Bad pointer dereference* There are other races possible between EXIT and START/STOP/LIMIT as well. Its really complicated. (b) Switching governor state in bad sequence: For example trying to switch a governor to START state, when the governor is in EXIT state. There are some checks present in __cpufreq_governor() but they aren't sufficient as they compare events against 'policy->governor_enabled', where as we need to take governor's state into account, which can be used by multiple policies. These two issues need to be solved separately and the responsibility should be properly divided between cpufreq and governor core. The first problem is more about the governor core, as it needs to protect its structures properly. And the second problem should be fixed in cpufreq core instead of governor, as its all about sequence of events. This patch is trying to solve only the first problem. There are two types of data we need to protect, - 'struct common_dbs_data': No matter what, there is going to be a single copy of this per governor. - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we will have per-policy copy of this data, otherwise a single copy. Because of such complexities, the mutex present in 'struct dbs_data' is insufficient to solve our problem. For example we need to protect fetching of 'dbs_data' from different structures at the beginning of cpufreq_governor_dbs(), to make sure it isn't currently being updated. This can be fixed if we can guarantee serialization of event parsing code for an individual governor. This is best solved with a mutex per governor, and the placeholder for that is 'struct common_dbs_data'. And so this patch moves the mutex from 'struct dbs_data' to 'struct common_dbs_data' and takes it at the beginning and drops it at the end of cpufreq_governor_dbs(). Tested with and without following configuration options: CONFIG_LOCKDEP_SUPPORT=y CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_PI_LIST=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_ATOMIC_SLEEP=y Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: register notifier from cs_init()Viresh Kumar2015-06-151-2/+2
| | | | | | | | | | Notifiers are required only for conservative governor and the common governor code is unnecessarily polluted with that. Handle that from cs_init/exit() instead of cpufreq_governor_dbs(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: ondemand: Eliminate the deadband effectStratos Karafotis2014-07-211-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, ondemand calculates the target frequency proportional to load using the formula: Target frequency = C * load where C = policy->cpuinfo.max_freq / 100 Though, in many cases, the minimum available frequency is pretty high and the above calculation introduces a dead band from load 0 to 100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target frequency is always calculated to less than policy->cpuinfo.min_freq and the minimum frequency is selected. For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 1600000 and the policy->cpuinfo.max_freq = 3400000 (without turbo). Thus, the CPU starts to scale up at a load above 47. On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000 and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale at load above 25. Change the calculation of target frequency to eliminate the above effect using the formula: Target frequency = A + B * load where A = policy->cpuinfo.min_freq and B = (policy->cpuinfo.max_freq - policy->cpuinfo->min_freq) / 100 This will map load values 0 to 100 linearly to cpuinfo.min_freq to cpuinfo.max_freq. Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the closest frequency in frequency_table. This is necessary to avoid selection of minimum frequency only when load equals to 0. It will also help for selection of frequencies using a more 'fair' criterion. Tables below show the difference in selected frequency for specific values of load without and with this patch. On Intel i7-3770 @ 3.40GHz: Without With Load Target Selected Target Selected 0 0 1600000 1600000 1600000 5 170050 1600000 1690050 1700000 10 340100 1600000 1780100 1700000 15 510150 1600000 1870150 1900000 20 680200 1600000 1960200 2000000 25 850250 1600000 2050250 2100000 30 1020300 1600000 2140300 2100000 35 1190350 1600000 2230350 2200000 40 1360400 1600000 2320400 2400000 45 1530450 1600000 2410450 2400000 50 1700500 1900000 2500500 2500000 55 1870550 1900000 2590550 2600000 60 2040600 2100000 2680600 2600000 65 2210650 2400000 2770650 2800000 70 2380700 2400000 2860700 2800000 75 2550750 2600000 2950750 3000000 80 2720800 2800000 3040800 3000000 85 2890850 2900000 3130850 3100000 90 3060900 3100000 3220900 3300000 95 3230950 3300000 3310950 3300000 100 3401000 3401000 3401000 3401000 On ARM quad core 1500MHz Krait: Without With Load Target Selected Target Selected 0 0 384000 384000 384000 5 75600 384000 440400 486000 10 151200 384000 496800 486000 15 226800 384000 553200 594000 20 302400 384000 609600 594000 25 378000 384000 666000 702000 30 453600 486000 722400 702000 35 529200 594000 778800 810000 40 604800 702000 835200 810000 45 680400 702000 891600 918000 50 756000 810000 948000 918000 55 831600 918000 1004400 1026000 60 907200 918000 1060800 1026000 65 982800 1026000 1117200 1134000 70 1058400 1134000 1173600 1134000 75 1134000 1134000 1230000 1242000 80 1209600 1242000 1286400 1242000 85 1285200 1350000 1342800 1350000 90 1360800 1458000 1399200 1350000 95 1436400 1458000 1455600 1458000 100 1512000 1512000 1512000 1512000 Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait (Android smartphone). Benchmarks on Intel i7 shows a performance improvement on low and medium work loads with lower power consumption. Specifics: Phoronix Linux Kernel Compilation 3.1: Time: -0.40%, energy: -0.07% Phoronix Apache: Time: -4.98%, energy: -2.35% Phoronix FFMPEG: Time: -6.29%, energy: -4.02% Also, running mp3 decoding (very low load) shows no differences with and without this patch. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: ondemand: Remove redundant return statementStratos Karafotis2013-11-011-1/+0
| | | | | | | | | | After commit dfa5bb622555 (cpufreq: ondemand: Change the calculation of target frequency), this return statement is no longer needed. Reported-by: Henrik Nilsson <Karl.Henrik.Nilsson@gmail.com> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governors: Remove duplicate check of target freq in supported rangeStratos Karafotis2013-08-281-3/+0
| | | | | | | | | | | Function __cpufreq_driver_target() checks if target_freq is within policy->min and policy->max range. generic_powersave_bias_target() also checks if target_freq is valid via a cpufreq_frequency_table_target() call. So, drop the unnecessary duplicate check in *_check_cpu(). Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Merge back earlier 'pm-cpufreq' materialRafael J. Wysocki2013-08-141-49/+14
|\
| * cpufreq: Use sizeof(*ptr) convetion for computing sizesViresh Kumar2013-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chapter 14 of Documentation/CodingStyle says: The preferred form for passing a size of a struct is the following: p = kmalloc(sizeof(*p), ...); The alternative form where struct name is spelled out hurts readability and introduces an opportunity for a bug when the pointer variable type is changed but the corresponding sizeof that is passed to a memory allocator is not. This wasn't followed consistently in drivers/cpufreq, let's make it more consistent by always following this rule. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: Give consistent names to cpufreq_policy objectsViresh Kumar2013-08-071-5/+5
| | | | | | | | | | | | | | | | They are called policy, cur_policy, new_policy, data, etc. Just call them policy wherever possible. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: Clean up header files included in the coreViresh Kumar2013-08-071-11/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses the following issues in the header files in the cpufreq core: - Include headers in ascending order, so that we don't add same many times by mistake. - <asm/> must be included after <linux/>, so that they override whatever they need to. - Remove unnecessary includes. - Don't include files already included by cpufreq.h or cpufreq_governor.h. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * cpufreq: ondemand: Change the calculation of target frequencyStratos Karafotis2013-07-261-32/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ondemand governor calculates load in terms of frequency and increases it only if load_freq is greater than up_threshold multiplied by the current or average frequency. This appears to produce oscillations of frequency between min and max because, for example, a relatively small load can easily saturate minimum frequency and lead the CPU to the max. Then, it will decrease back to the min due to small load_freq. Change the calculation method of load and target frequency on the basis of the following two observations: - Load computation should not depend on the current or average measured frequency. For example, absolute load of 80% at 100MHz is not necessarily equivalent to 8% at 1000MHz in the next sampling interval. - It should be possible to increase the target frequency to any value present in the frequency table proportional to the absolute load, rather than to the max only, so that: Target frequency = C * load where we take C = policy->cpuinfo.max_freq / 100. Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait. Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an increase ~1.5% in performance. cpufreq_stats (time_in_state) shows that middle frequencies are used more, with this patch. Highest and lowest frequencies were used less by ~9%. [rjw: We have run multiple other tests on kernels with this change applied and in the vast majority of cases it turns out that the resulting performance improvement also leads to reduced consumption of energy. The change is additionally justified by the overall simplification of the code in question.] Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>