author Ilya Dryomov <ilya.dryomov@inktank.com> 2014-03-19 16:58:37 +0200
committer Sage Weil <sage@inktank.com> 2014-04-04 21:07:26 -0700
commit e2b149cc4ba00766aceb87950c6de72ea7fc8b2e
tree 5f3d7b5dd55b7f75c412db786e1e6f4915ef9ed8 /include/linux/crush
parent 6ed1002f368c63ef79d7f659fcb4368a90098132
crush: add chooseleaf_vary_r tunable
The current crush_choose_firstn code will re-use the same 'r' value for the recursive call. That means that if we are hitting a collision or rejection for some reason (say, an OSD that is marked out) and need to retry, we will keep making the same (bad) choice in that recursive selection.

Introduce a tunable that fixes that behavior by incorporating the parent 'r' value into the recursive starting point, so that a different path will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Diffstat (limited to 'include/linux/crush')
-rw-r--r-- include/linux/crush/crush.h 6
1 files changed, 6 insertions, 0 deletions
diff --git a/include/linux/crush/crush.h b/include/linux/crush/crush.h
index acaa5615d634..75f36a6c7f67 100644
--- a/include/linux/crush/crush.h
+++ b/include/linux/crush/crush.h
@@ -173,6 +173,12 @@ struct crush_map {
 	 * apply to a collision: in that case we will retry as we used
 	 * to. */
 	__u32 chooseleaf_descend_once;
+
+	/* if non-zero, feed r into chooseleaf, bit-shifted right by (r-1)
+	 * bits. a value of 1 is best for new clusters. for legacy clusters
+	 * that want to limit reshuffling, a value of 3 or 4 will make the
+	 * mappings line up a bit better with previous mappings. */
+	__u8 chooseleaf_vary_r;
 };
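
The shift described in the comment above can be sketched in isolation. This is a minimal illustration, not the kernel code: the helper name chooseleaf_sub_r is hypothetical, and it assumes that the shift count is (chooseleaf_vary_r - 1) and that a tunable value of 0 contributes nothing from the parent 'r' to the recursive call.

```c
/* Hypothetical helper, not part of crush.h or the CRUSH mapper.
 *
 * r      - the parent's current placement attempt value
 * vary_r - the chooseleaf_vary_r tunable
 *
 * Returns the value the parent would feed into the recursive leaf
 * selection. Assumption: 0 disables the feature, so the recursion gets
 * no contribution from the parent. The sketch also assumes
 * vary_r <= 32, so that the shift count stays below the width of r. */
static unsigned int chooseleaf_sub_r(unsigned int r, unsigned int vary_r)
{
	if (vary_r == 0)
		return 0;

	/* vary_r == 1 passes r through unchanged; larger values delay the
	 * point at which the recursive starting point begins to change. */
	return r >> (vary_r - 1);
}
```

Under these assumptions, a value of 1 changes the recursive starting point on every retry, while a value of 3 keeps it at 0 for r = 0..3 and a value of 4 keeps it at 0 for r = 0..7. That would be consistent with the comment's remark that 3 or 4 lines up better with previous mappings, since the first few attempts then behave as they did before the tunable existed.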