summaryrefslogtreecommitdiffstats
path: root/fs/exofs/exofs.h
diff options
context:
space:
mode:
authorBoaz Harrosh <bharrosh@panasas.com>2010-02-01 13:35:51 +0200
committerBoaz Harrosh <bharrosh@panasas.com>2010-02-28 03:43:08 -0800
commit5d952b8391692553c31e620a92d6e09262a9a307 (patch)
treeb3a1a0490fc98b6304685d64bb4774235ec94a2d /fs/exofs/exofs.h
parentd9c740d2253e75db8cef8f87a3125c450f3ebd82 (diff)
downloadlinux-5d952b8391692553c31e620a92d6e09262a9a307.tar.gz
linux-5d952b8391692553c31e620a92d6e09262a9a307.tar.bz2
linux-5d952b8391692553c31e620a92d6e09262a9a307.zip
exofs: RAID0 support
We now support striping over mirror devices. Including variable sized stripe_unit. Some limits: * stripe_unit must be a multiple of PAGE_SIZE * stripe_unit * stripe_count is maximum upto 32-bit (4Gb) Tested RAID0 over mirrors, RAID0 only, mirrors only. All check. Design notes: * I'm not using a vectored raid-engine mechanism yet. Following the pnfs-objects-layout data-map structure, "Mirror" is just a private case of "group_width" == 1, and RAID0 is a private case of "Mirrors" == 1. The performance lose of the general case over the particular special case optimization is totally negligible, also considering the extra code size. * In general I added a prepare_stripes() stage that divides the to-be-io pages to the participating devices, the previous exofs_ios_write/read, now becomes _write/read_mirrors and a new write/read upper layer loops on all devices calling _write/read_mirrors. Effectively the prepare_stripes stage is the all secret. Also truncate need fixing to accommodate for striping. * In a RAID0 arrangement, in a regular usage scenario, if all inode layouts will start at the same device, the small files fill up the first device and the later devices stay empty, the farther the device the emptier it is. To fix that, each inode will start at a different stripe_unit, according to it's obj_id modulus number-of-stripe-units. And will then span all stripe-units in the same incrementing order wrapping back to the beginning of the device table. We call it a stripe-units moving window. Special consideration was taken to keep all devices in a mirror arrangement identical. So a broken osd-device could just be cloned from one of the mirrors and no FS scrubbing is needed. (We do that by rotating stripe-unit at a time and not a single device at a time.) TODO: We no longer verify object_length == inode->i_size in exofs_iget. (since i_size is stripped on multiple objects now). I should introduce a multiple-device attribute reading, and use it in exofs_iget. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Diffstat (limited to 'fs/exofs/exofs.h')
-rw-r--r--fs/exofs/exofs.h11
1 files changed, 11 insertions, 0 deletions
diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
index 09e331935514..0d8a34b21ae1 100644
--- a/fs/exofs/exofs.h
+++ b/fs/exofs/exofs.h
@@ -58,6 +58,14 @@
struct exofs_layout {
osd_id s_pid; /* partition ID of file system*/
+ /* Our way of looking at the data_map */
+ unsigned stripe_unit;
+ unsigned mirrors_p1;
+
+ unsigned group_width;
+
+ enum exofs_inode_layout_gen_functions lay_func;
+
unsigned s_numdevs; /* Num of devices in array */
struct osd_dev *s_ods[0]; /* Variable length */
};
@@ -133,6 +141,9 @@ struct exofs_io_state {
struct exofs_per_dev_state {
struct osd_request *or;
struct bio *bio;
+ loff_t offset;
+ unsigned length;
+ unsigned dev;
} per_dev[];
};