summaryrefslogtreecommitdiffstats
path: root/fs/gfs2/incore.h
Commit message (Collapse)AuthorAgeFilesLines
* GFS2: Automatically adjust glock min hold timeBob Peterson2011-07-151-1/+1
| | | | | | | | This patch is a performance improvement for GFS2 in a clustered environment. It makes the glock hold time self-adjusting. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Cache dir hash table in a contiguous bufferSteven Whitehouse2011-07-151-0/+1
| | | | | | | | | | | | | | | | | | This patch adds a cache for the hash table to the directory code in order to help simplify the way in which the hash table is accessed. This is intended to be a first step towards introducing some performance improvements in the directory code. There are two follow ups that I'm hoping to see fairly shortly. One is to simplify the hash table reading code now that we always read the complete hash table, whether we want one entry or all of them. The other is to introduce readahead on the heads of the hash chains which are referred to from the table. The hash table is a maximum of 128k in size, so it is not worth trying to read it in small chunks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix race during filesystem mountSteven Whitehouse2011-07-121-0/+2
| | | | | | | | | | | | | | | | | | | | There is a potential race during filesystem mounting which has recently been reported. It occurs when the userland gfs_controld is able to process requests fast enough that it tries to use the sysfs interface before the lock module is properly initialised. This is a pretty unusual case as normally the lock module initialisation is very quick compared with gfs_controld. This patch adds an interruptible completion which is used to ensure that userland will wait for the initialisation of the lock module to complete. There are other potential solutions to this problem, but this is the quickest at this stage and has been tested both with and without mount.gfs2 present in the system. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: David Booher <dbooher@adams.net>
* GFS2: Use UUID field in generic superblockSteven Whitehouse2011-05-101-1/+0
| | | | | | | The VFS superblock structure now has a UUID field, so we can use that in preference to the UUID field in the GFS2 superblock now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Make writeback more responsive to system conditionsSteven Whitehouse2011-04-201-4/+0
| | | | | | | | | | | | | | | This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Optimise glock lru and end of life inodesSteven Whitehouse2011-04-201-0/+1
| | | | | | | | | | | | | | | | | | | | | The GLF_LRU flag introduced in the previous patch can be used to check if a glock is on the lru list when a new holder is queued and if so remove it, without having first to get the lru_lock. The main purpose of this patch however is to optimise the glocks left over when an inode at end of life is being evicted. Previously such glocks were left with the GLF_LFLUSH flag set, so that when reclaimed, each one required a log flush. This patch resets the GLF_LFLUSH flag when there is nothing left to flush thus preventing later log flushes as glocks are reused or demoted. In order to do this, we need to keep track of the number of revokes which are outstanding, and also to clear the GLF_LFLUSH bit after a log commit when only revokes have been processed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Improve tracing support (adds two flags)Steven Whitehouse2011-04-201-0/+2
| | | | | | | | | | | | | | This adds support for two new flags. One keeps track of whether the glock is on the LRU list or not. The other isn't really a flag as such, but an indication of whether the glock has an attached object or not. This indication is reported without any locking, which is ok since we do not dereference the object pointer but merely report whether it is NULL or not. Also, this fixes one place where a tracepoint was missing, which was at the point we remove deallocated blocks from the journal. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: introduce AIL lockDave Chinner2011-03-111-0/+1
| | | | | | | | | | | | | | | | | | | The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: quota allows exceeding hard limitAbhijith Das2011-03-091-0/+1
| | | | | | | | | | | | | | | | | Immediately after being synced to disk, cached quotas are zeroed out and a subsequent access of the cached quotas results in incorrect zero values. This meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage values it found in the cached quotas and comparison against warn/limits never triggered a quota violation. This patch adds a new flag QDF_REFRESH that is set after a sync so that the cached quotas are forcefully refreshed from disk on a subsequent access on seeing this flag set. Resolves: rhbz#675944 Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Use RCU for glock hash tableSteven Whitehouse2011-01-211-1/+4
| | | | | | | | | | | | | | | | | | | This has a number of advantages: - Reduces contention on the hash table lock - Makes the code smaller and simpler - Should speed up glock dumps when under load - Removes ref count changing in examine_bucket - No longer need hash chain lock in glock_put() in common case There are some further changes which this enables and which we may do in the future. One is to look at using SLAB_RCU, and another is to look at using a per-cpu counter for the per-sb glock counter, since that is touched twice in the lifetime of each glock (but only used at umount time). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* headers: kobject.h reduxAlexey Dobriyan2011-01-101-0/+1
| | | | | | | | Remove kobject.h from files which don't need it, notably, sched.h and fs.h. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* GFS2: Merge glock state fields into a bitfieldSteven Whitehouse2010-11-301-5/+7
| | | | | | | | | | | | | | | | | | We can only merge the fields into a bitfield if the locking rules for them are the same. In this case gl_spin covers all of the fields (write side) but a couple of them are used with GLF_LOCK as the read side lock, which should be ok since we know that the field in question won't be changing at the time. The gl_req setting has to be done earlier (in glock.c) in order to place it under gl_spin. The gl_reply setting also has to be brought under gl_spin in order to comply with the new rules. This saves 4*sizeof(unsigned int) per glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
* GFS2: Improve journal allocation via sysfsSteven Whitehouse2010-09-291-1/+1
| | | | | | | | | | | | | | | | Recently a feature was added to GFS2 to allow journal id allocation via sysfs. This patch builds upon that so that a negative journal id will be treated as an error code to be passed back as the return code from mount. This allows termination of the mount process if there is a failure. Also, the process has been updated so that the kernel will wait for a journal id, even in the "spectator" case. This is required in order to avoid mounting a filesystem in case there is an error while joining the cluster. In the spectator case, 0 is written into the file to indicate that all is well, and that mount should continue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove upgrade mount optionSteven Whitehouse2010-09-241-1/+0
| | | | | | | | | This option has never done anything useful. Also at the same time this cleans up the sb checks which are done at mount time. The debug option will be accepted, but ignored in future. Since it didn't do anything, there didn't seem much point in retaining it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove localcaching mount optionSteven Whitehouse2010-09-231-1/+0
| | | | | | | | | | | | | | | This option defaulted to on for lock_nolock mounts and off otherwise. The only function was to avoid the revalidation of dentries. In the cluster case, that is entirely pointless and liable to cause coherency problems. The patch changes the revalidation to depend upon whether the fs is a local or cluster fs (i.e. it follows the existing default behaviour). I very much doubt anybody ever used this option as there is no reason to. Even so we will continue to accept it on the mount command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove ignore_local_fs mount argumentSteven Whitehouse2010-09-231-1/+0
| | | | | | | | This is been a no-op for a very long time now. I'm pretty sure nobody uses it, but just in case we'll still accept it on the command line, but ignore it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Don't enforce min hold time when two demotes occur in rapid successionSteven Whitehouse2010-09-201-0/+1
| | | | | | | | | | | | | | | | | | | | | Due to the design of the VFS, it is quite usual for operations on GFS2 to consist of a lookup (requiring a shared lock) followed by an operation requiring an exclusive lock. If a remote node has cached an exclusive lock, then it will receive two demote events in rapid succession firstly for a shared lock and then to unlocked. The existing min hold time code was triggering in this case, even if the node was otherwise idle since the state change time was being updated by the initial demote. This patch introduces logic to skip the min hold timer in the case that a "double demote" of this kind has occurred. The min hold timer will still be used in all other cases. A new glock flag is introduced which is used to keep track of whether there have been any newly queued holders since the last glock state change. The min hold time is only applied if the flag is set. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Abhijith Das <adas@redhat.com>
* GFS2: fallocate supportBenjamin Marzinski2010-09-201-0/+1
| | | | | | | | | | | | | | This patch adds support for fallocate to gfs2. Since the gfs2 does not support uninitialized data blocks, it must write out zeros to all the blocks. However, since it does not need to lock any pages to read from, gfs2 can write out the zero blocks much more efficiently. On a moderately full filesystem, fallocate works around 5 times faster on average. The fallocate call also allows gfs2 to add blocks to the file without changing the filesize, which will make it possible for gfs2 to preallocate space for the rindex file, so that gfs2 can grow a completely full filesystem. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove i_disksizeSteven Whitehouse2010-09-201-1/+0
| | | | | | | | | With the update of the truncate code, ip->i_disksize and inode->i_size are merely copies of each other. This means we can remove ip->i_disksize and use inode->i_size exclusively reducing the size of a GFS2 inode by 8 bytes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2010-08-071-2/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits) workqueue: mark init_workqueues() as early_initcall() workqueue: explain for_each_*cwq_cpu() iterators fscache: fix build on !CONFIG_SYSCTL slow-work: kill it gfs2: use workqueue instead of slow-work drm: use workqueue instead of slow-work cifs: use workqueue instead of slow-work fscache: drop references to slow-work fscache: convert operation to use workqueue instead of slow-work fscache: convert object to use workqueue instead of slow-work workqueue: fix how cpu number is stored in work->data workqueue: fix mayday_mask handling on UP workqueue: fix build problem on !CONFIG_SMP workqueue: fix locking in retry path of maybe_create_worker() async: use workqueue for worker pool workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead workqueue: implement unbound workqueue workqueue: prepare for WQ_UNBOUND implementation libata: take advantage of cmwq and remove concurrency limitations workqueue: fix worker management invocation without pending works ... Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
| * gfs2: use workqueue instead of slow-workTejun Heo2010-07-231-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Workqueue can now handle high concurrency. Convert gfs to use workqueue instead of slow-work. * Steven pointed out that recovery path might be run from allocation path and thus requires forward progress guarantee without memory allocation. Create and use gfs_recovery_wq with rescuer. Please note that forward progress wasn't guaranteed with slow-work. * Updated to use non-reentrant workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
* | GFS2: Wait for journal id on mount if not specified on mount command lineSteven Whitehouse2010-07-291-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements a wait for the journal id in the case that it has not been specified on the command line. This is to allow the future removal of the mount.gfs2 helper. The journal id would instead be directly communicated by gfs_controld to the file system. Here is a comparison of the two systems: Current: 1. mount calls mount.gfs2 2. mount.gfs2 connects to gfs_controld to retrieve the journal id 3. mount.gfs2 adds the journal id to the mount command line and calls the mount system call 4. gfs_controld receives the status of the mount request via a uevent Proposed: 1. mount calls the mount system call (no mount.gfs2 helper) 2. gfs_controld receives a uevent for a gfs2 fs which it doesn't know about already 3. gfs_controld assigns a journal id to it via sysfs 4. the mount system call then completes as normal (sending a uevent according to status) The advantage of the proposed system is that it is completely backward compatible with the current system both at the kernel and at the userland levels. The "first" parameter can also be set the same way, with the restriction that it must be set before the journal id is assigned. In addition, if mount becomes stuck waiting for a reply from gfs_controld which never arrives, then it is killable and will abort the mount gracefully. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add some useful messagesSteven Whitehouse2010-05-061-0/+1
| | | | | | | | | | | | | The following patch adds a message to indicate when barriers have been disabled due to a block device which doesn't support them. You could already tell this via the mount options in /proc/mounts, but all the other filesystems also log a message at the same time. Also, the same mechanisms are used to indicate when the lock demote interface has been used (only ever used for debugging) which is a request from our support team. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Various gfs2_logd improvementsBenjamin Marzinski2010-05-051-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now seperate tests to check if there are two many buffers in the incore log or if there are two many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventualy run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Allow the number of committed revokes to temporarily be negativeBenjamin Marzinski2010-03-111-1/+1
| | | | | | | | | | | | | | | | | | GFS2 tracks the number of revokes and unrevokes that are part of committed transactions via sd_log_commited_revoke. It is possible for one process to add revokes during its transaction, while another process unrevokes them during its transaction. If the second process finishes its transaction first, sd_log_commited_revoke will be decremented by the number of unrevokes that the second process did, without first being incremented by the number of revokes the first process did. This is fine, since all started transactions must be completed before the journal can be flushed. However, sd_log_commited_revoke is an unsigned integer, and log_refund() causes an assertion failure if it would go negative at the end of a transaction. This patch makes sd_log_commited_revoke a signed integer and allows it to go negative. __gfs2_log_flush() still checks that it mataches the actual number of revokes. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove loopy umount codeSteven Whitehouse2010-03-011-1/+0
| | | | | | | | | | | | | As a consequence of the previous patch, we can now remove the loop which used to be required due to the circular dependency between the inodes and glocks. Instead we can just invalidate the inodes, and then clear up any glocks which are left. Also we no longer need the rwsem since there is no longer any danger of the inode invalidation calling back into the glock code (and from there back into the inode code). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Metadata address space clean upSteven Whitehouse2010-03-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the start of GFS2, an "extra" inode has been used to store the metadata belonging to each inode. The only reason for using this inode was to have an extra address space, the other fields were unused. This means that the memory usage was rather inefficient. The reason for keeping each inode's metadata in a separate address space is that when glocks are requested on remote nodes, we need to be able to efficiently locate the data and metadata which relating to that glock (inode) in order to sync or sync and invalidate it (depending on the remotely requested lock mode). This patch adds a new type of glock, which has in addition to its normal fields, has an address space. This applies to all inode and rgrp glocks (but to no other glock types which remain as before). As a result, we no longer need to have the second inode. This results in three major improvements: 1. A saving of approx 25% of memory used in caching inodes 2. A removal of the circular dependency between inodes and glocks 3. No confusion between "normal" and "metadata" inodes in super.c Although the first of these is the more immediately apparent, the second is just as important as it now enables a number of clean ups at umount time. Those will be the subject of future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Wait for unlock completion on umountSteven Whitehouse2010-02-031-0/+2
| | | | | | | | | | This patch adds a wait on umount between the point at which we dispose of all glocks and the point at which we unmount the lock protocol. This ensures that we've received all the replies to our unlock requests before we stop the locking. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com>
* GFS2: add barrier/nobarrier mount optionsChristoph Hellwig2009-12-031-0/+1
| | | | | | | | | | | Currently gfs2 issues barrier unconditionally. There are various reasons to disable them, be that just for testing or for stupid devices flushing large battert backed caches. Add a nobarrier option that matches xfs and btrfs for this. Also add a symmetric barrier option to turn it back on at remount time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Improve statfs and quota usabilityBenjamin Marzinski2009-12-031-0/+4
| | | | | | | | | | | | | | GFS2 now has three new mount options, statfs_quantum, quota_quantum and statfs_percent. statfs_quantum and quota_quantum simply allow you to set the tunables of the same name. Setting setting statfs_quantum to 0 will also turn on the statfs_slow tunable. statfs_percent accepts an integer between 0 and 100. Numbers between 1 and 100 will cause GFS2 to do any early sync when the local number of blocks free changes by at least statfs_percent from the totoal number of blocks free. Setting statfs_percent to 0 disables this. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove unused sysfs fileSteven Whitehouse2009-09-091-1/+0
| | | | | | | | The /sys/fs/gfs2/<fsname>/lock_module/id file has been unused for some time now, so we can remove it. We still accept the mount option though, as userspace still sends that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove no_formal_ino generating codeSteven Whitehouse2009-08-271-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inum structure used throughout GFS2 has two fields. One no_addr is the disk block number of the inode in question and is used everywhere as the inode number. The other, no_formal_ino, is used only as the generation number for NFS. Historically the no_formal_ino field was set using a complicated system of one global and one per-node file containing inode numbers in order to ensure that each no_formal_ino was unique. Also this code made no provision for what would happen when eventually the (64 bit) numbers ran out. Now I know that is pretty unlikely to happen given the large space of numbers, but it is possible nevertheless. The only guarantee required for no_formal_ino is that, for any single inode, the same number doesn't get reused too quickly. We already have a generation number which is kept in the inode and initialised from a counter in the resource group (almost no overhead, since we have to touch the resource group anyway in order to allocate an inode in the first place). Aside from ensuring that we never use the value 0 in the no_formal_ino field, we can use that counter directly. As a result of that change, we lose about 200 lines of code and also gain about 10 creates/sec on the postmark benchmark (on my test machine). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add "-o errors=panic|withdraw" mount optionsBob Peterson2009-08-241-0/+7
| | | | | | | | | | | | | | This patch adds "-o errors=panic" and "-o errors=withdraw" to the gfs2 mount options. The "errors=withdraw" option is today's current behaviour, meaning to withdraw from the file system if a non-serious gfs2 error occurs. The new "errors=panic" option tells gfs2 to force a kernel panic if a non-serious gfs2 file system error occurs. This may be useful, for example, where fabric-level fencing is used that has no way to reboot (such as fence_scsi). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: remove dcache entries for remote deleted inodesBenjamin Marzinski2009-07-301-0/+2
| | | | | | | | | | | | | When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Be more aggressive in reclaiming unlinked inodesSteven Whitehouse2009-05-211-1/+1
| | | | | | | | | | | This patch increases the frequency with which gfs2 looks for unlinked, but still allocated inodes. Its the equivalent operation to ext3's orphan list, but done with bitmaps in the resource groups. This also fixes a bug where a field in the rgrp was too small. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add a rgrp bitmap full flagSteven Whitehouse2009-05-211-0/+3
| | | | | | | | | | | | | | | | | During block allocation, it is useful to know if sections of disk are full on a finer grained basis than a single resource group. This can make a performance difference when resource groups have larger numbers of bitmap blocks, since we no longer have to search them all block by block in each individual bitmap. The full flag is set on a per-bitmap basis when it has been searched and found to have no free space. It is then skipped in subsequent searches until the flag is reset. The resetting occurs if we have to drop the glock on the resource group for any reason, or if we deallocate some blocks within that resource group and thus free up some space. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Improve resource group error handlingSteven Whitehouse2009-05-201-3/+4
| | | | | | | | | | | | | | | This patch improves the error handling in the case where we discover that the summary information in the resource group doesn't match the bitmap information while in the process of allocating blocks. Originally this resulted in a kernel bug, but this patch changes that so that we return -EIO and print some messages explaining what went wrong, and how to fix it. We also remember locally not to try and allocate from the same rgrp again, so that a subsequent allocation in a different rgrp should succeed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Umount recovery race fixSteven Whitehouse2009-05-191-9/+5
| | | | | | | | | | | | | | | | | | | This patch fixes a race condition where we can receive recovery requests part way through processing a umount. This was causing problems since the recovery thread had already gone away. Looking in more detail at the recovery code, it was really trying to implement a slight variation on a work queue, and that happens to align nicely with the recently introduced slow-work subsystem. As a result I've updated the code to use slow-work, rather than its own home grown variety of work queue. When using the wait_on_bit() function, I noticed that the wait function that was supplied as an argument was appearing in the WCHAN field, so I've updated the function names in order to produce more meaningful output. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Add commit= mount optionSteven Whitehouse2009-05-131-0/+1
| | | | | | | | | | | | | | | It has always been possible to adjust the gfs2 log commit interval, but only from the sysfs interface. This adds a mount option, commit=<nn>, which will be familar to ext3 users. The sysfs interface continues to be available as well, although this might be removed in the future. Also this patch cleans up some duplicated structures in the GFS2 sysfs code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix locking bug in failed shared to exclusive conversionBenjamin Marzinski2009-03-241-0/+1
| | | | | | | | | | | | | | | | After calling out to the dlm, GFS2 sets the new state of a glock to gl_target in gdlm_ast(). However, gl_target is not always the lock state that was requested. If a conversion from shared to exclusive fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead of gl->gl_target, so that it can reacquire the lock in exlusive the next time around. In this case, setting the lock to gl_target in gdlm_ast() will make GFS2 think that it has the glock in exclusive mode, when really, it doesn't have the glock locked at all. This patch adds a new field to the gfs2_glock structure, gl_req, to track the mode that was requested. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Expose UUID via sysfs/ueventSteven Whitehouse2009-03-241-0/+1
| | | | | | | | | | | | | Since we have a UUID, we ought to expose it to the user via sysfs and uevents. We already have the fs name in both of these places (a combination of the lock proto and lock table name) so if we add the UUID as well, we have a full set. For older filesystems (i.e. those created before mkfs.gfs2 was writing UUIDs by default) the sysfs file will appear zero length, and no UUID env var will be added to the uevents. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Support generation of discard requestsSteven Whitehouse2009-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | This patch allows GFS2 to generate discard requests for blocks which are no longer useful to the filesystem (i.e. those which have been freed as the result of an unlink operation). The requests are generated at the time which those blocks become available for reuse in the filesystem. In order to use this new feature, you have to specify the "discard" mount option. The code coalesces adjacent blocks into a single extent when generating the discard requests, thus generating the minimum number. If an error occurs when the request has been sent to the block device, then it will print a message and turn off the requests for that filesystem. If the problem is temporary, then you can use remount to turn the option back on again. There is also a nodiscard mount option so that you can use remount to turn discard requests off, if required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove unused field from glockSteven Whitehouse2009-03-241-1/+0
| | | | | | | | The time stamp field is unused in the glock now that we are using a shrinker, so that we can remove it and save sizeof(unsigned long) bytes in each glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Merge lock_dlm module into GFS2Steven Whitehouse2009-03-241-5/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Remove "double" locking in quotaSteven Whitehouse2009-03-241-1/+0
| | | | | | | | We only really need a single spin lock for the quota data, so lets just use the lru lock for now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
* GFS2: change gfs2_quota_scan into a shrinkerAbhijith Das2009-03-241-3/+3
| | | | | | | | Deallocation of gfs2_quota_data objects now happens on-demand through a shrinker instead of routinely deallocating through the quotad daemon. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Kill two daemons with one patchSteven Whitehouse2009-01-051-14/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the two daemons, gfs2_scand and gfs2_glockd and replaces them with a shrinker which is called from the VM. The net result is that GFS2 responds better when there is memory pressure, since it shrinks the glock cache at the same rate as the VFS shrinks the dcache and icache. There are no longer any time based criteria for shrinking glocks, they are kept until such time as the VM asks for more memory and then we demote just as many glocks as required. There are potential future changes to this code, including the possibility of sorting the glocks which are to be written back into inode number order, to get a better I/O ordering. It would be very useful to have an elevator based workqueue implementation for this, as that would automatically deal with the read I/O cases at the same time. This patch is my answer to Andrew Morton's remark, made during the initial review of GFS2, asking why GFS2 needs so many kernel threads, the answer being that it doesn't :-) This patch is a net loss of about 200 lines of code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Fix "truncate in progress" hangSteven Whitehouse2009-01-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | Following on from the recent clean up of gfs2_quotad, this patch moves the processing of "truncate in progress" inodes from the glock workqueue into gfs2_quotad. This fixes a hang due to the "truncate in progress" processing requiring glocks in order to complete. It might seem odd to use gfs2_quotad for this particular item, but we have to use a pre-existing thread since creating a thread implies a GFP_KERNEL memory allocation which is not allowed from the glock workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd may deadlock if used for this operation. gfs2_scand and gfs2_glockd are both scheduled for removal at some (hopefully not too distant) future point. That leaves only gfs2_quotad whose workload is generally fairly light and is easily adapted for this extra task. Also, as a result of this change, it opens the way for a future patch to make the reading of the inode's information asynchronous with respect to the glock workqueue, which is another improvement that has been on the list for some time now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Clean up & move gfs2_quotadSteven Whitehouse2009-01-051-3/+1
| | | | | | | | | | | | | | | | | | | | | | This patch is a clean up of gfs2_quotad prior to giving it an extra job to do in addition to the current portfolio of updating the quota and statfs information from time to time. As a result it has been moved into quota.c allowing one of the functions it calls to be made static. Also the clean up allows the two existing functions to have separate timeouts and also to coexist with its future role of dealing with the "truncate in progress" inode flag. The (pointless) setting of gfs2_quotad_secs is removed since we arrange to only wake up quotad when one of the two timers expires. In addition the struct gfs2_quota_data is moved into a slab cache, mainly for easier debugging. It should also be possible to use a shrinker in the future, rather than the current scheme of scanning the quota data entries from time to time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* GFS2: Banish struct gfs2_rgrpd_hostSteven Whitehouse2009-01-051-8/+4
| | | | | | | | | This patch moves the final field so that we can get rid of struct gfs2_rgrpd_host, as promised some time ago. Also by rearranging the fields slightly, we are able to reduce the size of the gfs2_rgrpd structure at the same time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>