summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'xfs-writepage-rework-4.6' into for-nextDave Chinner2016-03-072-467/+270
|\
| * xfs: ioends require logically contiguous file offsetsDarrick J. Wong2016-03-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to create a new ioend if the current writepage call isn't logically contiguous with the range contained in the previous ioend. Hopefully writepage gets called in order of increasing file offset. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * xfs: don't chain ioends during writepage submissionDave Chinner2016-02-152-124/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we can build a long ioend chain during ->writepages that gets attached to the writepage context. IO submission only then occurs when we finish all the writepage processing. This means we can have many ioends allocated and pending, and this violates the mempool guarantees that we need to give about forwards progress. i.e. we really should only have one ioend being built at a time, otherwise we may drain the mempool trying to allocate a new ioend and that blocks submission, completion and freeing of ioends that are already in progress. To prevent this situation from happening, we need to submit ioends for IO as soon as they are ready for dispatch rather than queuing them for later submission. This means the ioends have bios built immediately and they get queued on any plug that is current active. Hence if we schedule away from writeback, the ioends that have been built will make forwards progress due to the plug flushing on context switch. This will also prevent context switches from creating unnecessary IO submission latency. We can't completely avoid having nested IO allocation - when we have a block size smaller than a page size, we still need to hold the ioend submission until after we have marked the current page dirty. Hence we may need multiple ioends to be held while the current page is completely mapped and made ready for IO dispatch. We cannot avoid this problem - the current code already has this ioend chaining within a page so we can mostly ignore that it occurs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * xfs: factor mapping out of xfs_do_writepageDave Chinner2016-02-151-110/+122
| | | | | | | | | | | | | | | | | | | | | | Separate out the bufferhead based mapping from the writepage code so that we have a clear separation of the page operations and the bufferhead state. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * xfs: xfs_cluster_write is redundantDave Chinner2016-02-151-208/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_cluster_write() is not necessary now that xfs_vm_writepages() aggregates writepage calls across a single mapping. This means we no longer need to do page lookups in xfs_cluster_write, so writeback only needs to look up th epage cache once per page being written. This also removes a large amount of mostly duplicate code between xfs_do_writepage() and xfs_convert_page(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * xfs: Introduce writeback context for writepagesDave Chinner2016-02-152-105/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_vm_writepages() calls generic_writepages to writeback a range of a file, but then xfs_vm_writepage() clusters pages itself as it does not have any context it can pass between->writepage calls from __write_cache_pages(). Introduce a writeback context for xfs_vm_writepages() and call __write_cache_pages directly with our own writepage callback so that we can pass that context to each writepage invocation. This encapsulates the current mapping, whether it is valid or not, the current ioend and it's IO type and the ioend chain being built. This requires us to move the ioend submission up to the level where the writepage context is declared. This does mean we do not submit IO until we packaged the entire writeback range, but with the block plugging in the writepages call this is the way IO is submitted, anyway. It also means that we need to handle discontiguous page ranges. If the pages sent down by write_cache_pages to the writepage callback are discontiguous, we need to detect this and put each discontiguous page range into individual ioends. This is needed to ensure that the ioend accurately represents the range of the file that it covers so that file size updates during IO completion set the size correctly. Failure to take into account the discontiguous ranges results in files being too small when writeback patterns are non-sequential. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * xfs: remove xfs_cancel_ioendDave Chinner2016-02-151-47/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently have code to cancel ioends being built because we change bufferhead state as we build the ioend. On error, this needs to be unwound and so we have cancelling code that walks the buffers on the ioend chain and undoes these state changes. However, the IO submission path already handles state changes for buffers when a submission error occurs, so we don't really need a separate cancel function to do this - we can simply submit the ioend chain with the specific error and it will be cancelled rather than submitted. Hence we can remove the explicit cancel code and just rely on submission to deal with the error correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * xfs: remove nonblocking mode from xfs_vm_writepageDave Chinner2016-02-151-17/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the nonblocking optimisation done for mapping lookups during writeback. It's not clear that leaving a hole in the writeback range just because we couldn't get a lock is really a win, as it makes us do another small random IO later on rather than a large sequential IO now. As this gets in the way of sane error handling later on, just remove for the moment and we can re-introduce an equivalent optimisation in future if we see problems due to extent map lock contention. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | Merge branch 'xfs-buf-macro-cleanup-4.6' into for-nextDave Chinner2016-03-078-56/+26
|\ \
| * | xfs: remove XFS_BUF_ZEROFLAGS macroDave Chinner2016-02-103-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The places where we use this macro already clear unnecessary IO flags (e.g. through xfs_bwrite()) or never have unexpected IO flags set on them in the first place (e.g. iclog buffers). Remove the macro from these locations, and where necessary clear only the specific flags that are conditional in the current buffer context. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: remove XBF_STALE flag wrapper macrosDave Chinner2016-02-103-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: remove XBF_WRITE flag wrapper macrosDave Chinner2016-02-103-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: remove XBF_READ flag wrapper macrosDave Chinner2016-02-102-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: remove XBF_ASYNC flag wrapper macrosDave Chinner2016-02-104-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: remove XBF_DONE flag wrapper macrosDave Chinner2016-02-108-15/+11
| |/ | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | Merge branch 'xfs-gut-icdinode-4.6' into for-nextDave Chinner2016-03-0723-308/+429
|\ \
| * | xfs: mode di_mode to vfs inodeDave Chinner2016-02-0918-73/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the di_mode value from the xfs_icdinode to the VFS inode, reducing the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole in the structure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: move di_changecount to VFS inodeDave Chinner2016-02-096-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can store the di_changecount in the i_version field of the VFS inode and remove another 8 bytes from the xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: move inode generation count to VFS inodeDave Chinner2016-02-099-14/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull another 4 bytes out of the xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: use vfs inode nlink field everywhereDave Chinner2016-02-099-54/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VFS tracks the inode nlink just like the xfs_icdinode. We can remove the variable from the icdinode and use the VFS inode variable everywhere, reducing the size of the xfs_icdinode by a further 4 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: reinitialise recycled VFS inode correctlyDave Chinner2016-02-091-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are going to keep certain on-disk information in the VFS inode rather than in a separate XFS specific stucture, so we have to be careful of the VFS code clearing that information when we re-initialise reclaimable cached inodes during lookup. If we don't do this, then we lose critical information from the inode and that results in corruption being detected. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: move v1 inode conversion to xfs_inode_from_diskDave Chinner2016-02-095-27/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So we don't have to carry an di_onlink variable around anymore, move the inode conversion from v1 inode format to v2 inode format into xfs_inode_from_disk(). This means we can remove the di_onlink fields from the struct xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: cull unnecessary icdinode fieldsDave Chinner2016-02-094-71/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the struct xfs_icdinode is not directly related to the on-disk format, we can cull things in it we really don't need to store: - magic number never changes - padding is not necessary - next_unlinked is never used - inode number is redundant - uuid is redundant - lsn is accessed directly from dinode - inode CRC is only accessed directly from dinode Hence we can remove these from the struct xfs_icdinode and redirect the code that uses them to the xfs_dinode appripriately. This reduces the size of the struct icdinode from 152 bytes to 88 bytes, and removes a fair chunk of unnecessary code, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: remove timestamps from incore inodeDave Chinner2016-02-0911-143/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct xfs_inode has two copies of the current timestamps in it, one in the vfs inode and one in the struct xfs_icdinode. Now that we no longer log the struct xfs_icdinode directly, we don't need to keep the timestamps in this structure. instead we can copy them straight out of the VFS inode when formatting the inode log item or the on-disk inode. This reduces the struct xfs_inode in size by 24 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: introduce inode log format objectDave Chinner2016-02-098-44/+218
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently carry around and log an entire inode core in the struct xfs_inode. A lot of the information in the inode core is duplicated in the VFS inode, but we cannot remove this duplication of infomration because the inode core is logged directly in xfs_inode_item_format(). Add a new function xfs_inode_item_format_core() that copies the inode core data into a struct xfs_icdinode that is pulled directly from the log vector buffer. This means we no longer directly copy the inode core, but copy the structures one member at a time. This will be slightly less efficient than copying, but will allow us to remove duplicate and unnecessary items from the struct xfs_inode. To enable us to do this, call the new structure a xfs_log_dinode, so that we know it's different to the physical xfs_dinode and the in-core xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | Merge branch 'xfs-misc-fixes-4.6' into for-nextDave Chinner2016-03-0712-49/+62
|\ \
| * | xfs: fix xfs_log_ticket leak in xfs_end_io() after fs shutdownBrian Foster2016-02-081-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the filesystem has shut down, xfs_end_io() currently sets an error on the ioend and proceeds to ioend destruction. The ioend might contain a truncate transaction if the I/O extended the size of the file. This transaction is only cleaned up in xfs_setfilesize_ioend(), however, which is skipped in this case. This results in an xfs_log_ticket leak message when the associate cache slab is destroyed (e.g., on rmmod). This was originally reproduced by xfs/141 on a distro kernel. The problem is reproducible on an upstream kernel, but not easily detected in current upstream if the xfs_log_ticket cache happens to be merged with another cache. This can be reproduced more deterministically with the 'slab_nomerge' kernel boot option. Update xfs_end_io() to proceed with normal end I/O processing after an error is set on an ioend due to fs shutdown. The I/O type-based processing is already designed to handle an I/O error and ensure that the ioend is cleaned up correctly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: clean up unwritten buffers on write failureBrian Foster2016-02-081-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xfs_vm_write_failed() handler is currently responsible for cleaning up any delalloc blocks over the range of a failed write beyond EOF. Failure to do so results in warning messages and other inconsistencies between buffer and extent state. The ->releasepage() handler currently warns in the event of a page being released with either unwritten or delalloc buffers, as neither is ever expected by the time a page is released. As has been reproduced by generic/083 on a -bsize=1k fs, it is currently possible to trigger the ->releasepage() warning for a page with unwritten buffers when a filesystem is near ENOSPC. This is reproduced by the following sequence: $ mkfs.xfs -f -b size=1k -d size=100m <dev> $ mount <dev> /mnt/ $ $ xfs_io -fc "falloc -k 0 1k" /mnt/file $ dd if=/dev/zero of=/mnt/enospc conv=notrunc oflag=append $ $ xfs_io -c "pwrite 512 1k" /mnt/file $ xfs_io -d -c "pwrite 16k 1k" /mnt/file The first pwrite command attempts a block unaligned write across an unwritten block and a hole. The delalloc for the hole fails with ENOSPC and the subsequent error handling does not clean up the unwritten buffer that was instantiated during the first ->get_block() call. The second pwrite triggers a warning as part of the inode mapping invalidation that occurs prior to direct I/O. The releasepage() handler detects the unwritten buffer at this time, warns and prevents the release of the page. To deal with this problem, update xfs_vm_write_failed() to clean up unwritten as well as delalloc buffers that are beyond EOF and within the range of the failed write. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: move struct xfs_attr_shortform to xfs_da_format.hDarrick J. Wong2016-02-083-16/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the shortform attr structure definition to the same place as the other attribute structure definitions for consistency and also so that xfs/122 verifies the structure size. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: Make xfsaild freezeable againMichal Hocko2016-02-081-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hendik has reported suspend failures due to xfsaild blocking the freezer to settle down. Jan 17 19:59:56 linux-6380 kernel: PM: Syncing filesystems ... done. Jan 17 19:59:56 linux-6380 kernel: PM: Preparing system for sleep (mem) Jan 17 19:59:56 linux-6380 kernel: Freezing user space processes ... (elapsed 0.001 seconds) done. Jan 17 19:59:56 linux-6380 kernel: Freezing remaining freezable tasks ... Jan 17 19:59:56 linux-6380 kernel: Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0): Jan 17 19:59:56 linux-6380 kernel: xfsaild/dm-5 S 00000000 0 1293 2 0x00000080 Jan 17 19:59:56 linux-6380 kernel: f0ef5f00 00000046 00000200 00000000 ffff9022 c02d3800 00000000 00000032 Jan 17 19:59:56 linux-6380 kernel: ee0b2400 00000032 f71e0d00 f36fabc0 f0ef2d00 f0ef6000 f0ef2d00 f12f90c0 Jan 17 19:59:56 linux-6380 kernel: f0ef5f0c c0844e44 00000000 f0ef5f6c f811e0be 00000000 00000000 f0ef2d00 Jan 17 19:59:56 linux-6380 kernel: Call Trace: Jan 17 19:59:56 linux-6380 kernel: [<c0844e44>] schedule+0x34/0x90 Jan 17 19:59:56 linux-6380 kernel: [<f811e0be>] xfsaild+0x5de/0x600 [xfs] Jan 17 19:59:56 linux-6380 kernel: [<c0286cbb>] kthread+0x9b/0xb0 Jan 17 19:59:56 linux-6380 kernel: [<c0848a79>] ret_from_kernel_thread+0x21/0x38 The issue has been there for quite some time but it has been made visible by only by 24ba16bb3d49 ("xfs: clear PF_NOFREEZE for xfsaild kthread") because the suspend started seeing xfsaild. The above commit has missed that the !xfs_ail_min branch might call schedule with TASK_INTERRUPTIBLE without calling try_to_freeze so the pm suspend would wake up the kernel thread over and over again without any progress. What we want here is to use freezable_schedule instead to hide the thread from the suspend. While we are here also change schedule_timeout to freezable variant to prevent from spurious wakeups by suspend. [dchinner: re-add set_freezeable call so the freezer will account properly for this kthread. ] Reported-by: Hendrik Woltersdorf <hendrikw@arcor.de> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: remove unused function definitionsEric Sandeen2016-02-083-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Old leftovers. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: move buffer invalidation to xfs_btree_free_blockChristoph Hellwig2016-02-084-14/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... instead of leaving it in the methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: factor btree block freeing into a helperChristoph Hellwig2016-02-081-7/+16
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: handle errors from ->free_blocks in xfs_btree_kill_irootChristoph Hellwig2016-02-081-3/+6
| |/ | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | Merge branch 'xfs-dio-fix-4.6' into for-nextDave Chinner2016-03-079-208/+143
|\ \
| * | ext4: Fix data exposure after failed AIO DIOJan Kara2016-02-293-32/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When AIO DIO fails e.g. due to IO error, we must not convert unwritten extents as that will expose uninitialized data. Handle this case by clearing unwritten flag from io_end in case of error and thus preventing extent conversion. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: fold xfs_vm_do_dio into xfs_vm_direct_IOChristoph Hellwig2016-02-081-22/+14
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: don't use ioends for direct write completionsChristoph Hellwig2016-02-082-150/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need to communicate two bits of information to the direct I/O completion handler: (1) do we need to convert any unwritten extents in the range (2) do we need to check if we need to update the inode size based on the range passed to the completion handler We can use the private data passed to the get_block handler and the completion handler as a simple bitmask to communicate this information instead of the current complicated infrastructure reusing the ioends from the buffer I/O path, and thus avoiding a memory allocation and a context switch for any non-trivial direct write. As a nice side effect we also decouple the direct I/O path implementation from that of the buffered I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
| * | direct-io: always call ->end_io if non-NULLChristoph Hellwig2016-02-086-14/+35
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | This way we can pass back errors to the file system, and allow for cleanup required for all direct I/O invocations. Also allow the ->end_io handlers to return errors on their own, so that I/O completion errors can be passed on to the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | Merge branch 'xfs-get-next-dquot-4.6' into for-nextDave Chinner2016-03-0713-86/+394
|\ \
| * | xfs: Split default quota limits by quota typeCarlos Maiolino2016-02-085-42/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Default quotas are globally set due historical reasons. IRIX only supported user and project quotas, and default quota was only applied to user quotas. In Linux, when a default quota is set, all different quota types inherits the same default value. An user with a quota limit larger than the default quota value, will still be limited to the default value because the group quotas also inherits the default quotas. Unless the group which the user belongs to have a custom quota limit set. This patch aims to split the default quota value by quota type. Allowing each quota type having different default values. Default time limits are still set globally. XFS does not set a per-user/group timer, but a single global timer. For changing this behavior, some changes should be made in user-space tools another bugs being fixed. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: wire up Q_XGETNEXTQUOTA / get_nextdqblkEric Sandeen2016-02-085-9/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add code to allow the Q_XGETNEXTQUOTA quotactl to quickly find all active quotas by examining the quota inode, and skipping over unallocated or uninitialized regions. Userspace can then use this interface rather than i.e. a getpwent() loop when asked to report all active quotas. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: Factor xfs_seek_hole_data into helperEric Sandeen2016-02-082-25/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor xfs_seek_hole_data into an unlocked helper which takes an xfs inode rather than a file for internal use. Also allow specification of "end" - the vfs lseek interface is defined such that any offset past eof/i_size shall return -ENXIO, but we will use this for quota code which does not maintain i_size, and we want to be able to SEEK_DATA past i_size as well. So the lseek path can send in i_size, and the quota code can determine its own ending offset. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: get quota inode from mp & flags rather than dqpEric Sandeen2016-02-082-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow us to get the appropriate quota inode from any mp & quota flags, not necessarily associated with a particular dqp. Needed for when we are searching for the next active ID with quotas and we want to examine the quota inode. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | xfs: don't overflow quota ID when initializing dqblkEric Sandeen2016-02-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quota IDs are unsigned, and so we can pass in values up to 2^32-1. But if we try to initialize a block containing values over MAX_INT, curid will overflow and assert. curid holds a quota ID, so give it the proper xfs_dqid_t type (and remove the now-impossible ASSERT). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | quota: add new quotactl Q_GETNEXTQUOTAEric Sandeen2016-02-082-0/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Q_GETNEXTQUOTA is exactly like Q_GETQUOTA, except that it will return quota information for the id equal to or greater than the id requested. In other words, if the requested id has no quota, the command will return quota information for the next higher id which does have a quota set. If no higher id has an active quota, -ESRCH is returned. This allows filesystems to do efficient iteration in kernelspace, much like extN filesystems do in userspace when asked to report all active quotas. This does require a new data structure for userspace, as the current structure does not include an ID for the returned quota information. Today, Ext4 with a hidden quota inode requires getpwent-style iterations, and for systems which have i.e. LDAP backends, this can be very slow, or even impossible if iteration is not allowed in the configuration. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | quota: add new quotactl Q_XGETNEXTQUOTAEric Sandeen2016-02-083-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Q_XGETNEXTQUOTA is exactly like Q_XGETQUOTA, except that it will return quota information for the id equal to or greater than the id requested. In other words, if the requested id has no quota, the command will return quota information for the next higher id which does have a quota set. If no higher id has an active quota, -ESRCH is returned. This allows filesystems to do efficient iteration in kernelspace, much like extN filesystems do in userspace when asked to report all active quotas. The patch adds a d_id field to struct qc_dqblk so that we can pass back the id of the quota which was found, and return it to userspace. Today, filesystems such as XFS require getpwent-style iterations, and for systems which have i.e. LDAP backends, this can be very slow, or even impossible if iteration is not allowed in the configuration. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | quota: remove unused cmd argument from quota_quotaon()Eric Sandeen2016-02-081-2/+2
| |/ | | | | | | | | | | | | | | | | | | The cmd argument to quota_quotaon() via Q_QUOTAON quotactl is not used, so remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | Merge branch 'xfs-rt-fixes-4.6' into for-nextDave Chinner2016-03-075-2/+42
|\ \
| * | xfs: RT bitmap and summary buffers need verifiersDave Chinner2016-02-093-3/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Buffers without verifiers issue runtime warnings on XFS. We don't have anything we can actually verify in the RT buffers (no CRCs, not magic numbers, etc), but we still need verifiers to avoid the warnings. Add a set of dummy verifier operations for the realtime buffers and apply them in the appropriate places. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>