From 0d085a529b427d97710e6a41f8a4f23e1757cd12 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Tue, 23 Sep 2014 15:36:27 +1000 Subject: xfs: ensure WB_SYNC_ALL writeback handles partial pages correctly XFS has been having trouble with stray delayed allocation extents beyond EOF for a long time. Recent changes to the collapse range code have triggered erroneous EBUSY errors on page invalidation for block size smaller than page size filesystems. These have been caused by dirty buffers beyond EOF on a partial page which do not get written to disk during a sync. The issue is that write-ahead in xfs_cluster_write() finds such a partial page and handles it by leaving the page dirty but pushing it into a writeback state. This used to work just fine, as the write_cache_pages() code would then find the dirty partial page in the next mapping tree lookup as the dirty tag is still set. Unfortunately, when we moved to a mark and sweep approach to writeback to fix other writeback sync issues, we broke this. The act of marking the page as under writeback now clears the TOWRITE tag in the radix tree, even though the page is still dirty. Hence the next lookup on the mapping tree does not find the dirty partial page and so doesn't try to write it again. This same writeback bug was found recently in ext4 and fixed in commit 1c8349a ("ext4: fix data integrity sync in ordered mode") without communication to the wider filesystem community. We can use exactly the same fix here so the TOWRITE flag is not cleared on partial page writes. cc: stable@vger.kernel.org # dependent on 1c8349a17137b93f0a83f276c764a6df1b9a116e Root-cause-found-by: Brian Foster Signed-off-by: Dave Chinner Reviewed-by: Brian Foster Signed-off-by: Dave Chinner --- fs/xfs/xfs_aops.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index b984647c24db..2f502537a39c 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -434,10 +434,22 @@ xfs_start_page_writeback( { ASSERT(PageLocked(page)); ASSERT(!PageWriteback(page)); - if (clear_dirty) + + /* + * if the page was not fully cleaned, we need to ensure that the higher + * layers come back to it correctly. That means we need to keep the page + * dirty, and for WB_SYNC_ALL writeback we need to ensure the + * PAGECACHE_TAG_TOWRITE index mark is not removed so another attempt to + * write this page in this writeback sweep will be made. + */ + if (clear_dirty) { clear_page_dirty_for_io(page); - set_page_writeback(page); + set_page_writeback(page); + } else + set_page_writeback_keepwrite(page); + unlock_page(page); + /* If no buffers on the page are to be written, finish it here */ if (!buffers) end_page_writeback(page); -- cgit v1.2.3
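For illustration, a minimal sketch of the mark and sweep pattern this fix depends on. tag_pages_for_writeback(), set_page_writeback() and set_page_writeback_keepwrite() are real kernel interfaces of this era; sync_range(), lookup_next_towrite_page() and write_out_page() are hypothetical stand-ins for the write_cache_pages() loop in mm/page-writeback.c:

/*
 * Sketch of WB_SYNC_ALL writeback: mark every dirty page in the range
 * with PAGECACHE_TAG_TOWRITE, then sweep the mapping by that tag until
 * no tagged pages remain.
 */
static void sync_range(struct address_space *mapping, pgoff_t start, pgoff_t end)
{
	struct page *page;

	/* mark: propagate PAGECACHE_TAG_DIRTY to PAGECACHE_TAG_TOWRITE */
	tag_pages_for_writeback(mapping, start, end);

	/* sweep: revisit pages for as long as they carry the TOWRITE tag */
	while ((page = lookup_next_towrite_page(mapping, start, end))) {
		/*
		 * The writepage path calls set_page_writeback(), which
		 * clears the TOWRITE tag; a page left dirty (the partial
		 * page beyond EOF) would then vanish from this sweep.
		 * set_page_writeback_keepwrite() keeps the tag, so the
		 * still-dirty page is found and written again before the
		 * sync returns.
		 */
		write_out_page(page);		/* hypothetical helper */
	}
}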
From 2c845f5a5f238f42376b6551a7f7716952c8f509 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Tue, 23 Sep 2014 15:37:09 +1000 Subject: xfs: track collapse via file offset rather than extent index The collapse range implementation uses a transaction per extent shift. The progress of the overall operation is tracked via the current extent index of the in-core extent list. This is racy because the ilock must be dropped and reacquired for each transaction according to locking and log reservation rules. Therefore, writeback to prior regions of the file is possible and can change the extent count. This changes the extent to which the current index refers and causes the collapse to fail mid-operation. To avoid this problem, the entire file is currently written back before the collapse operation starts. To eliminate the need to flush the entire file, use the file offset (fsb) to track the progress of the overall extent shift operation rather than the extent index. Modify xfs_bmap_shift_extents() to unconditionally convert the start_fsb parameter to an extent index and return the file offset of the extent where the shift left off, if further extents exist. The bulk of this function can remain based on extent index as ilock is held by the caller. xfs_collapse_file_space() now uses the fsb output as the starting point for the subsequent shift. Signed-off-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/libxfs/xfs_bmap.c | 85 +++++++++++++++++++++++++----------------------- fs/xfs/libxfs/xfs_bmap.h | 7 ++-- fs/xfs/xfs_bmap_util.c | 12 +++---- 3 files changed, 53 insertions(+), 51 deletions(-) (limited to 'fs') diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 86df952d3e24..4b3f1b92cddd 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5406,20 +5406,21 @@ error0: /* * Shift extent records to the left to cover a hole. * - * The maximum number of extents to be shifted in a single operation - * is @num_exts, and @current_ext keeps track of the current extent - * index we have shifted. @offset_shift_fsb is the length by which each - * extent is shifted. If there is no hole to shift the extents - * into, this will be considered invalid operation and we abort immediately. + * The maximum number of extents to be shifted in a single operation is + * @num_exts. @start_fsb specifies the file offset to start the shift and the + * file offset where we've left off is returned in @next_fsb. @offset_shift_fsb + * is the length by which each extent is shifted. If there is no hole to shift + * the extents into, this will be considered invalid operation and we abort + * immediately. */ int xfs_bmap_shift_extents( struct xfs_trans *tp, struct xfs_inode *ip, - int *done, xfs_fileoff_t start_fsb, xfs_fileoff_t offset_shift_fsb, - xfs_extnum_t *current_ext, + int *done, + xfs_fileoff_t *next_fsb, xfs_fsblock_t *firstblock, struct xfs_bmap_free *flist, int num_exts) @@ -5431,6 +5432,7 @@ xfs_bmap_shift_extents( struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp; xfs_extnum_t nexts = 0; + xfs_extnum_t current_ext; xfs_fileoff_t startoff; int error = 0; int i; @@ -5451,7 +5453,8 @@ xfs_bmap_shift_extents( if (XFS_FORCED_SHUTDOWN(mp)) return -EIO; - ASSERT(current_ext != NULL); + ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); ifp = XFS_IFORK_PTR(ip, whichfork); if (!(ifp->if_flags & XFS_IFEXTENTS)) { @@ -5462,20 +5465,18 @@ } /* - * If *current_ext is 0, we would need to lookup the extent - * from where we would start shifting and store it in gotp. + * Look up the extent index for the fsb where we start shifting. We can + * henceforth iterate with current_ext as extent list changes are locked + * out via ilock. + * + * gotp can be null in 2 cases: 1) if there are no extents or 2) + * start_fsb lies in a hole beyond which there are no extents. Either + * way, we are done.
- */ - if (!gotp) { - *done = 1; - return 0; - } + gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext); + if (!gotp) { + *done = 1; + return 0; } if (ifp->if_flags & XFS_IFBROOT) { @@ -5487,36 +5488,32 @@ /* * There may be delalloc extents in the data fork before the range we - * are collapsing out, so we cannot - * use the count of real extents here. Instead we have to calculate it - * from the incore fork. + * are collapsing out, so we cannot use the count of real extents here. + * Instead we have to calculate it from the incore fork. */ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t); - while (nexts++ < num_exts && *current_ext < total_extents) { + while (nexts++ < num_exts && current_ext < total_extents) { - gotp = xfs_iext_get_ext(ifp, *current_ext); + gotp = xfs_iext_get_ext(ifp, current_ext); xfs_bmbt_get_all(gotp, &got); startoff = got.br_startoff - offset_shift_fsb; /* - * Before shifting extent into hole, make sure that the hole - * is large enough to accomodate the shift. + * Before shifting extent into hole, make sure that the hole is + * large enough to accommodate the shift. */ - if (*current_ext) { - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, - *current_ext - 1), &left); - + if (current_ext > 0) { + xfs_bmbt_get_all(xfs_iext_get_ext(ifp, current_ext - 1), + &left); if (startoff < left.br_startoff + left.br_blockcount) error = -EINVAL; } else if (offset_shift_fsb > got.br_startoff) { /* - * When first extent is shifted, offset_shift_fsb - * should be less than the stating offset of - * the first extent. + * When first extent is shifted, offset_shift_fsb should + * be less than the stating offset of the first extent. */ error = -EINVAL; } - if (error) goto del_cursor; @@ -5531,7 +5528,7 @@ } /* Check if we can merge 2 adjacent extents */ - if (*current_ext && + if (current_ext && left.br_startoff + left.br_blockcount == startoff && left.br_startblock + left.br_blockcount == got.br_startblock && @@ -5539,7 +5536,7 @@ left.br_blockcount + got.br_blockcount <= MAXEXTLEN) { blockcount = left.br_blockcount + got.br_blockcount; - xfs_iext_remove(ip, *current_ext, 1, 0); + xfs_iext_remove(ip, current_ext, 1, 0); logflags |= XFS_ILOG_CORE; if (cur) { error = xfs_btree_delete(cur, &i); @@ -5551,7 +5548,7 @@ } XFS_IFORK_NEXT_SET(ip, whichfork, XFS_IFORK_NEXTENTS(ip, whichfork) - 1); - gotp = xfs_iext_get_ext(ifp, --*current_ext); + gotp = xfs_iext_get_ext(ifp, --current_ext); xfs_bmbt_get_all(gotp, &got); /* Make cursor point to the extent we will update */ @@ -5585,13 +5582,18 @@ logflags |= XFS_ILOG_DEXT; } - (*current_ext)++; + current_ext++; total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t); } /* Check if we are done */ - if (*current_ext == total_extents) + if (current_ext == total_extents) *done = 1; + else if (next_fsb) { + gotp = xfs_iext_get_ext(ifp, current_ext); + xfs_bmbt_get_all(gotp, &got); + *next_fsb = got.br_startoff; + } del_cursor: if (cur) @@ -5600,5 +5602,6 @@ del_cursor: if (logflags) xfs_trans_log_inode(tp, ip, logflags); + return error; } diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index b879ca56a64c..44db6db86402 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -178,9 +178,8 @@ int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx, xfs_extnum_t num); uint xfs_default_attroffset(struct xfs_inode *ip); int xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip, - int
*done, xfs_fileoff_t start_fsb, - xfs_fileoff_t offset_shift_fsb, xfs_extnum_t *current_ext, - xfs_fsblock_t *firstblock, struct xfs_bmap_free *flist, - int num_exts); + xfs_fileoff_t start_fsb, xfs_fileoff_t offset_shift_fsb, + int *done, xfs_fileoff_t *next_fsb, xfs_fsblock_t *firstblock, + struct xfs_bmap_free *flist, int num_exts); #endif /* __XFS_BMAP_H__ */ diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 1707980f9a4b..1e96d778bb9e 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1456,18 +1456,18 @@ xfs_collapse_file_space( struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; int error; - xfs_extnum_t current_ext = 0; struct xfs_bmap_free free_list; xfs_fsblock_t first_block; int committed; xfs_fileoff_t start_fsb; + xfs_fileoff_t next_fsb; xfs_fileoff_t shift_fsb; ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); trace_xfs_collapse_file_space(ip); - start_fsb = XFS_B_TO_FSB(mp, offset + len); + next_fsb = XFS_B_TO_FSB(mp, offset + len); shift_fsb = XFS_B_TO_FSB(mp, len); /* @@ -1525,10 +1525,10 @@ xfs_collapse_file_space( * We are using the write transaction in which max 2 bmbt * updates are allowed */ - error = xfs_bmap_shift_extents(tp, ip, &done, start_fsb, - shift_fsb, &current_ext, - &first_block, &free_list, - XFS_BMAP_MAX_SHIFT_EXTENTS); + start_fsb = next_fsb; + error = xfs_bmap_shift_extents(tp, ip, start_fsb, shift_fsb, + &done, &next_fsb, &first_block, &free_list, + XFS_BMAP_MAX_SHIFT_EXTENTS); if (error) goto out; -- cgit v1.2.3
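To see why a file offset is a stable cursor where an extent index is not, consider a toy model of the in-core extent list. The types here are illustrative only; the kernel re-derives the index from the offset with xfs_iext_bno_to_ext():

#include <stdint.h>
#include <stdio.h>

struct ext { uint64_t startoff; };	/* toy extent record, keyed by fsb */

int main(void)
{
	/*
	 * The original in-core list was { {0}, {100}, {200} } and the shift
	 * stopped at the extent starting at fsb 100, i.e. index 1.  While
	 * the ilock was dropped between transactions, writeback converted
	 * a delalloc extent at fsb 50, so the list is now:
	 */
	struct ext list[] = { {0}, {50}, {100}, {200} };

	int idx = 1;			/* stale index cursor: now names {50} */
	uint64_t next_fsb = 100;	/* offset cursor: still names {100} */

	printf("index cursor resumes at fsb %llu (wrong extent)\n",
	       (unsigned long long)list[idx].startoff);
	printf("offset cursor resumes at fsb %llu (same extent)\n",
	       (unsigned long long)next_fsb);
	return 0;
}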
From ddb19e3180fa42362a04e86771d758be1de0bb13 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Tue, 23 Sep 2014 15:38:09 +1000 Subject: xfs: refactor shift-by-merge into xfs_bmse_merge() helper The extent shift mechanism in xfs_bmap_shift_extents() is complicated and handles several different, non-deterministic scenarios. These include extent shifts, extent merges and potential btree updates in either of the former scenarios. Refactor the code to be more linear and readable. The loop logic in xfs_bmap_shift_extents() and some initial error checking is adjusted slightly. The associated btree lookup and update/delete operations are condensed into single blocks of code. This reduces the number of btree-specific blocks and facilitates the separation of the merge operation into new xfs_bmse_merge() and xfs_bmse_can_merge() helpers. This is a code refactor only. The behavior of extent shift and collapse range is not modified. Signed-off-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/libxfs/xfs_bmap.c | 242 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 167 insertions(+), 75 deletions(-) (limited to 'fs') diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 4b3f1b92cddd..532c4aab20f2 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5403,6 +5403,120 @@ error0: return error; } +/* + * Determine whether an extent shift can be accomplished by a merge with the + * extent that precedes the target hole of the shift. + */ +STATIC bool +xfs_bmse_can_merge( + struct xfs_bmbt_irec *left, /* preceding extent */ + struct xfs_bmbt_irec *got, /* current extent to shift */ + xfs_fileoff_t shift) /* shift fsb */ +{ + xfs_fileoff_t startoff; + + startoff = got->br_startoff - shift; + + /* + * The extent, once shifted, must be adjacent in-file and on-disk with + * the preceding extent. + */ + if ((left->br_startoff + left->br_blockcount != startoff) || + (left->br_startblock + left->br_blockcount != got->br_startblock) || + (left->br_state != got->br_state) || + (left->br_blockcount + got->br_blockcount > MAXEXTLEN)) + return false; + + return true; +} + +/* + * A bmap extent shift adjusts the file offset of an extent to fill a preceding + * hole in the file. If an extent shift would result in the extent being fully + * adjacent to the extent that currently precedes the hole, we can merge with + * the preceding extent rather than do the shift. + * + * This function assumes the caller has verified a shift-by-merge is possible + * with the provided extents via xfs_bmse_can_merge(). + */ +STATIC int +xfs_bmse_merge( + struct xfs_inode *ip, + int whichfork, + xfs_fileoff_t shift, /* shift fsb */ + int current_ext, /* idx of gotp */ + struct xfs_bmbt_rec_host *gotp, /* extent to shift */ + struct xfs_bmbt_rec_host *leftp, /* preceding extent */ + struct xfs_btree_cur *cur, + int *logflags) /* output */ +{ + struct xfs_ifork *ifp; + struct xfs_bmbt_irec got; + struct xfs_bmbt_irec left; + xfs_filblks_t blockcount; + int error, i; + + ifp = XFS_IFORK_PTR(ip, whichfork); + xfs_bmbt_get_all(gotp, &got); + xfs_bmbt_get_all(leftp, &left); + blockcount = left.br_blockcount + got.br_blockcount; + + ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL)); + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); + ASSERT(xfs_bmse_can_merge(&left, &got, shift)); + + /* + * Merge the in-core extents. Note that the host record pointers and + * current_ext index are invalid once the extent has been removed via + * xfs_iext_remove(). + */ + xfs_bmbt_set_blockcount(leftp, blockcount); + xfs_iext_remove(ip, current_ext, 1, 0); + + /* + * Update the on-disk extent count, the btree if necessary and log the + * inode. + */ + XFS_IFORK_NEXT_SET(ip, whichfork, + XFS_IFORK_NEXTENTS(ip, whichfork) - 1); + *logflags |= XFS_ILOG_CORE; + if (!cur) { + *logflags |= XFS_ILOG_DEXT; + return 0; + } + + /* lookup and remove the extent to merge */ + error = xfs_bmbt_lookup_eq(cur, got.br_startoff, got.br_startblock, + got.br_blockcount, &i); + if (error) + goto out_error; + XFS_WANT_CORRUPTED_GOTO(i == 1, out_error); + + error = xfs_btree_delete(cur, &i); + if (error) + goto out_error; + XFS_WANT_CORRUPTED_GOTO(i == 1, out_error); + + /* lookup and update size of the previous extent */ + error = xfs_bmbt_lookup_eq(cur, left.br_startoff, left.br_startblock, + left.br_blockcount, &i); + if (error) + goto out_error; + XFS_WANT_CORRUPTED_GOTO(i == 1, out_error); + + left.br_blockcount = blockcount; + + error = xfs_bmbt_update(cur, left.br_startoff, left.br_startblock, + left.br_blockcount, left.br_state); + if (error) + goto out_error; + + return 0; + +out_error: + return error; +} + /* * Shift extent records to the left to cover a hole.
* @@ -5427,6 +5541,7 @@ xfs_bmap_shift_extents( { struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_rec_host *gotp; + struct xfs_bmbt_rec_host *leftp; struct xfs_bmbt_irec got; struct xfs_bmbt_irec left; struct xfs_mount *mp = ip->i_mount; @@ -5438,7 +5553,6 @@ xfs_bmap_shift_extents( int i; int whichfork = XFS_DATA_FORK; int logflags = 0; - xfs_filblks_t blockcount = 0; int total_extents; if (unlikely(XFS_TEST_ERROR( @@ -5464,6 +5578,13 @@ return error; } + if (ifp->if_flags & XFS_IFBROOT) { + cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); + cur->bc_private.b.firstblock = *firstblock; + cur->bc_private.b.flist = flist; + cur->bc_private.b.flags = 0; + } + /* * Look up the extent index for the fsb where we start shifting. We can * henceforth iterate with current_ext as extent list changes are locked @@ -5476,14 +5597,17 @@ gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext); if (!gotp) { *done = 1; - return 0; + goto del_cursor; } + xfs_bmbt_get_all(gotp, &got); - if (ifp->if_flags & XFS_IFBROOT) { - cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); - cur->bc_private.b.firstblock = *firstblock; - cur->bc_private.b.flist = flist; - cur->bc_private.b.flags = 0; + /* + * If the first extent is shifted, offset_shift_fsb cannot be larger + * than the starting offset of the first extent. + */ + if (current_ext == 0 && got.br_startoff < offset_shift_fsb) { + error = -EINVAL; + goto del_cursor; } /* @@ -5493,30 +5617,41 @@ */ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t); while (nexts++ < num_exts && current_ext < total_extents) { - - gotp = xfs_iext_get_ext(ifp, current_ext); - xfs_bmbt_get_all(gotp, &got); startoff = got.br_startoff - offset_shift_fsb; - /* - * Before shifting extent into hole, make sure that the hole is - * large enough to accommodate the shift. - */ + /* grab the left extent and check for a potential merge */ if (current_ext > 0) { - xfs_bmbt_get_all(xfs_iext_get_ext(ifp, current_ext - 1), - &left); - if (startoff < left.br_startoff + left.br_blockcount) + leftp = xfs_iext_get_ext(ifp, current_ext - 1); + xfs_bmbt_get_all(leftp, &left); + + /* make sure hole is large enough for shift */ + if (startoff < left.br_startoff + left.br_blockcount) { error = -EINVAL; - } else if (offset_shift_fsb > got.br_startoff) { - /* - * When first extent is shifted, offset_shift_fsb should - * be less than the stating offset of the first extent. - */ - error = -EINVAL; + goto del_cursor; + } + + if (xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) { + error = xfs_bmse_merge(ip, whichfork, + offset_shift_fsb, current_ext, gotp, + leftp, cur, &logflags); + if (error) + goto del_cursor; + + /* + * The extent was merged so adjust the extent + * index and move onto the next. + */ + current_ext--; + goto next; + } } - if (error) - goto del_cursor; + /* + * We didn't merge the extent so do the shift. Update the start + * offset in the in-core extent and btree, if necessary.
+ */ + xfs_bmbt_set_startoff(gotp, startoff); + logflags |= XFS_ILOG_CORE; if (cur) { error = xfs_bmbt_lookup_eq(cur, got.br_startoff, got.br_startblock, @@ -5525,53 +5660,8 @@ if (error) goto del_cursor; XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor); - } - - /* Check if we can merge 2 adjacent extents */ - if (current_ext && - left.br_startoff + left.br_blockcount == startoff && - left.br_startblock + left.br_blockcount == - got.br_startblock && - left.br_state == got.br_state && - left.br_blockcount + got.br_blockcount <= MAXEXTLEN) { - blockcount = left.br_blockcount + - got.br_blockcount; - xfs_iext_remove(ip, current_ext, 1, 0); - logflags |= XFS_ILOG_CORE; - if (cur) { - error = xfs_btree_delete(cur, &i); - if (error) - goto del_cursor; - XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor); - } else { - logflags |= XFS_ILOG_DEXT; - } - XFS_IFORK_NEXT_SET(ip, whichfork, - XFS_IFORK_NEXTENTS(ip, whichfork) - 1); - gotp = xfs_iext_get_ext(ifp, --current_ext); - xfs_bmbt_get_all(gotp, &got); - - /* Make cursor point to the extent we will update */ - if (cur) { - error = xfs_bmbt_lookup_eq(cur, got.br_startoff, - got.br_startblock, - got.br_blockcount, - &i); - if (error) - goto del_cursor; - XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor); - } - xfs_bmbt_set_blockcount(gotp, blockcount); - got.br_blockcount = blockcount; - } else { - /* We have to update the startoff */ - xfs_bmbt_set_startoff(gotp, startoff); got.br_startoff = startoff; - } - - logflags |= XFS_ILOG_CORE; - if (cur) { error = xfs_bmbt_update(cur, got.br_startoff, got.br_startblock, got.br_blockcount, @@ -5582,18 +5672,20 @@ logflags |= XFS_ILOG_DEXT; } - current_ext++; +next: + /* update total extent count and grab the next record */ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t); + if (++current_ext >= total_extents) + break; + gotp = xfs_iext_get_ext(ifp, current_ext); + xfs_bmbt_get_all(gotp, &got); } /* Check if we are done */ if (current_ext == total_extents) *done = 1; - else if (next_fsb) { - gotp = xfs_iext_get_ext(ifp, current_ext); - xfs_bmbt_get_all(gotp, &got); + else if (next_fsb) *next_fsb = got.br_startoff; - } del_cursor: if (cur) -- cgit v1.2.3
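A stand-alone model of the merge precondition may help. This is an editor's sketch with simplified types, not kernel code; it omits the br_state (unwritten) comparison, and the 21-bit length limit mirrors XFS's MAXEXTLEN:

#include <stdbool.h>
#include <stdint.h>

struct irec {				/* cut-down xfs_bmbt_irec */
	uint64_t br_startoff;		/* file offset, in fsb */
	uint64_t br_startblock;		/* disk block */
	uint64_t br_blockcount;		/* length, in fsb */
};

#define MAXEXTLEN ((1u << 21) - 1)	/* max length of one extent record */

static bool can_merge(const struct irec *left, const struct irec *got,
		      uint64_t shift)
{
	uint64_t startoff = got->br_startoff - shift;

	return left->br_startoff + left->br_blockcount == startoff &&	/* adjacent in-file */
	       left->br_startblock + left->br_blockcount == got->br_startblock && /* adjacent on-disk */
	       left->br_blockcount + got->br_blockcount <= MAXEXTLEN;	/* fits one record */
}

Worked example: with left covering fsb [0, 10) at disk block 100 and got covering [15, 20) at disk block 110, a 5-fsb shift lands got at offset 10, adjacent to left both in-file (0 + 10 == 15 - 5) and on-disk (100 + 10 == 110), so the two records merge into a single [0, 15) extent at block 100 rather than being shifted and left split.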
From a979bdfea10a61dce0055b4d416d640f4f5f495e Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Tue, 23 Sep 2014 15:39:04 +1000 Subject: xfs: refactor single extent shift into xfs_bmse_shift_one() helper xfs_bmap_shift_extents() has a variety of conditions and error checks that make the logic difficult to follow and indent heavy. Refactor the loop body of this function into a new xfs_bmse_shift_one() helper. This simplifies the error checks, eliminates the index-decrement-on-merge hack by pushing the index increment down into the helper, and makes the code more readable by reducing multiple levels of indentation. This is a code refactor only. The behavior of extent shift and collapse range is not modified. Signed-off-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/libxfs/xfs_bmap.c | 164 ++++++++++++++++++++++++++--------------------- 1 file changed, 91 insertions(+), 73 deletions(-) (limited to 'fs') diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 532c4aab20f2..69bf8d868b64 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5517,6 +5517,88 @@ out_error: return error; } +/* + * Shift a single extent. + */ +STATIC int +xfs_bmse_shift_one( + struct xfs_inode *ip, + int whichfork, + xfs_fileoff_t offset_shift_fsb, + int *current_ext, + struct xfs_bmbt_rec_host *gotp, + struct xfs_btree_cur *cur, + int *logflags) +{ + struct xfs_ifork *ifp; + xfs_fileoff_t startoff; + struct xfs_bmbt_rec_host *leftp; + struct xfs_bmbt_irec got; + struct xfs_bmbt_irec left; + int error; + int i; + + ifp = XFS_IFORK_PTR(ip, whichfork); + + xfs_bmbt_get_all(gotp, &got); + startoff = got.br_startoff - offset_shift_fsb; + + /* + * If this is the first extent in the file, make sure there's enough + * room at the start of the file and jump right to the shift as there's + * no left extent to merge. + */ + if (*current_ext == 0) { + if (got.br_startoff < offset_shift_fsb) + return -EINVAL; + goto shift_extent; + } + + /* grab the left extent and check for a large enough hole */ + leftp = xfs_iext_get_ext(ifp, *current_ext - 1); + xfs_bmbt_get_all(leftp, &left); + + if (startoff < left.br_startoff + left.br_blockcount) + return -EINVAL; + + /* check whether to merge the extent or shift it down */ + if (!xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) + goto shift_extent; + + return xfs_bmse_merge(ip, whichfork, offset_shift_fsb, *current_ext, + gotp, leftp, cur, logflags); + +shift_extent: + /* + * Increment the extent index for the next iteration, update the start + * offset of the in-core extent and update the btree if applicable. + */ + (*current_ext)++; + xfs_bmbt_set_startoff(gotp, startoff); + *logflags |= XFS_ILOG_CORE; + if (!cur) { + *logflags |= XFS_ILOG_DEXT; + return 0; + } + + error = xfs_bmbt_lookup_eq(cur, got.br_startoff, got.br_startblock, + got.br_blockcount, &i); + if (error) + return error; + XFS_WANT_CORRUPTED_GOTO(i == 1, out_error); + + got.br_startoff = startoff; + error = xfs_bmbt_update(cur, got.br_startoff, got.br_startblock, + got.br_blockcount, got.br_state); + if (error) + return error; + + return 0; + +out_error: + return error; +} + /* * Shift extent records to the left to cover a hole. * @@ -5541,16 +5623,12 @@ xfs_bmap_shift_extents( { struct xfs_btree_cur *cur = NULL; struct xfs_bmbt_rec_host *gotp; - struct xfs_bmbt_rec_host *leftp; struct xfs_bmbt_irec got; - struct xfs_bmbt_irec left; struct xfs_mount *mp = ip->i_mount; struct xfs_ifork *ifp; xfs_extnum_t nexts = 0; xfs_extnum_t current_ext; - xfs_fileoff_t startoff; int error = 0; - int i; int whichfork = XFS_DATA_FORK; int logflags = 0; int total_extents; @@ -5599,16 +5677,6 @@ *done = 1; goto del_cursor; } - xfs_bmbt_get_all(gotp, &got); - - /* - * If the first extent is shifted, offset_shift_fsb cannot be larger - * than the starting offset of the first extent.
- */ - if (current_ext == 0 && got.br_startoff < offset_shift_fsb) { - error = -EINVAL; - goto del_cursor; - } /* * There may be delalloc extents in the data fork before the range we @@ -5617,75 +5685,25 @@ */ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t); while (nexts++ < num_exts && current_ext < total_extents) { - startoff = got.br_startoff - offset_shift_fsb; - - /* grab the left extent and check for a potential merge */ - if (current_ext > 0) { - leftp = xfs_iext_get_ext(ifp, current_ext - 1); - xfs_bmbt_get_all(leftp, &left); - - /* make sure hole is large enough for shift */ - if (startoff < left.br_startoff + left.br_blockcount) { - error = -EINVAL; - goto del_cursor; - } - - if (xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) { - error = xfs_bmse_merge(ip, whichfork, - offset_shift_fsb, current_ext, gotp, - leftp, cur, &logflags); - if (error) - goto del_cursor; - - /* - * The extent was merged so adjust the extent - * index and move onto the next. - */ - current_ext--; - goto next; - } - } - - /* - * We didn't merge the extent so do the shift. Update the start - * offset in the in-core extent and btree, if necessary. - */ - xfs_bmbt_set_startoff(gotp, startoff); - logflags |= XFS_ILOG_CORE; - if (cur) { - error = xfs_bmbt_lookup_eq(cur, got.br_startoff, - got.br_startblock, - got.br_blockcount, - &i); - if (error) - goto del_cursor; - XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor); - - got.br_startoff = startoff; - error = xfs_bmbt_update(cur, got.br_startoff, - got.br_startblock, - got.br_blockcount, - got.br_state); - if (error) - goto del_cursor; - } else { - logflags |= XFS_ILOG_DEXT; - } + error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb, + &current_ext, gotp, cur, &logflags); + if (error) + goto del_cursor; -next: /* update total extent count and grab the next record */ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t); - if (++current_ext >= total_extents) + if (current_ext >= total_extents) break; gotp = xfs_iext_get_ext(ifp, current_ext); } /* Check if we are done */ - if (current_ext == total_extents) + if (current_ext == total_extents) { *done = 1; - } else if (next_fsb) + } else if (next_fsb) { + xfs_bmbt_get_all(gotp, &got); *next_fsb = got.br_startoff; + } del_cursor: if (cur) -- cgit v1.2.3 From f71721d061e872a39b2680d13f309c1eb6893438 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Tue, 23 Sep 2014 15:39:05 +1000 Subject: xfs: writeback and inval. file range to be shifted by collapse The collapse range operation currently writes the entire file before starting the collapse to avoid changes in the in-core extent list due to writeback causing the extent count to change. Now that collapse range is fsb based rather than extent index based, it can sustain changes in the extent list during the shift sequence without disruption. Modify xfs_collapse_file_space() to write back and invalidate pages associated with the range of the file to be shifted. xfs_free_file_space() currently has similar behavior, but the space free need only affect the region of the file that is freed and this could change in the future. Also update the comments to reflect the current implementation. We retain the eofblocks trim permanently as a best option for dealing with delalloc extents. We don't shift delalloc extents because this scenario only occurs with post-eof preallocation (since data must be flushed such that the cache can be invalidated and data can be shifted).
That means said space must also be initialized before being shifted into the accessible region of the file only to be immediately truncated off as the last part of the collapse. In other words, the eofblocks trim will happen anyway; we just run it first to ensure the file remains in a consistent state throughout the collapse. Finally, detect and fail explicitly in the event of a delalloc extent during the extent shift. The implementation does not support delalloc extents and the caller is expected to prevent this scenario in advance as is done by collapse. Signed-off-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/libxfs/xfs_bmap.c | 4 ++++ fs/xfs/xfs_bmap_util.c | 32 +++++++++++++++++++------------- 2 files changed, 23 insertions(+), 13 deletions(-) (limited to 'fs') diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 69bf8d868b64..79c981984dca 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5543,6 +5543,10 @@ xfs_bmse_shift_one( xfs_bmbt_get_all(gotp, &got); startoff = got.br_startoff - offset_shift_fsb; + /* delalloc extents should be prevented by caller */ + XFS_WANT_CORRUPTED_GOTO(!isnullstartblock(got.br_startblock), + out_error); + /* * If this is the first extent in the file, make sure there's enough * room at the start of the file and jump right to the shift as there's diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 1e96d778bb9e..eae763f4600e 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1470,27 +1470,33 @@ xfs_collapse_file_space( next_fsb = XFS_B_TO_FSB(mp, offset + len); shift_fsb = XFS_B_TO_FSB(mp, len); - /* - * Writeback the entire file and force remove any post-eof blocks. The - * writeback prevents changes to the extent list via concurrent - * writeback and the eofblocks trim prevents the extent shift algorithm - * from running into a post-eof delalloc extent. - * - * XXX: This is a temporary fix until the extent shift loop below is - * converted to use offsets and lookups within the ILOCK rather than - * carrying around the index into the extent list for the next - * iteration. - */ - error = filemap_write_and_wait(VFS_I(ip)->i_mapping); + error = xfs_free_file_space(ip, offset, len); if (error) return error; + + /* + * Trim eofblocks to avoid shifting uninitialized post-eof preallocation + * into the accessible region of the file. + */ if (xfs_can_free_eofblocks(ip, true)) { error = xfs_free_eofblocks(mp, ip, false); if (error) return error; } - error = xfs_free_file_space(ip, offset, len); + /* + * Writeback and invalidate cache for the remainder of the file as we're + * about to shift down every extent from the collapse range to EOF. The + * free of the collapse range above might have already done some of + * this, but we shouldn't rely on it to do anything outside of the range + * that was freed. + */ + error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, + offset + len, -1); + if (error) + return error; + error = invalidate_inode_pages2_range(VFS_I(ip)->i_mapping, + (offset + len) >> PAGE_CACHE_SHIFT, -1); if (error) return error; -- cgit v1.2.3
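The ordering in that patch matters: invalidation can only succeed once the pages are clean. A hedged sketch of the pattern; filemap_write_and_wait_range() and invalidate_inode_pages2_range() are the real interfaces the patch uses, while flush_and_invalidate_tail() is a hypothetical wrapper:

/*
 * Flush, then invalidate, everything from 'start' to EOF.  Writeback
 * must complete first: invalidate_inode_pages2_range() cannot discard
 * pages whose dirty state it fails to resolve, which is the source of
 * the EBUSY errors described in the first patch of this series.
 */
static int flush_and_invalidate_tail(struct address_space *mapping,
				     loff_t start)
{
	int error;

	/* push dirty data from 'start' to EOF to disk and wait for it */
	error = filemap_write_and_wait_range(mapping, start, -1);
	if (error)
		return error;

	/* byte offset -> page index for the invalidation call */
	return invalidate_inode_pages2_range(mapping,
					     start >> PAGE_CACHE_SHIFT, -1);
}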
From 8b5279e33f241a074a9c8649bba8f77a2167b798 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Tue, 23 Sep 2014 15:39:05 +1000 Subject: xfs: only writeback and truncate pages for the freed range xfs_free_file_space() only affects the range of the file for which space is being freed. It currently writes and truncates the page cache from the start offset of the free to EOF. Modify xfs_free_file_space() to write back and truncate the page cache of just the range being freed. Signed-off-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/xfs_bmap_util.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index eae763f4600e..809ae7d395c3 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1205,6 +1205,7 @@ xfs_free_file_space( xfs_bmap_free_t free_list; xfs_bmbt_irec_t imap; xfs_off_t ioffset; + xfs_off_t iendoffset; xfs_extlen_t mod=0; xfs_mount_t *mp; int nimap; @@ -1233,12 +1234,13 @@ inode_dio_wait(VFS_I(ip)); rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE); - ioffset = offset & ~(rounding - 1); - error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, - ioffset, -1); + ioffset = round_down(offset, rounding); + iendoffset = round_up(offset + len, rounding) - 1; + error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, ioffset, + iendoffset); if (error) goto out; - truncate_pagecache_range(VFS_I(ip), ioffset, -1); + truncate_pagecache_range(VFS_I(ip), ioffset, iendoffset); /* * Need to zero the stuff we're not freeing, on disk. -- cgit v1.2.3
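As a worked example of the rounding in the final patch, assume 4k pages and a 1k block size, so rounding = max(1 << sb_blocklog, PAGE_CACHE_SIZE) = 4096. Freeing bytes [5000, 9000) now writes back and truncates the page cache over [4096, 12287] only, instead of everything from 4096 to EOF. The sketch below is a userspace re-implementation of the kernel's power-of-two round_down()/round_up() macros, for illustration only:

#include <stdint.h>
#include <stdio.h>

/* userspace models of the kernel's round_down()/round_up() macros */
#define round_down(x, y) ((x) & ~((y) - 1))
#define round_up(x, y)   ((((x) - 1) | ((y) - 1)) + 1)

int main(void)
{
	int64_t offset = 5000, len = 4000;
	int64_t rounding = 4096;	/* max(fs block size, page size) */

	int64_t ioffset    = round_down(offset, rounding);	  /* 4096  */
	int64_t iendoffset = round_up(offset + len, rounding) - 1; /* 12287 */

	/*
	 * Both ends fall on page boundaries, so only whole pages covering
	 * the freed range are written back and truncated; cached data
	 * beyond the free is left alone.
	 */
	printf("writeback/truncate range: %lld..%lld\n",
	       (long long)ioffset, (long long)iendoffset);
	return 0;
}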