| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
libnvdimm 4.12 pull request:
- Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
The size regression is fixed by moving all dax helpers into the
dax-core and only specifying "select DAX" for FS_DAX and
dax-capable drivers. He also asked for clarification of the
NR_DEV_DAX config option which, on closer look, does not need to be
a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
for good measure.
- Ben's attention to detail on -stable patch submissions caught a
case where the recent fixes to arch_copy_from_iter_pmem() missed a
condition where we strand dirty data in the cache. This is tagged
for -stable and will also be included in the rework of the pmem api
to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.
- Vishal adds a feature that missed the initial pull due to pending
review feedback. It allows the kernel to clear media errors when
initializing a BTT (atomic sector update driver) instance on a pmem
namespace.
- Ross noticed that the dax_device + dax_operations conversion broke
__dax_zero_page_range(). The nvdimm unit tests fail to check this
path, but xfstests immediately trips over it. No excuse for missing
this before submitting the 4.12 pull request.
These all pass the nvdimm unit tests and an xfstests spot check. The
set has received a build success notification from the kbuild robot"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
filesystem-dax: fix broken __dax_zero_page_range() conversion
libnvdimm, btt: ensure that initializing metadata clears poison
libnvdimm: add an atomic vs process context flag to rw_bytes
x86, pmem: Fix cache flushing for iovec write < 8 bytes
device-dax: kill NR_DEV_DAX
block, dax: move "select DAX" from BLOCK to FS_DAX
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For configurations that do not enable DAX filesystems or drivers, do not
require the DAX core to be built.
Given that the 'direct_access' method has been removed from
'block_device_operations', we can also go ahead and remove the
block-related dax helper functions from fs/block_dev.c to
drivers/dax/super.c. This keeps dax details out of the block layer and
lets the DAX core be built as a module in the FS_DAX=n case.
Filesystems need to include dax.h to call bdev_dax_supported().
Cc: linux-xfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Fix typos and add the following to the scripts/spelling.txt:
intialisation||initialisation
intialised||initialised
intialise||initialise
This commit does not intend to change the British spelling itself.
Link: http://lkml.kernel.org/r/1481573103-11329-18-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
__vmalloc* allows users to provide gfp flags for the underlying
allocation. This API is quite popular
$ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
77
The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space. About half of users don't
use this flag, though. This signals that we make the API unnecessarily
too complex.
This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
are simplified and drop the flag.
Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Cristopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull xfs updates from Darrick Wong:
"Here are the XFS changes for 4.12. The big new feature for this
release is the new space mapping ioctl that we've been discussing
since LSF2016, but other than that most of the patches are larger bug
fixes, memory corruption prevention, and other cleanups.
Summary:
- various code cleanups
- introduce GETFSMAP ioctl
- various refactoring
- avoid dio reads past eof
- fix memory corruption and other errors with fragmented directory blocks
- fix accidental userspace memory corruptions
- publish fs uuid in superblock
- make fstrim terminatable
- fix race between quotaoff and in-core inode creation
- avoid use-after-free when finishing up w/ buffer heads
- reserve enough space to handle bmap tree resizing during cow remap"
* tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
xfs: fix use-after-free in xfs_finish_page_writeback
xfs: reserve enough blocks to handle btree splits when remapping
xfs: wait on new inodes during quotaoff dquot release
xfs: update ag iterator to support wait on new inodes
xfs: support ability to wait on new inodes
xfs: publish UUID in struct super_block
xfs: Allow user to kill fstrim process
xfs: better log intent item refcount checking
xfs: fix up quotacheck buffer list error handling
xfs: remove xfs_trans_ail_delete_bulk
xfs: don't use bool values in trace buffers
xfs: fix getfsmap userspace memory corruption while setting OF_LAST
xfs: fix __user annotations for xfs_ioc_getfsmap
xfs: corruption needs to respect endianess too!
xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
xfs: simplify validation of the unwritten extent bit
xfs: remove unused values from xfs_exntst_t
xfs: remove the unused XFS_MAXLINK_1 define
xfs: more do_div cleanups
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 28b783e47ad7 ("xfs: bufferhead chains are invalid after
end_page_writeback") fixed one use-after-free issue by
pre-calculating the loop conditionals before calling bh->b_end_io()
in the end_io processing loop, but it assigned 'next' pointer before
checking end offset boundary & breaking the loop, at which point the
bh might be freed already, and caused use-after-free.
This is caught by KASAN when running fstests generic/127 on sub-page
block size XFS.
[ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50
[ 2747.868840] ==================================================================
[ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698
...
[ 2747.918245] Call Trace:
[ 2747.920975] dump_stack+0x63/0x84
[ 2747.924673] kasan_object_err+0x21/0x70
[ 2747.928950] kasan_report+0x271/0x530
[ 2747.933064] ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.938409] ? end_page_writeback+0xce/0x110
[ 2747.943171] __asan_report_load8_noabort+0x19/0x20
[ 2747.948545] xfs_destroy_ioend+0x3d3/0x4e0 [xfs]
[ 2747.953724] xfs_end_io+0x1af/0x2b0 [xfs]
[ 2747.958197] process_one_work+0x5ff/0x1000
[ 2747.962766] worker_thread+0xe4/0x10e0
[ 2747.966946] kthread+0x2d3/0x3d0
[ 2747.970546] ? process_one_work+0x1000/0x1000
[ 2747.975405] ? kthread_create_on_node+0xc0/0xc0
[ 2747.980457] ? syscall_return_slowpath+0xe6/0x140
[ 2747.985706] ? do_page_fault+0x30/0x80
[ 2747.989887] ret_from_fork+0x2c/0x40
[ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104
[ 2748.001155] Allocated:
[ 2748.003782] PID = 8327
[ 2748.006411] save_stack_trace+0x1b/0x20
[ 2748.010688] save_stack+0x46/0xd0
[ 2748.014383] kasan_kmalloc+0xad/0xe0
[ 2748.018370] kasan_slab_alloc+0x12/0x20
[ 2748.022648] kmem_cache_alloc+0xb8/0x1b0
[ 2748.027024] alloc_buffer_head+0x22/0xc0
[ 2748.031399] alloc_page_buffers+0xd1/0x250
[ 2748.035968] create_empty_buffers+0x30/0x410
[ 2748.040730] create_page_buffers+0x120/0x1b0
[ 2748.045493] __block_write_begin_int+0x17a/0x1800
[ 2748.050740] iomap_write_begin+0x100/0x2f0
[ 2748.055308] iomap_zero_range_actor+0x253/0x5c0
[ 2748.060362] iomap_apply+0x157/0x270
[ 2748.064347] iomap_zero_range+0x5a/0x80
[ 2748.068624] iomap_truncate_page+0x6b/0xa0
[ 2748.073227] xfs_setattr_size+0x1f7/0xa10 [xfs]
[ 2748.078312] xfs_vn_setattr_size+0x68/0x140 [xfs]
[ 2748.083589] xfs_file_fallocate+0x4ac/0x820 [xfs]
[ 2748.088838] vfs_fallocate+0x2cf/0x780
[ 2748.093021] SyS_fallocate+0x48/0x80
[ 2748.097006] do_syscall_64+0x18a/0x430
[ 2748.101186] return_from_SYSCALL_64+0x0/0x6a
[ 2748.105948] Freed:
[ 2748.108189] PID = 8327
[ 2748.110816] save_stack_trace+0x1b/0x20
[ 2748.115093] save_stack+0x46/0xd0
[ 2748.118788] kasan_slab_free+0x73/0xc0
[ 2748.122969] kmem_cache_free+0x7a/0x200
[ 2748.127247] free_buffer_head+0x41/0x80
[ 2748.131524] try_to_free_buffers+0x178/0x250
[ 2748.136316] xfs_vm_releasepage+0x2e9/0x3d0 [xfs]
[ 2748.141563] try_to_release_page+0x100/0x180
[ 2748.146325] invalidate_inode_pages2_range+0x7da/0xcf0
[ 2748.152087] xfs_shift_file_space+0x37d/0x6e0 [xfs]
[ 2748.157557] xfs_collapse_file_space+0x49/0x120 [xfs]
[ 2748.163223] xfs_file_fallocate+0x2a7/0x820 [xfs]
[ 2748.168462] vfs_fallocate+0x2cf/0x780
[ 2748.172642] SyS_fallocate+0x48/0x80
[ 2748.176629] do_syscall_64+0x18a/0x430
[ 2748.180810] return_from_SYSCALL_64+0x0/0x6a
Fixed it by checking on offset against end & breaking out first,
dereference bh only if there're still bufferheads to process.
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
handle adding 1 extent. This is problematic if we fragment free space,
have to do CoW, and then have to perform multiple bmap btree expansions.
Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
handle btree splits, so log recovery fails after our first error causes
the filesystem to go down.
Therefore, refactor the transaction block reservation macros until we
have a macro that works for our deferred (re)mapping activities, and fix
both problems by using that macro.
With 1k blocks we can hit this fairly often in g/187 if the scratch fs
is big enough.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.
To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.
Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.
The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Copy the uuid of the filesystem to struct super_block s_uuid field,
as several other filesystems already do. Copy regardless of the nouuid
mount option, because other filesystems also do not guaranty uniqueness
of the s_uuid field in super_block struct.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
fstrim can take really long time on big, slow device or on file system
with a lots of allocation groups. Currently there is no way for the user
to cancell the operation. This patch makes it possible for the user to
kill fstrim pocess by adding the check for fatal_signal_pending() in
xfs_trim_extents().
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use ASSERTs on the log intent item refcounts so that we fail noisily if
anyone tries to double-free the item.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.
Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
xfs_iflush_done uses an on-stack variable length array to pass the log
items to be deleted to xfs_trans_ail_delete_bulk. On-stack VLAs are a
nasty gcc extension that can lead to unbounded stack allocations, but
fortunately we can easily avoid them by simply open coding
xfs_trans_ail_delete_bulk in xfs_iflush_done, which is the only caller
of it except for the single-item xfs_trans_ail_delete.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Using bool values produces sparse warnings of this form:
fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2307:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
Just use a char instead to fix those up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
At the end of a getfsmap call, we will set FMR_OF_LAST in the last
struct fsmap that was handed in by userspace if we've truly run out of
space mapping record (as opposed to simply running out of space in the
user array). Unfortunately, fmh_entries is the wrong check for whether
or not we've filled out anything in the user array because the ioctl
provides that fmh_count==0 sets fmh_entries without filling out the user
array. Therefore we end up writing things into user memory areas that we
weren't given, and kaboom.
Since Christoph amended the getfsmap structure to track the number of
fsmap entries we've actually filled out, use that as part of deciding if
we have to set the OF_LAST flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
By passing the whole fsmap_head structure and an index we can get the
user point annotations right for the embedded variable sized array
in struct fsmap_head.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change idx to unsigned int]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
At least if we want to be able to recognize the pattern. Add a missing
byte swap to the corruption injection case in xlog_sync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Found by sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
XFS only supports the unwritten extent bit in the data fork, and only if
the file system has a version 5 superblock or the unwritten extent
feature bit.
We currently have two routines that validate the invariant:
xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
and xfs_validate_extent that triggers and assert in debug build.
Both of them iterate over all extents of an inode fork when called,
which isn't very efficient.
This patch instead adds a new helper that verifies the invariant one
extent at a time, and calls it from the places where we iterate over
all extents to converted them from or two the in-memory format. The
callers then return -EFSCORRUPTED when reading invalid extents from
disk, or trigger an assert when writing them to disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We only ever use the normal and unwritten states. And the actual
ondisk format (this enum isn't despite being in xfs_format.h) only
has space for the unwritten bit anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On some architectures do_div does the pointer compare
trick to make sure that we've sent it an unsigned 64-bit
number. (Why unsigned? I don't know.)
Fix up the few places that squawk about this; in
xfs_bmap_wants_extents() we just used a bare int64_t so change
that to unsigned.
In xfs_adjust_extent_unmap_boundaries() all we wanted was the
mod, and we have an xfs-specific function to handle that w/o
side effects, which includes proper casting for do_div.
In xfs_daddr_to_ag[b]no, we were using the wrong type anyway;
XFS_BB_TO_FSBT returns a block in the filesystem, so use
xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness
from that type as a bonus.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The kbuild test robot caught this; in debug code we have another
caller of do_div with a 32-bit dividend (j) which is caught now
that we are using the kernel-supplied do_div.
None of the values used here are 64-bit; just use simple division.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The trailing newlines wil lead to extra newlines in the trace file
which looks like the following output, so remove them.
>kworker/4:1H-1508 [004] .... 47879.101608: xfs_discard_extent: dev 8:0
>
>kworker/u16:2-238 [004] .... 47879.101725: xfs_extent_busy_clear: dev 8:0
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix the getfsmap tracepoints too]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.
This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7]: 88..95 0 (88..95) 8
1: [8..15]: 80..87 0 (80..87) 8
2: [16..39]: 168..191 0 (168..191) 24
3: [40..63]: 5242952..5242975 1 (72..95) 24
Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.
Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.
Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Carlos had a case where "find" seemed to start spinning
forever and never return.
This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:
0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...
i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.
At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size. During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes. If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.
At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.
The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block. This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.
The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.
There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.
Huge appreciation for Carlos for debugging and isolating
the problem.
Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that reflink operations don't set the firstblock value we don't
need the workarounds for non-NULL firstblock values without a prior
allocation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar
to what we do in the space allocator. But the reflink code doesn't touch
the allocation btree unlike the normal space allocator, so we couldn't
care less about the state of the AGFL.
So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in
the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a new helper to be used for reflink extent list additions instead of
funneling them through xfs_bmapi_write and overloading the firstblock
member in struct xfs_bmalloca and struct xfs_alloc_args.
With some small changes to xfs_bmap_remap_alloc this also means we do
not need a xfs_bmalloca structure for this case at all.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For the reflink case we'd much rather pass the required arguments than
faking up a struct xfs_bmalloca.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We never do COW operations for the attr fork, so don't pretend we handle
them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Lockdep complains about use of the iolock in inode reclaim context
because it doesn't understand that reclaim has the last reference to
the inode, and thus an iolock->reclaim->iolock deadlock is not
possible.
The iolock is technically not necessary in xfs_inactive() and was
only added to appease an assert in xfs_free_eofblocks(), which can
be called from other non-reclaim contexts. Therefore, just kill the
assert and drop the use of the iolock from reclaim context to quiet
lockdep.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Long ago, all this gunk was added with a lament about problems
with gcc's do_div, and a fun recommendation in the changelog:
egcs-2.91.66 is the recommended compiler version for building XFS.
All this special stuff was needed to work around an old gcc bug,
apparently, and it's been there ever since.
There should be no need for this anymore, so remove it.
Remove the special 32-bit xfs_do_mod as well; just let the
kernel's do_div() handle all this.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ndquots is a 32-bit value, and we don't care
about the remainder; there is no reason to use do_div
here, it seems to be the result of a decade+ historical
accident.
Worse, the do_div implementation in userspace breaks
when fed a 32-bit dividend, so we commented it out there
in any case.
Change to simple division, and then we can change
userspace to match, and mandate a 64-bit dividend in
the do_div() in userspace as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Apparently FIEMAP for xattrs has been broken since we switched to
the iomap backend because of an incorrect check for xattr presence.
Also fix the broken locking.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
No one cares about the low-level helper anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use the realtime bitmap to return free space information via getfsmap.
Eventually this will be superseded by the realtime rmapbt code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the reverse-mapping btree isn't available, fall back to the
free space btrees to provide partial reverse mapping information.
The online scrub tool can make use of even partial information to
speed up the data block scan.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add _query_range and _query_all functions to the realtime bitmap
allocator. These two functions are similar in usage to the btree
functions with the same name and will be used for getfsmap and scrub.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Create a helper function that will query all records in a btree.
This will be used by the online repair functions to examine every
record in a btree to rebuild a second btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Implement a query_range function for the bnobt and cntbt. This will
be used for getfsmap fallback if there is no rmapbt and by the online
scrub and repair code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the free space btrees. Remove the debugging asserts
so that we can make queries starting from block 0.
While we're at it, merge the redundant "if (btnum ==" hunks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel. struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This function has been removed ever since at least 3.12-era. No need to
keep its declaration in the header so nuke it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
"xfs_iread: validation failed for inode 96 failed"
One "failed" seems like enough.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|