| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 64a93dbf25d3a1368bb58ddf0f61d0a92d7479e3 ]
Partially revert commit 2ce209c42c01 ("NFS: Wait for requests that are
locked on the commit list"), since it can lead to deadlocks between
commit requests and nfs_join_page_group().
For now we should assume that any locked requests on the commit list are
either about to be removed and committed by another task, or the writes
they describe are about to be retransmitted. In either case, we should
not need to worry.
Fixes: 2ce209c42c01 ("NFS: Wait for requests that are locked on the commit list")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 08ca8b21f760c0ed5034a5c122092eec22ccf8f4 ]
When a subrequest is being detached from the subgroup, we want to
ensure that it is not holding the group lock, or in the process
of waiting for the group lock.
Fixes: 5b2b5187fa85 ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit add42de31721fa29ed77a7ce388674d69f9d31a4 upstream.
When we detach a subrequest from the list, we must also release the
reference it holds to the parent.
Fixes: 5b2b5187fa85 ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 221203ce6406273cf00e5c6397257d986c003ee6 upstream.
Instead of making assumptions about the commit verifier contents, change
the commit code to ensure we always check that the verifier was set
by the XDR code.
Fixes: f54bcf2ecee9 ("pnfs: Prepare for flexfiles by pulling out common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 0df68ced55443243951d02cc497be31fadf28173 upstream.
If we suffer a fatal error upon writing a file, which causes us to
need to revalidate the entire mapping, then we should also revalidate
the file size.
Fixes: d2ceb7e57086 ("NFS: Don't use page_file_mapping after removing the page")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 14bebe3c90b326d2a0df78aed5e9de090c71d878 ]
When flushing out dirty pages, the fact that we may hit fatal errors
is not a reason to stop writeback. Those errors are reported through
fsync(), not through the flush mechanism.
Fixes: a6598813a4c5b ("NFS: Don't write back further requests if there...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 33ea5aaa87cdae0f9af4d6b7ee4f650a1a36fd1d ]
When xfstests testing, there are some WARNING as below:
WARNING: CPU: 0 PID: 6235 at fs/nfs/inode.c:122 nfs_clear_inode+0x9c/0xd8
Modules linked in:
CPU: 0 PID: 6235 Comm: umount.nfs
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : nfs_clear_inode+0x9c/0xd8
lr : nfs_evict_inode+0x60/0x78
sp : fffffc000f68fc00
x29: fffffc000f68fc00 x28: fffffe00c53155c0
x27: fffffe00c5315000 x26: fffffc0009a63748
x25: fffffc000f68fd18 x24: fffffc000bfaaf40
x23: fffffc000936d3c0 x22: fffffe00c4ff5e20
x21: fffffc000bfaaf40 x20: fffffe00c4ff5d10
x19: fffffc000c056000 x18: 000000000000003c
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000040 x14: 0000000000000228
x13: fffffc000c3a2000 x12: 0000000000000045
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 0000000000000000 x6 : fffffc00084b027c
x5 : fffffc0009a64000 x4 : fffffe00c0e77400
x3 : fffffc000c0563a8 x2 : fffffffffffffffb
x1 : 000000000000764e x0 : 0000000000000001
Call trace:
nfs_clear_inode+0x9c/0xd8
nfs_evict_inode+0x60/0x78
evict+0x108/0x380
dispose_list+0x70/0xa0
evict_inodes+0x194/0x210
generic_shutdown_super+0xb0/0x220
nfs_kill_super+0x40/0x88
deactivate_locked_super+0xb4/0x120
deactivate_super+0x144/0x160
cleanup_mnt+0x98/0x148
__cleanup_mnt+0x38/0x50
task_work_run+0x114/0x160
do_notify_resume+0x2f8/0x308
work_pending+0x8/0x14
The nrequest should be increased/decreased only if PG_INODE_REF flag
was setted.
But in the nfs_inode_remove_request function, it maybe decrease when
no PG_INODE_REF flag, this maybe lead nrequests count error.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit df3accb849607a86278a37c35e6b313635ccc48b ]
Allow the caller to pass error information when cleaning up a failed
I/O request so that we can conditionally take action to cancel the
request altogether if the error turned out to be fatal.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d2ceb7e57086750ea6198a31fd942d98099a0786 ]
If nfs_page_async_flush() removes the page from the mapping, then we can't
use page_file_mapping() on it as nfs_updatepate() is wont to do when
receiving an error. Instead, push the mapping to the stack before the page
is possibly truncated.
Fixes: 8fc75bed96bb ("NFS: Fix up return value on fatal errors in nfs_page_async_flush()")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 8fc75bed96bb94e23ca51bd9be4daf65c57697bf upstream.
Ensure that we return the fatal error value that caused us to exit
nfs_page_async_flush().
Fixes: c373fff7bd25 ("NFSv4: Don't special case "launder"")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
| |
If the writes are being rescheduled due to a pNFS error, then we really
want to immediately start a new flush. The O_DIRECT code already does
this, so we only need to worry about buffered writes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
|
|
|
|
|
|
| |
Rather than doing this in the generic NFS client code. Let's put this
with the other v4 stuff so it's all in one place.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
|
|
|
|
|
|
| |
This doesn't really need to be in the generic NFS client code, and I
think it makes more sense to keep the v4 code in one place.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- xprtrdma: Fix corner cases when handling device removal # v4.12+
- xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+
Features:
- New sunrpc tracepoint for RPC pings
- Finer grained NFSv4 attribute checking
- Don't unnecessarily return NFS v4 delegations
Other bugfixes and cleanups:
- Several other small NFSoRDMA cleanups
- Improvements to the sunrpc RTT measurements
- A few sunrpc tracepoint cleanups
- Various fixes for NFS v4 lock notifications
- Various sunrpc and NFS v4 XDR encoding cleanups
- Switch to the ida_simple API
- Fix NFSv4.1 exclusive create
- Forget acl cache after setattr operation
- Don't advance the nfs_entry readdir cookie if xdr decoding fails"
* tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits)
NFS: advance nfs_entry cookie only after decoding completes successfully
NFSv3/acl: forget acl cache after setattr
NFSv4.1: Fix exclusive create
NFSv4: Declare the size up to date after it was set.
nfs: Use ida_simple API
NFSv4: Fix the nfs_inode_set_delegation() arguments
NFSv4: Clean up CB_GETATTR encoding
NFSv4: Don't ask for attributes when ACCESS is protected by a delegation
NFSv4: Add a helper to encode/decode struct timespec
NFSv4: Clean up encode_attrs
NFSv4; Clean up XDR encoding of type bitmap4
NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group
SUNRPC: Add a helper for encoding opaque data inline
SUNRPC: Add helpers for decoding opaque and string types
NFSv4: Ignore change attribute invalidations if we hold a delegation
NFS: More fine grained attribute tracking
NFS: Don't force unnecessary cache invalidation in nfs_update_inode()
NFS: Don't redirty the attribute cache in nfs_wcc_update_inode()
NFS: Don't force a revalidation of all attributes if change is missing
NFS: Convert NFS_INO_INVALID flags to unsigned long
...
|
| |
| |
| |
| |
| |
| |
| |
| | |
When we've changed the file size, then ensure we declare it to be
up to date in the inode attributes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by
a call to nfs_post_op_update_inode_locked(), then it will not be cleared
until all the attributes have been revalidated. This means, for instance,
that NFSv4 writes will always force a full attribute revalidation.
Track the ctime, mtime, size and change attribute separately from the
other attributes so that we can have nfs_post_op_update_inode_locked()
set them correctly, and later have the cache consistency bitmask be
able to clear them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull wait_var_event updates from Ingo Molnar:
"This introduces the new wait_var_event() API, which is a more flexible
waiting primitive than wait_on_atomic_t().
All wait_on_atomic_t() users are migrated over to the new API and
wait_on_atomic_t() is removed. The migration fixes one bug and should
result in no functional changes for the other usecases"
* 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/wait: Improve __var_waitqueue() code generation
sched/wait: Remove the wait_on_atomic_t() API
sched/wait, arch/mips: Fix and convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/ocfs2: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/fscache: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/btrfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/afs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, drivers/media: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait: Introduce wait_var_event()
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
wait_var_event() API
The old wait_on_atomic_t() is going to get removed, use the more
flexible wait_var_event() API instead.
No change in functionality.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
ensure that all outstanding COMMIT requests to the inode in question are
complete. Currently we may exit early from both nfs_commit_inode() and
nfs_write_inode() even if there are COMMIT requests in flight, or unstable
writes on the commit list.
In order to get the right semantics w.r.t. sync_inode(), we don't need
to have nfs_commit_inode() reset the inode dirty flags when called from
nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that
nfs_write_inode() leaves them in the right state if there are outstanding
commits, or stable pages.
Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: dc4fd9ab01ab ("nfs: don't wait on commit in nfs_commit_inode()...")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable bugfixes:
- Fix breakages in the nfsstat utility due to the inclusion of the
NFSv4 LOOKUPP operation
- Fix a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall()
due to nfs_idmap_legacy_upcall() being called without an 'aux'
parameter
- Fix a refcount leak in the standard O_DIRECT error path
- Fix a refcount leak in the pNFS O_DIRECT fallback to MDS path
- Fix CPU latency issues with nfs_commit_release_pages()
- Fix the LAYOUTUNAVAILABLE error case in the file layout type
- NFS: Fix a race between mmap() and O_DIRECT
Features:
- Support the statx() mask and query flags to enable optimisations
when the user is requesting only attributes that are already up to
date in the inode cache, or is specifying the AT_STATX_DONT_SYNC
flag
- Add a module alias for the SCSI pNFS layout type
Bugfixes:
- Automounting when resolving a NFSv4 referral should preserve the
RDMA transport protocol settings
- Various other RDMA bugfixes from Chuck
- pNFS block layout fixes
- Always set NFS_LOCK_LOST when a lock is lost"
* tag 'nfs-for-4.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (69 commits)
NFS: Fix a race between mmap() and O_DIRECT
NFS: Remove a redundant call to unmap_mapping_range()
pnfs/blocklayout: Ensure disk address in block device map
pnfs/blocklayout: pnfs_block_dev_map uses bytes, not sectors
lockd: Fix server refcounting
SUNRPC: Fix null rpc_clnt dereference in rpc_task_queued tracepoint
SUNRPC: Micro-optimize __rpc_execute
SUNRPC: task_run_action should display tk_callback
sunrpc: Format RPC events consistently for display
SUNRPC: Trace xprt_timer events
xprtrdma: Correct some documenting comments
xprtrdma: Fix "bytes registered" accounting
xprtrdma: Instrument allocation/release of rpcrdma_req/rep objects
xprtrdma: Add trace points to instrument QP and CQ access upcalls
xprtrdma: Add trace points in the client-side backchannel code paths
xprtrdma: Add trace points for connect events
xprtrdma: Add trace points to instrument MR allocation and recovery
xprtrdma: Add trace points to instrument memory invalidation
xprtrdma: Add trace points in reply decoder path
xprtrdma: Add trace points to instrument memory registration
..
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The commit list can get very large, and so we need a cond_resched()
in nfs_commit_release_pages() in order to ensure we don't hog the CPU
for excessive periods of time.
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull inode->i_version rework from Jeff Layton:
"This pile of patches is a rework of the inode->i_version field. We
have traditionally incremented that field on every inode data or
metadata change. Typically this increment needs to be logged on disk
even when nothing else has changed, which is rather expensive.
It turns out though that none of the consumers of that field actually
require this behavior. The only real requirement for all of them is
that it be different iff the inode has changed since the last time the
field was checked.
Given that, we can optimize away most of the i_version increments and
avoid dirtying inode metadata when the only change is to the i_version
and no one is querying it. Queries of the i_version field are rather
rare, so we can help write performance under many common workloads.
This patch series converts existing accesses of the i_version field to
a new API, and then converts all of the in-kernel filesystems to use
it. The last patch in the series then converts the backend
implementation to a scheme that optimizes away a large portion of the
metadata updates when no one is looking at it.
In my own testing this series significantly helps performance with
small I/O sizes. I also got this email for Christmas this year from
the kernel test robot (a 244% r/w bandwidth improvement with XFS over
DAX, with 4k writes):
https://lkml.org/lkml/2017/12/25/8
A few of the earlier patches in this pile are also flowing to you via
other trees (mm, integrity, and nfsd trees in particular)".
* tag 'iversion-v4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: (22 commits)
fs: handle inode->i_version more efficiently
btrfs: only dirty the inode in btrfs_update_time if something was changed
xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing
fs: only set S_VERSION when updating times if necessary
IMA: switch IMA over to new i_version API
xfs: convert to new i_version API
ufs: use new i_version API
ocfs2: convert to new i_version API
nfsd: convert to new i_version API
nfs: convert to new i_version API
ext4: convert to new i_version API
ext2: convert to new i_version API
exofs: switch to new i_version API
btrfs: convert to new i_version API
afs: convert to new i_version API
affs: convert to new i_version API
fat: convert to new i_version API
fs: don't take the i_lock in inode_inc_iversion
fs: new API for handling inode->i_version
ntfs: remove i_version handling
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For NFS, we just use the "raw" API since the i_version is mostly
managed by the server. The exception there is when the client
holds a write delegation, but we only need to bump it once
there anyway to handle CB_GETATTR.
Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If there were no commit requests, then nfs_commit_inode() should not
wait on the commit or mark the inode dirty, otherwise the following
BUG_ON can be triggered:
[ 1917.130762] kernel BUG at fs/inode.c:578!
[ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
[ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1
[ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000
[ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80
[ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64)
[ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 22002822 XER: 20000000
[ 1917.130828] CFAR: c00000000011f594 SOFTE: 1
GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
[ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350
[ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350
[ 1917.130873] Call Trace:
[ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable)
[ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270
[ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0
[ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
[ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
[ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180
[ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150
[ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100
[ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
[ 1917.130919] Instruction dump:
[ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
[ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org # 4.5+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
|
|
|
|
|
|
|
|
| |
Add a jump target so that a bit of exception handling can be better reused
at the end of this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1/ remove 'start' and 'end' args from nfs_file_fsync_commit().
They aren't used.
2/ Make nfs_context_set_write_error() a "static inline" in internal.h
so we can...
3/ Use nfs_context_set_write_error() instead of mapping_set_error()
if nfs_pageio_add_request() fails before sending any request.
NFS generally keeps errors in the open_context, not the mapping,
so this is more consistent.
4/ If filemap_write_and_write_range() reports any error, still
check ctx->error. The value in ctx->error is likely to be
more useful. As part of this, NFS_CONTEXT_ERROR_WRITE is
cleared slightly earlier, before nfs_file_fsync_commit() is called,
rather than at the start of that function.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tools like tcpdump and rpcdebug can be very useful. But there are
plenty of environments where they are difficult or impossible to
use. For example, we've had customers report I/O failures during
workloads so heavy that collecting network traffic or enabling
RPC debugging are themselves onerous.
The kernel's static tracepoints are lightweight (less likely to
introduce timing changes) and efficient (the trace data is compact).
They also work in scenarios where capturing network traffic is not
possible due to lack of hardware support (some InfiniBand HCAs) or
where data or network privacy is a concern.
Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT
is initiated, and when it completes. Record the arguments and
results of each operation, which are not shown by existing sunrpc
module's tracepoints.
For instance, the recorded offset and count can be used to match an
"initiate" event to a "done" event. If an NFS READ result returns
fewer bytes than requested or zero, seeing the EOF flag can be
probative. Seeing an NFS4ERR_BAD_STATEID result is also indication
of a particular class of problems. The timing information attached
to each event record can often be useful as well.
Usage example:
[root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done
/sys/kernel/debug/tracing/events/nfs/*initiate*/filter
/sys/kernel/debug/tracing/events/nfs/*done/filter
Hit Ctrl^C to stop recording
^CKernel buffer statistics:
Note: "entries" are the entries left in the kernel ring buffer and are not
recorded in the trace data. They should all be zero.
CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 3680
oldest event ts: 78.367422
now ts: 100.124419
dropped events: 0
read events: 74
... and so on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
|
|
|
|
|
|
| |
If we skip a subrequest due to a zero refcount, we should still count
the byte range that it covered so that we accurately reconstruct the
original request size.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
|
|
|
|
|
|
|
| |
That can deadlock if this is the last reference since
nfs_page_group_destroy() calls nfs_page_group_sync_on_bit().
Note that even if the page was removed from the subpage list,
the req->wb_head could still be pointing to the old head.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
|
|
|
|
|
| |
It's pretty much a duplicate of nfs_scan_commit_list() that also
clears the PG_COMMIT_TO_DS flag.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
|
|
|
|
|
|
|
| |
Since the commit list is not ordered, it is possible for nfs_scan_commit_list
to hold a request that nfs_lock_and_join_requests() is waiting for, while
at the same time trying to grab a request that nfs_lock_and_join_requests
already holds.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit fbe77c30e9ab ("NFS: move rw_mode to nfs_pageio_header")
reintroduced some pointless code that commit 518662e0fcb9 ("NFS: fix
usage of mempools.") had recently removed.
Remove it again.
Cc: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| | |
Now that the mirror allocation has been moved, the parameter can go.
Also remove the redundant symbol export.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
If a request is on the commit list, but is locked, we will currently skip
it, which can lead to livelocking when the commit count doesn't reduce
to zero.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
Switch from using the inode->i_lock for this to avoid contention with
other metadata manipulation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| | |
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
Rather than forcing us to take the inode->i_lock just in order to bump
the number.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
The commit lists can get very large, so using the inode->i_lock can
end up affecting general metadata performance.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Split out the 2 cases so that we can treat the locking differently.
The issue is that the locking in the pageswapcache cache is highly
linked to the commit list locking.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
Hide the locking from nfs_lock_and_join_requests() so that we can
separate out the requirements for swapcache pages.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Fix up the test in nfs_page_group_covers_page(). The simplest implementation
is to check that we have a set of intersecting or contiguous subrequests
that connect page offset 0 to nfs_page_length(req->wb_page).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
nfs_page_group_lock() is now always called with the 'nonblock'
parameter set to 'false'.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
At this point, we only expect ever to potentially see PG_REMOVE and
PG_TEARDOWN being set on the subrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since nfs_page_group_destroy() does not take any locks on the requests
to be freed, we need to ensure that we don't inadvertently free the
request in nfs_destroy_unlinked_subrequests() while the last reference
is being released elsewhere.
Do this by:
1) Taking a reference to the request unless it is already being freed
2) Checking (under the page group lock) if PG_TEARDOWN is already set before
freeing an unreferenced request in nfs_destroy_unlinked_subrequests()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When locking the entire group in order to remove subrequests,
the locks are always taken in order, and with the page group
lock being taken after the page head is locked. The intention
is that:
1) The lock on the group head guarantees that requests may not
be removed from the group (although new entries could be appended
if we're not holding the group lock).
2) It is safe to drop and retake the page group lock while iterating
through the list, in particular when waiting for a subrequest lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We should no longer need the inode->i_lock, now that we've
straightened out the request locking. The locking schema is now:
1) Lock page head request
2) Lock the page group
3) Lock the subrequests one by one
Note that there is a subtle race with nfs_inode_remove_request() due
to the fact that the latter does not lock the page head, when removing
it from the struct page. Only the last subrequest is locked, hence
we need to re-check that the PagePrivate(page) is still set after
we've locked all the subrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| | |
nfs_try_to_update_request() should be able to cope now.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| | |
Simplify the code, and avoid some flushes to disk.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests()
manipulate the inode flags adjusting the NFS_I(inode)->nrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|