summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | | NFS append COMMIT after synchronous COPYOlga Kornievskaia2017-05-084-39/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of messing with the commit path which has been causing issues, add a COMMIT op after the COPY and ask for stable copies in the first space. It saves a round trip, since after the COPY, the client sends a COMMIT anyway. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFSv4: Fix exclusive create attributes encodingTrond Myklebust2017-05-081-40/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using NFS4_CREATE_EXCLUSIVE4_1 mode, the client will overestimate the amount of space that it needs for the attributes because it does so before checking whether or not the server supports a given attribute. Fix by checking the attribute mask earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFSv4: Fix an rcu lock leakTrond Myklebust2017-05-081-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intention in the original patch was to release the lock when we put the inode, however something got screwed up. Reported-by: Jason Yan <yanaijie@huawei.com> Fixes: 7b410d9ce460f ("pNFS: Delay getting the layout header in..") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | nfs: use kmap/kunmap directlyFabian Frederick2017-05-051-55/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes useless nfs_readdir_get_array() and nfs_readdir_release_array() as suggested by Trond Myklebust nfs_readdir() calls nfs_revalidate_mapping() before readdir_search_pagecache() , nfs_do_filldir(), uncached_readdir() so mapping should be correct. While kmap() can't fail, all subsequent error checks were removed as well as unused labels. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: always treat the invocation of nfs_getattr as cache hit when noac is onHou Tao2017-05-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using 'ls -l' to display a large directory, if noac option is used, in function nfs_getattr() nfs_need_revalidate_inode() will always be true for NFSv3 and the nfs_entry cache of the directory will be flushed. The flush will lead to a fully reread of the directory entries from server. To prevent the unnecessary RPCs, we need to check whether or not the noac option is used, and always report the invocation of nfs_getattr() as cache hit instead cache miss when it's on. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and ↵Dave Wysochanski2017-05-051-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfs4_proc_async_renew If memory allocation fails for the callback data, we need to put the nfs_client or we end up with an elevated refcount. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSIONTrond Myklebust2017-05-052-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION because we are trunking, then RECLAIM_COMPLETE must handle that by calling nfs4_schedule_session_recovery() and then retrying. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
| * | | | | | | | | pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commitsFred Isaman2017-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Fix a typo in pnfs_generic_alloc_ds_commitsTrond Myklebust2017-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the layout segment is invalid, we want to just resend the remaining writes. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Fix a deadlock when coalescing writes and returning the layoutTrond Myklebust2017-05-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following deadlock: Process P1 Process P2 Process P3 ========== ========== ========== lock_page(page) lseg = pnfs_update_layout(inode) lo = NFS_I(inode)->layout pnfs_error_mark_layout_for_return(lo) lock_page(page) lseg = pnfs_update_layout(inode) In this scenario, - P1 has declared the layout to be in error, but P2 holds a reference to a layout segment on that inode, so the layoutreturn is deferred. - P2 is waiting for a page lock held by P3. - P3 is asking for a new layout segment, but is blocked waiting for the layoutreturn. The fix is to ensure that pnfs_error_mark_layout_for_return() does not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow the latter to call LAYOUTGET so that it can make progress and unblock P2. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Don't clear the layout return info if there are segments to returnTrond Myklebust2017-05-021-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout return info if there are new segments queued for return due to, for instance, a race between a LAYOUTRETURN and a failed I/O attempt. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Ensure we commit the layout if it has been invalidatedTrond Myklebust2017-04-293-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the layout is being invalidated on the server, then we must invoke nfs_commit_inode() to ensure any commits to the DS get cleared out. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Don't send COMMITs to the DSes if the server invalidated our layoutTrond Myklebust2017-04-291-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the layout was invalidated, then assume we should requeue all the pending writes for the DS in question. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure pathTrond Myklebust2017-04-292-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the attempt to write through pNFS fails, we need to use the same failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS flag is set or we have sufficient valid DSes, then we must retry through pNFS Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Ensure we check layout validity before marking it for returnTrond Myklebust2017-04-281-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pnfs_error_mark_layout_for_return needs to check that the layout is valid before calling pnfs_set_plh_return_info(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS4.1 handle interrupted slot reuse from ERR_DELAYOlga Kornievskaia2017-04-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the RPC slot was interrupted and server replied to the next operation on the "reused" slot with ERR_DELAY, don't clear out the "interrupted" flag until we properly recover. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFSv4: check return value of xdr_inline_decodePan Bian2017-04-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function xdr_inline_decode() will return a NULL pointer if the input buffer does not have long enough buffer to decode nbytes of data. However, in function decode_op_map(), the return value of xdr_inline_decode() is not validated before it is used. This patch adds a check to the return value of xdr_inline_decode(). Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()Artem Savkov2017-04-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling pnfs_put_lset on an IS_ERR pointer results in a NULL pointer dereference like the one below. At the same time the check of retvalue of filelayout_check_deviceid() sets lseg to error, but does not free it before that. [ 3000.636161] BUG: unable to handle kernel NULL pointer dereference at 000000000000003c [ 3000.636970] IP: pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.637420] PGD 4f23b067 [ 3000.637421] PUD 4a0f4067 [ 3000.637679] PMD 0 [ 3000.637937] [ 3000.638287] Oops: 0000 [#1] SMP [ 3000.638591] Modules linked in: nfs_layout_nfsv41_files nfsv3 nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc arc4 md4 nls_utf8 cifs ccm dns_resolver rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr virtio_balloon ppdev virtio_rng parport_pc i2c_piix4 parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_intel ata_piix ttm libata drm serio_raw [ 3000.645245] i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: xt_u32] [ 3000.646360] CPU: 1 PID: 26402 Comm: date Not tainted 4.11.0-rc7.1.el7.test.x86_64 #1 [ 3000.647092] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 3000.647638] task: ffff8800415ada00 task.stack: ffffc90000ff0000 [ 3000.648207] RIP: 0010:pnfs_put_lseg+0x29/0x100 [nfsv4] [ 3000.648696] RSP: 0018:ffffc90000ff39b8 EFLAGS: 00010246 [ 3000.649193] RAX: 0000000000000000 RBX: fffffffffffffff4 RCX: 00000000000d43be [ 3000.649859] RDX: 00000000000d43bd RSI: 0000000000000000 RDI: fffffffffffffff4 [ 3000.650530] RBP: ffffc90000ff39d8 R08: 000000000001e320 R09: ffffffffa05c35ce [ 3000.651203] R10: ffff88007fd1e320 R11: ffffea0001283d80 R12: 0000000001400040 [ 3000.651875] R13: ffff88004f77d9f0 R14: ffffc90000ff3cd8 R15: ffff8800417ade00 [ 3000.652546] FS: 00007fac4d5cd740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 [ 3000.653304] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3000.653849] CR2: 000000000000003c CR3: 000000004f080000 CR4: 00000000000406e0 [ 3000.654527] Call Trace: [ 3000.654771] fl_pnfs_update_layout.constprop.20+0x10c/0x150 [nfs_layout_nfsv41_files] [ 3000.655505] filelayout_pg_init_write+0x21d/0x270 [nfs_layout_nfsv41_files] [ 3000.656195] __nfs_pageio_add_request+0x11c/0x490 [nfs] [ 3000.656698] nfs_pageio_add_request+0xac/0x260 [nfs] [ 3000.657180] nfs_do_writepage+0x109/0x2e0 [nfs] [ 3000.657616] nfs_writepages_callback+0x16/0x30 [nfs] [ 3000.658096] write_cache_pages+0x26f/0x510 [ 3000.658495] ? nfs_do_writepage+0x2e0/0x2e0 [nfs] [ 3000.658946] ? _raw_spin_unlock_bh+0x1e/0x20 [ 3000.659357] ? wb_wakeup_delayed+0x5f/0x70 [ 3000.659748] ? __mark_inode_dirty+0x2eb/0x360 [ 3000.660170] nfs_writepages+0x84/0xd0 [nfs] [ 3000.660575] ? nfs_updatepage+0x571/0xb70 [nfs] [ 3000.661012] do_writepages+0x1e/0x30 [ 3000.661358] __filemap_fdatawrite_range+0xc6/0x100 [ 3000.661819] filemap_write_and_wait_range+0x41/0x90 [ 3000.662292] nfs_file_fsync+0x34/0x1f0 [nfs] [ 3000.662704] vfs_fsync_range+0x3d/0xb0 [ 3000.663065] vfs_fsync+0x1c/0x20 [ 3000.663385] nfs4_file_flush+0x57/0x80 [nfsv4] [ 3000.663813] filp_close+0x2f/0x70 [ 3000.664132] __close_fd+0x9a/0xc0 [ 3000.664453] SyS_close+0x23/0x50 [ 3000.664785] do_syscall_64+0x67/0x180 [ 3000.665162] entry_SYSCALL64_slow_path+0x25/0x25 [ 3000.665600] RIP: 0033:0x7fac4d0e1e90 [ 3000.665946] RSP: 002b:00007ffd54e90c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 3000.666679] RAX: ffffffffffffffda RBX: 00007fac4d3b5400 RCX: 00007fac4d0e1e90 [ 3000.667349] RDX: 0000000000000000 RSI: 00007fac4d5d9000 RDI: 0000000000000001 [ 3000.668031] RBP: 0000000000000000 R08: 00007fac4d3b6a00 R09: 00007fac4d5cd740 [ 3000.668709] R10: 00007ffd54e909e0 R11: 0000000000000246 R12: 0000000000000000 [ 3000.669385] R13: 00007fac4d3b5e80 R14: 0000000000000000 R15: 0000000000000000 [ 3000.670061] Code: 00 00 66 66 66 66 90 55 48 85 ff 48 89 e5 41 56 41 55 41 54 53 48 89 fb 0f 84 97 00 00 00 f6 05 16 8f bc ff 10 0f 85 a6 00 00 00 <4c> 8b 63 48 48 8d 7b 38 49 8b 84 24 90 00 00 00 4c 8d a8 88 00 [ 3000.671831] RIP: pnfs_put_lseg+0x29/0x100 [nfsv4] RSP: ffffc90000ff39b8 [ 3000.672462] CR2: 000000000000003c Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFSv4: Don't special case "launder"Trond Myklebust2017-04-262-17/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the client receives a fatal server error from nfs_pageio_add_request(), then we should always truncate the page on which the error occurred. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Add a few more fatal I/O errors to nfs_error_is_fatal()Trond Myklebust2017-04-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EACCES, EDQUOT, EFBIG and ESTALE are all fatal errors as far as NFS I/O is concerned. They need to be reported back to the application. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFSv3: nfs3_nlm_alloc_call should be declared staticTrond Myklebust2017-04-251-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix compiler warnings. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Don't write back further requests if there is a pending write errorTrond Myklebust2017-04-251-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the server has already returned a fatal write error that the user has not yet received on this file, then don't write back the other pages. Instead, act as if they have been sent, and have returned with the same error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Fix use after free issues in pnfs_do_read()Trond Myklebust2017-04-251-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assumption should be that if the caller returns PNFS_ATTEMPTED, then hdr has been consumed, and so we should not be testing hdr->task.tk_status. If the caller returns PNFS_TRY_AGAIN, then we need to recoalesce and free hdr. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Ensure we check layout segment validity in the pg_init() callbackTrond Myklebust2017-04-254-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have a layout segment cached in pgio->pg_lseg, we should check it for validity before reusing it in a new RPC request. Otherwise, if we recoalesce, we can end up looping forever. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Always wait for I/O completion before unlockBenjamin Coddington2017-04-213-6/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS attempts to wait for read and write completion before unlocking in order to ensure that the data returned was protected by the lock. When this waiting is interrupted by a signal, the unlock may be skipped, and messages similar to the following are seen in the kernel ring buffer: [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3: [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183 [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185 For NFSv3, the missing unlock will cause the server to refuse conflicting locks indefinitely. For NFSv4, the leftover lock will be removed by the server after the lease timeout. This patch fixes this issue by skipping the usual wait in nfs_iocounter_wait if the FL_CLOSE flag is set when signaled. Instead, the wait happens in the unlock RPC task on the NFS UOC rpc_waitqueue. For NFSv3, use lockd's new nlmclnt_operations along with nfs_async_iocounter_wait to defer NLM's unlock task until the lock context's iocounter reaches zero. For NFSv4, call nfs_async_iocounter_wait() directly from unlock's current rpc_call_prepare. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | lockd: Introduce nlmclnt_operationsBenjamin Coddington2017-04-215-3/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS would enjoy the ability to modify the behavior of the NLM client's unlock RPC task in order to delay the transmission of the unlock until IO that was submitted under that lock has completed. This ability can ensure that the NLM client will always complete the transmission of an unlock even if the waiting caller has been interrupted with fatal signal. For this purpose, a pointer to a struct nlmclnt_operations can be assigned in a nfs_module's nfs_rpc_ops that will install those nlmclnt_operations on the nlm_host. The struct nlmclnt_operations defines three callback operations that will be used in a following patch: nlmclnt_alloc_call - used to call back after a successful allocation of a struct nlm_rqst in nlmclnt_proc(). nlmclnt_unlock_prepare - used to call back during NLM unlock's rpc_call_prepare. The NLM client defers calling rpc_call_start() until this callback returns false. nlmclnt_release_call - used to call back when the NLM client's struct nlm_rqst is freed. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Add an iocounter wait function for async RPC tasksBenjamin Coddington2017-04-212-1/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By sleeping on a new NFS Unlock-On-Close waitqueue, rpc tasks may wait for a lock context's iocounter to reach zero. The rpc waitqueue is only woken when the open_context has the NFS_CONTEXT_UNLOCK flag set in order to mitigate spurious wake-ups for any iocounter reaching zero. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | locks: Set FL_CLOSE when removing flock locks on close()Benjamin Coddington2017-04-212-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks. NFS will check for this flag to ensure an unlock is sent in a following patch. Fuse handles flock and posix locks differently for FL_CLOSE, and so requires a fixup to retain the existing behavior for flock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Move the flock open mode check into nfs_flock()Benjamin Coddington2017-04-212-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need to check lock exclusive/shared types against open mode when flock() is used on NFS, so move it into the flock-specific path instead of checking it for all locks. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS4: remove a redundant lock range checkBenjamin Coddington2017-04-211-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | flock64_to_posix_lock() is already doing this check Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: unexport nfs4_pnfs_v3_ds_connect_unloadTrond Myklebust2017-04-201-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not used outside the NFSv4 module. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Unexport pnfs_put_lseg_locked and _pnfs_return_layoutTrond Myklebust2017-04-201-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They are not used outside the NFSv4 module. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS: Remove unused layout driver callbacksTrond Myklebust2017-04-202-18/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | encode_layoutreturn and encode_layoutcommit are now unused. Let's remove them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | nfs: remove the objlayout driverChristoph Hellwig2017-04-207-1990/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The objlayout code has been in the tree, but it's been unmaintained and no server product for it actually ever shipped. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | pNFS/flexfiles: Check the result of nfs4_pnfs_ds_connectTrond Myklebust2017-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check in nfs4_ff_layout_prepare_ds() seems to be missing. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: a33e4b036d461 ("pNFS: return status from nfs4_pnfs_ds_connect") Cc: Weston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org # v4.11
| * | | | | | | | | NFSv4: Fix a hang in OPEN related to server rebootTrond Myklebust2017-04-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the server fails to return the attributes as part of an OPEN reply, and then reboots, we can end up hanging. The reason is that the client attempts to send a GETATTR in order to pick up the missing OPEN call, but fails to release the slot first, causing reboot recovery to deadlock. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Fixes: 2e80dbe7ac51a ("NFSv4.1: Close callback races for OPEN, LAYOUTGET...") Cc: stable@vger.kernel.org # v4.8+
| * | | | | | | | | NFS: move rw_mode to nfs_pageio_headerBenjamin Coddington2017-04-204-12/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's try to have it in a cacheline in nfs4_proc_pgio_rpc_prepare(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: move nfs_pgarray_set() to open codeBenjamin Coddington2017-04-201-20/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 00bfa30abe86 ("NFS: Create a common pgio_alloc and pgio_release function"), nfs_pgarray_set() has only a single caller. Let's open code it. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Use GFP_NOIO for two allocations in writebackBenjamin Coddington2017-04-201-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent a deadlock that can occur if we wait on allocations that try to write back our pages. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: 00bfa30abe869 ("NFS: Create a common pgio_alloc and pgio_release...") Cc: stable@vger.kernel.org # 3.16+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Fix use after free in write error pathFred Isaman2017-04-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Fixes: 0bcbf039f6b2b ("nfs: handle request add failure properly") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()Benjamin Coddington2017-04-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a7d42ddb3099727f58366fa006f850a219cce6c8 ("nfs: add mirroring support to pgio layer") moved pg_cleanup out of the path when there was non-sequental I/O that needed to be flushed. The result is that for layouts that have more than one layout segment per file, the pg_lseg is not cleared, so we can end up hitting the WARN_ON_ONCE(req_start >= seg_end) in pnfs_generic_pg_test since the pg_lseg will be pointing to that previously-flushed layout segment. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: fix usage of mempools.NeilBrown2017-04-202-25/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When passed GFP flags that allow sleeping (such as GFP_NOIO), mempool_alloc() will never return NULL, it will wait until memory is available. This means that we don't need to handle failure, but that we do need to ensure one thread doesn't call mempool_alloc() twice on the one pool without queuing or freeing the first allocation. If multiple threads did this during times of high memory pressure, the pool could be exhausted and a deadlock could result. pnfs_generic_alloc_ds_commits() attempts to allocate from the nfs_commit_mempool while already holding an allocation from that pool. This is not safe. So change nfs_commitdata_alloc() to take a flag that indicates whether failure is acceptable. In pnfs_generic_alloc_ds_commits(), accept failure and handle it as we currently do. Else where, do not accept failure, and do not handle it. Even when failure is acceptable, we want to succeed if possible. That means both - using an entry from the pool if there is one - waiting for direct reclaim is there isn't. We call mempool_alloc(GFP_NOWAIT) to achieve the first, then kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the second. Each of these can fail, but together they do the best they can without blocking indefinitely. The objects returned by kmem_cache_alloc() will still be freed by mempool_free(). This is safe as mempool_alloc() uses exactly the same function to allocate objects (since the mempool was created with mempool_create_slab_pool()). The object returned by mempool_alloc() and kmem_cache_alloc() are indistinguishable so mempool_free() will handle both identically, either adding to the pool or calling kmem_cache_free(). Also, don't test for failure when allocating from nfs_wdata_mempool. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Clean up nfs4_proc_get_lease_time()Anna Schumaker2017-04-201-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Clean up _nfs4_proc_exchange_id()Anna Schumaker2017-04-201-15/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Clean up nfs4_proc_bind_one_conn_to_session()Anna Schumaker2017-04-201-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Returning errors directly even lets us remove the goto Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Remove extra dprintk()s from nfs4namespace.cAnna Schumaker2017-04-201-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Clean up nfs4_get_rootfh()Anna Schumaker2017-04-201-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Remove extra dprintk()s from nfs4client.cAnna Schumaker2017-04-201-52/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Clean up nfs4_init_server()Anna Schumaker2017-04-201-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * | | | | | | | | NFS: Clean up nfs4_set_client()Anna Schumaker2017-04-201-15/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we cut out the dprintk()s, then we can return error codes directly and cut out the goto. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>