summaryrefslogtreecommitdiffstats
path: root/fs/nfs
Commit message (Collapse)AuthorAgeFilesLines
* NFS: Fix a typo in nfs_init_timeout_values()Trond Myklebust2019-05-041-1/+1
| | | | | | | | | | | | [ Upstream commit 5a698243930c441afccec04e4d5dc8febfd2b775 ] Specifying a retrans=0 mount parameter to a NFS/TCP mount, is inadvertently causing the NFS client to rewrite any specified timeout parameter to the default of 60 seconds. Fixes: a956beda19a6 ("NFS: Allow the mount option retrans=0") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
* NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.Tetsuo Handa2019-05-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream. syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family (which is embedded into user-visible "struct nfs_mount_data" structure) despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6) bytes of AF_INET6 address to rpc_sockaddr2uaddr(). Since "struct nfs_mount_data" structure is user-visible, we can't change "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore, assuming that everybody is using AF_INET family when passing address via "struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET. [1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75c Reported-by: syzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1 don't free interrupted slot on openOlga Kornievskaia2019-04-031-1/+2
| | | | | | | | | | | | | | commit 0cb98abb5bd13b9a636bde603d952d722688b428 upstream. Allow the async rpc task for finish and update the open state if needed, then free the slot. Otherwise, the async rpc unable to decode the reply. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Fixes: ae55e59da0e4 ("pnfs: Don't release the sequence slot...") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1: Reinitialise sequence results before retransmitting a requestTrond Myklebust2019-03-231-4/+8
| | | | | | | | | | | | | commit c1dffe0bf7f9c3d57d9f237a7cb2a81e62babd2b upstream. If we have to retransmit a request, we should ensure that we reinitialise the sequence results structure, since in the event of a signal we need to treat the request as if it had not been sent. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()Trond Myklebust2019-03-231-1/+1
| | | | | | | | | | | | | commit 8127d82705998568b52ac724e28e00941538083d upstream. If the I/O completion failed with a fatal error, then we should just exit nfs_pageio_complete_mirror() rather than try to recoalesce. Fixes: a7d42ddb3099 ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix an I/O request leakage in nfs_do_recoalesceTrond Myklebust2019-03-231-1/+0
| | | | | | | | | | | | | commit 4d91969ed4dbcefd0e78f77494f0cb8fada9048a upstream. Whether we need to exit early, or just reprocess the list, we must not lost track of the request which failed to get recoalesced. Fixes: 03d5eb65b538 ("NFS: Fix a memory leak in nfs_do_recoalesce") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix I/O request leakagesTrond Myklebust2019-03-231-5/+21
| | | | | | | | | | | | | | | | | | | | | commit f57dcf4c72113c745d83f1c65f7291299f65c14f upstream. When we fail to add the request to the I/O queue, we currently leave it to the caller to free the failed request. However since some of the requests that fail are actually created by nfs_pageio_add_request() itself, and are not passed back the caller, this leads to a leakage issue, which can again cause page locks to leak. This commit addresses the leakage by freeing the created requests on error, using desc->pg_completion_ops->error_cleanup() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: a7d42ddb30997 ("nfs: add mirroring support to pgio layer") Cc: stable@vger.kernel.org # v4.0: c18b96a1b862: nfs: clean up rest of reqs Cc: stable@vger.kernel.org # v4.0: d600ad1f2bdb: NFS41: pop some layoutget Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* keys: Fix dependency loop between construction record and auth keyDavid Howells2019-03-231-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 822ad64d7e46a8e2c8b8a796738d7b657cbb146d ] In the request_key() upcall mechanism there's a dependency loop by which if a key type driver overrides the ->request_key hook and the userspace side manages to lose the authorisation key, the auth key and the internal construction record (struct key_construction) can keep each other pinned. Fix this by the following changes: (1) Killing off the construction record and using the auth key instead. (2) Including the operation name in the auth key payload and making the payload available outside of security/keys/. (3) The ->request_key hook is given the authkey instead of the cons record and operation name. Changes (2) and (3) allow the auth key to naturally be cleaned up if the keyring it is in is destroyed or cleared or the auth key is unlinked. Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <james.morris@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* NFS: Don't use page_file_mapping after removing the pageBenjamin Coddington2019-03-231-5/+6
| | | | | | | | | | | | | | [ Upstream commit d2ceb7e57086750ea6198a31fd942d98099a0786 ] If nfs_page_async_flush() removes the page from the mapping, then we can't use page_file_mapping() on it as nfs_updatepate() is wont to do when receiving an error. Instead, push the mapping to the stack before the page is possibly truncated. Fixes: 8fc75bed96bb ("NFS: Fix up return value on fatal errors in nfs_page_async_flush()") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nfs: Fix NULL pointer dereference of dev_nameYao Liu2019-03-131-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 80ff00172407e0aad4b10b94ef0816fc3e7813cb ] There is a NULL pointer dereference of dev_name in nfs_parse_devname() The oops looks something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs] ... Call Trace: ? ida_alloc_range+0x34b/0x3d0 ? nfs_clone_super+0x80/0x80 [nfs] ? nfs_free_parsed_mount_data+0x60/0x60 [nfs] mount_fs+0x52/0x170 ? __init_waitqueue_head+0x3b/0x50 vfs_kern_mount+0x6b/0x170 do_mount+0x216/0xdc0 ksys_mount+0x83/0xd0 __x64_sys_mount+0x25/0x30 do_syscall_64+0x65/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix this by adding a NULL check on dev_name Signed-off-by: Yao Liu <yotta.liu@ucloud.cn> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* NFS: nfs_compare_mount_options always compare auth flavors.Chris Perl2019-02-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 594d1644cd59447f4fceb592448d5cd09eb09b5e ] This patch removes the check from nfs_compare_mount_options to see if a `sec' option was passed for the current mount before comparing auth flavors and instead just always compares auth flavors. Consider the following scenario: You have a server with the address 192.168.1.1 and two exports /export/a and /export/b. The first export supports `sys' and `krb5' security, the second just `sys'. Assume you start with no mounts from the server. The following results in EIOs being returned as the kernel nfs client incorrectly thinks it can share the underlying `struct nfs_server's: $ mkdir /tmp/{a,b} $ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a $ sudo mount -t nfs -o vers=3 192.168.1.1:/export/b /tmp/b $ df >/dev/null df: ‘/tmp/b’: Input/output error Signed-off-by: Chris Perl <cperl@janestreet.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* NFS: Fix up return value on fatal errors in nfs_page_async_flush()Trond Myklebust2019-02-061-4/+5
| | | | | | | | | | | | | | | commit 8fc75bed96bb94e23ca51bd9be4daf65c57697bf upstream. Ensure that we return the fatal error value that caused us to exit nfs_page_async_flush(). Fixes: c373fff7bd25 ("NFSv4: Don't special case "launder"") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.12+ Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: fix a deadlock in nfs client initializationScott Mayhew2019-01-262-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c156618e15101a9cc8c815108fec0300a0ec6637 upstream. The following deadlock can occur between a process waiting for a client to initialize in while walking the client list during nfsv4 server trunking detection and another process waiting for the nfs_clid_init_mutex so it can initialize that client: Process 1 Process 2 --------- --------- spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTA->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); spin_lock(&nn->nfs_client_lock); list_add_tail(&CLIENTB->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); mutex_lock(&nfs_clid_init_mutex); nfs41_walk_client_list(clp, result, cred); nfs_wait_client_init_complete(CLIENTA); (waiting for nfs_clid_init_mutex) Make sure nfs_match_client() only evaluates clients that have completed initialization in order to prevent that deadlock. This patch also fixes v4.0 trunking behavior by not marking the client NFS_CS_READY until the clientid has been confirmed. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Qian Lu <luqia@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: don't dirty kernel pages read by direct-ioDave Kleikamp2018-12-211-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ad3cba223ac02dc769c3bbe88efe277bbb457566 ] When we use direct_IO with an NFS backing store, we can trigger a WARNING in __set_page_dirty(), as below, since we're dirtying the page unnecessarily in nfs_direct_read_completion(). To fix, replicate the logic in commit 53cbf3b157a0 ("fs: direct-io: don't dirtying pages for ITER_BVEC/ITER_KVEC direct read"). Other filesystems that implement direct_IO handle this; most use blockdev_direct_IO(). ceph and cifs have similar logic. mount 127.0.0.1:/export /nfs dd if=/dev/zero of=/nfs/image bs=1M count=200 losetup --direct-io=on -f /nfs/image mkfs.btrfs /dev/loop0 mount -t btrfs /dev/loop0 /mnt/ kernel: WARNING: CPU: 0 PID: 8067 at fs/buffer.c:580 __set_page_dirty+0xaf/0xd0 kernel: Modules linked in: loop(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) fuse(E) tun(E) ip6t_rpfilter(E) ipt_REJECT(E) nf_ kernel: snd_seq(E) snd_seq_device(E) snd_pcm(E) video(E) snd_timer(E) snd(E) soundcore(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) cdrom(E) ata_generic(E) pata_acpi(E) crc32c_intel(E) ahci(E) li kernel: CPU: 0 PID: 8067 Comm: kworker/0:2 Tainted: G E 4.20.0-rc1.master.20181111.ol7.x86_64 #1 kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 kernel: Workqueue: nfsiod rpc_async_release [sunrpc] kernel: RIP: 0010:__set_page_dirty+0xaf/0xd0 kernel: Code: c3 48 8b 02 f6 c4 04 74 d4 48 89 df e8 ba 05 f7 ff 48 89 c6 eb cb 48 8b 43 08 a8 01 75 1f 48 89 d8 48 8b 00 a8 04 74 02 eb 87 <0f> 0b eb 83 48 83 e8 01 eb 9f 48 83 ea 01 0f 1f 00 eb 8b 48 83 e8 kernel: RSP: 0000:ffffc1c8825b7d78 EFLAGS: 00013046 kernel: RAX: 000fffffc0020089 RBX: fffff2b603308b80 RCX: 0000000000000001 kernel: RDX: 0000000000000001 RSI: ffff9d11478115c8 RDI: ffff9d11478115d0 kernel: RBP: ffffc1c8825b7da0 R08: 0000646f6973666e R09: 8080808080808080 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9d11478115d0 kernel: R13: ffff9d11478115c8 R14: 0000000000003246 R15: 0000000000000001 kernel: FS: 0000000000000000(0000) GS:ffff9d115ba00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00007f408686f640 CR3: 0000000104d8e004 CR4: 00000000000606f0 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kernel: Call Trace: kernel: __set_page_dirty_buffers+0xb6/0x110 kernel: set_page_dirty+0x52/0xb0 kernel: nfs_direct_read_completion+0xc4/0x120 [nfs] kernel: nfs_pgio_release+0x10/0x20 [nfs] kernel: rpc_free_task+0x30/0x70 [sunrpc] kernel: rpc_async_release+0x12/0x20 [sunrpc] kernel: process_one_work+0x174/0x390 kernel: worker_thread+0x4f/0x3e0 kernel: kthread+0x102/0x140 kernel: ? drain_workqueue+0x130/0x130 kernel: ? kthread_stop+0x110/0x110 kernel: ret_from_fork+0x35/0x40 kernel: ---[ end trace 01341980905412c9 ]--- Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> [forward-ported to v4.20] Signed-off-by: Calum Mackay <calum.mackay@oracle.com> Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* flexfiles: enforce per-mirror stateid only for v4 DSesTigran Mkrtchyan2018-12-171-2/+4
| | | | | | | | | | | | commit 320f35b7bf8cccf1997ca3126843535e1b95e9c4 upstream. Since commit bb21ce0ad227 we always enforce per-mirror stateid. However, this makes sense only for v4+ servers. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* flexfiles: use per-mirror specified stateid for IOTigran Mkrtchyan2018-12-133-12/+32
| | | | | | | | | | | | | | | | | | | | [ Upstream commit bb21ce0ad227b69ec0f83279297ee44232105d96 ] rfc8435 says: For tight coupling, ffds_stateid provides the stateid to be used by the client to access the file. However current implementation replaces per-mirror provided stateid with by open or lock stateid. Ensure that per-mirror stateid is used by ff_layout_write_prepare_v4 and nfs4_ff_layout_prepare_ds. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Rick Macklem <rmacklem@uoguelph.ca> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nfs: Fix a missed page unlock after pg_doio()Benjamin Coddington2018-11-131-19/+21
| | | | | | | | | | | | | | commit fdbd1a2e4a71adcb1ae219fcfd964930d77a7f84 upstream. We must check pg_error and call error_cleanup after any call to pg_doio. Currently, we are skipping the unlock of a page if we encounter an error in nfs_pageio_complete() before handing off the work to the RPC layer. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1: Fix the r/wsize checkingTrond Myklebust2018-11-131-7/+9
| | | | | | | | | | | | | | | | | | | | | | | commit 943cff67b842839f4f35364ba2db5c2d3f025d94 upstream. The intention of nfs4_session_set_rwsize() was to cap the r/wsize to the buffer sizes negotiated by the CREATE_SESSION. The initial code had a bug whereby we would not check the values negotiated by nfs_probe_fsinfo() (the assumption being that CREATE_SESSION will always negotiate buffer values that are sane w.r.t. the server's preferred r/wsizes) but would only check values set by the user in the 'mount' command. The code was changed in 4.11 to _always_ set the r/wsize, meaning that we now never use the server preferred r/wsizes. This is the regression that this patch fixes. Also rename the function to nfs4_session_limit_rwsize() in order to avoid future confusion. Fixes: 033853325fe3 (NFSv4.1 respect server's max size in CREATE_SESSION") Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1 fix infinite loop on I/O.Trond Myklebust2018-09-262-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | commit 994b15b983a72e1148a173b61e5b279219bb45ae upstream. The previous fix broke recovery of delegated stateids because it assumes that if we did not mark the delegation as suspect, then the delegation has effectively been revoked, and so it removes that delegation irrespectively of whether or not it is valid and still in use. While this is "mostly harmless" for ordinary I/O, we've seen pNFS fail with LAYOUTGET spinning in an infinite loop while complaining that we're using an invalid stateid (in this case the all-zero stateid). What we rather want to do here is ensure that the delegation is always correctly marked as needing testing when that is the case. So we want to close the loophole offered by nfs4_schedule_stateid_recovery(), which marks the state as needing to be reclaimed, but not the delegation that may be backing it. Fixes: 0e3d3e5df07dc ("NFSv4.1 fix infinite loop on IO BAD_STATEID error") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.11+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1: Fix a potential layoutget/layoutrecall deadlockTrond Myklebust2018-09-191-2/+2
| | | | | | | | | | | | | | [ Upstream commit bd3d16a887b0c19a2a20d35ffed499e3a3637feb ] If the client is sending a layoutget, but the server issues a callback to recall what it thinks may be an outstanding layout, then we may find an uninitialised layout attached to the inode due to the layoutget. In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT rather than NFS4ERR_DELAY, as the latter can end up deadlocking. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.0 fix client reference leak in callbackOlga Kornievskaia2018-09-191-3/+8
| | | | | | | | | | | | | [ Upstream commit 32cd3ee511f4e07ca25d71163b50e704808d22f4 ] If there is an error during processing of a callback message, it leads to refrence leak on the client structure and eventually an unclean superblock. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Fix error handling in nfs4_sp4_select_mode()Wei Yongjun2018-09-151-1/+1
| | | | | | | | | | | | [ Upstream commit 72bf75cfc00c02aa66ef6133048f37aa5d88825c ] Error code is set in the error handling cases but never used. Fix it. Fixes: 937e3133cd0b ("NFSv4.1: Ensure we clear the SP4_MACH_CRED flags in nfs4_sp4_select_mode()") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()Trond Myklebust2018-09-091-4/+10
| | | | | | | | | | | | | | commit 8618289c46556fd4dd259a1af02ccc448032f48d upstream. We must drop the lock before we can sleep in referring_call_exists(). Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com> Fixes: 045d2a6d076a ("NFSv4.1: Delay callback processing...") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Fix locking in pnfs_generic_recover_commit_reqsTrond Myklebust2018-09-091-9/+7
| | | | | | | | | | | | | | | | commit d0fbb1d8a194c0ec0180c1d073ad709e45503a43 upstream. The use of the inode->i_lock was converted to a mutex, but we forgot to remove the old inode unlock/lock() pair that allowed the layout segment to be put inside the loop. Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com> Fixes: e824f99adaaf1 ("NFSv4: Use a mutex to protect the per-inode commit...") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4 client live hangs after live data migration recoveryBill Baker2018-09-091-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 0f90be132cbf1537d87a6a8b9e80867adac892f6 upstream. After a live data migration event at the NFS server, the client may send I/O requests to the wrong server, causing a live hang due to repeated recovery events. On the wire, this will appear as an I/O request failing with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly. NFS4ERR_BADSSESSION is returned because the session ID being used was issued by the other server and is not valid at the old server. The failure is caused by async worker threads having cached the transport (xprt) in the rpc_task structure. After the migration recovery completes, the task is redispatched and the task resends the request to the wrong server based on the old value still present in tk_xprt. The solution is to recompute the tk_xprt field of the rpc_task structure so that the request goes to the correct server. Signed-off-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* pnfs/blocklayout: off by one in bl_map_stripe()Dan Carpenter2018-09-091-1/+1
| | | | | | | | | | | | | | | | | | commit 0914bb965e38a055e9245637aed117efbe976e91 upstream. "dev->nr_children" is the number of children which were parsed successfully in bl_parse_stripe(). It could be all of them and then, in that case, it is equal to v->stripe.volumes_count. Either way, the > should be >= so that we don't go beyond the end of what we're supposed to. Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # 3.17+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* pNFS: Always free the session slot on error in nfs4_layoutget_handle_exceptionTrond Myklebust2018-08-241-7/+10
| | | | | | | | | | | | [ Upstream commit 2dbf8dffbf35fd8f611083b9d9fe74fdccf912a3 ] Right now, we can call nfs_commit_inode() while holding the session slot, which could lead to NFSv4 deadlocks. Ensure we only keep the slot if the server returned a layout that we have to process. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* pnfs: Don't release the sequence slot until we've processed layoutget on openTrond Myklebust2018-08-031-1/+2
| | | | | | | | | | | | | [ Upstream commit ae55e59da0e401893b3c52b575fc18a00623d0a1 ] If the server recalls the layout that was just handed out, we risk hitting a race as described in RFC5661 Section 2.10.6.3 unless we ensure that we release the sequence slot after processing the LAYOUTGET operation that was sent as part of the OPEN compound. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRYTrond Myklebust2018-08-031-4/+7
| | | | | | | | | | | | [ Upstream commit f9312a541050007ec59eb0106273a0a10718cd83 ] If the server returns NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_RETRY_UNCACHED_REP, then it thinks we're trying to replay an existing request. If so, then let's just bump the sequence ID and retry the operation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* skip LAYOUTRETURN if layout is invalidOlga Kornievskaia2018-08-031-2/+4
| | | | | | | | | | | | | | | | | | | [ Upstream commit 93b7f7ad2018d2037559b1d0892417864c78b371 ] Currently, when IO to DS fails, client returns the layout and retries against the MDS. However, then on umounting (inode eviction) it returns the layout again. This is because pnfs_return_layout() was changed in commit d78471d32bb6 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR") to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned the layout, it will be returned again. Instead, let's also check if we have already marked the layout invalid. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Fix a typo in nfs41_sequence_processTrond Myklebust2018-07-031-1/+1
| | | | | | | | | | | | | commit 995891006ccbb73c0c9c3923cf9d25c4d07ec16b upstream. We want to compare the slot_id to the highest slot number advertised by the server. Fixes: 3be0f80b5fe9c ("NFSv4.1: Fix up replays of interrupted requests") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Revert commit 5f83d86cf531d ("NFSv4.x: Fix wraparound issues..")Trond Myklebust2018-07-031-5/+2
| | | | | | | | | | | | | | commit fc40724fc6731d90cc7fb6d62d66135f85a33dd2 upstream. The correct behaviour for NFSv4 sequence IDs is to wrap around to the value 0 after 0xffffffff. See https://tools.ietf.org/html/rfc5661#section-2.10.6.1 Fixes: 5f83d86cf531d ("NFSv4.x: Fix wraparound issues when validing...") Cc: stable@vger.kernel.org # 4.6+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_messageDave Wysochanski2018-07-031-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d68894800ec5712d7ddf042356f11e36f87d7f78 upstream. In nfs_idmap_read_and_verify_message there is an incorrect sprintf '%d' that converts the __u32 'im_id' from struct idmap_msg to 'id_str', which is a stack char array variable of length NFS_UINT_MAXLEN == 11. If a uid or gid value is > 2147483647 = 0x7fffffff, the conversion overflows into a negative value, for example: crash> p (unsigned) (0x80000000) $1 = 2147483648 crash> p (signed) (0x80000000) $2 = -2147483648 The '-' sign is written to the buffer and this causes a 1 byte overflow when the NULL byte is written, which corrupts kernel stack memory. If CONFIG_CC_STACKPROTECTOR_STRONG is set we see a stack-protector panic: [11558053.616565] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa05b8a8c [11558053.639063] CPU: 6 PID: 9423 Comm: rpc.idmapd Tainted: G W ------------ T 3.10.0-514.el7.x86_64 #1 [11558053.641990] Hardware name: Red Hat OpenStack Compute, BIOS 1.10.2-3.el7_4.1 04/01/2014 [11558053.644462] ffffffff818c7bc0 00000000b1f3aec1 ffff880de0f9bd48 ffffffff81685eac [11558053.646430] ffff880de0f9bdc8 ffffffff8167f2b3 ffffffff00000010 ffff880de0f9bdd8 [11558053.648313] ffff880de0f9bd78 00000000b1f3aec1 ffffffff811dcb03 ffffffffa05b8a8c [11558053.650107] Call Trace: [11558053.651347] [<ffffffff81685eac>] dump_stack+0x19/0x1b [11558053.653013] [<ffffffff8167f2b3>] panic+0xe3/0x1f2 [11558053.666240] [<ffffffff811dcb03>] ? kfree+0x103/0x140 [11558053.682589] [<ffffffffa05b8a8c>] ? idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.689710] [<ffffffff810855db>] __stack_chk_fail+0x1b/0x30 [11558053.691619] [<ffffffffa05b8a8c>] idmap_pipe_downcall+0x1cc/0x1e0 [nfsv4] [11558053.693867] [<ffffffffa00209d6>] rpc_pipe_write+0x56/0x70 [sunrpc] [11558053.695763] [<ffffffff811fe12d>] vfs_write+0xbd/0x1e0 [11558053.702236] [<ffffffff810acccc>] ? task_work_run+0xac/0xe0 [11558053.704215] [<ffffffff811fec4f>] SyS_write+0x7f/0xe0 [11558053.709674] [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b Fix this by calling the internally defined nfs_map_numeric_to_string() function which properly uses '%u' to convert this __u32. For consistency, also replace the one other place where snprintf is called. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Reported-by: Stephen Johnston <sjohnsto@redhat.com> Fixes: cf4ab538f1516 ("NFSv4: Fix the string length returned by the idmapper") Cc: stable@vger.kernel.org # v3.4+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4.1: Fix up replays of interrupted requestsTrond Myklebust2018-06-262-47/+103
| | | | | | | | | | | | | | | | | | | | | | | | commit 3be0f80b5fe9c16eca2d538f799b94ca8aa59433 upstream. If the previous request on a slot was interrupted before it was processed by the server, then our slot sequence number may be out of whack, and so we try the next operation using the old sequence number. The problem with this, is that not all servers check to see that the client is replaying the same operations as previously when they decide to go to the replay cache, and so instead of the expected error of NFS4ERR_SEQ_FALSE_RETRY, we get a replay of the old reply, which could (if the operations match up) be mistaken by the client for a new reply. To fix this, we attempt to send a COMPOUND containing only the SEQUENCE op in order to resync our slot sequence number. Cc: Olga Kornievskaia <olga.kornievskaia@gmail.com> [olga.kornievskaia@gmail.com: fix an Oops] Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: system crashes after NFS4ERR_MOVED recoveryBill.Baker@oracle.com2018-05-301-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit ad86f605c59500da82d196ac312cfbac3daba31d ] nfs4_update_server unconditionally releases the nfs_client for the source server. If migration fails, this can cause the source server's nfs_client struct to be left with a low reference count, resulting in use-after-free. Also, adjust reference count handling for ELOOP. NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6 WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]() nfs_put_client+0xfa/0x110 [nfs] nfs4_run_state_manager+0x30/0x40 [nfsv4] kthread+0xd8/0xf0 BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8 nfs4_xdr_enc_write+0x6b/0x160 [nfsv4] rpcauth_wrap_req+0xac/0xf0 [sunrpc] call_transmit+0x18c/0x2c0 [sunrpc] __rpc_execute+0xa6/0x490 [sunrpc] rpc_async_schedule+0x15/0x20 [sunrpc] process_one_work+0x160/0x470 worker_thread+0x112/0x540 ? rescuer_thread+0x3f0/0x3f0 kthread+0xd8/0xf0 This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"), but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops") Reported-by: Helen Chao <helen.chao@oracle.com> Fixes: 52442f9b11b7 ("NFS4: Avoid migration loops") Signed-off-by: Bill Baker <bill.baker@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: Do not convert nfs_idmap_cache_timeout to jiffiesJan Chochol2018-04-261-1/+1
| | | | | | | | | | | | | | | | | | | [ Upstream commit cbebc6ef4fc830f4040d4140bf53484812d5d5d9 ] Since commit 57e62324e469 ("NFS: Store the legacy idmapper result in the keyring") nfs_idmap_cache_timeout changed units from jiffies to seconds. Unfortunately sysctl interface was not updated accordingly. As a effect updating /proc/sys/fs/nfs/idmap_cache_timeout with some value will incorrectly multiply this value by HZ. Also reading /proc/sys/fs/nfs/idmap_cache_timeout will show real value divided by HZ. Fixes: 57e62324e469 ("NFS: Store the legacy idmapper result in the keyring") Signed-off-by: Jan Chochol <jan@chochol.info> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFSv4: always set NFS_LOCK_LOST when a lock is lost.NeilBrown2018-04-262-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit dce2630c7da73b0634686bca557cc8945cc450c8 ] There are 2 comments in the NFSv4 code which suggest that SIGLOST should possibly be sent to a process. In these cases a lock has been lost. The current practice is to set NFS_LOCK_LOST so that read/write returns EIO when a lock is lost. So change these comments to code when sets NFS_LOCK_LOST. One case is when lock recovery after apparent server restart fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock attempt as part of lease recovery fails with NFS4ERR_DENIED. In an ideal world, these should not happen. However I have a packet trace showing an NFSv4.1 session getting NFS4ERR_BADSESSION after an extended network parition. The NFSv4.1 client treats this like server reboot until/unless it get NFS4ERR_NO_GRACE, in which case it switches over to "nograce" recovery mode. In this network trace, the client attempts to recover a lock and the server (incorrectly) reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This leads to the ineffective comment and the client then continues to write using the OPEN stateid. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fs: Teach path_connected to handle nfs filesystems with multiple roots.Eric W. Biederman2018-03-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 95dd77580ccd66a0da96e6d4696945b8cea39431 upstream. On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix unstable write completionTrond Myklebust2018-03-151-40/+43
| | | | | | | | | | | | | | | | | | | | | | | commit c4f24df942a181699c5bab01b8e5e82b925f77f3 upstream. We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to ensure that all outstanding COMMIT requests to the inode in question are complete. Currently we may exit early from both nfs_commit_inode() and nfs_write_inode() even if there are COMMIT requests in flight, or unstable writes on the commit list. In order to get the right semantics w.r.t. sync_inode(), we don't need to have nfs_commit_inode() reset the inode dirty flags when called from nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that nfs_write_inode() leaves them in the right state if there are outstanding commits, or stable pages. Reported-by: Scott Mayhew <smayhew@redhat.com> Fixes: dc4fd9ab01ab ("nfs: don't wait on commit in nfs_commit_inode()...") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* pNFS: Prevent the layout header refcount going to zero in pnfs_roc()Trond Myklebust2018-03-151-3/+10
| | | | | | | | | | | | | | | commit 9c6376ebddad585da4238532dd6d90ae23ffee67 upstream. Ensure that we hold a reference to the layout header when processing the pNFS return-on-close so that the refcount value does not inadvertently go to zero. Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: stable@vger.kernel.org # v4.10+ Tested-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix an incorrect type in struct nfs_direct_reqTrond Myklebust2018-03-151-1/+1
| | | | | | | | | | | | commit d9ee65539d3eabd9ade46cca1780e3309ad0f907 upstream. The start offset needs to be of type loff_t. Fixed: 5fadeb47dcc5c ("nfs: count DIO good bytes correctly with mirroring") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix a race between mmap() and O_DIRECTTrond Myklebust2018-02-161-1/+1
| | | | | | | | | | | | | commit e231c6879cfd44e4fffd384bb6dd7d313249a523 upstream. When locking the file in order to do O_DIRECT on it, we must unmap any mmapped ranges on the pagecache so that we can flush out the dirty data. Fixes: a5864c999de67 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: reject request for id_legacy key without auxdataEric Biggers2018-02-161-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 49686cbbb3ebafe42e63868222f269d8053ead00 upstream. nfs_idmap_legacy_upcall() is supposed to be called with 'aux' pointing to a 'struct idmap', via the call to request_key_with_auxdata() in nfs_idmap_request_key(). However it can also be reached via the request_key() system call in which case 'aux' will be NULL, causing a NULL pointer dereference in nfs_idmap_prepare_pipe_upcall(), assuming that the key description is valid enough to get that far. Fix this by making nfs_idmap_legacy_upcall() negate the key if no auxdata is provided. As usual, this bug was found by syzkaller. A simple reproducer using the command-line keyctl program is: keyctl request2 id_legacy uid:0 '' @s Fixes: 57e62324e469 ("NFS: Store the legacy idmapper result in the keyring") Reported-by: syzbot+5dfdbcf7b3eb5912abbb@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Trond Myklebust <trondmy@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: commit direct writes even if they fail partiallyJ. Bruce Fields2018-02-161-3/+1
| | | | | | | | | | | | | | | | | | commit 1b8d97b0a837beaf48a8449955b52c650a7114b4 upstream. If some of the WRITE calls making up an O_DIRECT write syscall fail, we neglect to commit, even if some of the WRITEs succeed. We also depend on the commit code to free the reference count on the nfs_page taken in the "if (request_commit)" case at the end of nfs_direct_write_completion(). The problem was originally noticed because ENOSPC's encountered partway through a write would result in a closed file being sillyrenamed when it should have been unlinked. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix nfsstat breakage due to LOOKUPPTrond Myklebust2018-02-161-26/+38
| | | | | | | | | | | | | | | | commit 8634ef5e05311f32d7f2aee06f6b27a8834a3bd6 upstream. The LOOKUPP operation was inserted into the nfs4_procedures array rather than being appended, which put /proc/net/rpc/nfs out of whack, and broke the nfsstat utility. Fix by moving the LOOKUPP operation to the end of the array, and by ensuring that it keeps the same length whether or not NFSV4.1 and NFSv4.2 are compiled in. Fixes: 5b5faaf6df734 ("nfs4: add NFSv4 LOOKUPP handlers") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Add a cond_resched() to nfs_commit_release_pages()Trond Myklebust2018-02-161-0/+2
| | | | | | | | | | | | | commit 7f1bda447c9bd48b415acedba6b830f61591601f upstream. The commit list can get very large, and so we need a cond_resched() in nfs_commit_release_pages() in order to ensure we don't hog the CPU for excessive periods of time. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs41: do not return ENOMEM on LAYOUTUNAVAILABLETigran Mkrtchyan2018-02-161-3/+1
| | | | | | | | | | | | | | | | | | | | | | commit 7ff4cff637aa0bd2abbd81f53b2a6206c50afd95 upstream. A pNFS server may return LAYOUTUNAVAILABLE error on LAYOUTGET for files which don't have any layout. In this situation pnfs_update_layout currently returns NULL. As this NULL is converted into ENOMEM, IO requests fails instead of falling back to MDS. Do not return ENOMEM on LAYOUTUNAVAILABLE and let client retry through MDS. Fixes 8d40b0f14846f. I will suggest to backport this fix to affected stable branches. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> [trondmy: Use IS_ERR_OR_NULL()] Fixes: 8d40b0f14846 ("NFS filelayout:call GETDEVICEINFO after...") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mdsScott Mayhew2018-02-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit ba4a76f703ab7eb72941fdaac848502073d6e9ee upstream. Currently when falling back to doing I/O through the MDS (via pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header without releasing the reference taken on the dreq via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init -> nfs_direct_pgio_init. It then takes another reference on the dreq via nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and as a result the requester will become stuck in inode_dio_wait. Once that happens, other processes accessing the inode will become stuck as well. Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean up correctly by calling hdr->completion_ops->completion() instead of calling hdr->release() directly. This can be reproduced (sometimes) by performing "storage failover takeover" commands on NetApp filer while doing direct I/O from a client. This can also be reproduced using SystemTap to simulate a failure while doing direct I/O from a client (from Dave Wysochanski <dwysocha@redhat.com>): stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }' Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Fixes: 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs fails") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* nfs: don't wait on commit in nfs_commit_inode() if there were no commit requestsScott Mayhew2017-12-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit dc4fd9ab01ab379ae5af522b3efd4187a7c30a31 upstream. If there were no commit requests, then nfs_commit_inode() should not wait on the commit or mark the inode dirty, otherwise the following BUG_ON can be triggered: [ 1917.130762] kernel BUG at fs/inode.c:578! [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1] [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1 [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000 [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80 [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64) [ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 22002822 XER: 20000000 [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1 GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750 GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03 GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8 GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0 GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8 [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350 [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350 [ 1917.130873] Call Trace: [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable) [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270 [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0 [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs] [ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0 [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180 [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150 [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100 [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60 [ 1917.130919] Instruction dump: [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0 [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134 Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* NFS: Fix a typo in nfs_rename()Trond Myklebust2017-12-141-1/+1
| | | | | | | | | | | | [ Upstream commit d803224c84be067754db7fa58a93f36f61566493 ] On successful rename, the "old_dentry" is retained and is attached to the "new_dir", so we need to call nfs_set_verifier() accordingly. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>