summaryrefslogtreecommitdiffstats
path: root/net/sunrpc/cache.c
Commit message (Collapse)AuthorAgeFilesLines
* sunrpc: fix crash when cache_head become valid before updatePavel Tikhomirov2019-12-171-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5fcaf6982d1167f1cd9b264704f6d1ef4c505d54 ] I was investigating a crash in our Virtuozzo7 kernel which happened in in svcauth_unix_set_client. I found out that we access m_client field in ip_map structure, which was received from sunrpc_cache_lookup (we have a bit older kernel, now the code is in sunrpc_cache_add_entry), and these field looks uninitialized (m_client == 0x74 don't look like a pointer) but in the cache_head in flags we see 0x1 which is CACHE_VALID. It looks like the problem appeared from our previous fix to sunrpc (1): commit 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request") And we've also found a patch already fixing our patch (2): commit d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.") Though the crash is eliminated, I think the core of the problem is not completely fixed: Neil in the patch (2) makes cache_head CACHE_NEGATIVE, before cache_fresh_locked which was added in (1) to fix crash. These way cache_is_valid won't say the cache is valid anymore and in svcauth_unix_set_client the function cache_check will return error instead of 0, and we don't count entry as initialized. But it looks like we need to remove cache_fresh_locked completely in sunrpc_cache_lookup: In (1) we've only wanted to make cache_fresh_unlocked->cache_dequeue so that cache_requests with no readers also release corresponding cache_head, to fix their leak. We with Vasily were not sure if cache_fresh_locked and cache_fresh_unlocked should be used in pair or not, so we've guessed to use them in pair. Now we see that we don't want the CACHE_VALID bit set here by cache_fresh_locked, as "valid" means "initialized" and there is no initialization in sunrpc_cache_add_entry. Both expiry_time and last_refresh are not used in cache_fresh_unlocked code-path and also not required for the initial fix. So to conclude cache_fresh_locked was called by mistake, and we can just safely remove it instead of crutching it with CACHE_NEGATIVE. It looks ideologically better for me. Hope I don't miss something here. Here is our crash backtrace: [13108726.326291] BUG: unable to handle kernel NULL pointer dereference at 0000000000000074 [13108726.326365] IP: [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc] [13108726.326448] PGD 0 [13108726.326468] Oops: 0002 [#1] SMP [13108726.326497] Modules linked in: nbd isofs xfs loop kpatch_cumulative_81_0_r1(O) xt_physdev nfnetlink_queue bluetooth rfkill ip6table_nat nf_nat_ipv6 ip_vs_wrr ip_vs_wlc ip_vs_sh nf_conntrack_netlink ip_vs_sed ip_vs_pe_sip nf_conntrack_sip ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh nf_nat_ftp nf_conntrack_ftp iptable_raw xt_recent nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_TCPMSS xt_tcpmss vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_NFLOG nfnetlink_log dummy xt_mark xt_REDIRECT nf_nat_redirect raw_diag udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag rpcsec_gss_krb5 xt_addrtype ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 ebtable_nat ebtable_broute nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_raw nfsv4 [13108726.327173] dns_resolver cls_u32 binfmt_misc arptable_filter arp_tables ip6table_filter ip6_tables devlink fuse_kio_pcs ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_wdog_tmo xt_multiport bonding xt_set xt_conntrack iptable_filter iptable_mangle kpatch(O) ebtable_filter ebt_among ebtables ip_set_hash_ip ip_set nfnetlink vfat fat skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass fuse pcspkr ses enclosure joydev sg mei_me hpwdt hpilo lpc_ich mei ipmi_si shpchp ipmi_devintf ipmi_msghandler xt_ipvs acpi_power_meter ip_vs_rr nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache nf_nat cls_fw sch_htb sch_cbq sch_sfq ip_vs em_u32 nf_conntrack tun br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat ip_vznetstat [13108726.327817] ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi 8021q syscopyarea sysfillrect garp sysimgblt fb_sys_fops mrp stp ttm llc bnx2x crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel drm dm_multipath ghash_clmulni_intel uas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd tg3 smartpqi scsi_transport_sas mdio libcrc32c i2c_core usb_storage ptp pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kpatch_cumulative_82_0_r1] [13108726.328403] CPU: 35 PID: 63742 Comm: nfsd ve: 51332 Kdump: loaded Tainted: G W O ------------ 3.10.0-862.20.2.vz7.73.29 #1 73.29 [13108726.328491] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018 [13108726.328554] task: ffffa0a6a41b1160 ti: ffffa0c2a74bc000 task.ti: ffffa0c2a74bc000 [13108726.328610] RIP: 0010:[<ffffffffc01f79eb>] [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc] [13108726.328706] RSP: 0018:ffffa0c2a74bfd80 EFLAGS: 00010246 [13108726.328750] RAX: 0000000000000001 RBX: ffffa0a6183ae000 RCX: 0000000000000000 [13108726.328811] RDX: 0000000000000074 RSI: 0000000000000286 RDI: ffffa0c2a74bfcf0 [13108726.328864] RBP: ffffa0c2a74bfe00 R08: ffffa0bab8c22960 R09: 0000000000000001 [13108726.328916] R10: 0000000000000001 R11: 0000000000000001 R12: ffffa0a32aa7f000 [13108726.328969] R13: ffffa0a6183afac0 R14: ffffa0c233d88d00 R15: ffffa0c2a74bfdb4 [13108726.329022] FS: 0000000000000000(0000) GS:ffffa0e17f9c0000(0000) knlGS:0000000000000000 [13108726.329081] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [13108726.332311] CR2: 0000000000000074 CR3: 00000026a1b28000 CR4: 00000000007607e0 [13108726.334606] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [13108726.336754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [13108726.338908] PKRU: 00000000 [13108726.341047] Call Trace: [13108726.343074] [<ffffffff8a2c78b4>] ? groups_alloc+0x34/0x110 [13108726.344837] [<ffffffffc01f5eb4>] svc_set_client+0x24/0x30 [sunrpc] [13108726.346631] [<ffffffffc01f2ac1>] svc_process_common+0x241/0x710 [sunrpc] [13108726.348332] [<ffffffffc01f3093>] svc_process+0x103/0x190 [sunrpc] [13108726.350016] [<ffffffffc07d605f>] nfsd+0xdf/0x150 [nfsd] [13108726.351735] [<ffffffffc07d5f80>] ? nfsd_destroy+0x80/0x80 [nfsd] [13108726.353459] [<ffffffff8a2bf741>] kthread+0xd1/0xe0 [13108726.355195] [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60 [13108726.356896] [<ffffffff8a9556dd>] ret_from_fork_nospec_begin+0x7/0x21 [13108726.358577] [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60 [13108726.360240] Code: 4c 8b 45 98 0f 8e 2e 01 00 00 83 f8 fe 0f 84 76 fe ff ff 85 c0 0f 85 2b 01 00 00 49 8b 50 40 b8 01 00 00 00 48 89 93 d0 1a 00 00 <f0> 0f c1 02 83 c0 01 83 f8 01 0f 8e 53 02 00 00 49 8b 44 24 38 [13108726.363769] RIP [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc] [13108726.365530] RSP <ffffa0c2a74bfd80> [13108726.367179] CR2: 0000000000000074 Fixes: d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* sunrpc: don't mark uninitialised items as VALID.NeilBrown2019-05-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d58431eacb226222430940134d97bfd72f292fcd upstream. A recent commit added a call to cache_fresh_locked() when an expired item was found. The call sets the CACHE_VALID flag, so it is important that the item actually is valid. There are two ways it could be valid: 1/ If ->update has been called to fill in relevant content 2/ if CACHE_NEGATIVE is set, to say that content doesn't exist. An expired item that is waiting for an update will be neither. Setting CACHE_VALID will mean that a subsequent call to cache_put() will be likely to dereference uninitialised pointers. So we must make sure the item is valid, and we already have code to do that in try_to_negate_entry(). This takes the hash lock and so cannot be used directly, so take out the two lines that we need and use them. Now cache_fresh_locked() is certain to be called only on a valid item. Cc: stable@kernel.org # 2.6.35 Fixes: 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued request") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sunrpc: fix cache_head leak due to queued requestVasily Averin2019-01-131-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4ecd55ea074217473f94cfee21bb72864d39f8d7 upstream. After commit d202cce8963d, an expired cache_head can be removed from the cache_detail's hash. However, the expired cache_head may be waiting for a reply from a previously submitted request. Such a cache_head has an increased refcounter and therefore it won't be freed after cache_put(freeme). Because the cache_head was removed from the hash it cannot be found during cache_clean() and can be leaked forever, together with stalled cache_request and other taken resources. In our case we noticed it because an entry in the export cache was holding a reference on a filesystem. Fixes d202cce8963d ("sunrpc: never return expired entries in sunrpc_cache_lookup") Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Cc: stable@kernel.org # 2.6.35 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* treewide: kzalloc() -> kcalloc()Kees Cook2018-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* Merge tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2018-04-051-16/+16
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on server xdr decoding (with an eye towards eliminating a data copy in the RDMA case). I did some refactoring of the delegation code in preparation for eliminating some delegation self-conflicts and implementing write delegations" * tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits) nfsd: fix incorrect umasks sunrpc: remove incorrect HMAC request initialization NFSD: Clean up legacy NFS SYMLINK argument XDR decoders NFSD: Clean up legacy NFS WRITE argument XDR decoders nfsd: Trace NFSv4 COMPOUND execution nfsd: Add I/O trace points in the NFSv4 read proc nfsd: Add I/O trace points in the NFSv4 write path nfsd: Add "nfsd_" to trace point names nfsd: Record request byte count, not count of vectors nfsd: Fix NFSD trace points svc: Report xprt dequeue latency sunrpc: Report per-RPC execution stats sunrpc: Re-purpose trace_svc_process sunrpc: Save remote presentation address in svc_xprt for trace events sunrpc: Simplify trace_svc_recv sunrpc: Simplify do_enqueue tracing sunrpc: Move trace_svc_xprt_dequeue() sunrpc: Update show_svc_xprt_flags() to include recently added flags svc: Simplify ->xpo_secure_port sunrpc: Remove unneeded pointer dereference ...
| * SUNRPC: cache: ignore timestamp written to 'flush' file.NeilBrown2018-03-191-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The interface for flushing the sunrpc auth cache was poorly designed and has caused problems a number of times. The design is that you write a timestamp, and all entries created before that time are discarded. The most obvious problem is that this is not what people actually want. They want to just flush the whole cache. The 1-second granularity can be a problem, as can the use of wall-clock time. A current problem is that code will write the current time to this file - expecting it to clear everything - and if the seconds number ticks over before this timestamp is checked, the test "then >= now" fails, and a full flush isn't forced. So lets just drop the subtleties and always flush the whole cache. The worst this could do is impose an extra cost refilling it, but that would require someone to be using non-standard tools. We still report an error if the string written is not a number, but we cause any valid number to flush the whole cache. Reported-by: "Wang, Alan 1. (NSB - CN/Hangzhou)" <alan.1.wang@nokia-sbell.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | net: Use octal not symbolic permissionsJoe Perches2018-03-261-5/+5
|/ | | | | | | | | | | | | | Prefer the direct use of octal for permissions. Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace and some typing. Miscellanea: o Whitespace neatening around these conversions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds2018-02-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'misc.poll' of ↵Linus Torvalds2018-01-301-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull poll annotations from Al Viro: "This introduces a __bitwise type for POLL### bitmap, and propagates the annotations through the tree. Most of that stuff is as simple as 'make ->poll() instances return __poll_t and do the same to local variables used to hold the future return value'. Some of the obvious brainos found in process are fixed (e.g. POLLIN misspelled as POLL_IN). At that point the amount of sparse warnings is low and most of them are for genuine bugs - e.g. ->poll() instance deciding to return -EINVAL instead of a bitmap. I hadn't touched those in this series - it's large enough as it is. Another problem it has caught was eventpoll() ABI mess; select.c and eventpoll.c assumed that corresponding POLL### and EPOLL### were equal. That's true for some, but not all of them - EPOLL### are arch-independent, but POLL### are not. The last commit in this series separates userland POLL### values from the (now arch-independent) kernel-side ones, converting between them in the few places where they are copied to/from userland. AFAICS, this is the least disruptive fix preserving poll(2) ABI and making epoll() work on all architectures. As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and it will trigger only on what would've triggered EPOLLWRBAND on other architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered at all on sparc. With this patch they should work consistently on all architectures" * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) make kernel-side POLL... arch-independent eventpoll: no need to mask the result of epi_item_poll() again eventpoll: constify struct epoll_event pointers debugging printk in sg_poll() uses %x to print POLL... bitmap annotate poll(2) guts 9p: untangle ->poll() mess ->si_band gets POLL... bitmap stored into a user-visible long field ring_buffer_poll_wait() return value used as return value of ->poll() the rest of drivers/*: annotate ->poll() instances media: annotate ->poll() instances fs: annotate ->poll() instances ipc, kernel, mm: annotate ->poll() instances net: annotate ->poll() instances apparmor: annotate ->poll() instances tomoyo: annotate ->poll() instances sound: annotate ->poll() instances acpi: annotate ->poll() instances crypto: annotate ->poll() instances block: annotate ->poll() instances x86: annotate ->poll() instances ...
| * net: annotate ->poll() instancesAl Viro2017-11-271-4/+4
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | sunrpc: make the function arg as constBhumika Goyal2017-11-271-1/+1
|/ | | | | | | | | | | Make the struct cache_detail *tmpl argument of the function cache_create_net as const as it is only getting passed to kmemup having the argument as const void *. Add const to the prototype too. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* Merge tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2017-03-011-46/+22
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - NFSv4: Fix memory and state leak in _nfs4_open_and_get_state - xprtrdma: Fix Read chunk padding - xprtrdma: Per-connection pad optimization - xprtrdma: Disable pad optimization by default - xprtrdma: Reduce required number of send SGEs - nlm: Ensure callback code also checks that the files match - pNFS/flexfiles: If the layout is invalid, it must be updated before retrying - NFSv4: Fix reboot recovery in copy offload - Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" - NFSv4: fix getacl head length estimation - NFSv4: fix getacl ERANGE for sum ACL buffer sizes Features: - Add and use dprintk_cont macros - Various cleanups to NFS v4.x to reduce code duplication and complexity - Remove unused cr_magic related code - Improvements to sunrpc "read from buffer" code - Clean up sunrpc timeout code and allow changing TCP timeout parameters - Remove duplicate mw_list management code in xprtrdma - Add generic functions for encoding and decoding xdr streams Bugfixes: - Clean up nfs_show_mountd_netid - Make layoutreturn_ops static and use NULL instead of 0 to fix sparse warnings - Properly handle -ERESTARTSYS in nfs_rename() - Check if register_shrinker() failed during rpcauth_init() - Properly clean up procfs/pipefs entries - Various NFS over RDMA related fixes - Silence unititialized variable warning in sunrpc" * tag 'nfs-for-4.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (64 commits) NFSv4: fix getacl ERANGE for some ACL buffer sizes NFSv4: fix getacl head length estimation Revert "NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE" NFSv4: Fix reboot recovery in copy offload pNFS/flexfiles: If the layout is invalid, it must be updated before retrying NFSv4: Clean up owner/group attribute decode SUNRPC: Add a helper function xdr_stream_decode_string_dup() NFSv4: Remove bogus "struct nfs_client" argument from decode_ace() NFSv4: Fix the underestimation of delegation XDR space reservation NFSv4: Replace callback string decode function with a generic NFSv4: Replace the open coded decode_opaque_inline() with the new generic NFSv4: Replace ad-hoc xdr encode/decode helpers with xdr_stream_* generics SUNRPC: Add generic helpers for xdr_stream encode/decode sunrpc: silence uninitialized variable warning nlm: Ensure callback code also checks that the files match sunrpc: Allow xprt->ops->timer method to sleep xprtrdma: Refactor management of mw_list field xprtrdma: Handle stale connection rejection xprtrdma: Properly recover FRWRs with in-flight FASTREG WRs xprtrdma: Shrink send SGEs array ...
| * sunrpc: use simple_read_from_buffer for reading cache flushKinglong Mee2017-02-081-12/+3
| | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc: update the comments of sunrpc proc pathKinglong Mee2017-02-081-2/+2
| | | | | | | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc/nfs: cleanup procfs/pipefs entry in cache_detailKinglong Mee2017-02-081-32/+17
| | | | | | | | | | | | | | Record flush/channel/content entries is useless, remove them. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | Merge tag 'nfsd-4.11' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2017-02-281-15/+38
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "The nfsd update this round is mainly a lot of miscellaneous cleanups and bugfixes. A couple changes could theoretically break working setups on upgrade. I don't expect complaints in practice, but they seem worth calling out just in case: - NFS security labels are now off by default; a new security_label export flag reenables it per export. But, having them on by default is a disaster, as it generally only makes sense if all your clients and servers have similar enough selinux policies. Thanks to Jason Tibbitts for pointing this out. - NFSv4/UDP support is off. It was never really supported, and the spec explicitly forbids it. We only ever left it on out of laziness; thanks to Jeff Layton for finally fixing that" * tag 'nfsd-4.11' of git://linux-nfs.org/~bfields/linux: (34 commits) nfsd: Fix display of the version string nfsd: fix configuration of supported minor versions sunrpc: don't register UDP port with rpcbind when version needs congestion control nfs/nfsd/sunrpc: enforce transport requirements for NFSv4 sunrpc: flag transports as having congestion control sunrpc: turn bitfield flags in svc_version into bools nfsd: remove superfluous KERN_INFO nfsd: special case truncates some more nfsd: minor nfsd_setattr cleanup NFSD: Reserve adequate space for LOCKT operation NFSD: Get response size before operation for all RPCs nfsd/callback: Drop a useless data copy when comparing sessionid nfsd/callback: skip the callback tag nfsd/callback: Cleanup callback cred on shutdown nfsd/idmap: return nfserr_inval for 0-length names SUNRPC/Cache: Always treat the invalid cache as unexpired SUNRPC: Drop all entries from cache_detail when cache_purge() svcrdma: Poll CQs in "workqueue" mode svcrdma: Combine list fields in struct svc_rdma_op_ctxt svcrdma: Remove unused sc_dto_q field ...
| * | SUNRPC: Drop all entries from cache_detail when cache_purge()Kinglong Mee2017-02-081-15/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User always free the cache_detail after sunrpc_destroy_cache_detail(), so, it must cleanup up entries that left in the cache_detail, otherwise, NULL reference may be caused when using the left entries. Also, NeriBrown suggests "write a stand-alone cache_purge()." v3, move the cache_fresh_unlocked() out of write lock, v2, a stand-alone cache_purge(), not only for sunrpc_destroy_cache_detail Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrpc: free contexts immediately on PROC_DESTROYNeil Brown2017-01-311-0/+12
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently handle a client PROC_DESTROY request by turning it CACHE_NEGATIVE, setting the expired time to now, and then waiting for cache_clean to clean it up later. Since we forgot to set the cache's nextcheck value, that could take up to 30 minutes. Also, though there's probably no real bug in this case, setting CACHE_NEGATIVE directly like this probably isn't a great idea in general. So let's just remove the entry from the cache directly, and move this bit of cache manipulation to a helper function. Signed-off-by: Neil Brown <neilb@suse.com> Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* / locking/atomic, kref: Add kref_read()Peter Zijlstra2017-01-141-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2016-12-241-1/+1
| | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sunrpc: queue work on system_power_efficient_wqKe Wang2016-09-271-2/+3
| | | | | | | | | | | | | | | | | | sunrpc uses workqueue to clean cache regulary. There is no real dependency of executing work on the cpu which queueing it. On a idle system, especially for a heterogeneous systems like big.LITTLE, it is observed that the big idle cpu was woke up many times just to service this work, which against the principle of power saving. It would be better if we can schedule it on a cpu which the scheduler believes to be the most appropriate one. After apply this patch, system_wq will be replaced by system_power_efficient_wq for sunrpc. This functionality is enabled when CONFIG_WQ_POWER_EFFICIENT is selected. Signed-off-by: Ke Wang <ke.wang@spreadtrum.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* sunrpc: remove 'inuse' flag from struct cache_detail.NeilBrown2016-07-131-1/+1
| | | | | | | This field is not currently in use. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov2016-04-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a raceNeilBrown2016-03-171-3/+3
| | | | | | | | | | | | | sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer set. In this case it aborts the queuing of the upcall. However it has already taken a new counted reference on "h" and doesn't "put" it, even though it frees the data structure holding the reference. So let's delay the "cache_get" until we know we need it. Fixes: f9e1aedc6c79 ("sunrpc/cache: remove races with queuing an upcall.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* sunrpc/cache: fix off-by-one in qword_get()Stefan Hajnoczi2016-02-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The qword_get() function NUL-terminates its output buffer. If the input string is in hex format \xXXXX... and the same length as the output buffer, there is an off-by-one: int qword_get(char **bpp, char *dest, int bufsize) { ... while (len < bufsize) { ... *dest++ = (h << 4) | l; len++; } ... *dest = '\0'; return len; } This patch ensures the NUL terminator doesn't fall outside the output buffer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* wrappers for ->i_mutex accessAl Viro2016-01-221-5/+5
| | | | | | | | | | | parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* sunrpc/cache: make cache flushing more reliable.Neil Brown2015-10-231-13/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The caches used to store sunrpc authentication information can be flushed by writing a timestamp to a file in /proc. This timestamp has a one-second resolution and any entry in cache that was last_refreshed *before* that time is treated as expired. This is problematic as it is not possible to reliably flush the cache without interrupting NFS service. If the current time is written to the "flush" file, any entry that was added since the current second started will still be treated as valid. If one second beyond than the current time is written to the file then no entries can be valid until the second ticks over. This will mean that no NFS request will be handled for up to 1 second. To resolve this issue we make two changes: 1/ treat an entry as expired if the timestamp when it was last_refreshed is before *or the same as* the expiry time. This means that current code which writes out the current time will now flush the cache reliably. 2/ when a new entry in added to the cache - set the last_refresh timestamp to 1 second *beyond* the current flush time, when that not in the past. This ensures that newly added entries will always be valid. Now that we have a very reliable way to flush the cache, and also since we are using "since-boot" timestamps which are monotonic, change cache_purge() to set the smallest future flush_time which will work, and leave it there: don't revert to '1'. Also disable the setting of the 'flush_time' far into the future. That has never been useful and is now awkward as it would cause last_refresh times to be strange. Finally: if a request is made to set the 'flush_time' to the current second, assume the intent is to flush the cache and advance it, if necessary, to 1 second beyond the current 'flush_time' so that all active entries will be deemed to be expired. As part of this we need to add a 'cache_detail' arg to cache_init() and cache_fresh_locked() so they can find the current ->flush_time. Signed-off-by: NeilBrown <neilb@suse.com> Reported-by: Olaf Kirch <okir@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* sunrpc: Switch to using hash list instead single listKinglong Mee2015-08-131-29/+31
| | | | | | | | | | | Switch using list_head for cache_head in cache_detail, it is useful of remove an cache_head entry directly from cache_detail. v8, using hash list, not head list Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* sunrpc/nfsd: Remove redundant code by exports seq_operations functionsKinglong Mee2015-08-131-6/+9
| | | | | | | | | | | Nfsd has implement a site of seq_operations functions as sunrpc's cache. Just exports sunrpc's codes, and remove nfsd's redundant codes. v8, same as v6 Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* sunrpc: Store cache_detail in seq_file's private directlyKinglong Mee2015-08-131-15/+13
| | | | | | | | | | | | | Cleanup. Just store cache_detail in seq_file's private, an allocated handle is redundant. v8, same as v6. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* lib/string_helpers.c: change semantics of string_escape_memRasmus Villemoes2015-04-151-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. [andriy.shevchenko@linux.intel.com: simplify qword_add] [andriy.shevchenko@linux.intel.com: add missed curly braces] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sunrpc: fix braino in ->poll()Al Viro2015-03-081-1/+1
| | | | | | | | | | | POLL_OUT isn't what callers of ->poll() are expecting to see; it's actually __SI_POLL | 2 and it's a siginfo code, not a poll bitmap bit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Cc: Bruce Fields <bfields@fieldses.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sunrpc/cache: convert to use string_escape_str()Andy Shevchenko2014-12-091-20/+6
| | | | | | | | | There is nice kernel helper to escape a given strings by provided rules. Let's use it instead of custom approach. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> [bfields@redhat.com: fix length calculation] Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* SUNRPC: Fix printk that is not only for nfsdKinglong Mee2014-05-081-1/+1
| | | | | Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* Merge branch 'for-3.14' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2014-01-301-3/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: - Handle some loose ends from the vfs read delegation support. (For example nfsd can stop breaking leases on its own in a fewer places where it can now depend on the vfs to.) - Make life a little easier for NFSv4-only configurations (thanks to Kinglong Mee). - Fix some gss-proxy problems (thanks Jeff Layton). - miscellaneous bug fixes and cleanup * 'for-3.14' of git://linux-nfs.org/~bfields/linux: (38 commits) nfsd: consider CLAIM_FH when handing out delegation nfsd4: fix delegation-unlink/rename race nfsd4: delay setting current_fh in open nfsd4: minor nfs4_setlease cleanup gss_krb5: use lcm from kernel lib nfsd4: decrease nfsd4_encode_fattr stack usage nfsd: fix encode_entryplus_baggage stack usage nfsd4: simplify xdr encoding of nfsv4 names nfsd4: encode_rdattr_error cleanup nfsd4: nfsd4_encode_fattr cleanup minor svcauth_gss.c cleanup nfsd4: better VERIFY comment nfsd4: break only delegations when appropriate NFSD: Fix a memory leak in nfsd4_create_session sunrpc: get rid of use_gssp_lock sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt sunrpc: don't wait for write before allowing reads from use-gss-proxy file nfsd: get rid of unused function definition Define op_iattr for nfsd4_open instead using macro NFSD: fix compile warning without CONFIG_NFSD_V3 ...
| * net/sunrpc/cache: simplify code by using hex_pack_byte()Andy Shevchenko2013-12-121-3/+1
| | | | | | | | | | | | | | | | hex_pack_byte() is a fast way to convert a byte in its ASCII representation. We may use it instead of custom approach. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | net: replace macros net_random and net_srandom with direct calls to prandomAruna-Hewapathirane2014-01-141-1/+1
|/ | | | | | | | | | | | This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-07-141-13/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs stuff from Al Viro: "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including making simple_lookup() usable for filesystems with non-NULL s_d_op, which allows us to get rid of quite a bit of ugliness" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sunrpc: now we can just set ->s_d_op cgroup: we can use simple_lookup() now efivarfs: we can use simple_lookup() now make simple_lookup() usable for filesystems that set ->s_d_op configfs: don't open-code d_alloc_name() __rpc_lookup_create_exclusive: pass string instead of qstr rpc_create_*_dir: don't bother with qstr llist: llist_add() can use llist_add_batch() llist: fix/simplify llist_add() and llist_add_batch() fput: turn "list_head delayed_fput_list" into llist_head fs/file_table.c:fput(): add comment Safer ABI for O_TMPFILE
| * rpc_create_*_dir: don't bother with qstrAl Viro2013-07-141-13/+5
| | | | | | | | | | | | just pass the name Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | sunrpc: Don't schedule an upcall on a replaced cache entry.NeilBrown2013-07-011-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a cache entry is replaced, the "expiry_time" get set to zero by a call to "cache_fresh_locked(..., 0)" at the end of "sunrpc_cache_update". This low expiry time makes cache_check() think that the 'refresh_age' is negative, so the 'age' is comparatively large and a refresh is triggered. However refreshing a replaced entry it pointless, it cannot achieve anything useful. So teach cache_check to ignore a low refresh_age when expiry_time is zero. Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | net/sunrpc: xpt_auth_cache should be ignored when expired.NeilBrown2013-07-011-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d202cce8963d9268ff355a386e20243e8332b308 sunrpc: never return expired entries in sunrpc_cache_lookup moved the 'entry is expired' test from cache_check to sunrpc_cache_lookup, so that it happened early and some races could safely be ignored. However the ip_map (in svcauth_unix.c) has a separate single-item cache which allows quick lookup without locking. An entry in this case would not be subject to the expiry test and so could be used well after it has expired. This is not normally a big problem because the first time it is used after it is expired an up-call will be scheduled to refresh the entry (if it hasn't been scheduled already) and the old entry will then be invalidated. So on the second attempt to use it after it has expired, ip_map_cached_get will discard it. However that is subtle and not ideal, so replace the "!cache_valid" test with "cache_is_expired". In doing this we drop the test on the "CACHE_VALID" bit. This is unnecessary as the bit is never cleared, and an entry will only be cached if the bit is set. Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | sunrpc/cache: ensure items removed from cache do not have pending upcalls.NeilBrown2013-07-011-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible for a race to set CACHE_PENDING after cache_clean() has removed a cache entry from the cache. If CACHE_PENDING is still set when the entry is finally 'put', the cache_dequeue() will never happen and we can leak memory. So set a new flag 'CACHE_CLEANED' when we remove something from the cache, and don't queue any upcall if it is set. If CACHE_PENDING is set before CACHE_CLEANED, the call that cache_clean() makes to cache_fresh_unlocked() will free memory as needed. If CACHE_PENDING is set after CACHE_CLEANED, the test in sunrpc_cache_pipe_upcall will ensure that the memory is not allocated. Reported-by: <bstroesser@ts.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | sunrpc/cache: use cache_fresh_unlocked consistently and correctly.NeilBrown2013-07-011-13/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cache_fresh_unlocked() is called when a cache entry has been updated and ensures that if there were any pending upcalls, they are cleared. So every time we update a cache entry, we should call this, and this should be the only way that we try to clear pending calls (that sort of uniformity makes code sooo much easier to read). try_to_negate_entry() will (possibly) mark an entry as negative. If it doesn't, it is because the entry already is VALID. So the entry will be valid on exit, so it is appropriate to call cache_fresh_unlocked(). So tidy up try_to_negate_entry() to do that, and remove partial open-coded cache_fresh_unlocked() from the one call-site of try_to_negate_entry(). In the other branch of the 'switch(cache_make_upcall())', we again have a partial open-coded version of cache_fresh_unlocked(). Replace that with a real call. And again in cache_clean(), use a real call to cache_fresh_unlocked(). These call sites might previously have called cache_revisit_request() if CACHE_PENDING wasn't set. This is never necessary because cache_revisit_request() can only do anything if the item is in the cache_defer_hash, However any time that an item is added to the cache_defer_hash (setup_deferral), the code immediately tests CACHE_PENDING, and removes the entry again if it is clear. So all other places we only need to 'cache_revisit_request' if we've just cleared CACHE_PENDING. Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | sunrpc/cache: remove races with queuing an upcall.NeilBrown2013-07-011-11/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently queue an upcall after setting CACHE_PENDING, and dequeue after clearing CACHE_PENDING. So a request should only be present when CACHE_PENDING is set. However we don't combine the test and the enqueue/dequeue in a protected region, so it is possible (if unlikely) for a race to result in a request being queued without CACHE_PENDING set, or a request to be absent despite CACHE_PENDING. So: include a test for CACHE_PENDING inside the regions of enqueue and dequeue where queue_lock is held, and abort the operation if the value is not as expected. Also remove the early 'return' from cache_dequeue() to ensure that it always removes all entries: As there is no locking between setting CACHE_PENDING and calling sunrpc_cache_pipe_upcall it is not inconceivable for some other thread to clear CACHE_PENDING and then someone else to set it and call sunrpc_cache_pipe_upcall, both before the original threads completed the call. With this, it perfectly safe and correct to: - call cache_dequeue() if and only if we have just cleared CACHE_PENDING - call sunrpc_cache_pipe_upcall() (via cache_make_upcall) if and only if we have just set CACHE_PENDING. Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | sunrpc: the cache_detail in cache_is_valid is unused any morechaoting fan2013-05-211-4/+4
|/ | | | | | | | The cache_detail(*detail) in function cache_is_valid is not used any more. Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2013-05-031-1/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd changes from J Bruce Fields: "Highlights include: - Some more DRC cleanup and performance work from Jeff Layton - A gss-proxy upcall from Simo Sorce: currently krb5 mounts to the server using credentials from Active Directory often fail due to limitations of the svcgssd upcall interface. This replacement lifts those limitations. The existing upcall is still supported for backwards compatibility. - More NFSv4.1 support: at this point, if a user with a current client who upgrades from 4.0 to 4.1 should see no regressions. In theory we do everything a 4.1 server is required to do. Patches for a couple minor exceptions are ready for 3.11, and with those and some more testing I'd like to turn 4.1 on by default in 3.11." Fix up semantic conflict as per Stephen Rothwell and linux-next: Commit 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication") adds two new users of "PDE(inode)->data", but we're supposed to use "PDE_DATA(inode)" instead since commit d9dda78bad87 ("procfs: new helper - PDE_DATA(inode)"). The old PDE() macro is no longer available since commit c30480b92cf4 ("proc: Make the PROC_I() and PDE() macros internal to procfs") * 'for-3.10' of git://linux-nfs.org/~bfields/linux: (60 commits) NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo() nfsd: make symbol nfsd_reply_cache_shrinker static svcauth_gss: fix error return code in rsc_parse() nfsd4: don't remap EISDIR errors in rename svcrpc: fix gss-proxy to respect user namespaces SUNRPC: gssp_procedures[] can be static SUNRPC: define {create,destroy}_use_gss_proxy_proc_entry in !PROC case nfsd4: better error return to indicate SSV non-support nfsd: fix EXDEV checking in rename SUNRPC: Use gssproxy upcall for server RPCGSS authentication. SUNRPC: Add RPC based upcall mechanism for RPCGSS auth SUNRPC: conditionally return endtime from import_sec_context SUNRPC: allow disabling idle timeout SUNRPC: attempt AF_LOCAL connect on setup nfsd: Decode and send 64bit time values nfsd4: put_client_renew_locked can be static nfsd4: remove unused macro nfsd4: remove some useless code nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED ...
| * SUNRPC/cache: add module_put() on error path in cache_open()Alexey Khoroshilov2013-04-031-1/+3
| | | | | | | | | | | | | | | | | | | | If kmalloc() fails in cache_open(), module cd->owner left locked. The patch adds module_put(cd->owner) on this path. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-05-011-12/+12
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...
| * | procfs: new helper - PDE_DATA(inode)Al Viro2013-04-091-12/+12
| |/ | | | | | | | | | | | | | | | | | | The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* / lib, net: make isodigit() public and use itAndy Shevchenko2013-04-291-1/+0
|/ | | | | | | | | There are at least two users of isodigit(). Let's make it a public function of ctype.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>