summaryrefslogtreecommitdiffstats
path: root/fs/afs/vl_rotate.c
Commit message (Collapse)AuthorAgeFilesLines
* afs: Add tracing for cell refcount and active user countDavid Howells2020-10-161-1/+1
| | | | | | | | | | | Add a tracepoint to log the cell refcount and active user count and pass in a reason code through various functions that manipulate these counters. Additionally, a helper function, afs_see_cell(), is provided to log interesting places that deal with a cell without actually doing any accounting directly. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Fix cell refcounting by splitting the usage counterDavid Howells2020-10-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Management of the lifetime of afs_cell struct has some problems due to the usage counter being used to determine whether objects of that type are in use in addition to whether anyone might be interested in the structure. This is made trickier by cell objects being cached for a period of time in case they're quickly reused as they hold the result of a setup process that may be slow (DNS lookups, AFS RPC ops). Problems include the cached root volume from alias resolution pinning its parent cell record, rmmod occasionally hanging and occasionally producing assertion failures. Fix this by splitting the count of active users from the struct reference count. Things then work as follows: (1) The cell cache keeps +1 on the cell's activity count and this has to be dropped before the cell can be removed. afs_manage_cell() tries to exchange the 1 to a 0 with the cells_lock write-locked, and if successful, the record is removed from the net->cells. (2) One struct ref is 'owned' by the activity count. That is put when the active count is reduced to 0 (final_destruction label). (3) A ref can be held on a cell whilst it is queued for management on a work queue without confusing the active count. afs_queue_cell() is added to wrap this. (4) The queue's ref is dropped at the end of the management. This is split out into a separate function, afs_manage_cell_work(). (5) The root volume record is put after a cell is removed (at the final_destruction label) rather then in the RCU destruction routine. (6) Volumes hold struct refs, but aren't active users. (7) Both counts are displayed in /proc/net/afs/cells. There are some management function changes: (*) afs_put_cell() now just decrements the refcount and triggers the RCU destruction if it becomes 0. It no longer sets a timer to have the manager do this. (*) afs_use_cell() and afs_unuse_cell() are added to increase and decrease the active count. afs_unuse_cell() sets the management timer. (*) afs_queue_cell() is added to queue a cell with approprate refs. There are also some other fixes: (*) Don't let /proc/net/afs/cells access a cell's vllist if it's NULL. (*) Make sure that candidate cells in lookups are properly destroyed rather than being simply kfree'd. This ensures the bits it points to are destroyed also. (*) afs_dec_cells_outstanding() is now called in cell destruction rather than at "final_destruction". This ensures that cell->net is still valid to the end of the destructor. (*) As a consequence of the previous two changes, move the increment of net->cells_outstanding that was at the point of insertion into the tree to the allocation routine to correctly balance things. Fixes: 989782dcdc91 ("afs: Overhaul cell database management") Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Fix error handling in VL server rotationDavid Howells2020-08-201-0/+4
| | | | | | | | | | | | | | | | | The error handling in the VL server rotation in the case of there being no contactable servers is not correct. In such a case, the records of all the servers in the list are scanned and the errors and abort codes are mapped and prioritised and one error is chosen. This is then forgotten and the default error is used (EDESTADDRREQ). Fix this by using the calculated error. Also we need to note whether a server responded on one of its endpoints so that we can priorise an error from an abort message over local and network errors. Fixes: 4584ae96ae30 ("afs: Fix missing net error handling") Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Don't use VL probe running state to make decisions outside probe codeDavid Howells2020-08-201-1/+1
| | | | | | | | | | | | | Don't use the running state for VL server probes to make decisions about which server to use as the state is cleared at the start of a probe and intermediate values might also be misleading. Instead, add a separate 'latest known' rtt in the afs_vlserver struct and a flag to indicate if the server is known to be responding and update these as and when we know what to change them to. Fixes: 3bf0fb6f33dd ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Expose information from afs_vlserver through /proc for debuggingDavid Howells2020-08-201-1/+2
| | | | | | | | Convert various bitfields in afs_vlserver::probe to a mask and then expose this and some other bits of information through /proc/net/afs/<cell>/vlservers to make it easier to debug VL server communication issues. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Detect cell aliases 3 - YFS Cells with a canonical cell name opDavid Howells2020-06-041-0/+4
| | | | | | | | YFS Volume Location servers have an operation by which the cell name may be queried. Use this to find out what a YFS server thinks the canonical cell name should be. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Remove some unused bitsDavid Howells2020-04-241-2/+2
| | | | | | | | | | | | | | Remove three bits: (1) afs_server::no_epoch is neither set nor used. (2) afs_server::have_result is set and a wakeup is applied to it, but nothing looks at it or waits on it. (3) afs_vl_dump_edestaddrreq() prints afs_addr_list::probed, but nothing sets it for VL servers. Signed-off-by: David Howells <dhowells@redhat.com>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36Thomas Gleixner2019-05-241-5/+1
| | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public licence as published by the free software foundation either version 2 of the licence or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 114 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* afs: Fix cell DNS lookupDavid Howells2019-05-161-4/+22
| | | | | | | | | | | | | | | | | | | | | | Currently, once configured, AFS cells are looked up in the DNS at regular intervals - which is a waste of resources if those cells aren't being used. It also leads to a problem where cells preloaded, but not configured, before the network is brought up end up effectively statically configured with no VL servers and are unable to get any. Fix this by not doing the DNS lookup until the first time a cell is touched. It is waited for if we don't have any cached records yet, otherwise the DNS lookup to maintain the record is done in the background. This has the downside that the first time you touch a cell, you now have to wait for the upcall to do the required DNS lookups rather than them already being cached. Further, the record is not replaced if the old record has at least one server in it and the new record doesn't have any. Fixes: 0a5143f2f89c ("afs: Implement VL server rotation") Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Fix afs_cell records to always have a VL server list recordDavid Howells2019-05-151-1/+1
| | | | | | | Fix it such that afs_cell records always have a VL server list record attached, even if it's a dummy one, so that various checks can be removed. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Fix missing net error handlingDavid Howells2018-11-291-40/+10
| | | | | | | | | | | | | | | | | | | | | | | kAFS can be given certain network errors (EADDRNOTAVAIL, EHOSTDOWN and ERFKILL) that it doesn't handle in its server/address rotation algorithms. They cause the probing and rotation to abort immediately rather than rotating. Fix this by: (1) Abstracting out the error prioritisation from the VL and FS rotation algorithms into a common function and expand usage into the server probing code. When multiple errors are available, this code selects the one we'd prefer to return. (2) Add handling for EADDRNOTAVAIL, EHOSTDOWN and ERFKILL. Fixes: 0fafdc9f888b ("afs: Fix file locking") Fixes: 0338747d8454 ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* afs: Probe multiple fileservers simultaneouslyDavid Howells2018-10-241-54/+105
| | | | | | | | | | | | | Send probes to all the unprobed fileservers in a fileserver list on all addresses simultaneously in an attempt to find out the fastest route whilst not getting stuck for 20s on any server or address that we don't get a reply from. This alleviates the problem whereby attempting to access a new server can take a long time because the rotation algorithm ends up rotating through all servers and addresses until it finds one that responds. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Eliminate the address pointer from the address list cursorDavid Howells2018-10-241-1/+1
| | | | | | | | | Eliminate the address pointer from the address list cursor as it's redundant (ac->addrs[ac->index] can be used to find the same address) and address lists must be replaced rather than being rearranged, so is of limited value. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Allow dumping of server cursor on operation failureDavid Howells2018-10-241-0/+53
| | | | | | | | Provide an option to allow the file or volume location server cursor to be dumped if the rotation routine falls off the end without managing to contact a server. Signed-off-by: David Howells <dhowells@redhat.com>
* afs: Implement VL server rotationDavid Howells2018-10-241-0/+251
Track VL servers as independent entities rather than lumping all their addresses together into one set and implement server-level rotation by: (1) Add the concept of a VL server list, where each server has its own separate address list. This code is similar to the FS server list. (2) Use the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings. (3) In the case of a legacy DNS resolver or an address list given directly through /proc/net/afs/cells, create a list containing just a dummy server record and attach all the addresses to that. (4) Implement a simple rotation policy, for the moment ignoring the priorities and weights assigned to the servers. (5) Show the address list through /proc/net/afs/<cell>/vlservers. This also displays the source and status of the data as indicated by the upcall. Signed-off-by: David Howells <dhowells@redhat.com>