| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
When we get an error on parsing a stat due to a protocol bug,
we can generate an oops during cleanup because we didn't
initialize the string pointers in the stat structure.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In v9fs_get_inode(), for block, as well as char devices (in theory),
the function init_special_inode() is called to set up callback functions
for file ops. this function uses the file mode's value to determine whether
to use block or char dev functions. In v9fs_inode_from_fid(), the function
p9mode2unixmode() is used, but for all devices it initially returns S_IFBLK,
then uses v9fs_get_inode() to initialise a new inode, then finally uses
v9fs_stat2inode(), which would determine whether the inode is a block or
character device. However, at that point init_special_inode() had already
decided to use the block device functions, so even if the inode's mode is
turned to a character device, the block functions are still used to operate
on them. The attached patch simply calls init_special_inode() again for devices
after parsing device node data in v9fs_stat2inode() so that the proper functions
are used.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new debug support lacks some of the information that the previous fcprint
code provided -- this patch focuses on better presentation of debug data along
with more helpful debug along error paths.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove depricated conv functions which have been replaced with new
protocol routines.
This patch also reworks the one instance of the file-system code which
directly calls conversion routines (to accomplish unpacking dirreads).
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Now that the new protocol functions are in place, this patch switches
the client code to using the new support code.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This removes the vestigial tag field from the p9_req_t structure.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
One of the current debug options allows users to get a verbose dump of fcalls.
This isn't really necessary as correctly parsed protocol frames can be printed
as part of the code in the client functions. The consolidated printfcalls
structure would require new entries to be added for every extension. This
patch removes the debug print methods and their use.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new protocol processing support code based on Anthony Liguori's
9p library code. This code performs protocol marshalling/unmarshalling using
printf like strings to represent protocol elements. It is my intent to use
them to replace the current functions in conv.c as well as the
p9_create_* functions.
This should make the client implementation much more clear, and also make it
much easier to add new protocol extensions by limiting the number of places
in which changes need to be made.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Alsmot all 9P client wire functions have their own (set of) functions.
Tversion is an exception as its encapsulated into the client_create code.
This patch moves the protocol specifics of this to a function to match the
rest of the code.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently reading a directory is implemented in the client code.
This function is not actually a wire operation, but a meta operation
which calls read operations and processes the results.
This patch moves this functionality to the fs layer and calls component
wire operations instead of constructing their packets. This provides a
cleaner separation and will help when we reorganize the client functions
and protocol processing methods.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, the 9p net wire operation ensures that all data is sent by sending
multiple packets if the data requested is larger than the msize. This is
better handled in the vfs code so that we can simplify wire operations to
being concerned with only putting data onto and taking data off of the wire.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a couple of methods in the client code which aren't actually
wire operations. To keep things organized cleaner, these operations are
being moved to the fs layer.
This patch moves the readn meta-function (which executes multiple wire
reads until a buffer is full) to the fs layer.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there are two separate versions of read and write. One for
dealing with user buffers and the other for dealing with kernel buffers.
There is a tremendous amount of code duplication in the otherwise
identical versions of these functions. This patch adds an additional
user buffer parameter to read and write and conditionalizes handling of
the buffer on whether the kernel buffer or the user buffer is populated.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Post p9_fd_poll() error path which checks m->poll_waddr[i] for PTR_ERR
value has the following problems.
* It's completely unused. Error value is set iff NULL @wait_address
has been specified to p9_pollwait() which is guaranteed not to
happen.
* It dereferences @m after deallocating it (introduced by 571ffeaf and
spotted by Raja R Harinath.
* It returned the wrong value on error. It should return
poll_waddr[i] but it returnes poll_waddr (introduced by 571ffeaf).
* p9_mux_poll_stop() doesn't handle PTR_ERR value. It will try to
operate on the PTR_ERR value as if it's a normal pointer and cause
oops.
As the error path is bogus in the first place, there's no reason to
hold onto it. Kill it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Raja R Harinath <harinath@hurrynot.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This code moves the rpc function to the common client base,
reorganizes the flush code to be more simple and stable, and
makes the necessary adjustments to the underlying transports
to adapt to the new structure.
This reduces the overall amount of code duplication between the
transports and should make adding new transports more straightforward.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
| |
This patch reworks the read_work function to enable it to directly use a passed
in rcall structure. This should help allow us to remove unnecessary copies
in the future.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
| |
Apply the now common p9_req_t structure to the fd transport.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
| |
Simplify trans_fd by using new common client tagpool structure.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
The virtio transport uses a simplified request management system
that I want to use for all transports. This patch adapts and moves the
exisiting code for managing requests to the client common code.
Later patches will apply these mechanisms to the other transports.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
The current trans_fd rpc mechanisms use a dynamic callback mechanism which
introduces a lot of complexity which only accomodates a single special case.
This patch removes much of that complexity in favor of a simple exception
mechanism to deal with flushes.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
| |
Currently, trans_fd has two structures (p9_req and p9_mux-rpc)
which contain mostly duplicate data.
This patch consolidates these two structures and removes p9_mux_rpc.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Cleanup files by reordering functions in order to remove need for
unnecessary function prototypes.
There are no code changes here, just functions being moved around and
prototypes being eliminated.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
| |
Now that we are passing client state into the transport modules, remove
duplicate state which is present in transport private structures.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Right now there is a transport module structure which provides per-transport
type functions and data and a transport structure which contains per-instance
public data as well as function pointers to instance specific functions.
This patch moves public transport visible instance data to the client
structure (which in some cases had duplicate data) and consolidates the
functions into the transport module structure.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
trans_fd used pool of upto 100 pollers to monitor the r/w fds. The
approach makes sense in userspace back when the only available
interfaces were poll(2) and select(2). As each event monitor -
trigger - handling iteration took O(n) where `n' is the number of
watched fds, it makes sense to spread them to many pollers such that
the `n' can be divided by the number of pollers. However, this
doesn't make any sense in kernel because persistent edge triggered
event monitoring is how the whole thing is implemented in the kernel
in the first place.
This patch converts trans_fd to use single poller which watches all
the fds instead of the poll of pollers approach. All the fds are
registered for monitoring on creation and only the fds with pending
events are scanned when something happens much like how epoll is
implemented.
This change makes trans_fd fd monitoring more efficient and simpler.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
| |
I dunno how this missed Bjorn and his quest to use %pF in commit
c80cfb0406c01bb5da91bfe30f5cb1fd96831138 ("vsprintf: use new vsprintf
symbolic function pointer format"), but it did.
So use %pF in the two remaining places that still tried to print out
function pointers by hand.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (53 commits)
NFS: Fix a resolution problem with nfs_inode->cache_change_attribute
NFS: Fix the resolution problem with nfs_inode_attrs_need_update()
NFS: Changes to inode->i_nlinks must set the NFS_INO_INVALID_ATTR flag
RPC/RDMA: ensure connection attempt is complete before signalling.
RPC/RDMA: correct the reconnect timer backoff
RPC/RDMA: optionally emit useful transport info upon connect/disconnect.
RPC/RDMA: reformat a debug printk to keep lines together.
RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls.
RPC/RDMA: fix connect/reconnect resource leak.
RPC/RDMA: return a consistent error, when connect fails.
RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.
RPC/RDMA: avoid an oops due to disconnect racing with async upcalls.
RPC/RDMA: maintain the RPC task bytes-sent statistic.
RPC/RDMA: suppress retransmit on RPC/RDMA clients.
RPC/RDMA: fix connection IRD/ORD setting
RPC/RDMA: support FRMR client memory registration.
RPC/RDMA: check selected memory registration mode at runtime.
RPC/RDMA: add data types and new FRMR memory registration enum.
RPC/RDMA: refactor the inline memory registration code.
NFS: fix nfs_parse_ip_address() corner case
...
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The cache_change_attribute is used to decide whether or not a directory has
changed, in which case we may need to look it up again. Again, the use of
'jiffies' leads to an issue of resolution.
Once again, the fix is to change nfs_inode->cache_change_attribute, and
just make it a simple counter.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It appears that 'jiffies' timestamps do not have high enough resolution for
nfs_inode_attrs_need_update(). One problem is that a GETATTR can be
launched within < 1 jiffy of the last operation that updated the attribute.
Another problem is that RPC calls can take < 1 jiffy to execute.
We can fix this by switching the variables to use a simple global counter
that gets incremented every time we start another GETATTR call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The RPC/RDMA connection logic could return early from reconnection
attempts, leading to additional spurious retries.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The RPC/RDMA code had a constant 5-second reconnect backoff, and
always performed it, even when re-establishing a connection to a
server after the RPC layer closed it due to being idle. Make it
an geometric backoff (up to 30 seconds), and don't delay idle
reconnect.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The send marshaling code split a particular dprintk across two
lines, which makes it hard to extract from logfiles.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add defensive timeouts to wait_for_completion() calls in RDMA
address resolution, and make them interruptible. Fix the timeout
units to milliseconds (formerly jiffies) and move to private header.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The RPC/RDMA code can leak RDMA connection manager endpoints in
certain error cases on connect. Don't signal unwanted events,
and be certain to destroy any allocated qp.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The xprt_connect call path does not expect such errors as ECONNREFUSED
to be returned from failed transport connection attempts, otherwise it
translates them to EIO and signals fatal errors. For example, mount.nfs
prints simply "internal error". Translate all such errors to ENOTCONN
from RPC/RDMA to match sockets behavior.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The RPC/RDMA protocol allows clients and servers to avoid RDMA
operations for data which is purely the result of XDR padding.
On the client, automatically insert the necessary padding for
such server replies, and optionally don't marshal such chunks.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
RDMA disconnects yield an upcall from the RDMA connection manager,
which can race with rpc transport close, e.g. on ^C of a mount.
Ensure any rdma cm_id and qp are fully destroyed before continuing.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
An RPC/RDMA client cannot retransmit on an unbroken connection,
doing so violates its flow control with the server.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This logic sets the connection parameter that configures the local device
and informs the remote peer how many concurrent incoming RDMA_READ
requests are supported. The original logic didn't really do what was
intended for two reasons:
- The max number supported by the device is typically smaller than
any one factor in the calculation used, and
- The field in the connection parameter structure where the value is
stored is a u8 and always overflows for the default settings.
So what really happens is the value requested for responder resources
is the left over 8 bits from the "desired value". If the desired value
happened to be a multiple of 256, the result was zero and it wouldn't
connect at all.
Given the above and the fact that max_requests is almost always larger
than the max responder resources supported by the adapter, this patch
simplifies this logic and simply requests the max supported by the device,
subject to a reasonable limit.
This bug was found by Jim Schutt at Sandia.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Configure, detect and use "fastreg" support from IB/iWARP verbs
layer to perform RPC/RDMA memory registration.
Make FRMR the default memreg mode (will fall back if not supported
by the selected RDMA adapter).
This allows full and optimal operation over the cxgb3 adapter, and others.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
At transport creation, check for, and use, any local dma lkey.
Then, check that the selected memory registration mode is in fact
supported by the RDMA adapter selected for the mount. Fall back
to best alternative if not.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Internal RPC/RDMA structure updates in preparation for FRMR support.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Refactor the memory registration and deregistration routines.
This saves stack space, makes the code more readable and prepares
to add the new FRMR registration methods.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Bruce observed that nfs_parse_ip_address() will successfully parse an
IPv6 address that looks like this:
"::1%"
A scope delimiter is present, but there is no scope ID following it.
This is harmless, as it would simply set the scope ID to zero. However,
in some cases we would like to flag this as an improperly formed
address.
We are now also careful to reject addresses where garbage follows the
address (up to the length of the string), instead of ignoring the
non-address characters; and where the scope ID is nonsense (not a valid
device name, but also not numeric). Before, both of these cases would
result in a harmless zero scope ID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes a regression seen when running the Connectathon testsuite
against an ext3 filesystem. The reason was that the inode was constantly
being marked as 'just updated' by the jiffy wraparound test.
This again meant that newer GETATTR calls were failing to pass the
nfs_inode_attrs_need_update() test unless the changes caused a ctime update
on the server, since they were perceived as having been started before the
latest inode update.
Given that nfs_inode_attrs_need_update() already checks for wraparound
of nfsi->last_updated, we can drop the buggy "protection" in
nfs_update_inode().
Also make a slight micro-optimisation of nfs_inode_attrs_need_update(): we
are more often going to see time_after(fattr->time_start, nfsi->last_updated)
be true, rather than seeing an update of ctime/size, so put that test
first to ensure that we optimise away the ctime/size tests.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|