summaryrefslogtreecommitdiffstats
path: root/include/linux/ceph/messenger.h
Commit message (Collapse)AuthorAgeFilesLines
* libceph: ceph_connection_operations::reencode_message() methodIlya Dryomov2017-07-071-0/+2
| | | | | | | | | Give upper layers a chance to reencode the message after the connection is negotiated and ->peer_features is set. OSD client will use this to support both luminous and pre-luminous OSDs (in a single cluster): the former need MOSDOp v8; the latter will continue to be sent MOSDOp v4. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Merge tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2016-12-161-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "A varied set of changes: - a large rework of cephx auth code to cope with CONFIG_VMAP_STACK (myself). Also fixed a deadlock caused by a bogus allocation on the writeback path and authorize reply verification. - a fix for long stalls during fsync (Jeff Layton). The client now has a way to force the MDS log flush, leading to ~100x speedups in some synthetic tests. - a new [no]require_active_mds mount option (Zheng Yan). On mount, we will now check whether any of the MDSes are available and bail rather than block if none are. This check can be avoided by specifying the "no" option. - a couple of MDS cap handling fixes and a few assorted patches throughout" * tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client: (32 commits) libceph: remove now unused finish_request() wrapper libceph: always signal completion when done ceph: avoid creating orphan object when checking pool permission ceph: properly set issue_seq for cap release ceph: add flags parameter to send_cap_msg ceph: update cap message struct version to 10 ceph: define new argument structure for send_cap_msg ceph: move xattr initialzation before the encoding past the ceph_mds_caps ceph: fix minor typo in unsafe_request_wait ceph: record truncate size/seq for snap data writeback ceph: check availability of mds cluster on mount ceph: fix splice read for no Fc capability case ceph: try getting buffer capability for readahead/fadvise ceph: fix scheduler warning due to nested blocking ceph: fix printing wrong return variable in ceph_direct_read_write() crush: include mapper.h in mapper.c rbd: silence bogus -Wmaybe-uninitialized warning libceph: no need to drop con->mutex for ->get_authorizer() libceph: drop len argument of *verify_authorizer_reply() libceph: verify authorize reply on connect ...
| * libceph: drop len argument of *verify_authorizer_reply()Ilya Dryomov2016-12-121-1/+1
| | | | | | | | | | | | | | | | | | The length of the reply is protocol-dependent - for cephx it's ceph_x_authorize_reply. Nothing sensible can be passed from the messenger layer anyway. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
* | ceph: don't include blk_types.h in messenger.hChristoph Hellwig2016-11-011-1/+1
|/ | | | | | | | The file only needs the struct bvec_iter delcaration, which is available from bvec.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* libceph: fix ceph_msg_revoke()Ilya Dryomov2016-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a number of problems with revoking a "was sending" message: (1) We never make any attempt to revoke data - only kvecs contibute to con->out_skip. However, once the header (envelope) is written to the socket, our peer learns data_len and sets itself to expect at least data_len bytes to follow front or front+middle. If ceph_msg_revoke() is called while the messenger is sending message's data portion, anything we send after that call is counted by the OSD towards the now revoked message's data portion. The effects vary, the most common one is the eventual hang - higher layers get stuck waiting for the reply to the message that was sent out after ceph_msg_revoke() returned and treated by the OSD as a bunch of data bytes. This is what Matt ran into. (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs is wrong. If ceph_msg_revoke() is called before the tag is sent out or while the messenger is sending the header, we will get a connection reset, either due to a bad tag (0 is not a valid tag) or a bad header CRC, which kind of defeats the purpose of revoke. Currently the kernel client refuses to work with header CRCs disabled, but that will likely change in the future, making this even worse. (3) con->out_skip is not reset on connection reset, leading to one or more spurious connection resets if we happen to get a real one between con->out_skip is set in ceph_msg_revoke() and before it's cleared in write_partial_skip(). Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never zero the tag or the header, i.e. send out tag+header regardless of when ceph_msg_revoke() is called. That way the header is always correct, no unnecessary resets are induced and revoke stands ready for disabled CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new "message out temp" and copy the header into it before sending. Cc: stable@vger.kernel.org # 4.0+ Reported-by: Matt Conner <matt.conner@keepertech.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Matt Conner <matt.conner@keepertech.com> Reviewed-by: Sage Weil <sage@redhat.com>
* libceph: stop duplicating client fields in messengerIlya Dryomov2015-11-021-10/+1
| | | | | | | supported_features and required_features serve no purpose at all, while nocrc and tcp_nodelay belong to ceph_options::flags. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: msg signing callouts don't need con argumentIlya Dryomov2015-11-021-3/+2
| | | | | | | | | | We can use msg->con instead - at the point we sign an outgoing message or check the signature on the incoming one, msg->con is always set. We wouldn't know how to sign a message without an associated session (i.e. msg->con == NULL) and being able to sign a message using an explicitly provided authorizer is of no use. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: don't access invalid memory in keepalive2 pathIlya Dryomov2015-09-171-1/+3
| | | | | | | | | | | | | | | | This struct ceph_timespec ceph_ts; ... con_out_kvec_add(con, sizeof(ceph_ts), &ceph_ts); wraps ceph_ts into a kvec and adds it to con->out_kvec array, yet ceph_ts becomes invalid on return from prepare_write_keepalive(). As a result, we send out bogus keepalive2 stamps. Fix this by encoding into a ceph_timespec member, similar to how acks are read and written. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
* libceph: use keepalive2 to verify the mon session is aliveYan, Zheng2015-09-081-0/+4
| | | | | Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: enable ceph in a non-default network namespaceIlya Dryomov2015-07-091-0/+3
| | | | | | | | | | | | | | Grab a reference on a network namespace of the 'rbd map' (in case of rbd) or 'mount' (in case of ceph) process and use that to open sockets instead of always using init_net and bailing if network namespace is anything but init_net. Be careful to not share struct ceph_client instances between different namespaces and don't add any code in the !CONFIG_NET_NS case. This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>
* libceph: tcp_nodelay supportChaitanya Huilgol2015-02-191-1/+3
| | | | | | | | | | | TCP_NODELAY socket option set on connection sockets, disables Nagle’s algorithm and improves latency characteristics. tcp_nodelay(default)/notcp_nodelay option flags provided to enable/disable setting the socket option. Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com> [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
* libceph: message signature supportYan, Zheng2014-12-171-1/+8
| | | | Signed-off-by: Yan, Zheng <zyan@redhat.com>
* libceph: move and add dout()s to ceph_msg_{get,put}()Ilya Dryomov2014-07-081-12/+2
| | | | | | | | Add dout()s to ceph_msg_{get,put}(). Also move them to .c and turn kref release callback into a static function. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
* Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds2014-01-301-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
| * ceph: Convert to immutable biovecsKent Overstreet2013-11-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | Now that we've got a mechanism for immutable biovecs - bi_iter.bi_bvec_done - we need to convert drivers to use primitives that respect it instead of using the bvec array directly. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sage Weil <sage@inktank.com> Cc: ceph-devel@vger.kernel.org
* | libceph: add ceph_kv{malloc,free}() and switch to themIlya Dryomov2014-01-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encapsulate kmalloc vs vmalloc memory allocation and freeing logic into two helpers, ceph_kvmalloc() and ceph_kvfree(), and switch to them. ceph_kvmalloc() kmalloc()'s a maximum of 8 pages, anything bigger is vmalloc()'ed with __GFP_HIGHMEM set. This changes the existing behaviour: - for buffers (ceph_buffer_new()), from trying to kmalloc() everything and using vmalloc() just as a fallback - for messages (ceph_msg_new()), from going to vmalloc() for anything bigger than a page - for messages (ceph_msg_new()), from disallowing vmalloc() to use high memory Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | libceph: rename ceph_msg::front_max to front_alloc_lenIlya Dryomov2014-01-141-1/+1
| | | | | | | | | | | | | | | | Rename front_max field of struct ceph_msg to front_alloc_len to make its purpose more clear. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | libceph: all features fields must be u64Ilya Dryomov2013-12-311-5/+5
|/ | | | | | | | | In preparation for ceph_features.h update, change all features fields from unsigned int/u32 to u64. (ceph.git has ~40 feature bits at this point.) Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* libceph: add, don't set data for a messageAlex Elder2013-05-011-3/+3
| | | | | | | | | | | | | Change the names of the functions that put data on a pagelist to reflect that we're adding to whatever's already there rather than just setting it to the one thing. Currently only one data item is ever added to a message, but that's about to change. This resolves: http://tracker.ceph.com/issues/2770 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: implement multiple data items in a messageAlex Elder2013-05-011-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to the messenger for more than one data item in its data list. A message data cursor has two more fields to support this: - a count of the number of bytes left to be consumed across all data items in the list, "total_resid" - a pointer to the head of the list (for validation only) The cursor initialization routine has been split into two parts: the outer one, which initializes the cursor for traversing the entire list of data items; and the inner one, which initializes the cursor to start processing a single data item. When a message cursor is first initialized, the outer initialization routine sets total_resid to the length provided. The data pointer is initialized to the first data item on the list. From there, the inner initialization routine finishes by setting up to process the data item the cursor points to. Advancing the cursor consumes bytes in total_resid. If the resid field reaches zero, it means the current data item is fully consumed. If total_resid indicates there is more data, the cursor is advanced to point to the next data item, and then the inner initialization routine prepares for using that. (A check is made at this point to make sure we don't wrap around the front of the list.) The type-specific init routines are modified so they can be given a length that's larger than what the data item can support. The resid field is initialized to the smaller of the provided length and the length of the entire data item. When total_resid reaches zero, we're done. This resolves: http://tracker.ceph.com/issues/3761 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: replace message data pointer with listAlex Elder2013-05-011-1/+2
| | | | | | | | | In place of the message data pointer, use a list head which links through message data items. For now we only support a single entry on that list. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: have cursor point to dataAlex Elder2013-05-011-4/+4
| | | | | | | | | Rather than having a ceph message data item point to the cursor it's associated with, have the cursor point to a data item. This will allow a message cursor to be used for more than one data item. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: move cursor into messageAlex Elder2013-05-011-21/+22
| | | | | | | | | | | | | | | | | | | A message will only be processing a single data item at a time, so there's no need for each data item to have its own cursor. Move the cursor embedded in the message data structure into the message itself. To minimize the impact, keep the data->cursor field, but make it be a pointer to the cursor in the message. Move the definition of ceph_msg_data above ceph_msg_data_cursor so the cursor can point to the data without a forward definition rather than vice-versa. This and the upcoming patches are part of: http://tracker.ceph.com/issues/3761 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: record bio lengthAlex Elder2013-05-011-1/+4
| | | | | | | | The bio is the only data item type that doesn't record its full length. Fix that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: fix possible CONFIG_BLOCK build problemAlex Elder2013-05-011-0/+2
| | | | | | | | | | This patch: 15a0d7b libceph: record message data length did not enclose some bio-specific code inside CONFIG_BLOCK as it should have. Fix that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: record message data lengthAlex Elder2013-05-011-1/+3
| | | | | | | | | | Keep track of the length of the data portion for a message in a separate field in the ceph_msg structure. This information has been maintained in wire byte order in the message header, but that's going to change soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: make message data be a pointerAlex Elder2013-05-011-4/+1
| | | | | | | | | | | | | | | | | | | | | | Begin the transition from a single message data item to a list of them by replacing the "data" structure in a message with a pointer to a ceph_msg_data structure. A null pointer will indicate the message has no data; replace the use of ceph_msg_has_data() with a simple check for a null pointer. Create functions ceph_msg_data_create() and ceph_msg_data_destroy() to dynamically allocate and free a data item structure of a given type. When a message has its data item "set," allocate one of these to hold the data description, and free it when the last reference to the message is dropped. This partially resolves: http://tracker.ceph.com/issues/4429 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill last of ceph_msg_posAlex Elder2013-05-011-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only remaining field in the ceph_msg_pos structure is did_page_crc. In the new cursor model of things that flag (or something like it) belongs in the cursor. Define a new field "need_crc" in the cursor (which applies to all types of data) and initialize it to true whenever a cursor is initialized. In write_partial_message_data(), the data CRC still will be computed as before, but it will check the cursor->need_crc field to determine whether it's needed. Any time the cursor is advanced to a new piece of a data item, need_crc will be set, and this will cause the crc for that entire piece to be accumulated into the data crc. In write_partial_message_data() the intermediate crc value is now held in a local variable so it doesn't have to be byte-swapped so many times. In read_partial_msg_data() we do something similar (but mainly for consistency there). With that, the ceph_msg_pos structure can go away, and it no longer needs to be passed as an argument to prepare_message_data(). This cleanup is related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill most of ceph_msg_posAlex Elder2013-05-011-2/+0
| | | | | | | | | | | | All but one of the fields in the ceph_msg_pos structure are now never used (only assigned), so get rid of them. This allows several small blocks of code to go away. This is cleanup of old code related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: collapse all data items into oneAlex Elder2013-05-011-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that only one of the data item types is ever used at any one time in a single message (currently). - A page array is used by the osd client (on behalf of the file system) and by rbd. Only one osd op (and therefore at most one data item) is ever used at a time by rbd. And the only time the file system sends two, the second op contains no data. - A bio is only used by the rbd client (and again, only one data item per message) - A page list is used by the file system and by rbd for outgoing data, but only one op (and one data item) at a time. We can therefore collapse all three of our data item fields into a single field "data", and depend on the messenger code to properly handle it based on its type. This allows us to eliminate quite a bit of duplicated code. This is related to: http://tracker.ceph.com/issues/4429 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill ceph message bio_iter, bio_segAlex Elder2013-05-011-5/+1
| | | | | | | | | | | | The bio_iter and bio_seg fields in a message are no longer used, we use the cursor instead. So get rid of them and the functions that operate on them them. This is related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: record residual bytes for all message data typesAlex Elder2013-05-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | All of the data types can use this, not just the page array. Until now, only the bio type doesn't have it available, and only the initiator of the request (the rbd client) is able to supply the length of the full request without re-scanning the bio list. Change the cursor init routines so the length is supplied based on the message header "data_len" field, and use that length to intiialize the "resid" field of the cursor. In addition, change the way "last_piece" is defined so it is based on the residual number of bytes in the original request. This is necessary (at least for bio messages) because it is possible for a read request to succeed without consuming all of the space available in the data buffer. This resolves: http://tracker.ceph.com/issues/4427 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill message trailAlex Elder2013-05-011-4/+0
| | | | | | | | | | | | The wart that is the ceph message trail can now be removed, because its only user was the osd client, and the previous patch made that no longer the case. The result allows write_partial_msg_pages() to be simplified considerably. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: implement pages array cursorAlex Elder2013-05-011-0/+6
| | | | | | | | Implement and use cursor routines for page array message data items for outbound message data. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: implement bio message data item cursorAlex Elder2013-05-011-0/+7
| | | | | | | | | | | Implement and use cursor routines for bio message data items for outbound message data. (See the previous commit for reasoning in support of the changes in out_msg_pos_next().) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: prepare for other message data item typesAlex Elder2013-05-011-2/+6
| | | | | | | | | This just inserts some infrastructure in preparation for handling other types of ceph message data items. No functional changes, just trying to simplify review by separating out some noise. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: start defining message data cursorAlex Elder2013-05-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch lays out the foundation for using generic routines to manage processing items of message data. For simplicity, we'll start with just the trail portion of a message, because it stands alone and is only present for outgoing data. First some basic concepts. We'll use the term "data item" to represent one of the ceph_msg_data structures associated with a message. There are currently four of those, with single-letter field names p, l, b, and t. A data item is further broken into "pieces" which always lie in a single page. A data item will include a "cursor" that will track state as the memory defined by the item is consumed by sending data from or receiving data into it. We define three routines to manipulate a data item's cursor: the "init" routine; the "next" routine; and the "advance" routine. The "init" routine initializes the cursor so it points at the beginning of the first piece in the item. The "next" routine returns the page, page offset, and length (limited by both the page and item size) of the next unconsumed piece in the item. It also indicates to the caller whether the piece being returned is the last one in the data item. The "advance" routine consumes the requested number of bytes in the item (advancing the cursor). This is used to record the number of bytes from the current piece that were actually sent or received by the network code. It returns an indication of whether the result means the current piece has been fully consumed. This is used by the message send code to determine whether it should calculate the CRC for the next piece processed. The trail of a message is implemented as a ceph pagelist. The routines defined for it will be usable for non-trail pagelist data as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: abstract message dataAlex Elder2013-05-011-20/+51
| | | | | | | | | | | | | | | Group the types of message data into an abstract structure with a type indicator and a union containing fields appropriate to the type of data it represents. Use this to represent the pages, pagelist, bio, and trail in a ceph message. Verify message data is of type NONE in ceph_msg_data_set_*() routines. Since information about message data of type NONE really should not be interpreted, get rid of the other assertions in those functions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: be explicit about message data representationAlex Elder2013-05-011-12/+21
| | | | | | | | | | | | | | | | | | | | | | A ceph message has a data payload portion. The memory for that data (either the source of data to send or the location to place data that is received) is specified in several ways. The ceph_msg structure includes fields for all of those ways, but this mispresents the fact that not all of them are used at a time. Specifically, the data in a message can be in: - an array of pages - a list of pages - a list of Linux bios - a second list of pages (the "trail") (The two page lists are currently only ever used for outgoing data.) Impose more structure on the ceph message, making the grouping of some of these fields explicit. Shorten the name of the "page_alignment" field. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: define ceph_msg_has_*() data macrosAlex Elder2013-05-011-0/+7
| | | | | | | | Define and use macros ceph_msg_has_*() to determine whether to operate on the pages, pagelist, bio, and trail fields of a message. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: record message data byte lengthAlex Elder2013-05-011-1/+1
| | | | | | | | | | Record the number of bytes of data in a page array rather than the number of pages in the array. It can be assumed that the page array is of sufficient size to hold the number of bytes indicated (and offset by the indicated alignment). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: isolate other message data fieldsAlex Elder2013-05-011-0/+5
| | | | | | | | | | | | | Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and ceph_msg_data_set_trail() to clearly abstract the assignment of the remaining data-related fields in a ceph message structure. Use the new functions in the osd client and mds client. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: set page info with byte lengthAlex Elder2013-05-011-1/+1
| | | | | | | | When setting page array information for message data, provide the byte length rather than the page count ceph_msg_data_set_pages(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: isolate message page field manipulationAlex Elder2013-05-011-9/+13
| | | | | | | | | | | | | | | | | | | | | | | Define a function ceph_msg_data_set_pages(), which more clearly abstracts the assignment page-related fields for data in a ceph message structure. Use this new function in the osd client and mds client. Ideally, these fields would never be set more than once (with BUG_ON() calls to guarantee that). At the moment though the osd client sets these every time it receives a message, and in the event of a communication problem this can happen more than once. (This will be resolved shortly, but setting up these helpers first makes it all a bit easier to work with.) Rearrange the field order in a ceph_msg structure to group those that are used to define the possible data payloads. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: kill ceph_msg->pagelist_countAlex Elder2013-05-011-1/+0
| | | | | | | The pagelist_count field is never actually used, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: distinguish page array and pagelist countAlex Elder2013-05-011-1/+2
| | | | | | | | | Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: make ceph_msg->bio_seg be unsignedAlex Elder2013-05-011-1/+1
| | | | | | | | | | | | | | The bio_seg field is used by the ceph messenger in iterating through a bio. It should never have a negative value, so make it an unsigned. (I contemplated making it unsigned short to match the struct bio definition, but it offered no benefit.) Change variables used to hold bio_seg values to all be unsigned as well. Change two variable names in init_bio_iter() to match the convention used everywhere else. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* libceph: fix messenger CONFIG_BLOCK dependenciesAlex Elder2013-02-131-0/+2
| | | | | | | | | | | | | | The ceph messenger has a few spots that are only used when bio messages are supported, and that's only when CONFIG_BLOCK is defined. This surrounds a couple of spots with #ifdef's that would cause a problem if CONFIG_BLOCK were not present in the kernel configuration. This resolves: http://tracker.ceph.com/issues/3976 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
* UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel ↵David Howells2012-10-021-2/+2
| | | | | | | | | | | | system headers Convert #include "..." to #include <path/...> in kernel system headers. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
* libceph: clean up con flagsSage Weil2012-07-301-10/+0
| | | | | | | Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by: Sage Weil <sage@inktank.com>