summaryrefslogtreecommitdiffstats
path: root/net/ceph
Commit message (Collapse)AuthorAgeFilesLines
* libceph: fix page calculation for non-page-aligned ioSage Weil2011-06-131-4/+6
| | | | | | | | Set the page count correctly for non-page-aligned IO. We were already doing this correctly for alignment, but not the page count. Fixes DIRECT_IO writes from unaligned pages. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: fix sync vs canceled writeSage Weil2011-06-071-5/+10
| | | | | | | If we cancel a write, trigger the safe completions to prevent a sync from blocking indefinitely in ceph_osdc_sync(). Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: subscribe to osdmap when cluster is fullSage Weil2011-05-241-0/+9
| | | | | | | | When the cluster is marked full, subscribe to subsequent map updates to ensure we find out promptly when it is no longer full. This will prevent us from spewing ENOSPC for (much) longer than necessary. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: handle new osdmap down/state change encodingSage Weil2011-05-241-3/+8
| | | | | | | | Old incrementals encode a 0 value (nearly always) when an osd goes down. Change that to allow any state bit(s) to be flipped. Special case 0 to mean flip the CEPH_OSD_UP bit to mimic the old behavior. Signed-off-by: Sage Weil <sage@newdream.net>
* ceph: check return value for start_request in writepagesSage Weil2011-05-191-1/+7
| | | | | | | Since we pass the nofail arg, we should never get an error; BUG if we do. (And fix the function to not return an error if __map_request fails.) Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: add missing breaks in addr_set_portSage Weil2011-05-191-0/+2
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix TAG_WAIT caseSage Weil2011-05-191-1/+3
| | | | | | | If we get a WAIT as a client something went wrong; error out. And don't fall through to an unrelated case. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix osdmap timestamp assignmentSage Weil2011-05-191-1/+1
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: use snprintf for unknown addrsSage Weil2011-05-191-1/+2
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: use snprintf for formatting object nameSage Weil2011-05-191-1/+1
| | | | Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix uninitialized value when no get_authorizer method is setSage Weil2011-05-191-5/+6
| | | | | | | | If there is no get_authorizer method we set the out_kvec to a bogus pointer. The length is also zero in that case, so it doesn't much matter, but it's better not to add the empty item in the first place. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: handle connection reopen race with callbacksSage Weil2011-05-191-13/+51
| | | | | | | | | | | | | | | | | | | If a connection is closed and/or reopened (ceph_con_close, ceph_con_open) it can race with a callback. con_work does various state checks for closed or reopened sockets at the beginning, but drops con->mutex before making callbacks. We need to check for state bit changes after retaking the lock to ensure we restart con_work and execute those CLOSED/OPENING tests or else we may end up operating under stale assumptions. In Jim's case, this was causing 'bad tag' errors. There are four cases where we re-take the con->mutex inside con_work: catch them all and return EAGAIN from try_{read,write} so that we can restart con_work. Reported-by: Jim Schutt <jaschut@sandia.gov> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix ceph_osdc_alloc_request error checksSage Weil2011-05-031-2/+2
| | | | | | ceph_osdc_alloc_request returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix ceph_msg_new error pathHenry C Chang2011-05-031-13/+13
| | | | | | | | | | If memory allocation failed, calling ceph_msg_put() will cause GPF since some of ceph_msg variables are not initialized first. Fix Bug #970. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-04-141-3/+9
|\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix linger request requeueing
| * libceph: fix linger request requeueingSage Weil2011-04-061-3/+9
| | | | | | | | | | | | | | | | | | | | | | Fix the request transition from linger -> normal request. The key is to preserve r_osd and requeue on the same OSD. Reregister as a normal request, add the request to the proper queues, then unregister the linger. Fix the unregister helper to avoid clearing r_osd (and also simplify the parallel check in __unregister_request()). Reported-by: Henry Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6Linus Torvalds2011-04-071-1/+1
|\ \ | |/ |/| | | | | * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
| * Fix common misspellingsLucas De Marchi2011-03-311-1/+1
| | | | | | | | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* | libceph: Create a new key type "ceph".Tommi Virtanen2011-03-293-8/+77
| | | | | | | | | | | | | | | | This allows us to use existence of the key type as a feature test, from userspace. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | libceph: Get secret from the kernel keys api when mounting with key=NAME.Tommi Virtanen2011-03-292-0/+59
| | | | | | | | | | Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | ceph: Move secret key parsing earlier.Tommi Virtanen2011-03-296-15/+59
| | | | | | | | | | | | | | | | | | This makes the base64 logic be contained in mount option parsing, and prepares us for replacing the homebew key management with the kernel key retention service. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | libceph: fix null dereference when unregistering linger requestsSage Weil2011-03-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | We should only clear r_osd if we are neither registered as a linger or a regular request. We may unregister as a linger while still registered as a regular request (e.g., in reset_osd). Incorrectly clearing r_osd there leads to a null pointer dereference in __send_request. Also simplify the parallel check in __unregister_request() where we just removed r_osd_item and know it's empty. Signed-off-by: Sage Weil <sage@newdream.net>
* | ceph: unlock on error in ceph_osdc_start_request()Dan Carpenter2011-03-291-1/+3
| | | | | | | | | | | | | | There was a missing unlock on the error path if __map_request() failed. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | ceph: fix possible NULL pointer dereferenceMariusz Kozlowski2011-03-261-1/+1
| | | | | | | | | | | | | | This patch fixes 'event_work' dereference before it is checked for NULL. Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: Sage Weil <sage@newdream.net>
* | ceph: flush msgr_wq during mds_client shutdownSage Weil2011-03-251-2/+2
|/ | | | | | | | | | | | | | | The release method for mds connections uses a backpointer to the mds_client, so we need to flush the workqueue of any pending work (and ceph_connection references) prior to freeing the mds_client. This fixes an oops easily triggered under UML by while true ; do mount ... ; umount ... ; done Also fix an outdated comment: the flush in ceph_destroy_client only flushes OSD connections out. This bug is basically an artifact of the ceph -> ceph+libceph conversion. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: add lingering request and watch/notify event frameworkYehuda Sadeh2011-03-222-12/+374
| | | | | | | | | | | | | | Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix osd request queuing on osdmap updatesSage Weil2011-03-211-133/+122
| | | | | | | | | | | | | | | | | | | | | | | If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: - all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently map to no up osd) - mapping always done by map_request (previous map_osds) - if the mapping changes, we requeue. requests are resent only after all map incrementals are processed. - some osd reset code is moved out of kick_requests into a separate function - the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: Fix base64-decoding when input ends in newline.Tommi Virtanen2011-03-151-1/+3
| | | | | | | | | | | It used to return -EINVAL because it thought the end was not aligned to 4 bytes. Clean up superfluous src < end test in if, the while itself guarantees that. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix msgr standby handlingSage Weil2011-03-041-8/+22
| | | | | | | | | | | | | | | | | The standby logic used to be pretty dependent on the work requeueing behavior that changed when we switched to WQ_NON_REENTRANT. It was also very fragile. Restructure things so that: - We clear WRITE_PENDING when we set STANDBY. This ensures we will requeue work when we wake up later. - con_work backs off if STANDBY is set. There is nothing to do if we are in standby. - clear_standby() helper is called by both con_send() and con_keepalive(), the two actions that can wake us up again. Move the connect_seq++ logic here. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix msgr keepalive flagSage Weil2011-03-041-5/+4
| | | | | | | There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix msgr backoffSage Weil2011-03-041-2/+28
| | | | | | | | | | | | | | | | | With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One pieces of fallout is that the exponential backoff breaks in certain cases: * con_work attempts to connect. * we get an immediate failure, and the socket state change handler queues immediate work. * con_work calls con_fault, we decide to back off, but can't queue delayed work. In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately). Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: retry after authorization failureSage Weil2011-03-031-2/+0
| | | | | | | | | | | | If we mark the connection CLOSED we will give up trying to reconnect to this server instance. That is appropriate for things like a protocol version mismatch that won't change until the server is restarted, at which point we'll get a new addr and reconnect. An authorization failure like this is probably due to the server not properly rotating it's secret keys, however, and should be treated as transient so that the normal backoff and retry behavior kicks in. Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: fix handling of short returns from get_user_pagesSage Weil2011-03-031-5/+13
| | | | | | | | get_user_pages() can return fewer pages than we ask for. We were returning a bogus pointer/error code in that case. Instead, loop until we get all the pages we want or get an error we can return to the caller. Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-02-211-31/+31
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: keep reference to parent inode on ceph_dentry ceph: queue cap_snaps once per realm libceph: fix socket write error handling libceph: fix socket read error handling
| * libceph: fix socket write error handlingSage Weil2011-01-251-11/+12
| | | | | | | | | | | | | | Pass errors from writing to the socket up the stack. If we get -EAGAIN, return 0 from the helper to simplify the callers' checks. Signed-off-by: Sage Weil <sage@newdream.net>
| * libceph: fix socket read error handlingSage Weil2011-01-251-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | If we get EAGAIN when trying to read from the socket, it is not an error. Return 0 from the helper in this case to simplify the error handling cases in the caller (indirectly, try_read). Fix try_read to pass any error to it's caller (con_work) instead of almost always returning 0. This let's us respond to things like socket disconnects. Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-01-133-45/+8
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode
| * net/ceph: make ceph_msgr_wq non-reentrantTejun Heo2011-01-121-44/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph messenger code does a rather complex dancing around multithread workqueue to make sure the same work item isn't executed concurrently on different CPUs. This restriction can be provided by workqueue with WQ_NON_REENTRANT. Make ceph_msgr_wq non-reentrant workqueue with the default concurrency level and remove the QUEUED/BUSY logic. * This removes backoff handling in con_work() but it couldn't reliably block execution of con_work() to begin with - queue_con() can be called after the work started but before BUSY is set. It seems that it was an optimization for a rather cold path and can be safely removed. * The number of concurrent work items is bound by the number of connections and connetions are independent from each other. With the default concurrency level, different connections will be executed independently. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: Always free allocated memory in osdmap_decode()Jesper Juhl2011-01-121-1/+3
| | | | | | | | | | | | | | | | Always free memory allocated to 'pi' in net/ceph/osdmap.c::osdmap_decode(). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Sage Weil <sage@newdream.net>
| * ceph: add dir_layout to inodeSage Weil2011-01-121-0/+3
| | | | | | | | | | | | | | | | | | Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'master' of ↵David S. Miller2010-12-263-27/+35
|\| | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/fib_frontend.c
| * Merge branch 'for-linus' of ↵Linus Torvalds2010-12-202-11/+12
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: handle partial result from get_user_pages ceph: mark user pages dirty on direct-io reads ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport ceph: fix direct-io on non-page-aligned buffers ceph: fix msgr_init error path
| | * ceph: handle partial result from get_user_pagesHenry C Chang2010-12-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | The get_user_pages() helper can return fewer than the requested pages. Error out in that case, and clean up the partial result. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
| | * ceph: mark user pages dirty on direct-io readsHenry C Chang2010-12-171-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
| | * ceph: fix msgr_init error pathSage Weil2010-12-131-5/+3
| | | | | | | | | | | | | | | | | | create_workqueue() returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2010-11-291-22/+0
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) af_unix: limit recursion level pch_gbe driver: The wrong of initializer entry pch_gbe dreiver: chang author ucc_geth: fix ucc halt problem in half duplex mode inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners ehea: Add some info messages and fix an issue hso: fix disable_net NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty cxgb4vf: fix setting unicast/multicast addresses ... net, ppp: Report correct error code if unit allocation failed DECnet: don't leak uninitialized stack byte au1000_eth: fix invalid address accessing the MAC enable register dccp: fix error in updating the GAR tcp: restrict net.ipv4.tcp_adv_win_scale (#20312) netns: Don't leak others' openreq-s in proc Net: ceph: Makefile: Remove unnessary code vhost/net: fix rcu check usage econet: fix CVE-2010-3848 econet: fix CVE-2010-3850 econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849 ...
| * \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2010-11-241-1/+1
| |\ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: of/phylib: Use device tree properties to initialize Marvell PHYs. phylib: Add support for Marvell 88E1149R devices. phylib: Use common page register definition for Marvell PHYs. qlge: Fix incorrect usage of module parameters and netdev msg level ipv6: fix missing in6_ifa_put in addrconf SuperH IrDA: correct Baud rate error correction atl1c: Fix hardware type check for enabling OTP CLK net: allow GFP_HIGHMEM in __vmalloc() bonding: change list contact to netdev@vger.kernel.org e1000: fix screaming IRQ
| * | | ceph: explicitly specify page alignment in network messagesSage Weil2010-11-092-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The alignment used for reading data into or out of pages used to be taken from the data_off field in the message header. This only worked as long as the page alignment matched the object offset, breaking direct io to non-page aligned offsets. Instead, explicitly specify the page alignment next to the page vector in the ceph_msg struct, and use that instead of the message header (which probably shouldn't be trusted). The alloc_msg callback is responsible for filling in this field properly when it sets up the page vector. Signed-off-by: Sage Weil <sage@newdream.net>
| * | | ceph: make page alignment explicit in osd interfaceSage Weil2010-11-091-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: Sage Weil <sage@newdream.net>
| * | | ceph: fix comment, remove extraneous argsSage Weil2010-11-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | The offset/length arguments aren't used. Signed-off-by: Sage Weil <sage@newdream.net>