summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.infradead.org/~dwmw2/firmware-2.6Linus Torvalds2009-06-121-26/+103
|\ | | | | | | | | * git://git.infradead.org/~dwmw2/firmware-2.6: firmware: speed up request_firmware(), v3
| * firmware: speed up request_firmware(), v3David Woodhouse2009-05-151-26/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than calling vmalloc() repeatedly to grow the firmware image as we receive data from userspace, just allocate and fill individual pages. Then vmap() the whole lot in one go when we're done. A quick test with a 337KiB iwlagn firmware shows the time taken for request_firmware() going from ~32ms to ~5ms after I apply this patch. [v2: define PAGE_KERNEL_RO as PAGE_KERNEL where necessary, use min_t()] [v3: kunmap() takes the struct page *, not the virtual address] Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Tested-by: Sachin Sant <sachinp@in.ibm.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds2009-06-129-10/+442
|\ \ | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Remove lock_kernel from gfs2_put_super() GFS2: Add tracepoints
| * | GFS2: Remove lock_kernel from gfs2_put_super()Steven Whitehouse2009-06-121-4/+0
| | | | | | | | | | | | | | | | | | | | | It is not required here. Signed-off-by: Steven Whitehouse <swhiteho@redhat,com> Cc: Christoph Hellwig <hch@infradead.org>
| * | GFS2: Add tracepointsSteven Whitehouse2009-06-128-6/+442
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ability to trace various aspects of the GFS2 filesystem. The trace points are divided into three groups, glocks, logging and bmap. These points have been chosen because they allow inspection of the major internal functions of GFS2 and they are also generic enough that they are unlikely to need any major changes as the filesystem evolves. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguestLinus Torvalds2009-06-1221-818/+1103
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits) lguest: add support for indirect ring entries lguest: suppress notifications in example Launcher lguest: try to batch interrupts on network receive lguest: avoid sending interrupts to Guest when no activity occurs. lguest: implement deferred interrupts in example Launcher lguest: remove obsolete LHREQ_BREAK call lguest: have example Launcher service all devices in separate threads lguest: use eventfds for device notification eventfd: export eventfd_signal and eventfd_fget for lguest lguest: allow any process to send interrupts lguest: PAE fixes lguest: PAE support lguest: Add support for kvm_hypercall4() lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated lguest: map switcher with executable page table entries lguest: fix writev returning short on console output lguest: clean up length-used value in example launcher lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. lguest: beyond ARRAY_SIZE of cpu->arch.gdt ...
| * | | lguest: add support for indirect ring entriesMark McLoughlin2009-06-121-12/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the VIRTIO_RING_F_INDIRECT_DESC feature. This is a simple matter of changing the descriptor walking code to operate on a struct vring_desc* and supplying it with an indirect table if detected. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: suppress notifications in example LauncherRusty Russell2009-06-121-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Guest only really needs to tell us about activity when we're going to listen to the eventfd: normally, we don't want to know. So if there are no available buffers, turn on notifications, re-check, then wait for the Guest to notify us via the eventfd, then turn notifications off again. There's enough else going on that the differences are in the noise. Before: Secs RxKicks TxKicks 1G TCP Guest->Host: 3.94 4686 32815 1M normal pings: 104 142862 1000010 1M 1k pings (-l 120): 57 142026 1000007 After: 1G TCP Guest->Host: 3.76 4691 32811 1M normal pings: 111 142859 997467 1M 1k pings (-l 120): 55 19648 501549 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: try to batch interrupts on network receiveRusty Russell2009-06-121-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than triggering an interrupt every time, we only trigger an interrupt when there are no more incoming packets (or the recv queue is full). However, the overhead of doing the select to figure this out is measurable: 1M pings goes from 98 to 104 seconds, and 1G Guest->Host TCP goes from 3.69 to 3.94 seconds. It's close to the noise though. I tested various timeouts, including reducing it as the number of pending packets increased, timing a 1 gigabyte TCP send from Guest -> Host and Host -> Guest (GSO disabled, to increase packet rate). // time tcpblast -o -s 65536 -c 16k 192.168.2.1:9999 > /dev/null Timeout Guest->Host Pkts/irq Host->Guest Pkts/irq Before 11.3s 1.0 6.3s 1.0 0 11.7s 1.0 6.6s 23.5 1 17.1s 8.8 8.6s 26.0 1/pending 13.4s 1.9 6.6s 23.8 2/pending 13.6s 2.8 6.6s 24.1 5/pending 14.1s 5.0 6.6s 24.4 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: avoid sending interrupts to Guest when no activity occurs.Rusty Russell2009-06-121-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we track how many buffers we've used, we can tell whether we really need to interrupt the Guest. This happens as a side effect of spurious notifications. Spurious notifications happen because it can take a while before the Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and meanwhile the Guest can more notifications. A real fix would be to use wake counts, rather than a suppression flag, but the practical difference is generally in the noise: the interrupt is usually coalesced into a pending one anyway so we just save a system call which isn't clearly measurable. Secs Spurious IRQS 1G TCP Guest->Host: 3.93 58 1M normal pings: 100 72 1M 1k pings (-l 120): 57 492904 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: implement deferred interrupts in example LauncherRusty Russell2009-06-121-19/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than sending an interrupt on every buffer, we only send an interrupt when we're about to wait for the Guest to send us a new one. The console input and network input still send interrupts manually, but the block device, network and console output queues can simply rely on this logic to send interrupts to the Guest at the right time. The patch is cluttered by moving trigger_irq() higher in the code. In practice, two factors make this optimization less interesting: (1) we often only get one input at a time, even for networking, (2) triggering an interrupt rapidly tends to get coalesced anyway. Before: Secs RxIRQS TxIRQs 1G TCP Guest->Host: 3.72 32784 32771 1M normal pings: 99 1000004 995541 100,000 1k pings (-l 120): 5 49510 49058 After: 1G TCP Guest->Host: 3.69 32809 32769 1M normal pings: 99 1000004 996196 100,000 1k pings (-l 120): 5 52435 52361 (Note the interrupt count on 100k pings goes *up*: see next patch). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: remove obsolete LHREQ_BREAK callRusty Russell2009-06-124-43/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We no longer need an efficient mechanism to force the Guest back into host userspace, as each device is serviced without bothering the main Guest process (aka. the Launcher). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: have example Launcher service all devices in separate threadsRusty Russell2009-06-121-574/+259
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently lguest has three threads: the main Launcher thread, a Waker thread, and a thread for the block device (because synchronous block was simply too painful to bear). The Waker selects() on all the input file descriptors (eg. stdin, net devices, pipe to the block thread) and when one becomes readable it calls into the kernel to kick the Launcher thread out into userspace, which repeats the poll, services the device(s), and then tells the kernel to release the Waker before re-entering the kernel to run the Guest. Also, to make a slightly-decent network transmit routine, the Launcher would suppress further network interrupts while it set a timer: that signal handler would write to a pipe, which would rouse the Waker which would prod the Launcher out of the kernel to check the network device again. Now we can convert all our virtqueues to separate threads: each one has a separate eventfd for when the Guest pokes the device, and can trigger interrupts in the Guest directly. The linecount shows how much this simplifies, but to really bring it home, here's an strace analysis of single Guest->Host ping before: * Guest sends packet, notifies xmit vq, return control to Launcher * Launcher clears notification flag on xmit ring * Launcher writes packet to TUN device writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108 * Launcher sets up interrupt for Guest (xmit ring is empty) write(10, "\2\0\0\0\3\0\0\0", 8) = 0 * Launcher sets up timer for interrupt mitigation setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0 * Launcher re-runs guest pread64(10, 0xbfa5f4d4, 4, 0) ... * Waker notices reply packet in tun device (it was in select) select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4]) * Waker kicks Launcher out of guest: pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0 * Launcher returns from running guest: ... = -1 EAGAIN (Resource temporarily unavailable) * Launcher looks at input fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0}) * Launcher reads pong from tun device: readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108 * Launcher injects guest notification: write(10, "\2\0\0\0\2\0\0\0", 8) = 0 * Launcher rechecks fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout) * Launcher clears Waker: pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0 * Launcher reruns Guest: pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted) * Signal comes in, uses pipe to wake up Launcher: --- SIGALRM (Alarm clock) @ 0 (0) --- write(8, "\0", 1) = 1 sigreturn() = ? (mask now []) * Waker sees write on pipe: select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6]) * Waker kicks Launcher out of Guest: pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0 * Launcher exits from kernel: pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable) * Launcher looks to see what fd woke it: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0}) * Launcher reads timeout fd, sets notification flag on xmit ring read(6, "\0", 32) = 1 * Launcher rechecks fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout) * Launcher clears Waker: pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0 * Launcher resumes Guest: pread64(10, "\0p\0\4", 4, 0) .... strace analysis of single Guest->Host ping after: * Guest sends packet, notifies xmit vq, creates event on eventfd. * Network xmit thread wakes from read on eventfd: read(7, "\1\0\0\0\0\0\0\0", 8) = 8 * Network xmit thread writes packet to TUN device writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108 * Network recv thread wakes up from read on tunfd: readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108 * Network recv thread sets up interrupt for the Guest write(6, "\2\0\0\0\2\0\0\0", 8) = 0 * Network recv thread goes back to reading tunfd 13:39:42.460285 readv(4, <unfinished ...> * Network xmit thread sets up interrupt for Guest (xmit ring is empty) write(6, "\2\0\0\0\3\0\0\0", 8) = 0 * Network xmit thread goes back to reading from eventfd read(7, <unfinished ...> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: use eventfds for device notificationRusty Russell2009-06-125-6/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with an address: the main Launcher process returns with this address, and figures out what device to run. A far nicer model is to let processes bind an eventfd to an address: if we find one, we simply signal the eventfd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Davide Libenzi <davidel@xmailserver.org>
| * | | eventfd: export eventfd_signal and eventfd_fget for lguestRusty Russell2009-06-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lguest wants to attach eventfds to guest notifications, and lguest is usually a module. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: Davide Libenzi <davidel@xmailserver.org>
| * | | lguest: allow any process to send interruptsRusty Russell2009-06-123-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently only allow the Launcher process to send interrupts, but it as we already send interrupts from the hrtimer, it's a simple matter of extracting that code into a common set_interrupt routine. As we switch to a thread per virtqueue, this avoids a bottleneck through the main Launcher process. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: PAE fixesRusty Russell2009-06-121-17/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) j wasn't initialized in setup_pagetables, so they weren't set up for me causing immediate guest crashes. 2) gpte_addr should not re-read the pmd from the Guest. Especially not BUG_ON() based on the value. If we ever supported SMP guests, they could trigger that. And the Launcher could also trigger it (tho currently root-only). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: PAE supportMatias Zabaljauregui2009-06-129-48/+403
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This version requires that host and guest have the same PAE status. NX cap is not offered to the guest, yet. Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: Add support for kvm_hypercall4()Matias Zabaljauregui2009-06-122-10/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for kvm_hypercall4(); PAE wants it. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGDMatias Zabaljauregui2009-06-125-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name (That's really what it is, and the confusion gets worse with PAE support) Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
| * | | lguest: use native_set_* macros, which properly handle 64-bit entries when ↵Matias Zabaljauregui2009-06-122-21/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAE is activated Some cleanups and replace direct assignment with native_set_* macros which properly handle 64-bit entries when PAE is activated Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: map switcher with executable page table entriesMatias Zabaljauregui2009-06-122-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Map switcher with executable page table entries. (This bug didn't matter before PAE and hence NX support -- RR) Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: fix writev returning short on console outputRusty Russell2009-06-121-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | I've never seen it here, but I can't find anywhere that says writev will write everything. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: clean up length-used value in example launcherRusty Russell2009-06-121-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "len" field in the used ring for virtio indicates the number of bytes *written* to the buffer. This means the guest doesn't have to zero the buffers in advance as it always knows the used length. Erroneously, the console and network example code puts the length *read* into that field. The guest ignores it, but it's wrong. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.Matias Zabaljauregui2009-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If GDT_ENTRIES were every > 256, this could become a problem. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: beyond ARRAY_SIZE of cpu->arch.gdtRoel Kluin2009-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not go beyond ARRAY_SIZE of cpu->arch.gdt Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: clean up example launcher compile flags.Rusty Russell2009-06-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 18 months ago 5bbf89fc260830f3f58b331d946a16b39ad1ca2d changed to loading bzImages directly, and no longer manually ungzipping them, so we no longer need libz. Also, -m32 is useful for those on 64-bit platforms (and harmless on 32-bit). Reported-by: Ron Minnich <rminnich@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: optimize by coding restore_flags and irq_enable in assembler.Rusty Russell2009-06-123-30/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The downside of the last patch which made restore_flags and irq_enable check interrupts is that they are now too big to be patched directly into the callsites, so the C versions are always used. But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all the registers. In fact, we don't need any registers in the fast path, so we can do better than this if we actually code them in assembler. The results are in the noise, but since it's about the same amount of code, it's worth applying. 1GB Guest->Host: input(suppressed),output(suppressed) Before: Seconds: 0:16.53 Packets: 377268,753673 Interrupts: 22461,24297 Notifications: 1(5245),21303(732370) Net IRQs triggered: 377023(245),42578(711095) After: Seconds: 0:16.48 Packets: 377289,753673 Interrupts: 22281,24465 Notifications: 1(5245),21296(732377) Net IRQs triggered: 377060(229),42564(711109) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: improve interrupt handling, speed up stream networkingRusty Russell2009-06-128-16/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lguest never checked for pending interrupts when enabling interrupts, and things still worked. However, it makes a significant difference to TCP performance, so it's time we fixed it by introducing a pending_irq flag and checking it on irq_restore and irq_enable. These two routines are now too big to patch into the 8/10 bytes patch space, so we drop that code. Note: The high latency on interrupt delivery had a very curious effect: once everything else was optimized, networking without GSO was faster than networking with GSO, since more interrupts were sent and hence a greater chance of one getting through to the Guest! Note2: (Almost) Closing the same loophole for iret doesn't have any measurable effect, so I'm leaving that patch for the moment. Before: 1GB tcpblast Guest->Host: 30.7 seconds 1GB tcpblast Guest->Host (no GSO): 76.0 seconds After: 1GB tcpblast Guest->Host: 6.8 seconds 1GB tcpblast Guest->Host (no GSO): 27.8 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: fix race in halt codeRusty Russell2009-06-123-12/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting that a timer or the Waker will wake_up_process() us. But we do it in a stupid way, leaving a classic missing wakeup race. So split maybe_do_interrupt() into interrupt_pending() and try_deliver_interrupt(), and check maybe_do_interrupt() and the "break_out" flag before calling schedule. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: remove invalid interrupt forcing logic.Rusty Russell2009-06-121-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 20887611523e749d99cc7d64ff6c97d27529fbae (lguest: notify on empty) introduced lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on interrupts all the time. Because we always process one buffer at a time, the inflight count is always 0 when call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from the Guest. It should be looking to see if there are more buffers in the Guest's queue: if it's empty, then we force an interrupt. This makes little difference, since we usually have an empty queue; but that's the subject of another patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: fix lguest wake on guest clock tick, or fd activityRusty Russell2009-06-122-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Launcher could be inside the Guest on another CPU; wake_up_process will do nothing because it is "running". kick_process will knock it back into our kernel in this case, otherwise we'll miss it until the next guest exit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | sched: export kick_processRusty Russell2009-06-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lguest needs kick_process: wake_up_process() does nothing if a process is running, which isn't sufficient (we need it in the kernel). And lguest support is usually modular. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>
| * | | lguest: get more serious about wmb() in example Launcher codeRusty Russell2009-06-121-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the Launcher process runs the Guest, it doesn't have to be very serious about its barriers: the Guest isn't running while we are (Guest is UP). Before we change to use threads to service devices, we need to fix this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: clean up lguest_init_IRQRusty Russell2009-06-121-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Copy from arch/x86/kernel/irqinit_32.c: we don't use the vectors beyond LGUEST_IRQS (if any), but we might as well set them all. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: cleanup passing of /dev/lguest fd around example launcher.Rusty Russell2009-06-121-55/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We hand the /dev/lguest fd everywhere; it's far neater to just make it a global (it already is, in fact, hidden in the waker_fds struct). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | lguest: be paranoid about guest playing with device descriptors.Rusty Russell2009-06-121-10/+17
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't trust the values in the device descriptor table once the guest has booted, so keep local copies. They could set them to strange values then cause us to segv (they're 8 bit values, so they can't make our pointers go too wild). This becomes more important with the following patches which read them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtioLinus Torvalds2009-06-1216-158/+592
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtio: virtio: enhance id_matching for virtio drivers virtio: fix id_matching for virtio drivers virtio: handle short buffers in virtio_rng. virtio_blk: add missing __dev{init,exit} markings virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) virtio: teach virtio_has_feature() about transport features virtio: expose features in sysfs virtio_pci: optional MSI-X support virtio_pci: split up vp_interrupt virtio: find_vqs/del_vqs virtio operations virtio: add names to virtqueue struct, mapping from devices to queues. virtio: meet virtio spec by finalizing features before using device virtio: fix obsolete documentation on probe function
| * | | virtio: enhance id_matching for virtio driversChristian Borntraeger2009-06-122-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows a virtio driver to use VIRTIO_DEV_ANY_ID for the device id. This will be used by a test module that can be bound to any virtio device. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio: fix id_matching for virtio driversChristian Borntraeger2009-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bug never appeared, since all current virtio drivers use VIRTIO_DEV_ANY_ID for the vendor field. If a real vendor would be used, the check in virtio_id_match is wrong - it returns 0 if id->vendor == dev->id.vendor. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio: handle short buffers in virtio_rng.Rusty Russell2009-06-121-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device fills less than 4 bytes of our random buffer, we'll BUG_ON. It's nicer to handle the case where it partially fills the buffer (the protocol doesn't explicitly bad that). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio_blk: add missing __dev{init,exit} markingsMike Frysinger2009-06-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The remove member of the virtio_driver structure uses __devexit_p(), so the remove function itself should be marked with __devexit. And where there be __devexit on the remove, so is there __devinit on the probe. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)Mark McLoughlin2009-06-122-2/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new feature flag for indirect ring entries. These are ring entries which point to a table of buffer descriptors. The idea here is to increase the ring capacity by allowing a larger effective ring size whereby the ring size dictates the number of requests that may be outstanding, rather than the size of those requests. This should be most effective in the case of block I/O where we can potentially benefit by concurrently dispatching a large number of large requests. Even in the simple case of single segment block requests, this results in a threefold increase in ring capacity. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio: teach virtio_has_feature() about transport featuresMark McLoughlin2009-06-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers don't add transport features to their table, so we shouldn't check these with virtio_check_driver_offered_feature(). We could perhaps add an ->offered_feature() virtio_config_op, but that perhaps that would be overkill for a consitency check like this. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio: expose features in sysfsRusty Russell2009-06-121-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Each device negotiates feature bits; expose these in sysfs to help diagnostics and debugging. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio_pci: optional MSI-X supportMichael S. Tsirkin2009-06-122-20/+218
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements optional MSI-X support in virtio_pci. MSI-X is used whenever the host supports at least 2 MSI-X vectors: 1 for configuration changes and 1 for virtqueues. Per-virtqueue vectors are allocated if enough vectors available. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ whitespace, style)
| * | | virtio_pci: split up vp_interruptMichael S. Tsirkin2009-06-121-19/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reorganizes virtio-pci code in vp_interrupt slightly, so that it's easier to add per-vq MSI support on top. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio: find_vqs/del_vqs virtio operationsMichael S. Tsirkin2009-06-1210-89/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations, and updates all drivers. This is needed for MSI support, because MSI needs to know the total number of vectors upfront. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
| * | | virtio: add names to virtqueue struct, mapping from devices to queues.Rusty Russell2009-06-1214-31/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a linked list of all virtqueues for a virtio device: this helps for debugging and is also needed for upcoming interface change. Also, add a "name" field for clearer debug messages. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | | virtio: meet virtio spec by finalizing features before using deviceRusty Russell2009-06-121-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Virtio devices are supposed to negotiate features before they start using the device, but the current code doesn't do this. This is because the driver's probe() function invariably has to add buffers to a virtqueue, or probe the disk (virtio_blk). This currently doesn't matter since no existing backend is strict about the feature negotiation. But it's possible to imagine a future feature which completely changes how a device operates: in this case, we'd need to acknowledge it before using the device. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>