| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to lookup a reference to a netdevice.
The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.
Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
| |
This adds a trace event for xdp redirect which may help when debugging
XDP programs that use redirect bpf commands.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for redirect to xdp generic creating a fall back for
devices that do not yet have support and allowing test infrastructure
using veth pairs to be built.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds support for a bpf_redirect helper function to the XDP
infrastructure. For now this only supports redirecting to the egress
path of a port.
In order to support drivers handling a xdp_buff natively this patches
uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff
to the specified device.
If the program specifies either (a) an unknown device or (b) a device
that does not support the operation a BPF warning is thrown and the
XDP_ABORTED error code is returned.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in Faro during Netfilter Workshop 2017, RB trees can be
used with RCU, using a seqlock.
Note that net/rxrpc/conn_service.c is already using this.
This patch converts inetpeer from AVL tree to RB tree, since it allows
to remove private AVL implementation in favor of shared RB code.
$ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
text data bss dec hex filename
3195 40 128 3363 d23 net/ipv4/inetpeer.before
1562 24 0 1586 632 net/ipv4/inetpeer.after
The same technique can be used to speed up
net/netfilter/nft_set_rbtree.c (removing rwlock contention in fast path)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All unix sockets now account inflight FDs to the respective sender.
This was introduced in:
commit 712f4aad406bb1ed67f3f98d04c044191f0ff593
Author: willy tarreau <w@1wt.eu>
Date: Sun Jan 10 07:54:56 2016 +0100
unix: properly account for FDs passed over unix sockets
and further refined in:
commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6
Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed Feb 3 02:11:03 2016 +0100
unix: correctly track in-flight fds in sending process user_struct
Hence, regardless of the stacking depth of FDs, the total number of
inflight FDs is limited, and accounted. There is no known way for a
local user to exceed those limits or exploit the accounting.
Furthermore, the GC logic is independent of the recursion/stacking depth
as well. It solely depends on the total number of inflight FDs,
regardless of their layout.
Lastly, the current `recursion_level' suffers a TOCTOU race, since it
checks and inherits depths only at queue time. If we consider `A<-B' to
mean `queue-B-on-A', the following sequence circumvents the recursion
level easily:
A<-B
B<-C
C<-D
...
Y<-Z
resulting in:
A<-B<-C<-...<-Z
With all of this in mind, lets drop the recursion limit. It has no
additional security value, anymore. On the contrary, it randomly
confuses message brokers that try to forward file-descriptors, since
any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client
maliciously modifies the FD while inflight.
Cc: Alban Crequy <alban.crequy@collabora.co.uk>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_hmac_algo_param_t, and
replace with struct sctp_hmac_algo_param in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_chunks_param_t, and
replace with struct sctp_chunks_param in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_random_param_t, and
replace with struct sctp_random_param in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_supported_ext_param_t, and
replace with struct sctp_supported_ext_param in the places where it's
using this typedef.
It is also to use sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_adaptation_ind_param_t, and
replace with struct sctp_adaptation_ind_param in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
Remove it, there is even no places using it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_supported_addrs_param_t, and
replace with struct sctp_supported_addrs_param in the places where it's
using this typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
| |
Remove this typedef, there is even no places using it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_cookie_preserve_param_t, and
replace with struct sctp_cookie_preserve_param in the places where it's
using this typedef.
It is also to fix some indents in sctp_sf_do_5_2_6_stale().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_ipv6addr_param_t, and replace
with struct sctp_ipv6addr_param in the places where it's using this
typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
| |
This patch is to remove the typedef sctp_ipv4addr_param_t, and replace
with struct sctp_ipv4addr_param in the places where it's using this
typedef.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The definition of an "anycast destination address" has been tweaked as a
side-effect of commit 2647a9b07032 ("ipv6: Remove external dependency on
rt6i_gateway and RTF_ANYCAST"). The first address of a point-to-point
/127 subnet is now considered as an anycast address. This prevents
ICMPv6 errors to be returned to a sender of such a subnet and breaks
PMTU discovery.
This can be reproduced with:
ip link add name out6 type veth peer name in6
ip link add name out7 type veth peer name in7
ip link set mtu 1400 dev out7
ip link set mtu 1400 dev in7
ip netns add next-hop
ip netns add next-next-hop
ip link set netns next-hop dev in6
ip link set netns next-hop dev out7
ip link set netns next-next-hop dev in7
ip link set up dev out6
ip addr add 2001:db8:1::12/127 dev out6
ip netns exec next-hop ip link set up dev in6
ip netns exec next-hop ip link set up dev out7
ip netns exec next-hop ip addr add 2001:db8:1::13/127 dev in6
ip netns exec next-hop ip addr add 2001:db8:1::14/127 dev out7
ip netns exec next-hop ip route add default via 2001:db8:1::15
ip netns exec next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
ip netns exec next-next-hop ip link set up dev in7
ip netns exec next-next-hop ip addr add 2001:db8:1::15/127 dev in7
ip netns exec next-next-hop ip addr add 2001:db8:1::50/128 dev in7
ip netns exec next-next-hop ip route add default via 2001:db8:1::14
ip netns exec next-next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
ip route add 2001:db8:1::48/123 via 2001:db8:1::13
sleep 4
ping -M do -s 1452 -c 3 2001:db8:1::50 || true
ip route get 2001:db8:1::50
Before the patch, we get:
2001:db8:1::50 from :: via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 1024 pref medium
After the patch, we get:
2001:db8:1::50 via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 0
cache expires 578sec mtu 1400 pref medium
Fixes: 2647a9b07032 ("ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST")
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull networking fixes from David Miller:
1) Fix 64-bit division in mlx5 IPSEC offload support, from Ilan Tayari
and Arnd Bergmann.
2) Fix race in statistics gathering in bnxt_en driver, from Michael
Chan.
3) Can't use a mutex in RCU reader protected section on tap driver, from
Cong WANG.
4) Fix mdb leak in bridging code, from Eduardo Valentin.
5) Fix free of wrong pointer variable in nfp driver, from Dan Carpenter.
6) Buffer overflow in brcmfmac driver, from Arend van SPriel.
7) ioremap_nocache() return value needs to be checked in smsc911x
driver, from Alexey Khoroshilov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
net: stmmac: revert "support future possible different internal phy mode"
sfc: don't read beyond unicast address list
datagram: fix kernel-doc comments
socket: add documentation for missing elements
smsc911x: Add check for ioremap_nocache() return code
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
net: hns: Bugfix for Tx timeout handling in hns driver
net: ipmr: ipmr_get_table() returns NULL
nfp: freeing the wrong variable
mlxsw: spectrum_switchdev: Check status of memory allocation
mlxsw: spectrum_switchdev: Remove unused variable
mlxsw: spectrum_router: Fix use-after-free in route replace
mlxsw: spectrum_router: Add missing rollback
samples/bpf: fix a build issue
bridge: mdb: fix leak on complete_info ptr on fail path
tap: convert a mutex to a spinlock
cxgb4: fix BUG() on interrupt deallocating path of ULD
qed: Fix printk option passed when printing ipv6 addresses
net: Fix minor code bug in timestamping.txt
net: stmmac: Make 'alloc_dma_[rt]x_desc_resources()' look even closer
...
|
| |
| |
| |
| |
| |
| |
| | |
Fill in missing kernel-doc for missing elements in struct sock.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"This pull request contains:
- i2c core reorganization. One source file became too monolithic. It
is now split up, yet we still have the same named object as the
final output. This should ease maintenance.
- new drivers: ZTE ZX2967 family, ASPEED 24XX/25XX
- designware driver gained slave mode support
- xgene-slimpro driver gained ACPI support
- bigger overhaul for pca-platform driver
- the algo-bit module now supports messages with enforced STOP
- slightly bigger than usual set of driver updates and improvements
and with much appreciated quality assurance from Andy Shevchenko"
* 'i2c/for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (51 commits)
i2c: Provide a stub for i2c_detect_slave_mode()
i2c: designware: Let slave adapter support be optional
i2c: designware: Make HW init functions static
i2c: designware: fix spelling mistakes
i2c: pca-platform: propagate error from i2c_pca_add_numbered_bus
i2c: pca-platform: correctly set algo_data.reset_chip
i2c: acpi: Do not create i2c-clients for LNXVIDEO ACPI devices
i2c: designware: enable SLAVE in platform module
i2c: designware: add SLAVE mode functions
i2c: zx2967: drop COMPILE_TEST dependency
i2c: zx2967: always use the same device when printing errors
i2c: pca-platform: use dev_warn/dev_info instead of printk
i2c: pca-platform: use device managed allocations
i2c: pca-platform: add devicetree awareness
i2c: pca-platform: switch to struct gpio_desc
dt-bindings: add bindings for i2c-pca-platform
i2c: cadance: fix ctrl/addr reg write order
i2c: zx2967: add i2c controller driver for ZTE's zx2967 family
dt: bindings: add documentation for zx2967 family i2c controller
i2c: algo-bit: add support for I2C_M_STOP
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Drivers would like to call i2c_detect_slave_mode() even if !I2C_SLAVE.
Give them what they want to,
Otherwise kernel will not compile:
drivers/i2c/busses/i2c-designware-platdrv.c: In function ‘dw_i2c_plat_probe’:
drivers/i2c/busses/i2c-designware-platdrv.c:331:6: error: implicit declaration of function ‘i2c_detect_slave_mode’ [-Werror=implicit-function-declaration]
if (i2c_detect_slave_mode(&pdev->dev))
^~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
Fixes: 6e38cf3b4421 ("i2c: designware: Let slave adapter support be optional")
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
No platform currently upstream makes use of this platform_data anymore.
The ones that did are converted to DT meanwhile. So, remove it. The old
platforms likely don't have the 'clks_per_cnt' feature, otherwise it
would have been implemented by now. And in the unlikely case they need
to setup a different bus speed, we should rather go for a generic i2c
platform data just for that.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Break out the exported SMBus functions and the emulation layer into a
separate file. This also involved splitting up the tracing header into
an I2C and an SMBus part.
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This update comes with:
- Support for lockless operation in the ARM io-pgtable code.
This is an important step to solve the scalability problems in the
common dma-iommu code for ARM
- Some Errata workarounds for ARM SMMU implemenations
- Rewrite of the deferred IO/TLB flush code in the AMD IOMMU driver.
The code suffered from very high flush rates, with the new
implementation the flush rate is down to ~1% of what it was before
- Support for amd_iommu=off when booting with kexec.
The problem here was that the IOMMU driver bailed out early without
disabling the iommu hardware, if it was enabled in the old kernel
- The Rockchip IOMMU driver is now available on ARM64
- Align the return value of the iommu_ops->device_group call-backs to
not miss error values
- Preempt-disable optimizations in the Intel VT-d and common IOVA
code to help Linux-RT
- Various other small cleanups and fixes"
* tag 'iommu-updates-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (60 commits)
iommu/vt-d: Constify intel_dma_ops
iommu: Warn once when device_group callback returns NULL
iommu/omap: Return ERR_PTR in device_group call-back
iommu: Return ERR_PTR() values from device_group call-backs
iommu/s390: Use iommu_group_get_for_dev() in s390_iommu_add_device()
iommu/vt-d: Don't disable preemption while accessing deferred_flush()
iommu/iova: Don't disable preempt around this_cpu_ptr()
iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #126
iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
ACPI/IORT: Fixup SMMUv3 resource size for Cavium ThunderX2 SMMUv3 model
iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
iommu/arm-smmu-v3: Remove io-pgtable spinlock
iommu/arm-smmu: Remove io-pgtable spinlock
iommu/io-pgtable-arm-v7s: Support lockless operation
iommu/io-pgtable-arm: Support lockless operation
iommu/io-pgtable: Introduce explicit coherency
iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
...
|
| | \ \ | |
| | \ \ | |
| | \ \ | |
| | \ \ | |
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | | |
'arm/core', 'x86/vt-d', 'x86/amd', 's390' and 'core' into next
|
| | | |/ / /
| | |/| | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
A driver would need to know if there are any active references to a
a PASID before cleaning up its resources. This function helps check
if there are any active users of a PASID before it can perform any
recovery on that device.
To: Joerg Roedel <joro@8bytes.org>
To: linux-kernel@vger.kernel.org
To: David Woodhouse <dwmw2@infradead.org>
Cc: Jean-Phillipe Brucker <jean-philippe.brucker@arm.com>
Cc: iommu@lists.linux-foundation.org
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
"This work from Amir introduces the inodes index feature, which
provides:
- hardlinks are not broken on copy up
- infrastructure for overlayfs NFS export
This also fixes constant st_ino for samefs case for lower hardlinks"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
ovl: mark parent impure and restore timestamp on ovl_link_up()
ovl: document copying layers restrictions with inodes index
ovl: cleanup orphan index entries
ovl: persistent overlay inode nlink for indexed inodes
ovl: implement index dir copy up
ovl: move copy up lock out
ovl: rearrange copy up
ovl: add flag for upper in ovl_entry
ovl: use struct copy_up_ctx as function argument
ovl: base tmpfile in workdir too
ovl: factor out ovl_copy_up_inode() helper
ovl: extract helper to get temp file in copy up
ovl: defer upper dir lock to tempfile link
ovl: hash overlay non-dir inodes by copy up origin
ovl: cleanup bad and stale index entries on mount
ovl: lookup index entry for copy up origin
ovl: verify index dir matches upper dir
ovl: verify upper root dir matches lower root dir
ovl: introduce the inodes index dir feature
ovl: generalize ovl_create_workdir()
...
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Added an i_state flag I_INUSE and helpers to set/clear/test the bit.
The 'inuse' lock is an 'advisory' inode lock, that can be used to extend
exclusive create protection beyond parent->i_mutex lock among cooperating
users.
This is going to be used by overlayfs to get exclusive ownership on upper
and work dirs among overlayfs mounts.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |\ \ \ \ \ \
| | |_|_|/ / /
| |/| | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
overlayfs-next
UUID/GUID updates:
- introduce the new uuid_t/guid_t types that are going to replace
the somewhat confusing uuid_be/uuid_le types and make the terminology
fit the various specs, as well as the userspace libuuid library.
(me, based on a previous version from Amir)
- consolidated generic uuid/guid helper functions lifted from XFS
and libnvdimm (Amir and me)
- conversions to the new types and helpers (Amir, Andy and me)
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Pull more block updates from Jens Axboe:
"This is a followup for block changes, that didn't make the initial
pull request. It's a bit of a mixed bag, this contains:
- A followup pull request from Sagi for NVMe. Outside of fixups for
NVMe, it also includes a series for ensuring that we properly
quiesce hardware queues when browsing live tags.
- Set of integrity fixes from Dmitry (mostly), fixing various issues
for folks using DIF/DIX.
- Fix for a bug introduced in cciss, with the req init changes. From
Christoph.
- Fix for a bug in BFQ, from Paolo.
- Two followup fixes for lightnvm/pblk from Javier.
- Depth fix from Ming for blk-mq-sched.
- Also from Ming, performance fix for mtip32xx that was introduced
with the dynamic initialization of commands"
* 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
block: call bio_uninit in bio_endio
nvmet: avoid unneeded assignment of submit_bio return value
nvme-pci: add module parameter for io queue depth
nvme-pci: compile warnings in nvme_alloc_host_mem()
nvmet_fc: Accept variable pad lengths on Create Association LS
nvme_fc/nvmet_fc: revise Create Association descriptor length
lightnvm: pblk: remove unnecessary checks
lightnvm: pblk: control I/O flow also on tear down
cciss: initialize struct scsi_req
null_blk: fix error flow for shared tags during module_init
block: Fix __blkdev_issue_zeroout loop
nvme-rdma: unconditionally recycle the request mr
nvme: split nvme_uninit_ctrl into stop and uninit
virtio_blk: quiesce/unquiesce live IO when entering PM states
mtip32xx: quiesce request queues to make sure no submissions are inflight
nbd: quiesce request queues to make sure no submissions are inflight
nvme: kick requeue list when requeueing a request instead of when starting the queues
nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
...
|
| |\ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Pull followup NVMe (mostly) changes from Sagi:
I added the quiesce/unquiesce patches in here as it's
easy for me easily apply changes on top. It has accumulated
reviews and includes mostly nvme anyway, please tell me if
you don't want to take them with this.
This includes:
- quiesce/unquiesce fixes in nvme and others from me
- nvme-fc add create association padding spec updates from James
- some more quirking from MKP
- nvmet nit cleanup from Max
- Fix nvme-rdma racy RDMA completion signalling from Marta
- some centralization patches from me
- add tagset nr_hw_queues updates on controller resets in
nvme drivers from me
- nvme-rdma fix resources recycling when doing error recovery from me
- minor cleanups in nvme-fc from me
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Revises the Create Association LS for the amount of pad expected in 1.16.
Add defines for the minimum lengths that a target can accept (e.g. variable
pad lengths)
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
And instead call directly into the integrity code from bio_end_io.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Some ->bi_end_io handlers (for example: pi_verify or decrypt handlers)
need to know original data vector, but after bio traverse io-stack it may
be advanced, splited and relocated many times so it is hard to guess
original iterator. Let's add 'bi_done' conter which accounts number
of bytes iterator was advanced during it's evolution. Later end_io handler
may easily restore original iterator by rewinding iterator to
iter->bi_done.
Note: this change makes sizeof (struct bvec_iter) multiple to 8
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
[hch: switched to true/false return]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Currently if some one try to advance bvec beyond it's size we simply
dump WARN_ONCE and continue to iterate beyond bvec array boundaries.
This simply means that we endup dereferencing/corrupting random memory
region.
Sane reaction would be to propagate error back to calling context
But bvec_iter_advance's calling context is not always good for error
handling. For safity reason let truncate iterator size to zero which
will break external iteration loop which prevent us from unpredictable
memory range corruption. And even it caller ignores an error, it will
corrupt it's own bvecs, not others.
This patch does:
- Return error back to caller with hope that it will react on this
- Truncate iterator size
Code was added long time ago here 4550dd6c, luckily no one hit it
in real life :)
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
[hch: switch to true/false returns instead of errno values]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore it's code and fail bio with EIO. Let's return real error to
upper layer, so later caller may react accordingly.
In fact no one want to use bio_integrity_prep() w/o bio_integrity_enabled,
so it is reasonable to fold it in to one function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
[hch: merged with the latest block tree,
return bool from bio_integrity_prep]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
bio_integrity_trim inherent it's interface from bio_trim and accept
offset and size, but this API is error prone because data offset
must always be insync with bio's data offset. That is why we have
integrity update hook in bio_advance()
So only meaningful values are: offset == 0, sectors == bio_sectors(bio)
Let's just remove them completely.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\ \ \ \ \ \ \ \ \
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Pull ceph updates from Ilya Dryomov:
"The main item here is support for v12.y.z ("Luminous") clusters:
RESEND_ON_SPLIT, RADOS_BACKOFF, OSDMAP_PG_UPMAP and CRUSH_CHOOSE_ARGS
feature bits, and various other changes in the RADOS client protocol.
On top of that we have a new fsc mount option to allow supplying
fscache uniquifier (similar to NFS) and the usual pile of filesystem
fixes from Zheng"
* tag 'ceph-for-4.13-rc1' of git://github.com/ceph/ceph-client: (44 commits)
libceph: advertise support for NEW_OSDOP_ENCODING and SERVER_LUMINOUS
libceph: osd_state is 32 bits wide in luminous
crush: remove an obsolete comment
crush: crush_init_workspace starts with struct crush_work
libceph, crush: per-pool crush_choose_arg_map for crush_do_rule()
crush: implement weight and id overrides for straw2
libceph: apply_upmap()
libceph: compute actual pgid in ceph_pg_to_up_acting_osds()
libceph: pg_upmap[_items] infrastructure
libceph: ceph_decode_skip_* helpers
libceph: kill __{insert,lookup,remove}_pg_mapping()
libceph: introduce and switch to decode_pg_mapping()
libceph: don't pass pgid by value
libceph: respect RADOS_BACKOFF backoffs
libceph: make DEFINE_RB_* helpers more general
libceph: avoid unnecessary pi lookups in calc_target()
libceph: use target pi for calc_target() calculations
libceph: always populate t->target_{oid,oloc} in calc_target()
libceph: make sure need_resend targets reflect latest map
libceph: delete from need_resend_linger before check_linger_pool_dne()
...
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
All four SERVER_LUMINOUS feature bits are implemented, switch it on!
NEW_OSDOP_ENCODING doesn't mean much for the client (it signifies
support for MOSDOp v6) but needs to be enabled in order to get the
latest (currently v25) pg_pool_t.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Sage Weil <sage@redhat.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.
Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
dbe36e08be00c6519a8c89718dd47b0219c20516.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
bucket_straw2_choose needs to use weights that may be different from
weight_items. For instance to compensate for an uneven distribution
caused by a low number of values. Or to fix the probability biais
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).
We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
the weight the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:
position 0 position 1
item 3 0x10000 0x100000
item 7 0x40000 0x10000
item 13 0x40000 0x10000
When crush_do_rule picks the first of two replicas (position 0), item 7,
3 are four times more likely to be choosen by bucket_straw2_choose than
item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be choosen than item 7, 13.
By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.
bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of ids can be provided for placement purposes and they
are used instead of the ids. If no replacement ids are provided, the
legacy behavior is preserved.
Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
pg_temp and pg_upmap encodings are the same (PG -> array of osds),
except for the incremental remove: it's an empty mapping in new_pg_temp
for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't
to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't
looked at as an example for pg_upmap encoding.)
Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap.
__decode_pg_temp() stores into pg_temp union member, but since pg_upmap
union member is identical, reading through pg_upmap later is OK.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Some of these won't be as efficient as they could be (e.g.
ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32)
once instead of advancing by sizeof(u32) len times), but that's fine
and not worth a bunch of extra macro code.
Replace skip_name_map() with ceph_decode_skip_map as an example.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id,
compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare()
respectively.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Otherwise we may miss events like PG splits, pool deletions, etc when
we get multiple incremental maps at once. Because check_pool_dne() can
now be fed an unlinked request, finish_request() needed to be taught to
handle unlinked requests.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|