summaryrefslogtreecommitdiffstats
path: root/net/ipv4/fib_trie.c
Commit message (Collapse)AuthorAgeFilesLines
* [IPV4] fib_trie: fix warning from rcu_assign_poingerStephen Hemminger2008-03-221-2/+5
| | | | | | | | | | | | This gets rid of a warning caused by the test in rcu_assign_pointer. I tried to fix rcu_assign_pointer, but that devolved into a long set of discussions about doing it right that came to no real solution. Since the test in rcu_assign_pointer for constant NULL would never succeed in fib_trie, just open code instead. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fib_trie: /proc/net/route performance improvementStephen Hemminger2008-02-121-11/+82
| | | | | | | | | Use key/offset caching to change /proc/net/route (use by iputils route) from O(n^2) to O(n). This improves performance from 30sec with 160,000 routes to 1sec. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fib_trie: handle empty treeStephen Hemminger2008-02-121-4/+2
| | | | | | | | This fixes possible problems when trie_firstleaf() returns NULL to trie_leafindex(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Formatting fix for /proc/net/fib_trie.Denis V. Lunev2008-02-051-2/+1
| | | | | | | | | | | | The line in the /proc/net/fib_trie for route with TOS specified - has extra \n at the end - does not have a space after route scope like below. |-- 1.1.1.1 /32 universe UNICASTtos =1 Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: rescan if key is lost during dumpStephen Hemminger2008-01-311-17/+32
| | | | | | | | | | Normally during a dump the key of the last dumped entry is used for continuation, but since lock is dropped it might be lost. In that case fallback to the old counter based N^2 behaviour. This means the dump will end up skipping some routes which matches what FIB_HASH does. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: apply fixes from fib_hashJulian Anastasov2008-01-311-20/+35
| | | | | | | | | | | | | | | Update fib_trie with some fib_hash fixes: - check for duplicate alternative routes for prefix+tos+priority when replacing route - properly insert by matching tos together with priority - fix alias walking to use list_for_each_entry_continue for insertion and deletion when fa_head is not NULL - copy state from fa to new_fa on replace (not a problem for now) - additionally, avoid replacement without error if new route is same, as Joonwoo Park suggests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: remove unneeded NULL checkStephen Hemminger2008-01-281-3/+0
| | | | | | | | Since fib_route_seq_show now uses hlist_for_each_entry(), the leaf info can not be NULL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: More whitespace cleanup.Stephen Hemminger2008-01-281-6/+0
| | | | | | | Remove extra blank lines. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: avoid rescan on dumpStephen Hemminger2008-01-281-13/+21
| | | | | | | | | | | | | | | This converts dumping (and flushing) of large route tables form O(N^2) to O(N). If the route dump took multiple pages then the dump routine gets called again. The old code kept track of location by counter, the new code instead uses the last key. This is a really big win ( 0.3 sec vs 12 sec) for big route tables. One side effect is that if the table changes during the dump, then the last key will not be found, and we will return -EBUSY. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: avoid extra search on deleteStephen Hemminger2008-01-281-38/+12
| | | | | | | Get rid of extra search that made route deletion O(n). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: dump table in sorted orderStephen Hemminger2008-01-281-35/+39
| | | | | | | | | It is easier with TRIE to dump the data traversal rather than interating over every possible prefix. This saves some time and makes the dump come out in sorted order. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: iterator recodeStephen Hemminger2008-01-281-51/+52
| | | | | | | | | Remove the complex loop structure of nextleaf() and replace it with a simpler tree walker. This improves the performance and is much cleaner. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: dump message multiple part flagStephen Hemminger2008-01-281-1/+1
| | | | | | | Match fib_hash, and set NLM_F_MULTI to handle multiple part messages. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: use hash listStephen Hemminger2008-01-281-25/+24
| | | | | | | | The code to dump can use the existing hash chain rather than doing repeated lookup. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: compute size when neededStephen Hemminger2008-01-281-8/+13
| | | | | | | Compute the number of prefixes when needed, rather than doing bookeeping. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: style cleanupStephen Hemminger2008-01-281-95/+142
| | | | | | | | | | | Style cleanups: * make check_leaf return -1 or plen, rather than by reference * Get rid of #ifdef that is always set * split out embedded function calls in if statements. * checkpatch warnings Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: put leaf nodes in a slab cacheStephen Hemminger2008-01-281-3/+10
| | | | | | | | This improves locality for operations that touch all the leaves. Save space since these entries don't need to be hardware cache aligned. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [FIB]: Fix rcu_dereference() abuses in fib_trie.cEric Dumazet2008-01-281-12/+21
| | | | | | | | | | | | | | | | | | | | | | node_parent() and tnode_get_child() currently use rcu_dereference(). These functions are called from both - readers only paths (where rcu_dereference() is needed), and - writer path (where rcu_dereference() is not needed) To make explicit where rcu_dereference() is really needed, I introduced new node_parent_rcu() and tnode_get_child_rcu() functions which use rcu_dereference(), while node_parent() and tnode_get_child() dont use it. Then I changed calling sites where rcu_dereference() was really needed to call the _rcu() variants. This should have no impact but for alpha architecture, and may help future sparse checks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: fib hash|trie initializationStephen Hemminger2008-01-281-8/+8
| | | | | | | | | Initialization of the slab cache's should be done when IP is initialized to make sure of available memory, and that code can be marked __init. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: size and statisticsStephen Hemminger2008-01-281-35/+20
| | | | | | | | | | | | Show number of entries in trie, the size field was being set but never used, but it only counted leaves, not all entries. Refactor the two cases in fib_triestat_seq_show into a single routine. Note: the stat structure was being malloc'd but the stack usage isn't so high (288 bytes) that it is worth the additional complexity. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [FIB]: Avoid using static variables without proper lockingEric Dumazet2008-01-281-10/+12
| | | | | | | | | | | fib_trie_seq_show() uses two helper functions, rtn_scope() and rtn_type() that can write to static storage without locking. Just pass to them a temporary buffer to avoid potential corruption (probably not triggerable but still...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [FIB]: full_children & empty_children should be uint, not ushortEric Dumazet2008-01-281-13/+12
| | | | | | | | | | | | | | If declared as unsigned short, these fields can overflow, and whole trie logic is broken. I could not make the machine crash, but some tnode can never be freed. Note for 64 bit arches : By reordering t_key and parent in [node, leaf, tnode] structures, we can use 32 bits hole after t_key so that sizeof(struct tnode) doesnt change after this patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: removes a memset() call in tnode_new()Eric Dumazet2008-01-281-1/+0
| | | | | | | | tnode_alloc() already clears allocated memory, using kcalloc() or alloc_pages(GFP_KERNEL|__GFP_ZERO, ...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [FIB]: Reduce text size of net/ipv4/fib_trie.oEric Dumazet2008-01-281-2/+2
| | | | | | | | | | | | | | | | | In struct tnode, we use two fields of 5 bits for 'pos' and 'bits'. Switching to plain 'unsigned char' (8 bits) take the same space because of compiler alignments, and reduce text size by 435 bytes on i386. On i386 : $ size net/ipv4/fib_trie.o.before_patch net/ipv4/fib_trie.o text data bss dec hex filename 13714 4 64 13782 35d6 net/ipv4/fib_trie.o.before 13279 4 64 13347 3423 net/ipv4/fib_trie.o Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: Fix sparse warnings.Stephen Hemminger2008-01-281-3/+4
| | | | | | | Make FIB TRIE go through sparse checker without warnings. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: Add statistics.Stephen Hemminger2008-01-281-12/+19
| | | | | | | | | | | The FIB TRIE code has a bunch of statistics, but the code is hidden behind an ifdef that was never implemented. Since it was dead code, it was broken as well. This patch fixes that by making it a config option. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: fib_insert_node cleanupStephen Hemminger2008-01-281-17/+11
| | | | | | | | | The only error from fib_insert_node is if memory allocation fails, so instead of passing by reference, just use the convention of returning NULL. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: Use %u for unsigned printfs.Stephen Hemminger2008-01-281-7/+7
| | | | | | | Use %u instead of %d when printing unsigned values. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: Get rid of unused revision element.Stephen Hemminger2008-01-281-10/+4
| | | | | | | | The revision element must of been part of an earlier design, because currently it is set but never used. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: Get rid of trie_init().Stephen Hemminger2008-01-281-16/+1
| | | | | | | | trie_init is worthless it is just zeroing stuff that is already zero! Move the memset() down to make it obvious. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Show routing information from correct namespace (fib_trie.c)Denis V. Lunev2008-01-281-12/+32
| | | | | | | | | | | | This is the second part (for the CONFIG_IP_FIB_TRIE case) of the patch #4, where we have created proc files in namespaces. Now we can dump correct info in them. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add netns parameter to fib_get_table/fib_new_table.Denis V. Lunev2008-01-281-4/+4
| | | | | | | | | | | This patch extends the fib_get_table and the fib_new_table functions with the network namespace pointer. That will allow to access the table relatively from the network namespace. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Refactor fib initialization so it can handle multiple namespaces.Denis V. Lunev2008-01-281-5/+1
| | | | | | | | | | | | This patch makes the fib to be initialized as a subsystem for the network namespaces. The code does not handle several namespaces yet, so in case of a creation of a network namespace, the creation/initialization will not occur. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace to API for routing /proc entries creation.Denis V. Lunev2008-01-281-10/+11
| | | | | | | | | | | | This adds netns parameter to fib_proc_init/exit and replaces __init specifier with __net_init. After this, we will not yet have these proc files show info from the specific namespace - this will be done when these tables become namespaced. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Thresholds in fib_trie.c are used as consts, so make them const.Denis V. Lunev2008-01-281-4/+4
| | | | | | | | | There are several thresholds for trie fib hash management. They are used in the code as a constants. Make them constants from the compiler point of view. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: last default route is a fib table propertyDenis V. Lunev2008-01-281-9/+9
| | | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Unify assignment of fi to fib_resultDenis V. Lunev2008-01-281-15/+4
| | | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: no need pass pointer to a default into fib_detect_deathDenis V. Lunev2008-01-281-2/+2
| | | | | | | | ipv4: no need pass pointer to a default into fib_detect_death Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Move trie_local and trie_main into the proc iterator.Eric W. Biederman2008-01-281-13/+34
| | | | | | | | | | | | We only use these variables when displaying the trie in proc so place them into the iterator to make this explicit. We should probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES case but at least this makes it clear that the silliness is limited to the display in /proc. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: fix duplicated route issueJoonwoo Park2008-01-201-0/+3
| | | | | | | | | | http://bugzilla.kernel.org/show_bug.cgi?id=9493 The fib allows making identical routes with 'ip route replace'. This patch makes the fib return -EEXIST if replacement would cause duplication. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* remove asm/bitops.h includesJiri Slaby2007-10-191-1/+1
| | | | | | | | | | | | | remove asm/bitops.h includes including asm/bitops directly may cause compile errors. don't include it and include linux/bitops instead. next patch will deny including asm header directly. Cc: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [NET]: Make core networking code use seq_open_privatePavel Emelyanov2007-10-101-38/+4
| | | | | | | | | | | This concerns the ipv4 and ipv6 code mostly, but also the netlink and unix sockets. The netlink code is an example of how to use the __seq_open_private() call - it saves the net namespace on this private. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make /proc/net per network namespaceEric W. Biederman2007-10-101-8/+9
| | | | | | | | | | | | | | | | | | This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: macro cleanupStephen Hemminger2007-10-101-7/+11
| | | | | | | | This patch converts the messy macro for MASK_PFX to inline function and expands TKEY_GET_MASK in the one place it is used. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: cleanupStephen Hemminger2007-10-101-32/+36
| | | | | | | | | | | Try this out: * replace macro's with inlines * get rid of places doing multiple evaluations of NODE_PARENT [akpm@linux-foundation.org: rcu_dereference wants an lval] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* mm: Remove slab destructors from kmem_cache_create().Paul Mundt2007-07-201-1/+1
| | | | | | | | | | | | | | Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* [RTNETLINK]: Fix sending netlink message when replace route.Milan Kocian2007-05-241-2/+4
| | | | | | | | | | | | When you replace route via ip r r command the netlink multicast message is not send. This patch corrects it. NL message is sent with NLM_F_REPLACE flag. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8320 Signed-off-by: Milan Kocian <milon@wq.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: fib_trie root node settingsRobert Olsson2007-04-251-2/+2
| | | | | | | | | | | | | | The threshold for root node can be more aggressive set to get better tree compression. The new setting mekes the root grow from 16 to 19 bits and substansial improvemnt in Aver depth this with the current table of 214393 prefixes But really the dynamic resize should need more investigation both in terms convergence and performance and maybe it should be possible to change... Maybe just for the brave to start with or we may have to back this out.
* [IPV4]: fib_trie resize breakRobert Olsson2007-04-251-3/+23
| | | | | | | | | The patch below adds break condition for the resize operations. If we don't achieve the desired fill factor a warning is printed. Trie should still be operational but new thresholds should be considered. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: make seq_operations constStephen Hemminger2007-04-251-2/+2
| | | | | | | | The seq_file operations stuff can be marked constant to get it out of dirty cache. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>