summaryrefslogtreecommitdiffstats
path: root/kernel/printk/printk_ringbuffer.h
Commit message (Collapse)AuthorAgeFilesLines
* printk: Wait for all reserved records with pr_flush()John Ogness2024-02-071-0/+1
| | | | | | | | | | | | | | | | | | | | | Currently pr_flush() will only wait for records that were available to readers at the time of the call (using prb_next_seq()). But there may be more records (non-finalized) that have following finalized records. pr_flush() should wait for these to print as well. Particularly because any trailing finalized records may be the messages that the calling context wants to ensure are printed. Add a new ringbuffer function prb_next_reserve_seq() to return the sequence number following the most recently reserved record. This guarantees that pr_flush() will wait until all current printk() messages (completed or in progress) have been printed. Fixes: 3b604ca81202 ("printk: add pr_flush()") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-10-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
* printk: ringbuffer: Clarify special lpos valuesJohn Ogness2024-02-071-1/+15
| | | | | | | | | | | | | | | | | | | | | | For empty line records, no data blocks are created. Instead, these valid records are identified by special logical position values (in fields of @prb_desc.text_blk_lpos). Currently the macro NO_LPOS is used for empty line records. This name is confusing because it does not imply _why_ there is no data block. Rename NO_LPOS to EMPTY_LINE_LPOS so that it is clear why there is no data block. Also add comments explaining the use of EMPTY_LINE_LPOS as well as clarification to the values used to represent data-less blocks. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-6-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
* printk: ringbuffer: Do not skip non-finalized records with prb_next_seq()John Ogness2024-02-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f244b4dc53e5 ("printk: ringbuffer: Improve prb_next_seq() performance") introduced an optimization for prb_next_seq() by using best-effort to track recently finalized records. However, the order of finalization does not necessarily match the order of the records. The optimization changed prb_next_seq() to return inconsistent results, possibly yielding sequence numbers that are not available to readers because they are preceded by non-finalized records or they are not yet visible to the reader CPU. Rather than simply best-effort tracking recently finalized records, force the committing writer to read records and increment the last "contiguous block" of finalized records. In order to do this, the sequence number instead of ID must be stored because ID's cannot be directly compared. A new memory barrier pair is introduced to guarantee that a reader can always read the records up until the sequence number returned by prb_next_seq() (unless the records have since been overwritten in the ringbuffer). This restores the original functionality of prb_next_seq() while also keeping the optimization. For 32bit systems, only the lower 32 bits of the sequence number are stored. When reading the value, it is expanded to the full 64bit sequence number using the 32bit seq macros, which fold in the value returned by prb_first_seq(). Fixes: f244b4dc53e5 ("printk: ringbuffer: Improve prb_next_seq() performance") Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-5-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
* printk: Use prb_first_seq() as base for 32bit seq macrosJohn Ogness2024-02-071-4/+4
| | | | | | | | | | | | | | | | | | | Note: This change only applies to 32bit architectures. On 64bit architectures the macros are NOPs. Currently prb_next_seq() is used as the base for the 32bit seq macros __u64seq_to_ulseq() and __ulseq_to_u64seq(). However, in a follow-up commit, prb_next_seq() will need to make use of the 32bit seq macros. Use prb_first_seq() as the base for the 32bit seq macros instead because it is guaranteed to return 64bit sequence numbers without relying on any 32bit seq macros. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-4-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
* printk: Adjust mapping for 32bit seq macrosSebastian Andrzej Siewior2024-02-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note: This change only applies to 32bit architectures. On 64bit architectures the macros are NOPs. __ulseq_to_u64seq() computes the upper 32 bits of the passed argument value (@ulseq). The upper bits are derived from a base value (@rb_next_seq) in a way that assumes @ulseq represents a 64bit number that is less than or equal to @rb_next_seq. Until now this mapping has been correct for all call sites. However, in a follow-up commit, values of @ulseq will be passed in that are higher than the base value. This requires a change to how the 32bit value is mapped to a 64bit sequence number. Rather than mapping @ulseq such that the base value is the end of a 32bit block, map @ulseq such that the base value is in the middle of a 32bit block. This allows supporting 31 bits before and after the base value, which is deemed acceptable for the console sequence number during runtime. Here is an example to illustrate the previous and new mappings. For a base value (@rb_next_seq) of 2 2000 0000... Before this change the range of possible return values was: 1 2000 0001 to 2 2000 0000 __ulseq_to_u64seq(1fff ffff) => 2 1fff ffff __ulseq_to_u64seq(2000 0000) => 2 2000 0000 __ulseq_to_u64seq(2000 0001) => 1 2000 0001 __ulseq_to_u64seq(9fff ffff) => 1 9fff ffff __ulseq_to_u64seq(a000 0000) => 1 a000 0000 __ulseq_to_u64seq(a000 0001) => 1 a000 0001 After this change the range of possible return values are: 1 a000 0001 to 2 a000 0000 __ulseq_to_u64seq(1fff ffff) => 2 1fff ffff __ulseq_to_u64seq(2000 0000) => 2 2000 0000 __ulseq_to_u64seq(2000 0001) => 2 2000 0001 __ulseq_to_u64seq(9fff ffff) => 2 9fff ffff __ulseq_to_u64seq(a000 0000) => 2 a000 0000 __ulseq_to_u64seq(a000 0001) => 1 a000 0001 [ john.ogness: Rewrite commit message. ] Reported-by: Francesco Dolcini <francesco@dolcini.it> Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-3-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
* printk: nbcon: Relocate 32bit seq macrosJohn Ogness2024-02-071-0/+33
| | | | | | | | | | | | | | | | | | | | | | The macros __seq_to_nbcon_seq() and __nbcon_seq_to_seq() are used to provide support for atomic handling of sequence numbers on 32bit systems. Until now this was only used by nbcon.c, which is why they were located in nbcon.c and include nbcon in the name. In a follow-up commit this functionality is also needed by printk_ringbuffer. Rather than duplicating the functionality, relocate the macros to printk_ringbuffer.h. Also, since the macros will be no longer nbcon-specific, rename them to __u64seq_to_ulseq() and __ulseq_to_u64seq(). This does not result in any functional change. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20240207134103.1357162-2-john.ogness@linutronix.de Signed-off-by: Petr Mladek <pmladek@suse.com>
* printk: ringbuffer: Improve prb_next_seq() performancePetr Mladek2022-01-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | prb_next_seq() always iterates from the first known sequence number. In the worst case, it might loop 8k times for 256kB buffer, 15k times for 512kB buffer, and 64k times for 2MB buffer. It was reported that polling and reading using syslog interface might occupy 50% of CPU. Speedup the search by storing @id of the last finalized descriptor. The loop is still needed because the @id is stored and read in the best effort way. An atomic variable is used to keep the @id consistent. But the stores and reads are not serialized against each other. The descriptor could get reused in the meantime. The related sequence number will be used only when it is still valid. An invalid value should be read _only_ when there is a flood of messages and the ringbuffer is rapidly reused. The performance is the least problem in this case. Reported-by: Chunlei Wang <chunlei.wang@mediatek.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/1642770388-17327-1-git-send-email-quic_mojha@quicinc.com Link: https://lore.kernel.org/lkml/YXlddJxLh77DKfIO@alley/T/#m43062e8b2a17f8dbc8c6ccdb8851fb0dbaabbb14
* printk: rectify kernel-doc for prb_rec_init_wr()Lukas Bulwahn2021-01-261-1/+1
| | | | | | | | | | | | The command 'find ./kernel/printk/ | xargs ./scripts/kernel-doc -none' reported a mismatch with the kernel-doc of prb_rec_init_wr(). Rectify the kernel-doc, such that no issues remain for ./kernel/printk/. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210125081748.19903-1-lukas.bulwahn@gmail.com
* printk: avoid and/or handle record truncationJohn Ogness2020-09-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | If a reader provides a buffer that is smaller than the message text, the @text_len field of @info will have a value larger than the buffer size. If readers blindly read @text_len bytes of data without checking the size, they will read beyond their buffer. Add this check to record_print_text() to properly recognize when such truncation has occurred. Add a maximum size argument to the ringbuffer function to extend records so that records can not be created that are larger than the buffer size of readers. When extending records (LOG_CONT), do not extend records beyond LOG_LINE_MAX since that is the maximum size available in the buffers used by consoles and syslog. Fixes: f5f022e53b87 ("printk: reimplement log_cont using record extension") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200930090134.8723-2-john.ogness@linutronix.de
* printk: remove dict ringJohn Ogness2020-09-221-50/+13
| | | | | | | | | | | Since there is no code that will ever store anything into the dict ring, remove it. If any future dictionary properties are to be added, these should be added to the struct printk_info. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200918223421.21621-4-john.ogness@linutronix.de
* printk: move dictionary keys to dev_printk_infoJohn Ogness2020-09-221-0/+3
| | | | | | | | | | | | | | | | | | | | Dictionaries are only used for SUBSYSTEM and DEVICE properties. The current implementation stores the property names each time they are used. This requires more space than otherwise necessary. Also, because the dictionary entries are currently considered optional, it cannot be relied upon that they are always available, even if the writer wanted to store them. These issues will increase should new dictionary properties be introduced. Rather than storing the subsystem and device properties in the dict ring, introduce a struct dev_printk_info with separate fields to store only the property values. Embed this struct within the struct printk_info to provide guaranteed availability. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/87mu1jl6ne.fsf@jogness.linutronix.de
* printk: move printk_info into separate arrayJohn Ogness2020-09-221-13/+16
| | | | | | | | | | | | | | | | | | | | The majority of the size of a descriptor is taken up by meta data, which is often not of interest to the ringbuffer (for example, when performing state checks). Since descriptors are often temporarily stored on the stack, keeping their size minimal will help reduce stack pressure. Rather than embedding the printk_info into the descriptor, create a separate printk_info array. The index of a descriptor in the descriptor array corresponds to the printk_info with the same index in the printk_info array. The rules for validity of a printk_info match the existing rules for the data blocks: the descriptor must be in a consistent state. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200918223421.21621-2-john.ogness@linutronix.de
* printk: ringbuffer: add finalization/extension supportJohn Ogness2020-09-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for extending the newest data block. For this, introduce a new finalization state (desc_finalized) denoting a committed descriptor that cannot be extended. Until a record is finalized, a writer can reopen that record to append new data. Reopening a record means transitioning from the desc_committed state back to the desc_reserved state. A writer can explicitly finalize a record if there is no intention of extending it. Also, records are automatically finalized when a new record is reserved. This relieves writers of needing to explicitly finalize while also making such records available to readers sooner. (Readers can only traverse finalized records.) Four new memory barrier pairs are introduced. Two of them are insignificant additions (data_realloc:A/desc_read:D and data_realloc:A/data_push_tail:B) because they are alternate path memory barriers that exactly match the purpose, pairing, and context of the two existing memory barrier pairs they provide an alternate path for. The other two new memory barrier pairs are significant additions: desc_reopen_last:A / _prb_commit:B - When reopening a descriptor, ensure the state transitions back to desc_reserved before fully trusting the descriptor data. _prb_commit:B / desc_reserve:D - When committing a descriptor, ensure the state transitions to desc_committed before checking the head ID to see if the descriptor needs to be finalized. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-6-john.ogness@linutronix.de
* printk: ringbuffer: change representation of statesJohn Ogness2020-09-151-11/+20
| | | | | | | | | | | | | | | Rather than deriving the state by evaluating bits within the flags area of the state variable, assign the states explicit values and set those values in the flags area. Introduce macros to make it simple to read and write state values for the state variable. Although the functionality is preserved, the binary representation for the states is changed. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200914123354.832-5-john.ogness@linutronix.de
* printk: ringbuffer: support dataless recordsJohn Ogness2020-09-081-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | With commit 896fbe20b4e2333fb55 ("printk: use the lockless ringbuffer"), printk() started silently dropping messages without text because such records are not supported by the new printk ringbuffer. Add support for such records. Currently dataless records are denoted by INVALID_LPOS in order to recognize failed prb_reserve() calls. Change the ringbuffer to instead use two different identifiers (FAILED_LPOS and NO_LPOS) to distinguish between failed prb_reserve() records and successful dataless records, respectively. Fixes: 896fbe20b4e2333fb55 ("printk: use the lockless ringbuffer") Fixes: https://lkml.kernel.org/r/20200718121053.GA691245@elver.google.com Reported-by: Marco Elver <elver@google.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Marco Elver <elver@google.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200721132528.9661-1-john.ogness@linutronix.de
* printk: add lockless ringbufferJohn Ogness2020-07-101-0/+399
Introduce a multi-reader multi-writer lockless ringbuffer for storing the kernel log messages. Readers and writers may use their API from any context (including scheduler and NMI). This ringbuffer will make it possible to decouple printk() callers from any context, locking, or console constraints. It also makes it possible for readers to have full access to the ringbuffer contents at any time and context (for example from any panic situation). The printk_ringbuffer is made up of 3 internal ringbuffers: desc_ring: A ring of descriptors. A descriptor contains all record meta data (sequence number, timestamp, loglevel, etc.) as well as internal state information about the record and logical positions specifying where in the other ringbuffers the text and dictionary strings are located. text_data_ring: A ring of data blocks. A data block consists of an unsigned long integer (ID) that maps to a desc_ring index followed by the text string of the record. dict_data_ring: A ring of data blocks. A data block consists of an unsigned long integer (ID) that maps to a desc_ring index followed by the dictionary string of the record. The internal state information of a descriptor is the key element to allow readers and writers to locklessly synchronize access to the data. Co-developed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20200709132344.760-3-john.ogness@linutronix.de