util/lint: Add a lint tool to find non-ascii & unprintable chars

This examines characters in coreboot's source code to look for values
that are not TAB or in the range of space (0x20) to ~ (0x7E). It
specifically excludes copyright lines so that names with high-ASCII
characters are not flagged.

Change-Id: I40f7e61fd403cbad19cf0746e2017c53e7379bf8
Signed-off-by: Martin Roth <martinroth@google.com>
Reviewed-on: https://review.coreboot.org/15979
Tested-by: build bot (Jenkins)
Reviewed-by: Patrick Georgi <pgeorgi@google.com>
author: Martin Roth <martinroth@google.com> 2016-07-29 14:20:55 -0600
committer: Martin Roth <martinroth@google.com> 2016-08-02 18:56:14 +0200
commit: ae39fc45a8577ea0dab093a7aefcc336ecae88ea (patch)
tree: 4dcf8408325c90624fcf7ebcda183061e23b6c3e /util
parent: 597614347561f7a72ad3f9750f74d99a5cfe978e (diff)
download: coreboot-ae39fc45a8577ea0dab093a7aefcc336ecae88ea.tar.gz
coreboot-ae39fc45a8577ea0dab093a7aefcc336ecae88ea.tar.bz2
coreboot-ae39fc45a8577ea0dab093a7aefcc336ecae88ea.zip
1 files changed, 43 insertions, 0 deletions
diff --git a/util/lint/lint-016-non-ascii b/util/lint/lint-016-non-ascii
new file mode 100755
index 000000000000..881eeba69ea0
--- /dev/null
+++ b/util/lint/lint-016-non-ascii
@@ -0,0 +1,43 @@
+#!/bin/sh
+# This file is part of the coreboot project.
+#
+# Copyright (C) 2016 Google Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# DESCR: Check for non-ASCII and unprintable characters
+
+LC_ALL=C export LC_ALL
+
+INCLUDED_FILES='\.[chsS]$\|\.asl$\|\.cb$\|\.inc$\|Kconfig\|\.ld$\|\.txt\|\.hex'
+EXCLUDED_DIRS='^payloads/\|^src/vendorcode/\|^Documentation/\|^build/\|^3rdparty/\|^\.git/\|^coreboot-builds/\|^util/nvidia/cbootimage'
+EXCLUDED_FILES='to-wiki/towiki\.sh$\|vga/vga_font\|video/font\|PDCurses.*x11'
+EXCLUDED_PHRASES='Copyright\|Ported to\|Intel®\|°C\|°F\|Athlon™\|Copyright.*©\|A-Za-zÀ-ÿ'
+
+# Use git ls-files if the code is in a git repo, otherwise use find.
+if [ -n "$(command -v git)" ] && [ -d .git ]; then
+	FIND_FILES="git ls-files"
+else
+	FIND_FILES="find . "
+fi
+
+# 1. Get the list of files to parse and send them through grep
+# 2. Find any characters that aren't TAB, or space (0x20) to ~ (0x7E)
+#    LF (0x0A) isn't included, as it ends the grep line
+# 3. Remove common phrases and names that have been found
+# 4. Run the result through grep again to highlight the issues that were
+#    found.  Without this step, the characters can be difficult to see.
+grep -n "[^	 -~]" \
+	$(${FIND_FILES} | sed 's|^\./||' | sort | \
+		grep "$INCLUDED_FILES" | \
+		grep -v "$EXCLUDED_DIRS" | \
+		grep -v "$EXCLUDED_FILES") | \
+	grep -iv "$EXCLUDED_PHRASES" | \
+	grep --color='auto' "[^	 -~]"
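
The bracket expression used by the script can be tried on a throwaway file. The following is a minimal sketch and not part of the commit: find_non_ascii is a hypothetical helper name, and the sample lines are made up for illustration. It assumes a grep that, in the C locale, treats every byte as a single character, as current GNU grep does.

```shell
#!/bin/sh
# Sketch only: exercise the "not TAB, not space..~" check on sample data.
LC_ALL=C
export LC_ALL

# Hypothetical helper (not in the lint script). Prints "line:content" for
# every line holding a byte that is neither TAB nor in 0x20 to 0x7E.
find_non_ascii() {
	tab=$(printf '\t')
	grep -n "[^${tab} -~]" "$1"
}

sample=$(mktemp)
printf 'plain ascii line\n' > "$sample"
printf '\tindented with a tab\n' >> "$sample"
# Octal escapes 302 260 are the UTF-8 encoding of the degree sign.
printf 'temperature: 25\302\260C\n' >> "$sample"
# Octal 007 is BEL, an unprintable control character.
printf 'bell:\007\n' >> "$sample"

# Keep only the line numbers of the flagged lines.
result=$(find_non_ascii "$sample" | cut -d: -f1)
echo "$result"
rm -f "$sample"
```

Only the last two sample lines contain bytes outside the allowed set, so those are the lines the helper should report. The plain line and the tab-indented line pass, which mirrors why the script lists TAB explicitly in the bracket expression.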