A Guide To Log Filtering: Tips for IT Pros
As an IT professional, you'll find that log messages are one of the best ways to catch errors and troubleshoot problems. But as helpful as log messages are, they can also be overwhelming, because servers log far more messages than you actually need to see. Instead of making your life easier, unfiltered logs can make it harder by burying the relevant messages in noise.
In this post, I'll show you how to filter log messages and make them useful for solving server problems. Errors can occur in any development environment and to professionals of any seniority. If you want to get from log message to root cause faster, stick around. You'll discover some great tips that can save you headaches.
What Is a Log Message?
Almost every IT professional will have an answer to the question "What is a log message?" I won't waste much of your time on this, but I won't leave the question hanging, either. Here's a brief definition of a log from Wikipedia:
In computing, a log file is a file that records either events that occur in an operating system or other software runs, or messages between different users of a communication software. Logging is the act of keeping a log. In the simplest case, messages are written to a single log file.
From this definition, you can see that a log doesn't record just a single message, but many. What does that mean? It means you're likely to see unwanted messages in a log file. And who wants that? This is why you'll need a way to filter the logs: to see only what's useful for your purpose, which is usually solving a problem or keeping a record of something.
How Can You Filter a Log File and Get the Relevant Log Messages?
You might be struggling to pull useful information out of a log file. Log files can be bulky, and messages that mean nothing to you are logged right alongside the ones that matter. Here's a list of ways you can filter a log file to get the results you want from the server.
Using a Grep Command
Running a grep command helps you search for a keyword that leads you to the message you want to see in the log file. Grep is a command-line tool that finds lines matching a regular expression pattern in one or more files. Here's an example of using grep in a Linux terminal:
$ grep --color=auto 'FAILURE' logfile.txt > new_logfile.txt
Using this example, let's say you have a big file and you want to find every line that contains the keyword FAILURE. Running the above command in the terminal writes the matching lines to a new file with the name specified in the command. In this case, the name is new_logfile.txt. This helps you quickly identify the error message and act on it. You can also invert the match with the -v flag: $ grep -v 'failure' logfile.txt > new_logfile.txt. Since "failure" is the search string, the command keeps only the lines that don't contain the word "failure."
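To see how this works end to end, here's a small sketch (the file contents and /tmp paths are just placeholders for this example) that builds a sample log and filters it, using -i for a case-insensitive match and -C 1 to show one line of context around each hit:

```shell
# Create a small sample log (placeholder content for illustration).
printf 'INFO starting up\nfailure: disk full\nINFO retrying\n' > /tmp/logfile.txt

# -i ignores case; -C 1 prints one line of context around each match.
grep -i -C 1 'FAILURE' /tmp/logfile.txt > /tmp/new_logfile.txt

cat /tmp/new_logfile.txt
```

With this sample, all three lines end up in new_logfile.txt: the single match plus one line of context on each side covers the whole file. Context flags like -C, -A, and -B are often what turn a bare match into a message you can actually diagnose.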
Using the AWK Command
AWK is a scripting language, and it's one of the best commands in Linux. From the AWK documentation, AWK is "a program that you can use to select particular records in a file and perform operations upon them." AWK is used to manipulate data. For that reason, you can use it to process a log file by searching through it.
Depending on your needs, you can write commands to search for a particular selection. If you want to dig deeper into AWK, see its documentation. Here's a basic example of using AWK in the terminal. This tool can save you a lot of time.
$ awk '/error/' input-file.txt > output-file.txt
AWK will search the file and print any line matching the keyword "error." You can also apply a regular expression here to refine the search criteria. For example, if you want only the lines that start with the error keyword, you can write an AWK command like this: $ awk '/^error/' input-file.txt > output-file.txt. Just like the command above, AWK writes the matching lines to the file output-file.txt.
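Because AWK splits each line into fields, it can do more than match keywords. Here's a sketch, assuming a hypothetical "timestamp level message" log layout, that prints only the message part of error lines:

```shell
# Sample log with a "timestamp level message" layout (assumed format).
printf '09:00:01 INFO server started\n09:00:02 error connection refused\n' > /tmp/input-file.txt

# $2 is the second whitespace-separated field (the level);
# print the remaining fields only for lines whose level is "error".
awk '$2 == "error" {print $3, $4}' /tmp/input-file.txt > /tmp/output-file.txt

cat /tmp/output-file.txt
```

For this sample input, the output file contains just "connection refused," which shows why field-based selection is often cleaner than a plain keyword search.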
Using a Programming Language
Many programming languages give you a way to log messages from the server and save them to a text file. Having worked with both Python and the Django framework, I've had a better experience logging messages with Django. The language you're using gives you the flexibility to write scripts tailored to your specific need, which means more control. Here's an example that you can use to filter logs in Python:
import logging

logger = logging.getLogger(__name__)

class LogFilter(logging.Filter):
    def filter(self, record):
        return record.getMessage().startswith('keyword')

logger.addFilter(LogFilter())
The filter above will allow only log messages that start with your specified keyword. You can also filter by log level with just a few lines of Python. The code can look like this:
import logging

logger = logging.getLogger(__name__)

class LoggingErrors(logging.Filter):
    def filter(self, record):
        return record.levelno == logging.ERROR

logger.addFilter(LoggingErrors())
These few lines of code will log error messages only.
Using Regular Expressions (Regex)
Sometimes, you may want to write your own filter instead of reaching for a programming language. You can use regular expressions, also known as regex, to filter logs. Regex can be very helpful in narrowing your logs down to exactly what you want to see. It also gives you the flexibility of writing an expression of your choice, which is very handy when a log file requires more complicated filtering. In a previous section, I talked about the power of the Linux tool grep when used for searching text files. Using it with regex makes it even more powerful.
For example, if you want to pull IP addresses out of a log file that contains both text and digits, you can write a regular expression that matches only digits and dots. As you know, an IPv4 address is four groups of digits separated by dots. A simple but very helpful regex combined with grep can look like this:
$ grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' logfile.txt
The result will be as awesome as expected: well-formatted IPv4 addresses like 64.242.88.10.
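Here's the same idea as a runnable sketch, with a made-up two-line log standing in for real traffic:

```shell
# Sample log mixing text and IP addresses (placeholder content).
printf 'client 64.242.88.10 connected\nclient 10.0.0.5 timed out\n' > /tmp/access.log

# -o prints only the matching part of each line; -E enables extended regex.
grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' /tmp/access.log
```

This prints the two addresses, one per line, with all the surrounding text stripped away.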
Using the Tail Command
Tail, a command for Unix-like operating systems, is another important and exciting tool to work with. By default, the tail command outputs the last 10 lines of a file. When used with -n, tail lets you specify how many of the last lines you want output from the file. Most of the time, you want to check just the last lines of a log file that's thousands of lines long. Using the tail command is as simple as this:
tail -n 1 /usr/share/dict/file.log
If you want the last five lines from a log file, it's simple: just change the number after -n. In this case, -n 5 will give you the last five lines. Another good thing about the tail command is that you can watch the file for changes. Just pass -f, like this: tail -f /usr/share/dict/file.log.
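Tail really shines when combined with grep. Here's a small sketch (file path and contents are placeholders) that takes the tail of a log and keeps only the error lines:

```shell
# Sample log (placeholder content and path).
printf 'INFO one\nERROR two\nINFO three\nERROR four\n' > /tmp/file.log

# Take the last three lines, then keep only the errors.
tail -n 3 /tmp/file.log | grep 'ERROR'

# For live monitoring you could follow the file instead:
#   tail -f /tmp/file.log | grep --line-buffered 'ERROR'
# (--line-buffered makes grep print each match as it arrives.)
```

With this sample, the pipeline prints the two ERROR lines from the tail of the file and drops the INFO line between them.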
Using PowerShell To Filter Logs
You can use PowerShell to log messages from the server. From the Microsoft documentation, "PowerShell is a cross-platform task automation and configuration management framework, consisting of a command-line shell and scripting language. Unlike most shells, which accept and return text, PowerShell is built on top of the .NET Common Language Runtime (CLR) and accepts and returns .NET objects."
As the documentation describes, PowerShell can indeed be used to filter logs. Here's an example that returns the last 50 error events from the system log:
Get-WinEvent -FilterHashtable @{logname='system'; level=2} -MaxEvents 50
-FilterHashtable maps the filter's keys to values. If you look at the command, you can see that we set the event level to 2, which is the level that corresponds to error messages.
Conclusion
I'm certain you've noticed that it's difficult to find the exact line of an error message, or anything else you're looking for, in a log file. Applying some filtering makes your work easier and more professional.
Something I've commonly seen is that most servers run Linux or another Unix-like operating system. That makes Linux tools such as grep, tail, and AWK even more interesting to use. In some cases, you don't want to write any command or line of code to filter a log file. Well, in such cases, there are systems designed to help you filter logs and give you the precise results you want in real time.
If you want something that won't give you a headache or consume your time, using a system specially designed for filtering logs can be the best option. I also understand that writing these commands and regular expressions is not an easy job.
If writing commands and regular expressions isn't your thing, you can try Papertrail, a cloud-hosted log management system designed for faster troubleshooting of infrastructure and application issues. That lets you focus on other things instead of figuring out how to filter log messages.
This post was first published on the Papertrail website.