
Monday, September 9, 2024

Log Parsing


Log Parsing Cheat Sheet


1. GREP

GREP searches any given input files, selecting lines that match one or more patterns.

2. CUT

CUT cuts out selected portions of each line from each file and writes them to the standard output.

3. SED

SED reads the specified files, modifying the input as specified by a list of commands.

4. AWK

AWK scans each input file for lines that match any of a set of patterns.


5. SORT

SORT sorts text and binary files by lines.


6. UNIQ

UNIQ reads the specified input file, comparing adjacent lines, and writes a copy of each unique input line to the output file.

Let’s walk through an example.

Suppose we want to count the hits from the top 10 IP addresses requesting the path "/api/payments" in an access log written in the Common Log Format, where the client address is the first space-separated field of each line: - leon [01/Jul/2002:12:11:52 +0000] "GET /index.html HTTP/1.1" 200 431

We can use a combination of the grep, cut, sort, uniq, and head commands. Here is a sample command:

grep '/api/payments' access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -rn | head -10

Here's what each part of the command does:

- grep '/api/payments' access.log: This filters the lines containing "/api/payments" from the access.log file.

- cut -d ' ' -f 1: This extracts the first field (the IP address) from each line. The -d ' ' option specifies space as the field delimiter.

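- sort: This sorts the IP addresses so that identical addresses end up on adjacent lines.

- uniq -c: This collapses each run of adjacent duplicate lines into one line and prefixes it with the number of occurrences. It only compares neighboring lines, which is why the sort step comes first.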

- sort -rn: This sorts the lines in reverse order (highest first) numerically.

- head -10: This shows only the first 10 lines of the output, which correspond to the top 10 IP addresses.
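The steps above can be tried on a small sample file. This is a minimal sketch: the log lines and the 10.0.0.x addresses are made up for illustration, and the trailing `awk` is only there to trim the padding that `uniq -c` adds.

```shell
# Build a small, hypothetical access log in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/access.log"
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jul/2002:12:11:52 +0000] "GET /api/payments HTTP/1.1" 200 431
10.0.0.2 - - [01/Jul/2002:12:11:53 +0000] "GET /api/payments HTTP/1.1" 200 431
10.0.0.1 - - [01/Jul/2002:12:11:54 +0000] "POST /api/payments HTTP/1.1" 201 12
10.0.0.3 - - [01/Jul/2002:12:11:55 +0000] "GET /index.html HTTP/1.1" 200 431
10.0.0.1 - - [01/Jul/2002:12:11:56 +0000] "GET /api/payments HTTP/1.1" 500 0
10.0.0.2 - - [01/Jul/2002:12:11:57 +0000] "GET /api/payments HTTP/1.1" 200 431
10.0.0.4 - - [01/Jul/2002:12:11:58 +0000] "GET /api/payments HTTP/1.1" 200 431
EOF

# Filter, take the first field, count per address, rank by count.
top_ips=$(grep '/api/payments' "$log" | cut -d ' ' -f 1 | sort | uniq -c \
  | sort -rn | head -10 | awk '{print $1, $2}')
printf '%s\n' "$top_ips"

rm -rf "$tmpdir"
```

With this sample, 10.0.0.1 has three hits, 10.0.0.2 has two, and 10.0.0.4 has one. The request for /index.html from 10.0.0.3 is filtered out by grep.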


Log parsing commands are essential for analyzing logs and extracting useful information, especially in system administration and troubleshooting. Below are some of the most commonly used log parsing commands along with examples.

### 1. **grep**

`grep` is used to search for specific patterns in a file or output.

- **Example: Search for errors in a log file**


  grep "ERROR" /var/log/syslog


  This command will search for the word "ERROR" in the syslog file and return all matching lines.

- **Example: Case-insensitive search**


  grep -i "error" /var/log/syslog


  This will return lines with "error", "Error", "ERROR", etc.
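A few other grep options tend to be handy for logs. The sketch below uses a made-up sample file rather than the real syslog, and shows `-c` (count matching lines), `-n` (show line numbers), and `-E` (extended regular expressions for alternatives).

```shell
# Hypothetical sample log in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
cat > "$log" <<'EOF'
Sep  9 10:00:01 host1 app[100]: INFO service started
Sep  9 10:00:05 host1 app[100]: ERROR disk full
Sep  9 10:00:09 host1 app[100]: Warning low memory
Sep  9 10:00:12 host1 app[100]: error retry failed
EOF

# -c prints the number of matching lines instead of the lines themselves.
exact_count=$(grep -c "ERROR" "$log")

# -i ignores case, so "ERROR" and "error" both count.
nocase_count=$(grep -ci "error" "$log")

# -E allows alternation; -n prefixes each match with its line number.
# cut keeps just the line numbers and paste joins them on one line.
matched_lines=$(grep -inE "error|warning" "$log" | cut -d ':' -f 1 | paste -sd ' ' -)

echo "exact=$exact_count nocase=$nocase_count lines=$matched_lines"
rm -rf "$tmpdir"
```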

### 2. **awk**

`awk` is a powerful text-processing tool that allows manipulation and extraction of data based on patterns and actions.

- **Example: Extract the date and message from a log**


  awk '{print $1, $2, $3, $5}' /var/log/syslog


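  This extracts the first three fields (date and time) and the 5th field. In the traditional syslog layout the 5th field is the program tag (for example `sshd[1234]:`), not the full message, which spans the remaining fields.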

- **Example: Filter logs by a specific user**


  awk '$5 == "username"' /var/log/auth.log


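  This prints entries whose 5th field is exactly "username". Which field holds the user name depends on the log format, so adjust the field number to match your logs.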
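awk can also loop over fields and keep counters. The sketch below assumes the traditional syslog layout (month, day, time, host, program tag, then message). The sample lines are hypothetical.

```shell
# Hypothetical sample log in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
cat > "$log" <<'EOF'
Sep  9 10:00:01 host1 sshd[210]: Accepted password for alice from 10.0.0.5 port 50122 ssh2
Sep  9 10:00:03 host1 cron[311]: (root) CMD (run-parts /etc/cron.hourly)
Sep  9 10:00:07 host1 sshd[215]: Failed password for bob from 10.0.0.9 port 50200 ssh2
Sep  9 10:00:09 host1 sshd[219]: Accepted publickey for carol from 10.0.0.7 port 50311 ssh2
EOF

# Print the timestamp plus the message, rebuilt from field 6 through the last field.
messages=$(awk '{ msg = $6; for (i = 7; i <= NF; i++) msg = msg " " $i
                  print $1, $2, $3, msg }' "$log")

# Count lines per program: strip "[pid]:" from field 5, then tally in an array.
per_program=$(awk '{ sub(/\[.*/, "", $5); count[$5]++ }
                   END { for (p in count) print p, count[p] }' "$log" | sort)

printf '%s\n' "$messages" "$per_program"
rm -rf "$tmpdir"
```

The final `sort` is there because awk does not guarantee any particular order when iterating over an array.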

### 3. **sed**

`sed` is used for stream editing, such as searching, finding, and replacing text in logs.

- **Example: Replace "ERROR" with "WARNING"**


  sed 's/ERROR/WARNING/g' /var/log/syslog


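  This prints the contents of the syslog file with every occurrence of "ERROR" replaced by "WARNING". The file itself is not modified unless sed is run with an in-place option such as `-i`.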

- **Example: Delete lines containing "DEBUG"**


  sed '/DEBUG/d' /var/log/syslog


  This deletes all lines containing "DEBUG" from the output.
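Several sed commands can be combined with `-e`, and `-n` together with `p` prints only selected lines. A small sketch on a made-up file, which also checks that the original file is left untouched:

```shell
# Hypothetical sample log in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
cat > "$log" <<'EOF'
INFO start
DEBUG cache warm
ERROR disk full
DEBUG retry
EOF

# Two edits in one pass: drop DEBUG lines, then relabel ERROR as WARNING.
edited=$(sed -e '/DEBUG/d' -e 's/ERROR/WARNING/g' "$log")

# sed wrote to stdout only, so the file still contains the ERROR line.
errors_in_file=$(grep -c "ERROR" "$log")

# -n suppresses the default output; 2,3p prints only lines 2 through 3.
middle=$(sed -n '2,3p' "$log")

printf '%s\n' "$edited"
rm -rf "$tmpdir"
```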

### 4. **cut**

`cut` is used to extract specific columns or fields from a log file.

- **Example: Extract the timestamp from logs**


  cut -d ' ' -f 1-3 /var/log/syslog


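  This extracts the first three space-separated fields (assumed to be date and time) from each line. Because `cut` treats every single space as a delimiter, a day padded with two spaces (as in "Sep  9") shifts the fields by one.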
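One thing to watch for with `cut -d ' '` is a run of several spaces, since each space starts a new (possibly empty) field. The sketch below uses a hypothetical syslog-style line and shows one common workaround, squeezing repeated spaces with `tr -s` first.

```shell
# Hypothetical syslog-style line with a space-padded day of month.
line='Sep  9 10:00:01 host1 app[100]: started'

# The two spaces after "Sep" create an empty second field,
# so fields 1-3 stop at the day and miss the time.
naive=$(printf '%s\n' "$line" | cut -d ' ' -f 1-3)

# tr -s ' ' squeezes each run of spaces into one before cut splits the line.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 1-3)

echo "naive:    $naive"
echo "squeezed: $squeezed"
```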

### 5. **tail**

`tail` shows the last few lines of a file, which is helpful for viewing recent log entries.

- **Example: Show the last 10 lines of the log**


  tail /var/log/syslog


- **Example: Continuously monitor new log entries (real-time)**


  tail -f /var/log/syslog
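The number of lines can be set with `-n`. A quick sketch on a made-up six-line file: `-n 2` takes the last two lines, while `-n +4` starts output at line 4.

```shell
# Hypothetical six-line file in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
printf 'line %s\n' 1 2 3 4 5 6 > "$log"

# Last two lines of the file.
last_two=$(tail -n 2 "$log")

# Everything from line 4 to the end (note the plus sign).
from_fourth=$(tail -n +4 "$log")

printf '%s\n' "$last_two"
rm -rf "$tmpdir"
```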


### 6. **head**

`head` is the opposite of `tail`; it shows the first few lines of a file.

- **Example: View the first 10 lines of a log file**


  head /var/log/syslog
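`head` also accepts `-n`, and piping it into `tail` is a simple way to pull a slice out of the middle of a file. A sketch on a made-up six-line file:

```shell
# Hypothetical six-line file in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
printf 'line %s\n' 1 2 3 4 5 6 > "$log"

# First two lines of the file.
first_two=$(head -n 2 "$log")

# Lines 3-4: take the first four lines, then the last two of those.
slice=$(head -n 4 "$log" | tail -n 2)

printf '%s\n' "$slice"
rm -rf "$tmpdir"
```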


### 7. **sort**

`sort` arranges the lines of a file or output in ascending or descending order.

- **Example: Sort log entries by timestamp**


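  sort /var/log/syslog


  A plain `sort` compares lines as text, so the result is chronological only when each line starts with a sortable timestamp such as ISO 8601. For the traditional "Mon DD HH:MM:SS" prefix, sort by month name, day, and time instead, for example with `sort -k1,1M -k2,2n -k3,3`.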


- **Example: Sort in reverse order**


  sort -r /var/log/syslog
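The difference between textual and chronological order is easy to see on a small sample. This sketch assumes a sort that supports the `-M` month-name key (GNU and BSD sort both do) and uses hypothetical log lines. `LC_ALL=C` pins the locale so the month abbreviations are read as English.

```shell
# Hypothetical log lines, deliberately out of order.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
cat > "$log" <<'EOF'
Feb 14 08:15:00 host1 app[100]: second
Apr  2 10:00:01 host1 app[100]: fourth
Jan  3 23:59:59 host1 app[100]: first
Apr  2 09:30:00 host1 app[100]: third
EOF

# Plain text order puts "Apr" before "Feb" before "Jan".
lexical=$(LC_ALL=C sort "$log" | awk '{print $NF}' | paste -sd ' ' -)

# Key 1 as a month name, key 2 as a number, key 3 (the time) as text.
chronological=$(LC_ALL=C sort -k1,1M -k2,2n -k3,3 "$log" | awk '{print $NF}' | paste -sd ' ' -)

echo "lexical:       $lexical"
echo "chronological: $chronological"
rm -rf "$tmpdir"
```

Traditional syslog timestamps carry no year, so this approach only holds for entries within a single calendar year.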


### 8. **uniq**

`uniq` filters out adjacent repeated lines, which is useful for finding unique log entries. Since it only compares neighboring lines, its input is usually sorted first.

- **Example: Find unique IP addresses**


  cut -d ' ' -f 7 /var/log/syslog | sort | uniq


  This extracts the 7th field (assumed to be an IP address), sorts it, and removes duplicates.
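The sketch below, using a made-up list of addresses, shows why the `sort` step matters and how `uniq -d` lists only the values that occur more than once.

```shell
# Hypothetical list of addresses; 10.0.0.1 appears three times, not all adjacent.
ips=$(printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1 10.0.0.3)

# Without sorting, only the adjacent pair of 10.0.0.1 lines is collapsed.
unsorted_count=$(printf '%s\n' "$ips" | uniq | wc -l | tr -d ' ')

# After sorting, all duplicates are adjacent and collapse to one line each.
sorted_count=$(printf '%s\n' "$ips" | sort | uniq | wc -l | tr -d ' ')

# -d prints one copy of each line that is repeated.
repeated=$(printf '%s\n' "$ips" | sort | uniq -d)

echo "unsorted=$unsorted_count sorted=$sorted_count repeated=$repeated"
```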

### 9. **wc**

`wc` counts lines, words, or characters in a file.

- **Example: Count the number of log entries**


  wc -l /var/log/syslog


- **Example: Count words in a log file**


  wc -w /var/log/syslog
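`wc -l` is often placed at the end of a pipeline to count matches. A small sketch on a made-up file. Reading from standard input keeps the file name out of the output, and `tr -d ' '` removes the padding that some `wc` implementations add.

```shell
# Hypothetical three-line log in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/sample.log"
printf '%s\n' 'INFO start' 'ERROR disk full' 'INFO done' > "$log"

# Total lines and words in the file.
total_lines=$(wc -l < "$log" | tr -d ' ')
total_words=$(wc -w < "$log" | tr -d ' ')

# Number of lines that mention ERROR.
error_lines=$(grep "ERROR" "$log" | wc -l | tr -d ' ')

echo "lines=$total_lines words=$total_words errors=$error_lines"
rm -rf "$tmpdir"
```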


### 10. **less**

`less` is a pager command that allows you to view large log files one screen at a time.

- **Example: View a log file interactively**


  less /var/log/syslog


### 11. **find**

`find` is used to search for files based on criteria, which can be helpful when looking for log files across directories.

- **Example: Find log files modified in the last 24 hours**


  find /var/log -name "*.log" -mtime -1


### 12. **xargs**

`xargs` is used to build and execute commands based on the output of previous commands.

- **Example: Delete old log files**


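  find /var/log -name "*.log" -mtime +30 -print0 | xargs -0 rm


  This finds all log files last modified more than 30 days ago and deletes them. The `-print0` and `-0` options pass file names separated by NUL characters, so names containing spaces are handled correctly.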
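Before deleting anything for real, it helps to rehearse in a scratch directory. This is a sketch under a few assumptions: the file names are made up, `touch -t` is used to backdate two files to the year 2000, and the NUL-separated `-print0` / `-0` pair is used so that a name containing a space stays in one piece.

```shell
# Scratch directory with one fresh file and two backdated ones.
tmpdir=$(mktemp -d)
touch "$tmpdir/new.log"
touch -t 200001010000 "$tmpdir/old app.log" "$tmpdir/old.log"

# Dry run: list what would be removed instead of removing it.
find "$tmpdir" -name "*.log" -mtime +30 -print0 | xargs -0 ls

# Real run: only the two old files are deleted.
find "$tmpdir" -name "*.log" -mtime +30 -print0 | xargs -0 rm

remaining=$(ls "$tmpdir")
echo "remaining: $remaining"
rm -rf "$tmpdir"
```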

### 13. **logger**

`logger` is used to manually add entries to the system log.

- **Example: Log a custom message**


  logger "This is a custom log entry"



### Use Case Example: Combined Commands for Log Parsing

To find unique IP addresses from recent logs:


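tail -n 1000 /var/log/syslog | grep "Accepted" | awk '{for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1)}' | sort | uniq


This command looks at the last 1000 lines of the syslog, filters for "Accepted" SSH login messages, extracts the IP address that follows the word "from" in each message, sorts the addresses, and removes duplicates. (The last field, `$NF`, is typically the protocol name such as "ssh2" rather than the address. On Debian and Ubuntu, these messages are usually written to /var/log/auth.log.)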
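It is worth checking which field actually carries the address before trusting a pipeline like this. The sketch below uses hypothetical lines modeled on OpenSSH's "Accepted ... for USER from ADDRESS port N ssh2" messages and compares the last field with the field that follows the word "from".

```shell
# Hypothetical auth log in a temporary directory.
tmpdir=$(mktemp -d)
log="$tmpdir/auth.log"
cat > "$log" <<'EOF'
Sep  9 10:00:01 host1 sshd[210]: Accepted password for alice from 10.0.0.5 port 50122 ssh2
Sep  9 10:00:07 host1 sshd[215]: Failed password for bob from 10.0.0.9 port 50200 ssh2
Sep  9 10:00:09 host1 sshd[219]: Accepted publickey for carol from 10.0.0.7 port 50311 ssh2
Sep  9 10:02:30 host1 sshd[230]: Accepted password for alice from 10.0.0.5 port 50400 ssh2
EOF

# The last field of these lines is the protocol name, not an address.
last_field=$(grep "Accepted" "$log" | awk '{print $NF}' | sort | uniq)

# Picking the field right after "from" yields the client addresses.
accepted_ips=$(tail -n 1000 "$log" | grep "Accepted" \
  | awk '{for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1)}' | sort | uniq)

echo "last field: $last_field"
printf '%s\n' "$accepted_ips"
rm -rf "$tmpdir"
```

With this sample, the failed login from 10.0.0.9 is excluded and the two logins from 10.0.0.5 collapse into one entry.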


These commands, either individually or in combination, can be extremely powerful for parsing, analyzing, and managing logs in Unix/Linux environments.
