Monday, September 9, 2024

Log Parsing

 


Log Parsing Cheat Sheet

1. GREP

GREP searches any given input files, selecting lines that match one or more patterns.

2. CUT

CUT cuts out selected portions of each line from each file and writes them to the standard output.

3. SED

SED reads the specified files, modifying the input as specified by a list of commands.

4. AWK

AWK scans each input file for lines that match any of a set of patterns.

5. SORT

SORT sorts text and binary files by lines.

6. UNIQ

UNIQ reads the specified input file comparing adjacent lines and writes a copy of each unique input line to the output file.



Let’s walk through an example.

Suppose we want the 10 IP addresses that request the path "/api/payments" most often, with a hit count for each, from an access log in the Common Log Format. A line in that format looks like this:

216.67.1.91 - leon [01/Jul/2002:12:11:52 +0000] "GET /index.html HTTP/1.1" 200 431

We can use a combination of grep, cut, sort, and uniq commands. Here is a sample command:

grep '/api/payments' access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -rn | head -10

Here's what each part of the command does:

- grep '/api/payments' access.log: This filters the lines containing "/api/payments" from the access.log file.

- cut -d ' ' -f 1: This extracts the first field (the IP address) from each line. The -d ' ' option specifies space as the field delimiter.

- sort: This sorts the IP addresses.

- uniq -c: This collapses each run of adjacent identical lines into one and prefixes it with the number of occurrences, which is why the input is sorted first.

- sort -rn: This sorts the lines in reverse order (highest first) numerically.

- head -10: This shows only the first 10 lines of the output, which correspond to the top 10 IP addresses.
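The pipeline above can be tried end to end on a small sample file. This is a minimal sketch: the log lines, IP addresses, and temporary file are invented for illustration. Because `grep '/api/payments'` matches the string anywhere in a line, the sketch also includes a stricter variant that compares the request path, which is the 7th space-separated field in the Common Log Format.

```bash
# Build a throwaway access log (all entries are made-up sample data).
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jul/2002:12:11:52 +0000] "GET /api/payments HTTP/1.1" 200 431
10.0.0.2 - - [01/Jul/2002:12:11:53 +0000] "GET /api/payments HTTP/1.1" 200 431
10.0.0.1 - - [01/Jul/2002:12:11:54 +0000] "POST /api/payments HTTP/1.1" 201 12
10.0.0.3 - - [01/Jul/2002:12:11:55 +0000] "GET /index.html HTTP/1.1" 200 431
10.0.0.1 - - [01/Jul/2002:12:11:56 +0000] "GET /api/payments HTTP/1.1" 500 0
EOF

# Same pipeline as above: hits per IP, busiest first.
top=$(grep '/api/payments' "$log" | cut -d ' ' -f 1 | sort | uniq -c | sort -rn | head -10)
echo "$top"

# Stricter variant: count only lines whose request path ($7) is exactly /api/payments.
strict=$(awk '$7 == "/api/payments" {print $1}' "$log" | sort | uniq -c | sort -rn | head -10)
echo "$strict"

rm -f "$log"
```

Each output line is a count followed by an IP address; the amount of leading padding before the count differs between `uniq` implementations.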


Log parsing commands are essential for analyzing logs and extracting useful information, especially in system administration and troubleshooting. Below are some of the most commonly used log parsing commands along with examples.


### 1. **grep**

`grep` is used to search for specific patterns in a file or output.


- **Example: Search for errors in a log file**

  ```bash

  grep "ERROR" /var/log/syslog

  ```

  This command will search for the word "ERROR" in the syslog file and return all matching lines.


- **Example: Case-insensitive search**

  ```bash

  grep -i "error" /var/log/syslog

  ```

  This will return lines with "error", "Error", "ERROR", etc.
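A few more `grep` options come up often when reading logs. The sketch below uses a made-up four-line log in a temporary file; the message text is an assumption chosen only to exercise the options.

```bash
# Sample data (invented for illustration).
log=$(mktemp)
printf '%s\n' \
  'Sep  9 10:00:01 host app[1]: ERROR disk full' \
  'Sep  9 10:00:02 host app[1]: INFO started' \
  'Sep  9 10:00:03 host app[1]: error retrying' \
  'Sep  9 10:00:04 host app[1]: WARNING slow response' > "$log"

# -c prints the number of matching lines instead of the lines themselves.
error_count=$(grep -ci 'error' "$log")
echo "$error_count"

# -E enables extended regular expressions; here, either of two levels.
problems=$(grep -E 'ERROR|WARNING' "$log")
echo "$problems"

# -v inverts the match: every line that does not contain INFO.
non_info=$(grep -v 'INFO' "$log")
echo "$non_info"

rm -f "$log"
```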


### 2. **awk**

`awk` is a powerful text-processing tool that allows manipulation and extraction of data based on patterns and actions.


- **Example: Extract the date and message from a log**

  ```bash

  awk '{print $1, $2, $3, $5}' /var/log/syslog

  ```

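  This prints the first three fields (month, day, and time) and the 5th field. In the traditional syslog layout the 5th field is the program tag (for example `sshd[1234]:`), not the full message, so adjust the field numbers to match your log format.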


- **Example: Filter logs by a specific user**

  ```bash

  awk '$5 == "username"' /var/log/auth.log

  ```

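  This prints entries where the 5th field is exactly "username". In the traditional syslog layout the 5th field holds the program tag, so change `$5` to whichever field carries the user name in your log format.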
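The pattern-and-action model also covers matching with regular expressions and simple aggregation. The following is a sketch on invented sample lines in the traditional syslog layout, where the 5th field is assumed to be the program tag.

```bash
# Sample data (invented for illustration).
log=$(mktemp)
printf '%s\n' \
  'Sep  9 10:00:01 host sshd[101]: Accepted password for alice from 192.0.2.10 port 5000 ssh2' \
  'Sep  9 10:00:02 host cron[202]: (root) CMD (run-parts /etc/cron.hourly)' \
  'Sep  9 10:00:03 host sshd[103]: Failed password for bob from 192.0.2.11 port 5001 ssh2' > "$log"

# Pattern + action: print time and tag only for lines whose 5th field starts with "sshd".
sshd_lines=$(awk '$5 ~ /^sshd/ {print $3, $5}' "$log")
echo "$sshd_lines"

# Associative array: count lines per program name (tag with the "[pid]:" part removed).
counts=$(awk '{tag = $5; sub(/\[.*/, "", tag); n[tag]++} END {for (t in n) print n[t], t}' "$log" | sort -rn)
echo "$counts"

rm -f "$log"
```

By default `awk` splits fields on runs of whitespace, so the two spaces in `Sep  9` do not produce an empty field.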


### 3. **sed**

`sed` is a stream editor: it applies editing commands, such as substitution and deletion, to each line of its input and writes the result to standard output.


- **Example: Replace "ERROR" with "WARNING"**

  ```bash

  sed 's/ERROR/WARNING/g' /var/log/syslog

  ```

  This prints the contents of the syslog file with every occurrence of "ERROR" replaced by "WARNING". The file itself is not modified; the result goes to standard output.


- **Example: Delete lines containing "DEBUG"**

  ```bash

  sed '/DEBUG/d' /var/log/syslog

  ```

  This deletes all lines containing "DEBUG" from the output.
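To make the behaviour concrete, here is a small sketch on a three-line temporary file (the contents are invented). It shows that the input file is left untouched, how several commands can be chained with `-e`, and how `-n` with `p` prints only selected lines.

```bash
# Sample data (invented for illustration).
log=$(mktemp)
printf '%s\n' 'ERROR one' 'DEBUG two' 'INFO three' > "$log"

# The edited text goes to standard output; the file on disk is unchanged.
preview=$(sed 's/ERROR/WARNING/g' "$log")
echo "$preview"

# Several commands can be chained with -e: drop DEBUG lines, then substitute.
cleaned=$(sed -e '/DEBUG/d' -e 's/ERROR/WARNING/g' "$log")
echo "$cleaned"

# -n suppresses automatic printing; p prints only the matching lines.
only_info=$(sed -n '/INFO/p' "$log")
echo "$only_info"

# Read the file back to confirm it still contains the original text.
original=$(cat "$log")
rm -f "$log"
```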


### 4. **cut**

`cut` is used to extract specific columns or fields from a log file.


- **Example: Extract the timestamp from logs**

  ```bash

  cut -d ' ' -f 1-3 /var/log/syslog

  ```

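  This extracts the first three space-separated fields (assumed to be the date and time) from each line. Because `cut` treats every single space as a delimiter, a timestamp padded with two spaces (such as `Sep  9`) yields an empty second field.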
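The effect of repeated spaces on `cut` is easy to see on a single line. This sketch uses one invented syslog-style line and assumes the traditional 15-character timestamp at the start of the line.

```bash
# One sample line (invented); note the two spaces after "Sep".
line='Sep  9 10:00:01 host app[1]: started'

# Every single space is a delimiter, so field 2 is empty and the time is lost.
naive=$(printf '%s\n' "$line" | cut -d ' ' -f 1-3)
echo "$naive"

# Squeezing repeated spaces first restores the expected three fields.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 1-3)
echo "$squeezed"

# cut can also select by character position, which suits fixed-width timestamps.
by_chars=$(printf '%s\n' "$line" | cut -c 1-15)
echo "$by_chars"
```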


### 5. **tail**

`tail` shows the last few lines of a file, which is helpful for viewing recent log entries.


- **Example: Show the last 10 lines of the log**

  ```bash

  tail /var/log/syslog

  ```


- **Example: Continuously monitor new log entries (real-time)**

  ```bash

  tail -f /var/log/syslog

  ```


### 6. **head**

`head` is the opposite of `tail`; it shows the first few lines of a file.


- **Example: View the first 10 lines of a log file**

  ```bash

  head /var/log/syslog

  ```
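Both `tail` and `head` accept `-n` to change how many lines they show, and the two can be combined to print a range from the middle of a file. A minimal sketch, using a generated 100-line file as stand-in data:

```bash
# Generate 100 numbered lines as stand-in log entries.
log=$(mktemp)
awk 'BEGIN {for (i = 1; i <= 100; i++) print "entry", i}' > "$log"

# -n sets how many lines to show (the default for both tools is 10).
last_three=$(tail -n 3 "$log")
echo "$last_three"

# tail -n +K starts output at line K instead of counting from the end.
from_line_98=$(tail -n +98 "$log")
echo "$from_line_98"

# Combining the two prints an arbitrary range, here lines 41 to 45.
middle=$(head -n 45 "$log" | tail -n 5)
echo "$middle"

rm -f "$log"
```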


### 7. **sort**

`sort` arranges the lines of a file or output in ascending or descending order.


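- **Example: Sort log entries by timestamp**

  ```bash

  sort /var/log/syslog

  ```

  Plain `sort` compares whole lines as text, so the result is chronological only when each line starts with a timestamp that sorts lexically, such as an ISO 8601 date.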


- **Example: Sort in reverse order**

  ```bash

  sort -r /var/log/syslog

  ```
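For timestamps that start with a month name, key-based sorting can help. The sketch below relies on the `-M` (month name) ordering, which GNU and BSD `sort` provide but which is not part of every implementation, so treat it as an assumption about your environment. The sample lines are invented, and `LC_ALL=C` is set so that English month abbreviations and byte-wise comparison are used.

```bash
# Sample data (invented for illustration), deliberately out of order.
log=$(mktemp)
printf '%s\n' \
  'Sep 10 08:00:00 host app: third' \
  'Aug 31 23:59:59 host app: first' \
  'Sep 9 07:30:00 host app: second' > "$log"

# Plain sort compares lines as text, so "Sep 10" lands before "Sep 9".
plain=$(LC_ALL=C sort "$log")
echo "$plain"

# Key-based sort: field 1 as month name (M), field 2 numerically (n), field 3 as text.
chrono=$(LC_ALL=C sort -k1,1M -k2,2n -k3,3 "$log")
echo "$chrono"

rm -f "$log"
```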


### 8. **uniq**

`uniq` filters out adjacent repeated lines, so it is usually run on sorted input; this is useful for finding unique log entries.


- **Example: Find unique IP addresses**

  ```bash

  cut -d ' ' -f 7 /var/log/syslog | sort | uniq

  ```

  This extracts the 7th field (assumed to be an IP address), sorts it, and removes duplicates.
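The reason for sorting before `uniq` shows up clearly on a short list. This sketch uses four made-up addresses from a documentation range.

```bash
# Sample input (invented): the first address repeats, but not always adjacently.
ips=$(printf '%s\n' 192.0.2.1 192.0.2.2 192.0.2.1 192.0.2.1)

# uniq only collapses adjacent duplicates, so unsorted input keeps a repeat.
unsorted=$(printf '%s\n' "$ips" | uniq)
echo "$unsorted"

# Sorting first groups equal lines together, so each address appears once.
sorted_unique=$(printf '%s\n' "$ips" | sort | uniq)
echo "$sorted_unique"

# -d prints only the lines that occur more than once.
repeated=$(printf '%s\n' "$ips" | sort | uniq -d)
echo "$repeated"
```

`sort -u` is a common shorthand for `sort | uniq` when no counts are needed.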


### 9. **wc**

`wc` counts lines, words, or characters in a file.


- **Example: Count the number of log entries**

  ```bash

  wc -l /var/log/syslog

  ```


- **Example: Count words in a log file**

  ```bash

  wc -w /var/log/syslog

  ```


### 10. **less**

`less` is a pager command that allows you to view large log files one screen at a time.


- **Example: View a log file interactively**

  ```bash

  less /var/log/syslog

  ```


### 11. **find**

`find` is used to search for files based on criteria, which can be helpful when looking for log files across directories.


- **Example: Find log files modified in the last 24 hours**

  ```bash

  find /var/log -name "*.log" -mtime -1

  ```


### 12. **xargs**

`xargs` is used to build and execute commands based on the output of previous commands.


- **Example: Delete old log files**

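  ```bash

  find /var/log -name "*.log" -mtime +30 -print0 | xargs -0 rm

  ```

  This finds all `.log` files last modified more than 30 days ago and deletes them. The `-print0` and `-0` options pass file names separated by NUL characters, so names containing spaces or newlines are handled safely.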
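Because deletion is hard to undo, it can be worth listing the candidates first. The sketch below works in a temporary directory with two invented files, one of them back-dated with `touch -t` and containing a space in its name, and passes NUL-separated names from `find -print0` to `xargs -0`.

```bash
# Scratch directory with one old and one new file (names are made up).
dir=$(mktemp -d)
touch -t 200001010000 "$dir/old app.log"   # back-dated; name contains a space
touch "$dir/new.log"                       # modified just now

# Dry run: list what would be deleted before deleting anything.
candidates=$(find "$dir" -name '*.log' -mtime +30)
echo "$candidates"

# NUL-separated names keep "old app.log" intact as a single argument to rm.
find "$dir" -name '*.log' -mtime +30 -print0 | xargs -0 rm -f

# Only the recent file should be left.
remaining=$(ls "$dir")
echo "$remaining"

rm -rf "$dir"
```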


### 13. **logger**

`logger` is used to manually add entries to the system log.


- **Example: Log a custom message**

  ```bash

  logger "This is a custom log entry"

  ```


---


### Use Case Example: Combined Commands for Log Parsing


To find unique IP addresses from recent logs:

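```bash

tail -n 1000 /var/log/syslog | grep "Accepted" | awk '{for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1)}' | sort | uniq

```

This command looks at the last 1000 lines of the syslog, filters for "Accepted" SSH login messages, prints the field that follows the word "from" (the client IP address), sorts the addresses, and removes duplicates. On some distributions SSH logins are written to `/var/log/auth.log` rather than the syslog, so adjust the path accordingly.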
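A sketch on invented `sshd` lines shows where the address sits. In OpenSSH messages of the form `Accepted ... for USER from ADDRESS port N ssh2`, the address follows the word `from`, while the last field is usually `ssh2` or part of a key fingerprint, so printing the field after `from` is more reliable than counting from the end of the line. The user names, addresses, and fingerprint text here are made up.

```bash
# Sample data (invented for illustration).
log=$(mktemp)
printf '%s\n' \
  'Sep  9 10:00:01 host sshd[101]: Accepted password for alice from 192.0.2.10 port 5000 ssh2' \
  'Sep  9 10:00:02 host sshd[102]: Failed password for bob from 192.0.2.99 port 5001 ssh2' \
  'Sep  9 10:00:03 host sshd[103]: Accepted publickey for carol from 192.0.2.20 port 5002 ssh2: ED25519 SHA256:examplefingerprint' \
  'Sep  9 10:00:04 host sshd[104]: Accepted password for alice from 192.0.2.10 port 5003 ssh2' > "$log"

# Keep only accepted logins, print the field after "from", then de-duplicate.
accepted_ips=$(tail -n 1000 "$log" | grep 'Accepted' |
  awk '{for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1)}' | sort | uniq)
echo "$accepted_ips"

rm -f "$log"
```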


---

These commands, either individually or in combination, can be extremely powerful for parsing, analyzing, and managing logs in Unix/Linux environments.



