GNU awk: Effective AWK Programming

This article introduces the GNU awk and some often used awk commands.

Brief Introduction

AWK is an interpreted programming language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data – either run directly on files or used as part of a pipeline – for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. While AWK has a limited intended application domain and was especially designed to support one-liner programs, the language is Turing-complete, and even the early Bell Labs users of AWK often wrote well-structured large AWK programs.

AWK was created at Bell Labs in the 1970s, and its name is derived from the surnames of its authors – Alfred Aho, Peter Weinberger, and Brian Kernighan. The acronym is pronounced the same as the name of the bird auk (which acts as an emblem of the language such as on The AWK Programming Language book cover – the book is often referred to by the abbreviation TAPL). When written in all lowercase letters, as awk, it refers to the Unix or Plan 9 program that runs scripts written in the AWK programming language.

Mind Map of GNU awk

GNU awk Mind Map

GNU_awk_Records_Fields

Often Used awk Commands

# Show the last five account which login the system
$ last -n 5 | awk '{print $1}'

# Show the account of /etc/passwd only
$ cat /etc/passwd | awk -F ':' '{print $1}'

# Show the account and corresponding shell, and split with TAB
$ cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'

# Search keyword root on file /etc/passwd
$ awk -F: '/root/' /etc/passwd

# Count the total size (byte) of files in current directory
$ ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'

# Count the total size (Megabytes) of files in current directory
$ ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}'

References