Sorting and counting
==========================
Imagine that you are administering a small unix system and you want to
know how many processes each user is currently running, listed in
decreasing order of number of processes. The following one-liner:
$ ps aux | cut -d " " -f 1 | tail -n +2 | sort | uniq -c | sort -rn
does the trick. Let's dissect it to understand how it works.
The command ps(1) can list all the processes currently running on your
system, together with the name of the user to whom each process belongs:
$ ps aux
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 11 100.0 0.0 0 16 - RNL 29Nov18 69599:49.33 [idle]
root 0 0.0 0.0 0 240 - DLs 29Nov18 0:13.81 [kernel]
root 1 0.0 0.0 5424 128 - ILs 29Nov18 0:01.03 /sbin/init --
root 2 0.0 0.0 0 16 - DL 29Nov18 0:00.00 [crypto]
...........
...........
uwu 30154 0.0 0.8 11680 8004 27 I+ 08:27 0:00.03 /usr/local/bin/lua52 /usr/local/bin/telem.lua uwu
uwu 27175 0.0 0.5 8520 5320 28 Is 06:35 0:00.03 -zsh (zsh)
uwu 27178 0.0 0.5 8188 5220 28 S+ 06:35 0:58.73 lua /usr/local/bin/odlli (lua52)
$
That is a fairly long list, but user names appear in the first column,
with the other fields separated by (a variable number of) spaces. For the
moment we just need the user names, so cut(1) comes in handy:
$ ps aux | cut -d " " -f 1
USER
root
root
root
root
...........
...........
uwu
uwu
uwu
$
Notice that the first line contains "USER" which is not a real user name
(it's just part of the header added by ps(1)), so we will need to get
rid of it using the command tail(1):
$ ps aux | cut -d " " -f 1 | tail -n +2
root
...........
uwu
$
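To experiment with these two steps on predictable input, here is a
minimal sketch that replaces ps(1) with a made-up listing. The function
name sample_ps and its three process lines are invented for
illustration; only cut(1) and tail(1) are the real thing:

```shell
# sample_ps is a hypothetical stand-in for `ps aux`: a header line
# followed by three made-up process lines.
sample_ps() {
    printf '%s\n' \
        'USER   PID %CPU COMMAND' \
        'root     1  0.0 /sbin/init' \
        'uwu  27175  0.0 -zsh' \
        'uwu  27178  0.0 lua'
}

# Field 1 is everything before the first space; tail -n +2 then prints
# from the second line onwards, dropping the "USER" header.
sample_ps | cut -d " " -f 1 | tail -n +2

# With -d " " every single space is a delimiter, so runs of spaces
# produce empty fields: field 2 is empty on every line of this sample.
sample_ps | cut -d " " -f 2
```

The first command prints root, uwu, uwu. The second prints four empty
lines, which is why the one-liner relies on the first field only.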
Now, each user name appears in the resulting list as many times as the
number of processes currently run by that user. How do we count these
occurrences? The trick is to use sort(1) and uniq(1). The command
sort(1) can sort a file (or a list of lines provided as input), and by
default it enforces a lexicographical order:
$ ps aux | cut -d " " -f 1 | tail -n +2 | sort
_dhcp
_pflogd
bbs
bbs
bbs
ben
ben
ben
...........
slugmax
slugmax
slugmax
spring
uwu
uwu
uwu
uwu
$
The command uniq(1) collapses each run of contiguous repeated lines in
its input into a single line:
$ ps aux | cut -d " " -f 1 | tail -n +2 | sort | uniq
_dhcp
_pflogd
bbs
ben
cleber
irc
katolaz
leeb
lntl
nobody
postfix
root
slugmax
spring
uwu
$
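The word "contiguous" is the reason why sort(1) has to come before
uniq(1). A small sketch with made-up input (the three names are just
sample data) shows the difference:

```shell
# Unsorted input: the two "uwu" lines are not adjacent, so uniq keeps both.
printf '%s\n' uwu root uwu | uniq

# After sorting, equal lines sit next to each other and are collapsed.
printf '%s\n' uwu root uwu | sort | uniq
```

The first command prints uwu, root, uwu unchanged, while the second
prints root and uwu once each.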
Notice that the pipeline now prints just the list of users on the system
that currently own at least one running process, which is not exactly
what we were after. However, the option '-c' of uniq(1) can do the job,
since it prefixes each line with the number of contiguous repetitions found:
$ ps aux | cut -d " " -f 1 | tail -n +2 | sort | uniq -c
1 _dhcp
1 _pflogd
3 bbs
4 ben
5 cleber
1 irc
22 katolaz
10 leeb
3 lntl
1 nobody
3 postfix
56 root
12 slugmax
1 spring
8 uwu
$
This means that user _dhcp has 1 running process, user cleber has 5
running processes, user root has 56 running processes, and so on. We are
almost there. We just need to sort the resulting list according to the
numbers appearing at the beginning of each line. This is done by using
sort(1) again, with the option '-n':
$ ps aux | cut -d " " -f 1 | tail -n +2 | sort | uniq -c | sort -n
1 _dhcp
1 _pflogd
1 irc
1 nobody
1 spring
3 bbs
3 lntl
3 postfix
4 ben
5 cleber
8 uwu
10 leeb
12 slugmax
23 katolaz
50 root
$
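Why '-n' matters is easier to see on bare numbers. In this sketch the
three values are made up, and LC_ALL=C is set only to make the default
byte-wise ordering predictable:

```shell
# Default order compares lines character by character, so "100" sorts
# before "9" because "1" comes before "9".
printf '%s\n' 10 9 100 | LC_ALL=C sort

# With -n the lines are compared by numeric value instead.
printf '%s\n' 10 9 100 | LC_ALL=C sort -n
```

The first command prints 10, 100, 9 and the second prints 9, 10, 100.
On many systems the counts printed by uniq -c are right-aligned with
spaces, which can hide the problem; '-n' does not rely on that.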
If you want the list to be sorted in descending order of number of
processes, you just need to reverse the ordering, which can be done by
passing the option '-r' to sort(1):
$ ps aux | cut -d " " -f 1 | tail -n +2 | sort | uniq -c | sort -rn
50 root
23 katolaz
12 slugmax
10 leeb
8 uwu
5 cleber
4 ben
3 postfix
3 lntl
3 bbs
1 spring
1 nobody
1 irc
1 _pflogd
1 _dhcp
$
This is the one-liner we had at the beginning of this post. The result
indicates that I should probably close some of the screens I am not
using... :P
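If you want to replay the whole counting part on fixed input, the sketch
below wraps its stages in a shell function. The name count_by_user and
the sample user names are made up for this example; the commands are the
same ones used above:

```shell
# count_by_user (hypothetical helper): reads one user name per line on
# standard input and prints "count name", highest count first.
count_by_user() {
    sort | uniq -c | sort -rn
}

# Made-up input: three lines for root, two for uwu, one for irc.
printf '%s\n' uwu root irc root uwu root | count_by_user

# On a real system the same function can take the place of the last
# three stages of the one-liner:
#   ps aux | cut -d " " -f 1 | tail -n +2 | count_by_user
```

It prints root with a count of 3, then uwu with 2, then irc with 1. The
amount of whitespace before each count depends on your uniq(1).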
-+-+-+-
Most of the tools we have seen here were forged by the ancient dwarven
blacksmiths at Murray Hill, in the Eastern Lands, and have survived
pretty much unmodified in the unix environment for ages. In particular:
sort(1) appeared in UNIXv2 (March 1972)
uniq(1) appeared in UNIXv3 (February 1973)
tail(1) appeared in UNIXv7 (January 1979)
Some other tools, instead, were created in the Eastern Lands and
readjusted and perfected by the sapient master craftsmen of the West. In
particular:
ps(1) appeared in UNIXv4 (November 1973), although the syntax for
options that we have used here comes from early versions of
BSD2.x (ca 1979-1980)