Understanding the Shell Parsing Order

February 9, 2010

The shell interprets a command line as pipelines and lists:

* A pipeline is a sequence of one or more commands separated by the character |
* A list is a sequence of one or more pipelines separated by one of the operators ; & && ||
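
As a rough illustration of the two building blocks, here is a small sketch that uses only standard utilities. The variable names are made up for the example.

```shell
# Pipeline: printf feeds sort, and sort feeds head.
first=$(printf 'b\na\n' | sort | head -n 1)

# List: && runs the next pipeline only on success, || only on failure.
on_success=$(true && echo "ran")
on_failure=$(false || echo "fallback")

echo "$first $on_success $on_failure"
```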

For each pipeline, the shell performs the following steps before executing the command:

1. Tokenize

Splits the command line into tokens, which are separated by a fixed set of metacharacters.
Tokens => words, keywords, I/O redirectors and semicolons
Metacharacters => space, tab, newline, ; ( ) < > | and &
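
A small sketch of what tokenizing means in practice: metacharacters end a token even without surrounding spaces, while quoting removes their special meaning. The variable names are illustrative only.

```shell
# No spaces are needed around ; and | because they are metacharacters.
unquoted=$(echo one;echo two|tr a-z A-Z)

# Inside quotes the same characters are ordinary text, so this is one word.
quoted=$(echo 'one;two|three')

printf '%s\n' "$unquoted" "$quoted"
```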

2. Compound commands
Checks the first token of each command to see if it is a keyword written without quotes or backslashes. A keyword can open a compound command (like if, { or ( ), sit in the middle of a control structure (like then, else, do) or end one (like fi, done).

3. Aliases
Checks the first word of each command against the list of aliases.
3.i) If a match is found, substitutes the alias's definition and goes back to Step 1
eg. ll becomes 'ls -l *' ### where alias ll="ls -l *" is already defined
3.ii) Otherwise goes on to Step 4
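
A hedged sketch of alias substitution in a script. It assumes bash, which only expands aliases in non-interactive shells when expand_aliases is set; the alias name say is made up for the example.

```shell
# bash expands aliases in scripts only when this option is on.
shopt -s expand_aliases

# The alias must be defined on an earlier line than the one that uses it.
alias say='echo alias says:'

# "say hello" is rewritten to "echo alias says: hello" and parsed again.
result=$(say hello)
echo "$result"
```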

4. Brace expansion
Similar to pathname expansion, except that the file names generated need not exist. It is a mechanism by which arbitrary strings may be generated.

eg. a{b,c} becomes ab ac.
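
A short sketch of brace expansion, assuming bash (a plain POSIX sh may leave the braces untouched). The paths are hypothetical and do not need to exist.

```shell
# Each comma-separated alternative is combined with the surrounding text.
simple=$(echo a{b,c})

# The generated names need not exist as files.
paths=$(echo /tmp/demo/{src,doc}/README)

# bash also expands sequence expressions.
range=$(echo file{1..3})

printf '%s\n' "$simple" "$paths" "$range"
```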

5. Tilde expansion
If a word begins with an unquoted tilde character ( ~ ), the characters following the tilde are treated as a possible login name.
eg. cd ~joe becomes cd '/home/joe' ### where /home/joe is the home directory of user joe
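
A minimal sketch of when a tilde is and is not expanded. HOME is set to the hypothetical value /home/joe inside each subshell only so that the result is repeatable.

```shell
# An unquoted tilde at the start of a word expands to $HOME.
expanded=$(HOME=/home/joe; echo ~/bin)

# A quoted tilde is left alone.
literal=$(HOME=/home/joe; echo "~/bin")

# A tilde that is not at the start of the word is left alone too.
midword=$(HOME=/home/joe; echo a~)

printf '%s\n' "$expanded" "$literal" "$midword"
```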

6. Parameter substitution
A parameter reference such as $name or ${name} is replaced with the value of the parameter.
eg. # echo ${HOME} becomes echo '/home/joe'
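
A small sketch of parameter substitution with made-up variable names. The braces matter when the name is followed directly by other characters, and ${name:-word} supplies a fallback when the parameter is unset or empty.

```shell
user=joe

# Braces separate the parameter name from the text that follows it.
backup="${user}_backup"

# An unset (or empty) parameter can fall back to a default value.
editor=${no_such_variable_demo:-vi}

echo "$backup $editor"
```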

7. Command substitution
The command substitution is replaced with the standard output of the command, with trailing newlines removed. It has two forms: $(string) or `string`.
eg. # `which find` or $(which find) becomes '/usr/bin/find'
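
A sketch of both forms of command substitution. It uses dirname and basename on a fixed path string instead of which, so the result does not depend on where find is installed.

```shell
# $(...) form: replaced by the command's standard output.
dir=$(dirname /usr/bin/find)

# Backquote form: same effect, but harder to nest.
base=`basename /usr/bin/find`

# Substitutions can be nested with the $(...) form.
nested=$(basename "$(dirname /usr/bin/find)")

echo "$dir $base $nested"
```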

8. Arithmetic expression
Evaluates an arithmetic expression and substitutes the result. It has the form $((string)).
eg.1 # $(($a+$b)) becomes '9' ### where a=5; b=4

eg.2 # $(($(cat /tmp/file1 | wc -l) + $b)) becomes '7' ### counts the lines in /tmp/file1 (3 lines) and adds the value of $b (4)
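
The two examples above can be sketched as a runnable snippet. Instead of relying on /tmp/file1, the three input lines are produced with printf, which is an assumption made only to keep the snippet self-contained.

```shell
a=5
b=4

# Variables may be written with or without $ inside $(( )).
sum=$((a + b))

# A command substitution can supply one of the operands.
lines=$(printf '1\n2\n3\n' | wc -l)
total=$((lines + b))

echo "$sum $total"
```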

9. Word splitting
Takes the parts of the line that resulted from parameter, command and arithmetic substitution and splits them into words again, using the characters in $IFS as delimiters instead of the metacharacters used in Step 1. Results that appear inside double quotes are not split.

The default value of IFS is exactly [space][tab][newline]
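
A rough sketch of word splitting, counting the resulting words with the positional parameters. The variable names are illustrative.

```shell
list="a b  c"

# Unquoted: the expansion is split on IFS whitespace into three words.
set -- $list
unquoted_count=$#

# Quoted: the expansion stays a single word.
set -- "$list"
quoted_count=$#

# A custom IFS changes the delimiters used for splitting.
fields="x:y:z"
IFS=:
set -- $fields
colon_count=$#
unset IFS   # back to the default space, tab, newline behaviour

echo "$unquoted_count $quoted_count $colon_count"
```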

10. Pathname expansion
Also known as filename generation or wildcard expansion. The shell scans each word for the characters *, ? and [ (ie. a pattern) and replaces the word with an alphabetically sorted list of the file names matching the pattern.
eg.1 # file* becomes 'file1 file1.txt file2 file2.txt file3 file3.txt'

eg.2 # file? becomes 'file1 file2 file3'
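
The two examples can be reproduced in a scratch directory. This is a sketch that assumes mktemp -d is available and that the shell uses its default globbing settings, under which a pattern with no match is left unchanged.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/file1" "$tmpdir/file2" "$tmpdir/file3" \
      "$tmpdir/file1.txt" "$tmpdir/file2.txt" "$tmpdir/file3.txt"

# * matches any string, ? matches exactly one character.
all=$(cd "$tmpdir" && echo file*)
single=$(cd "$tmpdir" && echo file?)

# With default settings, a pattern that matches nothing is kept literally.
nomatch=$(cd "$tmpdir" && echo nothing*)

rm -rf "$tmpdir"
printf '%s\n' "$all" "$single" "$nomatch"
```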

11. Functions, Built-ins and $PATH

At this stage the command has already been split into words: a simple command name followed by an optional list of arguments. If the command name, ie. the first word, contains no slashes (it is not given as a path), the shell tries to locate it in the following search order:

11.i) Function
If a function by that name exists, it'll be invoked
11.ii) Built-in commands
If a built-in command by that name exists, it'll be invoked
11.iii) Search $PATH
Finally, the shell searches each directory listed in $PATH for an executable file with the command's name. The first file found will be chosen.
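
A small sketch of the search order: a function named pwd shadows the built-in of the same name, and the standard command utility skips function lookup. Redefining pwd is done purely for demonstration.

```shell
# A function with the same name as a built-in is found first.
pwd() { echo "function pwd"; }
from_function=$(pwd)

# "command" bypasses functions, so the built-in (or $PATH) version runs.
from_command=$(command pwd)

# Remove the function so later calls reach the built-in again.
unset -f pwd
after_unset=$(pwd)

echo "$from_function"
```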

12. Redirection
Before a command is executed, its input and output may be redirected. The redirection operators include < , > , << and >>
eg. cat /tmp/file1 > /tmp/file2
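
A sketch that exercises all four operators. It writes to a scratch directory created with mktemp -d instead of /tmp/file1 and /tmp/file2, an assumption made so the snippet cleans up after itself.

```shell
tmpdir=$(mktemp -d)

# > creates or truncates the file, >> appends to it.
printf 'line1\n' > "$tmpdir/out"
printf 'line2\n' >> "$tmpdir/out"

# < connects the file to the command's standard input.
count=$(wc -l < "$tmpdir/out")

# << feeds the following lines (a here-document) to standard input.
upper=$(tr a-z A-Z <<EOF
here document
EOF
)

rm -rf "$tmpdir"
echo "$upper"
```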