43 |
43 |
44 \begin{document} |
44 \begin{document} |
45 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2019, 2020} |
45 \fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2019, 2020} |
46 |
46 |
47 \section*{Handout 1} |
47 \section*{Handout 1} |
|
48 |
48 |
49 |
49 The purpose of a compiler is to transform a program a human can read and |
50 The purpose of a compiler is to transform a program a human can read and |
50 write into code the machine can run as fast as possible. Developing a |
51 write into code the machine can run as fast as possible. Developing a |
51 compiler is an old craft going back to 1952 with the first compiler |
52 compiler is an old craft going back to 1952 with the first compiler |
52 written by Grace Hopper.\footnote{Who many years ago was invited on a |
53 written by Grace Hopper.\footnote{Who many years ago was invited on a |
477 Parentheses are not really part of a regular expression, and |
478 Parentheses are not really part of a regular expression, and |
478 indeed we do not need them in our code because there the tree |
479 indeed we do not need them in our code because there the tree |
479 structure of regular expressions is always clear. But for |
480 structure of regular expressions is always clear. But for |
480 writing them down in a more mathematical fashion, parentheses |
481 writing them down in a more mathematical fashion, parentheses |
481 will be helpful. For example we will write $(r_1 + r_2)^*$, |
482 will be helpful. For example we will write $(r_1 + r_2)^*$, |
482 which is different from, say $r_1 + (r_2)^*$. The former means |
483 which is different from, say $r_1 + (r_2)^*$. This can be |
483 roughly zero or more times $r_1$ or $r_2$, while the latter |
484 seen if we write regular expressions as trees: |
|
485 |
|
486 \begin{center} |
|
487 \includegraphics[scale=0.65]{../pics/r1.pdf} |
|
488 \hspace{3cm} |
|
489 \includegraphics[scale=0.65]{../pics/r2.pdf} |
|
490 \end{center} |
|
491 |
|
492 \noindent |
|
493 The regular expression on the left means |
|
494 roughly zero or more times $r_1$ or $r_2$, while the one on the right |
484 means $r_1$, or zero or more times $r_2$. This will turn out to |
495 means $r_1$, or zero or more times $r_2$. This will turn out to |
485 be two different patterns, which match in general different |
496 be two different patterns, which match in general different |
486 strings. We should also write $(r_1 + r_2) + r_3$, which is |
497 strings. We should also write $(r_1 + r_2) + r_3$, which is |
487 different from the regular expression $r_1 + (r_2 + r_3)$, but |
498 different from the regular expression $r_1 + (r_2 + r_3)$, but |
488 in case of $+$ and $\cdot$ we actually do not care about the |
499 in case of $+$ and $\cdot$ we actually do not care about the |