--- a/LINKS Sat Dec 29 10:30:27 2018 +0000
+++ b/LINKS Tue Feb 12 21:23:00 2019 +0000
@@ -1,7 +1,22 @@
Thinking Forth (Book to read)
=================
+Flix: an opinionated functional programming language compiling to the JVM
+https://flix.github.io/#/research/
+https://flix.github.io/#/principles/
+
+=================
+General tail call elimination on the JVM
+
+http://cs.au.dk/~magnusm/papers/cc18/paper.pdf
+
+=================
+incremental approach to compiler construction
+http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf
+
+=================
Compiler courses
http://www.cse.chalmers.se/edu/year/2017/course/TDA283/lectures/
http://www.cs.columbia.edu/~sedwards/classes/2017/4115-fall/index.html
--- a/data.sty Sat Dec 29 10:30:27 2018 +0000
+++ b/data.sty Tue Feb 12 21:23:00 2019 +0000
@@ -73,6 +73,21 @@
28 34.79653
\end{filecontents}
+% JavaScript, example (a*)*b
+\begin{filecontents}{re-js.data}
+5 0.061
+10 0.061
+15 0.061
+20 0.070
+23 0.131
+25 0.308
+26 0.564
+28 1.994
+30 7.648
+31 15.881
+32 32.190
+\end{filecontents}
+
% Java 8, example (a*)*b
\begin{filecontents}{re-java.data}
5 0.00298
@@ -117,6 +132,8 @@
39000 26.15787
\end{filecontents}
+
+
%% re1.scala: example a?{n} a{n}
\begin{filecontents}{re1.data}
1 0.00179
Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex Sat Dec 29 10:30:27 2018 +0000
+++ b/handouts/ho01.tex Tue Feb 12 21:23:00 2019 +0000
@@ -36,29 +36,34 @@
%https://www.youtube.com/watch?v=gmhMQfFQu20
\begin{document}
-\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018}
+\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2019}
\section*{Handout 1}
This module is about text processing, be it for web-crawlers,
-compilers, dictionaries, DNA-data, ad filters and so on. When looking for a
-particular string, like $abc$ in a large text we can use the
-Knuth-Morris-Pratt algorithm, which is currently the most efficient
-general string search algorithm. But often we do \emph{not} just look
-for a particular string, but for string patterns. For example, in
-program code we need to identify what are the keywords (\texttt{if}, \texttt{then},
-\texttt{while}, \texttt{for}, etc), what are the identifiers (variable names). A pattern for
-identifiers could be stated as: they start with a letter, followed by
-zero or more letters, numbers and underscores. Often we also face the
-problem that we are given a string (for example some user input) and
-want to know whether it matches a particular pattern---be it an email
-address, for example. In this way we can exclude user input that would
-otherwise have nasty effects on our program (crashing it or making it
-go into an infinite loop, if not worse). In tools like Snort, scanning
-for computer viruses or filtering out spam usually involves scanning
-for some signature (essentially a string pattern). The point is that
-the fast Knuth-Morris-Pratt algorithm for strings is not good enough
-for such string \emph{patterns}.\smallskip
+compilers, dictionaries, DNA-data, ad filters and so on. When looking
+for a particular string, say \pcode{"foobar"}, in a large text we can
+use the Knuth-Morris-Pratt algorithm, which is currently the most
+efficient general string search algorithm. But often we do \emph{not}
+just look for a particular string, but for string patterns. For
+example, in program code we need to identify which words are keywords
+(\texttt{if}, \texttt{then}, \texttt{while}, \texttt{for}, etc) and
+which are identifiers (variable names). A pattern for identifiers
+could be stated as: they start with a letter, followed by zero or
+more letters, numbers and underscores. You might also be surprised
+by the constraints programming languages impose on numbers: for
+example, \texttt{123} is a legal number in JSON, but \texttt{0123} is not.
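+% A concrete illustration of the two informal patterns above, written
+% as Scala one-liners (just one possible rendering of the patterns,
+% using Java-style regexes):
+%
+%   "foo_bar1".matches("[a-zA-Z][a-zA-Z0-9_]*")   // identifier shape: true
+%   "123".matches("0|[1-9][0-9]*")                // no leading zero: true
+%   "0123".matches("0|[1-9][0-9]*")               // no leading zero: false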
+
+Often we also face the problem that we are given a string (for example
+some user input) and want to know whether it matches a particular
+pattern---an email address, say. In this way we can
+exclude user input that would otherwise have nasty effects on our
+program (crashing it or making it go into an infinite loop, if not
+worse). In tools like Snort, scanning for computer viruses or
+filtering out spam usually involves scanning for some signature
+(essentially a string pattern). The point is that the fast
+Knuth-Morris-Pratt algorithm for strings is not good enough for such
+string \emph{patterns}.\smallskip
\defn{Regular expressions} help with conveniently specifying
such patterns. The idea behind regular expressions is that
@@ -194,7 +199,7 @@
before (remember the PRA module?). Why on earth then is there any
interest in studying them again in depth in this module? Well, one
answer is in the following two graphs about regular expression
-matching in Python, Ruby and Java (Version 8).
+matching in Python, Ruby, JavaScript and Java (Version 8).
\begin{center}
\begin{tabular}{@{\hspace{-1mm}}c@{\hspace{1mm}}c@{}}
@@ -214,11 +219,12 @@
axis lines=left,
width=5.5cm,
height=4.5cm,
- legend entries={Python, Java 8},
+ legend entries={Python, Java 8, JavaScript},
legend pos=north west,
legend cell align=left]
\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
\end{axis}
\end{tikzpicture}
&
@@ -248,7 +254,7 @@
\end{tabular}
\end{center}
-\noindent This first graph shows that Python and Java 8 need
+\noindent This first graph shows that Python, JavaScript and Java 8 need
approximately 30 seconds to find out that the regular expression
$\texttt{(a*)*\,b}$ does not match strings of 28 \texttt{a}s.
Similarly, the second shows that Python needs approximately 29 seconds
Binary file handouts/ho02.pdf has changed
--- a/handouts/ho02.tex Sat Dec 29 10:30:27 2018 +0000
+++ b/handouts/ho02.tex Tue Feb 12 21:23:00 2019 +0000
@@ -6,22 +6,23 @@
\begin{document}
-\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018}
+\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017, 2018, 2019}
\section*{Handout 2 (Regular Expression Matching)}
This lecture is about implementing a more efficient regular expression
matcher (the plots on the right below)---more efficient than the
-matchers from regular expression libraries in Ruby, Python and Java
-(the plots on the left). The first pair of plots shows the running time
-for the regular expression $(a^*)^*\cdot b$ and strings composed of
-$n$ \pcode{a}s (meaning this regular expression actually does not
-match the strings). The second pair of plots shows the running time for
-the regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings
-also composed of $n$ \pcode{a}s (this time the regular expressions
-match the strings). To see the substantial differences in the left
-and right plots below, note the different scales of the $x$-axes.
+matchers from regular expression libraries in Ruby, Python, JavaScript
+and Java (the plots on the left). For this, consider some experimental
+data: the first pair of plots shows the running time for the
+regular expression $(a^*)^*\cdot b$ and strings composed of $n$
+\pcode{a}s (meaning this regular expression actually does not match
+the strings). The second pair of plots shows the running time for the
+regular expressions $a^?{}^{\{n\}}\cdot a^{\{n\}}$ and strings also
+composed of $n$ \pcode{a}s (this time the regular expressions match
+the strings). To see the substantial differences in the left and
+right plots below, note the different scales of the $x$-axes.
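+% A rough sketch of how such timings can be collected, assuming the
+% measurements are driven from Scala via Java's regex engine (the
+% actual harness behind the data files may differ):
+%
+%   import java.util.regex.Pattern
+%   val evil = Pattern.compile("(a*)*b")
+%   for (n <- 5 to 28) {
+%     val s = "a" * n
+%     val start = System.nanoTime()
+%     evil.matcher(s).matches()   // never succeeds on strings of a's
+%     println(s"$n ${(System.nanoTime() - start) / 1e9}")
+%   }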
\begin{center}
@@ -41,11 +42,12 @@
axis lines=left,
width=5cm,
height=5cm,
- legend entries={Java 8, Python},
+ legend entries={Java 8, Python, JavaScript},
legend pos=north west,
legend cell align=left]
\addplot[blue,mark=*, mark options={fill=white}] table {re-python2.data};
\addplot[cyan,mark=*, mark options={fill=white}] table {re-java.data};
+\addplot[red,mark=*, mark options={fill=white}] table {re-js.data};
\end{axis}
\end{tikzpicture}
&
--- a/handouts/ho05.tex Sat Dec 29 10:30:27 2018 +0000
+++ b/handouts/ho05.tex Tue Feb 12 21:23:00 2019 +0000
@@ -6,6 +6,9 @@
% epsilon and left-recursion elimination
% http://www.mollypages.org/page/grammar/index.mp
+%% parsing Scala files
+%%https://scalameta.org/
+
\begin{document}
\section*{Handout 5 (Grammars \& Parser)}
Binary file hws/hw01.pdf has changed
--- a/progs/pow.scala Sat Dec 29 10:30:27 2018 +0000
+++ b/progs/pow.scala Tue Feb 12 21:23:00 2019 +0000
@@ -45,6 +45,8 @@
val A = Set("a", "b", "c")
pow(A, 3)
+pow(A, 3).size
val B = Set("a", "b", "")
pow(B, 4)
+pow(B, 4).size
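+
+// sanity check: if pow is n-fold language concatenation, then
+// pow(A, 3) should have 3^3 = 27 elements and pow(B, 4) should have
+// 31, because the empty string in B also contributes all shorter strings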
--- a/progs/re3.scala Sat Dec 29 10:30:27 2018 +0000
+++ b/progs/re3.scala Tue Feb 12 21:23:00 2019 +0000
@@ -58,15 +58,26 @@
// derivative w.r.t. a string (iterates der)
-def ders (s: List[Char], r: Rexp) : Rexp = s match {
+def ders(s: List[Char], r: Rexp) : Rexp = s match {
case Nil => r
case c::s => ders(s, simp(der(c, r)))
}
+// derivative w.r.t. a string (iterates der, but without simplification)
+def dersp(s: List[Char], r: Rexp) : Rexp = s match {
+ case Nil => r
+ case c::s => dersp(s, der(c, r))
+}
+
// main matcher function
def matcher(r: Rexp, s: String) : Boolean = nullable(ders(s.toList, r))
+// tests: unsimplified derivatives of SEQ(SEQ(x, y), z) along prefixes of "xyz"
+val q = SEQ(SEQ(CHAR('x'), CHAR('y')), CHAR('z'))
+dersp("x".toList, q)
+dersp("xy".toList, q)
+dersp("xyz".toList, q)
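+
+// extra sanity checks: the simplifying ders and the matcher itself
+ders("xy".toList, q)   // should yield CHAR('z'), assuming simp removes ONEs in SEQs
+matcher(q, "xyz")      // should be true
+matcher(q, "xy")       // should be false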
//one or zero
def OPT(r: Rexp) = ALT(r, ONE)
--- a/style.sty Sat Dec 29 10:30:27 2018 +0000
+++ b/style.sty Tue Feb 12 21:23:00 2019 +0000
@@ -8,6 +8,7 @@
\usepackage{menukeys}
\definecolor{darkblue}{rgb}{0,0,0.6}
\usepackage[colorlinks=true,urlcolor=darkblue,linkcolor=darkblue]{hyperref}
+\usepackage{soul}
%%% for regular expressions and values
\newcommand{\ZERO}{\mbox{\bf 0}}
@@ -76,4 +77,6 @@
regular feedback to me: for example
what were the most interesting, least interesting, or confusing
parts in this lecture? Any problems with my Scala code? Please
-feel free to share any other questions or concerns.}
+feel free to share any other questions or concerns. Also, all my
+material is \st{crap} imperfect. If you have any suggestions for
+improvement, I would be very grateful to hear them.}