HaLeX: A Haskell Library to Model, Manipulate and Animate Regular Languages
João Saraiva
jas@di.uminho.pt
Departamento de Informática, Universidade do Minho, Portugal
The Haskell library HaLeX enables us to model, manipulate and
animate regular languages. This library was developed in the context
of a programming methodology course for undergraduate students, and as
a consequence, it was defined mainly for educational purposes. Indeed,
it provides a clear, efficient and concise way to define, to
understand and to manipulate regular languages in
Haskell. Furthermore, the construction of the complete library has
been proposed as assignment projects to the students following the
course. HaLeX is now being used to support this course.
The HaLeX library introduces a number of Haskell datatypes
and the respective functions that manipulate them, providing the
following facilities:
The definition of deterministic finite automata, non-deterministic
finite automata, and regular expressions directly and
straightforwardly in Haskell.
The definition of acceptance functions for all those
models.
The transformation from regular expressions into
non-deterministic finite automata (NDFA) and from NDFA into
deterministic finite automata (DFA).
The minimization of the number of states of deterministic
finite automata.
The equivalence of regular expressions and finite automata.
The graphical representation of finite automata.
The definition of Lex-like facilities: for example,
from a regular expression it generates an efficient recognizer for the
language: a Haskell program based on the equivalent minimized DFA.
The definition of reactive finite automata. To allow an automaton to
react while accepting an input sentence, the
library defines a monadic finite automaton. The reactions are defined
straightforwardly as monadic functions that are easily embedded in
the automaton definition.
The simplification of regular expressions
(to be included soon: this year's student project) The automatic animation of the acceptance function of finite
automata.
The HaLeX library is now being used in
the HaGLr parser generator. HaGLr produces top-down and
bottom-up parsers that use several parsing techniques, such as LL(1),
SLR(1) and the powerful Generalized LR parsing technology.
HaLeX is used to construct, manipulate and visualize the
underlying LR(0) automata for bottom-up parsers.
Regular expressions are modelled in Haskell by the abstract data type
RegExp , parameterized with the type
of the symbols. We define one data constructor for each "standard"
regular expression operator.
data RegExp sy = Empty
               | Epsilon
               | Literal sy
               | Or (RegExp sy) (RegExp sy)
               | Then (RegExp sy) (RegExp sy)
               | Star (RegExp sy)
               | OneOrMore (RegExp sy)
               | Optional (RegExp sy)
and we can write:
a    = Literal 'a'
aOrb = (Literal 'a') `Or` (Literal 'b')
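To make the meaning of each constructor concrete, here is a minimal sketch of a direct matcher for this data type. The names matches and splits are chosen for this illustration and are not taken from the HaLeX API. The matcher simply tries every way of splitting the input, so it is exponential in the worst case and meant for reading, not for use.

```haskell
data RegExp sy = Empty
               | Epsilon
               | Literal sy
               | Or (RegExp sy) (RegExp sy)
               | Then (RegExp sy) (RegExp sy)
               | Star (RegExp sy)
               | OneOrMore (RegExp sy)
               | Optional (RegExp sy)

-- All ways of cutting a list into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits xs = [ splitAt n xs | n <- [0 .. length xs] ]

-- Illustrative matcher (hypothetical name, not the library's function).
matches :: Eq sy => RegExp sy -> [sy] -> Bool
matches Empty         _   = False                 -- the empty language
matches Epsilon       inp = null inp              -- only the empty sentence
matches (Literal l)   inp = inp == [l]
matches (Or p q)      inp = matches p inp || matches q inp
matches (Then p q)    inp =
  or [ matches p a && matches q b | (a, b) <- splits inp ]
matches (Star p)      inp =
  -- drop the split with an empty prefix so the recursion always consumes input
  null inp || or [ matches p a && matches (Star p) b
                 | (a, b) <- drop 1 (splits inp) ]
matches (OneOrMore p) inp = matches (p `Then` Star p) inp
matches (Optional p)  inp = matches (Epsilon `Or` p) inp

aOrb :: RegExp Char
aOrb = Literal 'a' `Or` Literal 'b'
```

With these definitions, matches (Star aOrb) "abba" holds, while matches aOrb "ab" does not.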
Formally, a Deterministic Finite Automaton (DFA) is a five-tuple
A=(V,Q,S,Z,D). In order to make our code more readable, we prefer to
define a new data type, to express the automata's five-tuple, named
Dfa with a single constructor,
named Dfa as well, rather than
using the Haskell built-in tuples. To make our DFAs more general, we
parameterize the data type with the types of states and symbols:
data Dfa st sy = Dfa [sy]              -- Vocabulary
                     [st]              -- Finite set of states
                     st                -- The start state
                     [st]              -- The set of final states
                     (st -> sy -> st)  -- The transition function
where sets are modelled through the Haskell built-in lists.
Now we can easily define a DFA in Haskell as follows:
ex1 :: Dfa Char Char
ex1 = Dfa ['a','b'] ['A','B','C','D'] 'A' ['C'] delta
  where delta 'A' 'a' = 'A'
        delta 'A' 'b' = 'B'
        delta 'B' 'a' = 'C'
        delta _   _   = 'D'
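An acceptance function for this representation can be sketched as a left fold of the transition function over the input. The name acceptDfa is an assumption made for this illustration; the library's own function may be named and written differently. The sketch does not check that the input symbols belong to the vocabulary.

```haskell
data Dfa st sy = Dfa [sy]              -- Vocabulary
                     [st]              -- Finite set of states
                     st                -- The start state
                     [st]              -- The set of final states
                     (st -> sy -> st)  -- The transition function

-- Illustrative acceptance function (hypothetical name): walk from the start
-- state through the whole input and test whether the state reached is final.
acceptDfa :: Eq st => Dfa st sy -> [sy] -> Bool
acceptDfa (Dfa _ _ start finals delta) input =
  foldl delta start input `elem` finals

ex1 :: Dfa Char Char
ex1 = Dfa ['a','b'] ['A','B','C','D'] 'A' ['C'] delta
  where delta 'A' 'a' = 'A'
        delta 'A' 'b' = 'B'
        delta 'B' 'a' = 'C'
        delta _   _   = 'D'
```

For ex1, the sentence "aaba" is accepted (it ends in state 'C'), while "bab" falls into the sink state 'D' and is rejected.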
We define non-deterministic finite automata with epsilon
transitions and several initial states.
To model epsilon transitions we use the pre-defined datatype Maybe, where
the constructor Nothing models epsilon-transitions, while the
constructor Just models transitions labeled by its argument symbol.
data Ndfa st sy = Ndfa [sy]                     -- Vocabulary
                       [st]                     -- Finite set of states
                       [st]                     -- The set of start states
                       [st]                     -- The set of final states
                       (st -> Maybe sy -> [st]) -- The transition function
Next, we show an example of an NDFA.
ex1 :: Ndfa Char Char
ex1 = Ndfa ['a','b'] ['A','B','C'] ['A'] ['C'] delta
  where delta 'A' (Just 'a') = ['A','B']
        delta 'A' (Just 'b') = ['B']
        delta 'B' (Just 'a') = ['C']
        delta 'C' Nothing    = ['A','B']
        delta _   _          = []
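Acceptance for such an automaton can be sketched by tracking the set of states the machine may be in, closing that set under epsilon transitions after every step. The names epsClosure and acceptNdfa are assumptions made for this illustration, not the library's API, and lists stand in for sets as in the data type above.

```haskell
data Ndfa st sy = Ndfa [sy] [st] [st] [st] (st -> Maybe sy -> [st])

-- All states reachable from the given ones using only epsilon transitions.
epsClosure :: Eq st => (st -> Maybe sy -> [st]) -> [st] -> [st]
epsClosure delta = go []
  where
    go seen []       = seen
    go seen (s:rest)
      | s `elem` seen = go seen rest
      | otherwise     = go (s : seen) (delta s Nothing ++ rest)

-- Illustrative acceptance function (hypothetical name): the sentence is
-- accepted if at least one of the states reached at the end is final.
acceptNdfa :: Eq st => Ndfa st sy -> [sy] -> Bool
acceptNdfa (Ndfa _ _ starts finals delta) input =
    any (`elem` finals) (foldl step (epsClosure delta starts) input)
  where
    step states sym =
      epsClosure delta (concat [ delta s (Just sym) | s <- states ])

ex1 :: Ndfa Char Char
ex1 = Ndfa ['a','b'] ['A','B','C'] ['A'] ['C'] delta
  where delta 'A' (Just 'a') = ['A','B']
        delta 'A' (Just 'b') = ['B']
        delta 'B' (Just 'a') = ['C']
        delta 'C' Nothing    = ['A','B']
        delta _   _          = []
```

For ex1, the sentence "aa" is accepted because one of the possible runs ends in 'C', whereas "a" only reaches 'A' and 'B' and is rejected.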
Finite automata are more comprehensible in a graphical
representation. The following representation is used by
HaLeX: states are depicted as the nodes of a graph, final
states get a red double circle, and
the initial states are explicitly mentioned by using a green circle. The transition function induces
arrows connecting the nodes of the graph: for each transition
from o to d through symbol s there
is an arrow labeled by s from node o to
d.
Drawing graphs is a complex task and an intensive area of
research. Thus, instead of defining a graph visualization system from
scratch, and reinventing the wheel, HaLeX synthesizes a graph
representation for an existing high quality graph visualization
system:
the GraphViz system, developed and available at
AT&T Labs.
GraphViz provides a collection of tools for manipulating graph
structures and generating graph layouts. The input is a description of
the graph in the Dot language. Next, we present an example of
the input Dot text (left) and the resulting graph as displayed
by one of GraphViz tools, the dot tool (right).
digraph HaLeX {
  rankdir = LR ;
  "1" [shape=circle , color=green];
  "2" [shape=doublecircle , color=red];
  "3" [shape=doublecircle , color=red];
  node [shape=circle , color=black];
  "1" -> "2" [label = "'a'"];
  "1" -> "3" [label = "'b'"];
  "2" -> "4" [label = "'a','b'"];
  "3" -> "3" [label = "'a'"];
  "3" -> "4" [label = "'b'"];
  "4" -> "4" [label = "'a','b'"];
}
The HaLeX library includes a pretty-printing function, named
toGraphViz that, given a FA, produces its (textual) graph
representation in the Dot language. In order to "beautify"
the graphical representation, we have included options to ignore the
sink states and to fuse transitions with the same origin and
destination.
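The idea behind such a pretty-printer can be sketched as follows. This is not the library's toGraphViz: the names dfaToDot and quote are hypothetical, the sketch handles DFAs only, and it implements neither of the two beautifying options (sink states are kept and every transition gets its own arrow). It also assumes that the Haskell string escaping produced by show is acceptable to Dot, which holds for simple states and symbols.

```haskell
data Dfa st sy = Dfa [sy] [st] st [st] (st -> sy -> st)

-- Render a value as a double-quoted Dot identifier, e.g. 'A' becomes "'A'".
quote :: Show a => a -> String
quote x = show (show x)

-- Illustrative Dot generator (hypothetical name, simplified behaviour).
dfaToDot :: (Eq st, Show st, Show sy) => Dfa st sy -> String
dfaToDot (Dfa voc states start finals delta) =
    unlines (header ++ nodes ++ edges ++ ["}"])
  where
    header = ["digraph HaLeX {", "  rankdir = LR;"]
    -- one node declaration per state
    nodes  = [ "  " ++ quote s ++ " [shape=" ++ shape s
                    ++ ", color=" ++ colour s ++ "];"
             | s <- states ]
    -- one arrow per (state, symbol) pair
    edges  = [ "  " ++ quote o ++ " -> " ++ quote (delta o sy)
                    ++ " [label = " ++ quote sy ++ "];"
             | o <- states, sy <- voc ]
    shape s
      | s `elem` finals = "doublecircle"
      | otherwise       = "circle"
    colour s
      | s == start      = "green"
      | s `elem` finals = "red"
      | otherwise       = "black"

ex1 :: Dfa Char Char
ex1 = Dfa ['a','b'] ['A','B','C','D'] 'A' ['C'] delta
  where delta 'A' 'a' = 'A'
        delta 'A' 'b' = 'B'
        delta 'B' 'a' = 'C'
        delta _   _   = 'D'
```

Writing the result of dfaToDot ex1 to a file and passing it to the dot tool should yield a drawing in the style described above.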
While accepting a sentence, we may wish the automaton to react. For
example, we may wish to compute some trace information (for debugging
purposes), or to perform some IO operations, or to compute semantic
functions (in our example of real numbers, to convert the sequence of
characters into its real value). A reactive finite automaton (RFA) is
a machine that reacts while moving from one state to another. To add
reactions to finite automata we use monads and we define monadic
finite automata. Because we wish to associate different effects to
finite automata, we parameterize the type of the DFA with the type of
the monad. The reactions are defined in the (monadic) transition
function. As a result, those functions not only have to indicate the
destination state(s), but also the reactions to be performed in the
transitions. Reactive deterministic finite automata are defined as
follows:
data Dfa m st sy = Dfa [sy]                -- Vocabulary
                       [st]                -- Finite set of states
                       st                  -- The start state
                       [st]                -- The set of final states
                       (st -> sy -> m st)  -- The monadic transition function
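A monadic acceptance function for this type can be sketched with foldM, which threads the current state through the transition function while sequencing the reactions. The name acceptM and the small traced automaton are assumptions made for this illustration. To keep the sketch free of extra packages, the reactions here use the pair monad from base (a list of strings accumulated alongside the result), which plays the role of the trace information mentioned above.

```haskell
import Control.Monad (foldM)

data Dfa m st sy = Dfa [sy] [st] st [st] (st -> sy -> m st)

-- Illustrative monadic acceptance (hypothetical name): run all transitions,
-- performing their reactions in order, then test the state reached.
acceptM :: (Monad m, Eq st) => Dfa m st sy -> [sy] -> m Bool
acceptM (Dfa _ _ start finals delta) input = do
  final <- foldM delta start input
  return (final `elem` finals)

-- A DFA for the language a b* whose reaction logs every transition taken.
-- The monad is ((,) [String]): each step returns (log entries, next state).
traced :: Dfa ((,) [String]) Int Char
traced = Dfa "ab" [1,2,3] 1 [2] delta
  where
    delta 1 'a' = (["1 -a-> 2"], 2)
    delta 2 'b' = (["2 -b-> 2"], 2)
    delta s c   = ([show s ++ " -" ++ [c] ++ "-> 3 (sink)"], 3)
```

Evaluating acceptM traced "ab" gives the pair (["1 -a-> 2","2 -b-> 2"],True): the log of the run together with the acceptance result.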
We want to construct a program to implement the communication protocol
between two electronic components. The communication is established by
sending an initial sequence of bits 000. Then, several 3-bit
values are sent in order to configure different parameters of the
components. Those values are separated by the special sequence of bits
001. The communication ends after the sequence of bits
111 is sent. Consider also that to configure the components
the program has to compute the integer values transmitted.
ptl :: Dfa (State ([Char],[Int])) Int Char
ptl = Dfa ['1','0'] [1,2,3,4,5,6,7,8,9,10,11,12,13] 1 [12] delta
  where delta 1  '0' = return 2
        delta 2  '0' = return 3
        delta 3  '0' = return 4
        delta 4  s   = do { bits s ; return 5 }
        delta 5  s   = do { bits s ; return 6 }
        delta 6  s   = do { bits s ; return 7 }
        delta 7  '0' = do { values ; return 8 }
        delta 7  '1' = do { values ; return 10 }
        delta 8  '0' = return 9
        delta 9  '1' = return 4
        delta 10 '1' = return 11
        delta 11 '1' = return 12
        delta _  _   = return 13

        bits x = modify (\ s -> ((fst s)++[x],snd s))
        values = modify (\ s -> ("",snd s ++ [bits2int (fst s)]))
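The helper bits2int is used above but not shown. One possible definition, assuming the most significant bit is transmitted first (the text does not state the bit order), is the following sketch.

```haskell
-- Convert a sequence of '0'/'1' characters to an integer.
-- Assumption: most significant bit first; any character other than '1'
-- is treated as a zero bit.
bits2int :: [Char] -> Int
bits2int = foldl (\acc b -> 2 * acc + (if b == '1' then 1 else 0)) 0
```

Under this assumption, the 3-bit value 101 is converted to 5.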
We started developing the HaLeX library in 2001 in the
context of a third year course on programming methodology. This course
has a workload of 24 hours of theoretical classes and another 24
hours of laboratory classes, running for 12 weeks (i.e., a
semester). The theoretical classes introduce the basic concepts of
regular expressions, finite automata and context-free
languages. HaLeX and HaGLr are used to support
such classes. In the laboratory (a two hour class per week) the
students have to solve
exercises using a computer. We have defined eleven exercise sets (one
per week), using literate Haskell, that the students have to
complete. Each set of exercises defines a module of the HaLeX
library. Thus, at the end of the course the students have a complete
documentation of all the exercises and topics covered in the course,
and, also, of the HaLeX library.
Furthermore, during the semester the students have to complete two
assignment projects. The construction of parts of the HaLeX
and HaGLr libraries has been proposed as such projects.
The HaLeX system is public domain and it is available as a
gzipped tar file at:
HaLeX_1.1.tgz
The HaLeX API is available here.
This documentation is also being distributed with the tool.
João Saraiva, HaLeX: A Haskell Library to Model, Manipulate
and Animate Regular Languages, proceedings of the ACM Workshop on
Functional and Declarative Programming in Education (FDPE/PLI'02),
Pittsburgh, USA, October 2002.