A Haskell Library to Model, Manipulate and Animate Regular Languages

Joćo Saraiva

Department of Computer Science

University of Minho

Joćo Saraiva


1: The HaLeX Library

This library defines a number of Haskell data types and functions that manipulate them, which provide the following facilities:
  • Definition of regular expressions , non-deterministic and deterministic finite automata directly and straightforwardly in Haskell.
  • Definition of the acceptance functions for all those models.
  • Transformation of regular expressions into non-deterministic and deterministic finite automata.
  • Minimization of the states of deterministic finite automata.
  • Lex-like facilities , e.g. it generates a Haskell program from a regular expression (expressed as it's minimized finite automata).
  • Equivalence of regular expressions and finite automata.
  • Definition of reactive finite automata .
  • Graphical representation of finite automata.
  • Animation of the acceptance function of finite automata.

2: The HaLeX Library

The HaLeX library is a small extension to
  • Simon Thompson, "Regular Expressions and Automata using Haskell", January 2000.
  • Johan Jeuring and Doaitse Swierstra, "Advanced Programming Concepts in a Course on Grammars and Parsing", FDPE'99.
Basically, it extends definitions in these papers with
  • the graphic visualization of automata
  • the definition of reactive finite automata.

3: Motivation

This library was developed in the context of a programming methodology course for undergraduate students (3rd year of our degree on software engineering).
  • It was developed mainly with educational purposes :
    • It was developed in order to provide a clear , efficient and concise way to define, to understand and to manipulate regular languages in Haskell.
    • The construction of parts of this library has been proposed as assigment projects the students had to solve in that course.
    • HaLeX is now being used to support the programming methodology course.

4: Regular Expressions

Regular expressions are modelled in Haskell by the abstract data type RegExp . We define one data constructor for each "standard" regular expression operator.

data RegExp = Epsilon
| Literal Char
| Or RegExp RegExp
| Then RegExp RegExp
| Star RegExp
| OneOrMore RegExp
| Optional RegExp

and we can write:
a = Literal 'a'
aOrb = ( Literal 'a') ` Or ` ( Literal 'b')

5: Deterministic Finite Automata

Formally, a Deterministic Finite Automaton (DFA) is a five-tuple A = (V,Q,S,Z,D) . In order to make our code more readable, we prefer to define a new data type, to express the automata's five-tuple, named Dfa with a single constructor, named Dfa as well, rather than using the Haskell built-in tuples. To make our DFA's more general, we parameterize the data type with the types of states and symbols

data Dfa st sy = Dfa [sy] -- Vocabulary
[st] -- Finite set of states
st -- The start state
[st] -- The set of final states
(st - > sy - > st) -- The transition function

where sets are modelled through the Haskell built-in lists.

6: Haskell-based deterministic finite automata: example

ex1 :: Dfa Char Char
ex1 = Dfa ['a','b']
where delta 'A' 'a' = 'A'
delta 'A' 'b' = 'B'
delta 'B' 'a' = 'C'
delta _ _ = 'D'

7: Non-Deterministic Finite Automata with Epsilon Transitions

We define non-deterministic finite automata with epsilon transitions and several initial states.

To epsilon transitions we use the pre-defined datatype Maybe, where the constructor Nothing models epsilon-transitions, while the constructor Just models transitions labeled by its argument symbol.

data Ndfa st sy = Ndfa [sy] -- Vocabulary
[st] -- Finite set of states
[st] -- The set of start states
[st] -- The set of final states
(st - > Maybe sy - > [st]) -- The transition function

8: Non-Deterministic Finite Automata with Epsilon Transitions: Example

ex1 :: Ndfa Char Char
ex1 = Ndfa ['a','b']
where delta 'A' ( Just 'a') = ['A','B']
delta 'A' ( Just 'b') = ['B']
delta 'B' ( Just 'a') = ['C']
delta 'C' Nothing = ['A','B']
delta _ _ = []

9: The Acceptance Functions

We have defined acceptance functions for all these models.

For example, the acceptance function of a DFA expresses the recursion pattern captured by the Haskell prelude foldl function. Thus, the function accept is concisely defined as follows:

accept ( Dfa v q s z delta) simb = ( foldl delta s simb) `elem` z

10: Graphic Representation of Finite Automata

Finite automata are more comprehensible in a graphical representation. Usually automata are represented as graphs:
  • states are depicted as the nodes of a graph, final states get a double circle, and the initial states are explicitly mentioned usually with an incoming arrow.

But, drawing graphs is a complex task and an intensive area of research.

Rather than implementing a graph visualization system from scracth (and reinventing the wheel...), we compute a concrete representation for a high quality graph visualization system: the GraphViz system .
  • GraphViz provides a collection of tools for manipulating graph structures and generating graph layouts.

11: Graphic Representation of Finite Automata

digraph HaLeX {
rankdir = LR ;
"1" [shape = circle , color = green];
"2" [shape = doublecircle , color = red];
"3" [shape = doublecircle , color = red];
node [shape = circle , color = black];
"1" - > "2" [label = "'a'"];
"1" - > "3" [label = "'b'"];
"2" - > "4" [label = "'a','b'"];
"3" - > "3" [label = "'a'"];
"3" - > "4" [label = "'b'"];
"4" - > "4" [label = "'a','b'"];

12: Graphic Representation of Finite Automata

The HaLeX library includes a pretty-printing function, named toGraph that, given a FA, produces its (textual) graph representation in the Dot language.

In order to beautify the graphical representation, we have included options
  • to ignore sink states
  • to fuse transitions with the same origin and destination.

We omit here its trivial implementation.

13: Reactive Finite Automata

While accepting a sentence, we may wish the automaton to react. For example, we may wish
  • to compute some trace information
  • to perform some IO operations
  • to compute semantic functions
A reactive finite automaton (RFA) is a machine that reacts while moving from one state to another. To add reactions to finite automata we use monads and we define monadic finite automata.

14: Reactive Deterministic Finite Automata

Because we wish to associate different effects to finite automata, we parameterize the type of the DFA with the type of the monad. The reactions are defined in the (monadic) transition function. As a result, those functions not only have to indicate the destination state(s), but also the reactions to be performed in the transitions.

data Dfa m st sy = Dfa [sy] -- Vocabulary
[st] -- Finite set of states
st -- The start state
[st] -- The set of final states
(st - > sy - > m st) -- The Monadic transition function

15: Acceptance Function

The acceptance function is now defined as the standard \Haskell\ monadic foldM function:

dfaaccept ( Dfa _ _ s z delta) sent = do st < - foldM delta s sent
return (st `elem` z)

16: Communication protocol

ptl :: Dfa (State ([Char],[Int])) Int Char
ptl = Dfa ['1','0'] [1,2,3,4,5,6,7,8,9,10,11,12,13] 1 [12] delta
where delta 1 '0' = return 2
delta 2 '0' = return 3
delta 3 '0' = return 4
delta 4 s = do { bits s ; return 5 }
delta 5 s = do { bits s ; return 6 }
delta 6 s = do { bits s ; return 7 }
delta 7 '0' = do { values ; return 8 }
delta 7 '1' = do { values ; return 10 }
delta 8 '0' = return 9
delta 9 '1' = return 4
delta 10 '1' = return 11
delta 11 '1' = return 12
delta _ _ = return 13
bits x = modify (\ s - > ((fst s) + + [x],snd s))
values = modify (\ s - > ("",snd s + + [bits2int (fst s)]))

17: Animating Regular Languages

The acceptance function finds a path from the initial state into a final state of the automata.

HaLeX animations are just the step-by-step display of such a path in the graphic representation of the automaton

Thus, to animate finite automata we need to compute the sequence of visited nodes.

This trace information can be easily computed using our reactive FA: the monadic automata just have to accumulate in the state the nodes being visited.

18: Animating Regular Languages

For example, HaLeX derives such monadic DFA from regular expressions

mdfa :: Dfa (State [Char]) Int Char
mdfa = Dfa v q s z delta
v = " + -d."
q = [1,2,3,4,5,6]
s = 1
z = [3,6]
delta 1 ' + ' = do { accum '2' ; return 2 }
delta 1 '-' = do { accum '2' ; return 2 }
delta 1 'd' = do { accum '3' ; return 3 }
delta 1 '.' = do { accum '4' ; return 4 }

19: The halex Tool

HaLeX is available at:


Having defined the HaLeX library, we can now define useful tools to manipulate and visualize regular languages.

We have constructed the halex program entirely in HaLeX.

Tool Demo

20: More Animations?

Today, 16:30, GPCE
  • Joćo Saraiva, "Component-based Programming for Higher-Order Attribute Grammars"