HaLeX
A Haskell Library to Model, Manipulate and Animate Regular Languages
|
Joćo Saraiva
Department of Computer Science
University of Minho
|
Joćo Saraiva
jas@di.uminho.pt |
|
1: The HaLeX Library
This library defines a number of Haskell data types and functions that manipulate them, which provide the following facilities:
- Definition of
regular expressions
,
non-deterministic
and
deterministic finite automata
directly and straightforwardly in Haskell.
- Definition of the
acceptance functions
for all those models.
- Transformation
of regular expressions into non-deterministic and deterministic finite automata.
- Minimization
of the states of deterministic finite automata.
- Lex-like facilities
,
e.g.
it generates a Haskell program from a regular expression (expressed as it's minimized finite automata).
- Equivalence of regular expressions
and finite automata.
- Definition of
reactive finite automata
.
- Graphical representation
of finite automata.
- Animation
of the acceptance function of finite automata.
|
|
2: The HaLeX Library
The HaLeX library is a small extension to
- Simon Thompson, "Regular Expressions and Automata using Haskell", January 2000.
- Johan Jeuring and Doaitse Swierstra, "Advanced Programming Concepts in a Course on Grammars and Parsing", FDPE'99.
Basically, it extends definitions in these papers with
- the graphic visualization of automata
- the definition of reactive finite automata.
|
|
3: Motivation
This library was developed in the context of a programming methodology course for undergraduate students (3rd year of our degree on software engineering).
- It was developed mainly with
educational purposes
:
- It was developed in order to provide a
clear
,
efficient
and
concise
way to define, to understand and to manipulate regular languages in Haskell.
- The construction of parts of this library has been proposed as assigment projects the students had to solve in that course.
- HaLeX is now being used to support the programming methodology course.
|
|
4: Regular Expressions
Regular expressions are modelled in Haskell by the abstract data type
RegExp
. We define one data constructor for each "standard" regular expression operator.
data
RegExp |
= |
Epsilon |
|
|
| |
Literal |
Char |
|
| |
Or |
RegExp RegExp |
|
| |
Then |
RegExp RegExp |
|
| |
Star |
RegExp |
|
| |
OneOrMore |
RegExp |
|
| |
Optional |
RegExp |
and we can write:
a |
= |
Literal
'a' |
aOrb |
= |
(
Literal
'a') `
Or
` (
Literal
'b') | |
|
5: Deterministic Finite Automata
Formally, a Deterministic Finite Automaton (DFA) is a five-tuple
A = (V,Q,S,Z,D)
. In order to make our code more readable, we prefer to define a new data type, to express the automata's five-tuple, named
Dfa
with a single constructor, named
Dfa
as well, rather than using the Haskell built-in tuples. To make our DFA's more general, we parameterize the data type with the types of states and symbols
data
Dfa
st sy |
=
Dfa |
[sy] |
-- Vocabulary |
|
|
[st] |
-- Finite set of states |
|
|
st |
-- The start state |
|
|
[st] |
-- The set of final states |
|
|
(st - > sy - > st) |
-- The transition function |
where sets are modelled through the Haskell built-in lists. |
|
6: Haskell-based deterministic finite automata: example
|
ex1 |
::
Dfa |
Char Char |
ex1 |
=
Dfa |
['a','b'] |
|
|
['A','B','C','D'] |
|
|
'A' |
|
|
['C'] |
|
|
delta |
|
where |
delta 'A' 'a' = 'A' |
|
|
delta 'A' 'b' = 'B' |
|
|
delta 'B' 'a' = 'C' |
|
|
delta _ _ = 'D' | | |
|
7: Non-Deterministic Finite Automata with Epsilon Transitions
We define non-deterministic finite automata with
epsilon
transitions and several initial states.
To epsilon transitions we use the pre-defined datatype Maybe, where the constructor Nothing models epsilon-transitions, while the constructor Just models transitions labeled by its argument symbol.
data Ndfa st sy |
= Ndfa |
[sy] |
-- Vocabulary |
|
|
[st] |
-- Finite set of states |
|
|
[st] |
-- The set of start states |
|
|
[st] |
-- The set of final states |
|
|
(st - >
Maybe
sy - > [st]) |
-- The transition function | |
|
8: Non-Deterministic Finite Automata with Epsilon Transitions: Example
ex1 |
::
Ndfa |
Char Char |
ex1 |
=
Ndfa |
['a','b'] |
|
|
['A','B','C'] |
|
|
['A'] |
|
|
['C'] |
|
|
delta |
|
where |
delta 'A' (
Just
'a') = ['A','B'] |
|
|
delta 'A' (
Just
'b') = ['B'] |
|
|
delta 'B' (
Just
'a') = ['C'] |
|
|
delta 'C'
Nothing
= ['A','B'] |
|
|
delta _ _ = [] | |
| |
|
9: The Acceptance Functions
We have defined acceptance functions for all these models.
For example, the acceptance function of a DFA expresses the recursion pattern captured by the Haskell prelude foldl function. Thus, the function accept is concisely defined as follows:
accept
(
Dfa
v q s z delta) simb |
= |
(
foldl
delta s simb) `elem` z | |
|
10: Graphic Representation of Finite Automata
Finite automata are more comprehensible in a graphical representation. Usually automata are represented as graphs:
- states are depicted as the nodes of a graph, final states get a double circle, and the initial states are explicitly mentioned usually with an incoming arrow.
But, drawing graphs is a complex task and an intensive area of research.
Rather than implementing a graph visualization system from scracth (and reinventing the wheel...), we compute a concrete representation for a high quality graph visualization system: the
GraphViz system
.
- GraphViz provides a collection of tools for manipulating graph structures and generating graph layouts.
|
|
11: Graphic Representation of Finite Automata
|
digraph HaLeX { |
rankdir = LR ; |
"1" [shape = circle , color = green]; |
"2" [shape = doublecircle , color = red]; |
"3" [shape = doublecircle , color = red]; |
node [shape = circle , color = black]; |
"1" - > "2" [label = "'a'"]; |
"1" - > "3" [label = "'b'"]; |
"2" - > "4" [label = "'a','b'"]; |
"3" - > "3" [label = "'a'"]; |
"3" - > "4" [label = "'b'"]; |
"4" - > "4" [label = "'a','b'"]; |
} | | |
|
12: Graphic Representation of Finite Automata
The HaLeX library includes a pretty-printing function, named
toGraph
that, given a FA, produces its (textual) graph representation in the Dot language.
In order to beautify the graphical representation, we have included options
- to ignore sink states
- to fuse transitions with the same origin and destination.
We omit here its trivial implementation. |
|
13: Reactive Finite Automata
While accepting a sentence, we may wish the automaton to react. For example, we may wish
- to compute some trace information
- to perform some IO operations
- to compute semantic functions
A reactive finite automaton (RFA) is a machine that reacts while moving from one state to another. To add reactions to finite automata we use monads and we define monadic finite automata. |
|
14: Reactive Deterministic Finite Automata
Because we wish to associate different effects to finite automata, we parameterize the type of the
DFA
with the type of the monad. The reactions are defined in the (monadic) transition function. As a result, those functions not only have to indicate the destination state(s), but also the reactions to be performed in the transitions.
data
Dfa
m
st sy |
=
Dfa |
[sy] |
-- Vocabulary |
|
|
[st] |
-- Finite set of states |
|
|
st |
-- The start state |
|
|
[st] |
-- The set of final states |
|
|
(st - > sy - >
m
st) |
-- The Monadic transition function |
|
|
15: Acceptance Function
The acceptance function is now defined as the standard \Haskell\ monadic foldM function:
dfaaccept
(
Dfa
_ _ s z delta) sent = do |
st < -
foldM
delta s sent |
|
return (st `elem` z) | |
|
16: Communication protocol
ptl |
:: Dfa (State ([Char],[Int])) Int Char |
ptl |
= Dfa ['1','0'] [1,2,3,4,5,6,7,8,9,10,11,12,13] 1 [12] delta |
|
where delta 1 '0' = return 2 |
|
delta 2 '0' = return 3 |
|
delta 3 '0' = return 4 |
|
delta 4 s = do { bits s ; return 5 } |
|
delta 5 s = do { bits s ; return 6 } |
|
delta 6 s = do { bits s ; return 7 } |
|
delta 7 '0' = do { values ; return 8 } |
|
delta 7 '1' = do { values ; return 10 } |
|
delta 8 '0' = return 9 |
|
delta 9 '1' = return 4 |
|
delta 10 '1' = return 11 |
|
delta 11 '1' = return 12 |
|
delta _ _ = return 13 |
|
bits x = modify (\ s - > ((fst s) + + [x],snd s)) |
|
values = modify (\ s - > ("",snd s + + [bits2int (fst s)])) | |
| |
|
17: Animating Regular Languages
The acceptance function finds a path from the initial state into a final state of the automata.
HaLeX animations are just the step-by-step display of such a path in the graphic representation of the automaton
Thus, to animate finite automata we need to compute the sequence of visited nodes.
This trace information can be easily computed using our reactive FA: the monadic automata just have to accumulate in the state the nodes being visited. |
|
18: Animating Regular Languages
For example, HaLeX derives such monadic DFA from regular expressions
mdfa |
:: Dfa (State [Char]) Int Char |
mdfa |
= Dfa v q s z delta |
where |
|
|
v = " + -d." |
|
q = [1,2,3,4,5,6] |
|
s = 1 |
|
z = [3,6] |
|
delta 1 ' + ' = do { accum '2' ; return 2 } |
|
delta 1 '-' = do { accum '2' ; return 2 } |
|
delta 1 'd' = do { accum '3' ; return 3 } |
|
delta 1 '.' = do { accum '4' ; return 4 } | |
|
19: The halex Tool
HaLeX is available at:
http://www.di.uminho.pt/~jas/Research/HaLeX/HaLeX.html
Having defined the HaLeX library, we can now define useful tools to manipulate and visualize regular languages.
We have constructed the
halex
program entirely in HaLeX.
Tool Demo |
|
20: More Animations?
Today, 16:30, GPCE
- Joćo Saraiva, "Component-based Programming for Higher-Order Attribute Grammars"
|
|