U.Minho EDFS 2001/2002: Exercises
[ DI/UM ]

Index:
[ Problem 1 - Modelling with Sets |
Problem 2 - Modelling a Memory Cache |
Problem 3 - A Naive Model of the WWW |
Problem 4 - Modelling with Mappings |
Problem 5 - Modelling with Inductive Types |
Problem 6 - Multisets «are» Mappings |
Problem 7 - Hash tables «are» Mappings ]

  Problem 1 [Modelling with Sets]

[Back to Index ]

To improve the operation of fire-brigades fighting forest fires in the hot season, the National Service for Social Protection is developing a mobile station-based geographical information system which keeps track of rural area accessibility.

In this exercise we are concerned with the design of the central database component which records road maps (distances, road alternatives and so on) - a component which we are going to model with sets.

We start from a very simple model (to be improved at a later stage): a road-map is a directed graph whose arcs A->B record the you can drive from A to B relation:

	Map = set of Path;
	Path ::	From: Location
		To: Location;
	Location = token;

  1. Specify functions in VDM-SL to obtain:
    1. The symmetric of a map (ie. the map of the «all ways back»)
    2. The symmetric closure of a map (ie. the map and all its ways back)
    3. The composition of two maps (ie. the map whose paths are obtained by arriving at a location accessible in the first map and proceeding from there to another location accessible in the second map)
    4. The reflection of a map (set of all trivial paths between a location in the map and itself).
    5. The transitive closure of a map (set of all direct or composed paths in a map).

  2. The combined symmetric, transitive and reflexive closure of a map is a map which records an equivalence relation on locations. What is the informal meaning of this equivalence?

  3. Upgrade your Map model by adding further information to each path, namely the name of the road, the distance involved and its (average) speed:
            UMap = set of UPath;
            UPath :: From: Location
                     Info: PathInfo
                     To: Location;
            PathInfo :: name: token
    		    distance: real
    		    speed: real;
            Location = token;
    

    Scale up the function definitions above from Map to UMap. Clearly, you have to pay attention to the handling of PathInfos, by eg. summing up distances, computing average speeds and concatenating road names. (Hint: recall the notion of a weighted graph.)

  Problem 2 [Modelling a Memory Cache]

[Back to Index ]

Analyse the formal VDM-SL model below of a memory cache. Try to improve it and design a test suite in the toolbox to check its behaviour.
types
        Addr = token;
        Data = token;

        Memory :: addrs: set of Addr
                  mapx: map Addr to Data;

        MainMemory = Memory;

        Cache :: addrs: set of Addr
                 mapx: map Addr to Data
                 dirty: set of Addr;

        System :: cache: Cache
                  main: MainMemory
                  inv system == system.cache.addrs subset system.main.addrs;

functions

	badAddrs: System -> set of Addr
	badAddrs (s) == { x | x in set s.cache.addrs &
                              s.cache.mapx(x) <> s.main.mapx(x) };

	load: System * Addr -> System
	load (s, a) == mk_System(
                          mk_Cache(s.cache.addrs union {a},
                                   s.cache.mapx munion {a|->s.main.mapx(a)},
                                   s.cache.dirty),
                          s.main)
	pre a not in set s.cache.addrs;

	systemWrite: System * Data * Addr -> System
	systemWrite (s, d, a) ==
                          mk_System(
                             mk_Cache(s.cache.addrs,
                                      s.cache.mapx ++ {a|->d},
                                      s.cache.dirty union {a}),
                          s.main);

	flush: System -> System
	flush (s) == let x in set s.cache.addrs 
		     in mk_System(
                             mk_Cache(s.cache.addrs \ {x},
                                      {x} <-: s.cache.mapx,
                                      s.cache.dirty \ {x}),
                             mk_Memory(s.main.addrs,
                                       s.main.mapx ++ {x|->s.cache.mapx(x)}));

  Problem 3 [A Naive Model of the WWW]

[Back to Index ]

Consider the following VDM-SL model specifying, at abstract level, the structure of an information system based on the `World Wide Web' over the INTERNET, where Ref (page address) is a datatype of which no further details are required:
        WWW  = map Ref to URL;           -- (URL=Universal Resource Location)
        URL  = seq of Unit;
        Unit = PlainText | HyperLink;
        PlainText = seq of Word;
	Word = seq of char;
        HyperLink :: link: Ref
                     txt: PlainText;     -- "underlined text"
  1. Add an invariant to WWW ensuring that no URL mentions a non-existing URL (NB: although this cannot be ensured for the WWW as a whole, it makes sense for the particular fragment which embodies our information system.)

  2. Assuming
            wc : seq of char -> nat
    	-- counts the number of words in a string
    
    specify
            nrofwd : URL -> nat
    	-- counts the number of words in a URL
    

  3. Specify
            refsTo : Ref * WWW -> set of Ref
            -- retrieves the addresses of all URLs which point at Ref
    

  4. Search engines on the WWW are based on text inversion, that is, on a function
            invert : WWW -> map Word to map Ref to nat1
    
    which computes, for every word, the URLs which mention it, weighted by the number of occurrences, ie. a multiset. Specify invert. (NB: this inversion operation is far more elaborate in practice!)

  Problem 4 [Modelling with Mappings]

[Back to Index ]

  1. Complete the «..........» in the VDM-SL model below of a toy bank account management system (BAMS) in which accounts (Account) are uniquely identified by account identifiers (AccId). Every account records both a set of account holders (set of AccHolder) and the account's current balance (Amount):

    types
    
    BAMS = map AccId to Account;
    Account :: H: set of AccHolder
               B: Amount;
    
    AccId     = seq of char;
    AccHolder = seq of char;
    Amount = int;
    
    functions
    
    Init : () -> BAMS
    Init () == {|->};
    
    OpenAccount : BAMS * AccId * set of AccHolder * Amount -> BAMS
    OpenAccount(bams,n,h,m) ==
          bams munion .........................
    pre not n in set dom bams;
    
    AddAccHolders: BAMS * AccId * set of AccHolder -> BAMS
    AddAccHolders(bams,n,h) ==
          bams ++ { n |-> let a = bams(n)
                          in mk_Account(a.H union h,a.B) }
    pre ...................;
    
    Credit: BAMS * AccId * Amount -> BAMS
    Credit(bams,n,m) == ...............................
    pre n in set dom bams;
    
    Withdraw: BAMS * AccId * Amount -> BAMS
    Withdraw(bams,n,m) ==
          bams ++ { n |-> let a = bams(n)
                                in mk_Account(a.H, a.B - m) }
    pre ....................................;
    
    GoodCostumers: BAMS * Amount -> set of AccId
    GoodCostumers(bams,m) ==
          --- should select the accounts whose balance is greater than m
    

  2. Attach an invariant to BAMS ensuring:
    1. that every account has at least one account holder.
    2. that the balance of every account is positive.

  Problem 5 [Modelling with Inductive Types]

[Back to Index ]

Goods in a store are packaged inside boxes which are moved around on palettes assembled by a robot system. The dimensions of each palette ( 100cm × 141cm ) follow a normalization principle whereby the base of all boxes (and palettes) is a rectangule whose length is sqrt 2 wider than its width.

Such a normalization not only helps in the packaging binary partition algorithm adopted by the robot system but also makes it possible to identify the dimensions of a particular box by a single number (its width in cm). We shall deliberately ignore the 3rd dimension of the problem.

The picture on the right shows the structure of a palette containing three boxes A, B and C, where space can still be found to accept a box with the dimensions of C or be split in two space areas to accept boxes of smaller dimensions and so and so on. (Areas are always divided widthwise.)

A B
C  

The following VDM-SL inductive datatype models palette binary partitioning trees:

      Space = [ S ];
      S     = Box | BoxSplit;
      Box   :: info: BoxInfo
               width: Width;
      BoxSplit :: one: Space
                  two: Space;
        
      BoxInfo = token;
      Width = real;

  1. Show that Space has polynomial «shape» F X = 1 + Box + X˛. Then build the diagrams of its cata/ana/paramorphisms and record (in VDM-SL notation) the relevant patterns of recursion.

  2. Specify and check (in the toolbox) the following operators over Space:
    whichBoxes : Space -> set of BoxInfo
    -- delivers the information of all boxes in the palette
    whichBoxes(s) == ..............;
    
    freeSpace : Space -> Width
    -- yields the width of the widest free space in the palette
    freeSpace(s) == ..............;
    
    insertBox : Box * Space -> Space
    -- "best fit" insertion of a box, if possible
    insertBox(b,s) == .....................;
    
    removeBox : Box * Space -> Space
    removeBox(b,s) == .....................;
    
    defragment : Space -> Space
    -- collapse all empty subareas as much as possible
    defragment(s) == .....................;
    

  3. Which insertion order would save more space in a palette: wider boxes first? narrower ones first?

  Problem 6 [Multisets «are» Mappings]

[Back to Index ]

A multiset (vulg. «bag») is an «intermediate» notion between a set (set of A) and a sequence (seq of A). In a multiset, a given element a of A may occur more than once, its number of occurrences being called its multiplicity:
    map @A to nat

Multisets are very common in formal modelling. For instance, an inventory

WhatQuantity
chair 12
table 3
cupboard 1
is modelled by a multiset:
m = {"chair"|->12,"table"|->3,"cupboard"|->1}
Invoices are multisets, bank statements are multisets, etc.

The empty mapping {|->} models the empty multiset and set-union generalizes to multiplicity addition, that is, given two multisets n and m, their union is defined as follows:

    mseCup[@A] : map @A to nat * map @A to nat -> map @A to nat
    mseCup(m,n) == m ++ n ++
                   { a |-> m(a) + n(a) | a in set dom n inter dom m };
  1. What is the outcome of mseCup(m,m)? Check it in the toolbox for m the multiset above. Check that mseCup(m,m) has the same elements of mseCup(m) but that their multiplicity is two times larger.

    Generalize this by defining

    mseExMul[@A] : map @A to nat * nat -> map @A to nat
    
    the multiplication of a multiset by a numeric factor.
  2. Define a function
    mseDiff[@A] : map @A to nat * map @A to nat -> map @A to nat
    
    which computes multiset difference.
  3. Generalize multiset union to a finite set of multisets:
    mseCUP[@A] : seq of map @A to nat -> map @A to nat
    

  Problem 7 [Hash tables «are» Mappings]

[Back to Index ]

Hash tables are well known data structures whose purpose is to efficiently combine the advantages of both static and dynamic storage of data. Static structures such as arrays provide random access to data but have the disadvantage of filling too much primary storage. Dynamic, pointer-based structures (e.g. search lists, search trees etc.) are more versatile with respect to storage requirements but access to data is not as immediate.

The idea of hashing is suggested by the informal meaning of the term itself: a large database file is «hashed» into as many «pieces» as possible, each of which is randomly accessed. Since each sub-database is smaller than the original, the time spent on accessing data is shortened by some order of magnitude. Random access is normally achieved by a so-called hash function,

hash : Data -> Location

which computes, for each data item, its location in the hash table. Standard terminology regards as synonyms all data competing for the same location. A set of synonyms is called a bucket. There are several ways in which data collision is handled, e.g. linear probing or overflow handling. Below we consider the latter.

  1. Write and validate a VDM-SL model HTable for hash-tables (with overflow handling) in which you should model (a) locations as integers; (b) collision buckets as finite sets of (data) synonyms; (c) hash tables as mappings from locations to collision buckets.
  2. Assuming some predefined hash function hash , add to HTable an invariant stating that all data in the same collision bucket share the same hash-index. Can you infer from this invariant that buckets are mutually disjoint?
  3. Specify the following functionality on top of HTable so that your invariant (above) is preserved:
    Init : () -> HTable
    
    Insert : Data * HTable -> HTable
    
    Find   : Data * HTable -> bool
    
    Remove : Data * HTable -> HTable
    
  4. Specify a function rep able to represent any set of Data as a HTable.
  5. Further specify a left inverse abs of rep, i.e. a function ablo to retrieve the collection of data which is stored into a hash-table.

(Further readings: Hash Tables -- A Case Study in Sets-calculation. Technical Report DI/INESC 94-12-1, INESC Group 2361, Braga, December 1994.


Back to MFP-I's main URL.

J. N. Oliveira
2002-01-19