Carendf 2.0

The CAREN system is a Java implementation of an Apriori-based association rules generator.
A new depth-first implementation is now available.
The main purpose was to develop an association rules generator targeted at classification.

This program can deal with two different dataset formats: attribute/value and basket.
In attribute/value format, the first line of the dataset should contain the names of the attributes.
One should use the switch -Att or -Bas to declare attribute/value mode or basket mode. The default mode is -Bas.
In -Bas mode the dataset should be of the form:
        TransactionID  Items
Example:
        001            1
        001            2
        003            1
or
        001            1 2
        003            1
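
The two layouts above describe the same transactions. The following Java sketch shows one way to group items by transaction ID so that both layouts yield the same result. It is not CAREN's own reader: the class name BasketReader and the whitespace separator are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: BasketReader is a hypothetical helper, not part of CAREN.
class BasketReader {

    // Groups items by transaction ID. Each line holds a transaction ID
    // followed by one or more items; whitespace is assumed as separator.
    static Map<String, Set<String>> read(List<String> lines) {
        Map<String, Set<String>> transactions = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            Set<String> items = transactions.get(fields[0]);
            if (items == null) {
                items = new LinkedHashSet<>();
                transactions.put(fields[0], items);
            }
            // Every field after the ID is an item of that transaction.
            for (int i = 1; i < fields.length; i++) {
                items.add(fields[i]);
            }
        }
        return transactions;
    }
}
```

With this sketch, the lines "001 1", "001 2", "003 1" and the lines "001 1 2", "003 1" both produce transaction 001 with items 1 and 2, and transaction 003 with item 1.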


Defining the right separator character is vital for obtaining meaningful results. The separator is set with the option -s,
as in: java caren dataset 0.1 0.5 -s, -d (here the separator is a comma). Notice that there is no space between the switch and the character!

In -Att mode one can make use of three different discretization methods: Binary, Class Intervals and Srikant (see README for details).
One can also declare that the dataset contains null values (using the switch -null).

One can generate a list of options from the command prompt:

>java carendf -help

or

>java carenclass -help




The two modules suit different applications. $carendf$ is the more general
association rules generator. $carenclass$ is faster, but it requires that
a consequent for the rules be defined. It also contains
more features (see README2 for details).




Two new metrics can be used to filter rules: Conviction and Lift (see the -help output for details).
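
For reference, the usual textbook definitions of these metrics can be sketched in Java as follows. This is only an illustration: the class RuleMetrics and its method names are hypothetical, and CAREN's own computation may differ in details. Supports are assumed to be given as fractions of the dataset.

```java
// Illustrative only: RuleMetrics is a hypothetical helper, not part of CAREN.
class RuleMetrics {

    // suppAC: support of antecedent and consequent together.
    // suppA, suppC: supports of the antecedent and of the consequent.
    // All supports are fractions in (0, 1].
    static double confidence(double suppAC, double suppA) {
        return suppAC / suppA;
    }

    // Lift compares the rule's confidence with the consequent's support.
    // A value of 1 indicates independence.
    static double lift(double suppAC, double suppA, double suppC) {
        return confidence(suppAC, suppA) / suppC;
    }

    // Conviction = (1 - supp(C)) / (1 - conf(A -> C)).
    // It is unbounded when the rule always holds (confidence 1).
    static double conviction(double suppAC, double suppA, double suppC) {
        double conf = confidence(suppAC, suppA);
        if (conf >= 1.0) {
            return Double.POSITIVE_INFINITY;
        }
        return (1.0 - suppC) / (1.0 - conf);
    }
}
```

For example, with supp(A) = 0.4, supp(C) = 0.5 and supp(A and C) = 0.3, the confidence is 0.75, the lift is 1.5 and the conviction is 2.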

Several item filtering options exist to filter rules: two for consequent filtering and three for antecedent filtering.
In -Att mode, it is possible to filter consequents by item and by attribute occurrence.

A new option for defining the maximal and minimal size of rules exists (-RS and -rs).

There are four different output formats: Standard (text file), CSV, Prolog file and PMML (XML decision model).
In CSV format all filtering metrics (confidence, conviction, lift and chi^2) are written alongside the rules.
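
As an illustration of the chi^2 metric, the Java sketch below computes the standard chi-square statistic of the 2x2 contingency table of a rule A -> C from transaction counts. The class ChiSquare and its parameter names are hypothetical, and whether CAREN computes chi^2 in exactly this way is an assumption.

```java
// Illustrative only: ChiSquare is a hypothetical helper, not part of CAREN.
class ChiSquare {

    // n: number of transactions; nA, nC: transactions containing the
    // antecedent and the consequent; nAC: transactions containing both.
    static double of(long n, long nA, long nC, long nAC) {
        // Cells of the 2x2 contingency table.
        double a = nAC;                 // A and C
        double b = nA - nAC;            // A and not C
        double c = nC - nAC;            // not A and C
        double d = n - nA - nC + nAC;   // not A and not C
        double denominator = (a + b) * (c + d) * (a + c) * (b + d);
        if (denominator == 0.0) {
            return 0.0; // degenerate table: a row or column is empty
        }
        double diff = a * d - b * c;
        return n * diff * diff / denominator;
    }
}
```

A value of 0 means that antecedent and consequent are independent in the data; larger values indicate a stronger dependence.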

Use the -max switch to expand the expected transaction size. The maximal transaction size is equal to the number of frequent items.
The default transaction size is 500 items.

A prediction module is now available. It makes use of prediction models generated from association rules.
A model can be generated using the $carendf$ or $carenclass$ program (option -p). Each model can then be used by the command $predict$.
Do not forget to specify the class attribute (class item) using the switch -H (or -h) and (-class) when using $carendf$ or $carenclass$.
Only rules with this consequent should be generated (and will be considered by the prediction model).
This module implements three different classification methods: BestRule (decision list), Voting and Class Distribution.
These methods can also make use of the different metrics available in the CAREN system: confidence, lift and conviction.
It can deal with all kinds of prediction models generated by $carendf$ and $carenclass$ (using any discretization method, null values, etc).
Use the command

>java predict -help

to list the available options.
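
To give an idea of how rule-based prediction works, the Java sketch below shows a simplified BestRule and Voting classifier. It is not the code of $predict$: all names are hypothetical, rules are ranked by confidence only, each covering rule casts a single vote, and ties are broken by rule order. These are assumptions made for the example. The Class Distribution method is not covered.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: these names are hypothetical and not part of CAREN.
class RuleClassifierSketch {

    // A classification rule: antecedent items -> class, with its confidence.
    static class Rule {
        final Set<String> antecedent;
        final String consequent;
        final double confidence;

        Rule(Set<String> antecedent, String consequent, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.confidence = confidence;
        }

        // A rule covers an example when all its antecedent items are present.
        boolean covers(Set<String> example) {
            return example.containsAll(antecedent);
        }
    }

    // BestRule: predict with the covering rule of highest confidence.
    // Returns null when no rule covers the example.
    static String bestRule(List<Rule> rules, Set<String> example) {
        Rule best = null;
        for (Rule r : rules) {
            if (r.covers(example) && (best == null || r.confidence > best.confidence)) {
                best = r;
            }
        }
        return best == null ? null : best.consequent;
    }

    // Voting: every covering rule casts one vote for its class.
    // On a tie, the class that received its first vote earliest wins.
    static String voting(List<Rule> rules, Set<String> example) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (Rule r : rules) {
            if (r.covers(example)) {
                Integer count = votes.get(r.consequent);
                votes.put(r.consequent, count == null ? 1 : count + 1);
            }
        }
        String winner = null;
        int max = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                winner = e.getKey();
            }
        }
        return winner;
    }
}
```

The two methods can disagree: one rule with very high confidence decides the BestRule prediction, while several weaker rules that agree on another class can win the vote.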

Notice that $carenclass$ is a more efficient association rules generator that was developed for classification purposes.
It includes more features (for instance subgroup rules generation) and should be the one to use for generating prediction models.
The main difference from $carendf$ is that it always requires a specified consequent.


A new module $convert$ for discretizing numeric attributes is available. To list its options, use

>java convert -help



Note: the implementation requires Java 1.4 or higher. This version was compiled using JSDK1.4.2.


Send questions or comments by email to Paulo Azevedo (pja@di.uminho.pt).


