Title: | Harvest the Classification Tree |
Version: | 1.1 |
Date: | 2015-07-30 |
Author: | Bingyuan Liu/Yan Yuan/Qian Shi |
Maintainer: | Bingyuan Liu <adler1016@gmail.com> |
Depends: | R (≥ 3.0.1) |
Imports: | rpart,stats |
Description: | Aimed at applying the Harvest classification tree algorithm, modified algorithm of classic classification tree.The harvested tree has advantage of deleting redundant rules in trees, leading to a simplify and more efficient tree model.It was firstly used in drug discovery field, but it also performs well in other kinds of data, especially when the region of a class is disconnected. This package also improves the basic harvest classification tree algorithm by extending the field of data of algorithm to both continuous and categorical variables. To learn more about the harvest classification tree algorithm, you can go to http://www.stat.ubc.ca/Research/TechReports/techreports/220.pdf for more information. |
License: | GPL-2 |
Packaged: | 2015-07-30 17:13:18 UTC; Bingyuan |
NeedsCompilation: | no |
Repository: | CRAN |
Date/Publication: | 2015-07-31 00:54:59 |
Harvest the classification tree
Description
Aimed at applying the Harvest classification tree algorithm, modified algorithm of classic classification tree.The harvested tree has advantage of deleting redundant rules in trees, leading to a simplify and more efficient tree model.It was firstly used in drug discovery field, but it also performs well in other kinds of data, especially when the region of a class is disconnected. This package also improves the basic harvest classification tree algorithm by extending the field of data of algorithm to both continuous and categorical variables.
To learn more about the harvest classification tree algorithm, you can go to http://www.stat.ubc.ca/Research/TechReports/techreports/220.pdf for more information.
Details
Package: | Harvest.Tree |
Type: | Package |
Version: | 1.1 |
Date: | 2015-07-30 |
License: | GPL-2 |
The main function of package called 'harvest', it can be used to analyze the data which is stored in a data frame, where first column stores the class of response data, and the second to last column stores explantory variables accordingly.The 'predict' funciton offers function to predict the unclassified data based on training model. The 'harfunc' function is the fundemental part of 'harvest', which can be used to analyze the data which has already been classified by rpart function(traditional classification tree). Please check the help file of these three functions for more information.
Author(s)
Bingyuan Liu \ Yan Yuan \ Qian Shi
Maintainer: Bingyuan Liu<adler1016@gmail.com>
Bound of rules
Description
This function takes in a ruleset and output the lower and upper bounds of each rule.
Usage
extrule(myrules, varname)
Arguments
myrules |
A 3 column matrix output of function "hughs.path.rpart" |
varname |
the names of x variables |
Value
A p*2 matrix, p is the length of varname. The first column is the lower bound, the second column is the upper bound. The default lower bound is "-Inf",the default upper bound is "Inf". row corresponse to x variables ordered in the data matrix given to rpart.
A harvested classification tree
Description
Basic function to apply the harvest algorithm to the training data set, computing whether we can harvest any nodes based on the classic classification tree algorithm.
Usage
harfunc(rpart.object, data, varname, sig = 0.95)
Arguments
rpart.object |
classification result of training data from traditional classification tree(rpart function). |
data |
original training data where 'y' stores classmembership |
varname |
the name of each explaanatory variables |
sig |
significance level (default 0.95) |
Value
the list of orginial result of classification, likelihood improvment and harvested classification result.
A harvested classification tree
Description
The main function of the package, aiming at develop the harvest classification tree. Training data input and
Usage
harvest(training, num.var, numeric.info, sig = 0.95)
Arguments
training |
original data where 'y' stores classmembership 0 and 1,in the first column, with explanatory variable stores in the second to the last column. |
num.var |
number of explanatory variables |
numeric.info |
the vector stores the number of which variable is continuous |
sig |
significance level (default 0.95) |
Details
The function will return the harvested tree model. Missing values are allowed, and they will be treated accordingly. To use the trained tree model to predict, you can use predict
function in this package.
Value
An object of class "harvest", which is the result of algorithm with the following elements for each nodes(nodes are ordered in sequence of harvesting):
rule
constriants of the node
total
total number of data points in the node
'1'
the number of data points belonging to class 1 in the node
'logchange'
the improvement of log likelihood of deleting the redundent rules by the algorithm for the node
Examples
data(training)
harvest(training,4,3)
Predictions from a harvested tree
Description
The function predict
computes the prediction of membership from a new data set classified by harvested classification model of training data.
Usage
predict(harfunc.object, data, num.var)
Arguments
harfunc.object |
the output of harfunc function. |
data |
test data |
num.var |
number of explaining varibles |
Details
To run the predict
function, a trained harvested classification tree formed by harvest
function is required.
Value
pred.mat
is a data frame stored the information of result of prediction with the following columns:
belong
the node that data point belongs to
possibility
the probability of point being in class 1
predict
the simple perdict based on whether probability is larger than 0.5.
Ranking of nodes
Description
Rank harvested node by lower p value
Usage
rank.nodes(harfunc.object)
Arguments
harfunc.object |
an object of class "harfunc" |
Value
the ranked harvest nodes
A logical matrix for a terminal node
Description
Return a logical matrix of the rule sets which define a terminal node
Usage
rulesets(noden, newsim, varn, nodenumb)
Arguments
noden |
a terminal node defined by a set of rules, from function "treemat" |
newsim |
data to be harvested |
varn |
x variable names |
nodenumb |
all the labels of terminal nodes |
Value
A nxnn logical matrix, n=number of data points to be harvested, nn=number of rules defining a terminal node. Each column of the matrix corresponding to a node that is defined by one variable/rule, its name corresponds to that variable. Note the original terminal node is just the intersection of these nodes.
training
Description
A simulated data set of symptoms of breast cancer patients
Usage
data(training)
Format
A data frame with 300 observations on the following 5 variables.
y
a numeric vector
x1
a factor with levels
21
22
23
24
25
26
27
28
29
x2
a factor with levels
39-
40-49
50-69
70-74
75+
x3
a numeric vector
x4
a factor with levels
2004
2005
2006
2007
2008
2009
2010
Source
simulated data for breast cancer diagnosis