This book describes theoretical and experimental
studies of instance selection to improve data mining
models. Data preparation is one of the most important
and time-consuming phases in knowledge discovery.
Preparation tasks often determine the success of data
mining engagements. Instance selection is the primary
focus because the size of current and future databases
often exceeds the amount of data that current data
mining algorithms can handle properly. Instance
selection can thus be used to improve the scalability
of data mining algorithms as well as the quality of
the data mining results.
This book presents a new optimization-based approach
for instance selection that uses a genetic algorithm
to select a subset of instances to produce a simpler
decision tree model with acceptable accuracy. The
resultant trees are easier for the decision maker to
comprehend and interpret, and hence more useful
in practice. Numerical results for several difficult
test data sets indicate that
GA-based instance selection can often reduce the size
of the decision tree by an order of magnitude while
still maintaining good prediction accuracy.
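
The idea of GA-based instance selection can be illustrated with a minimal sketch. This is not the book's implementation: the tree learner (a small hand-written Gini-based tree), the fitness function (validation accuracy minus an assumed size penalty), the GA operators (truncation selection, one-point crossover, bit-flip mutation, elitism), the synthetic noisy data, and all function names and parameter values are assumptions chosen only for illustration.

```python
import random

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def build_tree(rows, depth=0, max_depth=6):
    """Grow a binary tree on numeric features; rows are (features, label)."""
    labels = [y for _, y in rows]
    majority = max(sorted(set(labels)), key=labels.count)
    if depth == max_depth or len(set(labels)) == 1:
        return majority                      # leaf: predicted class
    best = None
    for f in range(len(rows[0][0])):
        # Candidate thresholds; dropping the largest keeps both sides non-empty.
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                         # no feature can split these rows
        return majority
    _, f, t, left, right = best
    return (f, t, build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))

def tree_size(tree):
    """Number of nodes (internal nodes plus leaves)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + tree_size(tree[2]) + tree_size(tree[3])

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def accuracy(tree, rows):
    return sum(predict(tree, x) == y for x, y in rows) / len(rows)

def fitness(mask, train, valid, penalty=0.01):
    """Assumed objective: validation accuracy minus a penalty per tree node."""
    subset = [r for r, keep in zip(train, mask) if keep]
    if not subset:
        return float("-inf")
    tree = build_tree(subset)
    return accuracy(tree, valid) - penalty * tree_size(tree)

def select_instances(train, valid, pop_size=12, generations=10,
                     p_mut=0.02, seed=0):
    """Evolve a boolean mask over the training instances."""
    rng = random.Random(seed)
    n = len(train)
    # Start from the full training set plus random subsets.
    pop = [[True] * n] + [[rng.random() < 0.5 for _ in range(n)]
                          for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, valid),
                        reverse=True)
        parents = ranked[: pop_size // 2]    # truncation selection
        children = [ranked[0]]               # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g
                             for g in child])
        pop = children
    return max(pop, key=lambda m: fitness(m, train, valid))

def make_data(n, rng, noise=0.15):
    """Synthetic data: class depends on the first feature, with label noise."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] > 0.5)
        if rng.random() < noise:
            y = 1 - y
        rows.append((x, y))
    return rows

rng = random.Random(1)
train, valid = make_data(40, rng), make_data(40, rng)
full_tree = build_tree(train)
mask = select_instances(train, valid)
small_tree = build_tree([r for r, keep in zip(train, mask) if keep])
print("all instances:     ", tree_size(full_tree), accuracy(full_tree, valid))
print("selected instances:", tree_size(small_tree), accuracy(small_tree, valid))
```

Because the initial population contains the full training set and the best mask is carried over unchanged each generation, the returned mask scores at least as well on this assumed objective as training on all instances. How much the tree shrinks depends on the data, the penalty weight, and the random seed.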