- Motivation
- Approach
- Results
- Future work
05-10-2014
A model-based classifier is an abstraction of data used to make predictions
Better classifiers are beneficial
Where should overlapping classes be separated?
Should these outliers be accommodated?
Does capturing the minority class sacrifice accuracy?
We believe that selecting which instances to learn from can improve the accuracy of a classifier.
This is called instance selection!
\(\mathbf{Max} \ \ \text{Classifier Accuracy} \quad \mathbf{s.t.} \quad x_i \in \{0,1\} \ \forall\, i \in I\)
- This is a combinatorial optimization problem
- There are \(2^n\) possible solutions
- There is no closed form for the objective function
The vast majority of existing approaches rely on evolutionary algorithms to find a solution
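To make the formulation concrete, here is a minimal evolutionary search over the binary selection vector \(x\). This is an illustrative sketch only, not any of the cited implementations: the 1-NN surrogate classifier, the tiny 1-D dataset, and all parameter values are assumptions for demonstration (in practice the selection would be scored on a held-out validation set, not the test set).

```python
import random

def accuracy(selected, train, test):
    """1-NN accuracy on `test` using only the selected training instances."""
    subset = [t for t, keep in zip(train, selected) if keep]
    if not subset:
        return 0.0
    correct = 0
    for x, y in test:
        # Nearest neighbor in 1-D by absolute distance.
        nearest = min(subset, key=lambda t: abs(t[0] - x))
        correct += nearest[1] == y
    return correct / len(test)

def instance_selection(train, test, pop_size=20, generations=50, seed=0):
    """Evolutionary search over binary selection vectors x_i in {0,1}."""
    rng = random.Random(seed)
    n = len(train)
    # Seed the population with the full training set as a baseline,
    # so the search can never do worse than using all instances.
    pop = [[1] * n]
    pop += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda s: accuracy(s, train, test), reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n)
            child[i] = 1 - child[i]               # point mutation
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda s: accuracy(s, train, test))
    return best, accuracy(best, train, test)
```

On a toy dataset with one mislabeled "outlier" instance, the search can discover a subset that drops it and improves test accuracy over training on everything.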
Other optimization problems look similar to instance selection once it is reformulated. This allows us to take advantage of optimization theory.
Number of test instances misclassified:
| | Damp Grey Soil | Total |
|---|---|---|
| Original Training Data | 9 | 51 |
| With Instance Selection | 14 | 28 |
The ability to classify "Damp Grey Soil" is likely sacrificed to make the remaining classes easier to separate.
A Population-based Assessment of Perioperative Mortality After Nephroureterectomy for Upper-tract Urothelial Carcinoma
(I'll be calling this NU for UTUC!!)
Data: SEER database
Attributes: age, gender, histopathology, extraglandular
                    involvement, tumor grade, tumor size, and
                    mortality
Patients: 2,328 (9% mortality)
Classification task: predict mortality
Classifier: logistic regression
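The setup above can be sketched with a minimal logistic-regression fit. The SEER records are not reproduced here, so the feature vectors below are synthetic stand-ins, and the learning rate and epoch count are illustrative assumptions; this is a sketch of the classifier type, not the study's actual model.

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by stochastic gradient descent on log-loss."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)                # w[0] is the intercept
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted mortality probability
            g = p - yi                      # gradient of log-loss w.r.t. z
            w[0] -= lr * g
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * g * xj
    return w

def predict(w, xi):
    """Class label: 1 (mortality) if the linear score is positive."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if z > 0 else 0
```

With a 9% mortality rate, the class imbalance shown earlier is exactly the setting where instance selection may help: a classifier trained on all instances can achieve high accuracy while largely ignoring the minority class.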
Walter Bennette
315-330-4957
walter.bennette.1@us.af.mil
C. Reeves, S. Taylor, Selection of training sets for neural networks by a genetic algorithm, Parallel Problem Solving from Nature - PPSN V, (1998) 633-642.
C. Reeves, D. Bush, Using genetic algorithms for training data selection in RBF networks, in: Instance Selection and Construction for Data Mining, H. Liu and H. Motoda (Eds), Kluwer, Norwell, MA, (2001) pp. 339-356.
T. Endou, Q. Zhao, Generation of comprehensible decision trees through evolution of training data, in: Proceedings of the 2002 Congress on Evolutionary Computation, (2002) 1221-1225.
J. Cano, F. Herrera, M. Lozano, Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study, IEEE Transactions on Evolutionary Computation, 7(6) (2003) 561-575.
J. Cano, F. Herrera, M. Lozano, Evolutionary stratified training set selection for extracting classification rules with trade off precision-interpretability, Data & Knowledge Engineering, 60 (2006) 90-108.
N. Garcia-Pedrajas, Evolutionary computation for training set selection, WIREs Data Mining and Knowledge Discovery, 1 (2011) 512-523.
K.-J. Kim, Artificial neural networks with evolutionary instance selection for financial forecasting, Expert Systems with Applications, 30 (2006) 519-526.
S. Wu, Optimal instance selection for improved decision tree, Ph.D. dissertation (2007).
W. Bennette, Instance selection for simplified decision trees through the generation and selection of instance candidate subsets, Master's thesis (2009).
W. Bennette, S. Olafsson, Model based classifier improvement through the generation and selection of instance candidate subsets, Data & Knowledge Engineering (under revision).