
Type of Document: Dissertation
Author: Jiang, Fuhua
URN: etd-07272006-122000
Title: SVM-Based Negative Data Mining to Binary Classification
Degree: Ph.D.
Department: Computer Science
Advisory Committee (Advisor Name, Title)
- A.P. Preethy, Committee Chair
- Yan-Qing Zhang, Committee Member
- Yi Pan, Committee Member
- Yichuan Zhao, Committee Member
Keywords
- Data partition
- Data classification
- Vector similarity
- Multiple passes learning
- Machine learning
- Bagging
- Boosting
- Support vector machines
- Data preparation
Date of Defense: 2006-07-14
Availability: unrestricted
Abstract
The properties of a training data set, such as its size, distribution, and number of attributes, contribute significantly to the generalization error of a learning machine. A poorly distributed data set is prone to produce a partially overfitted model. This dissertation proposes two approaches for binary classification that enhance useful data information by mining negative data. First, an error-driven compensating hypothesis approach is based on Support Vector Machines (SVMs) with (1+k)-iteration learning, in which the base learning hypothesis is iteratively compensated k times. This approach produces a new hypothesis on a new data set in which each label is a transformation of the label from the negative data set, further producing positive and negative child data subsets in subsequent iterations. The procedure refines the base hypothesis with the k child hypotheses created in the k iterations. A prediction method is also proposed that traces the relationship between the negative subsets and the testing data set using a vector similarity technique. Second, a statistical negative example learning approach, based on theoretical analysis, improves the performance of a base learning algorithm, the learner, by creating one or two additional hypotheses, the audit and the booster, to mine the negative examples output by the learner. The learner employs a regular Support Vector Machine to classify the main examples and to recognize which examples are negative. The audit works on the negative training data created by the learner to predict whether an instance is negative. The booster is applied when the audit is not accurate enough to judge the learner correctly; it works on the training data subsets on which the learner and the audit do not agree. The classifier used for testing is the combination of the learner, the audit, and the booster. For a specific instance, it returns the learner's result if the audit acknowledges the learner's result or the learner agrees with the audit's judgment; otherwise it returns the booster's result. The error of the classifier is decreased to O(e^2), compared with the error O(e) of the base learning algorithm.
Files
Filename: jiang_fuhua_200608_phd.pdf
Size: 819.95 Kb
Approximate Download Time (Hours:Minutes:Seconds)
- 28.8 Modem: 00:03:47
- 56K Modem: 00:01:57
- ISDN (64 Kb): 00:01:42
- ISDN (128 Kb): 00:00:51
- Higher-speed Access: 00:00:04
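
Illustrative sketch (not from the dissertation): the decision rule that the abstract describes for combining the learner, the audit, and the booster might look roughly like the Python below. The function name `combined_predict`, the +1/-1 label encoding, the convention that the audit returns `True` when it predicts an instance is negative (one the learner is expected to misclassify), and the three toy models are all assumptions made for illustration. The abstract does not specify these details, and the real components would be trained SVM-based hypotheses.

```python
from typing import Callable, Sequence

Label = int  # binary labels, assumed here to be +1 / -1
Instance = Sequence[float]


def combined_predict(
    x: Instance,
    learner: Callable[[Instance], Label],
    audit: Callable[[Instance], bool],
    booster: Callable[[Instance], Label],
) -> Label:
    """Return the learner's label unless the audit flags it; then use the booster.

    Assumption: audit(x) is True when the audit predicts that x is a
    negative example, i.e. one the learner is expected to get wrong.
    """
    learner_label = learner(x)
    if not audit(x):
        # The audit acknowledges the learner's result.
        return learner_label
    # The audit disputes the learner, so the booster decides.
    return booster(x)


# Hypothetical stand-ins for trained models, for illustration only.
def toy_learner(x: Instance) -> Label:
    return 1 if x[0] >= 0 else -1


def toy_audit(x: Instance) -> bool:
    # Distrust the learner for points close to its decision boundary.
    return abs(x[0]) < 0.5


def toy_booster(x: Instance) -> Label:
    return 1 if x[1] >= 0 else -1
```

With these toy models, an instance far from the learner's boundary, such as `[2.0, -1.0]`, keeps the learner's label, while one near the boundary, such as `[0.1, -1.0]`, is handed to the booster.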