Electronic Theses and Dissertation Database

Title page for ETD etd-07242006-200443


Type of Document Dissertation
Author Zhong, Wei
Author's Email Address jetzhong@yahoo.com
URN etd-07242006-200443
Title Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction
Degree Ph.D.
Department Computer Science
Advisory Committee
  Advisor Name       Title
  Yi Pan             Committee Chair
  Martin Fraser      Committee Member
  Phang C. Tai       Committee Member
  Robert Harrison    Committee Member
Keywords
  • granular computing
  • SVM (Support Vector Machine)
  • K-means clustering algorithm
  • sequence motif
  • protein structure prediction
Date of Defense 2006-05-23
Availability unrestricted
Abstract
A protein's tertiary structure plays an important role in determining its possible functional sites and its chemical interactions with other proteins. Experimental methods for determining protein structure are time-consuming and expensive. As a result, and because of high-throughput sequencing techniques, the gap between known protein sequences and known structures has widened substantially. These limitations of experimental methods motivate us to develop computational algorithms for protein structure prediction.

In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm, and the carefully constructed sequence clusters are then used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within sequence clusters may influence their structural similarity.
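As a rough illustration of the clustering step, the following is a minimal sketch of plain K-means in Python. It is not the improved K-means variant developed in the dissertation, whose details are not given in this abstract. The function names (`kmeans`, `squared_distance`, `mean`) and the assumption that each sequence segment has already been encoded as a fixed-length numeric vector are hypothetical choices made for this example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means: returns (centroids, assignment).

    points: list of numeric vectors (assumed encodings of sequence segments).
    assignment[i] is the index of the centroid nearest to points[i].
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the result is stable.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = mean(members)
    return centroids, assignment
```

In such a setup, each resulting cluster would group segments with similar encodings, and the spread of a cluster around its centroid gives one simple measure of how tight its sequence variation is.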

Analysis of the relationship between sequence variation and structural similarity shows that sequence clusters with tight sequence variation have high structural similarity, whereas sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and that high-quality clusters give reliable predictions.

To improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. However, the K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively. We therefore consider using the Support Vector Machine (SVM) to capture this nonlinear relationship. SVM, however, is not well suited to huge datasets containing millions of samples, so we propose CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Our experimental results show that, compared with the clustering system introduced previously, the accuracy of local structure prediction improves noticeably when CSVMs are applied.
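The idea of training one SVM per information granule can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: the abstract does not say which kernel, features, or SVM library were used, so the sketch swaps in a tiny linear SVM trained by subgradient descent on the hinge loss, takes the cluster centroids as given, and uses hypothetical names (`train_linear_svm`, `train_csvm`, `predict_csvm`). Labels are assumed to be +1 or -1.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def nearest(centroids, x):
    """Index of the centroid (granule) closest to x."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(x, centroids[c]))

def offset(x, centre):
    """Express x relative to its granule centre."""
    return [xi - ci for xi, ci in zip(x, centre)]

def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Stand-in linear SVM: subgradient descent on the regularised hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            w = [wi - lr * reg * wi for wi in w]  # regularisation shrink
            if margin < 1:  # sample violates the margin: push towards it
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def train_csvm(centroids, samples, labels):
    """Train one small SVM per granule; None marks a granule with no samples."""
    models = []
    for c, centre in enumerate(centroids):
        local = [(offset(x, centre), y)
                 for x, y in zip(samples, labels)
                 if nearest(centroids, x) == c]
        if not local:
            models.append(None)
            continue
        xs = [x for x, _ in local]
        ys = [y for _, y in local]
        models.append(train_linear_svm(xs, ys))
    return models

def predict_csvm(centroids, models, x):
    """Route x to its nearest granule and apply that granule's SVM."""
    c = nearest(centroids, x)
    if models[c] is None:
        return None
    w, b = models[c]
    return 1 if dot(w, offset(x, centroids[c])) + b >= 0 else -1
```

A toy usage example: the four one-dimensional samples below are not separable by any single linear boundary, yet each granule is separable on its own, which is the intuition behind combining clustering with per-granule classifiers. Because each SVM sees only the samples of its own granule, training cost is also spread over many small problems rather than one very large one.

```python
centroids = [[0.0], [10.0]]
samples = [[-1.0], [1.0], [9.0], [11.0]]
labels = [-1, 1, 1, -1]
models = train_csvm(centroids, samples, labels)
predictions = [predict_csvm(centroids, models, x) for x in samples]
```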

Files
  Approximate download time (Hours:Minutes:Seconds)

  Filename                  Size       28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
  zhong_wei_200608_phd.pdf  668.33 Kb  00:03:05    00:01:35   00:01:23      00:00:41       00:00:03
