Electronic Theses and Dissertation Database

Title page for ETD etd-07162008-152309


Type of Document Dissertation
Author Chen, Bernard
Author's Email Address bchen3@cs.gsu.edu
URN etd-07162008-152309
Title DISCOVERY AND EXTRACTION OF PROTEIN SEQUENCE MOTIF INFORMATION THAT TRANSCENDS PROTEIN FAMILY BOUNDARIES
Degree Ph.D.
Department Computer Science
Advisory Committee
  Advisor Name              Title
  Dr. Yi Pan                Committee Chair
  Dr. Phang C. Tai          Committee Member
  Dr. Robert W. Harrison    Committee Member
  Dr. Yanqing Zhang         Committee Member
Keywords
  • Positional Association Rule
  • Super-Rule
  • protein sequence motif
  • FIK model
  • FGK model
  • Super GSVM-FE
  • HHK clustering algorithm
Date of Defense 2008-05-13
Availability unrestricted
Abstract
Protein sequence motifs are attracting growing attention in the field of sequence analysis. These recurring patterns have the potential to determine the conformation, function, and activities of proteins. In our work, we obtain protein sequence motifs that are universally conserved across protein family boundaries. Consequently, unlike the datasets used by most popular motif discovery algorithms, our input dataset is extremely large, so an efficient technique is essential. We use two granular computing models, Fuzzy Improved K-means (FIK) and Fuzzy Greedy K-means (FGK), to generate protein motif information efficiently. We then develop an efficient Super Granular SVM Feature Elimination model to further extract the motif information. During the motif search, fixing the window size in advance may reduce computational complexity and increase efficiency. However, because of the fixed size, our model may deliver a number of similar motifs that are simply shifted by a few residues or that include mismatches. We develop a new strategy, named Positional Association Super-Rule, to address this problem of motifs generated from a fixed window size. It combines super-rule analysis with a novel Positional Association Rule algorithm. We use the super-rule concept to construct a Super-Rule-Tree (SRT) by a modified HHK clustering, which requires no parameter setup, to identify the similarities and dissimilarities between the motifs. The positional association rule is created and applied to search for similar motifs that are shifted by some residues. Analysis of the motifs generated by our approaches shows that they are significant not only at the sequence level but also in secondary structure similarity and biochemical properties.
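The fixed-window issue described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's Positional Association Rule algorithm; it only shows, under simplifying assumptions, how a fixed window produces overlapping segments and how two segments can turn out to be the same pattern shifted by a residue. The function names sliding_windows and best_shift, the window length, and the example sequence are all hypothetical.

```python
from typing import List, Tuple


def sliding_windows(sequence: str, window: int) -> List[str]:
    """Return every contiguous segment of length `window` (hypothetical helper)."""
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def best_shift(a: str, b: str, max_shift: int) -> Tuple[int, int]:
    """Slide b against a and return (shift, matches) for the best alignment.

    Position i of `a` is compared with position i + shift of `b`.
    This is an illustrative stand-in, not the author's method.
    """
    best = (0, -1)
    for shift in range(-max_shift, max_shift + 1):
        matches = sum(
            1
            for i, ch in enumerate(a)
            if 0 <= i + shift < len(b) and b[i + shift] == ch
        )
        if matches > best[1]:
            best = (shift, matches)
    return best


# Example with a made-up sequence and a window of 5 residues.
segments = sliding_windows("ACDEFGH", 5)
shift, matches = best_shift(segments[0], segments[1], max_shift=2)
print(segments, shift, matches)
```

Adjacent windows here share four of five residues once one is shifted by a single position, which is the kind of redundancy a fixed window size introduces and that a shift-aware comparison can expose.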
Files
  Approximate download times are given as Hours:Minutes:Seconds.

  Filename                        Size      28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Bernard_Chen_dissertation.pdf   7.78 Mb   00:36:00     00:18:31    00:16:12       00:08:06        00:00:41
