The three-dimensional (3D) structure prediction of proteins is an important task in bioinformatics. Finding energy functions that better represent residue-residue and residue-solvent interactions is a crucial way to improve prediction accuracy. Widely used contact energy functions mostly consider only the contact frequency between different residue types; however, we find that the contact frequency also depends on the hydrophobic environment of the residues. Accordingly, we present an improved contact energy function that integrates the two factors and thus reflects more effectively the influence of hydrophobic interactions on the stabilization of protein 3D structure. Furthermore, a fold recognition (threading) approach based on this energy function is developed. Test results on 20 randomly selected proteins demonstrate that, compared with common contact energy functions, the proposed energy function improves the accuracy of fold template prediction from 20% to 50% and the accuracy of sequence-template alignment from 35% to 65%.
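The idea of a contact energy that depends on both residue types and hydrophobic environment can be illustrated with a minimal sketch. It is not the energy function of the abstract, whose exact form is not given here; it assumes a generic statistical potential of the form -ln(observed/expected), computed separately for each environment class. The function name `contact_energies` and the environment labels (such as "buried") are hypothetical.

```python
import math
from collections import Counter

def contact_energies(contacts):
    """Environment-dependent contact energies (illustrative assumption).

    contacts: iterable of (res_a, res_b, env) tuples observed in known
    structures, where env is a hypothetical environment label.
    Returns a dict mapping ((res_x, res_y), env) -> -ln(observed / expected),
    with the residue pair stored in sorted order.
    """
    pair_env = Counter()   # observed contacts per residue pair and environment
    res_env = Counter()    # contact participations per residue and environment
    env_total = Counter()  # total contacts per environment
    for a, b, env in contacts:
        pair = tuple(sorted((a, b)))
        pair_env[(pair, env)] += 1
        res_env[(a, env)] += 1
        res_env[(b, env)] += 1
        env_total[env] += 1

    energies = {}
    for (pair, env), n_obs in pair_env.items():
        a, b = pair
        n = env_total[env]
        # Fraction of contact ends in this environment occupied by each residue.
        fa = res_env[(a, env)] / (2 * n)
        fb = res_env[(b, env)] / (2 * n)
        # Expected count if residues paired at random within this environment;
        # unlike pairs can form in two orders, hence the factor of 2.
        expected = n * fa * fb * (1 if a == b else 2)
        energies[(pair, env)] = -math.log(n_obs / expected)
    return energies
```

A negative value marks a pair that is in contact more often than random pairing within that environment would predict; a positive value marks a disfavoured pair. A real potential would also need pseudocounts for pairs that are rarely or never observed.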
A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to train the SVM prediction models consist of statistically significant features selected from the single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the difference in length distribution between GPCRs and non-GPCRs has been exploited to improve prediction performance. Test results on annotated human protein sequences demonstrate that the system achieves good performance for both the prediction and the classification of human GPCRs.
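The composition features described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the statistical feature selection step is omitted, and appending the raw sequence length as a final feature is only one possible way to use length information.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_composition(seq, k):
    """Relative frequency of every possible k-mer over the 20 amino acids.

    k-mers containing non-standard symbols (e.g. 'X') are skipped.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def feature_vector(seq):
    """Concatenate 1-, 2-, and 3-mer compositions plus the sequence length."""
    seq = seq.upper()
    features = []
    for k in (1, 2, 3):
        features.extend(kmer_composition(seq, k))
    features.append(float(len(seq)))  # assumption: length used as a raw feature
    return features
```

The full vector has 20 + 400 + 8000 + 1 = 8421 entries, most of them zero for any single sequence, which is one reason to keep only statistically significant features before training a classifier.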
G-protein coupled receptors (GPCRs) represent one of the most important classes of drug targets for the pharmaceutical industry and play important roles in cellular signal transduction. Predicting the coupling specificity of GPCRs to G-proteins is vital for further understanding the mechanism of signal transduction and the function of the receptors within a cell, and can provide new clues for pharmaceutical research and development. In this study, features based on the amino acid composition and physicochemical properties of full-length GPCR sequences have been analyzed and extracted. Based on these features, classifiers have been developed using support vector machines to predict the coupling specificity of GPCRs to G-proteins. The test results show that this method achieves better prediction accuracy.
Many researchers have used microarray gene expression data to investigate gene regulatory networks in specific life stages. In these analyses, Bayesian networks have been widely applied to build regulatory networks from expression profiles because of their solid mathematical foundation and their robustness to noisy data. However, building a Bayesian network is time consuming and the search space is very large. Considering the biological roles of transcription factors (TFs) and target genes (TGs), the regulatory network can be separated into a core TF network and the interactions from TFs to TGs. We developed an R package named ModuleNet, which uses a Bayesian network model to build the inner TF network and a genetic algorithm to predict TF-TG interactions. For a fixed number of transcription factors, the search space and time requirements of ModuleNet increase linearly with the number of targets. Application to a yeast cell-cycle expression profile demonstrated the prediction accuracy of ModuleNet. Furthermore, significantly enriched Gene Ontology (GO) terms with similar expression behaviors were detected automatically by ModuleNet from the expression profile, and the relationships from TFs to GO terms were identified. The source code is available from the author upon request.
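The scaling argument can be made concrete with a small counting sketch. It does not reproduce ModuleNet. It assumes that each target draws its regulators only from the fixed TF set, that targets are searched independently, and that the number of regulators per target is capped by a hypothetical `max_regulators` parameter. For contrast, `count_dags` counts all labeled directed acyclic graphs on n nodes using Robinson's recurrence, which is the space an unrestricted Bayesian network structure search faces.

```python
from math import comb

def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

def tf_tg_candidates(n_tfs, n_targets, max_regulators):
    """Candidate regulator sets when each target is searched independently.

    Each target may take any subset of the TFs with at most max_regulators
    members (a hypothetical cap), so the per-target count depends only on
    n_tfs and the total grows linearly with n_targets.
    """
    per_target = sum(comb(n_tfs, r) for r in range(max_regulators + 1))
    return n_targets * per_target
```

Under these assumptions, 10 TFs with at most 2 regulators per target give 56 candidate sets per target, so 100 targets yield 5,600 candidates and 200 targets yield 11,200. By comparison, `count_dags(4)` is already 543, and the count grows super-exponentially with the number of nodes.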