Using two traditional Chinese medicine herbs, angelicae dahuricae radix (ADR, Baizhi) and salviae miltiorrhizae radix (SMR, Danshen), as examples, this work studies the automatic discrimination of the geographic origins of herbs by near infrared (NIR) reflectance spectroscopy. A multi-class support vector machine (SVM) performs the discrimination, and recursive SVM selects the spectral segments that are decisive for it. Using only 5 and 8 short spectral segments for the two herbs, discrimination accuracies of 92% are achieved on independent test sample sets. This work not only provides a prototype of an accurate, rapid discrimination system for the quality control of herbal medicines, but also opens new possibilities for studying subtle differences in the chemical composition of herbs grown under different cultivation conditions and for investigating how these differences relate to the effectiveness of the herbs.
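The abstract gives no implementation details, so the following is only a minimal sketch of the two ingredients under stated assumptions: the multi-class SVM is built as one-vs-rest linear SVMs, the spectrum is cut into equal-length contiguous segments, and segments are eliminated recursively by an SVM-RFE-style criterion (retrain, then drop the segment whose weights have the smallest summed square over all classifiers). The paper's recursive SVM may rank segments differently. The linear SVM here is a plain subgradient solver without a bias term, chosen only to keep the example dependency-free; all function names, the toy spectra, the origin labels `A`/`B`/`C`, and the parameters `lam` and `epochs` are hypothetical.

```python
# Sketch only: one-vs-rest linear SVM with SVM-RFE-style recursive segment
# elimination. Segment layout, toy spectra, lam and epochs are hypothetical.
from typing import Dict, List, Sequence

Vector = List[float]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 500) -> Vector:
    """Binary linear SVM without a bias term (labels +1/-1), trained by
    full-batch subgradient descent on lam/2*||w||^2 + mean hinge loss,
    using the Pegasos step size 1/(lam*t)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        grad = [lam * wj for wj in w]  # gradient of the regulariser
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1.0:  # margin violated
                for j in range(d):
                    grad[j] -= yi * xi[j] / n  # hinge-loss subgradient
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def train_one_vs_rest(X: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Multi-class SVM as one binary SVM per class (that class vs. all others)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}


def predict(models: Dict[str, Vector], x: Vector) -> str:
    """Pick the class whose SVM gives the largest decision value."""
    return max(models, key=lambda c: sum(w * v for w, v in zip(models[c], x)))


def keep_segments(X: Sequence[Vector], segments: Sequence[int], seg_len: int) -> List[Vector]:
    """Keep only the columns belonging to the listed contiguous segments."""
    cols = [s * seg_len + k for s in segments for k in range(seg_len)]
    return [[row[j] for j in cols] for row in X]


def recursive_segment_selection(X: Sequence[Vector], labels: Sequence[str],
                                n_segments: int, seg_len: int, keep: int) -> List[int]:
    """Retrain, drop the segment with the smallest summed squared weight over
    all one-vs-rest SVMs, and repeat until `keep` segments remain."""
    segments = list(range(n_segments))
    while len(segments) > keep:
        models = train_one_vs_rest(keep_segments(X, segments, seg_len), labels)
        scores = [sum(w[j] ** 2 for w in models.values()
                      for j in range(p * seg_len, (p + 1) * seg_len))
                  for p in range(len(segments))]
        segments.pop(scores.index(min(scores)))
    return segments


# Toy usage: three hypothetical origins, 6 segments of 2 "wavelengths" each;
# each origin has a peak in one segment and is flat (zero) elsewhere.
SEG_LEN, N_SEG = 2, 6
PEAK = {"A": 1, "B": 3, "C": 4}


def toy_spectrum(origin: str, jitter: float) -> Vector:
    x = [0.0] * (SEG_LEN * N_SEG)
    start = PEAK[origin] * SEG_LEN
    x[start], x[start + 1] = 1.0 + jitter, 1.0 - jitter
    return x


train_X = [toy_spectrum(o, j) for o in PEAK for j in (0.0, 0.05, 0.1, -0.05)]
train_y = [o for o in PEAK for _ in range(4)]

kept = recursive_segment_selection(train_X, train_y, N_SEG, SEG_LEN, keep=3)
models = train_one_vs_rest(keep_segments(train_X, kept, SEG_LEN), train_y)
test_x = keep_segments([toy_spectrum("B", 0.02)], kept, SEG_LEN)[0]
print(kept, predict(models, test_x))  # [1, 3, 4] B
```

On this toy data the flat segments never contribute to the hinge-loss subgradient, so their weights stay at zero and they are eliminated first; the held-out origin-B spectrum is then scored only on the three surviving segments. On real NIR spectra, the number of segments to keep would itself need to be chosen, for example by cross-validation.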
G-protein coupled receptors (GPCRs) are a class of seven-helix transmembrane proteins that are used in bioinformatics as targets to facilitate drug discovery for human diseases. Although thousands of GPCR sequences have been collected, the ligand specificity of many GPCRs is still unknown, and only one crystal structure in the rhodopsin-like family has been solved. Identifying GPCR types from sequence data alone has therefore become an important research issue. This study introduces a novel technique for identifying GPCR types that combines the weighted Levenshtein distance between two receptor sequences with the nearest neighbor method (NNM); the technique handles receptor sequences of different lengths directly. In experiments classifying four classes (acetylcholine, adrenoceptor, dopamine, and serotonin) of the rhodopsin-like GPCR family, the error rates of the leave-one-out and leave-half-out procedures were 0.62% and 1.24%, respectively. These results are superior to those of the covariant discriminant algorithm, the support vector machine method, and the NNM with Euclidean distance.
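A minimal sketch of the distance, the nearest neighbor rule, and the leave-one-out error estimate follows. The abstract does not say how the edit operations are weighted, so the residue groups, the 0.5 cost for substituting within a group, and the unit insertion/deletion cost are hypothetical placeholders, as are the toy sequences and the class labels `x` and `y`. The dynamic program compares sequences of different lengths without padding or truncating them to a fixed length, which is the property the abstract emphasizes.

```python
# Sketch only: residue groups, costs and toy sequences are hypothetical.
from typing import Callable, List, Sequence, Tuple

GROUPS = ["AVLIMC", "FWY", "STNQ", "KRH", "DE", "GP"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def sub_cost(a: str, b: str) -> float:
    """Substitution weight: 0 for identity, 0.5 within a group, 1 otherwise."""
    if a == b:
        return 0.0
    if a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b):
        return 0.5
    return 1.0


def weighted_levenshtein(s: str, t: str,
                         sub: Callable[[str, str], float] = sub_cost,
                         indel: float = 1.0) -> float:
    """Minimum total cost of insertions, deletions and weighted substitutions
    turning s into t (dynamic programming, keeping two rows)."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * indel] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(prev[j] + indel,                          # delete s[i-1]
                         cur[j - 1] + indel,                       # insert t[j-1]
                         prev[j - 1] + sub(s[i - 1], t[j - 1]))    # substitute or match
        prev = cur
    return prev[-1]


Labeled = Tuple[str, str]  # (sequence, receptor class)


def nearest_neighbor(query: str, train: Sequence[Labeled]) -> str:
    """Label of the training sequence closest to `query` (ties: first wins)."""
    return min(train, key=lambda item: weighted_levenshtein(query, item[0]))[1]


def leave_one_out_error(data: List[Labeled]) -> float:
    """Fraction of sequences misclassified when each one is held out in turn."""
    wrong = sum(nearest_neighbor(seq, data[:i] + data[i + 1:]) != label
                for i, (seq, label) in enumerate(data))
    return wrong / len(data)


toy = [("MKVLA", "x"), ("MKVLG", "x"), ("MKILA", "x"),
       ("DEEPW", "y"), ("DEEPF", "y"), ("DQEPW", "y")]
print(weighted_levenshtein("MKVLA", "MKILA"))  # 0.5
print(nearest_neighbor("MKVL", toy))           # x  (query is shorter)
print(leave_one_out_error(toy))                # 0.0
```

Swapping in a different `sub` function or `indel` cost changes only the distance; the nearest neighbor rule and the leave-one-out loop stay the same.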