To address the problem of ontology comprehension, the structural features and the potentially important terms of a large-scale ontology are investigated from the perspective of complex network analysis. Through empirical studies of the Gene Ontology from several perspectives, this paper shows that the Gene Ontology as a whole displays the same topological features as complex networks, namely the "small world" and "scale-free" properties, while some sub-ontologies have the "scale-free" property but no "small world" effect. The potentially important terms in an ontology are identified using several well-known complex network centrality measures. An evaluation method based on information retrieval in MEDLINE is designed to measure how effective the identified terms are. Using the literature relevant to the Gene Ontology terms, the suitability of these centrality measures for discovering important ontology concepts is evaluated quantitatively. The experimental results indicate that betweenness centrality is the most appropriate of the evaluated centrality measures.
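As an illustration of the measure the abstract singles out, the following minimal sketch computes betweenness centrality with Brandes' algorithm on a toy term graph. The term names and edges are hypothetical, and treating the ontology as an unweighted, undirected graph is an assumption made here for simplicity; the abstract does not state how the paper models edges.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each node to the set of its neighbours.
    Returns unnormalised betweenness (number of shortest paths through a node).
    """
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from `source`, counting shortest paths.
        visit_order = []
        predecessors = {v: [] for v in graph}
        path_count = {v: 0 for v in graph}
        path_count[source] = 1
        distance = {v: -1 for v in graph}
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in graph}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in centrality.items()}

# Hypothetical toy ontology: a root term with two children and three leaves.
edges = [("root", "A"), ("root", "B"), ("A", "C"), ("A", "D"), ("B", "E")]
toy_graph = {}
for u, v in edges:
    toy_graph.setdefault(u, set()).add(v)
    toy_graph.setdefault(v, set()).add(u)

scores = betweenness_centrality(toy_graph)
ranked_terms = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the term `A` ranks above `root`, because more shortest paths between other terms pass through it. This shows how betweenness can surface terms that are not simply the topmost or the best-connected ones.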
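All-words word sense disambiguation (WSD) can be regarded as a sequence labeling problem. This paper proposes two sequence-labeling methods for all-words WSD, based on the hidden Markov model (HMM) and the maximum entropy Markov model (MEMM), respectively. First, we model all-words WSD with an HMM. Then, to overcome the HMM's limitation that it can exploit only word-form observations, we generalize the HMM to an MEMM, which integrates a large number of contextual features into the model. Because all-words WSD has a very large state space, both the HMM and the MEMM suffer from data sparsity and high time complexity; we address these problems with a beam-search Viterbi algorithm and smoothing strategies. Finally, we evaluate the methods on the Senseval-2 and Senseval-3 data sets. The proposed MEMM method achieves an F1 score of 0.654, outperforming all sequence-labeling-based methods in that evaluation.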
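The combination of beam-search Viterbi decoding and smoothing can be sketched as follows for the HMM case. This is a minimal illustration under stated assumptions, not the paper's implementation: the sense inventory, the transition counts, and the choice of add-one smoothing are all hypothetical, and each sense is assumed to emit its own word with probability 1, so emissions are omitted.

```python
import math
from collections import Counter

# Hypothetical sense inventory: each word maps to its candidate senses (HMM states).
SENSES = {
    "bank": ["bank%finance", "bank%shore"],
    "interest": ["interest%money", "interest%curiosity"],
}
# Hypothetical sense-bigram counts from a sense-tagged corpus; "<s>" marks sentence start.
TRANSITIONS = Counter({
    ("<s>", "bank%finance"): 3,
    ("<s>", "bank%shore"): 1,
    ("bank%finance", "interest%money"): 4,
    ("bank%shore", "interest%curiosity"): 1,
})
N_STATES = sum(len(senses) for senses in SENSES.values())

def log_trans(prev, cur):
    """Add-one smoothed log P(cur | prev); unseen sense bigrams keep non-zero mass."""
    total = sum(c for (p, _), c in TRANSITIONS.items() if p == prev)
    return math.log((TRANSITIONS[(prev, cur)] + 1) / (total + N_STATES))

def beam_viterbi(words, beam_width=2):
    """Viterbi decoding that keeps only the `beam_width` best states per position."""
    beam = [(log_trans("<s>", s), [s]) for s in SENSES[words[0]]]
    beam = sorted(beam, key=lambda item: item[0], reverse=True)[:beam_width]
    for word in words[1:]:
        best_for_state = {}
        for score, path in beam:
            for s in SENSES[word]:
                new_score = score + log_trans(path[-1], s)
                # Standard Viterbi step: keep the best-scoring path ending in state s.
                if s not in best_for_state or new_score > best_for_state[s][0]:
                    best_for_state[s] = (new_score, path + [s])
        # Beam pruning: drop all but the highest-scoring states.
        beam = sorted(best_for_state.values(), key=lambda item: item[0], reverse=True)[:beam_width]
    return beam[0]

best_score, best_path = beam_viterbi(["bank", "interest"], beam_width=2)
```

Pruning bounds the work per position by the beam width rather than by the full number of senses, which is what makes decoding tractable when the state space is very large. The price is that the search is no longer guaranteed to find the globally best path.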