Ontology construction is the core task of ontology-based knowledge representation. This paper explores a semantic description approach based on primitive structure, which allows ontological relations to be described more precisely and concretely. Building on primitive structure, the paper introduces an approach for extracting the primitive structures of words using a multi-label learning model, correlated label propagation. It also proposes a heuristic approach for recognizing the clustering nuclei of word clusters. With this approach, more precise ontological relations can be discovered automatically.
Corpora are an important resource for knowledge acquisition in natural language processing (NLP). However, comparatively few corpora in the biomedical domain focus on the semantic associations among all tokens in a sentence. We propose an annotation scheme based on feature structure theory for enriching biomedical corpora with token semantic association (TSA). We annotated 227 documents from the BioNLP GE ST training data to form the TSA corpus, in which each annotated item represents a token semantic association as a triple. Annotating token semantic associations has the potential to significantly advance biomedical text mining by providing rich token-level semantic information to NLP systems, especially sophisticated information extraction (IE) systems such as those for bio-event extraction.
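
As a rough illustration of how triple-shaped annotations might be consumed by a downstream system, the following minimal sketch stores each association as an immutable record and groups the records by their first element. The field names (`head`, `relation`, `dependent`), the helper `index_by_head`, and the sample values are all hypothetical; the abstract only states that each annotated item appears as a triple and does not specify its components or file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenAssociation:
    """One annotated item as a triple. Field names are assumptions."""
    head: str       # hypothetical: the token that carries the feature
    relation: str   # hypothetical: the name of the semantic association
    dependent: str  # hypothetical: the token that fills the association


def index_by_head(triples):
    """Group triples by their head token for quick lookup."""
    index = defaultdict(list)
    for triple in triples:
        index[triple.head].append(triple)
    return dict(index)


# Invented sample data, not taken from the TSA corpus.
triples = [
    TokenAssociation("expression", "theme", "IL-2"),
    TokenAssociation("expression", "location", "T cells"),
    TokenAssociation("inhibits", "agent", "drug"),
]
by_head = index_by_head(triples)
```

Grouping by the first element is only one possible access pattern; a system that queries by association type would index on `relation` instead.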