
National Natural Science Foundation of China (s60573187)

Works: 1  Citations: 2  H-index: 1
Funding source: National Natural Science Foundation of China
Related fields: Cultural Science

Document type

  • 1 Chinese journal article

Field

  • 1 in Cultural Science

Topic

  • 1 on SELECT...
  • 1 on TERM

Publication venue

  • 1 in Tsingh...

Year

  • 1 in 2009
Non-Independent Term Selection for Chinese Text Categorization (Citations: 2)
2009
Chinese text categorization differs from English text categorization in its much larger term set (words or character n-grams), which makes both training and classification with modern high-performance classifiers very slow. This study assumes that this high-dimensionality problem is related to redundancy in the term set, a problem that traditional term selection methods cannot solve. A greedy algorithm framework named "non-independent term selection" is presented, which reduces the redundancy according to string-level correlations. Several preliminary implementations of this idea are demonstrated. Experimental results show that a good tradeoff can be reached between classification performance and the size of the term set.
李景阳 (Li Jingyang), 孙茂松 (Sun Maosong)
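
The abstract names a greedy framework that reduces redundancy using string-level correlations, but it does not give the procedure. The following is only a minimal sketch of what such a selection loop might look like, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `char_ngrams` and `select_terms`, the substring test used as the "string-level" relation, the Jaccard overlap of document sets used as the redundancy measure, the `overlap` threshold, and the precomputed `scores` (which could come from any conventional per-term ranking).

```python
from collections import defaultdict


def char_ngrams(text, n_max=3):
    """Return the set of character n-grams (n = 1..n_max) found in text."""
    return {text[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)}


def select_terms(docs, scores, k, overlap=0.9):
    """Greedily pick up to k terms, skipping string-level redundant ones.

    Hypothetical illustration only:
      docs    -- list of document strings
      scores  -- dict mapping candidate term -> relevance score (assumed given)
      overlap -- assumed Jaccard threshold above which two string-related
                 terms are treated as redundant
    """
    # Record which documents each character n-gram occurs in.
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in char_ngrams(text):
            postings[term].add(doc_id)

    selected = []
    # Visit candidates from highest to lowest score (ties broken by term).
    for term in sorted(scores, key=lambda t: (-scores[t], t)):
        if len(selected) >= k:
            break
        docs_t = postings.get(term, set())
        redundant = False
        for kept in selected:
            # Assumed string-level relation: one term contains the other.
            if term in kept or kept in term:
                docs_k = postings.get(kept, set())
                union = docs_t | docs_k
                if union and len(docs_t & docs_k) / len(union) >= overlap:
                    redundant = True
                    break
        if not redundant:
            selected.append(term)
    return selected


docs = ["清华大学", "清华大学", "北京大学"]
scores = {"清华": 3.0, "清华大": 2.5, "大学": 2.0, "北京": 1.0}
print(select_terms(docs, scores, k=3))
```

In this toy input, "清华大" contains the already selected "清华" and occurs in exactly the same documents, so the sketch skips it even though its score is the second highest. A selector that scores each term independently would keep both.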