
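Doctoral Program Foundation of the Ministry of Education of China (20120121120046)

Works: 4  Citations: 23  H-index: 2
Related authors: Zhang Kaixu (张开旭), Zhou Changle (周昌乐), Chen Yidong (陈毅东), Shi Xiaodong (史晓东), Su Jinsong (苏劲松)
Related institutions: Xiamen University
Funding sources: Doctoral Program Foundation of the Ministry of Education of China; National Natural Science Foundation of China; Natural Science Foundation of Fujian Province
Related fields: Automation and Computer Technology; Natural Sciences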

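Document type

  • Chinese journal articles (4)

Field

  • Automation and Comput... (4)
  • Natural Sciences (1)

Topic

  • phrase (1)
  • Chinese (1)
  • Chinese word segmentation (1)
  • maximum entropy (1)
  • unsupervised learning (1)
  • word segmentation (1)
  • TOPIC (1)
  • TOPICA... (1)
  • APPROA... (1)
  • ED (1)
  • LANGUA... (1)
  • vocabulary (1)
  • lexical features (1)
  • part of speech (1)
  • part-of-speech tagging (1)
  • PIVOT (1)
  • GRAPH (1)
  • REORDE... (1)

Institution

  • Xiamen University (2)

Author

  • Su Jinsong (苏劲松) (1)
  • Shi Xiaodong (史晓东) (1)
  • Zhou Changle (周昌乐) (1)
  • Zhang Kaixu (张开旭) (1)
  • Chen Yidong (陈毅东) (1)

Venue

  • Journal of Chinese Information Processing (中文信息学报) (2)
  • China ... (1)
  • Journa... (1)

Year

  • 2014 (3)
  • 2013 (1)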
4 records, showing 1-4
Graph-based Lexicalized Reordering Models for Statistical Machine Translation
2014
Lexicalized reordering models are important components of phrase-based translation systems. Conventional methods learn these models from a word-aligned bilingual corpus by examining the reordering relationships between adjacent phrases, but they ignore the effect of the number of adjacent bilingual phrases. In this paper, we propose a method that takes the number of adjacent phrases into account for better estimation of reordering models. Instead of just checking whether there is one phrase adjacent to a given phrase, our method first uses a compact structure named a reordering graph to represent all phrase segmentations of a parallel sentence. The effect of the adjacent phrase number is then quantified in a forward-backward fashion and finally incorporated into the estimation of reordering models. Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.
SU Jinsong, LIU Yang, LIU Qun, DONG Huailin
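The forward-backward idea in the abstract above can be illustrated with a minimal Python sketch. This is a simplification and not the authors' implementation: phrases are assumed to be plain one-sided spans (start, end) over a sentence of length n, the reordering graph is reduced to a segmentation lattice over those spans, and the function names segmentation_counts and adjacency_weights are hypothetical. The forward pass counts the segmentations of each prefix, the backward pass counts the segmentations of each suffix, and their product gives the share of all segmentations in which two given phrases appear next to each other.

```python
def segmentation_counts(n, spans):
    """Count segmentations of every prefix (alpha) and suffix (beta).

    spans: iterable of (start, end) with 0 <= start < end <= n.
    alpha[i] = number of ways to cover positions 0..i with spans.
    beta[i]  = number of ways to cover positions i..n with spans.
    """
    alpha = [0] * (n + 1)
    beta = [0] * (n + 1)
    alpha[0] = 1
    for end in range(1, n + 1):  # forward pass
        alpha[end] = sum(alpha[s] for (s, e) in spans if e == end)
    beta[n] = 1
    for start in range(n - 1, -1, -1):  # backward pass
        beta[start] = sum(beta[e] for (s, e) in spans if s == start)
    return alpha, beta


def adjacency_weights(n, spans):
    """Fraction of all segmentations in which span (i, k) is directly
    followed by span (k, j). Returns {} if no full segmentation exists."""
    spans = set(spans)
    alpha, beta = segmentation_counts(n, spans)
    total = alpha[n]
    if total == 0:
        return {}
    weights = {}
    for (i, k) in spans:
        for (k2, j) in spans:
            if k2 == k:
                weights[((i, k), (k, j))] = alpha[i] * beta[j] / total
    return weights


# Toy lattice (made-up spans) over a three-word sentence.
toy_spans = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
toy_weights = adjacency_weights(3, toy_spans)
```

A weight of this kind could replace the usual 0/1 adjacency check when accumulating reordering counts, so that phrase pairs that are adjacent in more segmentations contribute more.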
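A Maximum Entropy Phrase Reordering Model with Ensemble Learning (Citations: 3)
2014
The maximum-entropy-based bracketing transduction grammar model offers strong translation capability and simple model training, and has become a focus of statistical machine translation research in recent years. However, the model suffers from an imbalanced distribution of phrase reordering training instances. To address this problem, this paper proposes a training method for phrase reordering models that introduces ensemble learning. Experimental results on large-scale data sets show that the method effectively improves the training of the reordering model and significantly improves translation system performance.
He Zhonghao (何钟豪), Su Jinsong (苏劲松), Shi Xiaodong (史晓东), Chen Yidong (陈毅东), Huang Yanzhou (黄研洲)
Keywords: maximum entropy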
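The abstract above does not say which ensemble method is used, so the following Python sketch is only one plausible reading, under stated assumptions: the reordering classifier is binary (straight vs. inverted), a two-class maximum entropy model is written as plain logistic regression, and imbalance is handled by training several models on balanced subsets (all minority instances plus an equally sized random sample of majority instances) and averaging their probabilities. All function names and the toy features are hypothetical.

```python
import math
import random


def train_logreg(data, dim, epochs=100, lr=0.5):
    """Binary maximum entropy (logistic regression) trained with SGD.
    data: list of (feature_vector, label) with label in {0, 1}."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def train_balanced_ensemble(majority, minority, dim, n_models=5, seed=0):
    """Train one model per balanced subset (assumed ensemble strategy)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        subset = rng.sample(majority, len(minority)) + list(minority)
        rng.shuffle(subset)
        models.append(train_logreg(subset, dim))
    return models


def predict_inverted(models, x):
    """Average the members' probabilities of the 'inverted' class (label 1)."""
    probs = []
    for w, b in models:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return sum(probs) / len(probs)


# Toy data (made up): 40 'straight' instances versus 4 'inverted' ones.
straight = [([1.0, 0.0], 0)] * 40
inverted = [([0.0, 1.0], 1)] * 4
ensemble = train_balanced_ensemble(straight, inverted, dim=2)
```

Because each member sees both orientations equally often, no single model is dominated by the majority class, while averaging over several random subsets still makes use of most majority instances.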
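Unsupervised Learning of Chinese Lexical Features Based on Autoencoders (Citations: 20)
2013
Large-scale unlabeled corpora contain rich lexical information that can help improve Chinese word segmentation and part-of-speech tagging models. This paper extracts distributional information about words from unlabeled corpora and represents it as high-dimensional vectors. It then uses an autoencoder neural network to learn, without supervision, an encoding of these high-dimensional vectors, yielding low-dimensional feature representations that can be used directly in a word segmentation and part-of-speech tagging model. Experiments on the Penn Chinese Treebank 5.0 data set show that the resulting lexical features considerably improve the segmentation and tagging model, and that on part-of-speech tagging they outperform an unsupervised feature learning method that combines principal component analysis with k-means clustering.
Zhang Kaixu (张开旭), Zhou Changle (周昌乐)
Keywords: Chinese word segmentation; part-of-speech tagging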
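The encoding step described above, compressing high-dimensional distributional vectors into low-dimensional features, can be sketched as a small single-hidden-layer autoencoder in plain Python. This is an illustrative sketch and not the paper's network: the layer sizes, sigmoid activations, squared-error loss, learning rate, and the toy vectors are all assumptions, and train_autoencoder is a hypothetical name.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_autoencoder(vectors, hidden=2, epochs=300, lr=0.5, seed=0):
    """Train input -> hidden -> input with SGD on squared reconstruction error.
    Returns (encode, losses): encode maps a vector to its hidden code,
    losses holds the mean reconstruction error of each epoch."""
    rng = random.Random(seed)
    n = len(vectors[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
    b2 = [0.0] * n

    def encode(x):
        return [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                for j in range(hidden)]

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in vectors:
            h = encode(x)
            y = [sigmoid(sum(w * hj for w, hj in zip(W2[i], h)) + b2[i])
                 for i in range(n)]
            total += 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))
            # Backpropagate: output deltas first, then hidden deltas,
            # both computed before any weight is changed.
            dy = [(y[i] - x[i]) * y[i] * (1.0 - y[i]) for i in range(n)]
            dh = [sum(dy[i] * W2[i][j] for i in range(n)) * h[j] * (1.0 - h[j])
                  for j in range(hidden)]
            for i in range(n):
                for j in range(hidden):
                    W2[i][j] -= lr * dy[i] * h[j]
                b2[i] -= lr * dy[i]
            for j in range(hidden):
                for k in range(n):
                    W1[j][k] -= lr * dh[j] * x[k]
                b1[j] -= lr * dh[j]
        losses.append(total / len(vectors))
    return encode, losses


# Toy 'distributional' vectors (made up): one row per word, one column per context.
toy_vectors = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
]
encode, losses = train_autoencoder(toy_vectors)
word_features = [encode(v) for v in toy_vectors]
```

The hidden codes in word_features play the role of the low-dimensional lexical features; in a tagging model they would typically be discretized or used as real-valued features alongside the usual character and word features.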
Topic-aware pivot language approach for statistical machine translation
2014
The pivot language approach for statistical machine translation (SMT) is a good method to break the resource bottleneck for certain language pairs. However, in the implementation of conventional approaches, pivot-side context information is far from fully utilized, resulting in erroneous estimations of translation probabilities. In this study, we propose two topic-aware pivot language approaches to use different levels of pivot-side context. The first method takes advantage of document-level context by assuming that the bridged phrase pairs should be similar in their document-level topic distributions. The second method focuses on the effect of local context. Central to this approach are the assumptions that the phrase sense can be reflected by local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in their latent sense distributions. We then build an interpolated model that brings the above methods together to further enhance system performance. Experimental results on French-Spanish and French-German translations using English as the pivot language demonstrate the effectiveness of topic-based context in pivot-based SMT.
Jin-song SU, Xiao-dong SHI, Yan-zhou HUANG, Yang LIU, Qing-qiang WU, Yi-dong CHEN, Huai-lin DONG
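As a rough illustration of pivot-based estimation, the Python sketch below bridges two phrase tables through a pivot language by summing over shared pivot phrases, which is the conventional triangulation p(t|s) = Σ_p p(t|p) · p(p|s). The optional compatibility weight is only a stand-in for the paper's topic-based constraints: using cosine similarity between topic distributions, and the names triangulate and cosine, are assumptions made here, and the probabilities and topic vectors are made-up toy values.

```python
import math
from collections import defaultdict


def cosine(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def triangulate(src_to_pivot, pivot_to_tgt, compatibility=None):
    """Bridge two phrase tables through the pivot language.

    src_to_pivot: {src: {pivot: p(pivot | src)}}
    pivot_to_tgt: {pivot: {tgt: p(tgt | pivot)}}
    compatibility: optional function (src, pivot, tgt) -> weight >= 0
                   that down-weights topically incompatible bridged pairs.
    Returns {src: {tgt: normalized score}}.
    """
    table = {}
    for src, pivots in src_to_pivot.items():
        scores = defaultdict(float)
        for pivot, p_pivot in pivots.items():
            for tgt, p_tgt in pivot_to_tgt.get(pivot, {}).items():
                weight = 1.0 if compatibility is None else compatibility(src, pivot, tgt)
                scores[tgt] += p_pivot * p_tgt * weight
        total = sum(scores.values())
        if total > 0:
            table[src] = {tgt: s / total for tgt, s in scores.items()}
    return table


# Toy French -> English -> Spanish tables.
fr_en = {"maison": {"house": 0.8, "home": 0.2}}
en_es = {"house": {"casa": 1.0}, "home": {"casa": 0.5, "hogar": 0.5}}

# Toy topic distributions attached to source and target phrases.
fr_topics = {"maison": [0.9, 0.1]}
es_topics = {"casa": [0.8, 0.2], "hogar": [0.1, 0.9]}

plain = triangulate(fr_en, en_es)
topic_aware = triangulate(
    fr_en, en_es,
    compatibility=lambda s, p, t: cosine(fr_topics[s], es_topics[t]),
)
```

With the compatibility weight in place, bridged pairs whose topic distributions disagree receive less probability mass than under plain triangulation, which is the intuition behind using pivot-side context.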