Public Cultural Service Platform

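Text clustering based on a dependency treebank (cited by: 4): 2011; Text clustering is an important part of information retrieval. To avoid computationally complex clustering algorithms, and to allow clustering features and results to be analyzed and explained from a linguistic point of view, the paper proposes clustering texts by syntactic distribution information. From a Chinese dependency treebank it identifies 10 word-class dependency relations with significant distributional differences. With 5 of these relations as clustering features, the similarity is 71.98% for interview-conversation texts and 83.13% for news-broadcast texts. The experimental results confirm that clustering texts by dependency relations is feasible and effective.; Gao Song, Feng Zhiwei; Keywords: text clustering, clustering features, word classes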
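As a rough illustration of the idea in this abstract, the sketch below represents each text by the relative frequencies of a few dependency relation types and compares texts with cosine similarity. The relation labels, the toy counts, and the choice of cosine similarity are all assumptions made for illustration; the abstract does not say which similarity measure or clustering procedure the authors used.

```python
from collections import Counter
from math import sqrt

# Hypothetical relation labels; the paper's actual 5 features are not listed in the abstract.
FEATURES = ["noun-verb", "adverb-verb", "pronoun-verb"]

def relation_profile(relations, features=FEATURES):
    """Relative frequency of each feature relation among all relations in a text."""
    counts = Counter(relations)
    total = len(relations)
    return [counts[f] / total for f in features]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Made-up toy texts, each given as a list of dependency relation labels.
interview_a = ["pronoun-verb"] * 5 + ["adverb-verb"] * 3 + ["noun-verb"] * 2
interview_b = ["pronoun-verb"] * 4 + ["adverb-verb"] * 4 + ["noun-verb"] * 2
news_a = ["noun-verb"] * 7 + ["adverb-verb"] * 2 + ["pronoun-verb"] * 1

within = cosine(relation_profile(interview_a), relation_profile(interview_b))
across = cosine(relation_profile(interview_a), relation_profile(news_a))
print(f"interview vs interview: {within:.3f}")
print(f"interview vs news:      {across:.3f}")
```

On this toy data the two interview-like texts come out more similar to each other than to the news-like text, which is the kind of separation a clustering step would rely on.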

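Applying quantitative features of Chinese registers to text clustering (cited by: 35): 2009; The paper proposes a way to apply the results of quantitative linguistic research to text clustering. Using two corpus samples of 500,000 words each, it identifies 16 linguistic structural features whose distributions differ significantly between the spoken and written registers of modern Chinese. With 7 of these as text representation features, the experimental texts are accurately clustered into a spoken-register group (similarity 89.84%) and a written-register group (similarity 86.93%). Representing texts by quantitative features of linguistic structure improves the interpretability of clustering and classification research and has considerable theoretical and practical value. Corpus-based, statistical measurement of register features is an important method for the descriptive study of Chinese registers, and the paper sets out its theoretical basis.; Huang Wei, Liu Haitao; Keywords: text clustering, register features, linguistic structure, spoken Chinese, written Chinese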

Statistical properties of Chinese semantic networks (cited by: 16): 2009; Almost all language networks at the word and syntactic levels are small-world and scale-free. This raises the question of whether a language network at a deeper semantic or cognitive level has similar properties. To answer the question, we built a Chinese semantic network based on a treebank with semantic role (argument structure) annotation and investigated its global statistical properties. The results show that although the semantic network is also small-world and scale-free, it differs from the syntactic network in hierarchical structure and K-Nearest-Neighbor correlation.; LIU HaiTao; Keywords: semantic network, statistical properties, statistical characteristics, small world, scaling

Language clustering with word co-occurrence networks based on parallel texts (cited by: 6): 2013; This study investigates the feasibility of applying complex networks to fine-grained language classification and of employing word co-occurrence networks based on parallel texts as a substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed based on parallel texts of 12 Slavic languages and 2 non-Slavic languages, respectively. With appropriate combinations of major parameters of these networks, cluster analysis was able to distinguish the Slavic languages from the non-Slavic ones and correctly group the Slavic languages into their respective sub-branches. Moreover, the clustering could also capture the genetic relationships of some of these Slavic languages within their sub-branches. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and that they constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification.; LIU HaiTao, CONG Jin; Keywords: network construction, text clustering, complex networks

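A study of central nodes in Chinese syntactic networks (cited by: 19): 2011; Based on Chinese dependency syntactic treebanks of two registers, three Chinese function words were selected as objects of study according to word frequency and distribution rate statistics. For the three function-word nodes, the following network measures were computed: degree, out-degree, in-degree, closeness, in-closeness, out-closeness, and betweenness. The three nodes were then removed from the network, and the networks before and after removal were compared on number of nodes, average degree, average path length, network diameter, number of isolated nodes, maximum range, and density. The results show that all three function words are central nodes of the network, but their positions differ, and their influence on the overall structure of the network also differs considerably. The study offers a new method for research on Chinese function words and a new perspective for research on node properties in complex networks.; Chen Xinying, Liu Haitao; Keywords: complex networks, central nodes, language networks, function words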

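Quantitative linguistics: current state, theories, and methods (cited by: 52): 2012; Quantitative linguistics takes as its object of study the linguistic phenomena, linguistic structures, structural properties, and their interrelations that appear in real language communication. It uses quantitative mathematical methods such as probability theory, stochastic processes, differentials and differential equations, and function theory to measure, observe, simulate, model, and explain them precisely, in order to find the mathematical laws behind linguistic phenomena, reveal the internal causes of those phenomena, and explore the self-adaptive mechanisms of the language system and the driving forces of language evolution. The paper analyzes the current state, theories, and methods of quantitative linguistics and clarifies the direction of the discipline's further development, with the aim of making Chinese linguistics more international and linguistic research more scientific.; Liu Haitao, Huang Wei; Keywords: Chinese, Zipf, mathematical linguistics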

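A quantitative study of the grammatical functions of modern Chinese nouns based on a dependency treebank (cited by: 5): 2010; Automatic syntactic parsing of modern Chinese needs quantitative information on the syntactic functions of word classes. Based on probabilistic valency pattern theory and using a Chinese dependency treebank, this paper carries out a quantitative study of the syntactic functions of modern Chinese nouns. It divides the syntactic functions of nouns into typical and atypical functions according to their frequency of occurrence, and gives the correlated markedness pattern and the probabilistic valency pattern of noun syntactic functions. In this way it verifies and supplements earlier research conclusions from a quantitative perspective, helps to give a clearer picture of the syntactic functions of Chinese nouns, and provides a reference for grammar teaching in Chinese as a foreign language.; Gao Song; Keywords: nouns, syntactic functions, correlated markedness pattern, teaching Chinese as a foreign language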

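Counting dependency structure trees (cited by: 2): 2009; Trees are an important data structure, and the dependency structure tree is a particular kind of tree that is widely used in language information processing. The paper studies the problem of counting dependency structure trees. It first gives a formal description of dependency structure trees together with five of their properties. It then uses partition schemes and partition sequences of n ordered elements to derive counting formulas for dependency forests and dependency structure trees. Finally, it gives the counts of dependency structure trees of up to 8 words.; Hu Fengguo, Huang Wei, Liu Haitao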
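The abstract does not reproduce the paper's counting formulas, so the following is only a brute-force sketch and not the authors' method. It assumes a dependency tree over n words is a head assignment with exactly one root and no cycles, enumerates all such assignments, and optionally keeps only projective trees (a common restriction; whether the paper imposes it is not stated here). All function names are hypothetical.

```python
from itertools import product

def is_tree(heads):
    """heads[i-1] is the head of word i (words are 1-based); 0 marks the root."""
    if heads.count(0) != 1:
        return False
    for word in range(1, len(heads) + 1):
        seen = set()
        node = word
        while node != 0:          # walk up towards the root
            if node in seen:      # revisiting a node means there is a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def is_projective(heads):
    """Every word strictly between a head and its dependent must descend from that head."""
    def descends_from(node, ancestor):
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
        return False

    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue
        lo, hi = sorted((dep, head))
        if not all(descends_from(k, head) for k in range(lo + 1, hi)):
            return False
    return True

def count_trees(n, projective_only=False):
    """Count single-rooted dependency trees over n ordered words by enumeration."""
    total = 0
    for heads in product(range(n + 1), repeat=n):
        if any(h == i for i, h in enumerate(heads, start=1)):
            continue              # a word cannot be its own head
        if not is_tree(heads):
            continue
        if projective_only and not is_projective(heads):
            continue
        total += 1
    return total

for n in range(1, 6):
    print(n, count_trees(n), count_trees(n, projective_only=True))
```

Without the projectivity filter the counts agree with Cayley's formula n^(n-1) for rooted labelled trees, which gives a sanity check on the enumeration; for 3 words, for example, there are 9 trees, of which 7 are projective. The enumeration grows as n^n, so it is only practical for small n.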

Language clusters based on linguistic complex networks (cited by: 5): 2010; To investigate the feasibility of using complex networks in the study of linguistic typology, this paper builds and explores 15 linguistic complex networks based on the dependency syntactic treebanks of 15 languages. The results show that it is possible to classify human languages by means of the following main parameters of complex networks: (a) average degree of the node, (b) cluster coefficients, (c) average path length, (d) network centralization, (e) diameter, (f) power exponent of degree distribution, and (g) the determination coefficient of power law distributions. The precision of this method is similar to the results achieved by means of modern word order typology. This paper tries to solve two problems of current linguistic typology. First, the language sample of a typological study is not real text; second, typological studies pay too much attention to local language structures when choosing typological parameters. This study performs better on global typological features of language. It not only enhances typological methods but is also valuable for developing the applications of complex networks in the humanities, social sciences, and life sciences.; LIU HaiTao, LI WenWen; Keywords: human language typology
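To make some of the parameters listed in this abstract concrete, here is a minimal pure-Python sketch that computes (a) average degree, (b) the mean local clustering coefficient, (c) average shortest path length, and (e) diameter on a small undirected graph. The toy edge list is made up, the graph is assumed to be connected, and the exact definitions the authors used (for example, for the clustering coefficient) are not given in the abstract, so these are standard textbook versions rather than the paper's own. Parameters (d), (f), and (g) are left out.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def average_degree(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def clustering_coefficient(graph):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours, each unordered pair once.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)

def bfs_distances(graph, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def path_stats(graph):
    """Average shortest path length and diameter over all reachable pairs."""
    lengths = [d for s in graph
               for t, d in bfs_distances(graph, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)

# Made-up toy network of word types linked by (undirected) dependencies.
edges = [("the", "dog"), ("dog", "chased"), ("chased", "cat"),
         ("the", "cat"), ("dog", "cat")]
graph = build_graph(edges)
avg_path, diameter = path_stats(graph)
print("average degree:", average_degree(graph))
print("clustering coefficient:", round(clustering_coefficient(graph), 4))
print("average path length:", round(avg_path, 4))
print("diameter:", diameter)
```

A vector of such parameters per language could then be fed to any standard cluster analysis; the abstract does not say which clustering procedure was used.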

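A corpus-based diachronic study of the syntactic valency of Chinese verbs (cited by: 5): 2011; This paper studies the evolution of verb syntactic valency from a diachronic perspective. It builds corpora of three forms of Chinese (classical Chinese, early vernacular, and modern vernacular), selects ten major verbs as objects of study, describes how example sentences were chosen and analyzed, and carries out a statistical analysis of the complements and adjuncts of these verbs. The results show that, under natural conditions, Chinese grammar evolves gradually, and that across classical Chinese, early vernacular, and modern vernacular the syntactic structure shows a trend towards greater complexity. The study also finds that classical Chinese and early vernacular are closer to each other in sentence structure, and that sentence complexity rises markedly after the shift from early vernacular to modern vernacular.; Liu Bingli, Liu Haitao; Keywords: verbs, valency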

National Social Science Fund of China (09BYY024)
