
Keyword extraction method for microblog based on hashtag (基于标签的微博关键词抽取排序方法)
Abstract: To address the low accuracy of keyword extraction from microblogs, a hashtag-first extraction and ranking method is proposed. The method uses hashtags, a social feature inherent to microblogs, to extract keywords from a collection of microblog posts. It first builds a weighted graph between candidate words and posts from the microblogs themselves, then applies a hashtag-based random walk to the graph, in which the walker repeatedly jumps to hashtag word nodes. After a series of iterations, the stationary probability of each word is obtained, and these probabilities determine the final ranking of the words. The method was tested on real Sina Weibo content. The results show that, compared with extracting keywords from a word-word weighted graph, the hashtag-based method improves keyword extraction accuracy by 50%, indicating that it can effectively improve extraction accuracy in practical applications.
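The procedure outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the edge weights (term frequency of a word in a post), the damping factor of 0.85, the uniform restart over hashtag words, and the function name `rank_keywords` are all assumptions made for illustration, since the abstract does not specify the weighting strategy or the jump probability.

```python
from collections import Counter


def rank_keywords(posts, hashtags, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank words by a random walk on a word-post graph that restarts at hashtag words.

    posts:    list of token lists, one per microblog post
    hashtags: set of words that occur as hashtags
    Term-frequency edge weights and damping=0.85 are illustrative assumptions.
    """
    counts = [Counter(post) for post in posts]        # word-post edge weights
    words = sorted({w for c in counts for w in c})
    word_total = Counter()                            # total weight leaving each word
    for c in counts:
        word_total.update(c)
    post_total = [sum(c.values()) for c in counts]    # total weight leaving each post

    # Restart distribution: uniform over hashtag words found in the posts.
    tag_set = {w for w in words if w in hashtags} or set(words)
    restart = {w: (1.0 / len(tag_set) if w in tag_set else 0.0) for w in words}

    rank = {w: 1.0 / len(words) for w in words}
    for _ in range(max_iter):
        # Step 1: each word passes its probability to the posts containing it.
        post_mass = [
            sum(rank[w] * n / word_total[w] for w, n in c.items()) for c in counts
        ]
        # Step 2: each post passes its probability back to its words;
        # with probability (1 - damping) the walker jumps to a hashtag word.
        new_rank = {w: (1.0 - damping) * restart[w] for w in words}
        for c, mass, total in zip(counts, post_mass, post_total):
            for w, n in c.items():
                new_rank[w] += damping * mass * n / total
        delta = sum(abs(new_rank[w] - rank[w]) for w in words)
        rank = new_rank
        if delta < tol:                               # probabilities are stationary
            break
    return sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    demo_posts = [
        ["world", "cup", "final", "goal"],
        ["world", "cup", "tickets"],
        ["rain", "today"],
    ]
    for word, score in rank_keywords(demo_posts, hashtags={"cup"}):
        print(f"{word}\t{score:.4f}")
```

In this sketch, words that share posts with a hashtag accumulate probability, while words in posts with no path to any hashtag decay toward zero. A word-word graph baseline, as mentioned in the abstract, would instead link words directly by co-occurrence and has no hashtag restart.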
Authors: YE Jingjing (叶菁菁), LI Lin (李琳), ZHONG Luo (钟珞) (School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 2, pp. 563-567, 585 (6 pages)
Funding: National Social Science Fund of China (15BGL048); National 863 Program (2015BAA072)
Keywords: keyword extraction; microblog; hashtag; random walk; weighting strategy
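About the authors: YE Jingjing (born 1992), female, from Yancheng, Jiangsu, master's student; research interests: natural language processing and big data analysis. Corresponding author email: cathylilin@whut.edu.cn. LI Lin (born 1977), female, from Hengyang, Hunan, associate professor, Ph.D., CCF member; research interests: social computing, information retrieval, and recommender systems. ZHONG Luo (born 1957), male, from Wuhan, Hubei, professor, Ph.D., CCF member; research interests: intelligent methods and software engineering.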
