Research on scan scheduling in MPP databases on distributed file systems
Abstract: MPP (massively parallel processing) databases built on distributed file systems are a current research focus. To improve how execution units are scheduled to read data blocks before a query's scan operation runs, this paper proposes NLS, a scheduling strategy based on node load. NLS takes both data locality and node load into account: a local-read allocation phase ensures that the schedule has good data locality, and a reallocation phase then adjusts the intermediate schedule according to each node's real-time workload, with the goal of reducing the completion time (makespan) of the scan. Experimental results show that, compared with the continuity scheduling strategy FCS, NLS keeps data locality above 90% while reducing makespan by up to 32%, and by 25% on average across the nine test cases.
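Authors: GUO Kai (1), GONG Caixin (1), GONG Yili (1), LEI Yingchun (2). (1) Computer School, Wuhan University, Wuhan 434000, China; (2) Beijing Daowoo Technology Co., Ltd., Beijing 100000, China
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 13, pp. 84-87, 174 (5 pages)
Funding: National Natural Science Foundation of China, Young Scientists Fund (No. 61100020); National Natural Science Foundation of China, General Program (No. 61572373)
Keywords: distributed file system; database; query scheduling; workload optimization
About the authors: GUO Kai (b. 1991), male, master's student, research interests in distributed systems and databases, E-mail: kayguo@whu.edu.cn; GONG Caixin (b. 1993), male, master's student, research interests in databases and distributed storage; GONG Yili (b. 1976), corresponding author, female, Ph.D., associate professor, research interests in cloud computing and distributed systems; LEI Yingchun (b. 1973), male, Ph.D., associate professor, research interests in distributed file systems and computer networks.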
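
The abstract describes NLS only at a high level: a local-read allocation phase followed by a load-based reallocation phase. The record does not give the paper's actual algorithm, so the following is only a minimal sketch of that two-phase idea. The greedy rebalancing rule, the uniform per-block cost, and the names `schedule_scan`, `locality`, `block_replicas`, `base_load`, and `block_cost` are all assumptions made for illustration.

```python
def schedule_scan(block_replicas, base_load, block_cost=1.0):
    """Hypothetical two-phase scan scheduler (not the paper's exact NLS).

    block_replicas: dict mapping block id -> list of nodes holding a replica
    base_load: dict mapping node -> work already queued on that node
    block_cost: assumed uniform cost of scanning one block
    Returns (assignment, load): block id -> chosen node, node -> total load.
    """
    # Phase 1: local-read allocation. Each block goes to a node that holds
    # a replica (here simply the first one listed), so every read is local.
    assignment = {b: replicas[0] for b, replicas in block_replicas.items()}
    load = dict(base_load)
    for node in assignment.values():
        load[node] = load.get(node, 0.0) + block_cost

    # Phase 2: load-based reallocation. Move one block at a time from the
    # busiest node to the idlest node while the gap exceeds one block's cost.
    while True:
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        if load[busiest] - load[idlest] <= block_cost:
            break
        candidates = [b for b, n in assignment.items() if n == busiest]
        if not candidates:
            break  # the busiest node's load is all pre-existing work
        # Prefer a block that also has a replica on the idlest node,
        # so the move keeps the read local.
        local = [b for b in candidates if idlest in block_replicas[b]]
        block = (local or candidates)[0]
        assignment[block] = idlest
        load[busiest] -= block_cost
        load[idlest] += block_cost
    return assignment, load


def locality(assignment, block_replicas):
    """Fraction of blocks read by a node that holds a replica of them."""
    local = sum(1 for b, n in assignment.items() if n in block_replicas[b])
    return local / len(assignment)


if __name__ == "__main__":
    replicas = {0: ["a", "b"], 1: ["a", "b"], 2: ["a", "c"], 3: ["a", "c"]}
    assignment, load = schedule_scan(replicas, {"a": 0, "b": 0, "c": 0})
    print(assignment, max(load.values()), locality(assignment, replicas))
```

In this sketch, the makespan corresponds to the largest value in `load`, and a move to a node without a replica is the only way locality can drop below 100%. That is one way to picture the trade-off the abstract reports between locality and completion time.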