
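National Natural Science Foundation of China (61170049)

Works: 5 · Citations: 5 · H-index: 1
Related authors: 杜云飞, 吴强, 王锋, 陈娟, 杨灿群
Related institutions: National University of Defense Technology (国防科学技术大学)
Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China
Related fields: Automation and computer technology; Power engineering and engineering thermophysics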

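Research on GPGPU Performance Models (Citations: 1)
2013
The development of GPGPU has brought abundant computing resources to parallel programs, but it also places higher demands on program optimization. Program performance models play an important role in locating performance bottlenecks, guiding optimization methods, and balancing load with other devices. This paper describes the current state of research on performance models and classifies and analyzes them. Overall, performance models are divided into statistics-based performance models and analytical performance models; analytical models are further divided into performance metric models, models aware of computation and memory-access parallelism, and component-wise quantitative analysis models. The advantages and disadvantages of each class of model are given, and an interpolation performance model based on statistical information is implemented to guide load balancing. Finally, open problems and future challenges are discussed.
王锋, 杜云飞, 陈娟
Keywords: GPGPU, GPU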
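
The abstract above mentions an interpolation performance model built from measured statistics and used to guide load balancing. The paper's actual model is not given here, so the following is only a minimal sketch of the general idea, assuming piecewise-linear interpolation over hypothetical (problem size, seconds) samples and a brute-force search for the CPU/GPU split. The names `interpolate` and `balance_split` and all sample values are illustrative assumptions.

```python
from bisect import bisect_left


def interpolate(samples, n):
    """Predict runtime for size n from sorted (size, seconds) samples.

    Assumption: piecewise-linear between samples, proportional to n below
    the first sample, and extended along the last segment above the last one.
    Requires at least two samples.
    """
    xs = [size for size, _ in samples]
    ys = [secs for _, secs in samples]
    if n <= xs[0]:
        return ys[0] * n / xs[0]
    i = len(xs) - 1 if n >= xs[-1] else bisect_left(xs, n)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (n - x0) / (x1 - x0)


def balance_split(cpu_samples, gpu_samples, total):
    """Return (items_for_gpu, predicted_makespan) minimizing the slower device."""
    best_k, best_t = 0, float("inf")
    for k in range(total + 1):
        t = max(interpolate(gpu_samples, k), interpolate(cpu_samples, total - k))
        if t < best_t:
            best_k, best_t = k, t
    return best_k, best_t


# Hypothetical measurements: the GPU is four times faster per item.
CPU_SAMPLES = [(100, 1.0), (200, 2.0), (400, 4.0)]
GPU_SAMPLES = [(100, 0.25), (200, 0.5), (400, 1.0)]
```

With these made-up samples, `balance_split(CPU_SAMPLES, GPU_SAMPLES, 400)` assigns 320 of the 400 items to the GPU, the point at which both devices are predicted to finish together.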
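A Shared-Memory-Based Technique for Multi-Process GPU Sharing
With the development of GPUs, many parallel scientific computing programs use GPUs for acceleration. However, existing GPUs do not support simultaneous access by multiple processes: once one process has initialized the GPU, other processes cannot use it until it is released. A shared-memory-based technique for multi-process GPU sharing is proposed...
杜云飞, 杨灿群, 王锋
Keywords: graphics processing unit, shared memory, data communication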
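Particle Simulation of Laser Plasma on Intel Xeon Phi (Citations: 1)
2014
Particle simulation of laser plasma is widely used to explore scientific problems under extreme states of matter. LARED-P, a three-dimensional plasma particle simulation program based on the particle-cloud-in-cell method, is ported to the Intel Xeon Phi coprocessor. The port combines the Native and Offload programming modes. First, Native mode is used to study optimization of the hotspot computation task in LARED-P, where SIMD extension instructions give that task a 4.61x speedup. Then Offload mode is used to port the program to a CPU-Intel Xeon Phi heterogeneous system, where asynchronous data transfer and double buffering improve program performance by 9.8% and 21.8%, respectively.
姚文科, 杜云飞, 吴强, 杨灿群
Keywords: Intel Xeon Phi, double buffering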
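
The asynchronous transfer and double buffering named in this abstract follow a common pattern: while one buffer is being computed on, the next chunk is copied into the other. The sketch below shows only that control flow in plain Python and is not the LARED-P implementation. `transfer` and `compute` are hypothetical stand-ins for the host-to-coprocessor copy and the offloaded kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(chunk):
    """Hypothetical stand-in for an asynchronous host-to-coprocessor copy."""
    return list(chunk)


def compute(buffer):
    """Hypothetical stand-in for the offloaded kernel: sum of squares."""
    return sum(x * x for x in buffer)


def run_double_buffered(chunks):
    """Process chunks in order while prefetching the next chunk in the background."""
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(transfer, chunks[0])  # fill the first buffer
        for i in range(len(chunks)):
            current = pending.result()  # wait until chunk i has arrived
            if i + 1 < len(chunks):
                # Start filling the other buffer before computing on this one.
                pending = copier.submit(transfer, chunks[i + 1])
            results.append(compute(current))  # runs while the next copy is in flight
    return results
```

In CPython the interpreter lock limits how much these two pure-Python stand-ins actually overlap; the point of the sketch is the ordering of submit, wait, and compute, which is what hides transfer latency when the copy is real I/O or DMA.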
Energy optimization of representative barrier algorithms
2012
Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency scaling method were proposed to reduce energy consumption analytically and systematically for two other representative barrier algorithms: the tournament barrier and the central counter barrier. Furthermore, energy optimization methods for these two barrier algorithms were implemented on a parallel computing platform. The experimental results validate the effectiveness of the energy optimization methods: energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, on platforms with 2048 processes, with a performance loss of 1.55% to 8.80%. Moreover, the LogP-based energy-saving analytical model for these two barrier algorithms is highly accurate, as the predicted energy savings are within 9.67% of the results obtained by simulation.
陈娟, 董勇
Keywords: LogP
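
For readers unfamiliar with the central counter barrier named in this abstract, here is a minimal sense-reversing version using Python threads. It illustrates only the synchronization algorithm; the paper's energy model and frequency scaling are not represented, and the tournament barrier is not shown. The class and function names are illustrative assumptions.

```python
import threading


class CentralCounterBarrier:
    """Sense-reversing central counter barrier (illustrative sketch)."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0          # shared arrival counter
        self.sense = False      # flipped once per completed barrier episode
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            my_sense = not self.sense
            self.count += 1
            if self.count == self.parties:
                # Last arrival resets the counter and releases everyone.
                self.count = 0
                self.sense = my_sense
                self.cond.notify_all()
            else:
                # Earlier arrivals block until the shared sense flips.
                while self.sense != my_sense:
                    self.cond.wait()


def run_phases(n_threads, n_phases):
    """Each thread logs its phase number, then waits at the barrier."""
    barrier = CentralCounterBarrier(n_threads)
    log, log_lock = [], threading.Lock()

    def worker():
        for phase in range(n_phases):
            with log_lock:
                log.append(phase)
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because no thread can pass the barrier for a phase until every thread has logged that phase, the returned log is always grouped by phase number. Threads that arrive early block at the barrier, and this idle time is presumably what a frequency scaling method such as the one in the abstract exploits.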
Programming for scientific computing on peta-scale heterogeneous parallel systems (Citations: 1)
2013
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
杨灿群, 吴强, 唐滔, 王锋, 薛京灵
Fast weighting method for plasma PIC simulation on GPU-accelerated heterogeneous systems (Citations: 2)
2013
The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC is constrained by the interpolation operations in the weighting process on the GPU (graphics processing unit). To address this problem, a fast weighting method for PIC simulation on GPU-accelerated systems was proposed to avoid atomic memory operations during the weighting process. The method was implemented by taking advantage of the GPU's thread synchronization mechanism and dividing the problem space properly. Moreover, software-managed shared memory on the GPU was employed to buffer the intermediate data. The experimental results show that the method achieves speedups of up to 3.5 times compared to previous works, and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.
杨灿群, 吴强, 胡慧俐, 石志才, 陈娟, 唐滔
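
To illustrate why weighting tends to need atomic operations and how they can be avoided in principle, the sketch below deposits charge on a 1D periodic grid in two ways. It is not the method of the paper, which relies on GPU thread synchronization, problem-space division, and shared memory. The scatter form has several particles writing to the same node, while the swapped-in gather form gives each node exactly one writer. Unit grid spacing, linear weighting, non-negative positions, and all function names are assumptions.

```python
def deposit_scatter(positions, ng, q=1.0):
    """Particle-centric weighting: each particle adds to two grid nodes.

    Run in parallel over particles, the += updates would collide and
    would need atomic operations.
    """
    rho = [0.0] * ng
    for x in positions:
        i = int(x) % ng          # left node of the particle's cell
        f = x - int(x)           # fractional offset within the cell
        rho[i] += q * (1.0 - f)
        rho[(i + 1) % ng] += q * f
    return rho


def deposit_gather(positions, ng, q=1.0):
    """Node-centric weighting: each node sums its own contributions.

    Particles are first binned by cell; node j then reads cell j and
    cell j - 1, so every rho[j] has a single writer and needs no atomics.
    """
    cells = [[] for _ in range(ng)]
    for x in positions:
        cells[int(x) % ng].append(x - int(x))
    rho = [0.0] * ng
    for j in range(ng):  # each iteration is independent of the others
        own = sum(1.0 - f for f in cells[j])
        left = sum(f for f in cells[(j - 1) % ng])
        rho[j] = q * (own + left)
    return rho
```

Both functions deposit the same total charge and agree node by node up to floating-point summation order; the gather form trades the conflicting writes for a binning pass over the particles.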