Public Cultural Service Platform

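Research on GPGPU performance models (cited by: 1): 2013; The development of GPGPU has brought abundant computing resources to parallel programs, but it also places higher demands on program optimization. Program performance models play an important role in locating performance bottlenecks, guiding optimization methods, and balancing load with other devices. The paper describes the current state of research on performance models, then classifies and analyzes them. Broadly, performance models are divided into statistics-based performance models and analytical performance models; analytical models are further divided into performance metric models, models aware of computation and memory-access parallelism, and per-component quantitative analysis models. The strengths and weaknesses of each model are given, and an interpolation performance model based on statistical information is implemented to guide load balancing. Finally, open problems and future challenges are discussed.; 王锋, 杜云飞, 陈娟; Keywords: GPGPU, GPU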

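A shared-memory-based technique for sharing a GPU among multiple processes: With the development of GPUs, a large number of parallel scientific computing programs use GPUs for acceleration. However, existing GPUs do not support simultaneous access by multiple processes: once a process has initialized the GPU, other processes cannot use it until that process releases it. A shared-memory-based technique for multi-process GPU sharing is proposed...; 杜云飞, 杨灿群, 王锋; Keywords: graphics processing unit, shared memory, data communication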

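Laser-plasma particle simulation on Intel Xeon Phi (cited by: 1): 2014; Laser-plasma particle simulation is widely used to explore scientific problems under extreme states of matter. LARED-P, a three-dimensional plasma particle simulation program based on the particle-in-cell method, is ported to the Intel Xeon Phi coprocessor. The port combines the Native and Offload programming modes: first, Native mode is used to optimize the hotspot computation in LARED-P, where SIMD extension instructions give that computation a 4.61x speedup; then Offload mode is used to port the program to a heterogeneous CPU-Intel Xeon Phi system, where asynchronous data transfer and double buffering improve program performance by 9.8% and 21.8%, respectively.; 姚文科, 杜云飞, 吴强, 杨灿群; Keywords: Intel Xeon Phi, double buffering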

Energy optimization of representative barrier algorithms: 2012; Excessive energy consumption is widely recognized as a critical problem in large-scale parallel computing systems. A LogP-based energy-saving model and a frequency scaling method were proposed to reduce energy consumption analytically and systematically for two other representative barrier algorithms: the tournament barrier and the central counter barrier. Furthermore, energy optimization methods for these two barrier algorithms were implemented on a parallel computing platform. The experimental results validate the effectiveness of the energy optimization methods: energy savings of 67.12% and 70.95% are obtained for the tournament barrier and the central counter barrier, respectively, on platforms with 2048 processes, with a performance loss of 1.55%-8.80%. In addition, the LogP-based energy-saving analytical model for these two barrier algorithms is highly accurate, as the predicted energy savings are within 9.67% of the results obtained by simulation.; 陈娟, 董勇; Keywords: LogP

Programming for scientific computing on peta-scale heterogeneous parallel systems (cited by: 1): 2013; Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities to conduct computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer to use programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming that provides domain-specific programming interfaces but abstracts away some performance-critical architecture details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architecture details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.; 杨灿群, 吴强, 唐滔, 王锋, 薛京灵

Fast weighting method for plasma PIC simulation on GPU-accelerated heterogeneous systems (cited by: 2): 2013; The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC is constrained by the interpolation operations in the weighting process on the GPU (graphics processing unit). To address this problem, a fast weighting method for PIC simulation on GPU-accelerated systems was proposed that avoids atomic memory operations during the weighting process. The method was implemented by taking advantage of the GPU's thread synchronization mechanism and dividing the problem space properly. Moreover, software-managed shared memory on the GPU was employed to buffer the intermediate data. The experimental results show that the method achieves speedups of up to 3.5 times over previous work, and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.; 杨灿群, 吴强, 胡慧俐, 石志才, 陈娟, 唐滔

National Natural Science Foundation of China (61170049)
