Software transactional memory for GPU architectures in Los Angeles

Program committee, the 19th IEEE International Conference on Parallel and. Software transactional memory for GPU architectures, proceedings. An STM system that supports per-thread transactions faces new challenges due to the distinct characteristics of GPUs. Software transactional memory for GPU architectures, ACM Digital. Hardware support for local memory transactions on GPU.

Kalia, Aiichiro Nakano, Priya Vashishta, Collaboratory for Advanced Computing and Simulations. In this paper, we propose a highly scalable, livelock-free software transactional memory (STM) system for GPUs, which supports per-thread transactions. Modern GPUs have shown promising results in accelerating computation-intensive and numerical workloads with limited dynamic data sharing. Performance characteristics of hardware transactional memory for molecular dynamics application on BlueGene/Q. Transactional memory (TM) is an optimistic approach to imple. AMD GPU architectures are composed of several SIMD computation. STM: software transactional memory; HTM: hardware transactional memory. While transactional memory for processors with hundreds of cores is likely to require hardware support, software implementations will be required for backward compatibility with current and near. Hardware transactional memory for GPU architectures.

To make applications with dynamic data sharing benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). Alejandro Villegas, Angeles Navarro, Rafael Asenjo, Oscar Plata, Toward a software transactional memory for heterogeneous CPU-GPU processors, in press. Transactional memory, data consistency, and multicore synchronization. Software register rollback rarely needed; linear memory write logs in local memory rarely s. Transactional memory may be viewed as a generalized version of the atomic compare-and-swap instruction, which can operate on an arbitrary set of data instead of just one machine word.
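The compare-and-swap analogy above can be made concrete with a small sketch. The code below is not GPU-STM or any published system; it is a minimal CPU-side illustration in Python, assuming a hypothetical `Cell` type for one shared word and a single global lock standing in for the hardware atomicity that a real STM would provide far more scalably. The names `Cell`, `multi_word_cas`, and `transfer` are invented for this example.

```python
import threading


class Cell:
    """One shared word of memory (hypothetical helper for this sketch)."""

    def __init__(self, value):
        self.value = value


# Assumption: a single global lock makes the compare-and-write step atomic.
# This is simple but not scalable; it only illustrates the semantics.
_commit_lock = threading.Lock()


def multi_word_cas(cells, expected, new):
    """Write `new` into `cells` only if every cell still holds `expected`.

    Returns True if all words were updated, False if nothing was written.
    """
    if not (len(cells) == len(expected) == len(new)):
        raise ValueError("cells, expected and new must have equal length")
    with _commit_lock:
        if any(c.value != e for c, e in zip(cells, expected)):
            return False  # at least one word changed: abort, write nothing
        for c, n in zip(cells, new):
            c.value = n
        return True


def transfer(src, dst, amount):
    """Optimistic retry loop: snapshot, compute, attempt to commit."""
    while True:
        s, d = src.value, dst.value  # snapshot, possibly stale
        if multi_word_cas([src, dst], [s, d], [s - amount, d + amount]):
            return
```

A single-word compare-and-swap is the special case where `cells` has one element. The retry loop in `transfer` mirrors the usual pattern: if another thread changed either word between the snapshot and the commit, the commit fails without side effects and the operation is attempted again.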

Xu Liu, ACM Transactions on Architecture and Code Optimization, 2018. To evaluate tlll, we use it to implement six widely used programs, and compare it with the state-of-the-art ad hoc GPU synchronization, GPU software transactional memory (STM), and. J39 Sangpil Lee and Won Woo Ro, parallel GPU architecture simulation. This GPU dictionary explains the difference between memory clocks and core clocks, PCIe transfer rates, shader specs, what a ROP is. Performance characteristics of hardware transactional. Department of Computer Architecture at the University of Malaga. The local memory is usually used as a scratchpad due to its low latency. An ultra-fast scalable many-core motif discovery algorithm for multiple GPUs. Speculative contention avoidance in software transactional memory.
