Efficient Address Generation for Affine Subscripts in Data-Parallel Programs

doi:10.1023/A:1008190606079


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/59898

Title:	Efficient Address Generation for Affine Subscripts in Data-Parallel Programs
Authors:	Shih, Kuei-ping (石貴平); Sheu, Jang-ping; Chang, Chih-yung
Contributors:	Department of Computer Science and Information Engineering, Tamkang University (淡江大學資訊工程學系)
Keywords:	address generation;affine subscripts;data distribution;distributed-memory;multicomputers;data-parallel languages;multiple induction variables (MIVs);single program multiple data (SPMD)
Date:	2000-09-01
Issue Date:	2011-10-05 22:25:05 (UTC+8)
Publisher:	Dordrecht: Kluwer Academic Publishers
Abstract:	Address generation is an important and necessary phase when a parallelizing compiler translates programs written in HPF into executable SPMD code. This paper presents an efficient compilation technique that generates the local memory access sequences for block-cyclically distributed array references with affine subscripts in data-parallel programs. For an array reference with an affine subscript inside a doubly nested loop, the memory accesses exhibit repetitive patterns in both the outer and the inner loop. We record these repetitive patterns in tables. Using these tables, a new start-computation algorithm computes the starting element on a processor for each outer-loop iteration. Constructing the tables takes O(k + s_2) time, where k is the distribution block size and s_2 is the access stride of the inner loop. Once the tables are built, the starting element for each outer-loop iteration can be generated in O(1) time. We also show that the outer-loop access pattern repeats every Pk/gcd(Pk, s_1) iterations, where P is the number of processors and s_1 is the access stride of the outer loop. Therefore, the total complexity of generating the local memory access sequences for a block-cyclically distributed array with an affine subscript in a doubly nested loop is O(Pk/gcd(Pk, s_1) + k + s_2).
Relation:	The Journal of Supercomputing 17(2), pp. 205-227
DOI:	10.1023/A:1008190606079
Appears in Collections:	[Graduate Institute & Department of Computer Science and Information Engineering] Journal Article
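The periodicity claim in the abstract can be checked with a small brute-force enumeration. The Python sketch below is not the authors' table-based start-computation algorithm; it simply lists, for each outer iteration, the local addresses one processor touches and verifies that this list repeats, up to a constant local offset, every Pk/gcd(Pk, s_1) iterations. It assumes 0-based indices, nonnegative strides and offset, a single reference A(s_1*i + s_2*j + c), and an HPF-style CYCLIC(k) layout in which global element g lives on processor floor(g/k) mod P at local address floor(g/(Pk))*k + (g mod k). The function names and the constant offset c are hypothetical, introduced only for illustration.

```python
from math import gcd

# Assumptions (illustrative only): 0-based indices, nonnegative strides/offset,
# CYCLIC(k) layout over p_count processors; all names here are hypothetical.


def owner_and_local(g, p_count, k):
    """Map global index g to (owning processor, local address) under CYCLIC(k)."""
    block = g // k
    return block % p_count, (block // p_count) * k + g % k


def local_accesses(p, p_count, k, s1, s2, c, n1, n2):
    """Brute-force reference: for each outer iteration i in [0, n1), list the
    local addresses processor p touches for A(s1*i + s2*j + c), j in [0, n2)."""
    rows = []
    for i in range(n1):
        row = []
        for j in range(n2):
            owner, addr = owner_and_local(s1 * i + s2 * j + c, p_count, k)
            if owner == p:
                row.append(addr)
        rows.append(row)
    return rows


def outer_period(p_count, k, s1):
    """Outer-loop period stated in the abstract: Pk / gcd(Pk, s1)."""
    pk = p_count * k
    return pk // gcd(pk, s1)


def pattern_repeats(rows, p_count, k, s1):
    """Check that row i + T equals row i shifted by a constant local offset.

    Advancing i by T moves every global index by s1*T = Pk * (s1 / gcd(Pk, s1)),
    a whole number of Pk-element cycles, so the owner is unchanged and each
    local address grows by k * (s1 / gcd(Pk, s1))."""
    t = outer_period(p_count, k, s1)
    shift = k * (s1 // gcd(p_count * k, s1))
    return all(rows[i + t] == [a + shift for a in rows[i]]
               for i in range(len(rows) - t))


if __name__ == "__main__":
    P, K, S1, S2, C = 4, 3, 5, 2, 1
    print(outer_period(P, K, S1))  # 12
    for p in range(P):
        rows = local_accesses(p, P, K, S1, S2, C, n1=30, n2=10)
        print(p, pattern_repeats(rows, P, K, S1))  # each line ends in True
```

This reference enumerates every (i, j) pair, so it costs O(n1*n2) work per processor. As described in the abstract, the paper's technique avoids that enumeration by building tables in O(k + s_2) time and producing each outer iteration's starting element in O(1), which is why only Pk/gcd(Pk, s_1) distinct outer iterations need to be handled.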

Files in This Item:

File	Description	Size	Format
33.pdf		811 KB	Adobe PDF
