    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/59898

    Title: Efficient Address Generation for Affine Subscripts in Data-Parallel Programs
    Authors: Shih, Kuei-ping (石貴平); Sheu, Jang-ping; Chang, Chih-yung
    Contributors: Tamkang University, Department of Computer Science and Information Engineering (淡江大學資訊工程學系)
    Keywords: address generation;affine subscripts;data distribution;distributed-memory;multicomputers;data-parallel languages;multiple induction variables (MIVs);single program multiple data (SPMD)
    Date: 2000-09-01
    Issue Date: 2011-10-05 22:25:05 (UTC+8)
    Abstract: Address generation is a necessary phase when a parallelizing compiler translates programs written in High Performance Fortran (HPF) into executable SPMD code. This paper presents an efficient compilation technique for generating the local memory access sequences of block-cyclically distributed array references with affine subscripts in data-parallel programs. For an array reference with an affine subscript inside a two-level nested loop, the memory accesses exhibit repetitive patterns at both the outer and inner loops. We record the memory accesses of these repetitive patterns in tables and, based on the tables, propose a new start-computation algorithm that computes the starting element on a processor for each outer-loop iteration. The complexity of constructing the tables is O(k + s2), where k is the distribution block size and s2 is the access stride of the inner loop. Once the tables are built, the starting element for each outer-loop iteration is generated in O(1) time. We also show that the outer-loop pattern repeats every Pk/gcd(Pk, s1) iterations, where P is the number of processors and s1 is the access stride of the outer loop. The total complexity of generating the local memory access sequences for a block-cyclically distributed array with an affine subscript in a two-level nested loop is therefore O(Pk/gcd(Pk, s1) + k + s2).
    Relation: The Journal of Supercomputing 17(2), pp.205-227
    DOI: 10.1023/A:1008190606079
    Appears in Collections:[Graduate Institute & Department of Computer Science and Information Engineering] Journal Article
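
    Illustration (not part of the original record): the block-cyclic layout and the outer-loop period stated in the abstract can be checked with a small brute-force sketch. This is not the paper's table-based start-computation algorithm. It assumes the usual cyclic(k) mapping of a one-dimensional array over P processors and an outer-loop subscript of the form lower + s1*i1. The function names and the parameter `lower` are hypothetical, chosen only for this example.

```python
from math import gcd


def owner_and_local(g, P, k):
    """Map global index g of a cyclic(k) array over P processors
    to (owning processor, local address). Assumed standard layout."""
    course, pos = divmod(g, P * k)   # which round of P blocks, position inside it
    proc, within = divmod(pos, k)    # owning processor, offset inside its block
    return proc, course * k + within


def outer_period(P, k, s1):
    """Number of outer-loop iterations after which the pattern repeats,
    as stated in the abstract: Pk / gcd(Pk, s1)."""
    return (P * k) // gcd(P * k, s1)


def outer_start_pattern(P, k, s1, lower=0, iterations=None):
    """Brute-force list of (owner, offset in block) for the element
    lower + s1*i1 at each outer-loop iteration i1."""
    n = outer_period(P, k, s1) if iterations is None else iterations
    pattern = []
    for i1 in range(n):
        pos = (lower + s1 * i1) % (P * k)
        pattern.append(divmod(pos, k))
    return pattern


if __name__ == "__main__":
    P, k, s1 = 4, 4, 6
    print("period:", outer_period(P, k, s1))
    print("pattern:", outer_start_pattern(P, k, s1, iterations=2 * outer_period(P, k, s1)))
```

    With P = 4, k = 4 and s1 = 6, the printed pattern has two identical halves, which matches the period Pk/gcd(Pk, s1). The paper's contribution is to obtain the same starting elements from precomputed tables in O(1) time per outer-loop iteration instead of enumerating them as done here.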

    Files in This Item:

    There are no files associated with this item.

    All items in the Institutional Repository (機構典藏) are protected by copyright, with all rights reserved.