Tamkang University Institutional Repository: Item 987654321/59898


    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/59898


    Title: Efficient Address Generation for Affine Subscripts in Data-Parallel Programs
    Authors: Shih, Kuei-ping;石貴平;Sheu, Jang-ping;Chang, Chih-yung
    Contributor: Tamkang University, Department of Computer Science and Information Engineering (淡江大學資訊工程學系)
    Keywords: address generation;affine subscripts;data distribution;distributed-memory;multicomputers;data-parallel languages;multiple induction variables (MIVs);single program multiple data (SPMD)
    Date: 2000-09-01
    Upload time: 2011-10-05 22:25:05 (UTC+8)
    Publisher: Dordrecht: Kluwer Academic Publishers
    Abstract: Address generation is an important and necessary phase when a parallelizing compiler translates programs written in HPF into executable SPMD code. This paper presents an efficient compilation technique for generating the local memory access sequences of block-cyclically distributed array references with affine subscripts in data-parallel programs. For an array reference with an affine subscript inside a two-level nested loop, the memory accesses exhibit repetitive patterns at both the outer and the inner loop. We use tables to record the memory accesses of these repetitive patterns. Based on these tables, a new start-computation algorithm is proposed to compute the starting elements on a processor for each outer loop iteration. The complexity of table construction is O(k+s2), where k is the distribution block size and s2 is the access stride of the inner loop. Once the tables are constructed, the starting element for each outer loop iteration can be generated in O(1) time. Moreover, we show that the outer-loop pattern repeats every Pk/gcd(Pk, s1) iterations, where P is the number of processors and s1 is the access stride of the outer loop. Therefore, the total complexity of generating the local memory access sequences for a block-cyclically distributed array with an affine subscript in a two-level nested loop is O(Pk/gcd(Pk, s1)+k+s2).
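    Relation: The Journal of Supercomputing 17(2), pp. 205-227
    DOI: 10.1023/A:1008190606079
    Appears in Collections: [Department and Graduate Institute of Computer Science and Information Engineering] Journal Articles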
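
The periodicity stated in the abstract can be checked on small cases with a brute-force sketch. The code below is not the paper's table-based start-computation algorithm. It only assumes the usual 0-based block-cyclic (cyclic(k)) mapping over P processors, non-negative strides, and a reference of the form A[offset + s1*i + s2*j]. All function names are hypothetical and chosen for illustration.

```python
from math import gcd


def owner_and_local(g, P, k):
    """Map a 0-based global index g to (processor, local address)
    under an assumed cyclic(k) distribution over P processors."""
    course = g // (P * k)       # number of complete rounds of P blocks before g
    proc = (g // k) % P         # processor that owns the block containing g
    local = course * k + g % k  # position of g in that processor's local memory
    return proc, local


def outer_period(P, k, s1):
    """Number of outer iterations after which the start position repeats."""
    return (P * k) // gcd(P * k, s1)


def start_positions(P, k, s1, offset, n_outer):
    """Position inside one P*k cycle of the first element of each outer iteration."""
    return [(offset + s1 * i) % (P * k) for i in range(n_outer)]


def local_accesses(P, k, s1, s2, offset, n_outer, n_inner, proc):
    """Brute-force list of (i, j, local address) for the elements of
    A[offset + s1*i + s2*j] that are owned by processor proc."""
    result = []
    for i in range(n_outer):
        for j in range(n_inner):
            p, local = owner_and_local(offset + s1 * i + s2 * j, P, k)
            if p == proc:
                result.append((i, j, local))
    return result
```

For example, calling `outer_period(4, 8, 6)` and comparing `start_positions(4, 8, 6, 3, 64)` against itself shifted by that period shows the repetition. The brute-force enumeration costs time proportional to the whole iteration space, which is the cost the paper's tables are meant to avoid.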

    Files in This Item:

    File     Description   Size    Format      Views
    33.pdf   -             811Kb   Adobe PDF   32

    All items in the institutional repository are protected by the original copyright.
