- Introduction
- Some links to further HPF information
- Data storage
- Execution
- Intrinsic functions
- Storage rules
- HPF Subset
- HPF 2.0

The High Performance Fortran Forum (HPFF), a group based in Houston, has developed a proposal in the form of an extension to Fortran 90. The aim of this project, High Performance Fortran or HPF, is to offer a portable language which permits efficient use of different parallel systems. The project issued its final proposal in May 1993 and aims at a de facto standard rather than a formal standard from ANSI or ISO. In order to facilitate the introduction and general acceptance of HPF, the group has also defined a subset based on Fortran 77 and just a few parts of Fortran 90. A number of manufacturers were involved in the group, and the proposal will hopefully gain fast acceptance, with many implementations. Those parts of the proposal that were controversial, and therefore require further study, are available in a separate document, the "Journal of Development".

A very good book on HPF is the one by Koelbel et al.

The High Performance Fortran Forum has continued to develop the HPF language and is aiming to publish a new specification later this year. New features address asynchronous I/O, tasking, more general distributions, interfacing to ANSI C, reduction operations and more. It is likely that the core language will not be significantly expanded and that the specification will define extensions which are standardized but optional for a given implementation. There will also be a definition of a kernel of facilities which should be most efficient across a wide range of implementations.

The proposal is a Fortran-based language with facilities to control distribution of arrays onto distributed memory parallel computers and includes extensions for

- Parallel execution
- Good performance both on MIMD and SIMD computers
- Distribution of data on the available processors
- Location of data within a specific processor

- New directives
- New syntax (the statement `FORALL` and new intrinsic functions)
- Restrictions in the rules for storage

- MIL-STD-1753 (US military extension to Fortran 77, it is also included in Fortran 90)
- Array-operations
- All HPF-constructs, but with certain limitations in their functionality

    CHPF$ directive     ! Fortran 77 and fixed form Fortran 90
    !HPF$ directive     ! Fortran 90 (both fixed and free form)

Since the founding of the group HPFF, much has been accomplished, and commercial implementations of HPF version 1.0 are now appearing: Applied Parallel Research, Digital, Intel, Kuck and Associates, Meiko, Motorola, NEC, ACE, Hitachi, the Portland Group, Inc. and SofTech have already announced commercial products based on the HPF 1.0 standard (some of these are joint ventures).

Further information on the DEC Fortran 90, which includes HPF, is available.

The Portland Group has a High Performance Fortran Compiler available.

Applied Parallel Research has a High Performance Fortran Compiler available.

The current (source: September 1995 HPFF Minutes) vendor implementation list is:

#### Announced Products

Applied Parallel Research, Digital, Hitachi, Intel, Meiko, Motorola, NA Software, NEC, Pacific Sierra Research, The Portland Group, Inc. (PGI), and SofTech.

#### Announced Efforts

ACE, Fujitsu, IBM, Lahey, MasPar, NAG, nCube, and Thinking Machines.

#### Interested

Cray Research, EPC, Convex, HP, Silicon Graphics, and Sun.

- High Performance Fortran Forum
- High Performance Fortran Forum European mirror
- Latest versions of the reports, available in PostScript and HTML
- High Performance Fortran Language Specification HPF 1, May 3, 1993
- HPF Journal of Development May 3, 1993

- An operation on several data objects ought to be faster if all the objects are on the same processor.
- It ought to be possible to perform many operations at the same time if the operations are performed on different processors.

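    REAL A(1000), B(1000), C(1000), X(500), Y(0:501)
    INTEGER INX(1000)
    !HPF$ PROCESSORS PROCS(10)
    !HPF$ DISTRIBUTE A(BLOCK) ONTO PROCS
    !HPF$ DISTRIBUTE B(BLOCK) ONTO PROCS
    !HPF$ DISTRIBUTE INX(BLOCK) ONTO PROCS
    !HPF$ DISTRIBUTE C(CYCLIC) ONTO PROCS
    !HPF$ ALIGN X(I) WITH Y(I-1)
    A(I) = B(I)                           ! (1)
    X(I) = Y(I-1)                         ! (2)
    A(I) = C(I)                           ! (3)
    A(I) = A(I-1) + A(I) + A(I+1)         ! (4)
    C(I) = C(I-1) + C(I) + C(I+1)         ! (5)
    X(I) = Y(I)                           ! (6)
    A(I) = A(INX(I)) + B(INX(I))          ! (7)

We here work with 10 processors, and we have distributed the floating point vectors `A` and `B` and the integer vector `INX` in blocks, so that the first hundred elements are on the first processor, and so on. The vector `C` is distributed cyclically, so that consecutive elements are on consecutive processors.

Without instructing the system exactly how the two vectors `X` and `Y` are to be distributed, we have specified that `X(I)` is to be on the same processor as `Y(I-1)`.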

The assignments in cases (1) and (2) above can thus be done without any inter-processor communication, since the data involved are always on the same processor.

The assignment in case (3) will in most instances be between different processors; only in exceptional cases will the data be on the same processor.

The assignment in case (4), on the other hand, will in most instances be between data on the same processor, the exceptions being at block boundaries. Case (5) looks like case (4), but here the data are always on different processors.

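For case (6) we only know that `X(I)` and `Y(I-1)` are on the same processor; whether `Y(I)` is there as well depends on how `Y` is distributed, which has not been specified.

We do not know very much about the distribution in case (7). It is however easy to realize that `A(INX(I))` and `B(INX(I))` are always on the same processor, since `A` and `B` are distributed in the same way, and likewise that `A(I)` and `INX(I)` are on the same processor.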
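The reasoning for cases (1), (3), (4) and (5) can be checked numerically. The following is a small Python model, not HPF. It assumes the usual mappings: with `BLOCK` on 10 processors, element `I` of a 1000-element vector is on processor `(I-1)/100` (integer division), and with `CYCLIC` it is on processor `(I-1) mod 10`. All function names are invented for this illustration.

```python
N = 1000        # vector length, as in the declarations above
NPROCS = 10     # size of the processor arrangement PROCS

def block_owner(i, n=N, p=NPROCS):
    """0-based processor that holds element i (1-based) under BLOCK."""
    block = -(-n // p)          # ceiling of n/p, the assumed block size
    return (i - 1) // block

def cyclic_owner(i, p=NPROCS):
    """0-based processor that holds element i (1-based) under CYCLIC."""
    return (i - 1) % p

def remote_count(indices, owner_lhs, rhs):
    """Count the iterations that read at least one remote operand.

    rhs is a list of (owner_function, offset) pairs, one per operand.
    """
    count = 0
    for i in indices:
        home = owner_lhs(i)
        if any(owner(i + off) != home for owner, off in rhs):
            count += 1
    return count

full = range(1, N + 1)
inner = range(2, N)             # I-1 and I+1 must stay within 1..N

case1 = remote_count(full, block_owner, [(block_owner, 0)])
case3 = remote_count(full, block_owner, [(cyclic_owner, 0)])
stencil_a = [(block_owner, -1), (block_owner, 0), (block_owner, 1)]
stencil_c = [(cyclic_owner, -1), (cyclic_owner, 0), (cyclic_owner, 1)]
case4 = remote_count(inner, block_owner, stencil_a)
case5 = remote_count(inner, cyclic_owner, stencil_c)

print(case1, case3, case4, case5)
```

Under these assumptions the model reports 0, 900, 18 and 998 iterations that need data from another processor: case (1) is entirely local, case (3) is local for only one element in ten, case (4) communicates only at the block boundaries, and case (5) communicates in every iteration.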

Also the formal arguments in a function or a subroutine can be stored in different ways using directives. Assume that we have a subroutine with two arguments, both floating point vectors.

    SUBROUTINE CAROLUS (KARL, CHARLES)
    REAL, DIMENSION (:) :: KARL, CHARLES
    !HPF$ INHERIT :: KARL
    !HPF$ ALIGN WITH KARL :: CHARLES
    ! ...
    END SUBROUTINE CAROLUS

This is called with

    PROGRAM APP83B
    REAL, DIMENSION (1718) :: GUSTAV, ADOLF
    INTERFACE
       SUBROUTINE CAROLUS (KARL, CHARLES)
       REAL, DIMENSION (:) :: KARL, CHARLES
       END SUBROUTINE CAROLUS
    END INTERFACE
    ! ..
    CALL CAROLUS (GUSTAV, ADOLF)
    END PROGRAM APP83B

This means that the first formal argument `KARL` inherits the distribution of the actual argument `GUSTAV`, while the second formal argument `CHARLES` is aligned with `KARL`.

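Another example is when you require the four corner submatrices of the size N*N of a matrix of the size (N+1)*(N+1).

    !HPF$ TEMPLATE, DISTRIBUTE (BLOCK, BLOCK) :: EARTH (N+1, N+1)
    REAL, DIMENSION (N, N) :: NW, NE, SW, SE
    !HPF$ ALIGN NW(I, J) WITH EARTH(I, J)
    !HPF$ ALIGN NE(I, J) WITH EARTH(I, J+1)
    !HPF$ ALIGN SW(I, J) WITH EARTH(I+1, J)
    !HPF$ ALIGN SE(I, J) WITH EARTH(I+1, J+1)

Since a template does not occupy any storage, `EARTH` only describes the distribution; storage is allocated just for the four submatrices.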

A somewhat more complicated directive is

    !HPF$ ALIGN A(:) WITH D(:,*)

which means that a copy of the vector `A` is aligned with every column of the array `D`.

    !HPF$ TEMPLATE D1(N), D2(N, N)
    REAL, DIMENSION (N, N) :: X, A, B, C, AR1, AR2, &
                              P, Q, R, S
    !HPF$ ALIGN X(:,*) WITH D1(:)
    !HPF$ ALIGN (:,*) WITH D1 :: A, B, C, AR1, AR2
    !HPF$ ALIGN WITH D2, DYNAMIC :: P, Q, R, S

The following is a more complete example, in which an intrinsic function finds the current number of processors, which is then used in the distribution of the arrays. In addition an ordinary (external) function is used.

    PROGRAM APP83C
    IMPLICIT NONE
    INTERFACE
       FUNCTION F(X)
       REAL :: F
       REAL, DIMENSION (:) :: X
       END FUNCTION F
    END INTERFACE
    REAL, DIMENSION (1000) :: X, Y, Z
    REAL, DIMENSION (5000) :: V, W
    REAL :: TEMP
    REAL, DIMENSION (10,1000) :: A
    !HPF$ PROCESSORS MPP(NUMBER_OF_PROCESSORS())
    !HPF$ TEMPLATE T(1000), S(5000)
    !HPF$ DISTRIBUTE T(BLOCK) ONTO MPP     ! Block size may
    !HPF$ DISTRIBUTE S(CYCLIC) ONTO MPP    ! be specified
    !HPF$ ALIGN WITH T :: X, Y, Z
    !HPF$ ALIGN WITH S :: V, W
    !HPF$ ALIGN A(*,:) WITH T(:)           ! Vector of columns
    ! ...
    TEMP = F(V)
    ! ...
    END PROGRAM APP83C

    REAL FUNCTION F(X)
    IMPLICIT NONE
    REAL, DIMENSION (:) :: X
    !HPF$ INHERIT :: X       ! Distribute the formal argument
                             ! X as the real argument V
    REAL, DIMENSION (SIZE(X)) :: S
    !HPF$ ALIGN WITH X :: S  ! Distribute the local vector S
                             ! as the formal argument X, or
                             ! as the actual argument V
    ! ...
    F = SUM(S)               ! Only dummy example
    RETURN
    END FUNCTION F

You can specify different distributions along different dimensions. The following specifications

    REAL A(100,100), B(100,100), C(200)
    !HPF$ DISTRIBUTE A(BLOCK,*), B(*,CYCLIC), C(CYCLIC(5))

mean that the first processor of a four processor computer stores the following array sections

    A(1:25, 1:100)
    B(1:100, 1:97:4)
    C(1:5), C(21:25), C(41:45), C(61:65), C(81:85),
    C(101:105), C(121:125), C(141:145), C(161:165), C(181:185)

It is also possible to distribute several dimensions in completely independent ways,

    REAL D(8,100,100)
    !HPF$ DISTRIBUTE D(*,BLOCK, CYCLIC)

means that the first processor of a four processor computer, configured as a 2*2 matrix, stores the following array sections

    D(1:8, 1:50, 1:99:2)

In addition to the static directives discussed above, there are also the two dynamic directives `REALIGN` and `REDISTRIBUTE`.
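The array sections listed earlier for the first processor can be reproduced with a small Python model (not HPF; the function names are invented). It assumes the standard meaning of the distribution formats: `BLOCK` gives each processor one contiguous block of ceiling(n/p) indices, and `CYCLIC` deals the indices out one at a time. The sections listed for `C` correspond to blocks of five elements dealt out cyclically, which the HPF specification writes as `CYCLIC(5)`.

```python
def block_indices(n, nprocs, proc):
    """1-based indices of an extent-n dimension owned by proc under BLOCK."""
    size = -(-n // nprocs)                  # ceiling(n / nprocs)
    first = proc * size + 1
    return list(range(first, min(first + size - 1, n) + 1))

def cyclic_indices(n, nprocs, proc, m=1):
    """1-based indices owned by proc when blocks of m are dealt out cyclically."""
    return [i for i in range(1, n + 1) if ((i - 1) // m) % nprocs == proc]

def as_sections(indices):
    """Compress a sorted index list into (first, last) runs of consecutive values."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs

# First processor (number 0 in this model) of a four processor computer.
a_rows = block_indices(100, 4, 0)           # A(BLOCK,*): rows of A
b_cols = cyclic_indices(100, 4, 0)          # B(*,CYCLIC): columns of B
c_elems = cyclic_indices(200, 4, 0, m=5)    # blocks of five, dealt cyclically

# D(*,BLOCK,CYCLIC) on a 2*2 processor arrangement, processor (0, 0).
d_dim2 = block_indices(100, 2, 0)
d_dim3 = cyclic_indices(100, 2, 0)

print(as_sections(a_rows))
print(b_cols[:3], b_cols[-1])
print(as_sections(c_elems))
print(as_sections(d_dim2), d_dim3[:3], d_dim3[-1])
```

The model gives rows 1 to 25 of `A`, columns 1, 5, ..., 97 of `B`, ten blocks of five elements of `C` starting at 1, 21, ..., 181, and for `D` the second-dimension indices 1 to 50 together with the odd third-dimension indices 1 to 99.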

- A formal argument can inherit the distribution from the actual argument.
- A formal argument can be redistributed to a requested distribution.
- A formal argument can be specified to have a certain distribution, and it is an error if the actual argument does not have that distribution.

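    FORALL (I = 1:N, J = 1:N) H(I, J) = 1.0/REAL(I+J-1)
    FORALL (I = 1:N, J = 1:N, Y(I, J) .NE. 0.0) &
           X(I,J) = 1.0/Y(I,J)
    FORALL (I = 1:N) A(I,I+1:N) = 3.141592654

The first of these defines a Hilbert matrix of order `N`, the second inverts the nonzero elements of `Y` and stores them in `X`, and the third assigns the value 3.141592654 to all elements of `A` above the main diagonal.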

In all three statements above, `FORALL` can be considered as a double loop, which can be executed in arbitrary order. The general form of the `FORALL` statement is

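    FORALL ( v1 = l1:u1:s1, ... , vn = ln:un:sn, mask ) &
           a(e1, ... , em) = right_hand_side

and is evaluated according to certain well specified rules: in principle, the index ranges and the mask are evaluated first, then the right-hand side for every active combination of indices, and only then are the assignments made.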
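That the whole right-hand side is evaluated before any element is assigned is what distinguishes `FORALL` from an ordinary `DO` loop. The following Python sketch (a model, not HPF, with invented function names) applies the hypothetical statement `A(I) = A(I-1) + A(I)` for `I = 2:N` in both ways, and also models the Hilbert matrix statement shown earlier.

```python
def forall_update(a):
    """Model of FORALL (I = 2:N) A(I) = A(I-1) + A(I).

    All right-hand sides are computed from the old values of a
    before any element is overwritten.
    """
    rhs = [a[i - 1] + a[i] for i in range(1, len(a))]   # evaluate everything
    return [a[0]] + rhs                                 # then assign

def do_loop_update(a):
    """Model of DO I = 2, N; A(I) = A(I-1) + A(I); END DO.

    Each iteration sees the values already updated by earlier iterations.
    """
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + a[i]
    return a

def hilbert(n):
    """Model of FORALL (I = 1:N, J = 1:N) H(I, J) = 1.0/REAL(I+J-1)."""
    return [[1.0 / (i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

old = [1, 2, 3, 4]
print(forall_update(old))    # [1, 3, 5, 7]
print(do_loop_update(old))   # [1, 3, 6, 10]
print(hilbert(2))            # [[1.0, 0.5], [0.5, 0.3333333333333333]]
```

The two results differ because the `DO` loop carries a dependence from one iteration to the next, while the `FORALL` model reads only old values. This is also why the order of the `FORALL` iterations does not matter.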

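In addition there is the `FORALL` construct, in which several statements share the same index specification.

    REAL, DIMENSION(N, N) :: A, B
    ...
    FORALL (I = 2:N-1, J = 2:N-1)
       A(I,J) = 0.25*(A(I,J-1)+A(I,J+1)+A(I-1,J)+A(I+1,J))
       B(I,J) = A(I,J)
    END FORALL

When these statements have been executed, the arrays `A` and `B` have identical values in the interior points, while the boundary values of `B` are unchanged.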

In addition a directive `INDEPENDENT` has been introduced for both `DO` loops and `FORALL` constructs. This directive is placed immediately before the `DO` statement or `FORALL` construct, and is valid until the corresponding `END DO` (or the old form of terminating a `DO` loop) or `END FORALL`. The directive assures the system that this part of the program can be executed in an arbitrary order, including in parallel, without any computational differences in the result (no semantic change).

In the example below it is thus assured that the integer vector `P` does not have any repeated values (with repeated values the last assignment would win in a normal sequential execution). A potential conflict in parallel execution is thus avoided. It is also implicitly assured that all values of `P` are within the permitted bounds of 1 and 200.

    REAL, DIMENSION(200) :: A
    REAL, DIMENSION(100) :: B
    INTEGER, DIMENSION(100) :: P
    ...
    !HPF$ INDEPENDENT
    DO I = 1, 100
       A(P(I)) = B(I)
    END DO

It is also possible to indicate that certain parts of a nested loop shall be considered as independent. In the example below the innermost loop is not independent, since each element of `A` is assigned repeatedly.

    REAL, DIMENSION(:, :, :), ALLOCATABLE :: A, B, C
    ...
    ALLOCATE (A(N,N,N))
    ALLOCATE (B(N,N,N))
    ALLOCATE (C(N,N,N))
    ...
    !HPF$ INDEPENDENT, NEW (I2)
    DO I1 = 1, N1
    !HPF$ INDEPENDENT, NEW (I3)
       DO I2 = 1, N2
    !HPF$ INDEPENDENT, NEW (I4)
          DO I3 = 1, N3
             DO I4 = 1, N4   ! The innermost loop is NOT
                             ! independent !
                A(I1, I2, I3) = A(I1, I2, I3) &
                     + B(I1, I2, I4) * C(I2, I3, I4)
             END DO
          END DO
       END DO
    END DO

The `NEW` clause states that the named loop variable is to be treated as private to each iteration of the loop that the directive precedes.
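What the `INDEPENDENT` directive asserts about the scatter loop `A(P(I)) = B(I)` shown earlier can be stated as a check on `P`. The compiler is not required to perform such a check; the programmer vouches for it. The Python sketch below (a model, not HPF, with invented function names) tests the two conditions, no repeated values and all values within the bounds of `A`, and shows that when they hold the result does not depend on the order of the iterations.

```python
import random

def scatter_is_independent(p, n):
    """True if A(P(I)) = B(I) can run in any order for an A with bounds 1..n."""
    in_bounds = all(1 <= v <= n for v in p)
    no_repeats = len(set(p)) == len(p)
    return in_bounds and no_repeats

def scatter(a, p, b, order):
    """Perform A(P(I)) = B(I) for the iterations I in the given order (1-based)."""
    a = list(a)
    for i in order:
        a[p[i - 1] - 1] = b[i - 1]
    return a

rng = random.Random(0)                       # fixed seed, reproducible run
a = [0.0] * 200
b = [float(i) for i in range(1, 101)]
p = rng.sample(range(1, 201), 100)           # 100 distinct values in 1..200

forward = list(range(1, 101))
shuffled = forward[:]
rng.shuffle(shuffled)

print(scatter_is_independent(p, 200))                              # True
print(scatter(a, p, b, forward) == scatter(a, p, b, shuffled))     # True

# With a repeated value the last assignment wins, so the order matters.
p_bad = [7, 7]
print(scatter_is_independent(p_bad, 200))                          # False
print(scatter(a, p_bad, b, [1, 2])[6], scatter(a, p_bad, b, [2, 1])[6])
```

The last line shows two different values for element 7 of `A`, one per iteration order, which is the conflict that the directive rules out.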

A

To the functions `MAXLOC` and `MINLOC` an optional argument `DIM` has been added. The matrix

        0  -5   8  -3
    A = 3   4  -1   2
        1   5   6  -4

gives the following values

    MAXLOC(A)          = (/ 1, 3 /)
    MAXLOC(A, DIM = 1) = (/ 2, 3, 1, 2 /)
    MAXLOC(A, DIM = 2) = (/ 3, 2, 3 /)

The following completely new functions have been added. The inquiry functions are to be intrinsic, but the others may instead be available in a library as external functions.

- Inquiry functions:
  - `NUMBER_OF_PROCESSORS(DIM)`: total number of processors
  - `PROCESSORS_SHAPE()`: shape for the processors

  Example. For a DECmpp 12000 Model 8B we get

        NUMBER_OF_PROCESSORS()         8192
        NUMBER_OF_PROCESSORS(DIM=1)    128
        NUMBER_OF_PROCESSORS(DIM=2)    64
        PROCESSORS_SHAPE()             (/ 128, 64 /)

  while on an ordinary workstation we get

        NUMBER_OF_PROCESSORS()         1
        PROCESSORS_SHAPE()             (/ 1 /)

- Bit Manipulation Functions:
  These three use the model in section 13.5.7 of the Fortran 90 standard. Note that the results are machine dependent, since they require the number of bits in an integer, which in Fortran 90 is available with the intrinsic function `BIT_SIZE`.
  - `POPCNT` gives the number of 1-bits in the integer argument.
  - `POPPAR` gives the result 1 if the number of 1-bits in the integer argument is odd, else the result 0.
  - `LEADZ` gives the number of leading zeros in the integer argument.

- Other functions:
  - The function `ILEN` is an integer function which returns the number of bits required to store a signed integer in two's complement form. Examples of its use are that `2**ILEN(N-1)` rounds an arbitrary positive integer `N` upwards to the closest power of 2, and `2**(ILEN(N)-1)` rounds down.
  - The following routines are not required to be intrinsic but may be in a normal external library, and in that case they require a `USE` statement. They are not explained in detail here; we refer the reader to the HPF standard.

    The array reduction functions `IALL, IANY, IPARITY` and `PARITY` are available, and they correspond to the following intrinsic functions of Fortran 90, namely `IAND, IOR, IEOR` and the operator `.NEQV.`

    A large number of functions are available to gather and scatter data; they have names of the form `XXX_SCATTER`, where `XXX` can be `SUM, COUNT, PRODUCT, MAXVAL, MINVAL, IALL, IANY, IPARITY, ALL, ANY` and `PARITY`. For parallel operations there are the functions `XXX_PREFIX` and `XXX_SUFFIX`, where `XXX` has the same possibilities as for `XXX_SCATTER`.

    **Example:** `SUM_PREFIX` sums successively the elements of the array: the first remains unchanged, the second becomes the sum of the first two, and so on. With `SUFFIX` the summation is done in the other direction (backwards).

    In addition there are two sorting functions, `GRADE_UP` and `GRADE_DOWN`.

    Operations for parallel input and output are being considered, and can be found in the Journal of Development.
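The behaviour of several of these functions can be modelled in a few lines of Python. This is a sketch and not the HPF library: it assumes a 32-bit integer model (the value that `BIT_SIZE` would return is an assumption here), follows the reading of `ILEN` in which the sign bit is not counted, and leaves out the optional arguments that the HPF prefix and suffix functions accept.

```python
from itertools import accumulate

BIT_SIZE = 32   # assumed number of bits in the integer model

def popcnt(i):
    """Number of 1-bits in the BIT_SIZE-bit representation of i."""
    return bin(i & (2**BIT_SIZE - 1)).count("1")

def poppar(i):
    """1 if the number of 1-bits is odd, else 0."""
    return popcnt(i) % 2

def leadz(i):
    """Number of leading zero bits in the BIT_SIZE-bit representation of i."""
    return BIT_SIZE - (i & (2**BIT_SIZE - 1)).bit_length()

def ilen(i):
    """Bits needed for i in two's complement, not counting the sign bit."""
    return i.bit_length() if i >= 0 else (~i).bit_length()

def round_up_pow2(n):
    """2**ILEN(N-1): the smallest power of 2 that is >= n, for n > 0."""
    return 2 ** ilen(n - 1)

def round_down_pow2(n):
    """2**(ILEN(N)-1): the largest power of 2 that is <= n, for n > 0."""
    return 2 ** (ilen(n) - 1)

def sum_prefix(a):
    """Running sums from the front; the first element is unchanged."""
    return list(accumulate(a))

def sum_suffix(a):
    """Running sums from the back; the last element is unchanged."""
    return list(accumulate(reversed(a)))[::-1]

print(popcnt(11), poppar(11), leadz(11))        # 3 1 28
print(round_up_pow2(5), round_down_pow2(5))     # 8 4
print(round_up_pow2(8), round_down_pow2(8))     # 8 8
print(sum_prefix([1, 2, 3, 4]))                 # [1, 3, 6, 10]
print(sum_suffix([1, 2, 3, 4]))                 # [10, 9, 7, 4]
```

Only `leadz` depends on the assumed word length for non-negative arguments; with a 64-bit model `leadz(11)` would be 60 instead of 28.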

Fortran has storage association and sequence association. The Fortran 90 standard states in (14.6.3) that storage association is the association of two or more data objects that occurs when two or more storage sequences share or are aligned with one or more storage units, and in (12.4.1.4) that sequence association is the order that Fortran requires when an array is associated with a formal argument. The rank and shape of the actual argument need not agree with the rank and shape of the dummy argument, but the number of elements in the dummy argument must not exceed the number of elements in the element sequence of the actual argument. If the dummy argument is assumed size, the number of elements in the dummy argument is exactly the number of elements in the element sequence.
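Sequence association relies on Fortran's array element order, in which the first subscript varies fastest. As an illustration, the Python sketch below (a model, not Fortran; the function names and the shape (20,10) are chosen for the example) maps the elements of a two-dimensional dummy argument onto positions in the element sequence of an actual argument, and checks the rule that the dummy argument must not need more elements than the actual argument supplies.

```python
def element_position(subscripts, shape):
    """1-based position of an array element in Fortran array element order."""
    position, stride = 1, 1
    for s, extent in zip(subscripts, shape):
        if not 1 <= s <= extent:
            raise IndexError("subscript out of bounds")
        position += (s - 1) * stride
        stride *= extent
    return position

def can_associate(dummy_shape, actual_elements):
    """True if the dummy argument fits in the actual element sequence."""
    needed = 1
    for extent in dummy_shape:
        needed *= extent
    return needed <= actual_elements

# A dummy argument of shape (20,10) associated with a 200-element vector:
print(element_position((1, 1), (20, 10)))     # 1
print(element_position((20, 1), (20, 10)))    # 20
print(element_position((1, 2), (20, 10)))     # 21
print(element_position((20, 10), (20, 10)))   # 200
print(can_associate((20, 10), 200))           # True
print(can_associate((20, 10), 150))           # False
```

Such a mapping presumes that the elements form one linear sequence, which is the assumption that a distribution over several processors may no longer satisfy.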

Note that HPF has no problem with array parameters distributed over the processors, as long as both the actual and the dummy arguments have the same rank and shape. It is when the properties of Fortran with respect to `COMMON` and `EQUIVALENCE` are used too much that we get into problems. If we use a subroutine that contains the following specifications

    SUBROUTINE HOME(X)
    DIMENSION X(20,10)

it can be called with

A directive

In order to get old programs to produce the same results with HPF the following is recommended:

- Include a facility to check that all parts of `COMMON` blocks agree within the whole program.
- Include a facility so that all `COMMON` blocks are considered sequential, if no explicit directive for the opposite case has been given.

- Ordinary concepts:
- Identifiers with up to 31 characters including the underline symbol _
- Comments starting with an exclamation mark !

- Array concepts
- Array sections
- Array assignment
- The statements `WHERE` and `ELSEWHERE`
- Array valued external functions
- Automatic arrays
- Allocatable arrays
- Arrays with assumed shape
- New intrinsics
- Array valued new intrinsics

- Additional concepts from Fortran 90
- Optional arguments
- Keyword arguments
- `INTERFACE` (but not generic or for modules)
- Type specifications (but not `KIND` and `TYPE`)

- The military extension MIL-STD-1753 to Fortran 77
- The statements `DO WHILE, END DO, IMPLICIT NONE` and `INCLUDE`
- The bit manipulation functions `BTEST, IAND, IBCLR, IBITS, IBSET, IEOR, IOR, ISHFT, ISHFTC` and `NOT`
- The bit copy routine `MVBITS`
- The binary, octal and hexadecimal constants.


- The directives `DYNAMIC, EXTRINSIC, LOCAL, PROCESSOR_VIEW, PURE, REALIGN` and `REDISTRIBUTE`
- The construct `FORALL ... END FORALL`
- The `HPF`-library
- The `HPF_LIBRARY` module

- The HPF 2.0 Language
- HPF 2.0 Approved Extensions

- Basic data distribution features (`ALIGN` and `DISTRIBUTE`, with some simplifications from HPF 1.1 in the procedure calls)
- Data parallel features (`FORALL` and `INDEPENDENT`, including a new `REDUCTION` clause for `INDEPENDENT`)
- Intrinsic and library routines (HPF 1.1, plus two new forms of sorting)
- The `EXTRINSIC` mechanism (as in HPF 1.0)

- New data distribution patterns (including `INDIRECT` mappings and `SHADOW` regions) and `DYNAMIC` data distributions (`REALIGN` and `REDISTRIBUTE`)
- New parallel control mechanisms (including the `ON` clause and `TASK_REGION` directive)
- Asynchronous I/O operations
- Several predefined `EXTRINSIC` types (some, like `HPF_LOCAL`, supported by the HPF Forum; others, like `HPF_CRAFT`, supported by other groups)

- Anonymous FTP from `ftp://titan.cs.rice.edu/public/HPFF/draft/`
  This directory contains:
  - `hpf-v20-final.ps.gz`: PostScript version of the document
  - `hpf-v20-final.dvi.gz`: DVI version of the document
  - `hpf-v20-final.tar.gz`: the entire document in LaTeX 2e format. To run LaTeX, unzip and untar the file to create a directory `hpf-v20-final`, then go to that directory and run LaTeX on `hpf-report.tex`
- Hardcopy
  Send a message to Theresa Chatman, CITI/CRPC, Mail Stop 41, Rice University, 6100 Main Street, Houston, TX 77005, USA; or e-mail to `tlc@cs.rice.edu` to request a copy of CRPC-TR-92225. There is a $50.00 charge to cover copying and shipping costs.
- WWW
  The project hopes to have the draft available through the WWW shortly. Due to the length and complexity of the document, turning it into HTML is "highly nontrivial" (i.e. `latex2html` chokes on the macros).

Last modified: 1 July 1997