Cray Compilers


SLIDE 1) Overview


There are two primary compilers available for the Cray, FORTRAN and C.
We will discuss FORTRAN first and in more depth, since it is the more
developed language on the Cray.

The compiler turns your text into machine instructions.  On the Cray,
a single command, cf77, compiles your program, and can also handle
preprocessing and the creation of an executable program.

If you are porting a code from another computer, there are a number of
differences to consider.  The compiler will give you a variety of warnings
and error messages.  If your program compiles properly, but fails to run 
correctly, there are some things the compiler can do to help you find the 
mistake.

There is a built in library of scientific software routines.  It is
also easy to use other precompiled libraries supplied by the center.

The compiler does a great deal of vectorization automatically.  You
can ask for a report from the compiler, listing which loops vectorized.
You may be able to make some simple program changes to improve vectorization.
In some cases, you may want to use compiler directives, which are 
messages to the compiler inserted directly in the text of your program.

It is also possible to run a program in parallel across several of the
Cray's processors.  You can request that the compiler search for portions
of the code to parallelize automatically, and you can insert compiler
directives to control the process.


SLIDE 2) Typical uses of the compiler


In the simplest case, a user brings a C or FORTRAN file from a PSC front end,
or CFS or AFS, or a remote machine, runs the CC or CF77 script to compile
the program and create an executable called A.OUT, which can then be run.

For instance:

  cc myfile.c 
  a.out

The compiler can be told to only compile, without proceeding to create
the executable.  This can be useful for debugging a small portion of a
bigger program, or if a program is broken into two parts, only one of which
needs to be recompiled each time:

  cf77 -c myfile.f

In this case, the result is a file MYFILE.O, called an "object" file, which is 
NOT executable.  If MYFILE.F contains the whole program, then we can create the 
executable with a second call to the compiler:

  cf77 myfile.o

which creates the A.OUT executable.  If MYFILE.F is not complete, but needs
routines from, say, MYSUB.F, this could be done with the commands

  cf77 -c mysub.f
  cf77 myfile.o mysub.o

or, in fact,

  cf77 myfile.o mysub.f

The compiler "knows" that MYSUB.F must be compiled, and then linked with 
MYFILE.O to create the executable file A.OUT.
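
As a minimal sketch of such a two-file program, the source might look like
the following.  The contents of MYFILE.F and MYSUB.F are not given in this
talk, so the routine name MYSUB and everything it computes are made up
purely for illustration:

```fortran
C     ---- myfile.f: the main program (hypothetical contents) ----
      PROGRAM MYFILE
      REAL X(5), TOTAL
      INTEGER I
      DO 10 I = 1, 5
        X(I) = REAL(I)
 10   CONTINUE
C     MYSUB is not defined in this file; the loader must find it
C     in mysub.o when the executable is built.
      CALL MYSUB(X, 5, TOTAL)
      WRITE (*,*) 'Sum of entries = ', TOTAL
      END

C     ---- mysub.f: the subroutine (hypothetical contents) ----
      SUBROUTINE MYSUB(X, N, TOTAL)
      INTEGER N, I
      REAL X(N), TOTAL
      TOTAL = 0.0
      DO 20 I = 1, N
        TOTAL = TOTAL + X(I)
 20   CONTINUE
      RETURN
      END
```

Each half can be compiled on its own with the -c switch.  It is only when
the executable is built that both object files must be present.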


SLIDE 3) What the compiler does for you


  * Checks your program's syntax

  * Determines data storage

  * Notes calls to routines not supplied

  * Includes routines requested from system or local libraries

  * Optimizes your code

  * Searches for vectorization opportunities

  * Searches for parallel opportunities


SLIDE 4) The compiling process


The CF77 compiler is actually made up of a sequence of steps.   When you 
compile a program, only the last two steps are carried out by default.  That's 
usually all a beginner needs.

The steps include:

  fpp    - automatic parallel and enhanced vector analysis.
  fmp    - translation of parallel processing directives.
  cft77  - The actual language compiler (automatic vectorization).
  segldr - The loader, creates the executable program.

Specifying the switch:

  -c  carries out only the CFT77 step (no executable is created).
  -Zv gets you enhanced vectorization, and carries out FPP, CFT77 and SEGLDR.  
  -Zp gets you all four steps.  

Each of the steps is actually a full-fledged command, with its own set of
switches.  CF77 supplies various switches to these commands, but you can
communicate directly with any of the commands using the " -W " switch, though
this tends to get ugly.  For instance, the command

  cf77 -Wf"-e m" myprog.f

passes the switch "-e m" to the CFT77 command.  We'll see more examples of
this later.


SLIDE 5) Portability


You should expect an ANSI FORTRAN code to compile and run on the Cray.  But
it's a wise idea to see if you really are using ANSI FORTRAN!

If you are using a VAX/VMS system, you might try the command:

  $ FORTRAN/STANDARD MYPROG.FOR

On the Cray itself, you can ask the compiler to point out non-ANSI usage
with the command:

  cf77 -Wf"-e n" myprog.f


One common source of trouble is the name of the random number generator.
On the Cray, the proper way to handle random numbers is as follows:

      ISEED=1952
      CALL RANSET(ISEED)       <-- Initializes the sequence
      DO 10 I=1,N
        X=RANF()               <-- Returns a random value
        ....
10      CONTINUE


SLIDE 6) Data Representation


Another feature of the Cray that may cause you conversion problems is the
fact that the standard word size is 64 bits.  This means that a real
variable is roughly as accurate as a double precision variable on a
32 bit machine.  In fact, one of the design features of the Cray was
to provide higher accuracy automatically.

Cray REAL numbers range from about 10**(-2465) to 10**2466, and the 
machine precision is roughly 0.7E-14.
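
If you want to see the precision of REAL arithmetic for yourself, a rough
estimate can be computed with a few lines of standard FORTRAN.  This is
only a sketch: the program name EPSDEM is made up, and the answer can be
affected by how a particular compiler rounds or holds intermediate values:

```fortran
      PROGRAM EPSDEM
      REAL EPS, TEST
      EPS = 1.0
C     Halve EPS until adding half of it to 1.0 no longer changes
C     the stored sum.
 10   CONTINUE
      TEST = 1.0 + EPS/2.0
      IF (TEST .GT. 1.0) THEN
        EPS = EPS/2.0
        GO TO 10
      ENDIF
C     EPS is now the smallest power of 2 for which 1.0 + EPS is
C     still stored as something greater than 1.0.
      WRITE (*,*) 'Machine precision is about ', EPS
      END
```

On the Cray, the value printed should be in the neighborhood of the
0.7E-14 quoted above.  On a machine with 32 bit REAL variables, it will
be many orders of magnitude larger.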

If your program already uses DOUBLE PRECISION, however, it is important
that you convert it to REAL before running it on the Cray.  Otherwise,
it will execute extremely slowly.  There is a Cray compiler switch that can 
help you.  It essentially tells the compiler to interpret every double 
precision usage as a real usage.  The command is:

  cf77 -Wf"-d p" myprog.f

By the way, "REAL*8" is automatically interpreted as REAL on the Cray, although 
it is interpreted as DOUBLE PRECISION on most computers.

Oddly enough, INTEGER variables are assigned a 64 bit word, but generally
are only allowed to use 46 of those bits.  You don't need to worry about
this unless you are planning to compute extremely large integers, or to
do bit manipulations using integer arithmetic.

Unfortunately, the Cray does NOT store data using the IEEE format!  It
is not too hard, however, to have the Cray write data files in IEEE format.


SLIDE 7) Language Features


Cray FORTRAN allows:

  INCLUDE files;

  TAB characters;
 
  DO WHILE statements;

  Unnumbered DO/ENDDO statements;

  Variable names up to 32 characters long;

  MIL-STD bit routines: BTEST, IBSET, IBCLR, ISHFT, ISHFTC, MVBITS;

  Namelist input;

  Recursive subroutine calls.
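
A short sketch that exercises a few of these extensions together is given
below.  The program and variable names (FEATUR, KOUNT, MASK) are made up
for illustration:

```fortran
      PROGRAM FEATUR
      INTEGER KOUNT, MASK, I
      KOUNT = 0
      MASK = 0
C     DO WHILE: repeat until KOUNT reaches 3.
      DO WHILE (KOUNT .LT. 3)
        KOUNT = KOUNT + 1
      END DO
C     Unnumbered DO/ENDDO: set bits 0, 2 and 4 of MASK.
      DO I = 0, 4, 2
        MASK = IBSET(MASK, I)
      ENDDO
C     BTEST checks a single bit; MASK is now 1 + 4 + 16 = 21.
      IF (BTEST(MASK, 2)) THEN
        WRITE (*,*) 'Bit 2 is set, MASK = ', MASK
      ENDIF
      END
```

None of these statements is part of ANSI FORTRAN 77, so a program that
uses them may need changes before it will compile on another machine.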


SLIDE 8) FORTRAN 90


  Cray FORTRAN includes a limited set of FORTRAN 90 features, including:

    Assumed size arrays:       REAL A(*)      INTEGER B(10,*)

    Dynamic arrays:            Work arrays created in subroutines.

    Arbitrary based indexing:  REAL A(-10:30)

    Array assignments:         A=B    A=B+C   A=B/C   A=SQRT(B)   IF(Y.EQ.Z)...

    Array selector:            A=B(1:10)      A=B(4:20:2)

    Arbitrary array selector:  A=B(INDX)
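
  The array features above might be combined as in the following sketch.
  The program name ARRDEM and the data are made up for illustration, and
  the exact subset of FORTRAN 90 accepted may depend on your compiler
  version:

```fortran
      PROGRAM ARRDEM
      INTEGER I
      INTEGER INDX(3)
      REAL A(10), B(-10:30), C(3)
      DATA INDX /2, 5, 9/
C     Fill B with its own index values using an ordinary loop.
      DO 10 I = -10, 30
        B(I) = REAL(I)
 10   CONTINUE
C     Array selector: copy B(1), B(2), ..., B(10) into A.
      A = B(1:10)
C     Array assignment: double every element of A.
      A = A + A
C     Arbitrary array selector: C gets A(2), A(5) and A(9),
C     that is, the values 4.0, 10.0 and 18.0.
      C = A(INDX)
      WRITE (*,*) C
      END
```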


SLIDE 9) Stack/static Data Storage


One of the most common sources of trouble with programs on the Cray is a
wrong assumption about how variables are stored.

Most compilers have used STATIC allocation in the past.  This meant that
every variable in a program had its own unique and permanent storage
(a subroutine's dummy arguments don't count, of course!).  

In order to reduce memory usage, and to allow parallel processing, it
became necessary to move to a STACK based memory system.  In this
system, variables in the main program have permanent storage.  Variables
unique to a subroutine are thrown away when the subroutine is left.

If you are having very strange results, you might try asking the Cray
to compile your program using STATIC storage everywhere:

  cf77 -Wf"-e v" myprog.f

If, on the other hand, you know that a particular variable must be saved
between calls, you can insert a SAVE statement in the subroutine.


SLIDE 10) STATIC/STACK example


Many programs depend, without their authors realizing it, on a subroutine
variable "retaining" its value between calls:

      FUNCTION AREA(R)
      DATA ICALL /1/
      IF(ICALL.EQ.1)THEN
        ICALL=2
        PI=ATAN(1.0)*4.0
        ENDIF
      AREA=PI*R*R
      RETURN
      END

The Cray will actually realize that ICALL is special, because of the
DATA statement.  It will set aside permanent storage for it.  But PI
will not be treated this way.  The second time the function is called,
PI will not have a predictable value.  It might be 0, it might be anything.

However, if you simply insert the line

      SAVE PI

before the executable statements, all should be well again.
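
Put together, the repaired function might look like this.  The explicit
type declarations and the small driver program TAREA are additions made
here for illustration, so that the example can be compiled by itself:

```fortran
      PROGRAM TAREA
      REAL AREA
C     The second call relies on PI surviving from the first call.
      WRITE (*,*) 'Area for R = 1: ', AREA(1.0)
      WRITE (*,*) 'Area for R = 2: ', AREA(2.0)
      END

      FUNCTION AREA(R)
      REAL AREA, R, PI
      INTEGER ICALL
C     SAVE gives PI permanent storage, even with stack allocation.
      SAVE PI
      DATA ICALL /1/
      IF (ICALL .EQ. 1) THEN
        ICALL = 2
        PI = ATAN(1.0)*4.0
      ENDIF
      AREA = PI*R*R
      RETURN
      END
```

With the SAVE statement in place, the second call should return four
times the value of the first, whichever storage method is in use.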


SLIDE 11) Compilation Errors


Compiler messages come in various levels, including:

  INFO:               useful information
  VECTOR:             vectorization information
  COMMENT or NOTE:    inefficient or outmoded programming.
  CAUTION or WARNING: possible or probable errors.
  ERROR:              a fatal error (program must be changed!)
  ANSI:               points out non-ANSI programming

By default, the compiler will generally only issue WARNING and ERROR messages.
If a compiled listing is requested, VECTOR messages, and some other 
optimization messages will be printed as well. 

The compiler will print messages to the STDERR output device, which is
your screen if you're interactive, or your ".e" file if you're running
NQS, or your ".CPR" or ".LOG" file if you're submitting from another machine.

A compiler message looks like this:

      4      4.       A(1,2)=A(1,2)+X
                             ^
 cft77-125 cft77: ERROR $MAIN, Line = 4, File = temp.f, Line = 4
   The number of subscripts is greater than the number of declared dimensions.

You can increase or decrease the level of messages using the "-m" switch:

  cf77 -Wf"-m 4" myprog.f

will cause only ERROR messages to be printed.


SLIDE 12) An I/O Run Time Error


Unfortunately, it can be difficult to figure out what went wrong when a
running program "crashes".  Some error messages are actually helpful.  If
you OPEN a file that isn't there, but you specify STATUS='OLD', then you'll
get the message:

sys-2 a.out: UNRECOVERABLE error on system request
  No such file or directory
apparent state: unit 1 named fred.dat
last format:
lately writing direct unformatted internal
Abort
TB001 - BEGINNING OF TRACEBACK
      - $TRBK    WAS CALLED BY f_sig    AT  200122b (LINE NUMBER      102)
      - f_sig    WAS CALLED BY __handlr AT  104165b
      - __handlr WAS CALLED BY $STKOFEN AT   72047a
      - $STKOFEN WAS CALLED BY killm    AT  102245a
      - killm    WAS CALLED BY raise    AT  102322a (LINE NUMBER       15)
      - raise    WAS CALLED BY abort    AT   52763a (LINE NUMBER       46)
      - abort    WAS CALLED BY $ferror  AT  172743a (LINE NUMBER       46)
      - $ferror  WAS CALLED BY f$open   AT  201251a (LINE NUMBER      430)
      - f$open   WAS CALLED BY $OPN     AT  154070a
      - $OPN     WAS CALLED BY COMPUT   AT     345c (LINE NUMBER        4)
      - COMPUT   WAS CALLED BY $MAIN    AT     325a (LINE NUMBER       13)
      - $MAIN    WAS CALLED BY $START$  AT     303a
TB002 - END OF TRACEBACK
Abort (core dumped)

This is overkill!  But here's what's going on:

  First, we have an error message, whose code is "sys-2".
  The error message was generated by "A.OUT", and the text of the message
  is "No such file or directory".  

  Secondly, we have some information about the file name, the last
  I/O operation carried out (none), and the type of I/O associated
  with the file.

  Finally, we have a traceback.  The traceback is read "backwards", starting
  with $TRBK (the routine responsible for producing the traceback), and
  backing up through a series of system routines until it reaches $OPN,
  which corresponds to the OPEN statement in the user routine COMPUT.
  According to the traceback, OPEN was called at line 4 of COMPUT, and
  that's where things went wrong.
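
If you would rather have the program deal with a missing file itself, one
option is the IOSTAT specifier of standard FORTRAN 77.  It is not covered
in this talk, so take the following as a sketch only.  The program name
OPNCHK is made up, and the file name is borrowed from the message above:

```fortran
      PROGRAM OPNCHK
      INTEGER IERR
C     With IOSTAT present, a failed OPEN sets IERR to a nonzero
C     code instead of aborting the program.
      OPEN (UNIT=1, FILE='fred.dat', STATUS='OLD', IOSTAT=IERR)
      IF (IERR .NE. 0) THEN
        WRITE (*,*) 'Could not open fred.dat, IOSTAT = ', IERR
        STOP
      ENDIF
      WRITE (*,*) 'fred.dat opened successfully.'
      CLOSE (UNIT=1)
      END
```

The particular nonzero value returned in IERR depends on the system, so
the program should only test whether it is zero or not.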


SLIDE 13) Other Run Time Errors


The most common run time error messages on the Cray are more mysterious:

  Floating Point Exception
and
  Operand Range Error

Usually, you will get a traceback, but occasionally even this will be
defective or missing.  We will see in other talks some things you can do to 
try to track down these errors.  However, the compiler can sometimes
help you.  

In particular, a floating point exception occurs when you try to take the
square root of a negative number, or divide 1 by 0, or carry out some other
illegal operation.  This can happen because you used an uninitialized
variable, or because your program is using stack memory and some variables
in a subroutine are losing their values between calls.  If this is the
case, then the compile statement

  cf77 -Wf"-e v" myprog.f

may help.  You should still try to find the problem, since STATIC memory
allocation increases your memory requirements, and means you will not be
able to do any parallel processing.

The "Operand Range Error" message can also be caused by stack memory,
but it might be caused as well by an array going out of bounds, or by
a subroutine called with the wrong number of arguments, or by a subroutine
that attempts to alter a dummy argument that is actually a constant.
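
The last of these causes is easy to commit without noticing.  The sketch
below shows the safe pattern, with the risky call left as a comment.  The
names CONARG and BUMP are made up for illustration:

```fortran
      PROGRAM CONARG
      INTEGER N
C     Risky:  CALL BUMP(5) would ask BUMP to overwrite a constant.
C     Safe:   copy the value into a variable and pass that instead.
      N = 5
      CALL BUMP(N)
      WRITE (*,*) 'After the call, N = ', N
      END

      SUBROUTINE BUMP(K)
      INTEGER K
C     BUMP alters its dummy argument, so the actual argument must
C     be something the caller is allowed to change.
      K = K + 1
      RETURN
      END
```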

Some compiler switches that MIGHT help include:

  cf77 -Wf"-R a" myprog.f         Checks number and type of routine arguments.

  cf77 -Wf"-R b" myprog.f         Attempts to catch out of bounds arrays

  cf77 -Wf"-i 64" myprog.f        Use 64 bit integers.

  cf77 -Wf"-e i" myprog.f         Catch use of uninitialized variables.

  cf77 -Wf"-e z" myprog.f         Create symbol table for debugger use.


SLIDE 14) SCILIB


Cray Research provides an extensive library of scientific software, which
is available to you by default.  This includes the following four libraries:

  BLAS     dot products, vector norms, matrix-vector products.
  LINPACK  LU factors, determinants, linear system solution, QR factors.
  EISPACK  Eigenvalues and eigenvectors.
  LAPACK   Modern, high speed versions of LINPACK and EISPACK

as well as FFT, filter, and recurrence routines.  

SCILIB was originally installed on the Cray for convenience.  Some of the
routines, especially the matrix products, the LAPACK dense linear system
solver, and the FFT routines, have very high speed.  But there is no
guarantee that all the routines will be good performers.

Most SCILIB routines are described in the online document SCILIB.BIG.  Because
they are "built in", you may call any of the routines without having to use
a special switch in your compile statement.
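
For instance, a program might call the BLAS dot product function SDOT as
in the following sketch.  The program name DOTDEM and the data are made
up for illustration.  On a machine where the BLAS are not built in, you
would have to link a BLAS library explicitly:

```fortran
      PROGRAM DOTDEM
      INTEGER N, I
      PARAMETER (N = 4)
      REAL X(N), Y(N), DOTXY
C     SDOT is the single precision BLAS dot product function.
      REAL SDOT
      EXTERNAL SDOT
      DO 10 I = 1, N
        X(I) = REAL(I)
        Y(I) = 2.0
 10   CONTINUE
C     Arguments: length, first vector, its stride, second vector,
C     its stride.  Here the result is 2*(1+2+3+4) = 20.
      DOTXY = SDOT(N, X, 1, Y, 1)
      WRITE (*,*) 'Dot product = ', DOTXY
      END
```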


SLIDE 15) Using precompiled libraries


PSC has collected a large library of software.  Compiled copies of these
libraries are available on the Cray.  

A user program that calls routines from one of these libraries must tell
the compiler to include that library at linking time.  Instructions for
how to do this are usually given with the documentation for each library.
For instance, if a program calls any routine from the BCSLIB library,
the compile statement must look something like this:

  cf77 myprog.f -lbcslib

The actual file containing the compiled BCSLIB library is stored in
the /usr/local/lib area, and is called libbcslib.a.  You might want to
look in that directory and see what's available.

Files like libbcslib.a, with an extension of ".a", are a special kind of
library file.  In this case, the file is actually a collection of "object"
or ".o" files.  The compiler will extract copies of the needed object files
to build the executable.


SLIDE 16) Duplicate Entry Points


If you are using libraries, it's possible that your own program will contain
a subroutine of the same name as one of the library routines.  In that case,
the compiler prints a warning like this:

 ldr-290 segldr: CAUTION 
     Duplicate entry point 'CFILL' was encountered.
     Entry in module 'CFILL' from file 'test.o' has been used.
     Entry in module 'CFILL' from file '/usr/local/lib/libbcslib.a' has been
     ignored.

Here, my program included a routine CFILL, that I wrote.  I also wanted to
use some routines from BCSLIB, which, it turns out, also has a routine called
CFILL.  The compiler (actually, SEGLDR) warns me that there are two CFILL's
to choose from, and that it's going to take the one in my program.

In cases like this, if you can, it's best to rename your routine (say, to
DFILL), to avoid any chance that the name conflict will cause you problems
at run time.


SLIDE 17) Vectorization 


In a later talk, you will hear what vectorization is, how it speeds up your
program's execution, and how you can improve the amount of vectorization that
occurs in your program.

For now, we will simply discuss a few ideas about vectorization related to
the compiler.

The first thing to note is that vectorization is a process that allows
SOME inner DO loops in a FORTRAN program to execute much faster than they
normally would.  On the Cray, this speedup may be a factor of 10, 20 or 30,
depending on the loop.

Only some loops can vectorize.  The compiler, all by itself, will try
to determine whether each loop of your program can vectorize.  It will
err on the side of caution.  You can find out how the compiler is handling
your program by getting a compiled listing of the program.  This can be
done with the command:

  cf77 -Wf"-e s" myprog.f

The listing file will be stored in MYPROG.L, which you can print or type out 
or save for later.  It will contain a copy of the program, with comments about
which loops vectorized or did not, and why.


SLIDE 18) Sample LOOPMARK output


By using a different compiler option, you can get a more visually striking
version of the compiler listing.  The command

  cf77 -Wf"-e m" myprog.f

produces what is called a "LOOPMARK" listing, which might look something
like this:

  
    486      1.                               subroutine s222(a,b,c,n)
    487      2.                         c
    488      3.                         c
    489      4.                         c     loop distribution
    490      5.                         c     partial loop vectorization
    491      6.                         c
    492      7.                               integer n
    493      8.                               real a(*),b(*),c(*)
    494      9. Sr---------------------<      do 240 i = 2,n
    495     10. Sr                               a(i) = a(i) + b(i)
    496     11. Sr                               b(i) = b(i-1)*b(i-1)*a(i)
    497     12. Sr                               a(i) = a(i) - b(i)
    498     13. Sr--------------------->  240 continue
    499     14.                               return
    500     15.                               end
  
                V E C T O R I Z A T I O N   I N F O R M A T I O N
                -------------------------------------------------
 cft77-8044 cf77: VECTOR S222, Line = 9, File = testvector.f, Line = 494
   Loop starting at line 9 was not vectorized.  It contains a recurrence on "B" 
at line 11.
  
                O P T I M I Z A T I O N   I N F O R M A T I O N
                -----------------------------------------------
 cft77-8135 cf77: SCALAR S222, Line = 9, File = testvector.f, Line = 494
   Loop starting at line 9 was unrolled 16 times.


Note that each message has a standard message number format: cft77-8044, and
cft77-8135.  This is similar to the error and warning messages we've seen
earlier.  All of these message numbers may be used to obtain a more
extensive explanation of the message from the EXPLAIN command.
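
The comments in the listing mention "loop distribution".  As a sketch of
what that means, the loop can be split by hand into three loops, so that
only the middle one carries the recurrence on B.  The program below is
made up for illustration (the name DISTRB, the array size and the
starting values are arbitrary), and it compares the split version
against the original loop:

```fortran
      PROGRAM DISTRB
      INTEGER N, I
      PARAMETER (N = 8)
      REAL A1(N), B1(N), A2(N), B2(N), DIFF
C     Give both copies the same (arbitrary) starting values.
      DO 10 I = 1, N
        A1(I) = 1.0 + 0.1*REAL(I)
        B1(I) = 0.5
        A2(I) = A1(I)
        B2(I) = B1(I)
 10   CONTINUE
C     Original loop from S222: the recurrence on B keeps the
C     whole loop from vectorizing.
      DO 20 I = 2, N
        A1(I) = A1(I) + B1(I)
        B1(I) = B1(I-1)*B1(I-1)*A1(I)
        A1(I) = A1(I) - B1(I)
 20   CONTINUE
C     Distributed version: the first and third loops carry no
C     dependence between iterations; only the middle loop does.
      DO 30 I = 2, N
        A2(I) = A2(I) + B2(I)
 30   CONTINUE
      DO 40 I = 2, N
        B2(I) = B2(I-1)*B2(I-1)*A2(I)
 40   CONTINUE
      DO 50 I = 2, N
        A2(I) = A2(I) - B2(I)
 50   CONTINUE
C     Largest difference between the two sets of results.
      DIFF = 0.0
      DO 60 I = 1, N
        DIFF = MAX(DIFF, ABS(A1(I)-A2(I)), ABS(B1(I)-B2(I)))
 60   CONTINUE
      WRITE (*,*) 'Largest difference = ', DIFF
      END
```

The two versions carry out the same arithmetic in the same order for each
element, so the difference printed should be zero or very close to it.
Whether the split version actually runs faster is something to check
with a new listing.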


SLIDE 19) FLOWTRACE


A special compiler option is available if you wish to get some information
about how much time parts of your program are taking.  Note that this is
an EXPENSIVE operation.  Your program will be significantly slowed down by
this technique, so you should only do it occasionally!

The compiler option is known as the "FLOWTRACE" option.  It is invoked 
as follows:

  cf77 -F myprog.f

All that does is insert special code into your program, which keeps track
of which routine is currently executing, and for how long.  To conduct the
test, you must then take this special version of your program and run it:

  a.out

and then run the FLOWVIEW program to examine the report file that is created:

  flowview -LA | more

The report will include:

  A calling tree;
  How many times each routine was called;
  How long each routine was executing;
  The average duration of a call;
  Whether the routine is small and frequently used.

Information like this can help you locate inefficient portions of the code
which are causing the whole program to slow down.  In particular, Cray
has found that many programs will run faster if small routines or functions
are copied directly into the routines that call them.  This is known as
"inlining".  The compiler can actually try to do this for you automatically
at compile time, via the "-o inline" switch:

  cf77 -Wf"-o inline" myprog.f


SLIDE 20) Compiler Directives


Most of the time, the Cray compiler does its work with no help from you.
Once you become used to using the Cray compiler, you may be able to
spot cases where the compiler needs to be helped.  

For instance, the compiler is hesitant to vectorize some loops in which it
thinks that results of some early iterations may be needed before later
iterations can be computed:

      DO 10 I=1,N
        X(I+200)=X(I+200)-X(I)
10      CONTINUE

If I happen to know that this loop will always use a value of N that is no
greater than 200, then I can tell the compiler not to worry.  This is actually
done with a COMPILER DIRECTIVE, which looks like this:

CDIR$ IVDEP
      DO 10 I=1,N
        X(I+200)=X(I+200)-X(I)
10      CONTINUE

Here, "IVDEP" is an abbreviation for "Ignore Vector Dependencies".  You'll
see more of an explanation of when and why to use such directives in the talk 
on vectorization.
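
If you cannot be sure that N never exceeds 200, one cautious approach is
to apply the directive only when the condition actually holds.  The
sketch below is one way this might be arranged.  The names IVDDEM and
SHIFT are made up for illustration.  On a compiler that does not know
about Cray directives, the CDIR$ line is simply a comment:

```fortran
      PROGRAM IVDDEM
      INTEGER I
      REAL X(400)
      DO 5 I = 1, 400
        X(I) = REAL(I)
 5    CONTINUE
      CALL SHIFT(X, 200)
C     Every updated entry is X(I+200) - X(I), which is 200.0.
      WRITE (*,*) 'X(201) = ', X(201), '  X(400) = ', X(400)
      END

      SUBROUTINE SHIFT(X, N)
      INTEGER N, I
      REAL X(*)
      IF (N .LE. 200) THEN
C       Reads touch X(1..N), writes touch X(201..200+N): no
C       overlap, so the compiler may ignore the dependence.
CDIR$ IVDEP
        DO 10 I = 1, N
          X(I+200) = X(I+200) - X(I)
 10     CONTINUE
      ELSE
C       For larger N, later iterations read values written by
C       earlier ones, so leave the decision to the compiler.
        DO 20 I = 1, N
          X(I+200) = X(I+200) - X(I)
 20     CONTINUE
      ENDIF
      RETURN
      END
```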

There are other directives used for vectorization, including:

  CDIR$ NEXTSCALAR   means "don't vectorize the next loop!"
  CDIR$ NOVECTOR     means "don't vectorize any more loops until I say so!"

Others are for parallel processing.  For instance:

  CMIC$ GETCPUS      specifies the number of CPU's to use;
  CMIC$ DOALL        states that the next loop should be executed in parallel.


SLIDE 21) Parallelization


Our Cray YMP has 8 processors.  They normally work independently, but a
user program can get 2 or more processors to cooperatively execute portions
of the program.  For this to occur, special directives must be placed in 
the user program, either by the user, or by the FPP preprocessor.

The FPP part of the Cray compiler can try to automatically select portions of 
your program suitable for parallel execution.  To do so, it may actually
change parts of your program, inserting compiler directives, creating
temporary variables, breaking DO loops into smaller portions, and so on.
You can have the FPP program carry out this procedure by using the command:

  cf77 -Zp myprog.f

This will create a program that CAN use parallel processing.  However, to
try to get more than one processor, you would also have to request multiple
CPU's before running the program.  In the C shell, type:

  setenv NCPUS 8
  a.out

or in the Bourne shell:

  NCPUS=8
  export NCPUS
  a.out

However, whether or not the Cray will actually give you 8 CPU's depends on
the system load.  The program will still execute, but the amount of parallel
execution will vary.  


SLIDE 22) Documentation


PSC provides some online documents, including:

  ARRAYS.DOC     discusses the use of FORTRAN 90 statements,
  FORTRAN.DOC    discusses some simple issues of FORTRAN usage,
  FORTRAN90.BIG  a copy of the FORTRAN 90 language standard.

There are also many examples available for FORTRAN.  Look in the AFS directory

  /usr/local/examples/ymp/fortran

The full path of this directory is

 /afs/psc.edu/common/usr/local/examples/ymp/fortran

Cray Research provides a four-volume manual set on the Cray compiler:

  CF77 Compiling System,

  Volume 1: FORTRAN Reference Manual, SR-3071
  Volume 2: Compiler Message Manual, SR-3072
  Volume 3: Vectorization Guide, SR-3073
  Volume 4: Parallel Processing Guide, SR-3074

These manuals may be ordered from 

  Cray Research, Incorporated
  Distribution Center
  2360 Pilot Knob Road
  Mendota Heights, Minnesota, 55120

  (612)-681-5907

On the Cray itself, there are MAN pages for the following topics:

  cf77, the overall compiling system

  fpp, the parallel and vector preprocessor,
  fmp, the mid-processor,
  cft77, the language compiler,
  segldr, the linker and loader.