Cray Compilers

SLIDE 1) Overview

There are two primary compilers available for the Cray, FORTRAN and C. We will discuss FORTRAN first and in more depth, since it is the more developed language on the Cray.

The compiler turns your program text into machine instructions. On the Cray, a single command, cf77, compiles your program, but it can also do preprocessing and create an executable program.

If you are porting a code from another computer, there are a number of differences to consider.

The compiler will give you a variety of warnings and error messages. If your program compiles properly but fails to run correctly, there are some things the compiler can do to help you find the mistake.

There is a built-in library of scientific software routines. It is also easy to use other precompiled libraries supplied by the center.

The compiler does a great deal of vectorization automatically. You can ask the compiler for a report listing which loops vectorized. You may be able to make some simple program changes to improve vectorization. In some cases, you may want to use compiler directives, which are messages to the compiler inserted directly in the text of your program.

It is also possible to run a program in parallel across several of the Cray's processors. You can request that the compiler search for portions of the code to parallelize automatically, and you can insert compiler directives to control the process.

SLIDE 2) Typical uses of compiler

In the simplest case, a user brings a C or FORTRAN file from a PSC front end, or CFS or AFS, or a remote machine, and runs the CC or CF77 script to compile the program and create an executable called A.OUT, which can then be run. For instance:

  cc myfile.c
  a.out

The compiler can be told to only compile, without proceeding to create the executable.
This can be useful for debugging a small portion of a bigger program, or if a program is broken into two parts, only one of which needs to be recompiled each time:

  cf77 -c myfile.f

In this case, the result is a file MYFILE.O, called an "object" file, which is NOT executable. If MYFILE.F contains the whole program, then we can create the executable with a second call to the compiler:

  cf77 myfile.o

which creates the A.OUT executable. If MYFILE.F is not complete, but needs routines from, say, MYSUB.F, this could be done with the commands

  cf77 -c mysub.f
  cf77 myfile.o mysub.o

or, in fact,

  cf77 myfile.o mysub.f

The compiler "knows" that MYSUB.F must be compiled, and then linked with MYFILE.O to create the executable file A.OUT.

SLIDE 3) What the compiler does for you

  * Checks your program's syntax
  * Determines data storage
  * Notes calls to routines not supplied
  * Includes routines requested from system or local libraries
  * Optimizes your code
  * Searches for vectorization opportunities
  * Searches for parallel opportunities

SLIDE 4) The compiling process

The CF77 compiler is actually made up of a sequence of steps. When you compile a program, only the last two steps are carried out by default. That's usually all a beginner needs. The steps are:

  fpp     - automatic parallel and enhanced vector analysis.
  fmp     - translation of parallel processing directives.
  cft77   - the actual language compiler (automatic vectorization).
  segldr  - the loader, creates the executable program.

Specifying the switch:

  -c   carries out only the CFT77 step.
  -Zv  gets you enhanced vectorization, and carries out FPP, CFT77 and SEGLDR.
  -Zp  gets you all four steps.

Each of the steps is actually a full-fledged command, with its own set of switches. CF77 supplies various switches to these commands, but you can communicate directly with any of the commands using the "-W" switch, though this tends to get ugly.
For instance, the command

  cf77 -Wf"-e m" myprog.f

passes the switch "-e m" to the CFT77 command. We'll see more examples of this later.

SLIDE 5) Portability

You should expect an ANSI FORTRAN code to compile and run on the Cray. But it's a wise idea to see if you really are using ANSI FORTRAN! If you are using a VAX/VMS system, you might try the command:

  $ FORTRAN/STANDARD MYPROG.FOR

On the Cray itself, you can use the command:

  cf77 -Wf"-e n" myprog.f

One common source of trouble is the name of the random number generator. On the Cray, the proper way to handle random numbers is as follows:

      ISEED=1952
      CALL RANSET(ISEED)      <-- Initializes the sequence
      DO 10 I=1,N
        X=RANF()              <-- Returns a random value
        ....
   10 CONTINUE

SLIDE 6) Data Representation

Another feature of the Cray that may cause you conversion problems is the fact that the standard word size is 64 bits. This means that a REAL variable is roughly as accurate as a DOUBLE PRECISION variable on a 32 bit machine. In fact, one of the design features of the Cray was to provide higher accuracy automatically. Cray REAL numbers range from about 10**(-2465) to 10**2466, and the machine precision is roughly 0.7E-14.

If your program already uses DOUBLE PRECISION, however, it is important that you convert it to REAL before running it on the Cray. Otherwise, it will execute extremely slowly. There is a Cray compiler switch that can help you. It essentially tells the compiler to interpret every double precision usage as a real usage. The command is:

  cf77 -Wf"-d p" myprog.f

By the way, "REAL*8" is automatically interpreted as REAL on the Cray, although it is interpreted as DOUBLE PRECISION on most computers.

Oddly enough, INTEGER variables are assigned a 64 bit word, but generally are only allowed to use 46 of those bits. You don't need to worry about this unless you are planning to compute extremely large integers, or to do bit manipulations using integer arithmetic.
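
The precision figure quoted earlier on this slide can be checked with a little arithmetic. The sketch below assumes that a Cray REAL carries a 48-bit mantissa, and compares it with the 24-bit and 53-bit significands of IEEE single and double precision. None of these bit counts are stated in the talk itself, so treat them as assumptions brought in for illustration.

```latex
\begin{align*}
\epsilon_{\text{Cray REAL}}   &\approx 2^{-47} \approx 7.1 \times 10^{-15} \approx 0.7 \times 10^{-14},\\
\epsilon_{\text{IEEE single}} &= 2^{-23} \approx 1.2 \times 10^{-7},\\
\epsilon_{\text{IEEE double}} &= 2^{-52} \approx 2.2 \times 10^{-16}.
\end{align*}
```

Under these assumptions a Cray REAL holds about 14 decimal digits, which is much closer to DOUBLE PRECISION on a 32 bit machine (about 16 digits) than to REAL on such a machine (about 7 digits).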
Unfortunately, the Cray does NOT store data using the IEEE format! It is not too hard, however, to have the Cray write data files in IEEE format.

SLIDE 7) Language Features

Cray FORTRAN allows:

  INCLUDE files;
  TAB characters;
  DO WHILE statements;
  Unnumbered DO/ENDDO statements;
  Variable names up to 32 characters long;
  MIL-STD bit routines: BTEST, IBSET, IBCLR, ISHFT, ISHFTC, MVBITS;
  Namelist input;
  Recursive subroutine calls.

SLIDE 8) FORTRAN 90

Cray FORTRAN includes a limited set of FORTRAN 90 features, including:

  Assumed size arrays:        REAL A(*)
                              INTEGER B(10,*)
  Dynamic arrays:             Work arrays created in subroutines.
  Arbitrary based indexing:   REAL A(-10:30)
  Array assignments:          A=B
                              A=B+C
                              A=B/C
                              A=SQRT(B)
                              IF(Y.EQ.Z)...
  Array selector:             A=B(1:10)
                              A=B(4:20:2)
  Arbitrary array selector:   A=B(INDX)

SLIDE 9) Stack/static Data Storage

One of the most common sources of problems when compiling programs on the Cray is faulty data storage. Most compilers have used STATIC allocation in the past. This meant that every variable in a program had its own unique and permanent storage (a subroutine's dummy arguments don't count, of course!).

In order to reduce memory usage, and to allow parallel processing, it became necessary to move to a STACK based memory system. In this system, variables in the main program have permanent storage. Variables local to a subroutine are thrown away when the subroutine is left.

If you are having very strange results, you might try asking the Cray to compile your program using STATIC storage everywhere:

  cf77 -Wf"-e v" myprog.f

If, on the other hand, you know that a particular variable must be saved between calls, you can insert a SAVE statement in the subroutine.
SLIDE 10) STATIC/STACK example

Many programs depend, without their authors realizing it, on a subroutine variable "retaining" its value between calls:

      FUNCTION AREA(R)
      DATA ICALL /1/
      IF(ICALL.EQ.1)THEN
        ICALL=2
        PI=ATAN(1.0)*4.0
      ENDIF
      AREA=PI*R*R
      RETURN
      END

The Cray will actually realize that ICALL is special, because of the DATA statement. It will set aside permanent storage for it. But PI will not be treated this way. The second time the function is called, PI will not have a predictable value. It might be 0, it might be anything. However, if you simply insert the line

      SAVE PI

before the executable statements, all should be well again.

SLIDE 11) Compilation Errors

Compiler messages come in various levels, including:

  INFO:                useful information
  VECTOR:              vectorization information
  COMMENT or NOTE:     inefficient or outmoded programming
  CAUTION or WARNING:  possible or probable errors
  ERROR:               a fatal error (program must be changed!)
  ANSI:                points out non-ANSI programming

By default, the compiler will generally only issue WARNING and ERROR messages. If a compiled listing is requested, VECTOR messages and some other optimization messages will be printed as well.

The compiler will print messages to the STDERR output device, which is your screen if you're interactive, or your ".e" file if you're running NQS, or your ".CPR" or ".LOG" file if you're submitting from another machine.

A compiler message looks like this:

      4      4.       A(1,2)=A(1,2)+X
                      ^
  cft77-125 cft77: ERROR $MAIN, Line = 4, File = temp.f, Line = 4
    The number of subscripts is greater than the number of declared dimensions.

You can increase or decrease the level of messages using the "-m" switch:

  cf77 -Wf"-m 4" myprog.f

will cause only ERROR messages to be printed.

SLIDE 12) An I/O Run Time Error

Unfortunately, it can be difficult to figure out what went wrong when a running program "crashes". Some error messages are actually helpful.
If you OPEN a file that isn't there, but you specify STATUS='OLD', then you'll get the message:

  sys-2 a.out: UNRECOVERABLE error on system request
    No such file or directory
  apparent state: unit 1 named fred.dat
  last format:
  lately writing direct unformatted internal
  Abort
  TB001 - BEGINNING OF TRACEBACK
  - $TRBK WAS CALLED BY f_sig AT 200122b (LINE NUMBER 102)
  - f_sig WAS CALLED BY __handlr AT 104165b
  - __handlr WAS CALLED BY $STKOFEN AT 72047a
  - $STKOFEN WAS CALLED BY killm AT 102245a
  - killm WAS CALLED BY raise AT 102322a (LINE NUMBER 15)
  - raise WAS CALLED BY abort AT 52763a (LINE NUMBER 46)
  - abort WAS CALLED BY $ferror AT 172743a (LINE NUMBER 46)
  - $ferror WAS CALLED BY f$open AT 201251a (LINE NUMBER 430)
  - f$open WAS CALLED BY $OPN AT 154070a
  - $OPN WAS CALLED BY COMPUT AT 345c (LINE NUMBER 4)
  - COMPUT WAS CALLED BY $MAIN AT 325a (LINE NUMBER 13)
  - $MAIN WAS CALLED BY $START$ AT 303a
  TB002 - END OF TRACEBACK
  Abort (core dumped)

This is overkill! But here's what's going on:

First, we have an error message, whose code is "sys-2". The error message was generated by "A.OUT", and the text of the message is "No such file or directory".

Second, we have some information about the file name, the last I/O operation carried out (none), and the type of I/O associated with the file.

Finally, we have a traceback. The traceback is read "backwards", starting with $TRBK (the routine responsible for producing the traceback), and backing up through a series of system routines, culminating in $OPN, which corresponds to the OPEN statement in the user routine COMPUT. From the traceback, OPEN was called at line 4 of COMPUT, and that's where things went wrong.

SLIDE 13) Other Run Time Errors

The most common run time error messages on the Cray are more mysterious:

  Floating Point Exception

and

  Operand Range Error

Usually, you will get a traceback, but occasionally even this will be defective or missing. We will see in other talks some things you can do to try to track down these errors.
However, the compiler can sometimes help you. In particular, a floating point exception occurs when you try to take the square root of a negative number, or divide 1 by 0, or carry out some other illegal operation. It can happen because you used an uninitialized variable, or because your program is using stack memory and at least some variables in a subroutine are losing their values between calls. If this is the case, then the compile statement

  cf77 -Wf"-e v" myprog.f

may help. You should still try to find the problem, since STATIC memory allocation increases your memory requirements, and means you will not be able to do any parallel processing.

The "Operand Range Error" message can also be caused by stack memory, but it might be caused as well by an array going out of bounds, or by a subroutine called with the wrong number of arguments, or by a subroutine that attempts to alter a dummy argument that is actually a constant.

Some compiler switches that MIGHT help include:

  cf77 -Wf"-R a" myprog.f    Checks number and type of routine arguments.
  cf77 -Wf"-R b" myprog.f    Attempts to catch out of bounds arrays.
  cf77 -Wf"-i 64" myprog.f   Use 64 bit integers.
  cf77 -Wf"-e i" myprog.f    Catch use of uninitialized variables.
  cf77 -Wf"-e z" myprog.f    Create symbol table for debugger use.

SLIDE 14) SCILIB

Cray Research provides an extensive library of scientific software, which is available to you by default. This includes the following four libraries:

  BLAS      dot products, vector norms, matrix-vector products.
  LINPACK   LU factors, determinants, linear system solution, QR factors.
  EISPACK   Eigenvalues and eigenvectors.
  LAPACK    Modern, high speed versions of LINPACK and EISPACK

as well as FFT, filter, and recurrence routines.

SCILIB was originally installed on the Cray for convenience. Some of the routines, especially the matrix products, the LAPACK dense linear system solver, and the FFT routines, have very high speed. But there is no guarantee that all the routines will be good performers.
Most SCILIB routines are described in the online document SCILIB.BIG. Because they are "built in", you may call any of the routines without having to use a special switch in your compile statement.

SLIDE 15) Using precompiled libraries

PSC has collected a large set of software libraries. Compiled copies of these libraries are available on the Cray. A user program that calls routines from one of these libraries must tell the compiler to include that library at linking time. Instructions for how to do this are usually given with the documentation for each library. For instance, if a program calls any routine from the BCSLIB library, the compile statement must look something like this:

  cf77 myprog.f -lbcslib

The actual file containing the compiled BCSLIB library is stored in the /usr/local/lib area, and is called libbcslib.a. You might want to look in that directory and see what's available.

Files like libbcslib.a, with an extension of ".a", are a special kind of library file. In this case, the file is actually a collection of "object" or ".o" files. The compiler will extract copies of the needed object files to build the executable.

SLIDE 16) Duplicate Entry Points

If you are using libraries, it's possible that your own program will contain a subroutine of the same name as one of the library routines. In that case, the compiler prints a warning like this:

  ldr-290 segldr: CAUTION
    Duplicate entry point 'CFILL' was encountered.
    Entry in module 'CFILL' from file 'test.o' has been used.
    Entry in module 'CFILL' from file '/usr/local/lib/libbcslib.a' has been ignored.

Here, my program included a routine CFILL that I wrote. I also wanted to use some routines from BCSLIB, which, it turns out, also has a routine called CFILL. The compiler (actually, SEGLDR) warns me that there are two CFILLs to choose from, and that it's going to take the one in my program.
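
One way to anticipate a clash like the CFILL one is to list what a ".a" library contains before linking against it. The sketch below assumes the standard Unix "ar" utility is available, which the talk does not mention. The archive libdemo.a and its member files are hypothetical stand-ins, so that the commands can be tried anywhere without touching a real library.

```shell
#!/bin/sh
# Illustration only: libdemo.a and the member names are hypothetical.
# On the Cray, real ".o" members would come from "cf77 -c".
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Placeholder files standing in for compiled object files.
echo "stand-in for compiled CFILL" > cfill.o
echo "stand-in for compiled DFILL" > dfill.o

# "ar rc" creates the archive; "ar t" lists its members, one per line.
ar rc libdemo.a cfill.o dfill.o
members=$(ar t libdemo.a)
echo "$members"

# Look for a member whose name matches my own routine's object file.
found=no
if echo "$members" | grep -q '^cfill\.o$'; then
    found=yes
    echo "libdemo.a already has a member named cfill.o"
fi

# Remove the scratch directory.
cd / && rm -rf "$workdir"
```

Member names are only a hint, since an entry point need not have the same name as the file that holds it. Against a real library the same listing would be "ar t /usr/local/lib/libbcslib.a".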
In cases like this, if you can, it's best to rename your routine (say, to DFILL), to avoid any chance that the name conflict will cause you problems at run time.

SLIDE 17) Vectorization

In a later talk, you will hear what vectorization is, how it speeds up your program's execution, and how you can improve the amount of vectorization that occurs in your program. For now, we will simply discuss a few ideas about vectorization related to the compiler.

The first thing to note is that vectorization is a process that allows SOME inner DO loops in a FORTRAN program to execute much faster than they normally would. On the Cray, this speedup may be a factor of 10, 20 or 30, depending on the loop.

Only some loops can vectorize. The compiler, all by itself, will try to determine whether each loop of your program can vectorize. It will err on the side of caution. You can find out how the compiler is handling your program by getting a compiled listing of the program. This can be done with the command:

  cf77 -Wf"-e s" myprog.f

The listing file will be stored in MYPROG.L, which you can print or type out or save for later. It will contain a copy of the program, with comments about which loops vectorized or did not, and why.

SLIDE 18) Sample LOOPMARK output

By using a different compiler option, you can get a more visually striking version of the compiler listing. The command

  cf77 -Wf"-e m" myprog.f

produces what is called a "LOOPMARK" listing, which might look something like this:

  486    1.                               subroutine s222(a,b,c,n)
  487    2.                         c
  488    3.                         c
  489    4.                         c     loop distribution
  490    5.                         c     partial loop vectorization
  491    6.                         c
  492    7.                               integer n
  493    8.                               real a(*),b(*),c(*)
  494    9. Sr---------------------<      do 240 i = 2,n
  495   10. Sr                              a(i) = a(i) + b(i)
  496   11. Sr                              b(i) = b(i-1)*b(i-1)*a(i)
  497   12. Sr                              a(i) = a(i) - b(i)
  498   13. Sr--------------------->  240 continue
  499   14.                               return
  500   15.                               end

  V E C T O R I Z A T I O N   I N F O R M A T I O N
  -------------------------------------------------
  cft77-8044 cf77: VECTOR S222, Line = 9, File = testvector.f, Line = 494
    Loop starting at line 9 was not vectorized. It contains a recurrence on "B" at line 11.

  O P T I M I Z A T I O N   I N F O R M A T I O N
  -----------------------------------------------
  cft77-8135 cf77: SCALAR S222, Line = 9, File = testvector.f, Line = 494
    Loop starting at line 9 was unrolled 16 times.

Note that each message has a standard message number format: cft77-8044, and cft77-8135. This is similar to the error and warning messages we've seen earlier. All of these message numbers may be used to obtain a more extensive explanation of the message from the EXPLAIN command.

SLIDE 19) FLOWTRACE

A special compiler option is available if you wish to get some information about how much time parts of your program are taking. Note that this is an EXPENSIVE operation. Your program will be significantly slowed down by this technique, so you should only do it occasionally!

The compiler option is known as the "FLOWTRACE" option. It is invoked as follows:

  cf77 -F myprog.f

All that does is insert special code into your program, which keeps track of which routine is currently executing, and for how long. To conduct the test, you must then take this special version of your program and run it:

  a.out

and then run the FLOWVIEW program to examine the report file that is created:

  flowview -LA | more

The report will include:

  A calling tree;
  How many times each routine was called;
  How long each routine was executing;
  The average duration of a call;
  Whether the routine is small and frequently used.

Information like this can help you locate inefficient portions of the code which are causing the whole program to slow down. In particular, Cray has found that many programs will run faster if small routines or functions are copied directly into the routines that call them. This is known as "inlining".
The compiler can actually try to do this for you automatically at compile time, via the "-o inline" switch:

  cf77 -Wf"-o inline" myprog.f

SLIDE 20) Compiler Directives

Most of the time, the Cray compiler does its work with no help from you. Once you become used to using the Cray compiler, you may be able to spot cases where the compiler needs to be helped. For instance, the compiler is hesitant to vectorize some loops in which it thinks that results of some early iterations may be needed before later iterations can be computed:

      DO 10 I=1,N
        X(I+200)=X(I+200)-X(I)
   10 CONTINUE

If I happen to know that this loop will always use a value of N that is no greater than 200, then I can tell the compiler not to worry. This is actually done with a COMPILER DIRECTIVE, which looks like this:

CDIR$ IVDEP
      DO 10 I=1,N
        X(I+200)=X(I+200)-X(I)
   10 CONTINUE

Here, "IVDEP" is an abbreviation for "Ignore Vector Dependencies". You'll see more of an explanation of when and why to use such directives in the talk on vectorization.

There are other directives used for vectorization, including:

  CDIR$ NEXTSCALAR   means "don't vectorize the next loop!"
  CDIR$ NOVECTOR     means "don't vectorize any more loops til I say so!"

Others are for parallel processing. For instance:

  CMIC$ GETCPUS   specifies the number of CPUs to use;
  CMIC$ DOALL     states that the next loop should be executed in parallel.

SLIDE 21) Parallelization

Our Cray YMP has 8 processors. They normally work independently, but a user program can get 2 or more processors to cooperatively execute portions of the program. For this to occur, special directives must be placed in the user program, either by the user, or by the FPP preprocessor.

The FPP part of the Cray compiler can try to automatically select portions of your program suitable for parallel execution. To do so, it may actually change parts of your program, inserting compiler directives, creating temporary variables, breaking DO loops into smaller portions, and so on.
You can have the FPP program carry out this procedure by using the command:

  cf77 -Zp myprog.f

This will create a program that CAN use parallel processing. However, to try to get more than one processor, you would also have to request multiple CPUs before running the program. In the C shell, type:

  setenv NCPUS 8
  a.out

or in the Bourne shell:

  NCPUS=8
  export NCPUS
  a.out

However, whether or not the Cray will actually give you 8 CPUs depends on the system load. The program will still execute, but the amount of parallel execution will vary.

SLIDE 22) Documentation

PSC provides some online documents, including:

  ARRAYS.DOC      discusses the use of FORTRAN 90 statements,
  FORTRAN.DOC     discusses some simple issues of FORTRAN usage,
  FORTRAN90.BIG   a copy of the FORTRAN 90 language standard.

There are also many examples available for FORTRAN. Look in the AFS directory

  /usr/local/examples/ymp/fortran

The exact name of this directory is

  /afs/psc.edu/common/usr/local/examples/ymp/fortran

Cray Research provides a four-manual set on the Cray compiler:

  CF77 Compiling System,
    Volume 1: FORTRAN Reference Manual, SR-3071
    Volume 2: Compiler Message Manual, SR-3072
    Volume 3: Vectorization Guide, SR-3073
    Volume 4: Parallel Processing Guide, SR-3074

These manuals may be ordered from

  Cray Research, Incorporated
  Distribution Center
  2360 Pilot Knob Road
  Mendota Heights, Minnesota, 55120
  (612)-681-5907

On the Cray itself, there are MAN pages for the following topics:

  cf77,     the overall compiling system,
  fpp,      the parallel and vector preprocessor,
  fmp,      the mid-processor,
  cft77,    the language compiler,
  segldr,   the linker and loader.