The MATLAB programming system MATLAB combines the features of FORTRAN 90 and BASIC. Is the result a heroic centaur or a sterile mule? 1. Background 2. Beginning usage 3. FORTRAN 90 4. M files 5. Examples 6. Evaluation 7. Local Installation Details 1. Background The author of MATLAB told me that he first wrote the program (in FORTRAN!) as an easy interface to the EISPACK and LINPACK routines. Let's recall that EISPACK and LINPACK are two of the oldest, most venerable and most influential software packages written. EISPACK and LINPACK have been around for at least twenty years, worked their way into SCILIB, and only now are being replaced by the new LAPACK library. Why put up an interface between the programmer and the software? I didn't ask the author that question, but in my old job I wrote several programs that did the same thing, so I feel qualified to an opinion. The main reason I was asked to write such programs was to enable students in linear algebra classes to test out algorithms without having to write programs. Most students don't know FORTRAN, and the Math Department found itself having to teach FORTRAN on top of linear algebra. And then students would come to the instructor with programming errors and ask for help. So it was decided to write a program that accepted simple input statements from the user, that could define data structures, solve a problem using a particular algorithm, and display the results. The program was very successful, at least from the instructor's point of view. An interesting sidelight is that some of the instructors use the program for their own work. They claim they would never be able to do this kind of work themselves. Even though the program necessary to solve a simple matrix problem is easy to write, the effort isn't worth it to them. But when an interactive program is available, they enjoy setting up and solving problems. Just as supercomputing allows you to solve bigger problems you never would have attempted otherwise, an interactive program can allow you to solve simple problems that would normally be too tedious to set up. Before we go much further, let me here specify what's necessary to set up and solve just a three by three matrix system in FORTRAN, using LINPACK: DIMENSION A(3,3) DIMENSION IPIV(3) DIMENSION X(3) DATA A /1.0, 4.0, 7.0, 2.0, 5.0, 8.0, 3.0, 6.0, 9.0/ DATA X /1.0, 2.0, 3.0/ LDA=3 N=3 JOB=0 CALL SGEFA(A,LDA,N,IPIV,INFO,JOB) CALL SGESL(A,LDA,N,IPIV,X) WRITE(*,*)(X(I),I=1,N) STOP END $ FORTRAN/NOLIST MYPROG.FOR $ LINK/NOMAP MYPROG, PSC$LIB:LINPACKS/LIB $ RUN MYPROG Now this is bad enough, but let's go over some of the simple questions an intermediate user of FORTRAN might ask: What's that DATA statement doing? Why in that order? How come the DATA statement puts the entries in the wrong place when I change A to be of size (4,3)? Why do I have to set up IPIV if I never use it? How come I put the right hand side in X, but all of a sudden it's the answer instead? What's the difference between LDA and N? What's JOB doing? What does NOLIST do? NOMAP? How come on the Cray you don't have to specify the LINPACK library? If I change my source code, I have to recompile my program, and relink it. What if I relink it first and then recompile? Why don't I have to recompile the LINPACK library? Do I have to keep around this file MYPROG.EXE? How come my matrix A changed after I called this routine? It says my matrix A is overwritten by the L, U factors. Explain to me how these two matrices are stored in A. Unfortunately, there's another problem here. How come I get "floating overflow"? These questions have to do with the fact that FORTRAN is not a very good model of the way people think. It requires much too much structure, and has lots of rules that people find unnatural. For instance, the fact that you must declare the type and size of an object before using it, and that an object of one type can't be used when an object of another type is expected. But, for example, it's fairly easy to write down some "pseudocode" that defines the above problem, containing all the information, and in an natural way: n=3 a is of size n by n a(i,j)=j+(i-1)*n b(i)=i solve a*x=b for x A friendly language ought not to ask much more than this. In my program, the commands you had to use were: $ RUN MTH:LINGO MATRIX A 3,3 1,2,3 4,5,6 7,8,9 VECTOR B 3 1,2,3 VECTOR X 3 0,0,0 SOLVE A B X TYPE X STOP Well this isn't so friendly, but the point is it's a lot less structured. You can make a mistake and simply reenter the information. If you put the wrong values into B, you can easily change them. At any time, you can print out the values of any of the variables you like. The syntax is not quite as clumsy as it looks, since the program prompts you for everything. On the other hand, you soon get sick of having the program printing out menus for you when you know exactly what you want. Now, for comparison, here is the MATLAB computation: matlab a = [ 1 2 3; 4 5 6; 7 8 9]; b = [ 1 2 3]; x=a \ b quit It can't get simpler than that! I think, in this case, at least, the language of MATLAB is able to exactly fit the problem. You don't have to say any more than is necessary. You don't have to include DIMENSION or REAL statements. There just aren't many opportunities for typing mistakes. This simplicity is a tremendous advantage for MATLAB. There are larger, more complicated problems that MATLAB can solve in just as few statements. This means that MATLAB wins big in terms of friendliness and power. The question we must ask is, what is the cost? $10,000 on the Cray YMP, for one thing. But for another, is this a program that BELONGS on the YMP? We can open up a religious debate at this point. Historically, as supercomputers have gotten cheaper and more widespread, more users and more applications have jumped onto them. Many of these users bring their habits from minicomputers or other machines where MegaFLOP envy is not a problem. They expect the same programs to be available on the Cray that they had elsewhere, and it does not matter to them in the least whether these programs are efficient. So one question is: is it any of our damn business? If a user has a problem, and wants to solve it with a given program, when can we say no? (I propose that we say no at the time of the initial grant proposal, or not at all). Another question is: is MATLAB one of these inefficient programs? Assuming, that is, that we can estimate efficiency. To be fair, we probably should not insist that MATLAB solve all problems efficiently, but should require that it do so on big problems at least. 2. Beginning Usage The fundamental object in MATLAB is the array. The number of dimensions, and the extent of the array, are not explicitly declared by the user, but determined from the context of assignment statements. Moreover, all arrays are assumed to be complex, although MATLAB will notice that an object is essentially real, or essentially integer. Matlab has some use for scalars, particularly constants such as 2.0, and defined values such as "pi". MATLAB can determine the number of dimensions and the extent of any object. It may adjust its behavior depending on whether it is operating on a scalar, vector, or matrix. In many cases, MATLAB will treat a matrix as a set of column vectors. You can override this behavior in some cases, by specifying a range of index values, or by specifying that a matrix is temporarily to be treated as a vector. If A is a matrix, for example, you can treat it as a vector by writing A(:). MATLAB expects as input a series of statements which are either expressions, or statements assigning the value of an expression to a variable. MATLAB will normally print the value of the expression unless you request it not to do so with a trailing semicolon. The simplest assignment statement sets a scalar, a vector, or a matrix to a set of constant values: s = x v = [ v1 v2 v3] m = [ v1 v2 v3 ; v4 v5 v6; v7 v8 v9] In each case, no previous statement about the dimension of the object is required. MATLAB determines that from context, and keeps track of the information when needed later. You can build matrices from other matrices using the [ ] notation. Assuming x and y are column vectors, [x y] is the matrix whose first column is X and second column is Y. In this regard, here is a definition of the Hadamard matrix in MATLAB language: H0 = [1] H1 = [H0 H0 ; H0 -H0] H2 = [H1 H1 ; H1 -H1] ... Can you see what is going on here? [1] [1 1] [1 1 1 1] ... [1 -1] [1 -1 1 -1] [1 1 -1 -1] [1 -1 -1 1] More ambitious statements involve simple arithmetic operators. We can issue statements like: A = [1 2 3; 4 5 6; 7 8 9] B = A + 1 C = 2 * A * B Note that MATLAB correctly determines that A + 1 and 2 * A are matrix-scalar operations, but that A * B is standard matrix multiplication. MATLAB includes ways of quickly building certain standard matrices. To set A to the 5 by 5 identity matrix, the commands are n = 5 A = eye(n) Similarly, "zeros(n)", "ones(n)" and rand(n) specify N by N arrays of zeroes, ones, or random values. The diag(x) command creates a matrix whose entries are all zero, except that the diagonal entries are equal to X. However, if A is a matrix, diag(A) creates a vector containing the diagonal entries of A. What happens when you type diag(diag(A))? Interestingly you can also assign "zeros(m,n)". This means that MATLAB allows you to call a function with only a partial argument list, if you like, and it can tell how many arguments you supplied, and make up for the missing arguments by supplying defaults. This means you can write a single function that can be used by an idiot and a sage. The full set of operators available is +, -, *, ^, ' (complex conjugate transpose), \ (left division) and / (right division). A \ b means essentially INVERSE(A) * b, so to solve A*x=b for x, you write x = A \ b. A / b means essentially b * INVERSE(A), which allows you to solve x*A=b. In some cases, the obvious interpretation of a matrix operation may not be what is desired, or it may be necessary to state whether an operation is to be carried out in the matrix or scalar sense. MATLAB chooses the matrix interpreation by default, but you can specify you want to do an element by element operation by prefixing the operator with a period. Thus, using the same matrix A above, A * A = [ 30 36 42 ] A .* A = [ 1 4 9 ] [ 66 81 96 ] [ 16 25 36 ] [ 102 126 150 ] [ 49 64 81 ] MATLAB has a wide set of built in functions, with names familiar to FORTRAN programmers, including abs, sqrt, real, imag, fix, exp, log, sin, cos, tan, max, min All of the functions can take scalar, vector or matrix arguments. It's easy to see what this means for ABS; if B is a matrix, ABS(B) is a matrix of the same size, with each entry the absolute value of the entry in B. SQRT and EXP may raise a question in your mind. Isn't there a matrix square root and a matrix exponential? Indeed there is, but instead of resolving the problem with a period, MATLAB assumes that SQRT(B) means the matrix whose entries are the square roots of the entries in B, and provides a separate function name, SQRTM, for the matrix square root, as well as EXPM and LOGM. MAX and MIN also have features that may be unexpected. If V is a vector, we know what to expect from MAX(V); a scalar value that is the maximum in the vector. But what is MAX(A) where A is an array? The authors decided that the most useful definition was to define this to be a vector, consisting of the maximum entries in each column of A. We could always get the maximum value of the whole matrix either by typing max(max(a)) <-- max(a) is a vector of values, and max(max(a)) is a single value. or max(a(:)) <-- The A(:) states that A is to treated as a vector 3. FORTRAN 90 FORTRAN 90 and MATLAB were both developed to try to improve the handling of arrays. Both have come up with some features that are identical. In both, for example, you may say "A=B" or "A=2*B+1" where A and B are arrays. In FORTRAN 90, you may write array expression such as SQRT(A), F/G, and so on, but unfortunately, A*B is interpreted as element-wise multiplication! Now let us compare other features. First of all, let us address the question of subscripting. MATLAB provides a way of generating indices identical to the FORTRAN DO loop conventions. For example, the expression 1:5 is analogous to the DO loop limits in the expression "DO 10 I=1,5". MATLAB actually expands 1:5 to be the vector [1 2 3 4 5]. Tragically, the MATLAB implementors picked the wrong way to define an increment. Rather than LO:HI:INC, they chose LO:INC:HI! Thus, the expression "1:3:10" expands to [1 4 7 10], but the expression "1:10:3" expands to [1]. You can use such an expression to initialize a vector: x = 5:5:100 The values do not have to be integers. x = 0:pi:pi/20 y = sin(x) To access a single entry of an array, you simply use the usual notation of parentheses. Thus it is legal to say A(3,5) = A(5,3) + 17 But to access a submatrix, you use the colon notation above: A(1:5, 1:5) describes the 5 by 5 submatrix of A consisting of the first five rows and columns. Similarly, A(1:2:N, :) describes the matrix made up of the even rows of A and all the columns of A. (Just like in FORTRAN 90, the occurrence of a lone colon in an index position indicates that index extends over the full range). You can also index an array indirectly. X(V) is the vector [X(V(1)), X(V(2)), ..., X(V(N))], again, just like in FORTRAN 90. A FORTRAN 90 construct of some use is the ability to write conditional statements of the form WHERE (X.LT.1.0) X=X+1.0 ELSEWHERE X=X-1.0 ENDWHERE MATLAB provides the FIND statement, which returns the indices for which a given condition is true. The statements i=find(X < 1) j=find(X >= 1) x(i)=x(i)+1 x(j)=x(j)-1 do the same operation as the FORTRAN 90 WHERE did. Another MATLAB control construction is the IF statement. MATLAB allows you to compare vectors or matrices for equality or inequality. Note, however, that the condition must be true for all entries to be true. For instance: if A == B disp('Matrices are equal') end However, the statement if A ~= B disp('Matrices are not equal') end is NOT going to work if only a few entries of A and B are not equal. This is because the way MATLAB interprets a matrix conditional is that it must be the case for every entry. Thus two matrices are equal if every entry is equal, and not equal if EVERY entry is not equal. This is not quite what one would expect! MATLAB includes two logical operators, ANY and ALL, that allow you to control the behavior of IF statements. Both ANY and ALL are "vector" operators. That is, they prefer to operate on a vector, and return a scalar, but can operate on a matrix (as a set of column vectors) returning a vector. For example, if all( all (A < 0.0)) disp('Matrix was all negative! Reversing!') A = -A end if any ( any(A < 0.0)) disp('Matrix has at least one negative entry!') end if all ( any (A < 0.0)) disp('Every column has at least one negative entry!') end if any (all (A < 0.0)) disp('Some row is entirely negative!') end 4. M files Although MATLAB is thought of as an interactive system, you wouldn't get very far if you had to start it up from scratch every time you ran it. Although MATLAB can dump its variable memory into a file, and read that back in, you probably want to store repeated sequences of commands and invoke them by a brief name. In the simplest case, if there are just a sequence of commands with no dummy variables, you can just type those commands into a file whose extension is ".m", and then invoke the commands by naming the file. Thus, to compute machine epsilon, I could create a file called "epsilon.m" containing the lines tiny=1.0 while (tiny + 1 >= 1.0) tiny = tiny/2 end tiny=2*tiny fprintf('Machine epsilon is %g\n',tiny) and then, in my program, typing "epsilon" would cause these lines to execute, with exactly the same effect as if I had typed them in directly to MATLAB. More to the point, however, are operations which I want to "parameterize". As a simple example, I could want to define the area of a circle through a function statement, so that in matlab I could say x = area(r) To do this, I again create a file, with the name "area.m", containing lines like the following: function a = area(r) area = pi * r * r Most M files are more elaborate than this. In particular, the file can check the number of arguments input with the NARGIN function, and the dimensions of the input object with the SIZE function. It is possible to return one or more scalars, vectors or arrays as output. The variables in a function M file are local, in the same way that FORTRAN subroutine local variables are hidden from the calling program. It is in the area of M files that MATLAB makes itself most valuable. There are many people churning out such files, and it is possible to grab them from NETLIB. These files encapsulate procedures for geophysical data analysis, chemometrics, signal analysis, and other fields which the writers of MATLAB never touched. 5. Examples The "MATLAB" example shows how to set up a simple matrix and ask some questions about it. The "MATMUL" example shows an attempt to benchmark MATLAB by having it multiply two 256 by 256 matrices. The "POWER" example shows my attempt to write the power method for finding the dominant eigenvalue of a matrix. The "PLOT" example shows some sample plots. The "CHAOS" example shows an ambitious plot, prepared using some M files received over the network. 6. Evaluation There are a number of programs available for symbolic calculations. MACSYMA is unsupported, and written in LISP. REDUCE seems to be under active development, but is also written in LISP. We have versions of REDUCE on VMS and UNICOS. MAPLE is not available for the Cray, although we have a copy on VMS. MATHEMATICA wants a quarter of a million dollars for a Cray version, ha ha ha. And now I've read reviews saying THEORIST is better than MATHEMATICA. I have no experience with these programs. I have some vague impressions. Perhaps one of the biggest differences is that MATLAB is NOT a symbolic calculation package. If you wish to integrate X*SIN(X), MATLAB does not have a table of rules that includes an entry for the sine function. MATLAB solves the problem in the same way that a FORTRAN program would, by weighted summation of the values at various points. MATHEMATICA and the other packages look up rules for determining the integral from the given formula with no actual computation. Thus those programs are doing "artificial intelligence" for this problem, whereas MATLAB is carrying out a much more straightforward calculation. MATLAB is not really solving your problem in a new way, but rather is carrying out all the bookwork and I/O to get your problem through a fairly standard algorithm. Graphics is good...a big barrier for most people. Easy to improve performance by hooking up to system libraries. Thus, I conclude that MATLAB is appropriate for supercomputing, because it focusses on vectors rather than on symbolic formulas, has the potential for higher performance, (using Standard C and SCILIB where possible), is compatible in philosophy with the FORTRAN 90 array syntax, is sufficiently friendly to attract users, and powerful enough to keep them, hides exactly those parts of a calculation that are of no interest to the programmer, makes it easy for programmers of moderate skill to do things they wouldn't have tried on their own. 7. Local Installation Details MATLAB has been given to us for a two month trial period, 1 July to 31 August. MATLAB is only available on UNICOS. It is invoked simply by typing "matlab". There is online help available for MATLAB, on both VMS and UNICOS, by typing "pschelp matlab". Inside of MATLAB, you can type "help". There are several online documents: MATLAB.DOC contains local details and a list of all commands. MATLABPRIMER.TEX is a nice primer, with exercises and explanations. MATLABPRIMER.DOC is a plain text version of the primer. In the MATLAB examples directory (DOC$ROOT1:[EXAMPLES.MATLAB] on VMS, and /usr/local/examples/matlab on UNICOS) there is a sample NQS job containing the text of a few interactions with MATLAB. Obviously it is unnatural to try to use an interactive program in batch, but it does give you something to look over. Also in the MATLAB examples directory are a number of subdirectories containing various M files. These cover such topics as chemometrics, signal analysis, and fancy special matrices. These M files are also available online, and so MATLAB knows about them and can access the functions they define. There are a couple copies of the MATLAB User's Guide available for inspection. There is also the book "Handbook for Matrix Computations" by Coleman and Van Loan, which describes and compares LINPACK, EISPACK and MATLAB. There is a MATLAB directory in NETLIB, the electronic software distribution service. It contains subdirectories of M files. There is also a MATLAB users group, which has a news digest distributed electonically.