MATLAB 5¶
This lab covers odds and ends regarding MATLAB.
Floating Point Numbers¶
We have been dealing with floating point numbers in both Python and MATLAB. These are computer approximations to real numbers, and most of the time they behave just like real numbers. However, there are some cases when treating floating point numbers exactly like real numbers can trip you up. This section talks a bit about the details of floating point numbers, and potential pitfalls that you may encounter if you treat them exactly like real numbers.
To represent a floating point number, we use a specified number of bits. These bits are divided between the sign (1 bit), a fixed number of bits for the mantissa (the significant digits), and a fixed number of bits for the exponent, akin to scientific notation. Because this is on a computer, floating point numbers use base 2 instead of base 10. Thus we can think of a floating point number \({f}\) as follows:
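The expression below is a sketch of one common way to write this, not necessarily the exact form intended here. The symbols are assumptions chosen for illustration: \({s}\) is the sign bit, \({M}\) is the mantissa read as an integer, \({m}\) is the number of mantissa bits after the leading digit, and \({E}\) is the exponent after the bias has been subtracted.

\[ f = (-1)^{s} \times \frac{M}{2^{m}} \times 2^{E} \]

With this convention, \({M / 2^m}\) lies between 1 and 2 for a normalized number, so its leading binary digit is always 1.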
To maximize precision, floating point numbers are normalized (by adjusting the exponent) so that the mantissa has no leading zeros; this is what the factor of \({2^m}\) represents. Since in base 2 the leading digit of a normalized number is always a 1, we do not need to store this implicit bit, which frees up one more bit of precision.
A “single precision” or 32-bit floating point number is represented by the following: 1 sign bit, 8 exponent bits (with an implicit offset of 127 so that negative exponents can be represented), and 24 mantissa bits (including the implicit bit). For example, \({\pi}\) can be written as a single precision number as follows. First, write \({\pi}\) to 32 digits in base 2:
\({\pi}\) = 11.001001000011111101101010100010
Next, we normalize by factoring out \({2^1}\) and round the mantissa to 24 digits:
\({\pi}\) = 1.10010010000111111011011 \({\times 2^1}\).
The exponent is stored as 1 + 127 = 128, or in binary 1000 0000. The sign bit is zero, and we drop the implicit bit, leaving us with the following (using SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM notation):
\({\pi}\) = 01000000010010010000111111011011.
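The three fields can be inspected directly in MATLAB. The snippet below is a minimal sketch that reinterprets the bytes of single(pi) as an unsigned integer and prints them in binary; the variable names are chosen here for illustration.

```matlab
% Inspect the bit pattern MATLAB stores for pi in single precision.
x = single(pi);
bits = dec2bin(typecast(x, 'uint32'), 32);   % 32-character string of 0s and 1s

signBit  = bits(1);        % S
expBits  = bits(2:9);      % EEEEEEEE (stored with the bias of 127 added)
mantBits = bits(10:32);    % 23 stored mantissa bits (implicit leading 1 omitted)

fprintf('sign     = %s\n', signBit);
fprintf('exponent = %s (biased value %d)\n', expBits, bin2dec(expBits));
fprintf('mantissa = %s\n', mantBits);
```

Swapping 'uint32' for 'uint64', single for double, and 32 for 64 should show the corresponding double precision layout, with the field boundaries moved to 1, 11, and 52 bits.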
Single precision numbers give you about 7 significant figures in base 10, and a base-10 exponent range from -38 to 38. That is fine for many applications, but serious numerical work usually requires more, which is why double precision (64-bit) is standard in MATLAB. Double precision numbers give you about 16 significant figures, with exponents ranging from -308 to 308. One can also use quad precision (128-bit) floating point numbers, which dedicate additional bits to both the exponent and the mantissa. The properties of floating point numbers in the various precisions are summarized in the following table.
Type | Sign bits | Exponent bits (base-10 range) | Mantissa bits (stored) | Total bits | Exponent bias | Precision (base-2 digits) | Precision (base-10 digits) |
---|---|---|---|---|---|---|---|
Single | 1 | 8 (\({\pm38}\)) | 23 | 32 | 127 | \({24_2}\) | \({7_{10}}\) |
Double | 1 | 11 (\({\pm308}\)) | 52 | 64 | 1023 | \({53_2}\) | \({16_{10}}\) |
Quad | 1 | 15 (\({\pm4932}\)) | 112 | 128 | 16383 | \({113_2}\) | \({34_{10}}\) |
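The single and double precision rows can be checked against MATLAB's built-in eps, realmax, and realmin functions. This is only a quick sketch; standard MATLAB has no built-in quad precision type, so that row is not covered.

```matlab
% Query the limits of single and double precision directly.
fprintf('eps (single)     = %g\n', eps('single'));      % spacing of singles near 1, 2^-23
fprintf('eps (double)     = %g\n', eps);                % spacing of doubles near 1, 2^-52
fprintf('realmax (single) = %g\n', realmax('single'));  % largest finite single
fprintf('realmax (double) = %g\n', realmax);            % largest finite double
fprintf('realmin (single) = %g\n', realmin('single'));  % smallest normalized single
fprintf('realmin (double) = %g\n', realmin);            % smallest normalized double
```

The spacing near 1 is what sets the number of significant figures, while realmax and realmin reflect the exponent range.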
Problems with Finite Precision¶
The fact that numbers are represented by a finite number of bits has a few important implications for arithmetic and other things that you need to do with numbers on a computer:
Adding two numbers of very different magnitudes will often not yield the result you intended if the smaller number does not register within the precision of the larger number. To see this in MATLAB (which uses double precision, so the number we add must be smaller than the larger one by a factor of more than about \({10^{16}}\)), enter the following:
a = 1.;
b = a + 1.e-17;
if a == b
    disp('a and b are the same');
end
You can mitigate this problem by increasing the precision of both numbers, as after doing so the smaller number will then register within the significant figures of the larger number.
Subtracting numbers that are nearly the same will give a number that appears to be more precise than it actually is. In MATLAB, try the following:
a = 1. + 1.e-15;
b = a - 1.;
disp(b);
This will display mostly meaningless digits, because these digits are smaller than the smallest digit represented in 1. + 1.e-15. Increasing the precision does not alleviate this problem, as increasing the precision adds on meaningless digits to both numbers.
Testing for equality between floating point numbers often does not give you the expected result because of rounding errors. When comparing floating point numbers a and b, it is best to test abs(a-b) < epsilon, where epsilon is a small number set by the precision of the numbers. For an example of this, try the following in MATLAB:
a = 0;
delta = 0.01;
for i=1:100
    a = a + delta;
end
if a == 1
    disp('a is one!');
end
This will not print anything out, because there is round-off error in adding up 0.01 (which cannot be exactly represented with a finite number of digits using double precision, despite its simple base-10 representation) 100 times. You should always use some small tolerance when comparing floating point numbers!
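A tolerance-based comparison might look like the sketch below. The tolerance of 100*eps(1) is an assumption made for this example (a modest multiple of the spacing of doubles near 1); an appropriate value depends on the size of the numbers and on how many operations produced them.

```matlab
% Compare a floating point sum against 1 with and without a tolerance.
a = 0;
delta = 0.01;
for i = 1:100
    a = a + delta;
end

tol = 100 * eps(1);                  % assumed tolerance for values near 1

exactlyEqual = (a == 1);             % exact test, sensitive to round-off
nearlyEqual  = abs(a - 1) < tol;     % tolerant test

fprintf('a == 1           : %d\n', exactlyEqual);
fprintf('abs(a - 1) < tol : %d\n', nearlyEqual);
fprintf('a - 1            = %g\n', a - 1);
```

For numbers that are far from 1 in magnitude, a relative test such as abs(a-b) < tol*max(abs(a),abs(b)) is usually a safer choice than a fixed absolute tolerance.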
These are things to be aware of when doing math on computers. However, it is important to note that while round-off error is always something to worry about, it is not unpredictable: if we do the same thing multiple times, we get the same result. For instance, when adding up a bunch of small numbers we may not get exactly the sum we expect, but in the following example, subtracting those same numbers off of the result brings us back to the number we started with (such exact cancellation is not guaranteed for every sequence of operations):
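a = 1.;
delta = 0.01;
% add delta to a 100 times
for i=1:100
    a = a + delta;
end
% subtract delta from a 100 times
for i=1:100
    a = a - delta;
end
% exact comparison against the starting value
if a == 1.
    disp('a is one!');
end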
This will in fact print out something. There may be round-off errors owing to the fact that 0.01 does not have an exact representation as a double precision floating point number, but adding and subtracting the double precision representation of 0.01 does leave us with the same number we started with.
Saving Data to a File¶
We spent some time in the first MATLAB class covering how to get data into MATLAB. However, you are also likely going to need to get data out of MATLAB. Here are your options when it comes to writing data to a file.
Let’s say you are using MATLAB and have the results of a particularly long calculation that you would like to write to disk for future use. There are a few ways that you can accomplish this, depending on the intended use of the data.
MATLAB native .mat files: MATLAB’s default method of saving data is in binary .mat files through the save command. You can save more than one variable to the same file; the default use of save is to save all workspace variables to the same file (done by invoking save(<filename>)). To save only one variable, you can call save(<filename>,'<variable>'). The save command can also write ASCII files, with a number of options for formatting the output. Use the help command to see all of the different possibilities. Note that a .mat file is in a MATLAB-specific format, so you need to use MATLAB to read these files (use load(<filename>)).
ASCII formatted files: There are two options for saving human-readable files that are not MATLAB-specific. fprintf is the writing equivalent of fscanf and can be used to write formatted data to a text file. fprintf takes 3 arguments: a file id, formatting instructions, and the data to be written. For example, if I have a vector A that is 10 elements in length that I would like to write to file with each entry on its own line, I can use the following:
fid = fopen('outfile.dat','w');
for n = 1:10
    fprintf(fid,'%f\n',A(n));
end
fclose(fid);
You probably recognize the fopen command from before; here it is important to include the 'w' so that MATLAB will open the file in write mode. The fprintf call includes the file id, the formatting (in this particular case, we print a floating point number followed by a newline, \n), and the data to print. If you want to put a tab after each value instead, use \t in place of \n, or you can add spaces as well. You can specify the number of digits to write by using '%m.nf' for the formatting string, where m is the minimum number of characters to print and n specifies the number of figures to include after the decimal point. Thus, '%5.3f' will print a minimum of 5 total characters (it will be padded with blank spaces at the left if need be), three of which are after the decimal point.
An alternative is to use dlmwrite, which writes character-delimited text to a file. Its counterpart dlmread is another option for reading data into MATLAB that I did not mention the first time around; it is used to read data in which a common character separates every data point. A comma is the default delimiter (as CSV files are a fairly common format), but an alternative delimiter can be specified when invoking the command. An example use is dlmwrite(<filename>,A);. See the documentation to learn more about the specific options.
Binary files: To write to a file in binary format, use fwrite, which is the writing equivalent of fread. To write the matrix A to disk as a binary file of double precision floats with little-endian byte ordering, use
fid = fopen('outfile.dat','wb');
fwrite(fid,A,'double','l'); % you can specify other formats and byte-ordering values if necessary
fclose(fid);
The only thing to note here is the use of the 'wb' tag to specify that the file should be opened in binary write mode. Everything else should be familiar from the use of fread.
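One thing to keep in mind with formatted text output is that the format string controls how many digits survive. The sketch below writes a short vector with '%f', reads it back with fscanf, and measures the round-trip error. The file name comes from tempname and the test values are made up for illustration. It also passes the whole vector to fprintf in one call, which reuses the format for every element, instead of looping.

```matlab
% Write a vector as text with '%f', read it back, and check what was lost.
A = [pi; exp(1); 1/3];
fname = [tempname '.dat'];       % temporary file name (illustrative)

fid = fopen(fname, 'w');
fprintf(fid, '%f\n', A);         % the format is reused for each element of A
fclose(fid);

fid = fopen(fname, 'r');
B = fscanf(fid, '%f');           % returns a column vector
fclose(fid);
delete(fname);                   % clean up the temporary file

maxErr = max(abs(A - B));
fprintf('largest round-trip error with %%f: %g\n', maxErr);
```

Since '%f' keeps six digits after the decimal point by default, the values read back agree with the originals only to about that many places. A format such as '%.16e' keeps more digits at the cost of a larger file.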
If speed is a concern, writing binary files (using either save or fwrite, depending on the intended use of the data) is much faster than writing formatted files, and will take up less disk space. If you will be writing lots of data at full precision and the files do not need to be human readable, consider using one of the binary options.
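Both binary routes preserve every bit of the data, which a round trip can confirm. The sketch below is written under a few assumptions: the matrix is random test data, the file names come from tempname, and the files are opened with the plain 'w' and 'r' permissions, which I take to open in binary mode unless a 't' is appended.

```matlab
% Round-trip a matrix through fwrite/fread and through save/load.
A = rand(4, 3);                            % made-up test data
binName = [tempname '.dat'];
matName = [tempname '.mat'];

% Raw binary: fwrite stores only the numbers, so the shape must be supplied on read.
fid = fopen(binName, 'w');
fwrite(fid, A, 'double', 'l');
fclose(fid);

fid = fopen(binName, 'r');
B = fread(fid, size(A), 'double', 'l');
fclose(fid);

% .mat file: save stores the variable name, type, and shape along with the data.
save(matName, 'A');
S = load(matName);                         % load returns a struct with a field named A
C = S.A;

delete(binName);
delete(matName);

disp(isequal(A, B));                       % exact comparison is appropriate here
disp(isequal(A, C));
```

The raw file written by fwrite contains no record of the array size or data type, so that information has to be tracked separately. This is the main trade-off compared with a .mat file.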
Additional MATLAB Data Types¶
Data Structures¶
Sometimes in geophysics we have more complex data than can be simply represented by vectors or matrices. For instance, a seismogram often contains three components, each of which has the same length. There are multiple ways we might handle this in a MATLAB code:
- Three separate vectors, one for each component
- One matrix, with three columns (one for each component)
Either method will work, but they both have their drawbacks. If each component is its own vector, then we can use descriptive names to describe each one (e.g. station1_N, station1_E, station1_Z), which makes our code more understandable. However, this does not highlight the fact that all of these vectors represent different aspects of the same measurement, since each variable is separate. Further, if we want to write a function to process the components, we need to pass all three vectors to the function as separate arguments. Using a matrix solves these problems, but now it is less clear what each component is and our code is harder to understand.
We can get the best of both worlds by using a data structure, which is a single variable that holds multiple data objects, each of which has its own name. One way we might create a data structure for a seismogram is as follows:
station1.N = [ <array> ];
station1.E = [ <array> ];
station1.Z = [ <array> ];
You would obviously create the appropriate vectors for assigning each component (by reading the data from a file, for instance). This gives you a single data object station1 that makes it clear that this is a single measurement and can be passed to functions in a simple fashion. You can also add additional data to the structure, like the time (a vector), and even other attributes that might not be vectors, like station latitude/longitude, sampling frequency, and the instrument type:
station1.t = [ <array> ];
station1.latitude = 36.;
station1.longitude = -90.;
station1.samplerate = 100.;
station1.type = 'broadband';
This is a handy trick for writing code that is clear and easy to understand, as you can wrap all relevant data into a single structure. If you want to initialize an empty structure, use struct(), to which you can then add entries as above. To initialize a structure with multiple entries at once, use struct('<fieldname>',<fieldvalue>, ... ). Because of how MATLAB stores the field names internally, you need to explicitly use a string for the field name when initializing in this manner.
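As a small sketch of the multi-field form of struct, the example below builds a station structure in one call and then loops over its fields. The numeric values are made up for illustration, and peakN and hasTime are hypothetical names used only in this example.

```matlab
% Build a structure in one call, with field names given as strings.
station1 = struct('N', [0.1 0.2 0.3], ...
                  'E', [0.0 -0.1 0.2], ...
                  'Z', [1.0 0.9 1.1], ...
                  'samplerate', 100., ...
                  'type', 'broadband');

% List every field together with the type of data it holds.
names = fieldnames(station1);
for k = 1:numel(names)
    fprintf('%-10s : %s\n', names{k}, class(station1.(names{k})));
end

peakN   = max(abs(station1.N));      % fields are used like ordinary variables
hasTime = isfield(station1, 't');    % check for a field before relying on it
```

One caveat with this form: if a field value is itself a cell array, struct creates an array of structures with one element per cell, so the value needs to be wrapped in an extra pair of curly braces to store the cell array in a single field.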
Cell Arrays¶
MATLAB arrays work well for numeric data, and structures are handy for combining multiple data types into a single object. But what if you have data that is a collection of a large number of strings? A structure is not really useful for this, because we would need to store every string as its own unique entry, which obscures the structure of the data. This is also cumbersome for accessing the data, since you would need to use a string to access each entry, rather than a convenient numeric index.
We might think that we can simply use an array of strings to represent the data in a convenient way, using the same syntax for defining an array of numbers:
stringarray = ['abc' 'def'];
disp(stringarray(1));
However, this will simply print out a, which is not what we really wanted. This is because a string is already represented within MATLAB as an array of characters. When we defined stringarray, we were really just combining two arrays of characters, much in the way MATLAB lets you concatenate vectors or matrices using vectorcat = [vector1 vector2].
To handle arrays of strings, MATLAB provides the cell array datatype. Cell arrays use curly braces {} for defining and indexing the arrays, rather than the square brackets (for defining) and parentheses (for indexing) used for normal arrays. Cell arrays can hold any data type in each entry, including strings, booleans, numbers, and even entire matrices, so you are not restricted to numeric values like in a matrix.
Here is an example cell array holding the data from stringarray above:
stringarray = {'abc' 'def'};
disp(stringarray{1});
This will give us the desired result. As mentioned above, we can put any type of data in each entry of a cell array, for example:
mixedarray = {'abc' 123 [1 2]};
disp(mixedarray{1}); % displays 'abc'
disp(mixedarray{2}); % displays 123
disp(mixedarray{3}); % displays 1 2
disp(mixedarray{3}(1)); % displays 1
This allows us to combine different datatypes into a single array that can be indexed like a matrix.
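A typical use is looping over a cell array of strings by numeric index. The sketch below uses made-up station names. It also shows the difference between curly-brace indexing, which returns the contents of an entry, and parenthesis indexing, which returns a smaller cell array.

```matlab
% Loop over a cell array of strings of different lengths.
names = {'station1', 'sta2', 'longstationname'};   % made-up names

for k = 1:numel(names)
    fprintf('%d: %s (%d characters)\n', k, names{k}, length(names{k}));
end

first = names{1};                  % curly braces: the contents, a character array
sub   = names(1);                  % parentheses: a 1x1 cell array holding that string
lens  = cellfun(@length, names);   % apply a function to every entry
```

Because each entry is stored separately, the strings do not need to have the same length, which would be required for the rows of an ordinary character matrix.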
Exercises¶
- Practice saving data in various formats from your workspace. Check that you were successful by reading the data back into MATLAB using the equivalent read commands from our previous work with MATLAB.
- Create a data structure representing an earthquake catalog (you can use the file “newmadrid2015.txt” from the second class on MATLAB as input data). Include separate fields in the data structure for the date, latitude, longitude, and magnitude.
- I have provided 1000 binary .mat files zipped together in the file “lab11files.tar.gz”, on the course website. Use a cell array to read all of them into MATLAB. Hint: Use the shell to write all the file names into a text file, then read that file into MATLAB using fscanf to define your cell array, and then loop over the cell array and use the load command to read each one into MATLAB.