.. _sac3:

*********************************
Seismic Analysis Code 3
*********************************

In this lab on SAC, we will talk about Fourier transforms, plotting spectra, and filtering data, all useful techniques when processing time signal data.

=============================
Discrete Fourier Transforms
=============================

When analyzing time signals in geophysics (and practically all other disiplines), it is often convenient to look at functions in the frequency domain. To transform a signal :math:`{h(t)}` from the time to frequency domian, we take a Fourier Transform :math:`{H(f)}`, defined as

.. math::
    H(f) = \int_{-\infty}^{+\infty}h(t)\exp(-i 2\pi f t)dt.

If we have :math:`{H(f)}`, we can invert it to compute :math:`{h(t)}` using

.. math::
    h(t) = \int_{-\infty}^{+\infty}H(f)\exp(i 2\pi f t)df.

On a computer, we only have discrete samples from the time series, so we need to define a discrete version of the transform. Noting that the discrete version of an integral is a sum, if we have a vector :math:`{h_n}` of length :math:`{N}`, representing the values of the function at the time points in question, then the DFT is

.. math::
    H_k = \sum_{n=1}^N h_n \exp\left(-i2\pi k\frac{n}{N}\right)

where :math:`{k=1,\ldots,N}` represents the various frequencies. Each entry in :math:`{H_k}` is a complex number, so it has both a real and an imaginary part (or alternatively an amplitude and a phase). However, this would suggest that we start with :math:`{N}` numbers and get out :math:`{2N}` numbers (since the transform is :math:`{N}` complex numbers), but if the input data are real, then the transform coefficients :math:`{H_k}` are symmetric about :math:`{N/2}` (known as the Nyquist point, corresponding to the Nyquist frequency or half the sampling frequency of the data, which is the highest frequency signal that can be captured by a discretely sampled time signal) and the imaginary parts are anti-symmetric about :math:`{N/2}`. Thus, there are really only :math:`{N}` meaningful numbers in the discrete transform if we started with real data.

The inverse of the FFT to transform :math:`{H_k}` back to :math:`{h_n}` is

.. math::
    h_n = \frac{1}{N}\sum_{k=1}^N H_k \exp\left(i2\pi k\frac{n}{N}\right)

Some definitions omit the leading factor of :math:`{1/N}`, so if you use a software package to do a transform and its inverse and don't get back the same value, you are probably missing this leading factor. Note that the FFT definitions require that the data be uniformly spaced in time, and the algorithm does not depend on what that spacing is. Therefore, to translate the coefficients :math:`{F_k}` into specific frequencies, you may need to figure that out on your own using the time spacing of your data (many, including SAC, do this for you).

Looking at the definition of the discrete Fourier transform, it looks like we need to do two nested loops over :math:`{N}` terms each in order to calculate this. This means that the number of calculations that you need to do will grow like :math:`{N^2}`, which can get rather large for long time series (this prevented Fourier analysis from being very useful in scientific applications). However, some smart people in the 60's figured out that a naive implementation can be improved upon to instead grow like :math:`{N\log N}`, which is much more efficient. This made discrete Fourier transforms much easier to calculate, and implementations of this algorithm are usually referred to as **Fast Fourier Transforms**, or FFT for short. FFT algorithms usually require that the signal have a length that is a power of 2 for computational efficiency, so if the signal in question is not already an appropriate length, it will be padded with extra zeros to make it have a length that is a power of 2.

Because this algorithm is so widely used, most software packages come with FFTs already installed (this includes MATLAB and SAC, and it is available as an add-on Python package). We will cover FFTs and spectra in SAC in this lab, but you should be aware that many of these same tools are available in MATLAB and Python as well.

======================
Spectral Analysis
======================

SAC has many functions for doing analysis of spectra. To take the Fourier transform of the signal in memory, use the ``FFT`` command. ``fft`` takes two options. First, you can specify ``wmean`` or ``womean`` to include or remove the constant DC offset from the transform, respectively. Second, you can obtain two versions of the transform, either ``rlim`` to obtain real and imaginary parts of the coefficients, or ``amph`` to obtain amplitude and phase information on the coefficients. The default values are ``wmean`` and ``amph``.

An inverse FFT can be taken with ``IFFT``. ``ifft`` requires that the data in memory is in the frequency domain, either by being the result of a previous ``fft`` or read from a file containing spectral data. SAC automatically recognizes whether the FFT is stored in real/imaginary or amplitude/phase format, so you do not need to specify anything about the format of the data.

When you take an FFT, the header information on time (beginning, time increment, and number of points) is automatically changed to reflect the beginning frequency, the sampling frequency, and number of points in the transform. An inverse FFT restores the header information to the time domain.

To plot the spectrum once you take an FFT, use the ``PLOTSP`` command (shortened to ``psp``. ``plotsp`` has two optional arguments. First, you can specify the type of spectral plot: ``asis`` plots whatever form the data is already in (determined by the command issued when you took the FFT), ``rlim`` plots the real part of the FFT followed by the imaginary part, ``amph`` plots the amplitude, then the phase, and ``rl``, ``im``, ``am``, ``ph`` plots only the real, imaginary, amplitude, or phase, respectively. The default value is ``asis``. Second, you can specify the plot style to be linear (``linlin``), semi-log on the horizontal (``loglin``), semi-log on the vertical (``linlog``), or log-log (``loglog``). The default is ``loglog``. For example, you can plot the amplitude spectrum of a boxcar function over one second on a log-log scale up to the Nyquist frequency of 500 Hz as follows: ::

    funcgen boxcar npts 1000 delta 0.001
    fft amph    
    plotsp am loglog

.. image:: sac3_spectrum.png

**Exercise:** Make a spectral plot for the triangle function over the same times and frequencies.

Another common spectral plot is a spectrogram, which plots the spectral content of a window of particular length as a function of time. The SAC command to do this is ``SPECTROGRAM``. There are many options for ``spectrogram``. The basic options include ``window``, which determines the length of the window (in seconds) used to calculate the spectrum (and therefore determines the range of frequencies in the spectrogram), and ``slice`` which determines the spacing (in seconds) between successive windows. Other options include specifying the details of how the spectrum is estimated, and plot appearance attributes. See ``help spectrogram`` or the SAC manual for more details on these options.

An example spectrogram can be seen using the following commands: ::

    funcgen seismogram
    spectrogram

which will display the following plot:

.. image:: sac3_spectrogram.png

One peculiar thing about spectrogram is that since it calculates the spectrogram, places it in memory (overwriting the original signal in the process), and plots it in on single command, you cannot execute it more than once without reloading the data (otherwise it tries to calculate the spectrogram of a spectrogram). Therefore, if you want to change the spectrogram options, you will need to reload the data into memory before executing the command again.

**Exercise:** Create a spectrogram for a seismogram that you have downloaded from the IRIS website. Note that in many cases, the default parameters might not give a very interesting plot -- if necessary, change the ``window``, ``slice``, and ``ymax`` parameters to make a more useful plot. (Hint: plot the spectrum of the signal first to determine the relevant frequency band, and then choose ``window`` to be large enough to resolve those frequencies in the FFT -- remember that the minimum nonzero frequency resolved in a DFT is the sampling frequency divided by the number of data points. Set ``ymax`` to make the vertical range of the plot reasonable for the frequencies in the data. You will also want to increase ``slice`` so that the time windows are more reasonably spaced, though it is not necessary.)

===============
Filtering
===============

Many signal processing situations call for filtering data in different frequency bands. SAC has a number of built-in filters, which you can apply using the ``BANDPASS``, ``HIGHPASS``, and ``LOWPASS`` commands (shortened to ``bp``, ``hp``, and ``lp``). SAC has four built-in filters, which can be selected with the following options: Butterworth (``butter``), Bessel (``bessel``), and Chebychev type 1 and type 2 filters (``c1`` and ``c2``, respectively). The details of all of these filters is beyond the scope of this class, but they have the following basic properties:

* **Butterworth:** Designed to have a uniform gain within the passband. The group/phase delay is moderate. The transition from passband to 
  stopband is reasonably sharp. In most cases, this should be the first filter you try for filtering a signal. See the following figure for an 
  example of the gain as a function of frequency.

* **Bessel:** Designed to have a maximally flat group/phase delay, meant to preserve the shape of signals in the passband. However, the 
  consequences of this is that the amplitudes of the filtered signal are not particularly accurate, and the fall-off from the passband to the 
  stopband is the most gradual of all of the filters.

* **Chebychev Type 1:** Designed to have a sharp transition from passband to stopband. The tradeoff is that there are "ripples" in the passband 
  (see below), and poor group/phase response.

* **Chebychev Type 2:** Same as the Chebychev Type 1, but with the ripples in the stopband.

For additional details on these filters, Wikipedia has lots of good information. For more details on their implementation in SAC, see the manual.

.. figure:: linear_filters.*

    Gain as a function of frequency for several linear filters. The Butterworth is flat in the passband, but has a slower 
    transition from one to the other than the Chebyshev filters. The Bessel filter has the most gradual fall off in the gain, but the
    group delay is flat in the passband (not shown), helping to maintain the shape of signals. The Chebychev filters are more 
    complicated to implement, and have ripples in either the passband (type 1) or the stopband (type 2).

The Butterworth and Bessel filters are the simplest to implement. The arguments for the ``bandpass`` command in SAC involve the number of poles, which describe how the filter transitions from the passband to the stopband, and the frequency range (low to high) to keep in the signal. For example, to implement a butterworth filter with 2 poles from 2 to 10 Hz, the SAC command would be ``bandpass butter npoles 2 corner 2 10``. ``highpass`` and ``lowpass`` work in a similar fashion, but only take one input frequency. We can see how a bandpass filter affects a sample seismogram as follows. First, generate the seismogram, plot it, and look at its spectrum: ::

    funcgen seismogram
    qdp off
    plot
    fft amph
    beginwindow
    plotsp am

The ``beginwindow`` command simply opens a new figure window, so I can look at these side by side. Now filter the data and plot again: ::

    funcgen seismogram
    bandpass butter npoles 2 corner 2 10
    beginwindow
    plot
    fft amph
    beginwindow
    plotsp am

You should see that the lower and higher frequencies are reduced relative to the original seismogram.

There are a few other things that you should know about the ``bandpass`` arguments. First, the minimum corner frequency cannot be zero (you will cause SAC to crash if you do this), and you should use ``lowpass`` to handle this situation. Also, while ``npoles`` can take any value from 1-10, in practice people do not usually use ``npoles`` greater than 3 or 4. While you will get a faster transition from passband to stopband using a higher value for ``npoles``, this comes at the cost of a significant effect on the group/phase delay in the passband.

To take the complement of a bandpass filter, which removes the frequencies between the two frequencies specified using ``corner`` use the ``BANDREJ`` command. ``bandrej`` takes all the same options as ``bandpass``, so if you know how to use ``bandpass``, ``bandrej`` is straightforward.

**Exercise:** Try out the ``bandpass``, ``highpass``, and ``lowpass`` commands with different values on different signals. The highpass and lowpass functions take only a single frequency, but the other filter details are the same. Also try using the Bessel filter, which takes the same arguments as the Butterworth filter.