Basic time series analysis
In this chapter, we learn some basic time series analysis techniques and how to read real world data from a single file and from a batch of files.
Generating a test time series
Before we start to analyse time series, we first have a look on how they are generated. We now construct a very simple time series from scratch with Python.
Create a script including the following commands:
# Create some dummy data:
import numpy as np
import matplotlib.pyplot as plt
# imports most relevant Matplotlib commands
# create x-values:
dx = 0.1
x_values = np.arange(0, 2*np.pi, dx)
# define parameters:
f1 = 2 # Frequency 1 in Hz
f2 = 10 # Frequency 2 in Hz
A1 = 6 # Amplitude 1
A2 = 2 # Amplitude 2
A_sin = A1 * np.sin(f1 * x_values)
A_cos = A2 * np.cos(f2 * x_values)
A_signal = A_sin + A_cos
# plots:
fig=plt.figure(1)
plt.plot(x_values, A_sin)
plt.plot(x_values, A_cos)
plt.plot(x_values, A_signal, lw=5, c="y")
plt.show()
Now, we add some noise to the signal:
# add some noise:
np.random.seed(1)
A_Noise = 2
Noise = np.random.randn(len(x_values)) * A_Noise
A_signal_noisy = A_signal + Noise
fig=plt.figure(1)
plt.plot(x_values, A_signal, lw=5, c="y")
plt.plot(x_values, A_signal_noisy, lw=2, c="r")
plt.show()
Exercise 1
Basing the solution of Exercise 2 from the Data Visualization with Matplotlib chapter in the Python Basics course, copy and modify the appropriate plot commands, in order to plot
A_signal
(add the following properties to the plot command:label="Signal", lw=2, c="pink"
)A_signal_noisy
(add the following properties to the plot command:label="Noisy signal", lw=3, c="lime"
)
in a single plot window. Don’t forget to annotate your plot with appropriate x- and y-labels, a title, and a legend. Also, save your plot as a PDF.
# Your solution 1 here:
Toggle solution
The figure above includes
- our signal, which is a super-position of actually two prime signals (a sine and a cosine wave with different frequencies and amplitudes)
- our signal with some random (“background”) noise.
Signal filtering
We will now try to filter our signal in order to
- recover the first and the second prime signal, respectively.
- apply the same filtering operations to the noisy signal.
Let’s first aplly a so-called butter-worth filter to recover the (lower frequency) sine signal (i.e., we perform low-pass filtering). For this, we use the butter
function from the scipy-filter
package:
from scipy import signal # put this on top of
# your script
# set filter parameters:
fs = x_values.shape[0] # Sampling frequency
fc = 5 # Cut-off frequency of the filter
w = fc / (fs / 2) # Normalize the frequency
filter_order = 10
# apply the filter function:
b, a = signal.butter(filter_order, w, 'low')
# filtering our signal
A_signal_clean = signal.filtfilt(b, a, A_signal)
plt.plot(x_values, A_signal, label='Signal', c="k")
plt.plot(x_values, A_signal_clean, label='filtered',
c = "lime", lw=4)
plt.plot(x_values, A_sin, label='Sine', c="blue")
plt.plot(x_values, A_cos, label='Cosine', c="red")
plt.legend(loc="best")
With the defined filter settings we are able to recover the sine-signal from A_signal
.
In order to create a corresponding high-pass filter to also recover the high-frequency cosine signal, we would have to repeat all filter relevant commands from above. That’s actually not the way how we should do. Instead, it would be convenient if we would have our own filter-function, so that we don’t have to repeat all these complex steps.
Exercise 2: Construct a filter function
- Recap the Function definition chapter from our Python Basics course.
- Write a function that performs the low-pass filtering from above. Define the function in this way that the following arguments must be passed as input parameters:
sampling_freq
for the sampling frequency,cutoff_freq
for the cut-off frequency of the filter,filter_order
for the filter order, andinput_signal
for the input signal, that has to be processedkind
for the type of the filter to be applied (eitherlow
orhigh
)
- Apply your newly created function to
A_signal
, once for low-pass filtering the signal, and once for high-pass filtering it, and add corresponding plotting commands to plot- the unimpaired signal
A_signal
, - the high-pass filtered signal (e.g.,
A_clean1
), - the low-pass filtered signal (e.g.,
A_clean2
), and - a cosine and a sine wave for comparison.
- the unimpaired signal
- Apply your filter to the noisy signal
A_signal_noisy
, again both for low- and high-pass filtering the signal. In a new plot window, plot,- the noisy signal
A_signal_noisy
, - the high-pass filtered signal (e.g.,
A_clean1_noisy
), - the low-pass filtered signal (e.g.,
A_clean2_noisy
), and - a cosine and a sine wave for comparison. What do you notice?
- the noisy signal
# Your solution 2.2 here:
Toggle solution
# Your solution 2.3 here:
Toggle solution
# Your solution 2.4 here:
Toggle solution
Advice: Matplotlib commands can grow big very quickly. It’s therefore often usefull to plug these commands into a function definition in order to keep your script readable and especially when you need to repeat your plot command(s) several times within your script.
Exercise 3: Reading real world data
- Download the data folder “Data/Pandas_3” from the GitHub repository ꜛ (if not already done).
- Write a script, the reads the file “mouse_1_odorA_3_results.xlsx” into a Pandas DataFrame.
-
Iterate over the keys of the DataFrame (here called
df
) viafor key in df.keys(): plt.plot(time, df[key])
and plot each column of the read Excel file data into one figure. Set an appropriate titel and x- and y-labels. Use the sample frequency
sampling_rate = 30.1
(unit: [1/s]) to construct the correspondingtime
array:time = np.arange(df.shape[0]) / sampling_rate
- Visit the documentation website ꜛ of the
scipy.signal.medfilt
and read how to apply a 1D median filter to a given data array. Apply this filter to each column of the data and plot the filtered results into a new figure.
# Your solution 3.2-3.3 here:
Toggle solution
# Your solution 3.4 here:
Toggle solution
df.shape
(2000, 38)
# Alternative solution:
Toggle alternative solution
Example for exporting a Pandas DataFrame into an Excel file:
# Example for exporting our processed data into a new Excel file:
df_medianfiltered_out_df = pd.DataFrame(data=df_medianfiltered)
df_medianfiltered_out_df.to_excel("data_medianfiltered.xlsx")
"""Note, for initial call of '.to_excel' you need to install
the package/module 'xlwt' """
df_medianfiltered_out_df
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1979.681 | 4476.209 | 2814.321 | 2416.224 | 1648.606 | 1449.719 | 1744.611 | 2119.442 | 1305.099 | 3221.120 | ... | 3221.120 | 2468.984 | 1750.352 | 1649.530 | 2833.220 | 1282.657 | 1231.136 | 3042.830 | 2108.877 | 1966.602 |
1 | 2007.620 | 4476.209 | 2838.231 | 2425.859 | 1648.606 | 1606.934 | 1886.981 | 2251.333 | 1305.119 | 3336.423 | ... | 3336.423 | 2477.631 | 1854.227 | 1785.687 | 3217.644 | 1435.793 | 1374.950 | 3142.866 | 2266.636 | 1966.602 |
2 | 2079.598 | 4476.209 | 2866.485 | 2520.453 | 1648.606 | 1606.934 | 1938.510 | 2251.333 | 1340.275 | 3628.556 | ... | 3628.556 | 2508.199 | 1875.356 | 1785.687 | 3231.953 | 1457.536 | 1374.950 | 3142.866 | 2266.829 | 2029.088 |
3 | 2079.598 | 4476.209 | 2946.522 | 2522.406 | 1737.614 | 1618.653 | 2003.899 | 2321.433 | 1357.278 | 3760.244 | ... | 3760.244 | 2516.503 | 1875.356 | 1807.257 | 3269.945 | 1500.843 | 1425.907 | 3142.866 | 2283.250 | 2029.088 |
4 | 2079.598 | 4589.748 | 2948.629 | 2625.568 | 1737.614 | 1618.653 | 2086.211 | 2424.808 | 1419.689 | 4031.759 | ... | 4031.759 | 2575.812 | 1875.356 | 1852.093 | 3296.585 | 1500.843 | 1500.500 | 3142.866 | 2334.456 | 2029.088 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1995 | 1659.978 | 1988.962 | 2204.836 | 1856.755 | 1084.831 | 1065.796 | 1785.423 | 1737.096 | 982.970 | 1787.830 | ... | 1787.830 | 2002.620 | 1782.898 | 1908.837 | 2551.720 | 1140.593 | 1198.836 | 1228.652 | 1450.763 | 1504.574 |
1996 | 1623.542 | 1988.962 | 2187.248 | 1768.047 | 1008.979 | 1058.442 | 1749.308 | 1737.096 | 982.970 | 1787.830 | ... | 1787.830 | 2002.620 | 1760.417 | 1908.837 | 2551.720 | 1022.057 | 1198.836 | 1207.696 | 1450.763 | 1504.574 |
1997 | 1585.314 | 1981.024 | 2127.415 | 1768.047 | 993.758 | 1018.179 | 1749.308 | 1737.096 | 970.974 | 1697.901 | ... | 1697.901 | 1934.899 | 1760.417 | 1908.837 | 2551.720 | 1018.071 | 1198.836 | 1193.161 | 1421.070 | 1357.963 |
1998 | 1581.983 | 1874.429 | 2110.344 | 1757.333 | 993.106 | 1018.179 | 1660.389 | 1737.096 | 957.219 | 1690.157 | ... | 1690.157 | 1934.899 | 1760.417 | 1908.837 | 2546.767 | 1010.943 | 1084.129 | 1185.286 | 1421.070 | 1323.653 |
1999 | 1333.852 | 1786.643 | 2025.873 | 1757.333 | 936.424 | 1002.555 | 1660.389 | 1737.096 | 874.662 | 1658.114 | ... | 1658.114 | 1934.899 | 1733.292 | 1836.423 | 2472.458 | 1010.943 | 962.400 | 1105.598 | 1385.715 | 1323.653 |
2000 rows × 38 columns
Grand average
# for the sake of curiosity, let's plot the Grand Average:
fig = plt.figure(2, figsize=(10, 4))
plt.clf()
for column, key in enumerate(df.keys()):
plt.plot(time, df_medianfiltered[:, column])
plt.plot(time, df_medianfiltered.mean(axis=1), 'k', lw=5)
plt.title("median filtered data (alternative 2)")
plt.xlabel("time [s]")
plt.ylabel("a.u.")
plt.show()
plt.savefig(file_1 + "median filtered (alt 2).pdf")
Another alternative solution would look like as follows, but at the moment I do not understand, why the medians look so different compared to the previous calculations?!
# Another alternative solution to 3.4:
Toggle another alternative solution
Exercise 4: Apply your filter function to real world data (Exercise 3 continued)
- Apply the
my_filter
function, that we have written above, to the data read in Exercise 3 and plot, again, the results into a new figure. - Save each of your three figures (those from Exercise 3 including) into a separate PDF file.
# Your solution 4 here:
Toggle solution
Exercise 5: Batch file processing
Now, and instead of reading each Excel manually, use the following command to scan for all Excel files in the downloaded folder:
file_name = [f for f in os.listdir(file_path) if f.endswith('.xlsx')]
Copy your solutions from above into a new script. Change the new script in that way, that it iterates over all scanned file names in file_name
, starting from the Pandas reading part. Play a bit around, to see, what is contained in file_name
.
Hint: The automatized reading of the filenames now requires an iteration of your pipeline within a for-loop.
# Your solution 5 here:
['ROI_table.xlsx', 'ROI_table_1.xlsx', 'ROI_table_5.xlsx', 'ROI_table_4.xlsx', 'ROI_table_3.xlsx', 'ROI_table_2.xlsx']
Note that we show the result of only one Excel file here.