import numpy as np
import matplotlib.pyplot as plt
# Set the amount of isotope remaining (i.e. 1 = 100%)
theta_0 = 1.0

# Create a regularly spaced vector of time values:
# 20 points between 0 and 1.
steps = 20
timelength = 1.0
time_values = np.linspace(0, timelength, steps)

# Try several different values for the decay constant, k
for const_k in [1.0, 3.1, 10.0, 31, 100.0]:
    # This is the solution to the decay equation: how much isotope remains.
    exact_theta_values = theta_0 * np.exp(-const_k * time_values)
    # Plot the results for different k values
    plt.plot(time_values, exact_theta_values)

# Add some plot flair
plt.legend(('1', '3.1', '10', '31', '100'), title=r'$k$')
plt.xlabel('time')
plt.ylabel(r'$\theta$')
plt.show()
Numerical analysis
Questions
- What computational/numerical techniques can be performed in Python?
- What tools are available for analysis?
- How can I learn more about my data?
Objectives
- Generate a numerical model.
- Interpolate missing data.
- Run the scikit-learn k-means algorithm.
- Plot 3D data.
Numerical models
We start with the numerical solution of a very simple differential equation. In fact we choose something simple enough that we already know the answer.
\(\frac{d\theta}{dt} = - k \theta\)
This is the equation which governs radioactive decay, in which case \(\theta\) is the amount of the radioactive isotope remaining and \(d\theta / dt\) is the activity that we can measure. \(k\) is the decay constant, closely related to the half-life: \(t_{1/2} = \ln 2 / k\).
The solution to this equation is \(\theta(t) = \theta_0 e^{-kt}\)
where \(\theta_0\) is the initial amount of the radioactive material. The same equation also describes the cooling of, say, a cup of coffee. In this case we interpret \(\theta\) as the excess temperature (above room temperature).
We want to be able to march forward in time from our starting point, where \(\theta = \theta_0\), to obtain the value of \(\theta\) at later times. To do this, we need to approximate the original differential equation and, in particular, the value of the time derivative at each time. There are a number of ways to do this.
First order numerical approximation
Assume that the variation in \(\theta(t)\) is linear, i.e. \(\theta(t') = \theta_n + \beta t'\)
where we use a local time coordinate \(t' = t - n\Delta t\), so that when we differentiate
\(\frac{d \theta}{dt} = \beta\)
Evaluating this expression at \(t' = \Delta t\) gives the value at the next timestep, and the approximation for the derivative follows by solving for \(\beta\):
\(\theta_{n+1} = \theta_n + \beta \Delta t\)
\(\Rightarrow \beta = \frac{d \theta}{dt} = \frac{\theta_{n+1} - \theta_n}{\Delta t}\)
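Before building the full scheme, here is a minimal sketch (not part of the lesson code) showing that this forward difference approaches the exact derivative of the known solution \(\theta(t) = e^{-kt}\) as \(\Delta t\) shrinks; the test point \(t = 0.1\) and the step sizes are arbitrary choices:

import numpy as np

const_k = 10.0
t = 0.1                                    # arbitrary test point
theta = lambda tt: np.exp(-const_k * tt)   # known exact solution with theta_0 = 1

for delta_t in [0.1, 0.01, 0.001]:
    approx = (theta(t + delta_t) - theta(t)) / delta_t   # first order difference
    exact = -const_k * theta(t)                          # exact derivative
    print(delta_t, approx, exact)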
The result, \(\beta = \frac{\theta_{n+1} - \theta_n}{\Delta t}\), is a first order difference expression for the derivative, which we substitute into the original differential equation for radioactive decay at the current timestep:
\(\frac{\theta_{n+1} - \theta_n}{\Delta t} = - k \theta_n\)
This rearranges to give us a time-marching algorithm:
\(\theta_{n+1} = \theta_n (1-k \Delta t)\)
In a moment we will compute some values for this expression to see how accurate it is.
# Set the known constant values
theta_0 = 1.0
const_k = 10.0

# How many timesteps to solve
steps = 20
timelength = 1.0
delta_t = timelength / steps

# Create a regularly spaced vector of time values
# (endpoint=False so the spacing matches delta_t)
time_values = np.linspace(0, timelength, steps, endpoint=False)

# Create an empty array to store the solutions
theta_values = np.zeros(steps)

# Set the starting values
theta_values[0] = theta_0

# Step through the time values
for i in range(1, steps):
    # Find the value for theta at this time step
    theta_values[i] = theta_values[i-1] * (1 - const_k * delta_t)

# Compare with the exact solution
exact_theta_values = theta_0 * np.exp(-const_k * time_values)

# Plot and compare your results
plt.plot(time_values, theta_values, linewidth=3.0, color="red")
plt.plot(time_values, exact_theta_values, 'b-')
plt.show()
Second Order Runge-Kutta
The second order Runge-Kutta method provides a more accurate, higher order approach to the integration. The idea is to estimate the gradient \(d\theta/dt\) at the halfway point between two timestep values. This is done in two stages. Initially, a first order estimate, \(\hat{\theta}\), is made for the value of the function \(\theta\) at \(t = t + \Delta t/2\) in the future. This value is then substituted into the differential equation to obtain the estimate for the gradient at this time. The revised gradient is then used to update the original \(\theta(t)\) by an entire timestep.
The first order step is \[ \begin{split} \hat{\theta}(t+\Delta t /2) & = \theta(t) + \left. \frac{d \theta}{d t} \right|_t \frac{\Delta t}{2} \\ &= \theta(t) \left[ 1-\frac{k\Delta t}{2} \right] \end{split} \]
Substitute to estimate the gradient at the mid-point \[ \left. \frac{d \theta}{d t} \right|_{t+\Delta t /2} \approx -k \theta(t) \left[ 1-\frac{k\Delta t}{2} \right] \]
Use this value as the average gradient over the interval \(t \rightarrow t+\Delta t\) to update \(\theta\):
\(\theta(t+\Delta t) \approx \theta(t) + \Delta t \left( -k \theta(t) \left[ 1-\frac{k\Delta t}{2} \right] \right)\)
\(\theta(t+\Delta t) \approx \theta(t) \left( 1 - k \Delta t + k^2 \frac{\Delta t^2}{2} \right)\)
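As a quick check on the accuracy claim, compare this update factor with the Taylor expansion of the exact one-step factor:

\(e^{-k\Delta t} = 1 - k\Delta t + \frac{(k\Delta t)^2}{2} - \frac{(k\Delta t)^3}{6} + \dots\)

The second order Runge-Kutta factor matches this expansion through the quadratic term, whereas the first order factor \(1 - k\Delta t\) matches only the linear term, so the error per step drops from \(O(\Delta t^2)\) to \(O(\Delta t^3)\).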
Challenge
- Can you implement the second order Runge-Kutta numerical solution in Python? Start as before:
# Set the known constant values
theta_0 = 1.0
const_k = 10.0

# How many timesteps to solve
steps = 20
timelength = 1.0
delta_t = timelength / steps

# Create a regularly spaced vector of time values
# (endpoint=False so the spacing matches delta_t)
time_values = np.linspace(0, timelength, steps, endpoint=False)

# Create an empty array to store the solutions
theta_values = np.zeros(steps)

# Set the starting values
theta_values[0] = theta_0

# Step through the time values
for i in range(1, steps):
    # Find the value for theta at this time step
    theta_values[????] = ????

# Compare with the exact solution
exact_theta_values = theta_0 * np.exp(-const_k * time_values)

# Plot and compare your results
plt.plot(time_values, theta_values, linewidth=3.0, color="red")
plt.plot(time_values, exact_theta_values, 'b-')
plt.show()
Solution
This is my solution
theta_0 = 1.0
const_k = 10.0
steps = 20
timelength = 1.0
delta_t = timelength / steps

time_values = np.linspace(0, timelength, steps, endpoint=False)

theta_values = np.zeros(steps)

theta_values[0] = theta_0

for i in range(1, steps):
    theta_values[i] = theta_values[i-1] * (1 - const_k * delta_t + const_k**2 * delta_t**2 / 2.0)

exact_theta_values = theta_0 * np.exp(-const_k * time_values)

plt.plot(time_values, theta_values, linewidth=3.0, color="red")
plt.plot(time_values, exact_theta_values, 'b-')
plt.show()
scipy.interpolate
This module provides general interpolation capability for data in 1, 2, and higher dimensions. This list of features is from the documentation:
- A class representing an interpolant (interp1d) in 1-D, offering several interpolation methods.
- Convenience function griddata offering a simple interface to interpolation in N dimensions (N = 1, 2, 3, 4, …). Object-oriented interface for the underlying routines is also available.
- Functions for 1- and 2-dimensional (smoothed) cubic-spline interpolation, based on the FORTRAN library FITPACK. There are both procedural and object-oriented interfaces for the FITPACK library.
- Interpolation using Radial Basis Functions.
import matplotlib.pyplot as plt
import numpy as np
1D data
from scipy.interpolate import interp1d

x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
f = interp1d(x, y, kind='linear')    # 'linear' is the default kind
f2 = interp1d(x, y, kind='cubic')
f3 = interp1d(x, y, kind='nearest')

xnew = np.linspace(0, 10, num=41, endpoint=True)
plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--', xnew, f3(xnew), '.-')
plt.legend(['data', 'linear', 'cubic', 'nearest'], loc='best')
plt.show()
nD data
There are fewer approaches for n-dimensional data; evaluation in arbitrary dimensions is always at points on an n-dimensional grid.
from scipy.interpolate import griddata

def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2

# A regular grid array of x,y coordinates
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]  # see np.info(np.mgrid) for an explanation of the 200j !!
np.info(np.mgrid)
`nd_grid` instance which returns a dense multi-dimensional "meshgrid".
An instance of `numpy.lib.index_tricks.nd_grid` which returns an dense
(or fleshed out) mesh-grid when indexed, so that each returned argument
has the same shape. The dimensions and number of the output arrays are
equal to the number of indexing dimensions. If the step length is not a
complex number, then the stop is not inclusive.
However, if the step length is a **complex number** (e.g. 5j), then
the integer part of its magnitude is interpreted as specifying the
number of points to create between the start and stop values, where
the stop value **is inclusive**.
Returns
-------
mesh-grid `ndarrays` all of the same dimensions
See Also
--------
lib.index_tricks.nd_grid : class of `ogrid` and `mgrid` objects
ogrid : like mgrid but returns open (not fleshed out) mesh grids
r_ : array concatenator
Examples
--------
>>> np.mgrid[0:5, 0:5]
array([[[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3],
[4, 4, 4, 4, 4]],
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]])
>>> np.mgrid[-1:1:5j]
array([-1. , -0.5, 0. , 0.5, 1. ])
# A random sampling within the same area
points = np.random.rand(1000, 2)
values = func(points[:,0], points[:,1])

# Resample from the values at these points onto the regular mesh
grid_z0 = griddata(points, values, (grid_x, grid_y), method='nearest')
grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear')
grid_z2 = griddata(points, values, (grid_x, grid_y), method='cubic')
plt.subplot(221)
plt.imshow(func(grid_x, grid_y).T, extent=(0,1,0,1), origin='lower', cmap='jet')
plt.plot(points[:,0], points[:,1], 'k.', ms=1)
plt.title('Original')
plt.subplot(222)
plt.imshow(grid_z0.T, extent=(0,1,0,1), origin='lower', cmap='jet')
plt.title('Nearest')
plt.subplot(223)
plt.imshow(grid_z1.T, extent=(0,1,0,1), origin='lower', cmap='jet')
plt.title('Linear')
plt.subplot(224)
plt.imshow(grid_z2.T, extent=(0,1,0,1), origin='lower', cmap='jet')
plt.title('Cubic')
plt.gcf().set_size_inches(6, 6)
plt.show()
Splines
Splines have the added benefit of providing smooth derivative information.
from scipy.interpolate import splrep, splev

x = np.arange(0, 2*np.pi + np.pi/4, 2*np.pi/8)
y = np.sin(x)
tck = splrep(x, y, s=0)
xnew = np.arange(0, 2*np.pi, np.pi/50)
ynew = splev(xnew, tck, der=0)
yder = splev(xnew, tck, der=1)

plt.figure()
plt.plot(x, y, 'x', xnew, ynew, xnew, np.sin(xnew), x, y, 'b')
plt.legend(['Linear', 'Cubic Spline', 'True'])
plt.axis([-0.05, 6.33, -1.05, 1.05])
plt.title('Cubic-spline interpolation')
plt.show()

plt.figure()
plt.plot(xnew, yder, xnew, np.cos(xnew), '--')
plt.legend(['Cubic Spline', 'True'])
plt.axis([-0.05, 6.33, -1.05, 1.05])
plt.title('Derivative estimation from spline')
plt.show()
2D splines are also available
from scipy.interpolate import bisplrep, bisplev

# Gridded function (at low resolution ... doesn't need to be gridded data here)
x, y = np.mgrid[-1:1:20j, -1:1:20j]
z = (x+y) * np.exp(-6.0*(x*x+y*y))

plt.figure()
plt.pcolor(x, y, z, cmap='jet', shading='auto')
plt.colorbar()
plt.title("Sparsely sampled function.")
plt.show()

xnew, ynew = np.mgrid[-1:1:70j, -1:1:70j]

# Create the spline-representation object tck
tck = bisplrep(x, y, z, s=0)
znew = bisplev(xnew[:,0], ynew[0,:], tck)

plt.figure()
plt.pcolor(xnew, ynew, znew, cmap='jet', shading='auto')
plt.colorbar()
plt.title("Interpolated function.")
plt.show()
See also
- Radial basis function interpolation for scattered data in n dimensions (slow for large numbers of points): from scipy.interpolate import Rbf (a short sketch follows this list)
- scipy.ndimage for fast interpolation operations on image-like arrays
- B-splines on regular arrays are found in the scipy.signal module
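Here is a minimal sketch (not from the lesson itself) of radial basis function interpolation, reusing points, values, grid_x and grid_y from the griddata example above; the 'multiquadric' kernel is simply Rbf's default:

from scipy.interpolate import Rbf

# Build the interpolant from the same scattered samples used with griddata
rbf = Rbf(points[:,0], points[:,1], values, function='multiquadric')

# Unlike griddata, the interpolant is a callable that can be evaluated
# at arbitrary locations, gridded or not
grid_zr = rbf(grid_x, grid_y)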
Clustering data with scikit-learn
Here we explore a neat and efficient way of examining a (seismic tomography) dataset in Python. We will be using a machine learning algorithm known as k-means clustering.
Data is from: Li, C., van der Hilst, R. D., Engdahl, E. R., and Burdick, S. (2008), A new global model for P wave speed variations in Earth’s mantle, Geochem. Geophys. Geosyst., 9, Q05018, doi:10.1029/2007GC001806
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
Load in the tomography data set. Assign the column vectors to unique variables (for clarity).
tomo = np.loadtxt('../data/ggge1202-sup-0002-ds01.txt', skiprows=1)

lat = tomo[:,0]
lon = tomo[:,1]
depth = tomo[:,2]
dvp = tomo[:,3]
Now run the clustering algorithm
kmeans = KMeans(n_clusters=5, random_state=0).fit(dvp.reshape(-1, 1))

# When completed, check the clusters the algorithm has identified.
print(kmeans.cluster_centers_)
[[-0.2770305 ]
[ 3.92536372]
[ 0.25228711]
[-0.03758588]
[ 1.82168895]]
Note: many scikit-learn functions have been "parallelised" and tuned to take best advantage of your computer; see https://scikit-learn.org/stable/modules/computing.html#parallelism for more details.
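For example, a minimal sketch based on that page (not part of the original lesson): joblib's parallel_backend context manager constrains how many workers scikit-learn may use.

from joblib import parallel_backend

# Re-run the same clustering while limiting joblib to two threads
with parallel_backend('threading', n_jobs=2):
    kmeans = KMeans(n_clusters=5, random_state=0).fit(dvp.reshape(-1, 1))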
You can have a look at what the labels look like. kmeans.labels_ is essentially a vector of the same length as the data, indicating the cluster to which each point has been assigned.
kmeans.labels_
array([4, 4, 4, ..., 3, 3, 3], dtype=int32)
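As a quick sanity check (not in the original lesson), you can count how many points were assigned to each of the five clusters:

# Occurrences of each cluster label, 0 through 4
print(np.bincount(kmeans.labels_))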
Choose one of the clusters to visualise and subset the data into new vectors accordingly.
centre = 3

latClust = lat[kmeans.labels_ == centre]
lonClust = lon[kmeans.labels_ == centre]
depthClust = depth[kmeans.labels_ == centre]
dvpClust = dvp[kmeans.labels_ == centre]
Finally, plot the results!
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(lonClust, latClust, -depthClust, c=dvpClust)
Key points
- You can use Python to solve differential equations numerically
- scipy for interpolation
- sklearn for clustering
- New ways to plot data
- Read the docs to learn more
All materials copyright Sydney Informatics Hub, University of Sydney