Useful Python packages for different data types
Questions
- What are libraries and packages?
- How can I load tabular data into Python?
- How can I load shapefiles?
- How can I load segy and las data?
Objectives
- Learn how to deal with specialty data types.
- Learn about pandas, pyshp, lasio, obspy.
Python can deal with basically any type of data you throw at it. The open-source Python community has developed many packages that make things easy. Today we will look at the pyshp (for dealing with shapefiles), pandas (great for tables and time series), lasio (for las-format well log data) and obspy (a fully featured seismic data processing suite) packages.
Data for this exercise was downloaded from http://www.bom.gov.au/water/groundwater/explorer/map.shtml
Shapefiles
Shapefiles are a very common file format for GIS data, the standard for which is developed and maintained by ESRI, the makers of the ArcGIS software. Shapefiles collect vectors of features, such as points, lines, polygons. The “file” is actually a misnomer - if you look at a single “shapefile” on your machine using a file explorer, you can see that it’s actually made up of several files, three of which are mandatory, and others which may/may not be there.
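You can see this for yourself before loading anything. A minimal sketch (assuming the Torrens River data used below sits in ../data/shp_torrens_river/) that simply lists the companion files:
import glob
#List every file that shares the NGIS_BoreLine basename (.shp, .shx, .dbf, plus any optional extras)
for f in sorted(glob.glob('../data/shp_torrens_river/NGIS_BoreLine.*')):
    print(f)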
#Load the required modules
import shapefile
#NOTE: Confusingly, this package is installed as "pyshp" but you import it via the name "shapefile"

#help(shapefile)
#Or check out the help pages https://github.com/GeospatialPython/pyshp
#Set the filename
boreshape='../data/shp_torrens_river/NGIS_BoreLine.shp'

#read in the file
shapeRead = shapefile.Reader(boreshape)

#And save out some of the shape file attributes
recs = shapeRead.records()
shapes = shapeRead.shapes()
fields = shapeRead.fields
Nshp = len(shapes)

print(Nshp) #print the Number of items in the shapefile
7635
fields #print the fields
[('DeletionFlag', 'C', 1, 0),
['HydroID', 'N', 10, 0],
['HydroCode', 'C', 30, 0],
['BoreID', 'N', 10, 0],
['TopElev', 'F', 19, 11],
['BottomElev', 'F', 19, 11],
['HGUID', 'N', 10, 0],
['HGUNumber', 'N', 10, 0],
['NafHGUNumb', 'N', 10, 0],
['SHAPE_Leng', 'F', 19, 11]]
recs[3] #print one of the records; this is a list that can be subscripted further
Record #3: [32002002, '652800645', 30027773, -147.26, -154.26, 31000045, 1044, 125005, 0.0]
shapes[1].points #print the point values of one of the shapes
[(591975.5150000006, -3816141.8817), (591975.5150000006, -3816141.8817)]
shapeRead.shapeTypeName
'POLYLINEZ'
rec = shapeRead.record(0)
rec['TopElev']
6.74
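The fields and records line up with each other (once the DeletionFlag field is skipped). As an illustrative sketch, not part of the original exercise, you can pair the field names with the values of a single record:
#Pair each field name (skipping DeletionFlag) with the values of record 0
rec = shapeRead.record(0)
for (name, *_), value in zip(shapeRead.fields[1:], list(rec)):
    print(name, "=", value)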
Challenge.
- Look at the data above. It provides the coordinates of the wells as points.
- How many coordinates are provided for each well? Why do you think this is?
- What is the Bottom Elevation of the 300th record?
Solution
There are two coordinates. But they are duplicated.
rec = shapeRead.record(299)
rec['BottomElev']

#or
recs[299][4]
#Here is a slightly neater way to read in the data, but it looks confusing at first.
#But we will need it in this form for our next exercise.
#This type of assignment is known as "list comprehension"
#fields = [x[0] for x in shapeRead.fields][1:]
#Break this down line by line
#for x in shapeRead.fields:
# print(x)
#Now just print the 1st (0th) column of each list variable
#for x in shapeRead.fields:
# print(x[0])
#But we want to save these values in a list (not just print them out).
#fields=[]
#for x in shapeRead.fields:
# fields.append(x[0])
#And we don't want the DeletionFlag field, so we need to just get all the values except the first
#fields=fields[1:]
#fields = [x[0] for x in shapeRead.fields][1:]
#Do a list comprehension for the other variable too
#shps = [s.points for s in shapeRead.shapes()]
Shapefiles are not a native Python format, but the community has developed tools for exploring them. The pyshp package we have used here, imported under the name shapefile (an inconsistency worth remembering), is one example of working with shapefiles. Alternatives exist.
Dataframes and table manipulation
Pandas is one of the most useful packages (along with probably numpy and matplotlib). We will use it several times throughout the course for data handling and manipulation.
#As before, read in the shapefile
boreshape='../data/shp_torrens_river/NGIS_BoreLine.shp'

#Read the shapefile attributes to variables
shapeRead = shapefile.Reader(boreshape)
fields = [x[0] for x in shapeRead.fields][1:]
shps = [s.points for s in shapeRead.shapes()]
recs = shapeRead.records()
import pandas
#Now convert the variables to a pandas dataframe
df = pandas.DataFrame(columns=fields, data=recs)
#See more details at the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
df
HydroID | HydroCode | BoreID | TopElev | BottomElev | HGUID | HGUNumber | NafHGUNumb | SHAPE_Leng | |
---|---|---|---|---|---|---|---|---|---|
0 | 32001999 | 652800645 | 30027773 | 6.74 | -74.26 | 31000043 | 1042 | 104005 | 0.0 |
1 | 32002000 | 652800645 | 30027773 | -74.26 | -125.26 | 31000109 | 1108 | 110002 | 0.0 |
2 | 32002001 | 652800645 | 30027773 | -125.26 | -147.26 | 31000045 | 1044 | 125005 | 0.0 |
3 | 32002002 | 652800645 | 30027773 | -147.26 | -154.26 | 31000045 | 1044 | 125005 | 0.0 |
4 | 32002003 | 652800645 | 30027773 | -154.26 | -168.26 | 31000045 | 1044 | 125005 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7630 | 32145557 | 662810075 | 30057044 | 102.62 | 90.89 | 31000139 | 1138 | 100001 | 0.0 |
7631 | 32145558 | 662810075 | 30057044 | 103.08 | 102.62 | 31000139 | 1138 | 100001 | 0.0 |
7632 | 32145559 | 662813065 | 30060034 | 535.08 | 451.08 | 31000026 | 1025 | 134001 | 0.0 |
7633 | 32145560 | 662813065 | 30060034 | 451.08 | 171.08 | 31000014 | 1013 | 134001 | 0.0 |
7634 | 32145561 | 662814687 | 30061656 | 444.30 | 432.30 | 31000014 | 1013 | 134001 | 0.0 |
7635 rows × 9 columns
#Add a new column called "coords" to the DataFrame, fill it with what is in "shps"
df['coords'] = shps
#Alternatively, you can use the "assign" method
#df = df.assign(coords=shps)
df
HydroID | HydroCode | BoreID | TopElev | BottomElev | HGUID | HGUNumber | NafHGUNumb | SHAPE_Leng | coords | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 32001999 | 652800645 | 30027773 | 6.74 | -74.26 | 31000043 | 1042 | 104005 | 0.0 | [(591975.5150000006, -3816141.8817), (591975.5... |
1 | 32002000 | 652800645 | 30027773 | -74.26 | -125.26 | 31000109 | 1108 | 110002 | 0.0 | [(591975.5150000006, -3816141.8817), (591975.5... |
2 | 32002001 | 652800645 | 30027773 | -125.26 | -147.26 | 31000045 | 1044 | 125005 | 0.0 | [(591975.5150000006, -3816141.8817), (591975.5... |
3 | 32002002 | 652800645 | 30027773 | -147.26 | -154.26 | 31000045 | 1044 | 125005 | 0.0 | [(591975.5150000006, -3816141.8817), (591975.5... |
4 | 32002003 | 652800645 | 30027773 | -154.26 | -168.26 | 31000045 | 1044 | 125005 | 0.0 | [(591975.5150000006, -3816141.8817), (591975.5... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7630 | 32145557 | 662810075 | 30057044 | 102.62 | 90.89 | 31000139 | 1138 | 100001 | 0.0 | [(605865.9246000014, -3830429.3729999997), (60... |
7631 | 32145558 | 662810075 | 30057044 | 103.08 | 102.62 | 31000139 | 1138 | 100001 | 0.0 | [(605865.9246000014, -3830429.3729999997), (60... |
7632 | 32145559 | 662813065 | 30060034 | 535.08 | 451.08 | 31000026 | 1025 | 134001 | 0.0 | [(612545.5916999988, -3832402.8148999996), (61... |
7633 | 32145560 | 662813065 | 30060034 | 451.08 | 171.08 | 31000014 | 1013 | 134001 | 0.0 | [(612545.5916999988, -3832402.8148999996), (61... |
7634 | 32145561 | 662814687 | 30061656 | 444.30 | 432.30 | 31000014 | 1013 | 134001 | 0.0 | [(635716.0604999997, -3816751.5364999995), (63... |
7635 rows × 10 columns
Pandas is more frequently used to read in tables directly. So let’s read in the csv data that came with the shapefile (this gives us some additional fields, not stored in the shapefile itself, that we can explore).
#read in the data
log_data = pandas.read_csv("../data/shp_torrens_river/NGIS_LithologyLog.csv", usecols=list(range(0,13)))
#What is the "usecols" variable equal to?
#Try reading the data without using the usecols option, can you solve the error?
log_data # print the first 30 and last 30 rows
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | MinorLithCode | Description | Source | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1769789 | 30062892 | 662815923 | 57.25 | NGS | 18.0 | 19.5 | 39.25 | 37.75 | CLYU | None | Clay | SAGeodata |
1 | 1769790 | 30062892 | 662815923 | 57.25 | NGS | 19.5 | 22.0 | 37.75 | 35.25 | ROCK | None | Rocks and sand | SAGeodata |
2 | 1769791 | 30062892 | 662815923 | 57.25 | NGS | 22.0 | 24.0 | 35.25 | 33.25 | CLYU | None | Clay | SAGeodata |
3 | 1770725 | 30141910 | 662816624 | 4.0 | NGS | 0.0 | 6.0 | 4.0 | -2.0 | None | None | No sample | SAGeodata |
4 | 1770726 | 30141910 | 662816624 | 4.0 | NGS | 6.0 | 15.0 | -2.0 | -11.0 | CLYU | None | Clay - orange-red grey, mottled; stiff-sticky.... | SAGeodata |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
70168 | 4120345 | 30304039 | 662829228 | None | UNK | 9.0 | 10.0 | None | None | CLYU | None | Sandy clay | SAGeodata |
70169 | 4120346 | 30304039 | 662829228 | None | UNK | 10.0 | 12.5 | None | None | SAND | None | Clay sand | SAGeodata |
70170 | 4120347 | 30304050 | 652802882 | None | UNK | 0.0 | 0.3 | None | None | FILL | None | Fill | SAGeodata |
70171 | 4120348 | 30304050 | 652802882 | None | UNK | 0.3 | 0.8 | None | None | SAND | None | Clayey sand | SAGeodata |
70172 | 4120349 | 30304050 | 652802882 | None | UNK | 0.8 | 3.5 | None | None | SAND | None | Sand | SAGeodata |
70173 rows × 13 columns
# add a new column as a function of existing columns
log_data['Thickness'] = log_data.ToDepth - log_data.FromDepth
type(log_data) # see what Python type the DataFrame is
pandas.core.frame.DataFrame
log_data.head(3) # print the first 3 rows
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | MinorLithCode | Description | Source | Thickness | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1769789 | 30062892 | 662815923 | 57.25 | NGS | 18.0 | 19.5 | 39.25 | 37.75 | CLYU | None | Clay | SAGeodata | 1.5 |
1 | 1769790 | 30062892 | 662815923 | 57.25 | NGS | 19.5 | 22.0 | 37.75 | 35.25 | ROCK | None | Rocks and sand | SAGeodata | 2.5 |
2 | 1769791 | 30062892 | 662815923 | 57.25 | NGS | 22.0 | 24.0 | 35.25 | 33.25 | CLYU | None | Clay | SAGeodata | 2.0 |
# “the index” (aka “the labels”).
log_data.index #Pandas is great for using timeseries data, where the index can be the timestamps
RangeIndex(start=0, stop=70173, step=1)
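For example, any existing column can be promoted to the index. A small sketch (for display only; the original log_data is left unchanged):
#Use the OBJECTID identifier as the index instead of the default RangeIndex
log_data.set_index('OBJECTID').head(3)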
log_data.columns # column names (which is “an index”)
Index(['OBJECTID', 'BoreID', 'HydroCode', 'RefElev', 'RefElevDesc',
'FromDepth', 'ToDepth', 'TopElev', 'BottomElev', 'MajorLithCode',
'MinorLithCode', 'Description', 'Source', 'Thickness'],
dtype='object')
log_data.dtypes # data types of each column
OBJECTID int64
BoreID int64
HydroCode int64
RefElev object
RefElevDesc object
FromDepth float64
ToDepth float64
TopElev object
BottomElev object
MajorLithCode object
MinorLithCode object
Description object
Source object
Thickness float64
dtype: object
log_data.shape # number of rows and columns
(70173, 14)
log_data.values # the underlying numpy array; DataFrames are stored as numpy arrays for efficiency
array([[1769789, 30062892, 662815923, ..., 'Clay', 'SAGeodata', 1.5],
[1769790, 30062892, 662815923, ..., 'Rocks and sand', 'SAGeodata',
2.5],
[1769791, 30062892, 662815923, ..., 'Clay', 'SAGeodata', 2.0],
...,
[4120347, 30304050, 652802882, ..., 'Fill', 'SAGeodata', 0.3],
[4120348, 30304050, 652802882, ..., 'Clayey sand', 'SAGeodata',
0.5],
[4120349, 30304050, 652802882, ..., 'Sand', 'SAGeodata', 2.7]],
dtype=object)
#log_data['MajorLithCode'] # select one column
##Equivalent to
#log_data.MajorLithCode
##and
#log_data.iloc[:,9]
##and
#log_data.loc[:,'MajorLithCode']
type(log_data['MajorLithCode']) # determine datatype of column (e.g., Series)
pandas.core.series.Series
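As a quick sanity check (not in the original notebook), you can confirm that the commented-out selection styles above all return the same Series:
#All three selection styles give the same column
col_a = log_data['MajorLithCode']
col_b = log_data.loc[:, 'MajorLithCode']
col_c = log_data.iloc[:, 9]
print(col_a.equals(col_b), col_a.equals(col_c))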
#describe the data frame
log_data.describe(include='all')
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | MinorLithCode | Description | Source | Thickness | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 7.017300e+04 | 7.017300e+04 | 7.017300e+04 | 70173 | 70173 | 70173.000000 | 70173.000000 | 70173 | 70173 | 70173 | 70173 | 70173 | 70173 | 70173.000000 |
unique | NaN | NaN | NaN | 5068 | 4 | NaN | NaN | 27784 | 27883 | 81 | 42 | 33614 | 54 | NaN |
top | NaN | NaN | NaN | None | NGS | NaN | NaN | None | None | CLYU | None | Clay | SAGeodata | NaN |
freq | NaN | NaN | NaN | 18514 | 44947 | NaN | NaN | 18514 | 18514 | 25861 | 62812 | 4603 | 70119 | NaN |
mean | 2.505677e+06 | 3.018198e+07 | 6.624491e+08 | NaN | NaN | 24.938020 | 30.621160 | NaN | NaN | NaN | NaN | NaN | NaN | 5.683139 |
std | 9.276182e+05 | 8.069609e+04 | 2.130226e+06 | NaN | NaN | 45.431762 | 48.605931 | NaN | NaN | NaN | NaN | NaN | NaN | 9.942400 |
min | 1.769789e+06 | 3.002715e+07 | 6.528000e+08 | NaN | NaN | 0.000000 | 0.010000 | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 |
25% | 1.932741e+06 | 3.014557e+07 | 6.628129e+08 | NaN | NaN | 0.800000 | 3.960000 | NaN | NaN | NaN | NaN | NaN | NaN | 1.000000 |
50% | 1.999028e+06 | 3.018487e+07 | 6.628196e+08 | NaN | NaN | 7.000000 | 11.500000 | NaN | NaN | NaN | NaN | NaN | NaN | 2.800000 |
75% | 3.967146e+06 | 3.025487e+07 | 6.628248e+08 | NaN | NaN | 25.800000 | 34.700000 | NaN | NaN | NaN | NaN | NaN | NaN | 7.000000 |
max | 4.120349e+06 | 3.030405e+07 | 6.728042e+08 | NaN | NaN | 610.300000 | 620.160000 | NaN | NaN | NaN | NaN | NaN | NaN | 300.500000 |
# summarise a pandas Series
log_data.FromDepth.describe() # describe a single column
count 70173.000000
mean 24.938020
std 45.431762
min 0.000000
25% 0.800000
50% 7.000000
75% 25.800000
max 610.300000
Name: FromDepth, dtype: float64
#calculate mean of 6th column ("FromDepth")
log_data.iloc[:,5].mean()
24.93802017870121
#alternate method to calculate mean of FromDepth column (the 6th one)
log_data["FromDepth"].mean()
24.93802017870121
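If you want several summary statistics at once, the agg method accepts a list of function names. A minimal sketch on the two depth columns:
#Mean, minimum and maximum of the depth columns in one call
log_data[['FromDepth', 'ToDepth']].agg(['mean', 'min', 'max'])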
#Count how many Lith Codes there are
lithCounts = log_data.MajorLithCode.value_counts()
#Print the lithcodes, use .index or .values
lithCounts
CLYU 25861
SAND 12772
SLAT 4071
FILL 4020
SDST 3209
...
DIOR 1
SANN 1
HFLS 1
CALU 1
REGO 1
Name: MajorLithCode, Length: 81, dtype: int64
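The codes and their counts can also be pulled out separately via the .index and .values attributes mentioned above. A small sketch:
#The five most common lithology codes and how often they occur
print(lithCounts.index[:5].tolist())
print(lithCounts.values[:5])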
#plot a bar chart of the lith codes
lithCounts.plot.bar(figsize=(15,5));
#Plot a bar chart of the lith codes for the rarer lithologies
lithCounts[(lithCounts < 50)].plot.bar(rot=45, figsize=(15,5));
import numpy as np
import matplotlib.pyplot as plt
#Pandas data
x = log_data.Thickness
mu = log_data.Thickness.mean()
sigma = log_data.Thickness.std()
# An Equivalent Numpy version
#x = log_data.Thickness.values
#mu = np.mean(x) # mean of distribution
#sigma = np.std(x) # standard deviation of distribution
# the histogram of the data
plt.hist(x, bins=np.arange(0,20,1), alpha=0.5)
plt.xlabel('Thickness (m)')
plt.ylabel('Count')
mystring = "Histogram with a mean of " + str(np.round(mu,2)) + " & SD " + str(np.round(sigma,2))
plt.title(mystring)
# Tweak spacing to prevent clipping of ylabel
#plt.subplots_adjust(left=0.15)
plt.show()
Challenge
Plot a histogram of the thickness where the “MajorLithCode” is “FILL”.
Hint: to filter a pandas data frame by value use the following syntax:
df[df['Variable'] == "value"]
Solution
x = log_data[log_data['MajorLithCode'] == "FILL"]['Thickness'].values
#OR
#x = log_data[log_data.MajorLithCode == "FILL"].Thickness.values

plt.hist(x, bins=np.arange(0,20,1), alpha=0.5)
plt.show()
# import numpy as np
# cmap = plt.get_cmap('viridis')
# colors = cmap(np.linspace(0, 1, len(lithCounts.index)))
# colors
# for row in log_data.itertuples():
# boreid=row[3]
# for ind,value in enumerate(recs):
# try:
# value.index(boreid)
# print(recs)
# except:
# continue
# #(row[3])
#You can plot the location of the bores slowly
# for ind, value in enumerate(recs):
# #Get the lat lon value
# lon=df.coords[ind][0][0]
# lat=df.coords[ind][0][1]
# #Get the Lithology unit
# #value[]
# #Now add the point to the plot
# plt.plot(lon,lat,"|")
# plt.show()
# #or fast
# lons= [df.coords[i][0][0] for i in range(1,len(recs))]
# lats= [df.coords[i][0][1] for i in range(1,len(recs))]
# plt.plot(lons,lats,"|")
# plt.show()
Extra credit challenge
Go to http://www.bom.gov.au/water/groundwater/explorer/map.shtml and pick another River Region. Download the dataset in “Shapefile” format (this will download the csv also). Once you have the data, follow the same routines as above and see what you can find out about the river region.
Solution
Follow the same workflow as above, pointing the shapefile and csv readers at your newly downloaded river region data.
Log ASCII Files
Python has a wide range of packages/libraries for specific tasks. You can create your own tools for niche tasks, but more often than not you will find an existing library that makes things simpler for you. Here we will explore borehole data (in .las format) with the lasio library.
This tutorial is based on https://towardsdatascience.com/handling-big-volume-of-well-log-data-with-a-boosted-time-efficiency-with-python-dfe0319daf26
Original Data from: https://sarigbasis.pir.sa.gov.au/WebtopEw/ws/samref/sarig1/image/DDD/PEDP013LOGS.zip
Title: Cooper Basin selected well logs in LAS format.
Publication Date: November 20
Prepared by: Energy Resources Division, Department of the Premier and Cabinet
This Record URL: https://sarigbasis.pir.sa.gov.au/WebtopEw/ws/samref/sarig1/wci/Record?r=0&m=1&w=catno=2040037
#For plotting
import matplotlib.pyplot as plt
#Library specifically for "well data"
import lasio
#To read files
import glob
#For "regular expression manipulation"
import re
#Build a list of filenames to read
read_files = glob.glob("../data/WELL/*.las")
read_files
['../data/WELL/BoolLagoon1.las',
'../data/WELL/Bungaloo1.las',
'../data/WELL/BeachportEast1.las',
'../data/WELL/BiscuitFlat1.las',
'../data/WELL/Balnaves.las',
'../data/WELL/Banyula.las',
'../data/WELL/Burrungule1.las',
'../data/WELL/Beachport1.las']
Note: be aware of Windows vs Unix path separators ("\" vs "/") when splitting the filenames below.
#Cut out just the name of the well from the filenames
well_names = []
for file in read_files:
    print("FILE:", file)
    #Split the filepath at a "/" OR a "\" OR a ".las"
    well = re.split(r'/|\\|.las', file)
    print("SPLIT:", well, "\n")
    well_names.append(well[-2])
print("There are ", len(well_names), "wells.")
print(well_names)
FILE: ../data/WELL/BoolLagoon1.las
SPLIT: ['..', 'data', 'WELL', 'BoolLagoon1', '']
FILE: ../data/WELL/Bungaloo1.las
SPLIT: ['..', 'data', 'WELL', 'Bungaloo1', '']
FILE: ../data/WELL/BeachportEast1.las
SPLIT: ['..', 'data', 'WELL', 'BeachportEast1', '']
FILE: ../data/WELL/BiscuitFlat1.las
SPLIT: ['..', 'data', 'WELL', 'BiscuitFlat1', '']
FILE: ../data/WELL/Balnaves.las
SPLIT: ['..', 'data', 'WELL', 'Balnaves', '']
FILE: ../data/WELL/Banyula.las
SPLIT: ['..', 'data', 'WELL', 'Banyula', '']
FILE: ../data/WELL/Burrungule1.las
SPLIT: ['..', 'data', 'WELL', 'Burrungule1', '']
FILE: ../data/WELL/Beachport1.las
SPLIT: ['..', 'data', 'WELL', 'Beachport1', '']
There are 8 wells.
['BoolLagoon1', 'Bungaloo1', 'BeachportEast1', 'BiscuitFlat1', 'Balnaves', 'Banyula', 'Burrungule1', 'Beachport1']
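If you would rather not juggle the path separators yourself, the standard-library pathlib module offers an alternative (a sketch, assuming the same read_files list as above):
from pathlib import Path
#Path.stem strips the directories and the .las extension on both Windows and Unix paths
well_names_alt = [Path(f).stem for f in read_files]
print(well_names_alt)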
#Now actually read in the log files to lasio
#The last cell was just to automatically make a nicely formatted list of well names!
lases = []
for files in read_files:
    las = lasio.read(files)
    lases.append(las)
#You can get an idea of what you can interogate using the help function
#help(lases)
#This is just a regular Python list! But the list contains
#in this case, special objects known as "LasFile(s)" or lasio.las object.
#Get some details using help again
#help(lases[1])
#From there we can get some info from each of the wells
j = 0
for well in lases:
    #e.g. pull out the variables available from the wells
    print("Wellid:", j, well_names[j])
    j += 1
    print(well.keys())
Wellid: 0 BoolLagoon1
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'SP']
Wellid: 1 Bungaloo1
['DEPTH', 'CALI', 'DRHO', 'DT', 'DTS', 'GR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'RMIC', 'SP']
Wellid: 2 BeachportEast1
['DEPTH', 'GR', 'RDEP', 'RMED', 'SP']
Wellid: 3 BiscuitFlat1
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'MINV', 'MNOR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'RMIC', 'SP']
Wellid: 4 Balnaves
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'MINV', 'MNOR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'RMIC', 'SP']
Wellid: 5 Banyula
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'NPHI', 'RDEP', 'RHOB', 'RMED', 'SP']
Wellid: 6 Burrungule1
['DEPTH', 'CALI', 'DT', 'GR', 'RDEP', 'RMED', 'SP']
Wellid: 7 Beachport1
['DEPTH', 'CALI', 'MINV', 'MNOR', 'RDEP', 'RMED', 'SP']
#Set a wellid you want to explore more
wellid = 1

#Make a plot of one of the wells
plt.plot(lases[wellid]['DRHO'], lases[wellid]['DEPTH'])
You have just plotted the density (DRHO) at each measured depth point. You can clean this up and present it better in the next few cells.
#Get some more info out of the well data
print(lases[wellid].curves)
Mnemonic Unit Value Description
-------- ---- ----- -----------
DEPTH M Depth
CALI in Caliper CAL Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis
DRHO g/cm3 DenCorr ZCOR Edited, bungaloo_1_mll_rtex_xyzdl_r6.dlis
DT us/ft Sonic DT24 DT24.I Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis
DTS us/ft ShearSonic DTS , bungaloo_1_mll_rtex_r2.dlis
GR gAPI GammaRay GR Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis
NPHI dec Neutron CNC Edited, bungaloo_1_neutron_r2.dlis
PEF b/e PEFactor PE Edited, bungaloo_1_mll_rtex_xyzdl_r6.dlis
RDEP ohmm DeepRes MLR4C Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis
RHOB g/cm3 Density ZDNC Edited, bungaloo_1_mll_rtex_xyzdl_r6.dlis
RMED ohmm MedRes MLR2C Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis
RMIC ohmm MicroRes RMLL Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis
SP mV SponPot SPWDH Edited, bungaloo_1_mll_rtex_r2.dlis
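Each entry in .curves is a lasio CurveItem, so you can also loop over them and pull out just the pieces you need. A minimal sketch:
#Print just the mnemonic and unit for each curve in the chosen well
for curve in lases[wellid].curves:
    print(curve.mnemonic, curve.unit)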
Challenge
Run this bit of code. Then add additional mnemonic plots to the figure.
#Import additional packages we will need
import numpy as np
import pandas as pd
#Convert a data array to a pandas dataframe
#and find significant spikes in the data
#Return the spikes as a binary 1 or 0 array
def find_unc(data):
    #Convert data to pandas
    df = pd.DataFrame(data)
    #Calculate the rolling average
    #(200 is a somewhat arbitrary value to take the rolling average over)
    df_mean = df.rolling(200).mean()
    #Calculate the percent change (i.e. any points of change) of the data
    df_change = df_mean.pct_change(periods=200)
    #Convert large percent changes to 1 or 0
    #0.5 (50%) is a somewhat arbitrary number
    #to set as the amount of change in the data
    dfbin = ((df_change < -0.5) | (df_change > 0.5)).astype(int)
    #Return the binary array
    return(dfbin)
#Define a function to make the plot and set parameters
def make_plot(i, var, colour):
    #Set the data to a variable
    data = lases[wellid][var]
    #Find the spikes in the data
    dfbin = find_unc(data)
    #Now perform the plotting
    top = min(lases[wellid]['DEPTH'])
    bot = max(lases[wellid]['DEPTH'])
    ax[i].plot(dfbin*np.nanmax(data), lases[wellid]['DEPTH'], color = 'black', linewidth = 0.5)
    ax[i].plot(data, lases[wellid]['DEPTH'], color = colour, linewidth = 0.5)
    ax[i].set_xlabel(var)
    ax[i].xaxis.label.set_color(colour)
    ax[i].set_xlim(np.nanpercentile(lases[wellid][var],0.5), np.nanpercentile(lases[wellid][var],99.5))
    ax[i].tick_params(axis='x', colors=colour)
    ax[i].title.set_color(colour)
    ax[i].set_ylim(top,bot)
    ax[i].invert_yaxis()
    ax[i].tick_params(left=False,
                      bottom=True,
                      labelleft=False,
                      labelbottom=True)
#Make the figure
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(16,6))

#Add a plot of each mnemonic to your figure
make_plot(0,"GR","green")
make_plot(1,"RDEP","red")

#Quick way to get the list of keys
#lases[wellid].keys()

#Fix the details on the figure
plt.subplots_adjust(wspace=0.01)
ax[0].set_ylabel("Depth (m)")
ax[0].tick_params(left=True,
                  bottom=True,
                  labelleft=True,
                  labelbottom=True)
Solution
To solve the challenge you can change the ncols variable and then add new calls to the make_plot function.
#Change the number of columns
fig, ax = plt.subplots(nrows=1, ncols=8, figsize=(16,6))

#Add additional calls to the make plot
make_plot(0,"CALI","green")
make_plot(1,"DRHO","red")
make_plot(2,"DT","blue")
make_plot(3,"GR","purple")
make_plot(4,"RDEP","cyan")
make_plot(5,"RHOB","pink")
make_plot(6,"RMED","brown")
make_plot(7,"SP","orange")
To learn more about the smoothing steps, make a diagnostic plot at each step.
#Set an example dataset, i.e.
#wellid 1 and var is "GR"
data = lases[1]["GR"]

#Convert data to pandas
df = pd.DataFrame(data)

plt.plot(df)
plt.title("Raw data")
plt.show()

#Calculate the rolling average
df_mean = df.rolling(200).mean()

plt.plot(df_mean)
plt.title("Smoothed data")
plt.show()

#Calculate the percent change (i.e. any points of change) of the data
df_change = df_mean.pct_change(periods=200)

plt.plot(df_change)
plt.title("Percentage changes along the smoothed data")
plt.show()

#Convert large percent changes to 1 or 0
dfbin = ((df_change < -0.2) | (df_change > 0.2)).astype(int)

plt.plot(dfbin)
plt.title("'Binarised' version of large percentage changes")
plt.show()
SEGY Seismic data processing
from obspy.io.segy.segy import _read_segy
import matplotlib.pyplot as plt
import numpy as np
#Adapted from https://agilescientific.com/blog/2016/9/21/x-lines-of-python-read-and-write-seg-y
#See the notebooks here for more good examples
#https://github.com/agile-geoscience/xlines
#Set the filename of the segy data
filename = "../data/james/james_1959_pstm_tvfk_gain.sgy"
#Data randomly chosen from here:
#Title: 2006 James 3D Seismic Survey.
#Author: White, A.
#Prepared by: Terrex Seismic Pty Ltd; Pioneer Surveys Pty Ltd; WestenGeco
#Tenement: PPL00182
#Operator: Santos Ltd
#https://sarigbasis.pir.sa.gov.au/WebtopEw/ws/samref/sarig1/wci/Record?r=0&m=1&w=catno=2035790
#This will take about 1 minute.
#When the [*] changes to [52] and the circle in the top right is clear, it has completed
stream = _read_segy(filename, headonly=True)
print(np.shape(stream.traces))
stream
(48832,)
48832 traces in the SEG Y structure.
#Look at a single trace
one_trace = stream.traces[10000]

#Print out details of a single trace
print(one_trace)

plt.figure(figsize=(16,2))
plt.plot(one_trace.data)
plt.show()
Trace sequence number within line: 10001
1001 samples, dtype=float32, 250.00 Hz
#Stack multiple traces into a single numpy array
data = np.stack([t.data for t in stream.traces[12320:12320+500]])
#What does the stacked data look like
data.shape
(500, 1001)
#Have a look at the data
plt.imshow(data.T, cmap="Greys", aspect='auto')
<matplotlib.image.AxesImage at 0x16a4983a0>
#Make a more informative plot
#Restrict the data to the 95th percentile
vm = np.percentile(data, 95)
print("The 95th percentile is {:.0f}; the max amplitude is {:.0f}".format(vm, data.max()))
#Make the plot
plt.figure(figsize=(16,8))
plt.imshow(data.T, cmap="RdBu", vmin=-vm, vmax=vm, aspect='auto')
plt.colorbar()
plt.show()
The 95th percentile is 4365; the max amplitude is 34148
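To see why the clipping helps, you can look at the amplitude distribution itself. A quick sketch (assuming data and vm from the cells above):
#Histogram of all amplitudes, with the +/- 95th percentile clip marked in red
plt.hist(data.flatten(), bins=100)
plt.axvline(vm, color='red')
plt.axvline(-vm, color='red')
plt.xlabel('Amplitude')
plt.ylabel('Count')
plt.show()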
#What else is in the data?
#Print out the segy headers
print(stream.textual_file_header.decode())
C 1 CLIENT SANTOS COMPANY CREW NO C 2 LINE 2000.00 AREA JAMES3D C 3 REEL NO DAY-START OF REEL YEAR OBSERVER C 4 INSTRUMENT MFG MODEL SERIAL NO C 5 DATA TRACES/RECORD 24569 AUXILIARY TRACES/RECORD 0 CDP FOLD 40 C 6 SAMPLE INTERVAL 4.00 SAMPLES/TRACE 1001 BITS/IN BYTES/SAMPLE 4 C 7 RECORDING FORMAT FORMAT THIS REEL SEG-Y MEASUREMENT SYSTEM METERS C 8 SAMPLE CODE FLOATING PT C09 JAMES 3D C10 WESTERNGECO C11 MARCH 2007 C12 VERSION : James3D_pstm_tvfk_gain C13 FILTERED TRIM PSTM STACK C14 C15 GEOMETRY APPLY-TAR-MINP- C16 NOISE REDUCTION - SWATT C17 SC DECON - SCAC C18 RESIDUAL_STATICS C19 TRIM_STATICS - INVERSE_TAR - SORT C20 PSTM - SORT - GAIN C21 TRIM_STATICS - STACK C22 SPECW_10-70HZ -TVF_10-75HZ-TRACE_BALANCE C23 C24 C25 C26 C27 C28 C29 C30 C31 C32 C33 C34 C35 C36 C37 C38 C39 C40 END EBCDIC
#And the header information for a particular trace
print(stream.traces[1013].header)
trace_sequence_number_within_line: 1014
trace_sequence_number_within_segy_file: 1014
original_field_record_number: 2004
trace_number_within_the_original_field_record: 1
energy_source_point_number: 10026
ensemble_number: 10026
trace_number_within_the_ensemble: 28
trace_identification_code: 1
number_of_vertically_summed_traces_yielding_this_trace: 1
number_of_horizontally_stacked_traces_yielding_this_trace: 13
data_use: 1
distance_from_center_of_the_source_point_to_the_center_of_the_receiver_group: 0
receiver_group_elevation: 0
surface_elevation_at_source: 0
source_depth_below_surface: 0
datum_elevation_at_receiver_group: 0
datum_elevation_at_source: 0
water_depth_at_source: 0
water_depth_at_group: 0
scalar_to_be_applied_to_all_elevations_and_depths: 1
scalar_to_be_applied_to_all_coordinates: 1
source_coordinate_x: 482760
source_coordinate_y: 7035836
group_coordinate_x: 482760
group_coordinate_y: 7035836
coordinate_units: 1
weathering_velocity: 0
subweathering_velocity: 0
uphole_time_at_source_in_ms: 0
uphole_time_at_group_in_ms: 0
source_static_correction_in_ms: 0
group_static_correction_in_ms: 0
total_static_applied_in_ms: -70
lag_time_A: 0
lag_time_B: 0
delay_recording_time: 0
mute_time_start_time_in_ms: 0
mute_time_end_time_in_ms: 20
number_of_samples_in_this_trace: 1001
sample_interval_in_ms_for_this_trace: 4000
gain_type_of_field_instruments: 0
instrument_gain_constant: 0
instrument_early_or_initial_gain: 0
correlated: 0
sweep_frequency_at_start: 0
sweep_frequency_at_end: 0
sweep_length_in_ms: 0
sweep_type: 0
sweep_trace_taper_length_at_start_in_ms: 0
sweep_trace_taper_length_at_end_in_ms: 0
taper_type: 0
alias_filter_frequency: 0
alias_filter_slope: 0
notch_filter_frequency: 0
notch_filter_slope: 0
low_cut_frequency: 0
high_cut_frequency: 0
low_cut_slope: 0
high_cut_slope: 0
year_data_recorded: 0
day_of_year: 0
hour_of_day: 0
minute_of_hour: 0
second_of_minute: 0
time_basis_code: 0
trace_weighting_factor: 0
geophone_group_number_of_roll_switch_position_one: 0
geophone_group_number_of_trace_number_one: 0
geophone_group_number_of_last_trace: 0
gap_size: 0
over_travel_associated_with_taper: 0
x_coordinate_of_ensemble_position_of_this_trace: 0
y_coordinate_of_ensemble_position_of_this_trace: 0
for_3d_poststack_data_this_field_is_for_in_line_number: 0
for_3d_poststack_data_this_field_is_for_cross_line_number: -4587520
shotpoint_number: 2004
scalar_to_be_applied_to_the_shotpoint_number: 0
trace_value_measurement_unit: 10026
transduction_constant_mantissa: 0
transduction_constant_exponent: 0
transduction_units: 0
device_trace_identifier: 0
scalar_to_be_applied_to_times: 1052
source_type_orientation: 0
source_energy_direction_mantissa: 0
source_energy_direction_exponent: 1607
source_measurement_mantissa: 0
source_measurement_exponent: 0
source_measurement_unit: 0
#You can automatically extract data you might need from the header
#Get the sample interval from the header info
dt = stream.traces[0].header.sample_interval_in_ms_for_this_trace / 1e6
dt
0.004
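With the sample interval in hand you can label a trace in two-way time rather than sample number. A minimal sketch reusing one_trace from above:
import numpy as np
import matplotlib.pyplot as plt
#Build a time axis in seconds from the sample interval
t = np.arange(len(one_trace.data)) * dt
plt.figure(figsize=(16,2))
plt.plot(t, one_trace.data)
plt.xlabel('Two-way time (s)')
plt.show()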
Challenge
A single seismic section can be viewed with this snippet of code:
#Set number of xlines
n = 262
#Set start iline
m = 0

print(m, m*n, m*n+n)
data = np.stack([t.data for t in stream.traces[m*n:m*n+n]])
vm = np.percentile(data, 95)
plt.figure(figsize=(14,4))
plt.imshow(data.T, cmap="RdBu", vmin=-vm, vmax=vm, aspect='auto')
plt.show()
Can you put this in a loop to show multiple sections at once?
Solution
…
#Set number of xlines
n = 262
#Set start iline
m = 0

while m < 10:
    print(m, m*n, m*n+n)
    data = np.stack([t.data for t in stream.traces[m*n:m*n+n]])
    vm = np.percentile(data, 95)
    plt.figure(figsize=(14,4))
    plt.imshow(data.T, cmap="RdBu", vmin=-vm, vmax=vm, aspect='auto')
    plt.show()
    m = m+1
Roll-your-own data reader
Western seismic VELF format
Sometimes there are no good libraries or data-readers available for your specific use case. This is very common if you get a specialty instrument with a unique data format. The documentation is usually a good place to start when figuring out how to interpret the data, and more often than not the header of the data can give you all the information you need. Download this VELF file, containing some 3D seismic data. I could not find any good Python libraries to handle this dataset, so we can just try out a few things to get the data into a format that is useful for us.
#Imports for plotting
import matplotlib.pyplot as plt
# Load Builtin colormaps, colormap handling utilities, and the ScalarMappable mixin.
from matplotlib import cm
import pandas as pd
#Open the file for reading
f = open("../data/S3D_Vrms_StkVels_VELF.txt", 'r')

#Read in all the lines in the file and save them to a variable
mylist = f.readlines()

#Close the file
f.close()
print("Done reading file.")
Done reading file.
#Print out the first 20 lines
print(mylist[0:20])
['Client: XXX \n', 'Project: YYY\n', 'Contractor: ZZZ\n', 'Date:\n', '\n', 'Velocity type: RMS Velocity in Time\n', '\n', '\n', 'Datum: GDA94, UTM Zone: UTM53, Central Meridian : \n', 'Statics: Two way time corrected to mean sea level: No\n', ' Gun and Cable statics applied: No\n', ' Tidal statics applied: No\n', '\n', '3D Grid details:\n', 'inline crossline X Y\n', '\n', '1000 5000 599413.78 7382223.37\n', '1000 5309 595633.30 7375486.63\n', '1448 5000 609180.96 7376742.28\n', '1448 5309 605400.48 7370005.55\n']
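In the loop below, the line numbers (16-19 for the grid corners, and everything after line 49 for the velocity records) are hard-coded after inspecting the file. If you prefer, you can locate the blocks by their keywords instead. A small sketch on the mylist we just read:
#Find the first line whose leading keyword is 'VELF'
first_velf = next((i for i, line in enumerate(mylist) if line.split()[:1] == ['VELF']), None)
print("First VELF record at line", first_velf)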
#Set up some empty lists to store each bit of data in
inline=[]
crossline=[]
X=[]
Y=[]

velf=[]

#Loop through all the lines in the file
for i,line in enumerate(mylist):
    #First split the line up (by default, split by whitespace)
    splitline=line.split()

    #If we encounter certain lines, save some data
    if i in [16,17,18,19]:
        #Print out the lines (check we are doing the right thing)
        print(splitline)
        inline.append(int(splitline[0]))
        crossline.append(int(splitline[1]))
        X.append(float(splitline[2]))
        Y.append(float(splitline[3]))

    #This is where the actual data starts
    #Now depending on the key word at the start of each line
    #save the data to each particular list/array
    #Read the data in again, this time with some thought about what we actually want to do
    if i>49:
        if splitline[0]=='LINE':
            LINE = int(splitline[1])

        if splitline[0]=='SPNT':
            xline3d=int(splitline[1])
            binx=float(splitline[2])
            biny=float(splitline[3])
            inline3d=int(splitline[4])

        if splitline[0]=='VELF':
            for j,val in enumerate(splitline[1:]):
                #print(j,val)
                #Counting from the 0th index of splitline[1:end]
                if j%2==0:
                    t=int(val)
                else:
                    vt=int(val)
                    velf.append([LINE,xline3d,binx,biny,inline3d,t,vt])
['1000', '5000', '599413.78', '7382223.37']
['1000', '5309', '595633.30', '7375486.63']
['1448', '5000', '609180.96', '7376742.28']
['1448', '5309', '605400.48', '7370005.55']
#Convert the python "list" type to Pandas dataframe
df=pd.DataFrame(velf)
#Set the names of the columns
df.columns=['LINE','xline3d','binx','biny','inline3d','t','vt']

df
LINE | xline3d | binx | biny | inline3d | t | vt | |
---|---|---|---|---|---|---|---|
0 | 1000 | 5080 | 598435.0 | 7380479.0 | 1000 | 0 | 3200 |
1 | 1000 | 5080 | 598435.0 | 7380479.0 | 1000 | 295 | 3300 |
2 | 1000 | 5080 | 598435.0 | 7380479.0 | 1000 | 598 | 4137 |
3 | 1000 | 5080 | 598435.0 | 7380479.0 | 1000 | 738 | 4537 |
4 | 1000 | 5080 | 598435.0 | 7380479.0 | 1000 | 1152 | 4500 |
... | ... | ... | ... | ... | ... | ... | ... |
3082 | 1440 | 5280 | 605580.0 | 7370735.0 | 1440 | 2216 | 5259 |
3083 | 1440 | 5280 | 605580.0 | 7370735.0 | 1440 | 2861 | 5791 |
3084 | 1440 | 5280 | 605580.0 | 7370735.0 | 1440 | 3526 | 6294 |
3085 | 1440 | 5280 | 605580.0 | 7370735.0 | 1440 | 4697 | 7077 |
3086 | 1440 | 5280 | 605580.0 | 7370735.0 | 1440 | 5988 | 7748 |
3087 rows × 7 columns
#Plot the target area
plt.scatter(df.binx, df.biny, c=df.xline3d)
plt.colorbar()
plt.show()
#Plot it in 3d just because we can
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(df.binx, df.biny, df.inline3d, c=df.xline3d)
ax.view_init(30, 70)
plt.show()
#Now, make some plots...
#One way to do this, is to
#make a 'group' for each unique seismic line
groups = df.groupby(['LINE','xline3d','inline3d'])
#Make plots by certain groupings
#Add a value to spread out the data nicely
i = 0
for name,grp in groups:
    if name[2]==1280:
        print(name)
        plt.plot(grp.vt+i, -grp.t)
        i += 500
(1280, 5040, 1280)
(1280, 5060, 1280)
(1280, 5080, 1280)
(1280, 5100, 1280)
(1280, 5120, 1280)
(1280, 5140, 1280)
(1280, 5160, 1280)
(1280, 5180, 1280)
(1280, 5200, 1280)
(1280, 5220, 1280)
(1280, 5240, 1280)
(1280, 5260, 1280)
(1280, 5280, 1280)
from scipy.interpolate import interp1d
import numpy as np
#Normal plots
%matplotlib inline
#Fancy interactive plots
#%matplotlib notebook

fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(projection='3d')
for name,grp in groups:
    ##Plot all the data
    ax.plot(grp.binx+grp.vt, grp.biny, grp.t, 'b-')
##Plot all the data with colors
# colors=cm.seismic(grp.vt/grp.vt.max())
# ax.scatter(grp.binx+grp.vt,grp.biny,grp.t,c=grp.vt/grp.vt.max(),cmap=cm.seismic)
#Interpolate the data and plot with colors
# x = grp.t
# y = grp.vt
# f = interp1d(x, y, kind='linear')
# num=50
# xnew = np.linspace(0,max(x),num)
# ynew = f(xnew)
# binx = np.linspace(min(grp.binx),max(grp.binx),num)
# biny = np.linspace(min(grp.biny),max(grp.biny),num)
# colours = cm.seismic(ynew/grp.vt.max())
# ax.scatter(binx+xnew,biny,xnew,c=colours)
plt.show()
Bonus: Convert Text to SegY format
A work in progress… but the strategy is to simply match up the correct SEGY header values with those in the text file, grouping everything by each x-y (i-j, lat-lon) surface point; the z-axis then becomes the trace (down to a depth or a TWT, etc.).
from obspy.core.stream import Stream, Trace
from obspy.core.utcdatetime import UTCDateTime
from obspy.io.segy.segy import SEGYTraceHeader
from obspy.core.util.attribdict import AttribDict
from obspy.io.segy.segy import SEGYBinaryFileHeader
import sys
# Group all the text traces by their i-j coordinates
groups = df.groupby(['LINE','xline3d'])
print(len(groups))

#Make a stream object (flush it out to begin, because we have used these variable names for demos)
stream_out = None
stream_out = Stream()

#not sure how to group the trace ensembles, but we can use a counter to keep track of them
ensemble_number = 0

for ids,df_trace in groups:
    #ids are the LINE, xline3d coordinate locations
    #df_trace is the subset of the full dataframe for just that i-j location
    #For each LINE-xline3d location, a trace is the velocity value at each time sample, i.e.
    data = df_trace.vt.values

    # Enforce correct byte number and set to the Trace object
    data = np.require(data, dtype=np.float32)
    trace = Trace(data=data)

    # Set all the segy header information
    # Attributes in trace.stats will overwrite everything in trace.stats.segy.trace_header
    trace.stats.delta = 0.01
    trace.stats.starttime = UTCDateTime(1970,1,1,0,0,0)

    # If you want to set some additional attributes in the trace header,
    # add one and only set the attributes you want to be set. Otherwise the
    # header will be created for you with default values.
    if not hasattr(trace.stats, 'segy.trace_header'):
        trace.stats.segy = {}

    trace.stats.segy.trace_header = SEGYTraceHeader()
    # trace.stats.segy.trace_header.trace_sequence_number_within_line = index + 1
    # trace.stats.segy.trace_header.receiver_group_elevation = 0
    trace.stats.segy.trace_header.source_coordinate_x = int(df_trace.binx.values[0])
    trace.stats.segy.trace_header.source_coordinate_y = int(df_trace.biny.values[0])
    trace.stats.segy.trace_header.ensemble_number = ensemble_number #Not sure how this is actually determined
    trace.stats.segy.trace_header.lag_time_A = 2400
    trace.stats.segy.trace_header.lag_time_B = 3000
    trace.stats.segy.trace_header.number_of_samples_in_this_trace = len(data)

    ensemble_number += 1

    # Add trace to stream
    stream_out.append(trace)

# A SEGY file has file wide headers. This can be attached to the stream
# object. If these are not set, they will be autocreated with default
# values.
stream_out.stats = AttribDict()
stream_out.stats.textual_file_header = 'Textual Header!'
stream_out.stats.binary_file_header = SEGYBinaryFileHeader()
stream_out.stats.binary_file_header.trace_sorting_code = 5
# stream.stats.binary_file_header.number_of_data_traces_per_ensemble=1

print("Stream object before writing...")
print(stream_out)

stream_out.write("TEST.sgy", format="SEGY", data_encoding=1, byteorder=sys.byteorder)

print("Stream object after writing. Will have some segy attributes...")
print(stream_out)
217
Stream object before writing...
217 Trace(s) in Stream:
... | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.060000Z | 100.0 Hz, 7 samples
...
(215 other traces)
...
... | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.100000Z | 100.0 Hz, 11 samples
[Use "print(Stream.__str__(extended=True))" to print all Traces]
Stream object after writing. Will have some segy attributes...
217 Trace(s) in Stream:
Seq. No. in line: 0 | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.060000Z | 100.0 Hz, 7 samples
...
(215 other traces)
...
Seq. No. in line: 0 | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.100000Z | 100.0 Hz, 11 samples
[Use "print(Stream.__str__(extended=True))" to print all Traces]
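As a final check (not part of the original workflow), you can read the freshly written file straight back in with the same reader used earlier:
#Read TEST.sgy back in to confirm it is valid SEG-Y
check = _read_segy("TEST.sgy")
print(check)
print(check.traces[0].header.source_coordinate_x)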
API (application programming interface) calls
For example: https://github.com/geological-survey-of-queensland/open-data-api
This is a general API, and can be interfaced with many types of software. Python can do it too! You can write an entire application around APIs provided by developers.
import requests
import json
# set the API 'endpoint'
api = 'https://geoscience.data.qld.gov.au/api/action/'

# construct our query using the rules set out in the documentation
query = api + 'package_search?q=quamby'

# make the get request and store it in the response object
response = requests.get(query)

# view the payload as JSON
json_response = response.json()
#What kind of data does the server return us?
type(json_response)
#json_response
dict
#Dig into the returned data and just pull out certain items we want
jr = json_response['result']['results'][1]['GeoJSONextent']
#Clean up the returned values
mysplit = jr.split(':')[-1].split('}')[0]
#And convert them into something more reasonable.
import ast
xy = ast.literal_eval(mysplit)[0]
#Now you have the data you need in a useful format you can do whatever you like!
xy
[[[139.901170853999, -20.4651656329993],
[139.884504220999, -20.4651658769991],
[139.867837588999, -20.4651661309991],
[139.851170957999, -20.4651663759992],
[139.834504324999, -20.4651666519991],
[139.834504414999, -20.4484997749993],
[139.834504493999, -20.4318328969991],
[139.834504583999, -20.4151660189993],
[139.834504673999, -20.3984991289993],
[139.834504817999, -20.3818322399992],
[139.834504974999, -20.3651653509993],
[139.834505130999, -20.3484984619992],
[139.834505276999, -20.3318315619991],
[139.817838621999, -20.3318318499993],
[139.801171968999, -20.3318321489993],
[139.784505313999, -20.3318324379992],
[139.784505458999, -20.3151655479993],
[139.784505604999, -20.2984986709991],
[139.801172269999, -20.2984983719992],
[139.817838923999, -20.2984980729993],
[139.834505589999, -20.2984977849992],
[139.851172242999, -20.2984974849992],
[139.867838875999, -20.2984972409993],
[139.884505496999, -20.2984970089993],
[139.901172127999, -20.2984967649993],
[139.901171993999, -20.3151636539992],
[139.901171860999, -20.3318305429993],
[139.901171725999, -20.3484974319992],
[139.901171591999, -20.3651643209993],
[139.901171445999, -20.3818312099993],
[139.901171311999, -20.3984980879992],
[139.901171199999, -20.4151649779992],
[139.901171087999, -20.4318318659992],
[139.901170965999, -20.4484987439991],
[139.901170853999, -20.4651656329993]]]
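Since xy[0] is just a list of (longitude, latitude) pairs, a quick sanity-check plot takes only a few lines. A minimal sketch using the xy value above:
import matplotlib.pyplot as plt
#Unpack the lon/lat pairs of the polygon extent and plot the outline
lons = [p[0] for p in xy[0]]
lats = [p[1] for p in xy[0]]
plt.plot(lons, lats)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()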
Key points
- Obspy can be used to work with segy seismic data
- Lasio can be used to work with well log data
- Pyshp can be used to work with Shapefiles
- Pandas dataframes are the best format for working with tabular data of mixed numeric/character types
- Numpy arrays are faster when working with purely numeric data
- Make your own “data-reader” with native Python
- API calls can be made to build your own workflows and applications with Python
All materials copyright Sydney Informatics Hub, University of Sydney