Useful Python packages for different data types

Questions

  • What are libraries and packages?
  • How can I load tabular data into Python?
  • How can I load shapefiles?
  • How can I load SEG-Y and LAS data?

Objectives

  • Learn how to deal with specialty data types.
  • Learn about pandas, pyshp, lasio, obspy.

Python can deal with basically any type of data you throw at it. The open source Python community has developed many packages that make specialty formats easy to work with. Today we will look at pyshp (for dealing with shapefiles), pandas (great for tables and time series), lasio (for LAS-format well log data) and obspy (a fully featured seismic data processing suite).
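
All four packages are available from PyPI. If any of them are missing from your environment, a typical install command looks like the following (shown commented out, notebook-style; run it once and restart the kernel if needed):

#Uncomment and run this once if the packages are not already installed
#!pip install pyshp pandas lasio obspy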

Data for this exercise was downloaded from http://www.bom.gov.au/water/groundwater/explorer/map.shtml

Shapefiles

Shapefiles are a very common file format for GIS data; the standard is developed and maintained by ESRI, the makers of the ArcGIS software. A shapefile stores vector features such as points, lines and polygons. The name “file” is actually a misnomer: if you look at a single “shapefile” on your machine using a file explorer, you will see that it is actually made up of several files, three of which are mandatory, and others which may or may not be present.
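
For example, you can list the sibling files that make up a single “shapefile” using glob. This is a minimal sketch using the bore data that we read in below; the three mandatory components are the .shp (geometry), .shx (shape index) and .dbf (attribute table) files.

#List all the component files of the NGIS_BoreLine "shapefile"
import glob
for component in sorted(glob.glob('../data/shp_torrens_river/NGIS_BoreLine.*')):
    print(component)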

#Load the required modules
import shapefile

#NOTE: Weirdly and confusingly, this package is called "pyshp" but you call it via the name "shapefile"
#help(shapefile)
#Or check out the help pages https://github.com/GeospatialPython/pyshp
#Set the filename
boreshape='../data/shp_torrens_river/NGIS_BoreLine.shp'

#read in the file
shapeRead = shapefile.Reader(boreshape)

#And save out some of the shape file attributes
recs    = shapeRead.records()
shapes  = shapeRead.shapes()
fields  = shapeRead.fields
Nshp    = len(shapes)
print(Nshp) #print the Number of items in the shapefile
7635
fields #print the fields
[('DeletionFlag', 'C', 1, 0),
 ['HydroID', 'N', 10, 0],
 ['HydroCode', 'C', 30, 0],
 ['BoreID', 'N', 10, 0],
 ['TopElev', 'F', 19, 11],
 ['BottomElev', 'F', 19, 11],
 ['HGUID', 'N', 10, 0],
 ['HGUNumber', 'N', 10, 0],
 ['NafHGUNumb', 'N', 10, 0],
 ['SHAPE_Leng', 'F', 19, 11]]
recs[3] #print the fourth record (index 3); the result is a list that can be subscripted further
Record #3: [32002002, '652800645', 30027773, -147.26, -154.26, 31000045, 1044, 125005, 0.0]
shapes[1].points #print the point values of the second shape (index 1)
[(591975.5150000006, -3816141.8817), (591975.5150000006, -3816141.8817)]
shapeRead.shapeTypeName 
'POLYLINEZ'
rec= shapeRead.record(0)
rec['TopElev']
6.74

Challenge

  • Look at the data above. It provides the coordinates of the wells as points.
  • How many coordinates are provided for each well? Why do you think this is?
  • What is the Bottom Elevation of the 300th record?
Solution

There are two coordinate pairs, but they are duplicates of the same point: each bore is stored as a (vertical) line, so the start and end of the line share the same map location.

    
rec= shapeRead.record(299)
rec['BottomElev']
    
#or
    
recs[299][4]
#Here is a slightly neater way to read in the data, but it looks confusing at first.
#But we will need it in this form for our next exercise.

#This type of assignment is known as "list comprehension"
#fields = [x[0] for x in shapeRead.fields][1:]

#Break this down line by line
#for x in shapeRead.fields:
#    print(x)

#Now just print the 1st (0th) column of each list variable
#for x in shapeRead.fields:
#    print(x[0])

#But we want to save these values in a list (not just print them out).
#fields=[]
#for x in shapeRead.fields:
#    fields.append(x[0])

#And we don't want the DeletionFlag field, so we need to just get all the values except the first
#fields=fields[1:]

#fields = [x[0] for x in shapeRead.fields][1:]

#Do a list comprehension for the other variable too
#shps = [s.points for s in shapeRead.shapes()]

Shapefiles are not a native Python format, but the community has developed tools for exploring them. The package we have used, pyshp (imported, somewhat confusingly, under the name shapefile), is one example of working with shapefiles. Alternatives exist.
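
For instance, if you have the optional geopandas package installed, it can read a shapefile's geometry and attributes into a single dataframe-like object in one call. This is only a sketch; geopandas is not needed for the rest of this lesson.

#An alternative reader, using the optional geopandas package
#import geopandas as gpd
#gdf = gpd.read_file('../data/shp_torrens_river/NGIS_BoreLine.shp')
#gdf.head()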

Dataframes and table manipulation

Pandas is one of the most useful packages (alongside numpy and matplotlib). We will use it several times throughout the course for data handling and manipulation.

#As before, read in the shapefile
boreshape='../data/shp_torrens_river/NGIS_BoreLine.shp'

#Read the shapefile attributes to variables
shapeRead = shapefile.Reader(boreshape)
fields = [x[0] for x in shapeRead.fields][1:]
shps = [s.points for s in shapeRead.shapes()]
recs= shapeRead.records()
import pandas
#Now convert the variables to a pandas dataframe
df = pandas.DataFrame(columns=fields, data=recs)

#See more details at the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
df
HydroID HydroCode BoreID TopElev BottomElev HGUID HGUNumber NafHGUNumb SHAPE_Leng
0 32001999 652800645 30027773 6.74 -74.26 31000043 1042 104005 0.0
1 32002000 652800645 30027773 -74.26 -125.26 31000109 1108 110002 0.0
2 32002001 652800645 30027773 -125.26 -147.26 31000045 1044 125005 0.0
3 32002002 652800645 30027773 -147.26 -154.26 31000045 1044 125005 0.0
4 32002003 652800645 30027773 -154.26 -168.26 31000045 1044 125005 0.0
... ... ... ... ... ... ... ... ... ...
7630 32145557 662810075 30057044 102.62 90.89 31000139 1138 100001 0.0
7631 32145558 662810075 30057044 103.08 102.62 31000139 1138 100001 0.0
7632 32145559 662813065 30060034 535.08 451.08 31000026 1025 134001 0.0
7633 32145560 662813065 30060034 451.08 171.08 31000014 1013 134001 0.0
7634 32145561 662814687 30061656 444.30 432.30 31000014 1013 134001 0.0

7635 rows × 9 columns

#Add a new column called "coords" to the DataFrame, fill it with what is in "shps"
df['coords'] = shps

#Alternatively, you can use the "assign" method
#df = df.assign(coords=shps)
df
HydroID HydroCode BoreID TopElev BottomElev HGUID HGUNumber NafHGUNumb SHAPE_Leng coords
0 32001999 652800645 30027773 6.74 -74.26 31000043 1042 104005 0.0 [(591975.5150000006, -3816141.8817), (591975.5...
1 32002000 652800645 30027773 -74.26 -125.26 31000109 1108 110002 0.0 [(591975.5150000006, -3816141.8817), (591975.5...
2 32002001 652800645 30027773 -125.26 -147.26 31000045 1044 125005 0.0 [(591975.5150000006, -3816141.8817), (591975.5...
3 32002002 652800645 30027773 -147.26 -154.26 31000045 1044 125005 0.0 [(591975.5150000006, -3816141.8817), (591975.5...
4 32002003 652800645 30027773 -154.26 -168.26 31000045 1044 125005 0.0 [(591975.5150000006, -3816141.8817), (591975.5...
... ... ... ... ... ... ... ... ... ... ...
7630 32145557 662810075 30057044 102.62 90.89 31000139 1138 100001 0.0 [(605865.9246000014, -3830429.3729999997), (60...
7631 32145558 662810075 30057044 103.08 102.62 31000139 1138 100001 0.0 [(605865.9246000014, -3830429.3729999997), (60...
7632 32145559 662813065 30060034 535.08 451.08 31000026 1025 134001 0.0 [(612545.5916999988, -3832402.8148999996), (61...
7633 32145560 662813065 30060034 451.08 171.08 31000014 1013 134001 0.0 [(612545.5916999988, -3832402.8148999996), (61...
7634 32145561 662814687 30061656 444.30 432.30 31000014 1013 134001 0.0 [(635716.0604999997, -3816751.5364999995), (63...

7635 rows × 10 columns

Pandas is more frequently used to read tables in directly. So let’s read in the CSV data that came with the shapefile (this gives us some additional fields, not stored in the shapefile, that we can explore).

#read in the data
log_data=pandas.read_csv("../data/shp_torrens_river/NGIS_LithologyLog.csv",usecols=list(range(0,13))) 

#What is the "usecols" variable equal to?
#Try reading the data without using the usecols option, can you solve the error?
log_data           # print a truncated view of the dataframe (the first and last few rows)
OBJECTID BoreID HydroCode RefElev RefElevDesc FromDepth ToDepth TopElev BottomElev MajorLithCode MinorLithCode Description Source
0 1769789 30062892 662815923 57.25 NGS 18.0 19.5 39.25 37.75 CLYU None Clay SAGeodata
1 1769790 30062892 662815923 57.25 NGS 19.5 22.0 37.75 35.25 ROCK None Rocks and sand SAGeodata
2 1769791 30062892 662815923 57.25 NGS 22.0 24.0 35.25 33.25 CLYU None Clay SAGeodata
3 1770725 30141910 662816624 4.0 NGS 0.0 6.0 4.0 -2.0 None None No sample SAGeodata
4 1770726 30141910 662816624 4.0 NGS 6.0 15.0 -2.0 -11.0 CLYU None Clay - orange-red grey, mottled; stiff-sticky.... SAGeodata
... ... ... ... ... ... ... ... ... ... ... ... ... ...
70168 4120345 30304039 662829228 None UNK 9.0 10.0 None None CLYU None Sandy clay SAGeodata
70169 4120346 30304039 662829228 None UNK 10.0 12.5 None None SAND None Clay sand SAGeodata
70170 4120347 30304050 652802882 None UNK 0.0 0.3 None None FILL None Fill SAGeodata
70171 4120348 30304050 652802882 None UNK 0.3 0.8 None None SAND None Clayey sand SAGeodata
70172 4120349 30304050 652802882 None UNK 0.8 3.5 None None SAND None Sand SAGeodata

70173 rows × 13 columns

# add a new column as a function of existing columns
log_data['Thickness'] = log_data.ToDepth - log_data.FromDepth
type(log_data)     # see what Python type the DataFrame is
pandas.core.frame.DataFrame
log_data.head(3)    # print the first 3 rows
OBJECTID BoreID HydroCode RefElev RefElevDesc FromDepth ToDepth TopElev BottomElev MajorLithCode MinorLithCode Description Source Thickness
0 1769789 30062892 662815923 57.25 NGS 18.0 19.5 39.25 37.75 CLYU None Clay SAGeodata 1.5
1 1769790 30062892 662815923 57.25 NGS 19.5 22.0 37.75 35.25 ROCK None Rocks and sand SAGeodata 2.5
2 1769791 30062892 662815923 57.25 NGS 22.0 24.0 35.25 33.25 CLYU None Clay SAGeodata 2.0
log_data.index     # “the index” (aka “the labels”). 
#Pandas is great for using timeseries data, where the index can be the timestamps
RangeIndex(start=0, stop=70173, step=1)
log_data.columns  # column names (which is “an index”)
Index(['OBJECTID', 'BoreID', 'HydroCode', 'RefElev', 'RefElevDesc',
       'FromDepth', 'ToDepth', 'TopElev', 'BottomElev', 'MajorLithCode',
       'MinorLithCode', 'Description', 'Source', 'Thickness'],
      dtype='object')
log_data.dtypes    # data types of each column
OBJECTID           int64
BoreID             int64
HydroCode          int64
RefElev           object
RefElevDesc       object
FromDepth        float64
ToDepth          float64
TopElev           object
BottomElev        object
MajorLithCode     object
MinorLithCode     object
Description       object
Source            object
Thickness        float64
dtype: object
log_data.shape     # number of rows and columns
(70173, 14)
log_data.values    # the underlying numpy array; DataFrames are stored as numpy arrays for efficiency
array([[1769789, 30062892, 662815923, ..., 'Clay', 'SAGeodata', 1.5],
       [1769790, 30062892, 662815923, ..., 'Rocks and sand', 'SAGeodata',
        2.5],
       [1769791, 30062892, 662815923, ..., 'Clay', 'SAGeodata', 2.0],
       ...,
       [4120347, 30304050, 652802882, ..., 'Fill', 'SAGeodata', 0.3],
       [4120348, 30304050, 652802882, ..., 'Clayey sand', 'SAGeodata',
        0.5],
       [4120349, 30304050, 652802882, ..., 'Sand', 'SAGeodata', 2.7]],
      dtype=object)
#log_data['MajorLithCode']         # select one column
##Equivalent to 
#log_data.MajorLithCode 
##and
#log_data.iloc[:,9]
##and
#log_data.loc[:,'MajorLithCode']
type(log_data['MajorLithCode'])   # determine datatype of column (e.g., Series)
pandas.core.series.Series
#describe the data frame
log_data.describe(include='all')     
OBJECTID BoreID HydroCode RefElev RefElevDesc FromDepth ToDepth TopElev BottomElev MajorLithCode MinorLithCode Description Source Thickness
count 7.017300e+04 7.017300e+04 7.017300e+04 70173 70173 70173.000000 70173.000000 70173 70173 70173 70173 70173 70173 70173.000000
unique NaN NaN NaN 5068 4 NaN NaN 27784 27883 81 42 33614 54 NaN
top NaN NaN NaN None NGS NaN NaN None None CLYU None Clay SAGeodata NaN
freq NaN NaN NaN 18514 44947 NaN NaN 18514 18514 25861 62812 4603 70119 NaN
mean 2.505677e+06 3.018198e+07 6.624491e+08 NaN NaN 24.938020 30.621160 NaN NaN NaN NaN NaN NaN 5.683139
std 9.276182e+05 8.069609e+04 2.130226e+06 NaN NaN 45.431762 48.605931 NaN NaN NaN NaN NaN NaN 9.942400
min 1.769789e+06 3.002715e+07 6.528000e+08 NaN NaN 0.000000 0.010000 NaN NaN NaN NaN NaN NaN 0.000000
25% 1.932741e+06 3.014557e+07 6.628129e+08 NaN NaN 0.800000 3.960000 NaN NaN NaN NaN NaN NaN 1.000000
50% 1.999028e+06 3.018487e+07 6.628196e+08 NaN NaN 7.000000 11.500000 NaN NaN NaN NaN NaN NaN 2.800000
75% 3.967146e+06 3.025487e+07 6.628248e+08 NaN NaN 25.800000 34.700000 NaN NaN NaN NaN NaN NaN 7.000000
max 4.120349e+06 3.030405e+07 6.728042e+08 NaN NaN 610.300000 620.160000 NaN NaN NaN NaN NaN NaN 300.500000
# summarise a pandas Series
log_data.FromDepth.describe()   # describe a single column
count    70173.000000
mean        24.938020
std         45.431762
min          0.000000
25%          0.800000
50%          7.000000
75%         25.800000
max        610.300000
Name: FromDepth, dtype: float64
#calculate mean of 6th column ("FromDepth")
log_data.iloc[:,5].mean()      
24.93802017870121
#alternate method to calculate mean of FromDepth column (the 6th one)
log_data["FromDepth"].mean()    
24.93802017870121
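
Pandas can also summarise one column grouped by the categories in another. As a quick sketch using the columns created above, the mean interval thickness for each major lithology code can be found with groupby:

#Mean thickness of the logged intervals, grouped by lithology code
log_data.groupby('MajorLithCode')['Thickness'].mean().sort_values(ascending=False).head(10)
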
#Count how many Lith Codes there are
lithCounts=log_data.MajorLithCode.value_counts()
#Print the lithcodes, use .index or .values 
lithCounts
CLYU    25861
SAND    12772
SLAT     4071
FILL     4020
SDST     3209
        ...  
DIOR        1
SANN        1
HFLS        1
CALU        1
REGO        1
Name: MajorLithCode, Length: 81, dtype: int64
#plot a bar chart of the lith codes
lithCounts.plot.bar(figsize=(15,5));

#Plot a bar chart of the lith codes for the rarer lithologies
lithCounts[(lithCounts < 50)].plot.bar(rot=45,figsize=(15,5));

import numpy as np
import matplotlib.pyplot as plt
 
#Pandas data
x = log_data.Thickness
mu = log_data.Thickness.mean()
sigma = log_data.Thickness.std()

# An Equivalent Numpy version
#x = log_data.Thickness.values
#mu = np.mean(x) # mean of distribution
#sigma = np.std(x) # standard deviation of distribution

# the histogram of the data
plt.hist(x, bins=np.arange(0,20,1), alpha=0.5)
plt.xlabel('Thickness (m)')
plt.ylabel('Count')
mystring="Histogram with a mean of "+ str(np.round(mu,2)) + " & SD " + str(np.round(sigma,2))
plt.title(mystring)
 
# Tweak spacing to prevent clipping of ylabel
#plt.subplots_adjust(left=0.15)
plt.show()

Challenge

Plot a histogram of the thickness where the “MajorLithCode” is “FILL”.

Hint: to filter a pandas data frame by value use the following syntax:

df[df['Variable'] == "value"]

Solution
x = log_data[log_data['MajorLithCode'] == "FILL"]['Thickness'].values
#OR
#x = log_data[log_data.MajorLithCode == "FILL"].Thickness.values

plt.hist(x, bins=np.arange(0,20,1), alpha=0.5)
plt.show()
# import numpy as np
# cmap = plt.get_cmap('viridis')
# colors = cmap(np.linspace(0, 1, len(lithCounts.index)))
# colors

# for row in log_data.itertuples():
#     boreid=row[3]
#     for ind,value in enumerate(recs):  
#         try:
#             value.index(boreid)
#             print(recs)
#         except:
#             continue
#     #(row[3])


#You can plot the location of the bores slowly
# for ind, value in enumerate(recs):
#     #Get the lat lon value
#     lon=df.coords[ind][0][0]
#     lat=df.coords[ind][0][1]
#     #Get the Lithology unit
#     #value[]
    
#     #Now add the point to the plot
#     plt.plot(lon,lat,"|")
    
# plt.show()

# #or fast
# lons= [df.coords[i][0][0] for i in range(1,len(recs))] 
# lats= [df.coords[i][0][1] for i in range(1,len(recs))] 
# plt.plot(lons,lats,"|")
# plt.show()

Extra credit challenge

Go to http://www.bom.gov.au/water/groundwater/explorer/map.shtml and pick another River Region. Download the dataset in “Shapefile” format (this will download the csv also). Once you have the data, follow the same routines as above and see what you can find out about the river region.

Solution

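
There is no single answer here; it depends on the region you pick. As a minimal sketch, assuming you downloaded another region and its files follow the same naming pattern and column layout as the Torrens River data (the paths below are placeholders for whatever you downloaded), the same routines apply:

#Placeholder paths - replace these with the shapefile and csv you downloaded
#myshape = '../data/shp_my_region/NGIS_BoreLine.shp'
#mycsv   = '../data/shp_my_region/NGIS_LithologyLog.csv'

#shapeRead = shapefile.Reader(myshape)
#fields = [x[0] for x in shapeRead.fields][1:]
#recs = shapeRead.records()
#df = pandas.DataFrame(columns=fields, data=recs)

#log_data = pandas.read_csv(mycsv, usecols=list(range(0,13)))
#log_data['Thickness'] = log_data.ToDepth - log_data.FromDepth
#log_data.MajorLithCode.value_counts().plot.bar(figsize=(15,5))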

Log ASCII Files

Python has a wide range of packages/libraries for specific tasks. You can often create your own tools for niche jobs, but often you will find that many already exist to make things simpler for you. Here we will explore borehole data (in .las format) with the lasio library.

This tutorial is based on https://towardsdatascience.com/handling-big-volume-of-well-log-data-with-a-boosted-time-efficiency-with-python-dfe0319daf26

Original Data from: https://sarigbasis.pir.sa.gov.au/WebtopEw/ws/samref/sarig1/image/DDD/PEDP013LOGS.zip

Title: Cooper Basin selected well logs in LAS format.
Publication Date: November 20
Prepared by: Energy Resources Division, Department of the Premier and Cabinet
This Record URL: https://sarigbasis.pir.sa.gov.au/WebtopEw/ws/samref/sarig1/wci/Record?r=0&m=1&w=catno=2040037

#For plotting
import matplotlib.pyplot as plt

#Library specifically for "well data"
import lasio

#To find filenames matching a pattern
import glob

#For "regular expression manipulation"
import re
#Build a list of filenames to read
read_files = glob.glob("../data/WELL/*.las")
read_files
['../data/WELL/BoolLagoon1.las',
 '../data/WELL/Bungaloo1.las',
 '../data/WELL/BeachportEast1.las',
 '../data/WELL/BiscuitFlat1.las',
 '../data/WELL/Balnaves.las',
 '../data/WELL/Banyula.las',
 '../data/WELL/Burrungule1.las',
 '../data/WELL/Beachport1.las']

Note: file paths use different separator characters on Windows (“\”) and Unix (“/”), so the split below handles both possibilities.

#Cut out just the name of the well from the filenames
well_names = []
for file in read_files:
    print("FILE:", file)
    #Split the filepath at a "/" OR a ".las" OR a "\"
    well=re.split(r'/|\\|\.las',file)
    print("SPLIT:", well, "\n")
    well_names.append(well[-2])

print("There are ", len(well_names), "wells.")
print(well_names)
FILE: ../data/WELL/BoolLagoon1.las
SPLIT: ['..', 'data', 'WELL', 'BoolLagoon1', ''] 

FILE: ../data/WELL/Bungaloo1.las
SPLIT: ['..', 'data', 'WELL', 'Bungaloo1', ''] 

FILE: ../data/WELL/BeachportEast1.las
SPLIT: ['..', 'data', 'WELL', 'BeachportEast1', ''] 

FILE: ../data/WELL/BiscuitFlat1.las
SPLIT: ['..', 'data', 'WELL', 'BiscuitFlat1', ''] 

FILE: ../data/WELL/Balnaves.las
SPLIT: ['..', 'data', 'WELL', 'Balnaves', ''] 

FILE: ../data/WELL/Banyula.las
SPLIT: ['..', 'data', 'WELL', 'Banyula', ''] 

FILE: ../data/WELL/Burrungule1.las
SPLIT: ['..', 'data', 'WELL', 'Burrungule1', ''] 

FILE: ../data/WELL/Beachport1.las
SPLIT: ['..', 'data', 'WELL', 'Beachport1', ''] 

There are  8 wells.
['BoolLagoon1', 'Bungaloo1', 'BeachportEast1', 'BiscuitFlat1', 'Balnaves', 'Banyula', 'Burrungule1', 'Beachport1']
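
An alternative to the regular expression, if you prefer it, is pathlib, which handles the Windows/Unix separator difference for you and strips the extension with .stem. This sketch gives the same well names as the loop above:

#Same result using pathlib instead of a regular expression
from pathlib import Path
well_names = [Path(file).stem for file in read_files]
print(well_names)
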
#Now actually read in the log files to lasio
#The last cell was just to automatically make a nicely formatted list of well names!
lases = []
for files in read_files:
    las = lasio.read(files)
    lases.append(las)
#You can get an idea of what you can interrogate using the help function
#help(lases)
#This is just a regular Python list! But in this case the list contains
#special objects known as "LasFile(s)", i.e. lasio.las objects.
#Get some details using help again
#help(lases[1])
#From there we can get some info from each of the wells
j=0
for well in lases:
    #e.g. pull out the variables available from each well
    print("Wellid:", j, well_names[j])
    j+=1
    print(well.keys())
Wellid: 0 BoolLagoon1
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'SP']
Wellid: 1 Bungaloo1
['DEPTH', 'CALI', 'DRHO', 'DT', 'DTS', 'GR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'RMIC', 'SP']
Wellid: 2 BeachportEast1
['DEPTH', 'GR', 'RDEP', 'RMED', 'SP']
Wellid: 3 BiscuitFlat1
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'MINV', 'MNOR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'RMIC', 'SP']
Wellid: 4 Balnaves
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'MINV', 'MNOR', 'NPHI', 'PEF', 'RDEP', 'RHOB', 'RMED', 'RMIC', 'SP']
Wellid: 5 Banyula
['DEPTH', 'CALI', 'DRHO', 'DT', 'GR', 'NPHI', 'RDEP', 'RHOB', 'RMED', 'SP']
Wellid: 6 Burrungule1
['DEPTH', 'CALI', 'DT', 'GR', 'RDEP', 'RMED', 'SP']
Wellid: 7 Beachport1
['DEPTH', 'CALI', 'MINV', 'MNOR', 'RDEP', 'RMED', 'SP']
#Set a wellid you want to explore more
wellid=1
#Make a plot of one of the wells
plt.plot(lases[wellid]['DRHO'],lases[wellid]['DEPTH'])

You have just plotted the density correction (DRHO) at each measured depth point. You can clean this up and present it better in the next few cells.

#Get some more info out of the well data
print(lases[wellid].curves)
Mnemonic  Unit   Value  Description                                                                                        
--------  ----   -----  -----------                                                                                        
DEPTH     M             Depth                                                                                              
CALI      in            Caliper     CAL Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis          
DRHO      g/cm3         DenCorr     ZCOR Edited, bungaloo_1_mll_rtex_xyzdl_r6.dlis                                         
DT        us/ft         Sonic       DT24 DT24.I Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis  
DTS       us/ft         ShearSonic  DTS , bungaloo_1_mll_rtex_r2.dlis                                                      
GR        gAPI          GammaRay    GR Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis           
NPHI      dec           Neutron     CNC Edited, bungaloo_1_neutron_r2.dlis                                                 
PEF       b/e           PEFactor    PE Edited, bungaloo_1_mll_rtex_xyzdl_r6.dlis                                           
RDEP      ohmm          DeepRes     MLR4C Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis        
RHOB      g/cm3         Density     ZDNC Edited, bungaloo_1_mll_rtex_xyzdl_r6.dlis                                         
RMED      ohmm          MedRes      MLR2C Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis        
RMIC      ohmm          MicroRes    RMLL Spliced, Edited, bungaloo_1_mll_rtex_r1.dlis, bungaloo_1_mll_rtex_r2.dlis         
SP        mV            SponPot     SPWDH Edited, bungaloo_1_mll_rtex_r2.dlis                                              
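
lasio can also hand the curve data over to pandas, which is handy if you want to reuse the dataframe tools from earlier in this lesson. A minimal sketch, using the LasFile objects read in above:

#Convert one well to a pandas DataFrame (the index is the depth curve)
well_df = lases[wellid].df()
well_df.head()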

Challenge

Run this bit of code. Then add additional mnemonic plots to the figure.

#Import additional packages we will need
import numpy as np
import pandas as pd

#Convert a data array to a pandas dataframe
#and find significant spikes in the data
#Return the spikes as a binary 1 or 0 array
def find_unc(data):
    #Convert data to pandas
    df=pd.DataFrame(data)
    #Calculate the rolling average
    #(200 samples is a somewhat arbitrary window to take the rolling average over)
    df_mean = df.rolling(200).mean()
    #Calculate the percent change (i.e. any points of change) of the data
    df_change = df_mean.pct_change(periods=200)
    #Convert large percent changes to 1 or 0
    #0.5 (50%) is a somewhat arbitrary threshold
    #to set as the amount of change in the data
    dfbin = ((df_change < -0.5) | (df_change > 0.5)).astype(int)
    #Return the binary array
    return(dfbin)

#Define a function to make the plot and set parameters
def make_plot(i,var,colour):
    #Set the data to a variable
    data=lases[wellid][var]
    #Find the spikes in the data
    dfbin=find_unc(data)
    #Now perform the plotting
    top=min(lases[wellid]['DEPTH'])
    bot=max(lases[wellid]['DEPTH'])
    ax[i].plot(dfbin*np.nanmax(data), lases[wellid]['DEPTH'], color = 'black', linewidth = 0.5)
    ax[i].plot(data, lases[wellid]['DEPTH'], color = colour, linewidth = 0.5)
    ax[i].set_xlabel(var)
    ax[i].xaxis.label.set_color(colour)
    ax[i].set_xlim(np.nanpercentile(lases[wellid][var],0.5), np.nanpercentile(lases[wellid][var],99.5))
    ax[i].tick_params(axis='x', colors=colour)
    ax[i].title.set_color(colour)
    ax[i].set_ylim(top,bot)
    ax[i].invert_yaxis()
    ax[i].tick_params(left=False,
                bottom=True,
                labelleft=False,
                labelbottom=True)

#Make the figure
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(16,6))

#Add a plot of each mnemoic to your figure
make_plot(0,"GR","green")
make_plot(1,"RDEP","red")

#Quick way to get the list of keys
#lases[wellid].keys()

#Fix the details on the figure
plt.subplots_adjust(wspace=0.01)
ax[0].set_ylabel("Depth (m)")
ax[0].tick_params(left=True,
            bottom=True,
            labelleft=True,
            labelbottom=True)
Solution

To solve the challenge you can change the ncols variable and then add new calls to the make_plot function.

#Change the number of columns
fig, ax = plt.subplots(nrows=1, ncols=8, figsize=(16,6))
    
#Add additional calls to the make plot
make_plot(0,"CALI","green")
make_plot(1,"DRHO","red")
make_plot(2,"DT","blue")
make_plot(3,"GR","purple")
make_plot(4,"RDEP","cyan")
make_plot(5,"RHOB","pink")
make_plot(6,"RMED","brown")
make_plot(7,"SP","orange")

To learn more about the smoothing steps, make a diagnostic plot at each step.

#Set an example dataset, i.e.
#wellid 1 and var is "GR"
data=lases[1]["GR"]

#Convert data to pandas
df=pd.DataFrame(data)

plt.plot(df)
plt.title("Raw data")
plt.show()

#Calculate the rolling average
df_mean = df.rolling(200).mean()

plt.plot(df_mean)
plt.title("Smoothed data")
plt.show()

#Calculate the percent change (i.e any points of change) of the data
df_change = df_mean.pct_change(periods=200)

plt.plot(df_change)
plt.title("Percentage changes along the smoothed data")
plt.show()

#Convert large percent changes to 1 or 0
dfbin = ((df_change < -0.2) | (df_change > 0.2)).astype(int)

plt.plot(dfbin)
plt.title("'Binarised' version of large percentage changes")
plt.show()

SEGY Seismic data processing

from obspy.io.segy.segy import _read_segy
import matplotlib.pyplot as plt
import numpy as np

#Adapted from https://agilescientific.com/blog/2016/9/21/x-lines-of-python-read-and-write-seg-y
#See the notebooks here for more good examples
#https://github.com/agile-geoscience/xlines
#Set the filename of the segy data

filename="../data/james/james_1959_pstm_tvfk_gain.sgy"

#Data randomly chosen from here:
#Title: 2006 James 3D Seismic Survey.
#Author: White, A.
#Prepared by: Terrex Seismic Pty Ltd; Pioneer Surveys Pty Ltd; WestenGeco
#Tenement: PPL00182
#Operator: Santos Ltd
#https://sarigbasis.pir.sa.gov.au/WebtopEw/ws/samref/sarig1/wci/Record?r=0&m=1&w=catno=2035790
#This will take about 1 minute.
#In a Jupyter notebook, when the [*] next to the cell changes to a number and the circle in the top right is clear, it has completed
stream = _read_segy(filename, headonly=True)
print(np.shape(stream.traces))
stream
(48832,)
48832 traces in the SEG Y structure.
#Look at a single trace
one_trace = stream.traces[10000]

#Print out details single trace
print(one_trace)

plt.figure(figsize=(16,2))
plt.plot(one_trace.data)
plt.show()
Trace sequence number within line: 10001
1001 samples, dtype=float32, 250.00 Hz

#Stack multiple traces into a single numpy array
data = np.stack([t.data for t in stream.traces[12320:12320+500]])
#What does the stacked data look like
data.shape
(500, 1001)
#Have a look at the data
plt.imshow(data.T, cmap="Greys", aspect='auto')
<matplotlib.image.AxesImage at 0x16a4983a0>

#Make a more informative plot

#Restrict the data to the 95th percentile
vm = np.percentile(data, 95)

print("The 95th percentile is {:.0f}; the max amplitude is {:.0f}".format(vm, data.max()))

#Make the plot
plt.figure(figsize=(16,8))
plt.imshow(data.T, cmap="RdBu", vmin=-vm, vmax=vm, aspect='auto')
plt.colorbar()
plt.show()
The 95th percentile is 4365; the max amplitude is 34148

#What else is in the data?

#Print out the segy headers
print(stream.textual_file_header.decode())
C 1 CLIENT SANTOS                 COMPANY                       CREW NO         C 2 LINE    2000.00 AREA JAMES3D                                                C 3 REEL NO           DAY-START OF REEL     YEAR      OBSERVER                  C 4 INSTRUMENT  MFG            MODEL            SERIAL NO                       C 5 DATA TRACES/RECORD 24569  AUXILIARY TRACES/RECORD       0 CDP FOLD    40    C 6 SAMPLE INTERVAL  4.00   SAMPLES/TRACE  1001 BITS/IN      BYTES/SAMPLE  4    C 7 RECORDING FORMAT        FORMAT THIS REEL SEG-Y  MEASUREMENT SYSTEM METERS   C 8 SAMPLE CODE FLOATING PT                                                     C09 JAMES 3D                                                                    C10 WESTERNGECO                                                                 C11 MARCH 2007                                                                  C12 VERSION : James3D_pstm_tvfk_gain                                            C13 FILTERED TRIM PSTM STACK                                                    C14                                                                             C15 GEOMETRY APPLY-TAR-MINP-                                                    C16 NOISE REDUCTION - SWATT                                                     C17  SC DECON - SCAC                                                            C18 RESIDUAL_STATICS                                                            C19  TRIM_STATICS - INVERSE_TAR - SORT                                          C20 PSTM  - SORT  - GAIN                                                        C21 TRIM_STATICS - STACK                                                        C22 SPECW_10-70HZ -TVF_10-75HZ-TRACE_BALANCE                                    C23                                                                             C24                                                                             C25                                                                             C26                                                                             C27                                                                             C28                                                                             C29                                                                             C30                                                                             C31                                                                             C32                                                                             C33                                                                             C34                                                                             C35                                                                             C36                                                                             C37                                                                             C38                                                                             C39                                                                             C40 END EBCDIC                                                                  
#And the header information for a particular trace
print(stream.traces[1013].header)
trace_sequence_number_within_line: 1014
trace_sequence_number_within_segy_file: 1014
original_field_record_number: 2004
trace_number_within_the_original_field_record: 1
energy_source_point_number: 10026
ensemble_number: 10026
trace_number_within_the_ensemble: 28
trace_identification_code: 1
number_of_vertically_summed_traces_yielding_this_trace: 1
number_of_horizontally_stacked_traces_yielding_this_trace: 13
data_use: 1
distance_from_center_of_the_source_point_to_the_center_of_the_receiver_group: 0
receiver_group_elevation: 0
surface_elevation_at_source: 0
source_depth_below_surface: 0
datum_elevation_at_receiver_group: 0
datum_elevation_at_source: 0
water_depth_at_source: 0
water_depth_at_group: 0
scalar_to_be_applied_to_all_elevations_and_depths: 1
scalar_to_be_applied_to_all_coordinates: 1
source_coordinate_x: 482760
source_coordinate_y: 7035836
group_coordinate_x: 482760
group_coordinate_y: 7035836
coordinate_units: 1
weathering_velocity: 0
subweathering_velocity: 0
uphole_time_at_source_in_ms: 0
uphole_time_at_group_in_ms: 0
source_static_correction_in_ms: 0
group_static_correction_in_ms: 0
total_static_applied_in_ms: -70
lag_time_A: 0
lag_time_B: 0
delay_recording_time: 0
mute_time_start_time_in_ms: 0
mute_time_end_time_in_ms: 20
number_of_samples_in_this_trace: 1001
sample_interval_in_ms_for_this_trace: 4000
gain_type_of_field_instruments: 0
instrument_gain_constant: 0
instrument_early_or_initial_gain: 0
correlated: 0
sweep_frequency_at_start: 0
sweep_frequency_at_end: 0
sweep_length_in_ms: 0
sweep_type: 0
sweep_trace_taper_length_at_start_in_ms: 0
sweep_trace_taper_length_at_end_in_ms: 0
taper_type: 0
alias_filter_frequency: 0
alias_filter_slope: 0
notch_filter_frequency: 0
notch_filter_slope: 0
low_cut_frequency: 0
high_cut_frequency: 0
low_cut_slope: 0
high_cut_slope: 0
year_data_recorded: 0
day_of_year: 0
hour_of_day: 0
minute_of_hour: 0
second_of_minute: 0
time_basis_code: 0
trace_weighting_factor: 0
geophone_group_number_of_roll_switch_position_one: 0
geophone_group_number_of_trace_number_one: 0
geophone_group_number_of_last_trace: 0
gap_size: 0
over_travel_associated_with_taper: 0
x_coordinate_of_ensemble_position_of_this_trace: 0
y_coordinate_of_ensemble_position_of_this_trace: 0
for_3d_poststack_data_this_field_is_for_in_line_number: 0
for_3d_poststack_data_this_field_is_for_cross_line_number: -4587520
shotpoint_number: 2004
scalar_to_be_applied_to_the_shotpoint_number: 0
trace_value_measurement_unit: 10026
transduction_constant_mantissa: 0
transduction_constant_exponent: 0
transduction_units: 0
device_trace_identifier: 0
scalar_to_be_applied_to_times: 1052
source_type_orientation: 0
source_energy_direction_mantissa: 0
source_energy_direction_exponent: 1607
source_measurement_mantissa: 0
source_measurement_exponent: 0
source_measurement_unit: 0
#You can automatically extract data you might need from the header
#Get the sample interval from the header info
dt = stream.traces[0].header.sample_interval_in_ms_for_this_trace / 1e6
dt
0.004
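
With the sample interval recovered from the header, you can build a proper time axis for a trace rather than plotting against sample number. A short sketch, reusing one_trace and dt from above:

#Build a two-way-time axis from the sample interval and replot the trace
times = np.arange(len(one_trace.data)) * dt
plt.figure(figsize=(16,2))
plt.plot(times, one_trace.data)
plt.xlabel('Two-way time (s)')
plt.show()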

Challenge

A single seismic section can be viewed with this snippet of code:

#Set number of xlines
n=262
#Set start iline
m=0

print(m,m*n,m*n+n)
data = np.stack([t.data for t in stream.traces[m*n:m*n+n]])
vm = np.percentile(data, 95)
plt.figure(figsize=(14,4))
plt.imshow(data.T,cmap="RdBu", vmin=-vm, vmax=vm, aspect='auto')
plt.show()

Can you put this in a loop to show multiple sections at once?

Solution

#Set number of xlines
n=262
#Set start iline
m=0

while m < 10:
    print(m,m*n,m*n+n)
    data = np.stack([t.data for t in stream.traces[m*n:m*n+n]])
    vm = np.percentile(data, 95)
    plt.figure(figsize=(14,4))
    plt.imshow(data.T,cmap="RdBu", vmin=-vm, vmax=vm, aspect='auto')
    plt.show()
    m=m+1

Roll-your-own data reader

Western seismic VELF format

Sometimes there are no good libraries or data readers available for your specific use case. This is very common if you get data from a specialty instrument with a unique format. The documentation is usually a good place to start for figuring out how to interpret the data, and more often than not the header of the file gives you all the information you need. Download this VELF file, containing some 3D seismic velocity data. I could not find any good Python libraries to handle this format, so we will just try out a few things to get the data into a form that is useful for us.

#Imports for plotting
import matplotlib.pyplot as plt
# Load Builtin colormaps, colormap handling utilities, and the ScalarMappable mixin.
from matplotlib import cm
import pandas as pd
#Open the file for reading
f = open("../data/S3D_Vrms_StkVels_VELF.txt",'r')

#Read in all the lines in the file and save them to a variable
mylist = f.readlines()

#Close the file
f.close()
print("Done reading file.")
Done reading file.
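
A slightly safer idiom is a with block, which closes the file for you even if something goes wrong mid-read. This does exactly the same as the three steps above:

#Equivalent read using a context manager (the file is closed automatically)
with open("../data/S3D_Vrms_StkVels_VELF.txt",'r') as f:
    mylist = f.readlines()
print("Done reading file.")
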
#Print out the first 20 lines
print(mylist[0:20])
['Client: XXX \n', 'Project: YYY\n', 'Contractor: ZZZ\n', 'Date:\n', '\n', 'Velocity type: RMS Velocity in Time\n', '\n', '\n', 'Datum: GDA94, UTM Zone: UTM53, Central Meridian :  \n', 'Statics: Two way time corrected to mean sea level: No\n', '         Gun and Cable statics applied: No\n', '         Tidal statics applied: No\n', '\n', '3D Grid details:\n', 'inline    crossline      X            Y\n', '\n', '1000        5000      599413.78   7382223.37\n', '1000        5309      595633.30   7375486.63\n', '1448        5000      609180.96   7376742.28\n', '1448        5309      605400.48   7370005.55\n']
#Set up some empty lists to store each bit of data in
inline=[]
crossline=[]
X=[]
Y=[]

velf=[]

#Loop through all the lines in the file
for i,line in enumerate(mylist):
    
    #First split the line up (by default, split by whitespace)
    splitline=line.split()
    
    #If we encounter certain lines, save some data
    if i in [16,17,18,19]:  
        #Print out the lines (check we are doing the right thing)
        print(splitline)
        
        inline.append(int(splitline[0]))
        crossline.append(int(splitline[1]))
        X.append(float(splitline[2]))
        Y.append(float(splitline[3]))
      

    #This is where the actual data starts
    #Now depending on the key word at the start of each line
    #save the data to each particular list/array
    #Read the data in again, this time with some thought about what we actually want to do
    if i>49:
        if splitline[0]=='LINE':
            LINE = int(splitline[1])
            
        if splitline[0]=='SPNT':
            xline3d=int(splitline[1])
            binx=float(splitline[2])
            biny=float(splitline[3])
            inline3d=int(splitline[4])
            
        if splitline[0]=='VELF':
            
            for j,val in enumerate(splitline[1:]):
                #print(j,val)
                #Counting from the 0th index of splitline[1:end]
                if j%2==0:
                    t=int(val)
                else:
                    vt=int(val)
                    velf.append([LINE,xline3d,binx,biny,inline3d,t,vt]) 
['1000', '5000', '599413.78', '7382223.37']
['1000', '5309', '595633.30', '7375486.63']
['1448', '5000', '609180.96', '7376742.28']
['1448', '5309', '605400.48', '7370005.55']
#Convert the python "list" type to Pandas dataframe
df=pd.DataFrame(velf)
#Set the names of the columns
df.columns=['LINE','xline3d','binx','biny','inline3d','t','vt']

df
LINE xline3d binx biny inline3d t vt
0 1000 5080 598435.0 7380479.0 1000 0 3200
1 1000 5080 598435.0 7380479.0 1000 295 3300
2 1000 5080 598435.0 7380479.0 1000 598 4137
3 1000 5080 598435.0 7380479.0 1000 738 4537
4 1000 5080 598435.0 7380479.0 1000 1152 4500
... ... ... ... ... ... ... ...
3082 1440 5280 605580.0 7370735.0 1440 2216 5259
3083 1440 5280 605580.0 7370735.0 1440 2861 5791
3084 1440 5280 605580.0 7370735.0 1440 3526 6294
3085 1440 5280 605580.0 7370735.0 1440 4697 7077
3086 1440 5280 605580.0 7370735.0 1440 5988 7748

3087 rows × 7 columns

#Plot the target area
plt.scatter(df.binx,df.biny,c=df.xline3d)
plt.colorbar()
plt.show()

#Plot it in 3d just because we can
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(df.binx,df.biny,df.inline3d,c=df.xline3d)
ax.view_init(30, 70)
plt.show()

#Now, make some plots...
#One way to do this, is to
#make a 'group' for each unique seismic line
groups=df.groupby(['LINE','xline3d','inline3d'])
#Make plots by certain groupings

#Add a value to spread out the data nicely
i=0
for name,grp in groups:
    
    if name[2]==1280:
        print(name)
        plt.plot(grp.vt+i,-grp.t)
        i+=500
(1280, 5040, 1280)
(1280, 5060, 1280)
(1280, 5080, 1280)
(1280, 5100, 1280)
(1280, 5120, 1280)
(1280, 5140, 1280)
(1280, 5160, 1280)
(1280, 5180, 1280)
(1280, 5200, 1280)
(1280, 5220, 1280)
(1280, 5240, 1280)
(1280, 5260, 1280)
(1280, 5280, 1280)

from scipy.interpolate import interp1d
import numpy as np
#Normal plots
%matplotlib inline

#Fancy interactive plots
#%matplotlib notebook
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(projection='3d')
for name,grp in groups:

    ##Plot all the data
    ax.plot(grp.binx+grp.vt,grp.biny,grp.t,'b-')
    
    ##Plot all the data with colors
#     colors=cm.seismic(grp.vt/grp.vt.max())
#     ax.scatter(grp.binx+grp.vt,grp.biny,grp.t,c=grp.vt/grp.vt.max(),cmap=cm.seismic)
    
    #Interpolate the data and plot with colors
#     x = grp.t
#     y = grp.vt
#     f = interp1d(x, y, kind='linear')

#     num=50
#     xnew = np.linspace(0,max(x),num)
#     ynew = f(xnew)
#     binx = np.linspace(min(grp.binx),max(grp.binx),num)
#     biny = np.linspace(min(grp.biny),max(grp.biny),num)
#     colours = cm.seismic(ynew/grp.vt.max())

#     ax.scatter(binx+xnew,biny,xnew,c=colours)
    
plt.show()

Bonus: Convert Text to SegY format

A work in progress… but the strategy is simply to match up the correct SEG-Y header values with those in the text file, grouping everything by each x-y (i-j, lat-lon) surface point; the z-axis then becomes the trace (down to a depth or a two-way time, etc.).

from obspy.core.stream import Stream, Trace
from obspy.core.utcdatetime import UTCDateTime
from obspy.io.segy.segy import SEGYTraceHeader
from obspy.core.util.attribdict import AttribDict
from obspy.io.segy.segy import SEGYBinaryFileHeader
import sys

# Group all the text traces by their i-j coordinates
groups=df.groupby(['LINE','xline3d'])
print(len(groups))

#Make a stream object (flush it out to begin with, because we have used this variable name for demos)
stream_out = None 
stream_out = Stream()

#not sure how to group the trace ensembles but can use a counter to keep track of them
ensemble_number = 0
       
for ids,df_trace in groups:
    #ids are the LINE, xline3d coordinate locations
    #df_trace is the subset of the full dataframe for just that i-j location

    #For each LINE-xline3d location, a trace is the vt (velocity) value at each time sample, i.e.
    data = df_trace.vt.values

    # Enforce correct byte number and set to the Trace object
    data = np.require(data, dtype=np.float32)
    trace = Trace(data=data)

    # Set all the segy header information
    # Attributes in trace.stats will overwrite everything in trace.stats.segy.trace_header
    trace.stats.delta = 0.01
    trace.stats.starttime = UTCDateTime(1970,1,1,0,0,0)

    # If you want to set some additional attributes in the trace header,
    # add one and only set the attributes you want to be set. Otherwise the
    # header will be created for you with default values.
    if not hasattr(trace.stats, 'segy.trace_header'):
        trace.stats.segy = {}

    trace.stats.segy.trace_header = SEGYTraceHeader()

#         trace.stats.segy.trace_header.trace_sequence_number_within_line = index + 1
#         trace.stats.segy.trace_header.receiver_group_elevation = 0
    trace.stats.segy.trace_header.source_coordinate_x = int(df_trace.binx.values[0])
    trace.stats.segy.trace_header.source_coordinate_y = int(df_trace.biny.values[0])
    trace.stats.segy.trace_header.ensemble_number = ensemble_number #Not sure how this is actually determined
    trace.stats.segy.trace_header.lag_time_A = 2400
    trace.stats.segy.trace_header.lag_time_B = 3000
    trace.stats.segy.trace_header.number_of_samples_in_this_trace = len(data)

    ensemble_number +=1

    # Add trace to stream
    stream_out.append(trace)

# A SEGY file has file wide headers. This can be attached to the stream
# object.  If these are not set, they will be autocreated with default
# values.
stream_out.stats = AttribDict()
stream_out.stats.textual_file_header = 'Textual Header!'
stream_out.stats.binary_file_header = SEGYBinaryFileHeader()
stream_out.stats.binary_file_header.trace_sorting_code = 5
# stream.stats.binary_file_header.number_of_data_traces_per_ensemble=1

print("Stream object before writing...")
print(stream_out)

stream_out.write("TEST.sgy", format="SEGY", data_encoding=1, byteorder=sys.byteorder)

print("Stream object after writing. Will have some segy attributes...")
print(stream_out)
217
Stream object before writing...
217 Trace(s) in Stream:

... | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.060000Z | 100.0 Hz, 7 samples
...
(215 other traces)
...
... | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.100000Z | 100.0 Hz, 11 samples

[Use "print(Stream.__str__(extended=True))" to print all Traces]
Stream object after writing. Will have some segy attributes...
217 Trace(s) in Stream:

Seq. No. in line:    0 | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.060000Z | 100.0 Hz, 7 samples
...
(215 other traces)
...
Seq. No. in line:    0 | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.100000Z | 100.0 Hz, 11 samples

[Use "print(Stream.__str__(extended=True))" to print all Traces]

API (application programming interface) calls

For example: https://github.com/geological-survey-of-queensland/open-data-api

This is a general API, and can be interfaced with many types of software. Python can do it too! You can write an entire application around APIs provided by developers.

import requests
import json

# set the API 'endpoint'
api = 'https://geoscience.data.qld.gov.au/api/action/'

# construct our query using the rules set out in the documentation
query = api + 'package_search?q=quamby'

# make the get request and store it in the response object
response = requests.get(query)

# view the payload as JSON
json_response = response.json()
#What kind of data does the server return us?
type(json_response)
#json_response
dict
#Dig into the returned data and just pull out certain items we want
jr=json_response['result']['results'][1]['GeoJSONextent']
#Clean up the returned values
mysplit=jr.split(':')[-1].split('}')[0]
#And convert them into something more reasonable.
import ast
xy=ast.literal_eval(mysplit)[0]
#Now you have the data you need in a useful format you can do whatever you like!
xy
[[[139.901170853999, -20.4651656329993],
  [139.884504220999, -20.4651658769991],
  [139.867837588999, -20.4651661309991],
  [139.851170957999, -20.4651663759992],
  [139.834504324999, -20.4651666519991],
  [139.834504414999, -20.4484997749993],
  [139.834504493999, -20.4318328969991],
  [139.834504583999, -20.4151660189993],
  [139.834504673999, -20.3984991289993],
  [139.834504817999, -20.3818322399992],
  [139.834504974999, -20.3651653509993],
  [139.834505130999, -20.3484984619992],
  [139.834505276999, -20.3318315619991],
  [139.817838621999, -20.3318318499993],
  [139.801171968999, -20.3318321489993],
  [139.784505313999, -20.3318324379992],
  [139.784505458999, -20.3151655479993],
  [139.784505604999, -20.2984986709991],
  [139.801172269999, -20.2984983719992],
  [139.817838923999, -20.2984980729993],
  [139.834505589999, -20.2984977849992],
  [139.851172242999, -20.2984974849992],
  [139.867838875999, -20.2984972409993],
  [139.884505496999, -20.2984970089993],
  [139.901172127999, -20.2984967649993],
  [139.901171993999, -20.3151636539992],
  [139.901171860999, -20.3318305429993],
  [139.901171725999, -20.3484974319992],
  [139.901171591999, -20.3651643209993],
  [139.901171445999, -20.3818312099993],
  [139.901171311999, -20.3984980879992],
  [139.901171199999, -20.4151649779992],
  [139.901171087999, -20.4318318659992],
  [139.901170965999, -20.4484987439991],
  [139.901170853999, -20.4651656329993]]]
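
Since xy is a list containing a single ring of (longitude, latitude) pairs, you could, for example, plot the tenement outline directly. A minimal sketch using matplotlib:

#Plot the polygon extent returned by the API
import matplotlib.pyplot as plt
lons = [point[0] for point in xy[0]]
lats = [point[1] for point in xy[0]]
plt.plot(lons, lats)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()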

Key points

  • Obspy can be used to work with SEG-Y seismic data
  • Lasio can be used to work with well log data
  • Pyshp can be used to work with Shapefiles
  • Pandas dataframes are the best format for working with tabular data of mixed numeric/character types
  • Numpy arrays are faster when working with purely numeric data
  • Make your own “data-reader” with native Python
  • API calls can be made to build your own workflows and applications with Python

All materials copyright Sydney Informatics Hub, University of Sydney