Data Science for Geoscience
This course provides an overview of the areas of data science most relevant to geoscientific challenges and questions pertaining to the environment, earth resources and hazards. The focus lies on methods that treat common characteristics of geoscientific data: multivariate, multi-scale, compositional, geospatial and space-time. In addition, the course treats statistical methods that quantify the “human dimension”: the impact of geoscientific processes on humans (e.g. hazards, contamination) and the impact of humans on the environment (e.g. contamination, land use). The course focuses on developing skills that are not covered in traditional statistics and machine learning courses.
The material emphasizes exposure and application over in-depth methodological or theoretical development. Data science areas covered are: extreme value statistics, multivariate analysis, factor analysis, compositional data analysis, spatial information aggregation, spatial analysis and estimation, geostatistics and spatial uncertainty, treating data at different scales of observation, and spatio-temporal modeling. The focus lies on developing practical skills on real data sets, executing software and interpreting results.
The objectives of this course are to:
- Discover fields of data science typically not covered in traditional courses
- Identify a combination of data science methods to address a specific geoscientific question or challenge, whether related to the environment, earth resources or hazards, and its impact on humans
- Use statistical software on real datasets and communicate the results to a non-expert audience
Part I: Extremes
- Statistical analysis of skew data
- Extreme value statistics
- Applications: size and magnitude distributions (volcanoes, diamonds, earthquakes), extreme flooding, weather, climate.
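As an illustration of the kind of calculation Part I deals with, the sketch below fits a Gumbel distribution (the special case of the generalized extreme value family with zero shape parameter) to annual maxima by the method of moments and derives return levels. It is a minimal sketch, not the course's own material: the function names and the discharge values are hypothetical, and a full analysis would normally estimate the shape parameter as well, typically by maximum likelihood.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(block_maxima):
    """Method-of-moments Gumbel fit: returns (location, scale)."""
    mean = statistics.fmean(block_maxima)
    std = statistics.stdev(block_maxima)
    # Gumbel variance is (pi^2 / 6) * scale^2, mean is location + gamma * scale.
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period):
    """Value exceeded on average once every `return_period` blocks (> 1)."""
    non_exceedance = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maximum river discharges (m^3/s), for illustration only.
annual_maxima = [312.0, 455.0, 289.0, 610.0, 398.0,
                 521.0, 347.0, 702.0, 433.0, 376.0]

location, scale = fit_gumbel_moments(annual_maxima)
level_10 = return_level(location, scale, 10)
level_100 = return_level(location, scale, 100)
print(f"10-year level: {level_10:.1f}, 100-year level: {level_100:.1f}")
```

With only ten maxima the 100-year level is a strong extrapolation, which is the sort of uncertainty extreme value statistics is meant to make explicit.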
Part II: Compositions
- Compositional data analysis
- Applications: geochemical data in Earth Resources
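A central idea in compositional data analysis is that parts of a whole carry only relative information, so they are usually analysed after a log-ratio transform. The sketch below shows closure and the centered log-ratio (clr) transform on a single sample. The function names and the concentrations are hypothetical, and zero or missing parts, which real geochemical data often contain, are assumed to have been handled beforehand.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical four-part composition in ppm, for illustration only.
sample_ppm = [520000.0, 140000.0, 60000.0, 280000.0]

clr_raw = clr(sample_ppm)
clr_closed = clr(closure(sample_ppm))
print([round(v, 3) for v in clr_raw])
```

The clr coordinates sum to zero and do not change when the sample is rescaled, so the same result is obtained whether the data are expressed in ppm, percent or fractions.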
Part III: Causality
- Multivariate analysis of compositional data
- Applications: pollution, water quality, anomaly detection, Earth Resources prospecting.
Part IV: Geospatial analysis
- Bayesian aggregation of geospatial information
- Weights of Evidence method
- Logistic regression
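The Weights of Evidence method listed above can be sketched for a single binary evidence map as follows. The weights are log-likelihood ratios computed from counts of unit cells, and they update the prior log-odds of an occurrence. This is a minimal sketch under stated assumptions: the function names and the cell counts are hypothetical, and combining several maps additionally assumes conditional independence of the evidence layers.

```python
import math


def weights_of_evidence(n_total, n_deposit, n_pattern, n_both):
    """Return (W+, W-) for a binary pattern from unit-cell counts.

    n_total   : all unit cells in the study area
    n_deposit : cells containing a known occurrence
    n_pattern : cells where the evidence pattern is present
    n_both    : cells with both an occurrence and the pattern
    """
    p_pattern_given_dep = n_both / n_deposit
    p_pattern_given_no_dep = (n_pattern - n_both) / (n_total - n_deposit)
    w_plus = math.log(p_pattern_given_dep / p_pattern_given_no_dep)
    w_minus = math.log((1.0 - p_pattern_given_dep)
                       / (1.0 - p_pattern_given_no_dep))
    return w_plus, w_minus


def posterior_probability(prior_probability, weight):
    """Add a weight to the prior log-odds and convert back to probability."""
    prior_logit = math.log(prior_probability / (1.0 - prior_probability))
    return 1.0 / (1.0 + math.exp(-(prior_logit + weight)))


# Hypothetical counts, for illustration only.
n_total, n_deposit, n_pattern, n_both = 10000, 100, 2000, 60

w_plus, w_minus = weights_of_evidence(n_total, n_deposit, n_pattern, n_both)
prior = n_deposit / n_total
p_present = posterior_probability(prior, w_plus)
p_absent = posterior_probability(prior, w_minus)
print(f"W+ = {w_plus:.3f}, W- = {w_minus:.3f}, contrast = {w_plus - w_minus:.3f}")
```

For a single layer the posterior simply reproduces the observed frequency of occurrences inside and outside the pattern; the log-odds form becomes useful when several layers are added together.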
Part V: Spatial uncertainty
- Spatial analysis, geostatistics & spatial uncertainty
- Applications: interpolating remote sensing data, pollution data, groundwater/reservoir modeling
- Variogram Analysis
- Multiple-point geostatistics
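Variogram analysis starts from an empirical semivariogram, which measures how dissimilar observations become as their separation increases. The sketch below computes it for regularly spaced samples along a one-dimensional transect. It is a simplified illustration: the function name and the data are hypothetical, and real data sets are usually irregularly spaced in two or three dimensions, which requires lag binning and direction tolerances.

```python
def empirical_semivariogram(values, max_lag):
    """Semivariance per integer lag for regularly spaced 1-D samples.

    gamma(h) = sum((z[i] - z[i + h])^2) / (2 * N(h)),
    where N(h) is the number of pairs separated by h sample spacings.
    """
    gamma = {}
    for lag in range(1, max_lag + 1):
        squared_diffs = [(values[i] - values[i + lag]) ** 2
                         for i in range(len(values) - lag)]
        if squared_diffs:  # skip lags with no pairs
            gamma[lag] = sum(squared_diffs) / (2 * len(squared_diffs))
    return gamma


# Hypothetical measurements along a transect, for illustration only.
transect = [2.1, 2.4, 2.2, 3.0, 3.4, 3.1, 2.7, 2.0, 1.8, 2.3]

for lag, semivariance in empirical_semivariogram(transect, 4).items():
    print(lag, round(semivariance, 3))
```

A model (for example spherical or exponential) would then be fitted to these points before kriging or simulation.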
The course is intended for geoscientists and geo-engineers who wish to expand their knowledge of data science methods specifically applicable to earth science data sets: skew, compositional/multivariate and spatio-temporal data.
- Coles, S. (2001). An introduction to statistical modeling of extreme values. London: Springer.
- Pawlowsky-Glahn, V., & Buccianti, A. (2011). Compositional data analysis: Theory and applications. John Wiley & Sons.
- Härdle, W., & Simar, L. (2003). Applied multivariate statistical analysis. Berlin: Springer.
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.
About the Instructor
Jef Caers received both an MSc (’93) in mining engineering / geophysics and a PhD (’97) in engineering from the Katholieke Universiteit Leuven, Belgium. He is currently Professor of Geological Sciences (since 2015) and was previously Professor of Energy Resources Engineering at Stanford University, California, USA. He is also director of the Stanford Center for Earth Resources Forecasting, an industrial affiliates program in decision making under uncertainty with ~20 partners from the Earth Resources Industry. Dr. Caers’ research interests lie in quantifying uncertainty and risk in the exploration and exploitation of Earth Resources. He has published in a diverse range of journals covering Mathematics, Statistics, Geological Sciences, Geophysics, Engineering and Computer Science. He was awarded the Vistelius award by the IAMG in 2001 and was Editor-in-Chief of Computers and Geosciences (2010-2015). Dr. Caers has received several best paper awards and written four books: “Petroleum Geostatistics” (SPE, 2005), “Modeling Uncertainty in the Earth Sciences” (Wiley-Blackwell, 2011), “Multiple-point Geostatistics: stochastic modeling with training images” (Wiley-Blackwell, 2015) and “Quantifying Uncertainty in Subsurface Systems” (Wiley-Blackwell, 2018). Dr. Caers was awarded the 2014 Krumbein Medal of the IAMG for his career achievement.