Language Models for Geoscience Applications
Course Description
This course will explore the potential of generative AI (Gen-AI) for geoscience. By examining the key concepts behind large language models and their real-world applications, participants will gain insight into how these technologies are being used to solve complex geoscience challenges. The course material is aimed at geoscientists who are looking to use AI applications and want a better understanding of how they work, how to get the best out of them, and how to critically evaluate their performance.
The course will begin by covering the basic concepts needed to understand generative AI and large language models (LLMs), including data embedding, benchmarking, and the mechanics of transformer architectures. The second section of the course will take a deeper look at advanced techniques and methodologies, including retrieval-augmented generation (RAG), agents, and improving model results through prompting and grounding.
Finally, participants will apply the course content in critical discussions of the ethical use of generative AI, cybersecurity concerns, and the regulatory frameworks necessary to govern AI deployment in geoscience.
Two group discussion sessions during the day include problem-solving tasks that apply the course material to real-world problems.
In the expanded two-day course, we will spend more time on the fundamental concepts behind LLMs, delve deeper into several of the topics, and add further detail on data types and data processing.
Course Objectives:
-Understand the main use cases of generative AI for geoscience data.
-Cover the main concepts of how language models work and common architectures for building chatbots and agents.
-Critically evaluate model outputs and learn techniques for improving results.
-Highlight considerations for safe and ethical use of generative AI.
Course Outline:
Day 1
Introduction to Generative AI
-Definition and Overview of Generative AI;
-Overview of Language Models and the Evolution of LLMs (Large Language Models);
-Key Use Cases and Applications in Geoscience.
Fundamental concepts in LLMs
-Understanding Transformers and Attention;
-Benchmarks and training data;
-Fine-tuning and foundation models;
-Data types, data sources, and AI-ready data.
Innovative Architectures (Part 1)
-Embeddings, Vector Stores and similarity measures;
-Retrieval-Augmented Generation (RAG): Architecture, Benefits, and Limitations;
-Agents and Agentic Structures: An Overview of Autonomous Systems;
-Reasoning Frameworks: Chain of Thought, ReAct, and Tree of Thought Approaches.
Group exercises
Day 2 (two-day course only)
Innovative Architectures (Part 2)
-Function calling and structured outputs;
-Coding agents and vibe coding.
Technical Parameters and Challenges
-Exploring Key Parameters in LLMs;
-Understanding Hallucinations in AI and the Importance of Grounding.
Advanced Techniques for Model Optimization
-Neuroscience and LLM Functioning: An Introduction to In-Context Learning (ICL);
-Prompt Engineering: Strategies for Effective LLM Interactions;
-Fine-Tuning Models: Foundation Models and Their Applications.
Ethical Considerations and Cybersecurity
-Cybersecurity Implications of Generative AI in Geoscience;
-Ethical Challenges and Considerations in AI Deployment.
Group exercises
Future Directions and Trends
-Emerging Trends in Generative AI and Their Potential Impact on Geoscience;
-Speculative Applications and Research Directions for Future Generative AI in Geoscience.
Prerequisites:
Participants should have a basic understanding of artificial intelligence and some experience using AI tools, but do not need experience building AI technologies. The course includes some simple code examples in Python, which are illustrative only. Participants do not need prior experience with Python.
About the Instructor:
Thomas has a research background in geology and geochemistry, with a Ph.D. from Freie Universität Berlin and two years of postdoctoral experience at NTNU, Norway. He has experience working in data science and analytics, covering everything from data integration and dashboarding to ML and AI solutions. Thomas has worked in a range of industries, including oil & gas, mining, finance, and power & utilities.