Data Science and Machine Learning in the Atmosphere, Ocean, and Climate Sciences
Updated: Mar 14, 2020
Welcome to my journey into the use of Data Science and Machine Learning in the Atmosphere, Ocean, and Climate Sciences. The goal of this blog is to explore how data science and machine learning are being used in Atmosphere, Ocean, and Climate Science and to facilitate entry into these topics for those who are interested. As an Atmosphere, Ocean, and Climate Scientist, I will approach these topics from that perspective.
What types of data do we use in Atmosphere, Ocean, and Climate Science?
Weather and climate datasets are the epitome of Big Data. We use datasets such as model simulations, satellite observations, weather forecasts, model climate projections, analysis/re-analysis, station measurements, soundings, radar, buoys, ships, and the list goes on and on.
Weather and climate model datasets are growing rapidly due to increasing numbers of models, simulations, and forecasts, higher resolution, and the inclusion of more Earth system components. Other datasets, such as satellite sensor data, are also growing due to increased resolution. Tools for working with Big Data and the skills to use them effectively are now a necessity in Atmosphere, Ocean, and Climate Science.
What is data science and what does it have to do with Atmosphere, Ocean, and Climate Science?
Although "data science" has only recently become a commonly used term, I feel like I have been a data scientist for as long as I can remember. To me, data science is a broad term that refers to the skill set used to ask and answer questions from data and to find relevant, useful information in datasets. In short, data science is finding information in data, something atmosphere, ocean, and climate scientists are well trained for.
Data science involves statistics, computer programming, and domain or disciplinary knowledge of the datasets one is working with. Computer programming skills are needed to read, quality control, and work efficiently and effectively with the data. Statistical skills and domain knowledge are needed to ask and answer scientific questions of the data. High performance computing may also be necessary if the datasets are very large and weather and climate model simulations or forecasts are involved. Software development skills may also be important for building tools that others can use to work with the data.
The key components of data science in Atmosphere, Ocean, and Climate Science that are of interest to me and will be discussed in this blog are:
Python and corresponding packages and tools for handling Big Data
Developing atmosphere, ocean, and climate relevant open source software
Applying machine learning and information theory to understand sources of predictability in the atmosphere and ocean
Speaking of machine learning...
What is machine learning and what does it have to do with Atmosphere, Ocean, and Climate Science?
Machine learning is a set of computational and statistical tools for developing models to make predictions based on data. In more common atmospheric science terms, these tools are algorithms for developing empirical models. Many of the tools we use in atmosphere, ocean, and climate science are well-known machine learning algorithms, including:
Principal Component Analysis (often called EOFs by atmospheric scientists)
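To make the PCA/EOF connection concrete, here is a minimal sketch of an EOF analysis computed with a singular value decomposition in NumPy. Everything in it is an illustrative assumption rather than a method from this blog: the function name `eof_analysis`, the synthetic (time, space) field, and the choice to skip area weighting, which a real analysis on a latitude-longitude grid would normally include.

```python
import numpy as np


def eof_analysis(field, n_modes=2):
    """Hypothetical helper: EOFs of a 2-D array with shape (time, space)."""
    # Remove the time mean at each grid point to get anomalies.
    anomalies = field - field.mean(axis=0)
    # SVD of the anomaly matrix: rows of vt are the spatial patterns (EOFs).
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]
    # Principal component time series, one column per mode.
    pcs = u[:, :n_modes] * s[:n_modes]
    # Fraction of total variance explained by each retained mode.
    variance_fraction = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, variance_fraction


# Synthetic example data (an assumption for illustration only):
# one sine-shaped spatial pattern with a random amplitude, plus weak noise.
rng = np.random.default_rng(0)
n_time, n_space = 200, 50
pattern = np.sin(np.linspace(0, np.pi, n_space))
amplitude = rng.standard_normal(n_time)
field = np.outer(amplitude, pattern) + 0.1 * rng.standard_normal((n_time, n_space))

eofs, pcs, var_frac = eof_analysis(field)
print("Variance fraction of leading modes:", var_frac)
```

With this synthetic field, the leading EOF should recover the sine pattern (up to sign) and explain most of the variance, which is the same calculation a machine learning library would label as PCA.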
Other machine learning methods are being adopted in the atmosphere, ocean, and climate science communities, including neural networks, decision trees, random forests, and many others.
I view the machine learning algorithms that I have not previously been using in my data analysis as additional tools to add to my climate data analysis toolbox.
Join me on the journey to learn more about applying data science and machine learning in atmosphere, ocean, and climate science.