How does artificial intelligence advance Earth system modelling?

How does the Earth system function? The Earth system is incredibly complex, and understanding how it works is important for the survival of our species. Earth system science has advanced rapidly in recent years, and so has artificial intelligence. To learn how the latter helps the former, we spoke with Markus Reichstein, who heads the Department of Biogeochemical Integration at the Max Planck Institute for Biogeochemistry. He recently co-authored a paper describing how machine learning is advancing Earth system science, and we asked for his insight on how artificial intelligence is helping Earth system modelling.

How is machine learning uniquely placed to help us better understand the Earth system?

The Earth system is a very complex system in which physical, chemical, and biological processes interact at spatial scales over 17 orders of magnitude. Modelling such a system with a classical system modelling approach involves many uncertainties, so observational data play a crucial role. With the rapid development of sensor networks, including airborne and satellite remote sensing, we are now able to exploit a wealth of observations with data-driven models.

This is where machine learning comes into play. It can, for instance, automatically detect climate extremes or statistically predict dynamic variables such as precipitation or vegetation productivity. Deep learning approaches in particular can deal very naturally with spatiotemporal data as we find them for Earth. These data-driven models do not replace the classical system models, but they can be used to improve them, for instance through model evaluation or model parameterization.

What are the main obstacles to leveraging machine learning in the realm of Earth system science?

Machine learning has already had a significant impact on Earth system science. Many operational products, such as the classification of land cover from satellite imagery, are based on machine learning. One of the most important challenges is the application of machine learning to dynamic processes in the atmosphere, the biosphere, and the oceans. Describing these processes requires physical and chemical consistency (e.g. preservation of mass balances), which machine learning methods do not guarantee. In addition, there is a need for models that an expert can interpret (i.e. explainable models). Last but not least, uncertainty needs to be quantified. Thus, the current efforts in artificial intelligence research to develop physically consistent, explainable, and probabilistic models will strongly boost applications in the geosciences.

A further aspect is that no data-driven model can be better than the data itself – therefore the harmonization and quality control of very heterogeneous data streams is an important challenge, especially when combining geoscientific and social science data.
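Editor's illustration: the point about physical consistency in the answer above can be made concrete with a small sketch. It assumes a hypothetical data-driven model that predicts evaporation, runoff, and storage change from precipitation, and it adds a penalty to the training loss whenever the predicted water budget does not close. The function names, the variables, and the `weight` parameter are assumptions made for illustration; they are not taken from Reichstein's paper, and a soft penalty like this encourages consistency without guaranteeing it.

```python
def mass_balance_residual(precip, evap, runoff, storage_change):
    """Water left unaccounted for; zero when the budget closes."""
    return precip - (evap + runoff + storage_change)


def penalized_loss(predicted, observed, precip, weight=1.0):
    """Mean squared error plus a weighted penalty on budget violations.

    predicted, observed: lists of (evap, runoff, storage_change) tuples.
    precip: list of precipitation inputs, one per tuple.
    weight: hypothetical trade-off between data fit and physics.
    """
    n = len(predicted)
    data_term = 0.0
    physics_term = 0.0
    for pred, obs, p in zip(predicted, observed, precip):
        # Data fit: average squared error over the three predicted fluxes.
        data_term += sum((a - b) ** 2 for a, b in zip(pred, obs)) / len(pred)
        # Physics: squared amount by which the water budget fails to close.
        physics_term += mass_balance_residual(p, *pred) ** 2
    return (data_term + weight * physics_term) / n
```

A prediction that matches the observations but leaves one unit of water unaccounted for is still penalized, which is the behaviour a purely data-driven loss would lack.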

How big is the community interested in this fusion of machine learning and Earth system science, and how did you end up joining it?

Machine learning plays a role in almost all realms of Earth science today. It is also increasingly prominent at the annual geoscientific meetings of the American Geophysical Union and the European Geosciences Union, in which 15,000 to more than 20,000 scientists participate.

I myself began to work with classical system models as early as the 1990s and recognized their limitations in comparison to observations. One approach to improving these classical system models is to adapt their parameters or state variables in so-called data assimilation or inverse modelling approaches, but one is still bound to the overall model formulation, which is very uncertain for many processes. Machine learning then fascinated me because it allows the construction of models from data with few or no theoretical assumptions, and these models can then be used to question assumptions in the system models. Ultimately, it is the interaction of data-driven and theory-driven models that advances science, and not just for the Earth system.
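Editor's illustration: the inverse modelling idea mentioned above, adapting the parameters of a fixed model formulation until it matches observations, can be sketched in a few lines. The temperature-response model, the parameter name `q10`, the reference temperature, and the candidate grid below are all assumptions chosen for illustration, not the models used in Reichstein's work. The sketch also shows the limitation he describes: calibration can only pick the best parameter within the assumed equation, not question the equation itself.

```python
def process_model(temp, q10, base_rate=1.0, ref_temp=15.0):
    """Toy theory-driven model: a flux that rises exponentially with temperature."""
    return base_rate * q10 ** ((temp - ref_temp) / 10.0)


def misfit(q10, temps, observed):
    """Sum of squared differences between model output and observations."""
    return sum((process_model(t, q10) - y) ** 2 for t, y in zip(temps, observed))


def calibrate(temps, observed, candidates):
    """Return the candidate parameter value with the smallest misfit."""
    return min(candidates, key=lambda q10: misfit(q10, temps, observed))


# Synthetic "observations" generated from the model itself with q10 = 2.0.
temps = [5.0, 10.0, 15.0, 20.0, 25.0]
observed = [process_model(t, 2.0) for t in temps]

# Scan q10 values from 1.0 to 3.0 in steps of 0.1 and keep the best fit.
candidates = [i / 10 for i in range(10, 31)]
best_q10 = calibrate(temps, observed, candidates)
```

Real data assimilation systems use far more elaborate optimizers and also update state variables, but the structure is the same: the model form is fixed and only its parameters move.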