Seminar

Data Generation, Heuristics, and Machine Learning for Materials Discovery and Simulation

AI for Science seminar with Janine George, BAM Berlin and University of Jena.

Overview

Zoom password: ai4science

The on-site event will be followed by fika in the Analysen coffee area from 16:00 to 16:30.

Abstract:

Machine learning (ML) offers powerful new strategies for accelerating the discovery and design of functional materials. In our work, we develop ML models and software frameworks for large-scale screening and advanced materials simulations, starting from robust high‑throughput quantum‑chemical workflows, such as those implemented in atomate2. [1,2]

These automated workflows enable the creation of large, high‑quality materials databases that form the foundation for data science and machine learning. In addition to experimentally known crystal structures, generative models are increasingly used to extend materials databases, and the structures they propose also need to be evaluated. [3]

To build predictive, scientifically grounded ML models, we draw on chemical bonding concepts, incorporating quantum‑chemical bond strengths and related descriptors as physically meaningful features to predict vibrational properties and heat transport. [4,5] Beyond property prediction, we address the challenge of determining which hypothetical materials are synthesizable. To this end, we introduced co-training into a positive–unlabeled (PU) learning framework, enabling ML‑based classification even in the absence of true negative data, an essential step for screening synthesizable compounds. [6,7]

To advance atomistic simulations of complex materials, we further developed automated training pipelines for ML interatomic potentials that support both general-purpose and system‑specific potential development, as implemented in our software autoplex. [8] This automated approach has already facilitated detailed investigations of challenging systems, including the computational exploration of amorphous arsenic. [9]

Together, these developments provide a toolbox spanning workflow automation, automated ML potential training, and ML models for materials properties and synthesis, enabling scalable, data‑driven discovery and understanding of advanced materials.

Janine George

About the speaker:

Janine George is a Professor of Materials Informatics at Friedrich Schiller University Jena and, since 2025, the Acting Head of the Division Digital Materials Chemistry at BAM Berlin, where she leads research at the intersection of quantum chemistry, machine learning, and condensed matter physics.

Her career includes a doctorate from RWTH Aachen (Richard Dronskowski, Germany), postdoctoral work at Université catholique de Louvain (Geoffroy Hautier, Gian-Marco Rignanese, Belgium), and an ERC Starting Grant (2024). Her group has made major contributions to community software tools such as atomate2 and pymatgen and (co-)develops its own software, including LobsterPy and autoplex.

Structured learning

This theme focuses on how to use structure in data to build machine learning (ML) and artificial intelligence (AI) systems that are safer, more trustworthy, and generalize better. Structure includes the relationships among data points in time and space, and how predictions change when the data is transformed in specific ways, for example rotated or scaled. These topics are abstract and general, but they have a direct impact on the use of AI and ML in the sciences and in applications such as drug and materials design or medical imaging.

Rocio Mercado
  • Assistant Professor, Data Science and AI, Computer Science and Engineering
Simon Olsson
  • Associate Professor, Data Science and AI, Computer Science and Engineering