Considering the particulars: how AI can help advance chemical system research

By Kate Barnes

At the most basic level, everything in the world is chemistry: a combination of particles that form atoms, which in turn make up molecules.

Understanding these combinations and their outcomes plays a critical role in moving scientific discovery forward across industries, from developing new drugs to building energy-efficient technologies such as LEDs and solar cell materials.

To better grasp what these chemical and molecular combinations can truly do, however, extremely accurate and precise modeling is necessary, as are the data and infrastructure that make that modeling possible.

The development of these critical technologies is what drives Paul Zimmerman.

Zimmerman, a professor of chemistry at the University of Michigan’s College of Literature, Science, and the Arts, is a problem solver. He works to better equip researchers like himself who are seeking answers to some of the world’s most challenging questions.

Throughout his career, Zimmerman said he has noticed a common theme among researchers. Those working on applied problem solving, like designing materials used in transportation, often operate separately from those developing research tools, like computational or machine learning systems. The common assumption is that these types of work should be done in parallel, not together.

As a computational chemist, Zimmerman is working to change that narrative.

“I sat between both approaches and realized it’s very easy to get stuck in either one,” Zimmerman said. “So I wanted to find a way to bridge that gap through a computational lens. The tools to do that weren’t available, so I began building my own.”

Enter the Zimmerman Lab, a group of researchers at U-M who specialize in “developing predictive simulation techniques and their use to drive forward physical, synthetic and materials chemistry research efforts.”

An electron moves through a helix, twisting as it goes and generating a magnetic field. Results like this from a DOE-sponsored SciDAC project may help scientists build new devices that transmit quantum information.

AI models can be powerful tools for discovering new solutions, but only when they are trained on large amounts of data. When that data doesn’t exist, one option is to create synthetic data using high-performance computing.

The challenge is that this synthetic data has to be extremely accurate—otherwise it can mislead the AI. Unfortunately, today’s chemical modeling methods are so computationally demanding that it’s not practical to generate the huge volumes of accurate data needed.

To address this challenge, Zimmerman and his team are creating and employing highly accurate chemical system data sets that can then be used to train artificial intelligence models and machine learning tools. These groundbreaking tools, which are in advanced-stage development, have tremendous potential across industries.

To ensure that the data and trained models produce the most accurate results, Zimmerman partnered with U-M faculty members Vikram Gavini and Ambuj Tewari. Gavini, a professor of mechanical engineering and of materials science and engineering, and Tewari, a professor of statistics and of electrical engineering and computer science, bring mathematical and machine learning expertise to the research.

“No one discipline had figured this out yet,” Zimmerman said. “So we knew a multidisciplinary approach was necessary. Our different perspectives and areas of expertise have been critical in idea generation and thought development throughout the process.”

For a sense of scale, an exceptional human might describe or recognize roughly 1,000 chemical bonds or sets in a lifetime. The AI models Zimmerman and his team are training on these data sets can recognize hundreds of thousands of chemical sets in a few months.

“By training models with these robust data sets, we can calculate highly accurate, on-demand and synthetic points that others can then use to train their model or inform their research,” Zimmerman said. “The development of these tools allows for this vast amount of information to be accessible; the data sets will not sit with only one person, one company. It provides opportunities for more scientists to solve more than one complex problem at a time.”