Blog016 | On-The-Fly MD Explanation (2) On-The-Fly MD and Its Applications | Yu-ichiro MATSUSHITA

TECHNICAL NOTE

On-The-Fly MD Explanation (2) On-The-Fly MD and Its Applications

In Part 2, building on the content of Part 1, we will explain what On-The-Fly MD calculations are and address the challenges of machine learning MD that can be resolved by using On-The-Fly MD.

What is On-The-Fly MD?

Given the challenges of machine learning MD discussed earlier, we consider On-The-Fly MD an excellent way to address them. So what exactly is On-The-Fly MD?

As illustrated in the diagram below, On-The-Fly MD runs an MD calculation with a machine learning potential while estimating the error at every step, typically the error relative to first-principles MD. If the estimated error exceeds a set threshold, the calculation switches to first-principles MD for that step: a DFT calculation is executed and its result is used to update the learning model. If the error stays below the threshold, the atomic positions are updated and the run advances to the next MD step.
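The loop described above can be sketched in a few lines of Python. This is a minimal toy, not the Quloud implementation or the method of any specific code: `reference_force` stands in for a DFT call, a one-dimensional particle stands in for the atomic system, and `TinyGP` is a hypothetical, deliberately small Gaussian-process model whose predictive standard deviation plays the role of the estimated error. The threshold, time step, and kernel length are arbitrary illustrative values.

```python
import numpy as np

def reference_force(x):
    """Stand-in for an expensive DFT force call (toy anharmonic well)."""
    return -x - 0.5 * x**3

class TinyGP:
    """Hypothetical minimal Gaussian-process force model with an error estimate."""

    def __init__(self, length=0.5, noise=1e-6):
        self.length, self.noise = length, noise
        self.X, self.y = [], []
        self.Kinv = None

    def _kernel(self, a, b):
        a = np.asarray(a, dtype=float)[:, None]
        b = np.asarray(b, dtype=float)[None, :]
        return np.exp(-0.5 * ((a - b) / self.length) ** 2)

    def add(self, x, force):
        """Add one reference point and refit (the 'update the model' step)."""
        self.X.append(x)
        self.y.append(force)
        K = self._kernel(self.X, self.X) + self.noise * np.eye(len(self.X))
        self.Kinv = np.linalg.inv(K)

    def predict(self, x):
        """Return (predicted force, estimated error) at position x."""
        if not self.X:
            return 0.0, 1.0  # no data yet: maximal uncertainty
        k = self._kernel([x], self.X)[0]
        mean = k @ self.Kinv @ np.array(self.y)
        var = max(1.0 - k @ self.Kinv @ k, 0.0)
        return float(mean), float(np.sqrt(var))

def on_the_fly_md(n_steps=2000, dt=0.01, threshold=0.05):
    model = TinyGP()
    x, v = 1.0, 0.0          # initial position and velocity (unit mass)
    dft_steps = []           # steps at which the reference was called
    for step in range(n_steps):
        force, err = model.predict(x)
        if err > threshold:
            # Estimated error too large: fall back to the reference
            # calculation and add the result to the training set.
            force = reference_force(x)
            model.add(x, force)
            dft_steps.append(step)
        # Otherwise keep the model force. Either way, advance one MD step.
        v += force * dt
        x += v * dt
    return dft_steps, model

dft_steps, model = on_the_fly_md()
print(f"reference calls: {len(dft_steps)} of 2000 steps")
```

The reference function is called only on steps where the estimated error is high. The first step always qualifies, because the model starts with no data.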

By switching to DFT only when needed, On-The-Fly MD keeps its accuracy close to that of first-principles MD while relying on a machine learning potential for most steps. The time-consuming DFT calculations are kept to a minimum without giving up that accuracy. In practice, On-The-Fly MD runs 100 to 200 times faster than first-principles MD.

Reproduced from Vandermause et al., npj Computational Materials 6, 20 (2020).

The horizontal axis shows the MD step, and the blue dots show the estimated error at each step. The large red circles mark the MD steps at which the estimated error exceeded the threshold.

The figure above shows an actual On-The-Fly MD run: as the steps progress, large errors appear at certain MD steps, and each time one is observed the calculation switches to first-principles MD.

An important point is that during roughly the first 10 steps the errors are large and first-principles MD steps are repeated, because not enough training data has been collected yet. After about 10 steps the errors drop rapidly, indicating that a good machine learning potential, one that mimics first-principles MD, is being built. Beyond the 10-step mark, most steps are machine learning MD and DFT is invoked only rarely.

This figure plots about 220 steps, during which DFT is called approximately 20 times, so the number of DFT calculations is about one-tenth of what first-principles MD would need. Moreover, about 10 of those 20 calls occur in the early stage, so the proportion of DFT calls falls further as the total number of MD steps grows.

In practice, in long On-The-Fly MD runs the number of DFT calls falls to as little as one-hundredth of the total number of steps. Because the DFT calculations dominate the cost, the computation time drops to roughly one-hundredth of that of first-principles MD. This is what an On-The-Fly MD calculation is.
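The arithmetic behind these figures can be checked with a short sketch. The numbers taken from the text are roughly 10 DFT calls during the first 10 steps and roughly 10 more over the remaining 210 steps. Two simplifications are assumed here for illustration only: the later call rate stays constant, and each machine learning step costs a fixed fraction `ml_cost_ratio` of one DFT step.

```python
def dft_fraction(total_steps, warmup_steps=10, later_rate=10 / 210):
    """Fraction of MD steps that call DFT, assuming every warm-up step does
    and later steps call DFT at a constant rate (an illustrative assumption)."""
    warmup = min(warmup_steps, total_steps)
    later = total_steps - warmup
    return (warmup + later_rate * later) / total_steps

def speedup(dft_frac, ml_cost_ratio=0.0):
    """Speedup over first-principles MD when every step pays the ML cost
    (as a fraction of one DFT step) and DFT steps also pay the full DFT cost."""
    return 1.0 / (dft_frac + ml_cost_ratio)

# DFT fraction for the 220-step figure and for longer hypothetical runs.
for n in (220, 2_000, 20_000):
    print(n, round(dft_fraction(n), 3))

# Speedup when one step in a hundred calls DFT and the ML cost is negligible.
print(round(speedup(0.01), 1))
```

Under the constant-rate assumption the fraction levels off near 10/210, about 5%. Reaching one-hundredth in long runs therefore implies that the call rate itself keeps falling as the model accumulates data.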

Further Utilization of On-The-Fly MD

Here we explain how On-The-Fly MD can solve two challenges related to the use of machine learning potential databases. The first challenge is the computational cost of generating a machine learning potential, so let us consider generating one with On-The-Fly MD. Machine learning potentials are typically built with neural networks, which requires several hundred DFT calculations to be performed in advance to create the training data. Many of these several hundred training data points resemble one another, however, so the data set is highly redundant. DFT calculations are spent on near-duplicate data, creating a computational bottleneck.

With On-The-Fly MD, by contrast, only the MD steps with large estimated errors are sent to DFT, and only those results are added to the learning model. Redundant data is avoided, so the number of DFT calculations is greatly reduced and a machine learning potential can be generated quickly.

The second challenge is the speed of MD calculations, which is limited by the excessive number of hyperparameters. In the On-The-Fly MD calculations described so far, first-principles MD results served as the reliable reference data, and error estimation was performed at every step so as to reproduce those results as closely as possible. What happens if this reference is replaced with a machine learning potential such as CHGNet? If a potential such as CHGNet supplies the training data, On-The-Fly MD will try to mimic that potential as closely as possible.

What happens if we also reduce the number of hyperparameters in On-The-Fly MD? On-The-Fly MD does not need versatility, that is, the ability to give reasonable results for any material. It only needs to be accurate for the specific material being studied. It can therefore reach comparable accuracy with fewer hyperparameters than a generic potential such as CHGNet.

In practice, running On-The-Fly MD with a machine learning potential such as CHGNet as the source of training data converts it into a machine learning potential with fewer hyperparameters. Once that smaller potential exists, MD calculations can run close to a hundred times faster. This takes more effort than using a versatile potential such as CHGNet directly, but the benefits afterward are substantial.
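The idea of distilling a general-purpose potential into a smaller, material-specific one can be illustrated with a toy fit. In this sketch, `teacher_force` is a hypothetical stand-in for a large potential (it does not call CHGNet), the student is a degree-5 polynomial with six coefficients, and the training range stands in for the configurations that one specific material actually visits.

```python
import numpy as np

def teacher_force(x):
    """Hypothetical stand-in for a large general-purpose potential."""
    return -np.sin(x) - 0.3 * np.sin(3.0 * x)

def fit_student(x_train, degree=5):
    """Fit a small polynomial model to the teacher's forces by least squares."""
    coeffs = np.polyfit(x_train, teacher_force(x_train), degree)
    return np.poly1d(coeffs)

def max_error(model, lo, hi, n=201):
    """Largest absolute deviation from the teacher on the interval [lo, hi]."""
    x = np.linspace(lo, hi, n)
    return float(np.max(np.abs(model(x) - teacher_force(x))))

# Train only on the narrow range that the material of interest explores.
x_train = np.linspace(-0.5, 0.5, 50)
student = fit_student(x_train)

err_inside = max_error(student, -0.5, 0.5)   # same range as the training data
err_outside = max_error(student, 2.0, 3.0)   # far outside the training data
print(f"max error inside training range:  {err_inside:.2e}")
print(f"max error outside training range: {err_outside:.2e}")
```

The small model tracks the teacher closely inside the range it was trained on and fails far outside it, which is the trade of versatility for cost described in the text.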

For anyone who wants to run long MD calculations in particular, this method is far faster than simply using a generic machine learning potential, and we highly recommend it.

On-The-Fly MD Offered by Quloud

As explained above, On-The-Fly MD greatly extends what machine learning MD can do. On-The-Fly MD will be implemented in our product Quloud V5.0, and the usage scenarios described above can be run through a user-friendly interface. If you are interested in trying machine learning MD, or are facing challenges with it, we hope you will consider our product.

Acceleration of CHGNet-MD
Beneficial users: those who want to perform large-scale CHGNet-MD calculations for extended periods
Full run time: one-hundredth of CHGNet (a 100x speedup achieved through synergy with LAMMPS)
Learning time: requires effort

Rapid Generation of Machine Learning Potentials
Beneficial users: those who want to quickly generate machine learning potentials; those who want to execute large-scale MD calculations
Full run time: over 5 times faster than NN potentials
Learning time: 75% reduction; using OpenMX is particularly fast (powerful and recommended)

Acceleration of DFT-MD
Beneficial users: those who want to run DFT-MD with high precision
Full run time: approximately 100x speedup compared to DFT-MD