The modelforge package serves as the backbone, implementing the neural network potentials and the training infrastructure. The auditorium package executes the actual training and applies the resulting Neural Network Potentials (NNPs) to a variety of benchmarking and stability tests, such as water radial distribution functions, pure-liquid densities, heats of vaporization, and free-energy calculations. The outcomes of these tests are publicly available on our website, offering a data-driven platform where users can compare diverse metrics. Our primary objective is to assist users in selecting the neural network potential that is most appropriate for their specific needs; we recognize that a universally 'optimal' solution may not always meet unique user requirements. Additionally, we aim to offer insights into computational scalability—both in terms of GPU memory usage and wall-clock time—as a function of the number of elements or atoms in the system.
auditorium
training.py # provides infrastructure to perform training and hyper-parameter optimization
benchmark.py # inference time/GPU memory as a function of the number of atoms/elements
testing.py # battery of stability tests
For fair and meaningful comparisons, it's essential to assess the performance of different NNPs on a consistent hold-out test set, and if possible, an out-of-distribution test set as well. The performance metrics should be reported under two specific conditions: 1) with a fixed computational budget to enable resource-related comparisons, and 2) upon reaching convergence to gauge the performance under optimal conditions.
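Whichever condition is reported, the hold-out comparison boils down to the same error metrics. A minimal sketch (the function name and the example energy values are illustrative, not part of auditorium):

```python
import numpy as np

def energy_metrics(predicted, reference):
    """Return (MAE, RMSE) over a hold-out set, in the same units as the inputs."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    errors = predicted - reference
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    return mae, rmse

# Illustrative per-molecule energies (e.g. kcal/mol); real runs would use
# the NNP's predictions on the shared hold-out or out-of-distribution set.
mae, rmse = energy_metrics([1.0, 2.5, -0.5], [1.2, 2.0, -0.4])
```

Reporting both MAE and RMSE is useful because RMSE penalizes the occasional large outlier that MAE averages away.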
Additionally, we will use the Ray library for hyperparameter optimization. This approach allows us to efficiently explore the parameter space and make more robust comparisons between different NNPs.
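The core loop that Ray Tune automates and parallelizes can be sketched without Ray as a plain random search; the search-space parameters below are hypothetical examples, not auditorium's actual configuration:

```python
import random

def sample_config(rng):
    # Hypothetical NNP search space for illustration only.
    return {
        "learning_rate": 10 ** rng.uniform(-5, -2),
        "num_interaction_layers": rng.choice([2, 3, 4, 5]),
        "cutoff_angstrom": rng.choice([4.0, 5.0, 6.0]),
    }

def random_search(objective, num_trials, seed=0):
    """Evaluate `objective` on randomly sampled configs; return the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        score = objective(config)  # e.g. validation loss after a short run
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy objective that prefers learning rates near 1e-3 (illustrative only).
best, score = random_search(lambda c: abs(c["learning_rate"] - 1e-3), num_trials=20)
```

Ray Tune wraps this pattern with distributed execution, early stopping, and smarter samplers than uniform random draws, which is why we delegate the search to it rather than rolling our own.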
Reproducibility: All training procedures and hyperparameter settings will be documented to ensure reproducibility.
Different NNPs exhibit different scaling behavior, which can depend on specific properties such as the elements involved. Both GPU-memory and wall-clock-time scaling as a function of the number of atoms/elements are pertinent. We will provide detailed timings and GPU-memory consumption data for each potential evaluated.
An easy-to-realize approach is to scale the water box around an oligopeptide and record timing and GPU memory for single-point energy calculations (excluding the first energy calculation, since it most likely carries a one-off featurization overhead).
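A minimal timing harness for this could look as follows; `compute_energy` stands in for whatever single-point energy callable the potential exposes, and the warm-up call absorbs the one-off featurization cost mentioned above:

```python
import time
import statistics

def time_single_point(compute_energy, positions, num_repeats=10, num_warmup=1):
    """Median wall-clock time (seconds) of repeated single-point energy calls.

    Warm-up calls are discarded, since the first evaluation typically pays a
    one-off featurization/compilation cost that would skew the statistics.
    """
    for _ in range(num_warmup):
        compute_energy(positions)
    timings = []
    for _ in range(num_repeats):
        start = time.perf_counter()
        compute_energy(positions)
        timings.append(time.perf_counter() - start)
    # For GPU memory on CUDA, one would additionally call
    # torch.cuda.reset_peak_memory_stats() before the timed loop and read
    # torch.cuda.max_memory_allocated() afterwards.
    return statistics.median(timings)

# Toy stand-in for an NNP energy call, just to exercise the harness.
median_s = time_single_point(lambda pos: sum(pos), [0.1] * 100)
```

The median is preferred over the mean here because occasional scheduler hiccups produce long-tail outliers that distort averages.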
A battery of tests to ensure that an NNP is able to sample from the potential energy surface of a molecule.
Environment | Test system | Thermodynamic ensemble | Test property
---|---|---|---
Vacuum | HiPen set | - | Bond/angle deviation, potential energy convergence |
Vacuum | Example molecules for relevant functional groups (currently using DrugBank as a surrogate) | - | Bond/angle deviation, potential energy convergence, deviation from MM and DFT minima
Vacuum | Dipeptide alanine | relaxed 2D torsion scan around phi/psi dihedral | |
Vacuum | DrugBank library | - | Check minimized conformation against DFT (PBE0) minima
Vacuum/Water | Dipeptide alanine | NpT, NVE, NVT | Bond/angle deviation, potential energy convergence, O-O rdf, density [NpT], energy conservation [NVE], phi/psi distribution |
Water | Waterbox | NpT, NVE, NVT | Bond/angle deviation, potential energy convergence, O-O rdf, density [NpT], energy conservation [NVE] |
Organic solvent | n-octanol, cyclohexane | NpT | potential energy convergence, density |
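Several of the test properties above reduce to standard analyses; the O-O radial distribution function, for instance, can be computed from oxygen coordinates as sketched below (assuming a cubic periodic box and the minimum-image convention; the function name and normalization are illustrative, and the ideal-gas normalization uses the common N²/2 approximation):

```python
import numpy as np

def oo_rdf(positions, box_length, r_max, num_bins):
    """O-O radial distribution function g(r) for oxygens in a cubic periodic box.

    positions: (N, 3) oxygen coordinates, same length unit as box_length.
    Returns (bin_centers, g).
    """
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    # Minimum-image pairwise displacement vectors in a cubic box.
    diff = positions[:, None, :] - positions[None, :, :]
    diff -= box_length * np.round(diff / box_length)
    dist = np.linalg.norm(diff, axis=-1)
    dist = dist[np.triu_indices(n, k=1)]  # unique pairs only
    counts, edges = np.histogram(dist, bins=num_bins, range=(0.0, r_max))
    # Normalize by the ideal-gas pair count expected in each spherical shell.
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = n / box_length ** 3
    ideal_counts = 0.5 * n * density * shell_volumes
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / ideal_counts

# Two oxygens 1.5 apart in a 10-unit box, just to exercise the function.
centers, g = oo_rdf([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]],
                    box_length=10.0, r_max=5.0, num_bins=5)
```

In a real test, g(r) would be averaged over trajectory frames and compared against experimental or reference-simulation water structure.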
Tests will be implemented in https://github.com/Exscientia/StableNetGuardOwl, which currently uses openmm-ml. The plan is to tie in our own MD engine and run the same tests with the same potentials, alongside the potentials that Exscientia will publish.