The modelforge package serves as the backbone, implementing the neural network potentials and the training infrastructure. The auditorium package executes the actual training and applies the resulting Neural Network Potentials (NNPs) to a variety of benchmarking and stability tests, such as water radial distribution functions, pure liquid densities, heats of vaporization, and free energy calculations. The outcomes of these tests are publicly available on our website, offering a data-driven platform where users can compare diverse metrics. Our primary objective is to help users select the NNP that is most appropriate for their specific needs, recognizing that a universally 'optimal' solution may not meet every user's requirements. Additionally, we aim to offer insight into computational scalability, both in GPU memory usage and wall-clock time, as a function of the number of elements or atoms in the system.

General overview

auditorium
	training.py # provides infrastructure for training and hyperparameter optimization
	benchmark.py # inference time/GPU memory as a function of the number of atoms/elements
	testing.py # battery of stability tests

Training Module

For fair and meaningful comparisons, it's essential to assess the performance of different NNPs on a consistent hold-out test set, and if possible, an out-of-distribution test set as well. The performance metrics should be reported under two specific conditions: 1) with a fixed computational budget to enable resource-related comparisons, and 2) upon reaching convergence to gauge the performance under optimal conditions.
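The two reporting conditions can be sketched as a small evaluation loop. This is a minimal, hypothetical sketch, not the modelforge API: `train_steps` and `evaluate` are placeholders standing in for a real training step and a hold-out-set metric.

```python
import math

# Hypothetical placeholders, not modelforge functions.
def train_steps(model_state, n_steps):
    """Placeholder: advance training by n_steps, return the new state."""
    loss = model_state["loss"] * math.exp(-0.001 * n_steps)
    return {**model_state, "loss": loss, "steps": model_state["steps"] + n_steps}

def evaluate(model_state, test_set):
    """Placeholder: report a loss-like metric on the shared hold-out set."""
    return model_state["loss"]

def compare(models, test_set, budget=10_000, tol=1e-4):
    report = {}
    for name, state in models.items():
        # Condition 1: metric after a fixed computational budget.
        at_budget = evaluate(train_steps(state, budget), test_set)
        # Condition 2: metric after training until convergence.
        prev, curr = float("inf"), evaluate(state, test_set)
        while prev - curr > tol:
            state = train_steps(state, budget)
            prev, curr = curr, evaluate(state, test_set)
        report[name] = {"fixed_budget": at_budget, "converged": curr}
    return report

models = {"nnp_a": {"loss": 1.0, "steps": 0}, "nnp_b": {"loss": 2.0, "steps": 0}}
print(compare(models, test_set=None))
```

Reporting both numbers keeps resource-constrained comparisons and best-case comparisons separate instead of conflating them.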

Additionally, we will use the ray library for hyperparameter optimization. This approach allows us to efficiently explore the parameter space and make more robust comparisons between different NNPs.
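As a rough illustration of what such a search does, here is a plain random search over a small hyperparameter space. It is a stand-in for the distributed search that Ray Tune automates; the `objective` function and the parameter names (`lr`, `hidden_dim`) are illustrative, not the actual search space.

```python
import random

random.seed(0)

def objective(config):
    # Surrogate validation loss; a real run would briefly train the NNP.
    return (config["lr"] - 1e-3) ** 2 + 0.1 / config["hidden_dim"]

def random_search(n_trials=25):
    best = None
    for _ in range(n_trials):
        config = {
            # Log-uniform sampling, analogous to Ray Tune's loguniform.
            "lr": 10 ** random.uniform(-5, -2),
            "hidden_dim": random.choice([32, 64, 128]),
        }
        score = objective(config)
        if best is None or score < best[0]:
            best = (score, config)
    return best

score, config = random_search()
print(config, score)
```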

Reproducibility: All training procedures and hyperparameter settings will be documented to ensure reproducibility.

Benchmark Module

Different NNPs exhibit different scaling behavior, which can depend on specific properties such as the elements involved. Both the scaling of GPU memory and of wall-clock time as a function of the number of atoms/elements are pertinent. We will provide detailed timings and GPU memory consumption data for each of the potentials evaluated.

An easily realized approach is to scale the water box around an oligopeptide and record timing and GPU memory for single-point energy calculations (excluding the first energy calculation, which most likely incurs a featurization overhead).
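The timing protocol can be sketched as follows. This is a hedged sketch: `single_point_energy` is a placeholder for an NNP inference call, and on a real GPU run one would additionally record peak memory (e.g. via PyTorch's CUDA memory statistics).

```python
import time

def single_point_energy(n_atoms):
    # Placeholder workload that grows with system size.
    return sum(1.0 / (i + 1) for i in range(n_atoms))

def benchmark(atom_counts, repeats=5):
    rows = []
    for n in atom_counts:
        # Warm-up call: skip the first evaluation (featurization overhead).
        single_point_energy(n)
        start = time.perf_counter()
        for _ in range(repeats):
            single_point_energy(n)
        rows.append((n, (time.perf_counter() - start) / repeats))
    return rows

for n, t in benchmark([1_000, 10_000, 100_000]):
    print(f"{n:>7} atoms: {t * 1e3:.3f} ms")
```

Averaging over several post-warm-up calls gives a more stable wall-clock estimate than a single measurement.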

Testing Module

A battery of tests to ensure that an NNP is able to stably sample the potential energy surface of a molecule.

| Environment | Test system | Thermodynamic ensemble | Test property |
| --- | --- | --- | --- |
| Vacuum | HiPen set | - | Bond/angle deviation, potential energy convergence |
| Vacuum | Example molecules for relevant functional groups (currently using DrugBank as a surrogate) | - | Bond/angle deviation, potential energy convergence, deviation from MM and DFT minima |
| Vacuum | Alanine dipeptide | - | Relaxed 2D torsion scan around phi/psi dihedrals |
| Vacuum | DrugBank library | - | Check minimized conformation against DFT (PBE0) minima |
| Vacuum/Water | Alanine dipeptide | NpT, NVE, NVT | Bond/angle deviation, potential energy convergence, O-O RDF, density [NpT], energy conservation [NVE], phi/psi distribution |
| Water | Water box | NpT, NVE, NVT | Bond/angle deviation, potential energy convergence, O-O RDF, density [NpT], energy conservation [NVE] |
| Organic solvent | n-octanol, cyclohexane | NpT | Potential energy convergence, density |
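As a concrete illustration of the bond/angle deviation checks listed above, here is a minimal sketch of a bond-length deviation test. The coordinates and reference values are a toy water molecule, not real test data, and the function names are hypothetical.

```python
import math

def bond_length(coords, i, j):
    return math.dist(coords[i], coords[j])

def max_bond_deviation(coords, bonds, reference):
    """Largest absolute deviation (in Angstrom) from reference bond lengths."""
    return max(abs(bond_length(coords, i, j) - r0)
               for (i, j), r0 in zip(bonds, reference))

# Toy water molecule: O at the origin, two H atoms ~0.96 A away.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
bonds = [(0, 1), (0, 2)]
reference = [0.96, 0.96]

deviation = max_bond_deviation(coords, bonds, reference)
assert deviation < 0.05, "bond lengths drifted from reference"
print(f"max bond deviation: {deviation:.3f} A")
```

In the actual test suite the same check would be applied to snapshots from a simulation trajectory, flagging runs where sampling distorts the molecular geometry.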

Tests will be implemented in https://github.com/Exscientia/StableNetGuardOwl, which currently uses openmm-ml. The plan is to tie in our own MD engine and run the same tests with the same potentials, alongside the potentials that Exscientia will publish.