The world of classical atomistic simulations has been moving full steam ahead with the development of universal ML potentials, more specifically graph neural networks (GNNs). These new capabilities make running multi-species and multi-phase calculations and simulations much more practical. There are many variations out there, and Matbench Discovery[1], which I've written about, does a very nice job of capturing the general predictive capabilities of the different models. In this post, I'm covering the MACE2 model[2], which I also explored in a draft preprint I've been putting together in my spare time on phonon and elastic predictions using GNNs for shape memory alloys.
Be attentive
Although these GNN models are very capable and seem generally good, caution is still needed when applying them to specific materials and setups. Treat the results as qualitative rather than quantitative unless you have experimental data to compare against.
What I want to cover today is not a specific use or deployment on materials, but rather that I finally figured out how to set up MACE with LAMMPS for CPU inference. I don't have a GPU on my personal machine, so I'll need to use TensorDock to get GPU inference working. To avoid cluttering my system tree, I took an alternative approach to configuring everything in my native machine environment (i.e., Ubuntu 22.04 LTS): the tool I used is Apptainer3, an HPC containerization tool that can bootstrap from Docker images.
Apptainer
You'll need to set this up to get started. At first it looked daunting, because compiling from scratch requires a good deal of repository configuration that can easily be outdated depending on when the documentation was last updated. Fortunately, if you head over to the GitHub repository's releases page, you can find Linux package files for different architectures (e.g., .deb or .rpm) and install these seamlessly.
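As a sketch, installing from a release package on Ubuntu looks something like the following; the version number is a placeholder, so check the releases page for the current one:

```shell
# Download a released .deb from the Apptainer GitHub releases page and
# install it with apt, which resolves its dependencies.
# v1.3.0 is a placeholder version; use the latest release.
wget https://github.com/apptainer/apptainer/releases/download/v1.3.0/apptainer_1.3.0_amd64.deb
sudo apt install ./apptainer_1.3.0_amd64.deb
apptainer --version   # sanity check
```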
Apptainer is a nifty tool because it naturally captures the terminal workflow you would normally follow when setting up a computing environment. You specify the basic Linux commands for setting up and installing things in the %post section. By default, this is all done with fakeroot settings, so the home directory during the build is /root, which can be awkward, but just remember that (maybe there is a way to change this!). The remaining sections mostly specify runtime settings. You also need to specify the base Linux image to use and the bootstrap source; I've only ever used Docker images.
I'm still very new to Apptainer/Singularity, so I'm not sure if the definition files I've been putting together are best practices, but I've had success and I really like the isolation and portability they seem to offer.
MACE + LAMMPS
One thing about many of the GNN potentials is that they have mostly been configured to work with ASE using custom Calculator classes. This is super convenient given that almost all are developed in Python with PyTorch and use the ASE Atoms class or the pymatgen Structure class, the latter of which is easily converted to ASE Atoms. I like ASE, and basic dynamic simulations (i.e., molecular dynamics) are possible with it. The issues that usually arise are performance and features. This is where LAMMPS shines: it's both performant and very feature-rich.
So, is it possible to combine LAMMPS with these GNN potentials? The short answer is yes; the complete answer is that it's not very robust or well-tested. I've found that in many of these GNN frameworks, including MACE, things break, usually from deep within the PyTorch stack, making it hard to troubleshoot. But when you do get it to work, it does work. The remaining issue is matching up computing resource settings so there are no bottlenecks, which I have yet to really figure out. With MACE, MPI parallelization doesn't seem possible from within LAMMPS, while OpenMP threading does work, though the performance boost seems minimal. The other noticeable aspect is that MACE is memory-hungry, using around 30 GB for a 1000-atom system. 😅
Okay, so how did I get it to work with Apptainer? Here is the definition file:
BootStrap: docker
From: ubuntu:22.04
%labels
Author Stefan Bringuier
Email stefanbringuier@gmail.com
Version 1.1
Description "MACE+LAMMPS (CPU version only)."
%post
# Install system dependencies
apt update && DEBIAN_FRONTEND=noninteractive apt install -y \
python3 \
python3-pip \
python3-dev \
git \
cmake \
g++ \
libopenmpi-dev \
wget \
zip \
ffmpeg \
&& apt clean \
&& rm -rf /var/lib/apt/lists/* \
/usr/share/doc/* \
/usr/share/man/* \
/usr/share/locale/*
# Install ASE and MACE
pip3 install --upgrade pip
pip3 install ase
pip3 install torch==2.2.2 \
torchvision==0.17.2 \
torchaudio==2.2.2 \
--index-url https://download.pytorch.org/whl/cpu
pip3 install mace-torch
# MKL needed by LAMMPS
wget \
https://registrationcenter-download.intel.com/akdlm/IRC_NAS/cdff21a5-6ac7-4b41-a7ec-351b5f9ce8fd/l_onemkl_p_2024.2.0.664.sh
sh ./l_onemkl_p_2024.2.0.664.sh -a -s --eula accept
# Download the CPU build of libtorch (PyTorch C++ library).
# Use -O so the saved filename matches the later steps (wget may keep
# the URL-encoded %2B in the filename otherwise).
wget -O libtorch-cpu.zip \
https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-2.2.2%2Bcpu.zip
unzip libtorch-cpu.zip
rm libtorch-cpu.zip
mv libtorch $HOME/libtorch-cpu
# FIX: Add the library path to the dynamic linker configuration
echo "$HOME/libtorch-cpu/lib" > /etc/ld.so.conf.d/libtorch.conf
ldconfig
# Set up LAMMPS with MACE support (CPU version)
git clone --branch=mace --depth=1 https://github.com/ACEsuit/lammps
cd lammps
mkdir build
cd build
cmake \
-D CMAKE_BUILD_TYPE=Release \
-D CMAKE_INSTALL_PREFIX=$HOME/lammps \
-D CMAKE_CXX_STANDARD=17 \
-D CMAKE_CXX_STANDARD_REQUIRED=ON \
-D BUILD_MPI=ON \
-D BUILD_OMP=ON \
-D PKG_OPENMP=ON \
-D PKG_ML-MACE=ON \
-D CMAKE_PREFIX_PATH=$HOME/libtorch-cpu \
-D MKL_INCLUDE_DIR=/opt/intel/oneapi/mkl/latest/include \
-D MKL_LIBRARY=/opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so \
../cmake
make -j 4
make install
# Create symbolic link to lammps and clean up pip cache
#ln -s $HOME/lammps/bin/lmp /usr/bin/lmp
pip3 cache purge
%environment
export LC_ALL=C
export PATH=/root/lammps/bin:$PATH
export LD_LIBRARY_PATH=/root/libtorch-cpu/lib:$LD_LIBRARY_PATH
export CUDA_VISIBLE_DEVICES=""
#Default MKL/OMP Threads
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
%runscript
echo "Starting MACE python environment"
python3 "$@"
%startscript
echo "Shell with MACE and LAMMPS."
exec /bin/bash "$@"
%help
Apptainer with MACE and LAMMPS with MACE support (CPU version).
Usage:
- To run a Python script with MACE:
apptainer run MACE_CPU.sif your_script.py
- To start an interactive bash session:
apptainer shell MACE_CPU.sif
To build your container image, it's simple:
apptainer build MACE_CPU.sif MACE_CPU.def
Then you'll get the .sif file, and you can run it in three different ways:
- runscript, which runs a specific command and arguments. In this def file, it's Python: apptainer run MACE_CPU.sif
- startscript, which will create an interactive shell for the container: apptainer shell MACE_CPU.sif
- The last is specifying a command and arguments to execute: apptainer exec MACE_CPU.sif python3 -c "import mace"
The definition file above does quite a few things in the %post section: it updates the base image, installs a set of libraries and tools, installs PyTorch and MACE, and finally builds a custom LAMMPS. This can take a considerable amount of time, and one frustrating thing I haven't figured out is how to create a temporary build so that if it fails, I don't have to start from scratch (i.e., download everything again). It would also be nice if it could use the local host system's cache to speed things up.
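One partial workaround I've since come across, though I haven't stress-tested it, is Apptainer's sandbox mode, which builds into a writable directory you can iterate on interactively before producing the final .sif:

```shell
# Build into a writable directory instead of a compressed .sif.
apptainer build --sandbox mace_cpu/ MACE_CPU.def
# Poke around or rerun failed steps interactively with write access.
apptainer shell --writable mace_cpu/
# Once everything works, convert the sandbox into the final image.
apptainer build MACE_CPU.sif mace_cpu/
```

This doesn't cache downloads across failed builds, but it does let you recover from a failure partway through %post without rebuilding from the base image.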
What's the selling point?
For me, that's easy: a way to nicely sandbox builds so that you don't mess your local system up. Furthermore, in theory, you get a containerized system image that is super portable and can be run on almost any machine. For CPU-based execution, I've pretty much found this to be true.
Now with regard to MACE, there was a lot of tinkering with PyTorch to get this to finally work with Python in the container. The same was the case with LAMMPS. Both are working; however, MACE evaluations are pretty slow. Here are the timings for 250 atoms for 5000 steps with 1 MPI process and 4 OMP threads:
Performance: 0.092 ns/day, 261.944 hours/ns, 0.424 timesteps/s, 106.045 atom-step/s
127.5% CPU use with 1 MPI task x 4 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
Pair | 11786 | 11786 | 11786 | 0.0 | 99.99
Neigh | 0 | 0 | 0 | 0.0 | 0.00
Comm | 0.12385 | 0.12385 | 0.12385 | 0.0 | 0.00
Output | 0.0012606 | 0.0012606 | 0.0012606 | 0.0 | 0.00
Modify | 1.2743 | 1.2743 | 1.2743 | 0.0 | 0.01
Other | | 0.009874 | | | 0.00
All the time is in the MACE evaluation part, and as you can see, it is very time-consuming! My guess is there's a lot of trial and error to get the best performance, or even just to find the right compile settings. I should also mention that the MACE developers clearly state that the LAMMPS interface is still in development and that GPU inference is a much better option.
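As a quick sanity check, the reported rates are self-consistent: 5000 steps at 0.424 timesteps/s is roughly 11,790 s of wall time, essentially all of which shows up in the Pair row:

```python
# Cross-check the LAMMPS timing summary for the 250-atom, 5000-step run.
steps = 5000
atoms = 250
steps_per_s = 0.424                      # reported timesteps/s

wall_time = steps / steps_per_s          # expected total wall time (s)
atom_steps_per_s = atoms * steps_per_s   # reported as 106.045 atom-step/s

print(f"{wall_time:.0f} s total")        # ~11792 s, vs 11786 s in the Pair row
print(f"{atom_steps_per_s:.1f} atom-step/s")
```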
Example LAMMPS script
One of my intended uses of MACE with LAMMPS is for my NiTi shape memory alloy preprint draft. The use will be to get the temperature phonon dispersion and DOS using the fluctuation-dissipation theory approach implemented in the fix phonon command. In addition, we can use the phase order parameter to track percent phase transformation as a function of temperature or strain [3].
Since these universal GNN potentials cover species at least up to atomic number 70, there is no real species-type limitation. Here is an example LAMMPS script for the NiTi B2 structure:
# LAMMPS NiTi with MACE
units metal
atom_style atomic
atom_modify map yes
newton on
read_data data.NiTi_B2
#MACE potential
pair_style mace no_domain_decomposition
pair_coeff * * mace_agnesi_small.model-lammps.pt Ni Ti
fix 1 all npt temp 600.0 600.0 1.0 iso 1.0 1.0 10.0
timestep 0.001
thermo 100
thermo_style custom step temp pe etotal press vol
run 5000
There are some important notes when using MACE with LAMMPS. First, atom_modify map yes needs to be set. Second, when not using many MPI processes, you should use the no_domain_decomposition option in the pair_style command. The model file mace_agnesi_small.model-lammps.pt is produced by running the conversion utility from the MACE repository on the pre-trained model file; that gives you the LAMMPS version.
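For reference, the conversion step itself is a one-liner. The script path below is the one shipped in the MACE repository (mace/cli/create_lammps_model.py), and the model filename is the one from the example above:

```shell
# Convert a pre-trained MACE model into the TorchScript form LAMMPS expects.
# Produces <model>-lammps.pt next to the input file.
python mace/cli/create_lammps_model.py mace_agnesi_small.model
# -> mace_agnesi_small.model-lammps.pt, referenced by pair_coeff in the script
```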
Till Next Time
For now, this is where I'm at: I've got an Apptainer definition file that works on CPUs and runs MACE both standalone and with LAMMPS. The next goal is to get it running on GPUs through Apptainer on TensorDock; then I can test how much speed-up is possible. It's also worth mentioning that there is a JAX implementation of MACE, which is reported to be about 2x faster than the PyTorch version. I'm curious whether there is a C++ pathway for JAX to use with LAMMPS, similar to how it was done for the PyTorch version.
Footnotes
2. I don't really know what the acronym stands for, but the title of the original paper is MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. ↩
3. I believe Apptainer is the same as Singularity, just a renaming of the open-source version of the project 🤷‍♂️. ↩
References
[1] J. Riebesell, R.E.A. Goodall, P. Benner, Y. Chiang, B. Deng, A.A. Lee, A. Jain, K.A. Persson, Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions, (2023). DOI.
[2] I. Batatia, D.P. Kovacs, G.N.C. Simm, C. Ortner, G. Csanyi, MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields, in: A.H. Oh, A. Agarwal, D. Belgrave, K. Cho (Eds.), Advances in Neural Information Processing Systems, 2022. URL.
[3] S. Gur, et al., Evolution of Internal Strain in Austenite Phase during Thermally Induced Martensitic Phase Transformation in NiTi Shape Memory Alloys, Computational Materials Science 133 (2017) 52–59. DOI.
Comments
"Thank you very much for your post. I was able to successfully build MACE_CPU.sif, but I am unsure how to proceed next. Could you please explain how to run LAMMPS using this MACE_CPU.sif file? I would greatly appreciate your guidance. Thank you!"
Well first of all, thank you and congrats: you're the first commenter on this blog besides myself, 😂! To run your LAMMPS input from the container, you would do something like:
apptainer exec --env OMP_NUM_THREADS=4,MKL_NUM_THREADS=4 MACE.sif mpirun -np 1 /root/lammps/bin/lmp -sf omp -in in.lmp
I will note that OMP has not always worked for me, and MPI tasks > 1 fail. I haven't tried to get CPU + Kokkos working; it might improve performance, but I don't really know. Also, sometimes you need to use the --bind flag to make sure the Apptainer environment sees your files on the host machine.
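As a sketch of what using --bind might look like (the host path and /work mount point here are hypothetical choices):

```shell
# Hypothetical example: mount the current host directory into the container
# so LAMMPS can see the input script and model file; /work is arbitrary.
apptainer exec --bind "$PWD":/work MACE_CPU.sif \
    /root/lammps/bin/lmp -in /work/in.lmp
```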