
Sunday, December 28, 2025

2025 Reflections: Year of catch-up

The pace at which science and engineering are moving is pretty hard to keep up with. I've always felt things move really fast, or at least appear that way, because of the real-time updates we get through outlets like arXiv, LinkedIn, and other platforms. The problem is that as a researcher/scientist/engineer/technologist it's pretty hard to keep up in a competent and competitive way. From my perspective, I'm always trying to keep my understanding and technical ability as close as possible to what's coming into the spotlight. This year was no different, but it was really hard to stay on top of all the emerging developments in computational science, self-driving labs, AI for science, materials science, etc.

In terms of the category coverage of my posts this is what the 2025 treemap looks like:

2025 category coverage

Top 5 labels were: Machine Learning, Research, Atomistics, Tools, and AI. The research label must be related to research papers I've read or looked at, not research I'm doing myself2. I'm not surprised about ML and AI, as my interest in them has grown over time. The tooling one is interesting, since most people probably don't think to write about it, but I'm finding this blog a good place to document things related to Linux and Python tooling.

I missed a lot of good stuff this year but I'm going to try to summarize what I thought were some of the most interesting things I came across and wrote about, as well as what I am hoping to catch up on.

Self-driving labs

My hope for this year was to cover the rise and acceleration of self-driving lab research, development, and deployment. There was a lot of good stuff, from topics brought up by people like Sterling Baird, who will now lead a new vertical cloud lab at Brigham Young University. In general the field has a lot of good momentum; the challenge will of course be orchestration and integration of different synthesis, characterization, and testing equipment to form self-driving loops that are either fully autonomous or have limited human intervention. I think I only wrote one post at the beginning of the year, where I tried to argue that 2025 was the year of robotics in labs. This wasn't a bad prediction, as the number of publications on the topic increased by ~54% over 2024 1. So the growth is strong and, in my opinion, led by some really capable academics and start-ups.

What I'm hoping to see is more open standards from hardware vendors to allow more rapid integration. Robotic manipulation systems are probably in a good place and not the major issue; rather, characterization and testing equipment don't have easy IO and primitives that make integration with decision-making loops easy. I could be wrong, since I'm not an active researcher or developer in this space but rather just a tinkering enthusiast, but from what I've done, unless you're building your own equipment (e.g., an open-source optical microscope), it's going to be hard to fully integrate and automate SEM characterization in a decision-making loop.

Materials discovery

This, in my opinion, is the topic du jour in computational materials science, and it is another area where the number of publications is exploding; not so much on actual new materials, but rather on the computational tools and techniques to accelerate the discovery and design of new materials. I'm not going to opine much on whether this is legit or fraught with false promises; there are plenty of good people opining on that already [1]. The main thing is that this is here to stay for some time, as big companies, start-ups, and academics are all racing to make their impact on the computational and lab aspects of materials discovery.

I don't think this will slow down at all in 2026; it will probably see more growth, though I don't think it will come from a slew of new GNN or transformer model architectures so much as from more data and improved integration into workflows. My guess is that a lot of the research action will be in fine-tuning and distillation of pre-trained models.

Computing & simulations

I wrote a fair amount about tooling this past year. I'm not sure why, but I think it was mainly a way to document for myself the things I always come back to and am tired of figuring out over and over again. Truth is, I tend to refresh my Linux machine fairly often just to keep myself from getting stale or too comfortable with forgetting how to do things. I wrote a post on the awesome uv package manager for Python; I strongly suggest people check out uv for their Python projects. I also wrote about backing up headless Linux machines using rclone and restic/borg/rsnapshot; to be honest, I've used deduplicated backups several times to recover things I messed up.

On the simulation side I played around with g-xTB [2], then wrote an ASE wrapper for it and a simple React UI, which I wrote a post on. This was a nice little weekend project that I enjoyed, though I wish I could have spent more time making it robust and generally useful for scientific work.

I also very much enjoyed the paper on boron carbide by Ghaffari et al. [3], which I summarized in the post "I now know why boron carbide shatters". It showed the good progress data-driven methods are making in capturing the mechanisms behind material response and behavior.

AI for science

I spent quite a bit of time writing posts on AI for science, as it's just so active and interesting to follow. Most of those posts are about improving my understanding and knowledge of the techniques and methods being developed and used. I started the year by looking at graph-pes, since I thought it was one of the cleanest implementations for quickly building and training MLIP models. I haven't played around with it in a while, but I'm hoping to find some fun use cases for it in 2026. Related to this was a post mentioning that MACE uses total energies rather than cohesive/binding energies, a subtle point that should be kept in mind when using it.

Then I wrote about the ORB v3 MLIP model and my perspective on their equigrad loss regularization technique. I thought this was a really clever and efficient way to learn rotational equivariance without having to use any SO(3)-equivariant layers, which are computationally intensive and architecturally complex. In truth it may become less necessary if libraries like cuEquivariance and FlashTP make the tensor products faster and more efficient.

Related to MLIPs, I wrote a post on synthetic datasets & model distillation and how one can efficiently create synthetic datasets from fine-tuned foundation models to train cheaper student/surrogate models. I really do think this is the direction things will head once foundation models start to saturate the benchmarks: we will see most researchers taking them, fine-tuning them, and then building more efficient models. I even believe some of these models will be fitted to physics-based potentials rather than just pure data-driven models.

I also wrote about the Skala XC functional and how it was able to achieve near chemical accuracy at a computational cost comparable to meta-GGA functionals. This will probably keep getting more attention and improving. Relatedly, as mentioned under computing & simulations, the g-xTB post covered how semi-empirical methods for molecules are showing DFT-level accuracy at tight-binding computational cost.

More on the ML/AI side, I wrote a few posts on topics where I wanted to improve my foundational understanding. One of them was a post that was akin to me taking notes on Petar Veličković's review talk on GNNs. I enjoyed this because it made me feel more confident in my understanding, but as always I need to ensure I truly understand the maths and the implementation. There was also the post on conformal prediction, which was something I had not been familiar with before; after reading and posting on the topic, I see the value in the approach. Finally, I wrote a post on trying ChatGPT's image generation tool for science illustrations and how it was a bit of a letdown: not awful, just not good enough for making scientific illustrations that one would want to use in a paper.

I also wrote, at the beginning of the year, a blog post on Bayesian parameter estimation and how it can be used to estimate the parameters of a line or other simple models. I just like these basic posts; even though for many readers they cover elementary topics, they make me ensure I understand things from first principles. I say this because it is very easy to get confused by the advanced aspects of these topics if you can't tie them back to the basics. Additionally, I'm sure I'll come back to posts like this in the future and say "wait, maybe I don't understand this" or "what was I saying? this is wrong". The beauty of a personal blog is that it is not a formal publication or communication (as long as you indicate such), and so mistakes are less embarrassing3.

Books

This is the area I'm a little down about, since I didn't reach my reading goal for this year. I read a few books, as listed in my reading list, but far from the 10 books I wanted to read. The issue is that many of them are dense technical monographs that take a lot of concentration to sit down and read, whereas the ones I finished were more the kind you pick up and read in a few hours during random lulls. For sure this is something I will work on in 2026.

Right now, my 2026 reading list of non-technical books is:

I'll be lucky if I get through 2 of these books in 2026, but I'll aim for all of them, as I'm really eager to read these in addition to the technical books I have stacked up 😞.

Statistical mechanics & quantum mechanics

This is a bit off-topic, but it's something I feel I've become less strong in over the years. From 2013-2016 I was in tip-top shape with my knowledge of stat-mech. Towards the end of the year I wrote a post on the two roads to thermal equilibrium and how they relate. Then I wrote two posts on quantum-related thoughts and topics. I felt like I needed to write on this since in 2019-2021 quantum computing was my main research focus. But in recent years it's been off and on, and I just feel like I've lost my sharpness in these areas. So I wanted to write about them, either reviewing what I had done or picking a topic to refresh my thinking. I'm sure mistakes were made in the posts, but at least I tried to put something together.

My hope is to continue writing about these topics in 2026 to maintain my knowledge and understanding; you never know when they might come back into my main focus, so it's good to stay sharp and knowledgeable about these foundational topics.

Outlook for 2026

So the question is: what will I try to write about in 2026? I don't know, but I'm hoping some good stuff. For one, I will probably keep writing posts that discuss papers I read; these help me dissect dense papers and keep a nice log I can refer back to later and read in my own words. I'd like to write some more posts on weekend coding projects, but I just don't get that much time for them anymore. For example, I have a synthetic data workflow that uses CrystaLLM, and I'm hoping to write a post on it and publish the dataset on Zenodo for others to explore and gauge its usefulness. I'm also hoping to write a little more on thermodynamic computing and, of course, self-driving labs.

I'm also going to try to write and code a bit more on causal and Bayesian topics, since I want to become more proficient at actually building tools around them. On another front, I'm going to try to update my online presence, profiles, and such to keep up my engagement with the community. So with that, happy new year and see you in 2026! 🥳

Footnotes


  1. I used the OpenAlex API to query the number of publications on the topic of self-driving labs. These were the query results for 2024 and 2025 as of 12/24/2025. 

  2. I don't post my work research here as this is my personal blog. Sometimes I'll post weekend projects or research-related efforts but always within my own time and not using any resources or results from work. 

  3. I never understood why making mistakes in science or technical research is viewed so negatively. I understand feeling embarrassed, but it seems like mistakes can ruin your reputation. It hasn't happened to me yet, but I've always felt that if I published a paper with a major mistake that needed to be retracted, I would (1) be grateful if someone caught and pointed it out... since we (I) don't want erroneous research to mislead others and (2) feel embarrassed, but as long as my intentions were unbiased, objective, and honest, I would acknowledge the mistake and learn from it. 

References

[1] J. Leeman, Y. Liu, J. Stiles, S. Lee, P. Bhatt, L. Schoop, R. Palgrave, Challenges in high-throughput inorganic material prediction and autonomous synthesis, (2024). https://doi.org/10.26434/chemrxiv-2024-5p9j4.

[2] T. Froitzheim, M. Müller, A. Hansen, S. Grimme, g-xTB: A General-Purpose Extended Tight-Binding Electronic Structure Method For the Elements H to Lr (Z=1–103), (2025). https://doi.org/10.26434/chemrxiv-2025-bjxvt.

[3] K. Ghaffari, S. Bavdekar, D.E. Spearot, G. Subhash, Influence of Crystal Orientation and Polymorphism on the Shock Response of Boron Carbide, SSRN Preprint, (2025). https://ssrn.com/abstract=5186595.




Sunday, November 30, 2025

Boltzmann vs. Gibbs: Two Roads

In statistical mechanics, sampling from the Boltzmann distribution is one of those basic principles that quietly sits underneath almost everything we model at equilibrium. It's how you characterize the distribution of microscopic states, estimate thermodynamic averages, and connect the energy landscape to measurable observables like temperature, pressure, composition, etc. Numerically, there are multiple ways to draw samples that represent the equilibrium state. Two of the most instructive ways are:

  1. Use Metropolis–Hastings to construct Markov chains that have the Boltzmann distribution as a stationary distribution.
  2. Use Gibbs sampling as a conditional and local way of doing the same thing.

The idea for this post is to give some context for these two ways of sampling, how they relate, and some general background around why the Boltzmann distribution is both fundamental and annoying to work with in practice.

Energy-Based Models (EBMs)

An energy-based model is simply a system whose probability distribution is defined through its energy function. For a thermodynamic system with degrees of freedom $x$, the probability of a configuration $x$ in the canonical ensemble is given by the Boltzmann distribution:

$$ \begin{equation} P(x) = \frac{e^{-\beta E(x)}}{Z}, \qquad Z = \int e^{-\beta E(x)} \, dx, \qquad \beta = \frac{1}{k_B T}. \label{eq:boltzmann} \end{equation} $$

Here, $E(x)$ is the microscopic energy of configuration $x$, $T$ is temperature, and $Z$ (the partition function1) ensures normalization. For discrete microstates you would replace the integral by a sum. In proper Hamiltonian mechanics you would also need to worry about the phase-space measure, but I'm ignoring that here.

What is important is that ratios of probabilities are set by energy differences:

$$ \frac{P(x')}{P(x)} = \exp\big[-\beta \big(E(x') - E(x)\big)\big] = e^{-\beta \Delta E}. $$

So low energy configurations are exponentially more probable than higher energy ones. For example, if $\Delta E = E(x') - E(x) = k_B T \ln 1000$, then $P(x')/P(x) = 10^{-3}$. This captures the essence of the canonical ensemble, that is, probability ratios are determined solely by energy differences.
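
As a quick numeric sanity check of that example (a minimal sketch of mine, working in units where $k_B T = 1$):

import numpy as np

# dE = kT * ln(1000), in units where kT = 1
beta = 1.0
dE = np.log(1000) / beta
print(np.exp(-beta * dE))   # P(x')/P(x) ≈ 0.001 = 10^-3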

There's also a lot of interest in machine learning in using EBMs as world models and for self-supervised learning [1]. In that context you again define an energy $E_\theta(x)$, but the partition function is replaced by various approximate tricks and training objectives, because you almost never compute $Z$ exactly there either.

Equilibrium and the Boltzmann Distribution

When a system is in equilibrium its probability distribution over microstates no longer changes in time. If $P(x,t)$ evolves according to some stochastic dynamics, for example the master equation, Fokker–Planck, Langevin, Glauber dynamics, etc., then at equilibrium $\frac{\partial P(x,t)}{\partial t} = 0$.

From a fluctuation–dissipation viewpoint, the microscopic probability fluxes cancel on average and macroscopic observables become stationary quantities. If we take a system coupled to a heat reservoir at fixed $T$, then the stationary distribution of microstates converges to the Boltzmann distribution in Eq. \ref{eq:boltzmann}.[6]

Estimating Expectations

The expectation value typically corresponds to the ensemble average value of an observable quantity. It's the value one would observe if one could ergodically sample from the system for a very long time. Once $P(x)$ is known, the expectation value of any observable $A(x)$ is

$$ \langle A \rangle = \int A(x)\, P(x) \, dx = \frac{\int A(x)\, e^{-\beta E(x)} \, dx}{\int e^{-\beta E(x)} \, dx}. $$

In practice we estimate the expectation value by sampling configurations $x_i \sim P(x)$ and then taking the sample average as $\langle A \rangle \approx \frac{1}{N} \sum_{i=1}^{N} A(x_i)$. The ergodic hypothesis is what lets us identify this time or trajectory average with the ensemble average, at least if the dynamics explores the full relevant region of phase space.
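
As a minimal sketch of this estimator (my own toy example), take a 1D harmonic energy $E(x) = x^2/2$ at $\beta = 1$, for which the Boltzmann distribution is a standard normal that NumPy can sample exactly:

import numpy as np

# For E(x) = x^2/2 and beta = 1, P(x) ~ exp(-x^2/2) is a standard normal,
# so we can draw x_i ~ P(x) directly and check the sample-average estimator.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # x_i ~ P(x)

A = x**2                           # observable A(x) = x^2
print(A.mean())                    # ≈ 1.0, the exact <x^2> for this P(x)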

The Boltzmann Distribution

How did we end up with the Boltzmann distribution in the first place? It is, of course, named after Ludwig Boltzmann. In 1868 he published a paper in which he derived the equilibrium velocity distribution and the general structure of what we now call the Boltzmann distribution for a kinetic model of a gas [2]. The resulting distribution expresses the probability of microstate configurations as given in Eq. $\ref{eq:boltzmann}$.

With Eq. $\ref{eq:boltzmann}$, low energy states dominate while high energy states fade exponentially. Temperature acts as the control parameter that redistributes these probabilities:

  • $T \to 0$: the system collapses into ground states and very low-lying minima.
  • $T \to \infty$: the system explores accessible states almost uniformly.

The practical challenge is that the partition function, $Z$, is astronomically large for most realistic systems due to the number of degrees of freedom and the effective number of configurations. So direct evaluation or naive summation/integration is basically impossible beyond simple toy models. This is where Markov chain Monte Carlo and Gibbs sampling come in.

Markov Chains and Sampling

To get around the partition function $Z$, we can construct a Markov chain3 whose stationary distribution is the Boltzmann distribution $P(x)$. The trick is that the transition rules only need ratios of probabilities like $P(x')/P(x)$, so the $Z$ cancels nicely.

The basic idea is:

  1. Build an update rule that depends on the current state $x$ and some proposal mechanism.
  2. Make sure the probability of moving from state $x$ to $x'$ matches the probability of moving back, weighted by their relative probabilities under $P(x)$.
  3. Ensure that the chain is ergodic and aperiodic.

Over long times, the sequence of visited states should reproduce the Boltzmann distribution. So the game isn't to write $P(x)$ in closed form, but to engineer a Markov chain that has the correct stationary distribution. How can we do this? There are two classic approaches: the first is Metropolis–Hastings and the second is Gibbs sampling.

Metropolis–Hastings: Energy-Based Acceptance

The Metropolis–Hastings algorithm feels like an experimental perturbation process [3,4]: we start with some initial setup, $x$, propose a new state, $x'$, and then decide whether to accept or reject the proposal based on whether it gave a better (i.e., lower-energy) result. The steps to implement it would be:

  1. Start with $x$ and propose a move $x \to x'$ drawn from some proposal distribution $q(x' \mid x)$.
  2. Compute the energy difference $\Delta E = E(x') - E(x)$.
  3. Accept the move with probability $$ \begin{equation} p_{\text{acc}}(x \to x') = \min\left(1, e^{-\beta \Delta E} \frac{q(x \mid x')}{q(x' \mid x)}\right). \label{eq:metropolis} \end{equation} $$

For symmetric proposals, where $q(x' \mid x) = q(x \mid x')$ (the reverse move is as likely as the forward move), this reduces to,

$$ p_{\text{acc}}(x \to x') = \min\left(1, e^{-\beta \Delta E}\right). $$

If the move lowers the energy then it is always accepted. If it raises the energy, it may still occur with Boltzmann-weighted probability4. This acceptance rule enforces detailed balance with respect to Eq. \ref{eq:boltzmann}. Under standard conditions, the chain converges to the Boltzmann distribution as the unique stationary distribution.[3,4]
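
As a minimal sketch (the function name is mine, not from any library), the symmetric-proposal acceptance rule is just a Boltzmann-weighted coin flip:

import numpy as np

rng = np.random.default_rng(0)

def metropolis_accept(dE, beta):
    """Accept a symmetric-proposal move with probability min(1, exp(-beta*dE))."""
    if dE <= 0:
        return True                           # downhill moves are always accepted
    return rng.random() < np.exp(-beta * dE)  # uphill moves: Boltzmann-weighted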

The pain point with Metropolis–Hastings is that in high dimensional systems, many naive proposals (i.e. transition moves) are rejected and the chain diffuses slowly through configuration space. This means long simulation times despite the equilibrium distribution being formally correct. So getting good sampling in a reasonable amount of wall clock time can be challenging if not impossible.

Gibbs Sampling: Conditional Equilibrium

One very useful alternative is Gibbs5 sampling [5]. Instead of proposing global moves, Gibbs sampling uses local conditional updates and draws from exact conditional distributions whenever those are tractable. The key difference from Metropolis–Hastings is that Gibbs sampling doesn't propose and then accept/reject—it samples directly from the conditional distribution, which already has the correct Boltzmann weighting built in. For example, on a spin-lattice, instead of flipping a spin and then deciding whether to accept the flip based on the energy change (as Metropolis–Hastings does), Gibbs sampling directly picks the new spin value with the correct probability given its neighbors. Since we're sampling from the right distribution from the start, every update is automatically accepted.

How do we get the exact probability? If we take a spin site with 4 neighbors, the energy interaction depends only on those neighbors. We can compute the Boltzmann weight for each possible spin value given those interactions, then normalize. That's it: just a few local energy terms, not the entire system.

     +1
      |
+1 -- X -- +1
      |
     +1

For the center spin $X$ with all neighbors $+1$ as shown above, the energy contribution is:

$$ \begin{align} E(X=+1) &= -J\big[(+1)(+1) + (+1)(+1) + (+1)(+1) + (+1)(+1)\big] \nonumber \\ &= -J(4) = -4J \nonumber \end{align} $$

$$ \begin{align} E(X=-1) &= -J\big[(-1)(+1) + (-1)(+1) + (-1)(+1) + (-1)(+1)\big] \nonumber \\ &= -J(-4) = 4J \nonumber \end{align} $$

Then we get the unnormalized probabilities:

$$ P(X=+1) \propto e^{-\beta E(X=+1)} = e^{4\beta J}, \qquad P(X=-1) \propto e^{-\beta E(X=-1)} = e^{-4\beta J} $$

To normalize, we divide by the sum:

$$ P(X=+1) = \frac{e^{4\beta J}}{e^{4\beta J} + e^{-4\beta J}}, \qquad P(X=-1) = \frac{e^{-4\beta J}}{e^{4\beta J} + e^{-4\beta J}} $$

Now $P(X=+1) + P(X=-1) = 1$ and we can sample from this distribution. The aligned state ($X=+1$) is exponentially more probable than the anti-aligned state. As you see we are conditioning on the neighbors, that is we are sampling from the conditional distribution $P(X \mid X_1, X_2, X_3, X_4)$.
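
A tiny sketch of this local update (the function name is mine), computing the conditional probability of the up state from the neighbor spins:

import numpy as np

def p_up(neighbors, beta=1.0, J=1.0):
    """P(X = +1 | neighbors) for an Ising site, from the local Boltzmann weights."""
    h = J * np.sum(neighbors)                 # local field from the neighbor spins
    w_up, w_dn = np.exp(beta * h), np.exp(-beta * h)
    return w_up / (w_up + w_dn)               # normalize over the two spin values

print(p_up([+1, +1, +1, +1]))                 # ≈ 0.9997 for beta = J = 1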

So now if we have the energy in terms of many degrees of freedom, $E(x_1, x_2, \ldots, x_N)$, with Gibbs sampling we can update one variable at a time by drawing from its conditional distribution given all the others:

$$ \begin{equation} x_i^{(t+1)} \sim P(x_i \mid x_{-i}^{(t)}) \propto \exp\left[-\beta E\big(x_i, x_{-i}^{(t)}\big)\right]. \label{eq:gibbs} \end{equation} $$

Here $x_{-i}$ means all degrees of freedom except $x_i$. The proportionality indicates that you still have to normalize over the allowed values of $x_i$, but you never need the global partition function $Z$, which is a big saving.

To recap, each degree of freedom is sampled from its local equilibrium distribution. There are no proposals to reject, so every update is "accepted" simply by construction of the conditional sampling. In many lattice models, the Hamiltonian is a sum of local terms, which means the conditional $P(x_i \mid x_{-i})$ depends only on neighboring variables. So the cost of an update is dominated by very local energy evaluations, not global ones.

In that sense, Gibbs sampling is Boltzmann sampling by conditional decomposition2. It's exact when those conditionals can be written down and sampled efficiently (i.e., the interactions are local), which is the case for a fair amount of models like the Ising model, Potts model, and more general Gibbs random fields in image analysis.

So Boltzmann or Gibbs?

The title was a bit of a mismatch, since the Boltzmann distribution is the underlying thing we are trying to sample from, so Boltzmann and Gibbs aren't in competition in any meaningful sense. The main challenge with using the analytical Boltzmann distribution directly is that we can't meaningfully evaluate the partition function for anything beyond toy systems. So what we do instead is:

  • Take the Boltzmann distribution as the target equilibrium distribution.
  • Construct Markov chains whose stationary distribution is exactly that Boltzmann measure.
  • Use different update strategies to actually explore the state space.

Both Metropolis–Hastings and Gibbs sampling are Markov chain Monte Carlo (MCMC) methods that construct Markov chains converging to the same Boltzmann distribution. They differ fundamentally in their update mechanisms:

Metropolis–Hastings proceeds by proposing candidate states from a proposal distribution $q(x' \mid x)$ and then accepting or rejecting them based on the energy difference. Gibbs sampling exploits conditional probability structure by directly sampling from the exact local distributions $P(x_i \mid x_{-i})$; as a result, the sampled outcome is the update. In systems with local interactions this makes the process efficient, because it avoids global energy evaluations. This means no wasted proposals as in Metropolis–Hastings, but again it's only practical when the conditional distributions can be computed efficiently. So Gibbs isn't replacing Boltzmann. It's helping Boltzmann by providing a conditional route to sampling from the same equilibrium distribution.

Gibbs as a special case

Gibbs can be viewed as a special case of Metropolis where the proposal distribution is chosen to be the exact conditional $P(x_i \mid x_{-i})$. In that case the Metropolis–Hastings acceptance probability is identically one, so every proposal is accepted and the chain still converges to the same Boltzmann (Gibbs) distribution.
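
To see why, take a proposal that only resamples coordinate $i$, so $q(x' \mid x) = P(x_i' \mid x_{-i})$, factor $P(x) = P(x_i \mid x_{-i})\, P(x_{-i})$, and substitute into Eq. $\ref{eq:metropolis}$:

$$ p_{\text{acc}}(x \to x') = \min\left(1,\; \frac{P(x')\, q(x \mid x')}{P(x)\, q(x' \mid x)}\right) = \min\left(1,\; \frac{P(x_i' \mid x_{-i})\, P(x_{-i}) \cdot P(x_i \mid x_{-i})}{P(x_i \mid x_{-i})\, P(x_{-i}) \cdot P(x_i' \mid x_{-i})}\right) = 1. $$

Everything cancels, which is exactly why every Gibbs update is accepted.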

Both Metropolis–Hastings and Gibbs sampling should approach Eq. $\ref{eq:boltzmann}$ as the number of steps increases. Again they only differ in how they traverse state space and in which cases they are practical.

| Method | Update rule | Uses proposals? | Acceptance step | Typical use |
|---|---|---|---|---|
| Metropolis–Hastings | Random move + accept/reject | Yes | Yes, via $p_{\text{acc}}(x \to x')$ | Very general purpose; can wrap many proposal types |
| Gibbs sampling | Exact conditional resampling | No | Always accepted | Structured models with tractable conditionals |

Numerical Example

Below is a simple example for a 2-site Ising model with 4 possible microstates. The energy function for a configuration is given by:

$$ E(x) = -J x_1 x_2 $$

where $J$ is the interaction strength and $x_i \in \{-1, +1\}$ is the spin state at site $i$. We set $J = 1$ and $T = 1$ (so $\beta = 1$). The four possible microstate configurations are:

$$ \begin{align} x_1 &= 1, \quad x_2 = 1 \nonumber \\ x_1 &= 1, \quad x_2 = -1 \nonumber \\ x_1 &= -1, \quad x_2 = 1 \nonumber \\ x_1 &= -1, \quad x_2 = -1 \nonumber \end{align} $$

The Python code below uses both the Metropolis–Hastings and Gibbs sampling algorithms to sample from this system:

import numpy as np
from prettytable import PrettyTable, MARKDOWN

beta, J = 1.0, 1.0
states = np.array([[+1,+1],[+1,-1],[-1,+1],[-1,-1]])

def Ising(s): return -J*s[0]*s[1] # Energy

def boltzmann(E):
    p = np.exp([-beta*E(s) for s in states])
    return p/p.sum()

def metropolis(E,T=10000):
    """Single-spin flips, accept exp(delta E / kT)"""
    s=np.array([1,1]); out=[]
    for _ in range(T):
        i=np.random.randint(2); s2=s.copy(); s2[i]*=-1
        if np.random.rand()<np.exp(-beta*(E(s2)-E(s))): s=s2
        out.append(s.copy())
    return np.array(out)

def gibbs(E,T=10000):
    """P(s_i | s_j) \\prop exp( J s_i s_j / kT)"""
    s=np.array([1,1]); out=[]
    for _ in range(T):
        for i in (0,1):
            p_up=np.exp(beta*J*s[1-i]); p_dn=np.exp(-beta*J*s[1-i])
            s[i]=+1 if np.random.rand()<p_up/(p_up+p_dn) else -1
        out.append(s.copy())
    return np.array(out)

def tally(S):
    c=np.zeros(4)
    for s in S:
        for k,st in enumerate(states):
            if (s==st).all(): c[k]+=1
    return c/len(S)

tbl = PrettyTable(["Microstate", "Exact", "Metropolis", "Gibbs"])
tbl.set_style(MARKDOWN)
results = np.vstack([boltzmann(Ising),
                     tally(metropolis(Ising)),
                     tally(gibbs(Ising))]).T
for i, row in enumerate(results):
    tbl.add_row([i + 1] + [f"{x:.3f}" for x in row])
print(tbl)

Which gives the following table:

| Microstate | Exact | Metropolis | Gibbs |
|---|---|---|---|
| 1 | 0.440 | 0.438 | 0.445 |
| 2 | 0.060 | 0.062 | 0.060 |
| 3 | 0.060 | 0.057 | 0.062 |
| 4 | 0.440 | 0.443 | 0.433 |

As you can see, both the Metropolis–Hastings and Gibbs sampling algorithms converge fairly well to the exact Boltzmann distribution, with the sampled probabilities matching the theoretical values quite closely. If you run longer chains (more samples), you'll see even closer agreement.

Footnotes


  1. The partition function is the sum (or integral) of $e^{-\beta E(x)}$ over all microscopic states of the system. For anything beyond very small or highly symmetric systems this is a combinatorial explosion and cannot be evaluated exactly. In practice we work with ratios of probabilities where $Z$ cancels, or we approximate it using specialized methods. Conceptually, $Z$ is the normalization constant that turns $e^{-\beta E(x)}$ into a proper probability distribution. 

  2. Conditional decomposition just means writing a joint distribution in terms of conditionals. For example, a generic factorization is $P(x_1,\ldots,x_N) = \prod_{i=1}^N P(x_i \mid x_1,\ldots,x_{i-1})$. In Gibbs sampling we exploit the fact that for Gibbs measures of the form $P(x) \propto e^{-\beta E(x)}$ with local interactions, the conditionals $P(x_i \mid x_{-i})$ depend only on a local neighborhood, so they can be computed and sampled efficiently. Gibbs sampling then builds a Markov chain by resampling these local conditionals one variable (or one block) at a time. 

  3. A Markov chain is a sequence of random variables $X_0, X_1, X_2, \ldots$ such that the probability of transitioning from state $X_i$ to state $X_{i+1}$ depends only on the current state $X_i$ and not on the previous states $X_0, X_1, \ldots, X_{i-1}$. In other words, we do not require full knowledge of the trajectory/history of the system to determine the future state(s). 

  4. This just means that we weight the probability of the move occurring, despite not lowering the energy, by the factor $e^{-\beta \Delta E}$. 

  5. Gibbs is the surname of Josiah Willard Gibbs, who was a major figure in statistical mechanics and thermodynamics in the late 19th century. 

References

[1] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, F. Huang, "A Tutorial on Energy-Based Learning," in Predicting Structured Data, MIT Press (2006).

[2] L. Boltzmann, "Studien über das Gleichgewicht der lebendigen Kraft zwischen bewegten materiellen Punkten," Sitzungsberichte der Akademie der Wissenschaften zu Wien 58, 517–560 (1868).

[3] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, "Equation of State Calculations by Fast Computing Machines," Journal of Chemical Physics 21, 1087–1092 (1953). DOI.

[4] W. K. Hastings, "Monte Carlo Sampling Methods Using Markov Chains and Their Applications," Biometrika 57, 97–109 (1970). DOI.

[5] S. Geman, D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6(6), 721–741 (1984). DOI.

[6] L. D. Landau, E. M. Lifshitz, Statistical Physics, 3rd ed., Course of Theoretical Physics Vol. 5, Pergamon Press, Oxford (1980).



Saturday, October 25, 2025

Backups for Headless Linux

On my home network I have a headless Linux server that I use for a variety of tasks. One of the problems is that I can't use Deja Dup to back up the data on the server because it requires a GUI. I'm not sure why this is, but it is what it is. So I needed to find a way to back up the disks.

For some time I had used rsnapshot to take regular backups, but the problem is it doesn't have a nice way to restore or find files other than searching through the folder tree, which is fine but sometimes annoying. Also, there was a lot of manual config I had to set to get it to behave the way I want, because it doesn't do deduplicating backups but rather hard linking. The reason deduplicating backups are nice is that they only store incremental deltas and make efficient use of storage resources.

Fortunately I was able to find, maybe late to the game actually, two CLI tools that are pretty good for this purpose. The first is borgbackup. The other is restic.

Borgbackup

Borgbackup is a deduplicating backup tool ... interesting name though. It seems to be a pretty modern backup tool designed to be fast, secure, and efficient, focused on backing up and restoring your data. I had it set up (you can install it from the Ubuntu repos with sudo apt install borgbackup), but the downside is that it doesn't really support cloud storage directly, although I think you can get it to work through rclone. On that note, let me talk about rclone.

Rclone

One of the biggest gripes I have about Google is that they don't have a native Linux client/tool for Google Drive. It is pretty ridiculous in my opinion, but for a long time the rclone project has been a great solution; it's cross-platform and supports a lot of cloud storage providers. It is a command line tool to sync files and directories to and from various cloud storage providers. To be honest, I'm not too strong with its configuration and structure, because on my personal laptop I use insync to sync my Google Drive; I paid $4.99 for a lifetime license and have been grandfathered into the newer versions, which are subscription based.

The thing with rclone is that if you just go through the generic setup, you will end up using rclone's built-in Google API Client ID, which is most likely going to be rate-limited and subject to other quotas. So the best thing to do is create your own Google API Client ID and use that. You'll have to log in to console.cloud.google.com and create a new project. Then you can go to APIs & Services > Credentials and create new credentials for your internal use. Save the output from this step; it's the Client ID and Client Secret.

Configuring Google API Client ID

Then, when you're setting up rclone and it asks whether you want to use your own Google API Client ID, input the Client ID and Client Secret you just created, and proceed with the rest of the configuration. This will give you a much smoother sync and backup experience. Essentially you now have a way to sync your local filesystem to Google Drive and vice versa.
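
For reference, and as my assumption of what a typical minimal setup looks like rather than an exact transcript of my config, the result is a remote section in ~/.config/rclone/rclone.conf roughly like the sketch below (all values are placeholders; the remote name GoogleDrive is the one the restic commands below refer to):

[GoogleDrive]
type = drive
client_id = <your-client-id>.apps.googleusercontent.com
client_secret = <your-client-secret>
scope = drive
token = {"access_token":"...","refresh_token":"...","expiry":"..."}

Going to leave it there for now.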

Restic

With Borgbackup not being so friendly to cloud storage or rclone, I kept searching and found restic, which supports rclone natively. The main command pattern for working with a restic repo (that's how they name things) is:

restic -r rclone:GoogleDrive:Backup init
restic -r rclone:GoogleDrive:Backup <commands>

This creates a backup repository, via rclone, in a folder called Backup on your configured Google Drive. You can then use the following commands to back up and restore your data:

restic -r rclone:GoogleDrive:Backup backup <path/want/to/backup>
restic -r rclone:GoogleDrive:Backup restore <snapshot_id> --target <path/where/to/restore>
restic -r rclone:GoogleDrive:Backup list snapshots

So the thing is, this is all manual, and you'll most likely want to alias these commands so they feel more natural, because you could have several different restic repositories that you back your data up to. The other thing is that you'll want this to happen through a cron job or some other automated process, so that you get regular backups and don't have to remember to run the commands manually.

There is the downside that restic requires you to password-encrypt your backups. So you can't view these backups directly; you have to go through restic to view them, or mount them locally with, say, FUSE.

Automating

To achieve regular backups without having to manually run the restic commands, the simplest solution is to just setup a cron job. Run something like:

crontab -e
# Run backup script daily at 2 AM
0 2 * * * /home/user/backup_script.sh

where backup_script.sh is just a wrapper bash script that runs restic and any webhook notifications you want. You can add whatever stdout logging you want as well.

#!/bin/bash
export RESTIC_PASSWORD_FILE="/path/to/user/.restic_password"
REPO="rclone:GoogleDrive:Backup"
restic -r $REPO backup /location/to/backup
... other restic commands ...

Notice that you'll have to set the RESTIC_PASSWORD_FILE environment variable to the path to the file that contains your restic password for the repo you are backing up 😠.

Backup frequency and retention

The cron job just runs backup_script.sh at the specified time; it is not responsible for the backup frequency and retention policies. That said, if you want backups more than once a day, the cron schedule needs to match that frequency so the snapshots stay consistent.

So say you want to keep daily backups for 7 days, weekly for 4 weeks, and monthly for 12 months. You would want to run the following command after each backup:

restic -r rclone:GoogleDrive:Backup forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12

# Removes the actual stale backups
restic -r rclone:GoogleDrive:Backup prune

The prune command actually removes the data that's no longer needed, while forget just marks snapshots for deletion. In other words, forget is a "soft delete" that marks snapshots for removal but leaves the underlying data in place, and prune is the "hard delete" that actually reclaims the space.

Performance Considerations

Because I back up directly to Google Drive, I've noticed it's very slow and time consuming. This probably means I need to tune restic to be more efficient, e.g., with parallel uploads or a different chunk size.

Final Thoughts

I'm finding that rclone + restic is a pretty good combination for my needs. The deduplication will save a lot of Google Drive storage. I don't really care about the data encryption (which I wish I could turn off), but it is what it is. I think if you're working on a headless server (e.g. a self-managed compute cluster) and you want to back up data on your network, this is a pretty reasonable solution.

The one thing I can't speak to is how syncing behaves when you're working on two systems and need files to be locked and synced properly as you work on them. Like I mentioned, on my personal laptop I use insync to sync my Google Drive, and this works pretty well.

