|𝔻⟩irac's Student: Thank you Prof. KohnGPT, can you ...

Going into the new year, I've decided to make more of an effort to write blog posts on my thinking. I'm also not going to spend a great deal of time polishing and formatting; rather, I will publish posts and then update them as needed. The point is not so much for prospective readers but for me to archive and flesh out my ideas on several topics that I have been thinking about for the last year.

For one, I really want to understand in more depth the design and concepts behind ChatGPT, and more specifically how transformers work and are used. I want to think about how these natural language processing models can be used to improve working scientists' writing and computing tasks. More specifically, how could one retrain/teach ChatGPT (assuming it's eventually open-sourced!) to provide materials scientists with template computing workflows? Say I wanted to perform a ground-state DFT calculation using VASP for some magnetic sulfide. I know from experience how to set up the INCAR, POTCAR, KPOINTS, and POSCAR files, but how cool would it be to just type into the ChatGPT box:

Create the VASP input files to calculate the ground-state of FeS. Use the known stable form of FeS for the POSCAR file and use a functional and pseudopotentials that have been used by previous studies.

On top of that imagine that you could then ask,

Well, I actually don't have access to VASP at the moment, can you show me the equivalent input scripts for Abinit or QuantumEspresso?

If this modified ChatGPT delivered on this, 🤯. Think about a domain-specific large language model for quantum chemistry and DFT, say KohnGPT. I mean, imagine all the time computational scientists would save on redundant tasks, and being able to go from one code to another. Unbelievable. Granted, it is important to know what the keywords and setup correspond to in terms of the physics being simulated, so this isn't a replacement for learning those. But wow, would this be cool if you ask me.
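To make "template computing workflow" a bit more concrete, here is a minimal Python sketch of the kind of boilerplate such a model would be filling in: turning a handful of settings into INCAR and KPOINTS text. The helper names (`render_incar`, `render_kpoints`) are hypothetical, and every numerical value is a placeholder rather than a converged or recommended setting for FeS. The MAGMOM list in particular assumes a made-up cell with two Fe atoms followed by two S atoms, which would have to match the actual POSCAR.

```python
def render_incar(tags):
    """Render a dict of INCAR tags as 'KEY = value' lines."""
    lines = []
    for key, value in tags.items():
        if isinstance(value, bool):
            # VASP writes logicals as .TRUE. / .FALSE.
            value = ".TRUE." if value else ".FALSE."
        elif isinstance(value, (list, tuple)):
            # Multi-valued tags (e.g. MAGMOM) are space separated
            value = " ".join(str(v) for v in value)
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


def render_kpoints(grid, gamma_centered=True, comment="Automatic mesh"):
    """Render an automatically generated k-point mesh in KPOINTS format."""
    scheme = "Gamma" if gamma_centered else "Monkhorst-Pack"
    lines = [
        comment,                         # line 1: free-form comment
        "0",                             # line 2: 0 = automatic generation
        scheme,                          # line 3: mesh type
        " ".join(str(n) for n in grid),  # line 4: subdivisions
        "0 0 0",                         # line 5: optional shift
    ]
    return "\n".join(lines) + "\n"


# Placeholder settings for a spin-polarized run (assumptions, not tested values)
incar_tags = {
    "ENCUT": 520,                     # plane-wave cutoff in eV (placeholder)
    "ISPIN": 2,                       # spin-polarized calculation
    "MAGMOM": [2.0, 2.0, 0.0, 0.0],   # assumes 2 Fe then 2 S in the POSCAR
    "EDIFF": 1e-6,                    # electronic convergence criterion
    "ISMEAR": 0,                      # Gaussian smearing
    "SIGMA": 0.05,                    # smearing width in eV
    "LWAVE": False,                   # do not write WAVECAR
}

incar_text = render_incar(incar_tags)
kpoints_text = render_kpoints((4, 4, 4))
print(incar_text)
print(kpoints_text)
```

The script only produces text; deciding what the cutoff, smearing, or initial moments should be is exactly the physics knowledge that a KohnGPT would not replace.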

Just so you know, I tried this out on ChatGPT, and it failed pretty badly. It gets a good amount right, but it also fails or just makes things up (I guess this is called hallucination in NLP). You can query ChatGPT to write some setups if you're looking for suggestions, but truthfully any experienced DFT practitioner would know these pretty quickly. Here are the outputs* from ChatGPT using the quotes above:

The first response gives a cubic structure, which is wrong according to the Materials Project entry for FeS and inconsistent with ChatGPT's own answer when I asked it "what is the spacegroup number for this structure", to which it says Pnma, #62. When I ask for the INCAR, I get:

It is incomplete and erroneous. The response to the KPOINTS and POTCAR request is:

The KPOINTS file doesn't look bad, but as expected the POTCAR file is pointless. That makes sense, since the pseudopotential files for VASP are only available to license holders. When I ask it to show the corresponding files for Quantum ESPRESSO, the result actually doesn't look too bad, but I'm not a heavy user of QE so I would need to verify.

Finally, I'm changing the blog name from "An equation a day keeps from going astray" and may migrate to a platform where I can write posts in Markdown.

* I had to explicitly ask ChatGPT to create the input files; initially it just gave me the steps.
