
Tuesday, January 16, 2024

Materials Project REST API w/ Julia

The Materials Project (MP) is one of the most successful computational materials science databases for exploring materials properties [1]. It is also large enough to support serious machine learning for materials design and discovery. The generated data certainly has limitations (see footnote 1), but I'm not going to touch on that here. Instead, I'll focus on how to grab this data via the API. I'm writing this post for two reasons:

  1. To remind myself how to make HTTPS requests to a REST API.
  2. To show how straightforward it is to do in Julia.

For most people, the way to access MP data is through the Python MPRester client, which integrates with the very useful pymatgen package. However, I do a lot of my work in Julia, and although it is easy to install Python packages and call them from within Julia, doing so adds an extra dependency and some overhead. So here I'll show how simple it is to use the REST API with only two Julia packages: HTTP and JSON. Both are implemented in Julia and are very mature within the Julia package ecosystem, so you should not need to worry about missing functionality. You can install them from the REPL with:

using Pkg
Pkg.add("HTTP")
Pkg.add("JSON")

Background

I was not familiar with REST APIs until about two years ago. A REST API (Representational State Transfer Application Programming Interface) is an interface that follows a set of architectural conventions for exchanging data between systems. It uses HTTP requests to access and manipulate data, which can come in a variety of formats but is typically JSON or XML. The most common use is to enable interactions between a client and a server in web applications, allowing operations such as retrieving, creating, updating, or deleting data stored on the server. It's a popular architecture because it scales well and standardizes communication among different systems.

Since it uses the HTTP protocol, almost anything connected to the internet can use a REST API. Furthermore, any modern programming language with a library that supports HTTP can act as a client for one. Thus it's very easy to make the calls from Julia.

Creating a function to grab an MP entry

We can now implement a function that grabs the summary data for an MP entry ID. To use the MP REST API we need a base URL, an endpoint, and an operation. The base URL for the Materials Project REST API is https://api.materialsproject.org. What is an endpoint? An endpoint is a particular location (a URL path) associated with an operation or set of operations that can be performed on a resource, such as retrieving, creating, updating, or deleting data. These operations are requested through standard HTTP methods like GET, POST, PUT, and DELETE. Since we are only reading data and have no admin privileges, GET is the only method we need.
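The pieces described above (base URL, endpoint, and query string) can be assembled with plain string operations. The following is only a minimal sketch: the helper name mp_url and its keyword argument are hypothetical and not part of any package, and I'm assuming that other endpoints accept an ID in the same position as the summary endpoint does.

```julia
# Hypothetical helper (not part of HTTP.jl or any MP package): build a request
# URL from a base URL, an endpoint path, a material ID, and one query parameter.
const MP_BASE_URL = "https://api.materialsproject.org"

function mp_url(endpoint::AbstractString, id::AbstractString; all_fields::Bool=true)
    # Strip stray slashes so the pieces always join with exactly one "/".
    path = strip(endpoint, '/')
    # Everything after "?" is the query string.
    return string(MP_BASE_URL, "/", path, "/", id, "?_all_fields=", all_fields)
end

url = mp_url("materials/summary", "mp-510604")
println(url)
```

Building the URL with string concatenation rather than a file-path function keeps the separators as forward slashes on every operating system.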

Okay, there is one last thing we need. Many APIs require authentication, meaning you need to be an approved user. To do so they typically use a unique API key, which is nothing but a token consisting of a string of characters (see footnote 2). To get an MP API key, you need to sign up for an account. We now have everything we need.
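One common way to keep the key out of your scripts is to read it from an environment variable. This is a sketch under my own assumptions: the variable name MP_API_KEY and the helper load_api_key are choices made for illustration, not something the MP API requires.

```julia
# Hypothetical helper: read the API key from an environment variable.
# Assumption: the key was exported in the shell (e.g. as MP_API_KEY) before
# Julia was started.
function load_api_key(varname::AbstractString="MP_API_KEY")
    # Fall back to an empty string if the variable is unset, and trim whitespace.
    key = strip(get(ENV, varname, ""))
    isempty(key) && error("Set the $varname environment variable to your MP API key")
    return String(key)
end

# Typical use (left as a comment because it errors when the variable is unset):
#   MP_API_KEY = load_api_key()
```

Failing early with a clear message is usually easier to debug than sending a request with an empty key.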

Note

For this function I'm just going to use the summary endpoint, which provides a fairly comprehensive dataset for a Materials Project ID. You can modify the endpoint to get more specific data (e.g., /materials/thermo/).

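using HTTP, JSON

# Fetch the summary document for a Materials Project ID (e.g. "mp-510604").
function get_mp_summary(id::String, api_key::String, all_fields=true)
    base_url = "https://api.materialsproject.org"
    endpoint = "materials/summary/$(id)?_all_fields=$(all_fields)"
    # Join with "/" explicitly; joinpath is meant for file paths and uses "\" on Windows.
    query_url = "$(base_url)/$(endpoint)"
    headers = ["accept" => "application/json", "X-API-KEY" => api_key]
    response = HTTP.get(query_url, headers)
    # The response body is raw bytes; convert it to a String before parsing.
    data = JSON.parse(String(response.body))
    return data
end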

As you can see, it's a very small amount of code. The endpoint variable provides the specifics of our query: the path points to a Materials Project ID, and the ? begins a query string that states whether all the data fields should be returned. In this case the answer is yes (i.e., true). To create the full URL we just combine everything into query_url. Next is the headers variable, a list of key-value pairs specifying that we expect JSON-formatted data and supplying the value of the API key. Finally, we make the HTTP request and parse the returned JSON into a Julia dictionary. That's it!
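The function above assumes the request succeeds. By default HTTP.jl throws an exception on an error status; if you would rather inspect the status yourself, you can pass status_exception=false and check the code. The helper below is a hypothetical sketch of such a check. The status meanings follow general HTTP conventions, and I have not verified which codes the MP server actually returns for each kind of failure.

```julia
# Hypothetical helper: turn an HTTP status code into either `true` or an
# informative error. Meanings follow general HTTP conventions, not verified
# MP server behavior.
function check_status(status::Integer)
    200 <= status < 300 && return true
    status in (401, 403) && error("HTTP $status: request not authorized; check the API key")
    status == 404 && error("HTTP 404: resource not found; check the endpoint and material ID")
    status == 429 && error("HTTP 429: too many requests; wait before retrying")
    error("HTTP $status: unexpected response")
end

# Intended use with HTTP.jl (needs network access, so shown as a comment):
#   response = HTTP.get(query_url, headers; status_exception=false)
#   check_status(response.status)
#   data = JSON.parse(String(response.body))
```

Keeping the check separate from the request makes it easy to exercise without touching the network.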

The data

What we get is a fairly deeply nested dictionary, so it's useful to go through at least the first two layers. I'll illustrate with mp-510604, which is Mn$_2$O$_3$. The top-level dictionary has a single key, "data":

data = get_mp_summary("mp-510604", MP_API_KEY)
keys(data)

KeySet for a Dict{String, Any} with 1 entry. Keys:
  "data"

The "data" key holds a single entry, a Dict{String,Any}, which is where all the material structure and property data lives. To go deeper, let's list all the keys of that entry:

keys(data["data"][1])

KeySet for a Dict{String, Any} with 70 entries. Keys:
  "e_ionic"
  "chemsys"
  "weighted_surface_energy_EV_PER_ANG2"
  "material_id"
  "homogeneous_poisson"
  "deprecated"
  "shape_factor"
  "uncorrected_energy_per_atom"
  ⋮

Once you understand the structure of the data, you can proceed with however you intend to use the MP data. That's it. As noted earlier, you can change the endpoint to look at other material properties calculated from the DFT data.
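To make the nesting described above concrete, here is a small sketch that pulls a few fields out of the first entry. It uses a mock dictionary with the same shape as the response so that it runs without an API key. The values in the mock are placeholders for illustration, not actual API output, and the helper select_fields is a hypothetical name.

```julia
# Mock response with the same nesting as the real one: a top-level "data" key
# holding a one-element vector of Dict{String, Any}. The values are placeholders
# for illustration, not actual API output.
mock = Dict{String, Any}(
    "data" => Any[Dict{String, Any}(
        "material_id" => "mp-510604",
        "chemsys" => "Mn-O",
        "deprecated" => false,
    )],
)

# Hypothetical helper: pull a subset of fields out of the first entry,
# returning `nothing` for any key that is absent.
function select_fields(response::AbstractDict, fields)
    entry = first(response["data"])
    return Dict(f => get(entry, f, nothing) for f in fields)
end

# "e_ionic" is deliberately missing from the mock, so it comes back as `nothing`.
picked = select_fields(mock, ["material_id", "chemsys", "e_ionic"])
```

Using get with a default avoids a KeyError when a field is missing from a given entry.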

Footnotes


  1. One of my pet peeves is that DFT calculations have become so revered that they are often used without sufficient caution or scrutiny. Even as numerical methods improve, with better exchange-correlation (XC) functionals and corrections such as DFT+U, the uncritical acceptance of large datasets built on standard DFT calculations, i.e., GGA or meta-GGA, seems questionable. My bias is that the quantum physics of materials is in truth a many-body problem, and not merely a ground-state one. Thus, if you're using large "approximate" datasets to train ML models, you're likely to encounter problems when moving away from in-silico work! Feel free to correct me.

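  2. My first guess was that the token is a public or private SSH key for the REST API server that is assigned to you. More commonly, though, an API key is simply an opaque random string that the server associates with your account.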

References

[1] A. Jain, et al., Commentary: The Materials Project: A materials genome approach to accelerating materials innovation, APL Materials 1 (2013) 011002. https://doi.org/10.1063/1.4812323.

