NOMAD Kaggle Competition: Leveraging Stored Materials Data

The winning solution from Tony Y. employed features generated by counting contiguous sequences of atomic coordination environments of various lengths. The depiction shows a crystal-graph representation of In3Ga1O6, in which the connections between atoms are defined by the ionic distances and reflect the coordination environment. This model is based on the n-gram model, an approach commonly used in machine learning for linguistics.
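To make the n-gram idea concrete, here is a minimal sketch of how such counts could be computed on a small graph. It is not the winning implementation: the toy graph, the site names, the labels such as "In6" (element plus a made-up coordination number), and the function name count_ngrams are all assumptions chosen for illustration. Each n-gram is a path of n connected atoms, read as the sequence of their environment labels, and a path read forwards or backwards counts as the same n-gram.

```python
from collections import Counter


def count_ngrams(adjacency, labels, n):
    """Count label sequences along simple paths of n connected nodes.

    adjacency: dict mapping each node to a list of its neighbours
    labels:    dict mapping each node to its environment label
    A path and its reverse are treated as the same n-gram.
    """
    raw = Counter()

    def extend(path):
        if len(path) == n:
            seq = tuple(labels[node] for node in path)
            # Canonical form: the smaller of the sequence and its reverse.
            raw[min(seq, seq[::-1])] += 1
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:  # keep paths simple (no revisits)
                extend(path + [neighbour])

    for start in adjacency:
        extend([start])

    if n == 1:
        return raw
    # Every path with n > 1 nodes was walked once from each end.
    return Counter({seq: count // 2 for seq, count in raw.items()})


# Hypothetical four-atom fragment, not taken from a real structure.
adjacency = {
    "In_a": ["O_a", "O_b"],
    "Ga_a": ["O_b"],
    "O_a": ["In_a"],
    "O_b": ["In_a", "Ga_a"],
}
labels = {"In_a": "In6", "Ga_a": "Ga6", "O_a": "O4", "O_b": "O4"}

unigrams = count_ngrams(adjacency, labels, 1)
bigrams = count_ngrams(adjacency, labels, 2)
trigrams = count_ngrams(adjacency, labels, 3)

# Concatenating the counts for several n gives one feature vector per structure.
features = {**unigrams, **bigrams, **trigrams}
print(features)
```

In this sketch the counts for n = 1, 2 and 3 are simply merged into one dictionary; for a real dataset they would typically be normalized (for example by cell volume or atom count) and aligned to a shared vocabulary before being passed to a regression model.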

Finding better ways to discover new materials computationally is one of the great challenges of our time.

But, to paraphrase the famous line from the 1989 film Field of Dreams: "If you build it, they will come."

The NOMAD Repository built it. And Kaggle hosted it. And folks from all over the world showed up.

More specifically, NOMAD organized an open, Big-Data Kaggle competition to identify new potential transparent conductors, which are used, for example, in photovoltaic cells and touch screens. Progress in developing new materials in this field has wide-ranging applications affecting all of us. For example, higher efficiencies in photovoltaic devices are relevant for meeting the increase in global power consumption due to population growth with alternative energy sources.

The outcome of this competition was three different and accurate approaches for the prediction of two key materials properties. A more mathematical description of the competition and how the three winners, Tony Y. (Japan), Dr. Yury Lysogorskiy (Germany) and Lars Blumenthal (United Kingdom), tackled the prediction can be found here.

More generally, this Big-Data competition showed that improvements in machine-learning algorithms can facilitate the development of numerically efficient and accurate computational models derived from big data. This can only happen if the data are made available online, in a format that everyone can use for machine learning. Indeed, this project represents in many ways a democratization of materials research: Kaggle gives participants from all over the world, irrespective of background, access to data so that they can attempt to solve tough problems using machine learning.

For the solutions to big-data competitions to be useful, that is, to find better materials for targeted applications, domain experts must design datasets focused on specific problems. The growing amount of computational data in materials databases, such as the Novel Materials Discovery (NOMAD) Laboratory (as well as Citrination, Materials Project, OQMD, and AFLOWLIB), allows machine learning to be applied to materials science.

With these repositories and machine learning, the gate is wide open for algorithm development to change the daily routine by finding new materials!