May 4, 2024
FAIR data enabling new horizons for materials research – Nature

Putting what is outlined above into practice is a rocky road. To motivate the community to adopt a culture of extensive data sharing, FAIRmat’s policy is to lead by example. Two things are essential to speed up the process and trigger active support: (1) successful, living examples of daily data-centric research [11] that demonstrate what works and how; and (2) outreach to the wider community, including the education of future scientists and engineers.

To address the first point, FAIRmat will demonstrate its approach with specific examples from diverse research fields, including battery research, heterogeneous catalysis, optoelectronics, magnetism and spintronics, multifunctional materials and biophysics. Throughout, FAIRmat will demonstrate the synergistic interplay of materials synthesis, sample preparation and experiments with theory and computation, providing a far more comprehensive picture than any single subcommunity can achieve. FAIRmat will thus bring together not only data and tools but, most notably, people, who will learn each other’s ‘language’. The breadth of expertise required comes with a diversity of nomenclature, which can hamper communication as well as the definition of metadata and ontologies. Likewise, electronic lab notebooks (ELNs) must be standardized so that their data can be integrated seamlessly into automatic workflows. FAIRmat will develop and demonstrate dedicated data-analysis and AI tools that help identify the key descriptive physicochemical parameters [12–15]. These will enable predictions beyond the systems immediately studied, reveal trends and help identify materials with statistically exceptional properties [16]. Combining data from different repositories opens further opportunities.
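The descriptor-identification tools of refs 12–15 are not described in this section. As a much simpler stand-in, the minimal sketch below ranks candidate descriptors by the absolute value of their Pearson correlation with a target property. Correlation ranking is a swapped-in technique chosen for brevity, not FAIRmat’s method. The feature names (`feature_a` and so on) and all values are invented toy data, and the function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists.

    Returns 0.0 if either input is constant (zero spread), since the
    coefficient is undefined in that case.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_descriptors(
    features: dict[str, list[float]], target: list[float]
) -> list[tuple[str, float]]:
    """Rank candidate descriptors by |r| with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: five materials, three candidate descriptors.
    features = {
        "feature_a": [1, 2, 3, 4, 5],
        "feature_b": [5, 3, 4, 1, 2],
        "feature_c": [2, 2, 3, 2, 2],
    }
    target = [2.1, 3.9, 6.2, 8.0, 9.9]  # hypothetical measured property
    for name, r in rank_descriptors(features, target):
        print(f"{name}: r = {r:+.3f}")
```

On this toy data `feature_a` ranks first (r ≈ +0.999), followed by the anticorrelated `feature_b`. Real descriptor searches must also handle non-linear relationships and combinations of primary features, which simple correlation ranking cannot capture.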

Two emerging classes of materials, high-entropy alloys (HEAs) and metal–organic frameworks (MOFs), illustrate that an efficient data infrastructure will be not only helpful but simply mandatory for the digitization of materials research [17]. For these classes, the sheer number of possible materials is so large that conventional approaches will never be able to unlock even a small part of their potential. For HEAs, the number of possible composite materials with distinctly different properties has been estimated at 10⁹ [18], and many of them show, for example, mechanical properties that far exceed those of conventional alloys. This huge materials space also contains HEA oxides with interesting properties for catalysis and energy storage. For MOFs, the situation is even more pronounced. Because of the huge diversity of MOF building blocks (inorganic clusters and multitopic molecules), the number of possible compounds is practically unlimited. Even if the building-block mass is limited to that of fullerene (C₆₀), synthesizing just one replica of each compound would require more atoms than are available on planet Earth. By using AI to analyse the huge amount of experimental information (data for about 100,000 MOFs are stored in databases [19]), we will be able to identify or predict MOFs with particular properties dictated by the intended applications [20], for example in optoelectronics [21], biomedicine or catalysis [22].
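The combinatorial explosion behind such estimates can be made concrete with a short count. This is a minimal illustration, not the method of ref. 18. It assumes a hypothetical pool of candidate elements, alloys with exactly a given number of components, and compositions restricted to a uniform grid (for example, 5 at.% steps) with every chosen component present. The number of element sets is a binomial coefficient. The number of ways to distribute the grid units among the chosen elements follows from the standard ‘stars and bars’ argument. The function name `count_compositions` is hypothetical.

```python
from math import comb


def count_compositions(pool_size: int, n_elements: int, grid_steps: int) -> int:
    """Count alloy compositions under simple, assumed constraints.

    pool_size  -- number of candidate elements to choose from (assumed)
    n_elements -- number of components in each alloy
    grid_steps -- fractions are positive multiples of 1/grid_steps and sum to 1
    """
    # Ways to choose which elements are present in the alloy.
    element_sets = comb(pool_size, n_elements)
    # Ways to split grid_steps units among n_elements so that each element
    # gets at least one unit ("stars and bars").
    fraction_splits = comb(grid_steps - 1, n_elements - 1)
    return element_sets * fraction_splits


if __name__ == "__main__":
    # Hypothetical inputs: 40 candidate elements, 5 components, 5 at.% steps.
    print(f"{count_compositions(40, 5, 20):,}")  # prints 2,550,439,008
```

Even with these assumed inputs, five-component alloys alone already exceed two billion grid compositions. The grid also admits compositions (up to 80 at.% of one element) that would not usually be classed as HEAs, so this is a crude count rather than a physically screened one.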

Turning to the second point, and to foster awareness of the importance of FAIR scientific data management and stewardship [1], FAIRmat will reach out to current students of physics, chemistry, materials science and engineering. We aim to educate a new generation of interdisciplinary researchers by offering classes and lab courses and by introducing new curricula. This requires convincing teachers, professors and other decision makers. The FAIRmat consortium will initiate and organize focused, cross-cutting workshops together with colleagues from, for example, chemistry and biochemistry, astroparticle and elementary particle physics, mathematics and engineering. Some topics may be general, such as ontologies or data infrastructure, whereas others will be more specific, such as particular experimental techniques or simulation methods. Hands-on training, schools and hackathons, as well as the usual online tutorials, will be part of our portfolio. Listening to the needs of small communities and groups will ensure that no one is left behind.

Although industry is very interested in the availability of data, the materials encyclopedia and the AI tools, most industrial researchers hesitate to contribute their own data. This is understandable: a company can survive only if it creates products that are better or cheaper than those of its competitors. FAIRmat addresses these concerns, for example by allowing an embargo on uploaded data (see above). The NOMAD Oasis (see also below), a key element of the federated FAIRmat infrastructure, can also be operated behind industrial firewalls as a stand-alone server with full functionality.

Science is an international, open endeavour, so all concepts and plans are, and will continue to be, discussed, coordinated and implemented together with colleagues worldwide. The first FAIR-DI Conference on a FAIR Data Infrastructure for Materials Genomics, for instance, had 539 participants from all over the world (https://th.fhi-berlin.mpg.de/meetings/fairdi2020/).

Let us end this section by noting that individual researchers already profit from the data infrastructure, even though progress towards the next level of research is still at an early stage. For example, countless CPU hours are saved because computational results are well documented and accessible, so the calculations need not be repeated. Human time is saved as well, and scientists can concentrate on new studies. Students learn faster because they can access extensive reference data. Well-documented databases make error or uncertainty estimates possible and more robust. The uploaded data also contain results that are not documented in publications. Studies designed for one specific target can now be used for a different topic (repurposing). Once assigned a digital object identifier (DOI), uploaded data become citable, and the same applies to analytics tools. Although realizing the full potential of FAIRmat will require a larger community to join, the spirit of findable and AI-ready research data has already attracted substantial attention.