Pillar 3 - Exascale Big-Data Analytics

Bringing the NOMAD tools to near-real-time performance


The overall aim of Pillar 3 is to use exascale technology to advance the existing big-data tools and bring them towards near-real-time performance (response times of seconds or less). Present artificial intelligence (AI) tools are too slow and not reliable enough to be used together with aiCMS and HTC studies. The exascale big-data analytics developed in Pillar 3 will enable users to run general queries against the NOMAD Archive, quickly receive interim results, and then adjust the query to improve those results, iterating flexibly until they reach the best available answer for their specific needs. The high-throughput workflows of Pillar 2 and the use cases in WP9 will use the capabilities of the new near-real-time, high-performance AI tools to decide which computations to perform next (e.g. WP4).



This Pillar will advance the existing NOMAD big-data analytics tools and algorithms and bring them to near-real-time performance. Moving from off-line analysis to on-line, near-real-time interactive processing will require a step change to bring algorithms towards exascale performance, and will pay off by opening a new paradigm in predictive materials modelling. This will involve co-design of software and hardware solutions in collaboration with the HPC centers and with second-shell hardware vendors. For example, a specialized GPU-cluster architecture will be developed and deployed to run highly scalable deep-neural-network and descriptor-discovery algorithms and thus demonstrate interactive capabilities (see WP8). Pillar 3 will be structured around a set of analysis tasks (including classification, regression and dimensionality reduction) and method-oriented tasks (including neural networks, kernel methods, subgroup discovery and compressed sensing), and will deliver a new exascale-ready library, AI-X. This work will be supported by the horizontal data infrastructure work package (WP7), which will expose the high-performance AI tools to end users in a consistent manner through standardized interfaces and public APIs. The existing NOMAD Repository and Archive underpin the new analysis tasks and will be kept up to date with developments in hardware as well as in the community codes that feed into the Repository.


Brief description of the work package in Pillar 3

Work Package 6: Big-Data Analytics

In this WP, we develop AI tools that scale towards exascale and deliver near-real-time performance, and we demonstrate them on a range of highlight applications drawn from Pillars 1 and 2. To this end, four complementary methods will be used: neural networks (NN), kernel methods, compressed sensing (SISSO), and subgroup discovery.

PI and Co-PI: Luca Ghiringhelli and James Kermode