WP2: Unified programming model for data-intensive applications
The first task of Work Package 2, Task 2.1, aims to design the Data-centric Programming Model for Exascale Systems (DCEx) and to develop a prototype API based on that model. DCEx is built on data-aware basic operations for data-intensive applications and supports the scalable use of a massive number of processing elements. The DCEx programming model uses private data structures and limits the amount of data shared among parallel threads. The basic idea of DCEx is to structure programs into data-parallel blocks. Blocks are the units of shared- and distributed-memory parallel computation, communication, and migration in the memory/storage hierarchy. Computation threads execute close to the data, using near-data synchronization. The DCEx model exploits three main types of parallelism: data parallelism, task parallelism (data-driven), and SPMD (single-program, multiple-data) parallelism.
This document discusses libraries and software tools for extreme data processing and is structured as follows. Section 2 covers the state of the art of existing libraries and tools for both data-intensive and data-analytics applications. Section 3 presents a prototype of a parallel programming framework for data-intensive applications based on task parallelism and workflow execution.
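To illustrate the block-structured idea described above, the following is a minimal Python sketch, not the DCEx API. The input is partitioned into contiguous data-parallel blocks, each block is processed with only private (local) state, and only the per-block results are combined at the end. The function names `make_blocks`, `process_block`, and `block_parallel_sum_of_squares`, and the choice of a sum of squares as the per-block computation, are hypothetical and chosen purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def make_blocks(data: Sequence[float], block_size: int) -> List[Sequence[float]]:
    """Partition the input into contiguous, non-overlapping blocks (hypothetical helper)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def process_block(block: Sequence[float]) -> float:
    """Process one block using only private state (illustrative computation)."""
    local_sum = 0.0  # private accumulator, never shared with other workers
    for x in block:
        local_sum += x * x
    return local_sum


def block_parallel_sum_of_squares(data: Sequence[float], block_size: int = 4,
                                  workers: int = 4) -> float:
    """Map process_block over all blocks in parallel, then combine the partial results."""
    blocks = make_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_block, blocks))
    # No shared mutable data during the parallel phase; only results are merged here.
    return sum(partials)


if __name__ == "__main__":
    values = list(range(10))
    print(block_parallel_sum_of_squares(values))  # 285.0
```

A thread pool is used here only to keep the sketch self-contained. In CPython, threads do not give true CPU parallelism for pure-Python work, so a real block-based runtime would assign blocks to processes, nodes, or native threads placed close to the data.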
WP3: Scalable Monitoring and Auto-tuning
Deliverable 3.1 is intended to serve as a scientific summary of the research activities conducted within the first half of Task 3.1. More concretely, the deliverable describes multiple concepts pertaining to the development of a general monitoring model capable of supporting exascale architectures. To this end, the initial activities conducted within Task 3.1 included a requirements analysis of the monitoring system in relation to the use cases. The results of this analysis were then used to define the monitoring approach, which has been purposely tuned for data-intensive applications, offering fine-grained performance information with low overhead at exascale. The approach defined during the modeling stage will later be developed as a monitoring tool and integrated within the exascale monitoring system defined in Task 3.2.
This deliverable reports on the activities performed in WP3 for the initial definition of the Extreme Scale Monitoring Architecture. The work presented here is the result of T3.1 and T3.2. We present a comprehensive background and related work, focusing on those components that can benefit our Extreme Scale Monitoring Architecture.
WP4: Exascale data management
This deliverable reports on the activities carried out during the first half of Task T4.1. The main activity of Task T4.1 has been to develop a methodology for profiling and analyzing data-intensive applications in order to identify opportunities for exploiting data locality. This methodology has been used in WP5 to trace data operations in the ASPIDE use cases. Using these data and the project partners' previous experience, the dynamics of data movement and layout along the whole storage I/O data path, from the back-end storage up to the application, have been traced and studied to design mechanisms for exposing and exploiting data locality. Finally, within this task, techniques for providing dynamic configurations of the I/O system have been developed to enhance the data life cycle by reflecting the applications' I/O patterns and needs.
WP5: Validation through applications
The first activity of Task T5.1 has been the definition of the strategy for collecting and analyzing use-case requirements. The strategy includes guidelines for identifying and formalizing these requirements. Requirements from all use cases have been gathered in a repository that will be used during the project for follow-up activities and for evaluating their results. A sharing and negotiation process among the partners has been carried out to ensure that all project members share the same understanding and prioritization of the requirements. The second activity of the task has been to organize, sort, and cluster the requirements that are common to all the use cases. Common requirements are addressed with priority, as they may have a major impact on the project. The set of use cases (urban computing, opinion mining, biomedicine, and deep learning) has been selected to provide a broad picture of the needs of data-intensive applications on Exascale systems.