WP2: Unified programming model for data-intensive applications
The first task of Work Package 2 is Task 2.1, whose goal is to design the Data-centric Programming Model for Exascale Systems (DCEx) and to develop a prototype API based on that model. DCEx builds on data-aware basic operations for data-intensive applications and supports the scalable use of a massive number of processing elements. The DCEx programming model uses private data structures and limits the amount of data shared among parallel threads. The basic idea of DCEx is to structure programs into data-parallel blocks. Blocks are the units of shared- and distributed-memory parallel computation, communication, and migration in the memory/storage hierarchy. Computation threads execute close to the data, using near-data synchronization. The DCEx model exploits three main types of parallelism: data parallelism, task parallelism (data-driven), and SPMD (single-program, multiple-data) parallelism.
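The idea of data-parallel blocks with private state can be illustrated with a minimal Python sketch. This is not the DCEx API, which is not described here: the function names `split_into_blocks`, `process_block`, and `sum_of_squares`, the block size, and the use of a thread pool are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(data, block_size):
    """Partition a list into contiguous blocks of at most block_size items."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def process_block(block):
    """Work on one block using only block-private state (no shared writes)."""
    local_sum = 0  # private accumulator, visible only to this task
    for value in block:
        local_sum += value * value
    return local_sum


def sum_of_squares(data, block_size=4, workers=2):
    """Apply process_block to every block in parallel, then reduce."""
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(process_block, blocks))
    # The only point where results meet is this final reduction.
    return sum(partial_sums)


result = sum_of_squares(list(range(10)))
print(result)
```

Each worker touches only its own block and its own accumulator, so no locking is needed; shared data is limited to the list of per-block results combined at the end.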
This document discusses libraries and software tools for extreme data processing and is structured as follows. Section 2 covers the state of the art in existing libraries and tools for both data-intensive and data-analytics applications. Section 3 presents a prototype of a parallel programming framework for data-intensive applications based on task parallelism and workflow execution.
This deliverable describes the data collection and mining methodologies explored for improving data placement in data-intensive applications.
WP3: Scalable Monitoring and Auto-tuning
Deliverable 3.1 is intended to serve as a scientific summary of the research activities conducted within the first half of Task 3.1. More concretely, the deliverable describes multiple concepts pertaining to the development of a general monitoring model capable of supporting exascale architectures. The initial activities conducted within Task 3.1 included a requirements analysis for the monitoring system in relation to the use cases. The results of this analysis were used to define the monitoring approach, which has been purposely tuned for data-intensive applications and offers fine-grained performance information with low overhead at the exascale level. The approach defined during the modeling stage will later be developed as a monitoring tool and integrated within the exascale monitoring system as defined in Task 3.2.
This deliverable reports on the activities performed in WP3 for the initial definition of the Extreme Scale Monitoring Architecture. The work presented here is the result of T3.1 and T3.2. We present comprehensive background and related work, focusing on those components that can benefit our Extreme Scale Monitoring Architecture.
Deliverable 3.3 is intended to serve as a scientific summary of the research activities conducted within the second half of Task 3.1 and its relation to Tasks 3.2 and 3.3. More concretely, the deliverable describes multiple concepts pertaining to the development of the analytical and monitoring system capable of collecting and mining system and application performance data in exascale environments. After the requirements analysis and the research on the initial version of the algorithm and tool for assigning monitoring agents and aggregators, the next activities conducted within Task 3.1 included the exploration of data collection methods for the monitoring system in relation to the use cases. Once the data had been collected, pre-processing algorithms were explored to reduce its dimensionality. Thereafter, machine learning and data mining techniques were explored to extract reliable and meaningful information from the monitoring data, which has been purposely tuned for the auto-tuning of data-intensive applications and thus offers fine-grained performance information with low overhead at the exascale level.
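The summary above does not name the pre-processing algorithms that were explored. As one simple, commonly used illustration of dimensionality reduction on monitoring data, the sketch below drops metric columns whose variance falls below a threshold (low-variance feature filtering). The function name `drop_low_variance`, the metric names, the sample values, and the threshold are all hypothetical.

```python
from statistics import pvariance


def drop_low_variance(samples, threshold):
    """Keep only metric columns whose population variance exceeds threshold.

    samples: list of equally long rows, one row per monitoring sample.
    Returns (kept_column_indices, reduced_rows).
    """
    if not samples:
        return [], []
    columns = list(zip(*samples))  # transpose rows into columns
    kept = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[i] for i in kept] for row in samples]
    return kept, reduced


# Hypothetical samples: columns are (cpu_load, constant_flag, io_wait).
samples = [
    [0.10, 1.0, 5.0],
    [0.90, 1.0, 5.1],
    [0.50, 1.0, 4.9],
]
kept, reduced = drop_low_variance(samples, threshold=0.05)
print(kept)
```

Only the first column varies enough to pass the threshold here, so the other two are discarded. Metrics that barely change carry little information for a downstream mining step, which is why a filter of this kind is often applied before heavier techniques.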
WP4: Exascale data management
This deliverable reports on the activities carried out during the first half of Task T4.1. The main activity of Task T4.1 has been to develop a methodology for profiling and analyzing data-intensive applications in order to identify opportunities for exploiting data locality. This methodology has been used in WP5 to trace data operations in the ASPIDE use cases. Using those data and the previous experience of the project partners, the dynamics of data movement and layout along the whole storage I/O data path, from the back-end storage up to the application, have been traced and studied to design mechanisms for exposing and exploiting data locality. Finally, in this task, techniques for providing dynamic configurations of the I/O system have been developed to enhance the data life cycle by reflecting applications' I/O patterns and needs.
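As a rough illustration of how traced data operations can point to locality opportunities, the sketch below computes, for each task, the fraction of bytes read from the node the task ran on. The trace record layout (task, compute node, data node, bytes), the function name `locality_by_task`, and the sample values are assumptions for illustration and do not reflect the actual ASPIDE trace format.

```python
from collections import defaultdict


def locality_by_task(trace):
    """Return, per task, the fraction of bytes read from the task's own node."""
    local = defaultdict(int)
    total = defaultdict(int)
    for task, compute_node, data_node, nbytes in trace:
        total[task] += nbytes
        if compute_node == data_node:
            local[task] += nbytes
    # Skip tasks that moved no data to avoid dividing by zero.
    return {task: local[task] / total[task] for task in total if total[task] > 0}


# Hypothetical trace records: (task, compute node, data node, bytes read).
trace = [
    ("t1", "n0", "n0", 300),
    ("t1", "n0", "n1", 100),
    ("t2", "n1", "n0", 200),
]
ratios = locality_by_task(trace)
print(ratios)
```

A task with a low ratio, such as `t2` in this made-up trace, reads most of its data remotely and is a candidate for moving either the computation or the data.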
This deliverable reports on the activities carried out during Task T4.2 in ASPIDE. The main activity of Task T4.2 has been to provide solutions that leverage cross-layer data management for resilience and performance in Exascale systems. This work is based on the methodologies proposed in D4.1, which resulted from Task 4.1.
WP5: Validation through applications
The first activity of Task T5.1 has been the definition of the strategy for use-case requirements collection and analysis. The strategy includes guidelines for identifying and formalizing these requirements. Requirements from all use cases have been gathered in a repository that will be used during the project for follow-up activities and the evaluation of their results. A sharing and negotiation process between the partners has been carried out to ensure that all project members share the same understanding and prioritization of requirements. The second activity of the task has been to organize, sort, and cluster requirements that are common to all the use cases. Common requirements are addressed with priority, as they may have a major impact on the project. The set of use cases (urban computing, opinion mining, biomedicine, and deep learning) has been selected to provide a representative landscape for understanding the needs of data-intensive applications in Exascale systems.
This report comprises detailed descriptions of the applications, the list of WP2-WP4 concepts used, their modes of integration with the applications, the recommendations and APIs that will be taken into account in the redesign process, and the key performance indicators for the future evaluation of the performance improvements achieved through the redesign and re-implementation.