Ph.D. defense

The time has come! I will defend my thesis on July 8th, 2015 at 2pm (Paris time) at the École Normale Supérieure de Lyon, in Amphi B.


Everyone is welcome to attend and/or come have a drink afterwards :-)


Title: “Active Data: Enabling Smart Data Life Cycle Management for Large Distributed Scientific Data Sets”

Abstract:
In many domains, scientific discoveries increasingly rely on our ability to exploit ever-growing volumes of data. A key challenge is managing the complexity of data life cycles, i.e., the various operations applied to data from their creation to their deletion: transfer, archival, replication, disposal, etc. These formerly straightforward operations become intractable when data volumes grow dramatically, because of the heterogeneity of data management software on the one hand, and the complexity of the infrastructures involved on the other. In this context, cooperation between different systems becomes very complex and requires ad hoc solutions and frequent human intervention.


This thesis contributes theoretical and practical tools that enable formal and efficient management of data life cycles in large scientific applications. We propose a meta-model that, for the first time, makes it possible to represent, formally and graphically, the life cycle of data distributed across an assemblage of systems on heterogeneous infrastructures. We then present Active Data, an implementation of this meta-model, together with a programming model that makes it possible to execute code at each step of the data life cycle. Active Data programs have access to the complete state of data at any time, in any system or infrastructure across which the data are distributed. These programs can thus make informed decisions based on global knowledge, and implement many optimizations that would otherwise have been impossible.


Lastly, we present performance evaluations and use cases that demonstrate the expressiveness of the programming model and the quality of the implementation.


Finally, here are my manuscript and my defense slides.