Research
My work advances data science in scientific computing through the development of robust, efficient, and sustainable system solutions. As scientific research relies increasingly on data-driven approaches, managing and using complex data science ecosystems effectively has become a pressing need. By addressing challenges in system design, performance engineering, and interdisciplinary collaboration, my research aims to bridge the gap between data science and systems engineering and to optimize scientific computing infrastructures for emerging applications.
Addressing Key Challenges in Data Science Ecosystems
My research directly addresses several critical challenges in data science ecosystems:
- Resource Management: Executing data-intensive workloads efficiently across diverse hardware platforms is a core challenge. Effective resource management requires autonomous systems that scale resources dynamically, particularly in cloud-based environments. Research in this area focuses on optimizing resource allocation, improving performance prediction, and managing the complexity of data analytics tasks so that computational resources are used efficiently.
- Dynamic Adaptation: AI model training is an iterative process, and the continuous influx of new data demands constant adaptation. Dynamic adaptation involves techniques for autonomously scaling resources and adjusting models in production environments based on real-time data. This includes strategies for automating scaling decisions so that systems can evolve in response to changing workloads and model requirements.
- Data Privacy and Security: As data is exchanged between different parties, preserving privacy is crucial. Techniques such as synthetic data generation and homomorphic encryption are vital for ensuring that sensitive data can be shared and analyzed without compromising privacy. Research in this area focuses on developing methods that allow secure data exchanges, enabling collaboration while safeguarding personal and sensitive information.
- System-Level Optimization: Optimizing configurations across the interconnected components of a data science ecosystem remains a significant challenge. This includes benchmarking data science clouds, container orchestration platforms, and other key infrastructure components. Effective optimization requires standardized methodologies for evaluating system performance so that the entire ecosystem functions efficiently.
- Sustainable Data Science: The environmental impact of computational systems is a growing concern. Research in sustainable data science focuses on improving the energy efficiency of machine learning models and deep learning systems, which are often computationally intensive. Developing benchmarks for energy-efficient AI and data analytics is key to reducing the carbon footprint of modern data science applications.
A Translational and Interdisciplinary Approach
My research is characterized by its translational focus, turning theoretical advancements into practical solutions. Through interdisciplinary collaboration, I tackle complex problems by working with experts across various fields. This approach is exemplified in my partnerships with healthcare institutions, such as the University Hospital Würzburg and the Paracelsus Medical University in Salzburg, where we combine expertise to address real-world challenges.