Welcome to my profile
Hey! My name is Alba and I am a Sandbox Data Scientist at the Health Data Science Center at the University of Copenhagen. Throughout my academic career as a PhD student and postdoctoral researcher, I gained strong expertise in large-scale genomics and the development of computational pipelines and environments on computing clusters.
Currently, my focus is on creating online training modules and providing computational services for researchers. My goal is to help others manage their research data effectively. I provide computational support by offering lectures on tools such as Cookiecutter, Shiny apps, Git, GitHub, Snakemake, Nextflow, Docker, and Conda, which assist in building FAIR (Findable, Accessible, Interoperable, Reusable) pipelines and software to perform data analysis. Additionally, I conduct practical sessions involving hands-on projects, enabling researchers to apply these tools in real-world scenarios. For instance, we deploy SQLite catalogs using Shiny apps, creating a practical and interactive method for exploring and managing data.
I am responsible for maintaining our website, which is written using Quarto, and managing several modules:
- Computational RDM: We cover key Research Data Management (RDM) concepts and best practices tailored for the omics field, providing researchers with a toolkit to organize, integrate, and visualize their data. These tools make effective data management easier and reduce the time researchers spend on it.
- HPC Best Practices: main contributor to HPC Pipes. This module serves as an introduction to pipelines and workflows in bioinformatics. We explore two of the main languages, Snakemake and Nextflow, and go through some community-based nf-core pipelines, examining their templates. Additionally, we discuss the structure of processes or rules, the advantages of using these tools, how to set resources, and the importance of benchmarking. We also ensure efficiency and scalability by optimizing pipelines and ultimately reducing the time researchers spend on data management tasks.
As part of the Sandbox project, we develop “apps” that contain notebooks and teaching materials that are containerized using Docker. This allows us to package them into portable and reproducible environments, simplifying deployment and ensuring consistency across different systems. I am currently involved in developing, expanding, and maintaining our applications, which utilize the latest analysis software and tools for omics data analysis.
Interests
- (Health) Data Science
- (Population) Genomics
- Research data management
Education
- PhD in Population Genetics (2019-2021)
- Master in Bioinformatics (2017-2019)
  - Institution: University of Copenhagen
- BSc in Health Biology (2013-2017)
  - Institution: University of Alcalá de Henares
  - Erasmus Program at Vrije Universiteit Brussel
About me
I have a deep appreciation for nature and find peace in hiking, while my love for travel brings me adventure and the opportunity to explore other cultures.