Work package 1. Data Processing Object extraction, curation, and FAIR compliance
Objectives
Work package 1 will extract, curate, and organise Data Processing Objects (DPOs) from heterogeneous omics and biomedical datasets, ensuring they are reproducible, FAIR-compliant, and ready for cross-domain use. DPOs will be generated and annotated in a containerised infrastructure, and will be extracted from papers and other resources with input from cross-domain experts. Each DPO will be validated for correctness, runtime, and performance, and stored in a curated data lake that meets the FAIR principles. The resulting DPOs will later be used to train the DPO Recommender.
Tasks
Task 1.1 – DPO extraction
1.1.a – Generate DPOs from raw and processed biomedical datasets, either by manual collection from cross-domain experts or through automated and semi-automated workflows (e.g., Work package 2).
1.1.b – Map and integrate DPOs across biomedical domains (e.g., genomics, imaging, clinical).
1.1.c – Annotate workflows and datasets with structured metadata (e.g., Phenopackets) to standardise their representation.
1.1.d – Support development of novel tools, pipelines, or data query protocols for cross-domain research.
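As an illustration of what an annotated DPO from Task 1.1 might look like, the following is a minimal sketch of a metadata record. The work package does not define a DPO schema, so the class `DPORecord`, all of its field names, and the set of extraction modes are hypothetical and chosen only to mirror the wording of the tasks above.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical extraction modes, mirroring the wording of task 1.1.a.
ALLOWED_EXTRACTION = {"manual", "semi-automated", "automated"}


@dataclass
class DPORecord:
    """Illustrative metadata record for one DPO (not an agreed schema)."""

    dpo_id: str       # identifier assigned during extraction
    domain: str       # e.g. "genomics", "imaging", "clinical"
    source: str       # paper or resource the DPO was extracted from
    extraction: str   # one of ALLOWED_EXTRACTION
    annotations: dict = field(default_factory=dict)  # structured metadata

    def problems(self) -> list:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        for name in ("dpo_id", "domain", "source"):
            if not getattr(self, name).strip():
                found.append(f"missing {name}")
        if self.extraction not in ALLOWED_EXTRACTION:
            found.append(f"unknown extraction mode: {self.extraction!r}")
        return found

    def to_json(self) -> str:
        """Serialise the record so it can be stored alongside the DPO."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# Example record with placeholder values.
record = DPORecord(
    dpo_id="dpo-0001",
    domain="genomics",
    source="example paper or resource",
    extraction="manual",
    annotations={"metadata_standard": "phenopackets"},
)
print(record.problems())
print(record.to_json())
```

A flat record like this is easy to serialise and to check automatically; a real schema would be agreed with the cross-domain experts and would probably need richer fields.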
Task 1.2 – Curate and validate DPOs
1.2.a – Design the curation guidelines for DPOs with input from cross-domain experts.
1.2.b – Validate DPOs for correctness, reproducibility, runtime, and interoperability.
1.2.c – Maintain a curated DPO data lake and a FAIR compliance checklist.
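The FAIR compliance checklist in Task 1.2 could be applied automatically to DPO metadata. The sketch below assumes one very simple check per FAIR principle (Findable, Accessible, Interoperable, Reusable). The metadata keys (`identifier`, `access_url`, `metadata_format`, `license`) and the function names are hypothetical, and the checks are placeholders rather than an established FAIR metric.

```python
# One illustrative check per FAIR principle. The metadata keys are assumptions
# made for this sketch, not fields defined by the work package.
FAIR_CHECKS = {
    "findable": lambda m: bool(m.get("identifier")),
    "accessible": lambda m: bool(m.get("access_url")),
    "interoperable": lambda m: bool(m.get("metadata_format")),
    "reusable": lambda m: bool(m.get("license")),
}


def fair_checklist(metadata: dict) -> dict:
    """Return a pass/fail flag for each FAIR principle."""
    return {name: check(metadata) for name, check in FAIR_CHECKS.items()}


def failed_principles(metadata: dict) -> list:
    """List the principles whose check did not pass, in checklist order."""
    return [name for name, ok in fair_checklist(metadata).items() if not ok]


# Example metadata with placeholder values; the licence is deliberately missing.
example = {
    "identifier": "dpo-0001",
    "access_url": "https://example.org/dpo-0001",
    "metadata_format": "phenopackets",
}
print(fair_checklist(example))
print(failed_principles(example))
```

Keeping the checks in a single dictionary means curators can add or tighten criteria without changing the code that runs them.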
Task 1.3 – DPO hosting infrastructure
1.3.a – Implement a container-based infrastructure in which each DPO runs in its own isolated environment for reproducibility.
1.3.b – Integrate automated evaluation and monitoring to capture workflow correctness, runtime, and performance metrics.
1.3.c – Distribute the infrastructure across organisational resources to provide redundancy, enable seamless exchange, and ensure backup and resilience.
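The per-DPO isolation and automated evaluation described in Task 1.3 can be sketched as a small runner that launches a command, times it, and compares its output with an expected value. This is a minimal sketch under stated assumptions: the work package does not name a container runtime, so the use of the Docker CLI (`docker run --rm`) is an assumption, and `RunMetrics`, `container_command`, and `run_and_measure` are hypothetical names. The demo at the end runs a plain Python subprocess instead of a container so that it works without Docker installed.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics captured for one DPO run (illustrative fields)."""

    returncode: int
    runtime_s: float
    correct: bool
    stdout: str


def container_command(image: str, args: list) -> list:
    """Build a command that runs `args` in a fresh, throwaway container.

    Assumes the Docker CLI; `--rm` removes the container after it exits.
    """
    return ["docker", "run", "--rm", image, *args]


def run_and_measure(cmd: list, expected_stdout=None, timeout_s: float = 600.0) -> RunMetrics:
    """Run `cmd`, record wall-clock runtime, and check the output if expected."""
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    runtime_s = time.perf_counter() - start
    # A run counts as correct if it exits cleanly and, when an expected output
    # is given, its standard output matches after stripping whitespace.
    correct = proc.returncode == 0 and (
        expected_stdout is None or proc.stdout.strip() == expected_stdout.strip()
    )
    return RunMetrics(proc.returncode, runtime_s, correct, proc.stdout)


if __name__ == "__main__":
    # Stand-in for a containerised DPO: a trivial Python subprocess.
    metrics = run_and_measure([sys.executable, "-c", "print('ok')"], expected_stdout="ok")
    print(metrics.correct, round(metrics.runtime_s, 3))
```

For a containerised DPO, the same function would be called with `container_command(image, args)` as `cmd`, where the image name and arguments come from the DPO's own metadata.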
Task 1.4 – FAIR & sustainability guidelines and training
1.4.a – Produce white papers, documentation, and workflow tutorials for DPO curation and FAIR compliance.
1.4.b – Disseminate training materials via community platforms (Galaxy, Nextflow, GA4GH).
Deliverables
- D1.1 - Curated DPO data lake + FAIR-ness checklist.
- D1.2 - DPO evaluation/sustainability infrastructure.
- D1.3 - FAIR & sustainability guidelines and training for DPO curation [white paper]
- D1.4 - Integrated pipeline for cross-domain research studies (complete analysis, de novo workflow, de novo dataset, de novo data query protocols, novel tools or models)