Work package 1. Data Processing Object Extraction, Curation, and FAIR Compliance

🎯 Objectives

Extract, curate, and organise Data Processing Objects (DPOs) from heterogeneous omics and biomedical datasets, ensuring they are reproducible, FAIR-compliant, and ready for cross-domain use. DPOs will be extracted from papers and other resources with the help of cross-domain experts, then generated and annotated in a containerised infrastructure. Each DPO will be validated for correctness, runtime, and performance, and stored in a curated data lake that meets the FAIR principles. The resulting DPOs will later be used to train the DPO Recommender.

🚧 Tasks

Task 1.1 – DPO extraction

1.1.a – Generate DPOs from raw and processed biomedical datasets, either through manual collection by cross-domain experts or through automated and semi-automated workflows (e.g., Work package 2).
1.1.b – Map and integrate DPOs across biomedical domains (e.g., genomics, imaging, clinical).
1.1.c – Annotate workflows and datasets with structured metadata (e.g., phenopackets) to standardise representation.
1.1.d – Support development of novel tools, pipelines, or data query protocols for cross-domain research.
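
As an illustration of what an annotated DPO could look like after Task 1.1, the following is a minimal sketch only. The `DPORecord` class, all of its field names, and the `group_by_domain` helper are hypothetical and are not defined by this work package; the actual schema would be agreed with the cross-domain experts and aligned with the chosen metadata standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DPORecord:
    """Hypothetical DPO annotation record; every field name is an assumption."""
    dpo_id: str
    domain: str            # e.g. "genomics", "imaging", "clinical"
    description: str
    source: str            # paper or resource the DPO was extracted from
    collection_mode: str   # "manual", "semi-automated", or "automated"
    annotations: dict = field(default_factory=dict)  # structured metadata

    def to_json(self) -> str:
        """Serialise the record so it can be stored alongside the DPO."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def group_by_domain(records):
    """Index DPO identifiers by biomedical domain (a first step towards mapping)."""
    index = {}
    for rec in records:
        index.setdefault(rec.domain, []).append(rec.dpo_id)
    return index
```

A record created this way can be written to the data lake as JSON, and `group_by_domain` gives a simple view of which domains are covered so far.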

Task 1.2 – Curate and validate DPOs

1.2.a – Design the DPO curation guideline with input from cross-domain experts.
1.2.b – Validate DPOs for correctness, reproducibility, runtime, and interoperability.
1.2.c – Maintain a curated DPO data lake and FAIR compliance checklist.
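
The FAIR compliance checklist in 1.2.c could be applied automatically to each DPO's metadata. The sketch below assumes one required metadata field per FAIR principle; the field names (`identifier`, `access_url`, `metadata_standard`, `license`) and both functions are illustrative assumptions, not the project's actual checklist.

```python
# Hypothetical checklist: one required metadata field per FAIR principle.
# The real checklist would be defined in the curation guideline (1.2.a).
FAIR_CHECKLIST = {
    "findable": "identifier",
    "accessible": "access_url",
    "interoperable": "metadata_standard",
    "reusable": "license",
}


def fair_report(metadata: dict) -> dict:
    """Report, per FAIR principle, whether its required field is present and non-empty."""
    return {
        principle: bool(metadata.get(field_name))
        for principle, field_name in FAIR_CHECKLIST.items()
    }


def is_fair_compliant(metadata: dict) -> bool:
    """A DPO passes only if every checklist item is satisfied."""
    return all(fair_report(metadata).values())
```

A per-principle report, rather than a single pass/fail flag, lets curators see which part of the checklist a DPO still fails.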

Task 1.3 – DPO hosting infrastructure

1.3.a – Implement a container-based infrastructure where each DPO runs in its own isolated environment for reproducibility.
1.3.b – Integrate automated evaluation and monitoring to capture workflow correctness, runtime, and performance metrics.
1.3.c – Distribute the infrastructure across organisational resources to provide redundancy, enable seamless exchange, and ensure backup and resilience.
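
The automated evaluation in 1.3.b can be pictured as a small harness that runs a DPO, times it, and compares its output with a reference result. This is a sketch under stated assumptions: `evaluate_dpo` and `EvaluationResult` are hypothetical names, and the `run` callable stands in for whatever mechanism actually launches the DPO inside its container.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EvaluationResult:
    """Hypothetical per-run evaluation record."""
    dpo_id: str
    correct: bool
    runtime_seconds: float
    error: Optional[str] = None


def evaluate_dpo(dpo_id: str, run: Callable[[], Any], expected: Any) -> EvaluationResult:
    """Run one DPO, measure wall-clock runtime, and compare output to a reference."""
    start = time.perf_counter()
    try:
        output = run()
        error = None
    except Exception as exc:  # a failing DPO is recorded, not raised
        output = None
        error = repr(exc)
    elapsed = time.perf_counter() - start
    correct = error is None and output == expected
    return EvaluationResult(dpo_id, correct, elapsed, error)
```

Recording failures as results instead of raising them keeps one broken DPO from interrupting the evaluation of the others.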

Task 1.4 – FAIR & sustainability guidelines and training

1.4.a – Produce white papers, documentation, and workflow tutorials for DPO curation and FAIR compliance.
1.4.b – Disseminate training materials via community platforms (Galaxy, Nextflow, GA4GH).

🚚 Deliverables

  • D1.1 - Curated DPO data lake + FAIRness checklist.
  • D1.2 - DPO evaluation/sustainability infrastructure.
  • D1.3 - FAIR & sustainability guidelines and training for DPO curation [white paper].
  • D1.4 - Integrated pipeline for cross-domain research studies (complete analysis, de novo workflow, de novo dataset, de novo data query protocols, novel tools or models).