An Exchange of Lessons Learned session brought together representatives from three European-funded projects (ONCOVALUE, EUCanImage, and RadioVal) to share their experiences and challenges in developing artificial intelligence (AI) for cancer research.
Organized by CiaoTech, ttopstart, and the University of Barcelona, the session gave participants a platform to reflect on challenges such as data accessibility, quality, and standardization, as well as the complexity of building trustworthy AI models that advance precision medicine.
- ONCOVALUE aims to harness real-world data from European cancer hospitals to support regulatory decisions on the cost-effectiveness of novel cancer therapies.
- EUCanImage is building a secure, large-scale European cancer imaging platform to unlock AI’s potential in oncology.
- RadioVal aims to use AI and radiomics to predict responses to neoadjuvant chemotherapy in breast cancer patients.
Centralized Repositories vs. Cloud-Based Training
A key discussion centred on strategies for managing access to clinical data during AI model development. Siemens Healthineers, a partner in both ONCOVALUE and EUCanImage, presented two contrasting approaches: centralized repositories and decentralized cloud-based training. Each has distinct advantages and drawbacks.
In EUCanImage, centralized repositories simplify access and streamline AI training by allowing data to be processed locally at technical partner sites. This approach supports efficient AI development, leverages standardized repository tools, and creates a homogeneous database for all partners, making it easy to analyse and combine AI results. However, the centralized system requires a complex legal framework in which each clinical site must sign individual contracts, a process that frequently causes significant delays.
In contrast, ONCOVALUE uses remote AI training within a private cloud environment at clinical partner sites, such as Helsinki University Hospital. This approach complies with local regulations, avoids cross-border data-sharing issues, and offers flexibility for customized development environments. It also enables secure annotation with specialized tools and ensures that all results remain within the private cloud. Nevertheless, it has its own drawbacks: high costs for external training GPUs (Graphics Processing Units) and limited opportunities for cross-partner validation.
Challenges with Data Sharing and Quality
Data-sharing agreements were identified as a major bottleneck. Negotiating Data Transfer Agreements (DTAs) and Data Processing Agreements (DPAs) with individual clinical sites is often time-intensive, given the varying legal and institutional requirements across countries. Frameworks such as Joint Controller Agreements (JCAs) provide partial relief by centralizing oversight, but site-specific customization remains necessary. This underscores the importance of addressing data-sharing logistics as early as the project proposal stage.
Data quality also proved to be a significant challenge in EUCanImage, highlighting the need for consistent practices and standards. Key obstacles include inconsistent annotation practices, variations in medical terminology across languages, and differing levels of digital maturity among clinical partners. Inconsistent classifications or artifacts in imaging data further reduce the effectiveness of AI training. Moreover, curating, organizing, and segmenting the data requires substantial technical and administrative support at the local level, adding complexity to project timelines.
To address these challenges, large-scale distributed annotation efforts must rely on clear and comprehensive guidelines to ensure uniformity and accuracy in data labelling. Natural Language Processing (NLP) tools were discussed as a way to harmonize clinical data. However, their effectiveness is often limited because they are predominantly trained on English-language data, which restricts their applicability in multilingual contexts.
These challenges underline the need for robust tools and standardized practices to harmonize data, enabling reliable AI training and consistent performance across diverse clinical environments.
Opportunities: Applying the FUTURE-AI Principles
RadioVal showcased its successful implementation of the FUTURE-AI principles (Fair, Universal, Traceable, Usable, Robust, and Explainable), which fostered data compatibility across institutions and enabled the development of trustworthy and transparent AI models.
Lessons Learned
The session highlighted challenges in data sharing, AI development, and standardization, offering valuable lessons for future projects. Early legal and technical planning, improved digital infrastructure, and better data standardization were identified as critical areas to address. Although each project has taken a different approach, they continue to learn from one another, setting the stage for more effective and efficient AI development in cancer research.