The Innovative Health Initiative (IHI), the public-private partnership (PPP) between the European Union and the European life science industries, has launched a new project to harness the potential of synthetic data for research.
The project, Synthetic Data Generation Framework for Integrated Validation of Use Cases and AI Healthcare Applications (SYNTHIA), commenced with a kick-off meeting in Spain last month.
Synthetic data offers a potential solution to several issues in health research, including a lack of real, high-quality datasets that can be used in research and concerns about patient privacy. However, questions remain regarding the quality of synthetic datasets and the best synthetic data generation methods to use in different situations.
The aim of SYNTHIA is to deliver validated, reliable tools and methods for synthetic data generation (SDG). The tools will cover multiple data types including lab results, clinical notes, genomics, imaging and m-health data. SYNTHIA also aims to enable the generation of longitudinal data.
The project will address six diseases: lung cancer, breast cancer, multiple myeloma, diffuse large B-cell lymphoma, Alzheimer’s disease, and type 2 diabetes.
SYNTHIA academic lead, Guillermo Sanz of La Fe Health Research Institute (IIS La Fe) in Spain, said that in the era of precision medicine, with drugs targeting specific gene mutations, new tools are required to deal with patients’ data privacy.
“Whole genome sequencing, digital imaging, and electronic health record data are the ID of any individual person,” Dr Sanz said. “All of them are required to be able to provide the patient with the best available treatment. Nevertheless, personal data privacy is a must.
“Generation of efficient synthetic databases by using artificial intelligence is the unique way to pursue the goals of maintaining data privacy while offering the tools to advance in precision medicine. The SYNTHIA project, a new pioneering public-private partnership, is the first IHI synthetic data project to deal with this urgent need.”
The project outputs will be made available to the research community through a dedicated online platform. In addition to synthetic data generation workflows that can be used in different situations, the platform will include assessment frameworks to help users evaluate the privacy, quality, and applicability of the synthetic data generated.
The platform will also boast a repository of high-quality synthetic datasets, each of which will be labelled with its suitability for specific applications.
SYNTHIA industry lead Gopal Avinash, vice president for AI smart devices at GE HealthCare, said synthetic data had huge potential to enhance research and product development in healthcare by augmenting available data.
“Along with GE HealthCare’s AI strategy, synthetic data can help mitigate bias and drifts in algorithms and reduce privacy risks,” Dr Avinash said. “Synthetic data can also help speed up the development of robust and generalisable AI models in the healthcare industry.”