Delivering the Cancer-free Frontier through Harnessing AI in Real-World Oncology Data


By Aik Choon Tan, Ph.D., Senior Director of Data Science, Jon M. and Karen Huntsman Endowed Chair in Cancer Data Science, and Professor of Oncological Sciences and Biomedical Informatics, Huntsman Cancer Institute, University of Utah

Advances in computing power, the availability of big data, and innovations in computational algorithms have propelled the development and deployment of Artificial Intelligence (AI) across domains ranging from healthcare, finance, and manufacturing to science and art. The integration of AI in oncology has revolutionized the way we approach the continuum of cancer research and treatment. One of the key drivers of this transformation is the accessibility and utilization of real-world data (RWD) in oncology. RWD are data relating to patient health status and/or the routine delivery of health care, collected from sources outside of controlled clinical trials. In oncology, examples of RWD include electronic health records (EHRs), medical claims data, imaging data, molecular testing data, cancer registries, patient-reported outcomes (PROs), data gathered from wearables and digital health technologies, and other data collected outside clinical trial settings. Unlike data collected in controlled clinical trials, RWD provide insights into diverse patient demographics, treatment responses, and long-term outcomes in real-world settings. This richness and diversity present a valuable opportunity for AI to uncover patterns and correlations that may not be apparent in controlled trial settings.

AI Applications in Analyzing Real-World Oncology Data

The following are some examples of AI applications in analyzing RWD in oncology:

  1. Early Detection and Diagnosis of Cancer: AI algorithms have been widely evaluated and used to assist clinicians and pathologists in diagnosing cancer from imaging data. For example, AI-based software can help pathologists identify suspicious biopsies from digital slide images. Similarly, several retrospective and prospective studies have demonstrated the added value of AI in helping clinicians improve mammography screening accuracy and reduce screen-reading workload. Integrating AI into early detection workflows can therefore improve cancer prognosis and treatment outcomes.
  1. Precision Oncology: With the routine clinical adoption of advanced technologies for acquiring high-resolution genomic and imaging data, AI techniques have been used to integrate these multimodal datasets. This integration aims to identify molecular biomarkers linked to specific types of cancer. Oncologists can then personalize treatment based on biomarkers that predict how individual patients may respond to therapies.
  1. Clinical Trial Design and Patient Matching: Eligibility screening is one of the main challenges in recruiting patients to clinical trials. It is a laborious and error-prone process that requires significant time and resources. Integrating AI, such as large language models (LLMs), into EHR systems could streamline and optimize this process and match patients to the right trials. For example, by learning a trial's inclusion and exclusion criteria, AI can identify suitable patient cohorts, estimate trial recruitment, and potentially predict trial outcomes based on historical RWD. In summary, integrating AI into EHR systems can speed up the identification of eligible patients, thereby improving the efficiency and completion rates of clinical trials.
  1. Predictive Analytics: Machine learning and AI models trained on RWD can be used to predict cancer recurrence risk, disease progression, response to treatment, and patient survival outcomes. These models are usually more generalizable to independent cohorts than models trained on restricted clinical trial data. Accurate and interpretable models built on RWD features could help improve clinical decision-making in precision medicine by maximizing patient benefit from therapies.
  1. Treatment Optimization: With access to RWD, AI could be deployed to identify optimal treatment regimens tailored to individual patient demographics, cancer types, mutational profiles, comorbidities, and treatment histories. Furthermore, AI can be used to detect toxicity signals and adverse events in RWD, which could help enhance treatment efficacy and reduce adverse drug events.
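
To make the patient-matching idea above more concrete, the following is a minimal sketch of the final screening step only. It assumes that a trial's free-text inclusion and exclusion criteria have already been converted into structured rules (the step an LLM might assist with) and that patient records have been reduced to a few structured fields. The class names, field names, and toy records are all hypothetical and chosen for illustration; a real EHR-integrated system would be far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical, heavily simplified patient record (not a real EHR schema).
    patient_id: str
    age: int
    cancer_type: str
    biomarkers: set = field(default_factory=set)
    comorbidities: set = field(default_factory=set)


@dataclass
class TrialCriteria:
    # Hypothetical structured form of a trial's eligibility criteria.
    min_age: int
    cancer_type: str
    required_biomarkers: set = field(default_factory=set)
    excluded_comorbidities: set = field(default_factory=set)


def is_eligible(patient, criteria):
    """Return True if the patient meets every inclusion rule and no exclusion rule."""
    meets_inclusion = (
        patient.age >= criteria.min_age
        and patient.cancer_type == criteria.cancer_type
        # Every required biomarker must be present in the patient's profile.
        and criteria.required_biomarkers <= patient.biomarkers
    )
    # Any overlap with the excluded comorbidities disqualifies the patient.
    hits_exclusion = bool(criteria.excluded_comorbidities & patient.comorbidities)
    return meets_inclusion and not hits_exclusion


def screen_cohort(patients, criteria):
    """Return the IDs of patients who pass the eligibility check."""
    return [p.patient_id for p in patients if is_eligible(p, criteria)]


# Toy, made-up records for illustration only.
patients = [
    Patient("P1", 62, "NSCLC", {"EGFR"}, set()),
    Patient("P2", 45, "NSCLC", {"EGFR"}, {"heart failure"}),
    Patient("P3", 70, "NSCLC", {"KRAS"}, set()),
    Patient("P4", 17, "NSCLC", {"EGFR"}, set()),
]
criteria = TrialCriteria(
    min_age=18,
    cancer_type="NSCLC",
    required_biomarkers={"EGFR"},
    excluded_comorbidities={"heart failure"},
)

eligible_ids = screen_cohort(patients, criteria)
print(eligible_ids)  # ['P1']
```

In this toy cohort only P1 passes: P2 has an excluded comorbidity, P3 lacks the required biomarker, and P4 is under the minimum age. The size of the returned list also gives a crude estimate of how many patients a site could recruit.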

The synergy between AI and RWD in oncology holds promise for advancing cancer research and treatment.

Challenges and Potential Solutions to Adapting AI in Oncology

While the integration of AI and RWD holds immense promise, it also presents several challenges and limitations that must be addressed for effective and ethical implementation. The following are some of these challenges, with potential solutions, in adapting AI to analyze RWD in oncology.

  1. Accessing Real-World Data:
    • Challenge: RWD often exist in silos and reside in fragmented sources. Obtaining AI/ML-ready datasets from RWD poses technical and logistical challenges.
    • Solutions:
      • Data Sharing Agreements: Establish agreements and partnerships between healthcare providers, research institutions, consortiums and data vendors to facilitate secure data sharing.
      • Data Governance Frameworks: Establish governance frameworks and policies that define data ownership, access rights, and responsibilities of stakeholders involved in RWD.
      • Interoperability Standards: Adhere to standardized formats and interoperability standards (e.g., HL7 FHIR) to enable seamless data exchange and integration for the AI platform.
      • Data Quality and Standardization: Because RWD are collected from diverse sources, it is crucial to implement standardized data collection protocols, quality assurance measures, and data commons. Developing AI algorithms that can handle missing or incomplete data, through techniques such as imputation or robust statistical methods, could also assist model training.
      • Data Access Platforms: Develop centralized platforms or data repositories with secure APIs that allow authorized users to access and analyze RWD across multiple sources. Such platforms can strengthen security and help meet regulatory requirements for accessing and analyzing RWD.
  1. Regulatory and Privacy Compliance:
    • Challenge: Strong privacy and security laws (e.g., HIPAA in the USA, GDPR in the EU) govern the collection, storage, and use of protected health information (PHI) and can limit access to RWD for AI/ML. These strict regulations require robust security measures and patient consent for data access. Data breaches or unauthorized access can lead to legal consequences, erosion of patient trust, and ethical concerns.
    • Solutions:
      • Data Anonymization: Implement techniques such as anonymization and de-identification to protect patient privacy while allowing data access for research and AI applications.
      • Ethical Governance: Establish clear rules and ethical guidelines for accessing and using EHR data, in line with regulations and ethical norms. Ensure transparency in AI-based decision-making and set up protocols for getting patient consent and managing sensitive data. Work with regulatory bodies to simplify compliance and uphold ethical standards.
      • Patient Consent Management: Develop secure consent management systems that track and manage patient consent preferences regarding data sharing and use.
      • Data Encryption and the use of Distributed Ledger Technology (DLT): Enhance data encryption protocols and access controls to safeguard patient information in cloud platforms. DLT, such as blockchain, could be used as a secure way to provide trustworthy and transparent EHR data sharing.
  1. Technical Infrastructure and Capabilities:
    • Challenge: Accessing and processing large-scale RWD for AI applications requires robust technical infrastructure, including scalable computing resources and advanced analytics platforms.
    • Solutions:
      • Cloud Computing: Leverage cloud-based solutions to provide scalable storage and computing resources for processing and analyzing RWD, as well as deploying AI algorithms.
      • Big Data Technologies: Utilize big data technologies (e.g., distributed computing frameworks, data lakes) to manage and analyze large volumes of RWD efficiently.
      • AI-Driven Data Integration: Develop AI-driven approaches for data integration and preprocessing to automate data harmonization and improve data accessibility for AI applications.
  1. Bias and Fairness:
    • Challenge: AI algorithms trained on biased or incomplete datasets can widen disparities in healthcare outcomes. Bias may stem from demographic imbalances in the data, historical treatment practices, or incomplete representation of the community. Biased AI models are less generalizable to real-world populations and can lead to inequitable treatment recommendations, exacerbating healthcare disparities across patient populations.
    • Solutions: Conduct thorough bias assessments and audits of AI algorithms using diverse and representative RWD. Implement algorithmic transparency and explainability frameworks to understand how AI models make decisions. Regularly update and retrain AI models on diverse datasets to mitigate biases.
  1. Clinical Validation and Interpretability:
    • Challenge: AI models developed using RWD must undergo rigorous validation to ensure clinical relevance, reliability, and generalizability. Additionally, interpreting AI-generated insights and integrating them into clinical workflows require healthcare provider training and adaptation.
    • Solutions: Validate AI models through rigorous clinical trials and studies that demonstrate their reliability, accuracy, interpretability and clinical relevance. Develop user-friendly interfaces and decision support tools that healthcare professionals can easily interpret and integrate into clinical workflows.
  1. Cost and Resource Allocation:
    • Challenge: Implementing AI solutions in healthcare settings requires significant financial investment in technology, infrastructure, training, personnel and ongoing maintenance. Cost considerations may limit access to AI-driven healthcare advancements. The lack of appropriate resource allocation presents a challenge in harnessing AI in oncology. Furthermore, unequal access to AI technologies can widen healthcare disparities, disadvantaging patients and healthcare providers in resource-constrained settings.
    • Solutions: Foster public-private partnerships and collaborations to share resources and infrastructure costs associated with AI implementation. Explore innovative funding models, such as grants, subsidies, or reimbursement incentives, to support healthcare institutions adopting AI technologies. Promote scalability and cost-effectiveness in AI solutions through cloud computing and shared services (e.g., NIH STRIDES initiative).
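
As one illustration of what the bias assessment mentioned under Bias and Fairness might involve, the sketch below compares a model's sensitivity (true positive rate) across patient subgroups and reports the largest gap. This is only one of many possible fairness checks, and the function names, subgroup labels, and toy predictions are hypothetical and invented for illustration.

```python
from collections import defaultdict


def sensitivity_by_group(y_true, y_pred, groups):
    """Compute sensitivity (true positive rate) separately for each subgroup.

    Subgroups with no positive cases are omitted, since sensitivity is undefined.
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            actual_pos[group] += 1
            if pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def max_sensitivity_gap(rates):
    """Largest difference in sensitivity between any two subgroups."""
    return max(rates.values()) - min(rates.values())


# Toy, made-up labels and predictions for two hypothetical subgroups.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5

rates = sensitivity_by_group(y_true, y_pred, groups)
gap = max_sensitivity_gap(rates)
print(rates)  # {'A': 0.75, 'B': 0.5}
print(gap)    # 0.25
```

In this toy example the model detects 3 of 4 true cases in subgroup A but only 2 of 4 in subgroup B. A gap of this kind would prompt a closer look at how well subgroup B is represented in the training data.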

Future Directions

The synergy between AI and RWD in oncology holds promise for advancing cancer research and treatment. Overcoming the challenges outlined above requires collaborative efforts among stakeholders to improve data accessibility, enhance transparency, refine AI algorithms, and prioritize patient-centric outcomes. Applying AI to RWD in oncology represents a transformative shift toward achieving the ambitious goals of the Cancer Moonshot initiative: reducing the cancer death rate by at least 50% by 2047 and improving patient experiences. Huntsman Cancer Institute (HCI) at the University of Utah exemplifies this integration, leveraging AI across various workflows to deliver the cancer-free frontier!


Author’s Bio:
Aik Choon Tan received his B.Eng. degree in Chemical/Bio-process Engineering from the University of Technology Malaysia and his Ph.D. degree in Computer Science/Bioinformatics from the University of Glasgow, UK, in 2000 and 2005, respectively. Dr. Tan conducted his post-doctoral research training at the Johns Hopkins University School of Medicine from 2004 to 2009. He became an Assistant Professor at the University of Colorado Anschutz Medical Campus in 2009 and was promoted to Associate Professor in 2013. Dr. Tan was recruited to the Moffitt Cancer Center in 2019 as the Vice-Chair of the Department of Biostatistics and Bioinformatics. In 2022, Dr. Tan was appointed as the inaugural Senior Director of Data Science at the Huntsman Cancer Institute, University of Utah. He holds the Jon M. and Karen Huntsman Endowed Chair in Cancer Data Science and is Professor of Oncological Sciences and Biomedical Informatics. His research interests are translational bioinformatics and cancer systems biology, primarily developing computational and machine learning methods for the analysis and integration of high-throughput cancer “omics” data to understand and overcome treatment resistance mechanisms in cancer. His lab acts as a “connector” to provide seamless integration of computational and statistical methods in experimental and clinical cancer research.