From Data to Knowledge, Node by Node
By Mathias Brochhausen, PhD, Professor & Vice-Chair for Academic Programs & Faculty Development, Department of Biomedical Informatics (DBMI); Professor, Department of Medical Humanities and Bioethics; Associate Director for Strategic Collaborations, Translational Research Institute (TRI); Director, Clinical and Translational Sciences Program Graduate | University of Arkansas for Medical Sciences (UAMS)
In biomedical and clinical informatics, we are in the middle of a momentous transition that holds great promise for addressing the issues that keep us from providing better and timelier care for patients. The explosive growth of AI tools, from machine learning algorithms to large language models (LLMs), together with the ever-growing amount of data accessible for biomedical research, raises our hopes that a data-driven future will let us move the needle on many important issues, such as rare diseases, rural and global health, and faster translation from bench to bedside. However, for many clinical researchers and clinicians, how to integrate these data with the plethora of existing tools remains uncharted territory.
A key technology frequently mentioned as a way to bridge the gap between AI and medical data to drive discovery is the knowledge graph (KG) [1]. Rather than storing data in a table or a system of tables, as more traditional data management does, knowledge graphs represent data as nodes and edges, linking the data points in each domain of interest. KGs have shown great promise in multiple applications, including medical question answering [2]. Using KGs offers several advantages: they facilitate semantic integration of heterogeneous data, provide context for individual data elements, and typically enable inference. The graph-based representation itself expands the analytical opportunities by adding graph measures, such as node centrality, that can yield insightful information. Most importantly, KGs allow users to explore data visually, an option whose value, given the growing availability of biomedical data generated in different contexts (e.g., clinical care, clinical trials, or basic science research), can hardly be overestimated. Locally, but also in larger, national clinical research networks, stakeholders frequently need considerable assistance from data curators to establish which data elements can, and need to, be used to run queries that answer their research questions. This process can be streamlined by letting stakeholders explore the data structure and the available data through a KG. Taken together, these technical capabilities create a real opportunity to use KGs to transform data into knowledge.
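To make the contrast between tables and nodes and edges concrete, here is a minimal sketch that turns a few table rows into graph edges and computes one simple graph measure (node degree). It rests on assumptions: the table, its column names, and the predicate names (has_diagnosis, takes_medication) are hypothetical, and plain Python tuples stand in for what a production KG would usually store as RDF triples or in a graph database.

```python
from collections import defaultdict

# Hypothetical toy table: one row per patient, with illustrative column names.
rows = [
    {"patient": "P1", "diagnosis": "type_2_diabetes", "medication": "metformin"},
    {"patient": "P2", "diagnosis": "type_2_diabetes", "medication": "insulin"},
    {"patient": "P3", "diagnosis": "hypertension", "medication": "lisinopril"},
]


def rows_to_triples(rows):
    """Turn each table row into (subject, predicate, object) edges."""
    triples = []
    for row in rows:
        patient = row["patient"]
        # The predicate names are assumptions chosen for illustration.
        triples.append((patient, "has_diagnosis", row["diagnosis"]))
        triples.append((patient, "takes_medication", row["medication"]))
    return triples


def degree(triples):
    """Count the edges touching each node, ignoring edge direction."""
    counts = defaultdict(int)
    for subject, _, obj in triples:
        counts[subject] += 1
        counts[obj] += 1
    return dict(counts)


def neighbors(triples, node):
    """Return the nodes directly linked to `node`, in either direction."""
    found = set()
    for subject, _, obj in triples:
        if subject == node:
            found.add(obj)
        elif obj == node:
            found.add(subject)
    return sorted(found)


triples = rows_to_triples(rows)
deg = degree(triples)
print(neighbors(triples, "type_2_diabetes"))  # ['P1', 'P2']
print(deg["type_2_diabetes"])  # 2
```

The design point is that the diagnosis becomes a node of its own, so P1 and P2 are now connected through it, and graph measures such as degree can be computed directly on that structure.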
To realize the full potential of semantic integration, it is advisable to use ontologies to harmonize data and provide shared semantics. Ontologies are superior to traditional biomedical vocabularies and terminologies because they not only provide definitions in a human-interpretable format but also make the semantics of the data accessible to computers, allowing data from heterogeneous sources, e.g., the EHR and a research database, to be harmonized automatically. Even within a single database, ontologies offer advantages: they make it possible to check the consistency of the collected data and to ensure that its semantics follow the shared understanding of the data. Together, ontologies and KGs provide a highly useful representation of data that allows stakeholders to generate new hypotheses and change processes, as has been shown for trauma care [3]. While KGs are typically used to make large amounts of data accessible to large, often multinational communities, in medicine we are also working on ways to generate local knowledge graphs that support knowledge management at the institutional level. This is particularly relevant where we want to give clinicians or clinical scientists a full view of the types of data available, showing how data elements are interlinked.
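Harmonizing two sources through shared ontology terms, and checking the result for consistency, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the term IDs (EX:0001 and so on), the local codes, and the mappings are all hypothetical, and a dictionary stands in for a real ontology. The consistency check is also a simplification: it validates under a closed-world assumption, which is closer to SHACL-style shape validation than to OWL reasoning, where a range axiom leads the reasoner to infer a type and an inconsistency is reported only if that inference contradicts other axioms, such as disjointness.

```python
# Hypothetical toy ontology: shared term IDs with a label and a top-level category.
ONTOLOGY = {
    "EX:0001": {"label": "type 2 diabetes mellitus", "category": "disease"},
    "EX:0002": {"label": "hypertension", "category": "disease"},
    "EX:0100": {"label": "metformin", "category": "drug"},
}

# Assumed mappings from each source's local codes to the shared term IDs.
SOURCE_MAPPINGS = {
    "ehr": {"DM2": "EX:0001", "HTN": "EX:0002", "MET500": "EX:0100"},
    "research_db": {"T2D": "EX:0001", "high_bp": "EX:0002"},
}


def harmonize(source, records):
    """Replace local codes with shared ontology IDs; unmapped codes become None."""
    mapping = SOURCE_MAPPINGS[source]
    return [(rec_id, prop, mapping.get(code)) for rec_id, prop, code in records]


def check_diagnoses(records):
    """Closed-world check: each has_diagnosis value must be a known disease term."""
    problems = []
    for rec_id, prop, term in records:
        if prop != "has_diagnosis":
            continue
        if term is None or ONTOLOGY.get(term, {}).get("category") != "disease":
            problems.append((rec_id, term))
    return problems


ehr = harmonize("ehr", [("P1", "has_diagnosis", "DM2"),
                        ("P2", "has_diagnosis", "MET500")])
research = harmonize("research_db", [("S7", "has_diagnosis", "T2D"),
                                     ("S8", "has_diagnosis", "asthma")])
merged = ehr + research

# One query over the shared ID now finds records from both sources.
print([rec for rec, _, term in merged if term == "EX:0001"])  # ['P1', 'S7']
# P2 lists a drug as a diagnosis; S8 uses a code with no mapping.
print(check_diagnoses(merged))  # [('P2', 'EX:0100'), ('S8', None)]
```

The query works across sources only because both were mapped to the same identifier; the check surfaces records whose semantics do not fit the shared model.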
One potential pitfall for the widespread use of KGs in the biomedical domain is that KG developers often rely on home-grown, purpose-built ontologies to provide the semantics for their KGs. As a result, some of the core promises of ontologies and the underlying technologies, such as easy integration of data from multiple sources based on shared semantics, cannot be realized. The better option is to use collaboratively built ontologies that are shared across many users and projects. Doing so will ultimately facilitate the integration and harmonization of large portions of publicly available biomedical data. This strategy also allows you to merge publicly available data locally with clinical data from your institution to build KGs that help with decision making. A great source of ontologies is the Open Biological and Biomedical Ontologies (OBO) Foundry, which provides quality-checked ontologies built and maintained by active biomedical research communities [4].
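Why reusing a shared ontology matters for integration can be shown with a small sketch, assuming two toy graphs stored as sets of triples. All identifiers (SHARED:0001, LOCAL:diab2, GENE:A, PATIENT:1) are hypothetical. When the local clinical graph reuses the same term as a public dataset, merging the triples links them through that node; when it uses a home-grown identifier for the same disease, the merged graph stays split into disconnected components unless someone writes a mapping by hand.

```python
from collections import defaultdict


def components(triples):
    """Return connected components, ignoring edge direction and labels."""
    adj = defaultdict(set)
    for subject, _, obj in triples:
        adj[subject].add(obj)
        adj[obj].add(subject)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        comps.append(comp)
    return comps


# Public dataset annotated with a shared ontology term (hypothetical ID).
public = {("GENE:A", "associated_with", "SHARED:0001")}
# Local clinical KG that reuses the same shared term...
local_shared = {("PATIENT:1", "has_diagnosis", "SHARED:0001")}
# ...versus one built on a home-grown ontology with its own ID for the disease.
local_homegrown = {("PATIENT:1", "has_diagnosis", "LOCAL:diab2")}

print(len(components(public | local_shared)))     # 1: linked via the shared term
print(len(components(public | local_homegrown)))  # 2: no common node, no link
```

Merging here is nothing more than a set union; the shared identifier does the integration work, which is exactly what a home-grown ontology gives up.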
All the tremendous opportunities and staggering promise of KGs in healthcare cannot distract from the fact that their implementation faces a major hurdle: stakeholders and decision makers too often still see traditional table-based data management systems as the safer bet. They are, so to speak, “the devil we know”, while ontology-driven KGs are novel and feel different. Although we have seen a small shift towards NoSQL solutions in the past, e.g., data lakes, we must advertise and educate anew. Given the huge promise KGs hold for multi-level knowledge extraction, especially in combination with AI solutions, KGs are a technology that cannot and should not be ignored.
References
[1] Ding, K., Zhu, Z., Tang, Y., Feng, K., Zhuang, X., Wang, H., Yang, Y., Du, H., Ni, Z., Wang, S. and Fan, X., 2026. Bridging Data and Discovery: A Survey on Knowledge Graphs in AI for Science. National Science Review, p.nwag140.
[2] Rezaei, M.R., Fard, R.S., Parker, J.L., Krishnan, R.G. and Lankarany, M., 2025. Agentic medical knowledge graphs enhance medical question answering: Bridging the gap between LLMs and evolving medical knowledge. arXiv preprint arXiv:2502.13010.
[3] Chappell, E., Whorton, J., Villafranca, A.A., Shahriari, R., Brakenridge, S.C., Bennett, J.L., Ounpraseuth, S., Ragan, E.D., Hogan, W.R., Bona, J. and Sexton, K.W., 2025, November. Generation of Interactive Knowledge Graphs to Enable Research of the Effects of Trauma Center Organization on Patient Outcomes. In International Knowledge Graph and Semantic Web Conference (pp. 1-8). Cham: Springer Nature Switzerland.
[4] Jackson, R., Matentzoglu, N., Overton, J.A., Vita, R., Balhoff, J.P., Buttigieg, P.L., Carbon, S., Courtot, M., Diehl, A.D., Dooley, D.M. and Duncan, W.D., 2021. OBO Foundry in 2021: operationalizing open data principles to evaluate ontologies. Database, 2021, p.baab069.

