Towards a Framework for Interoperability and Reproducibility of Predictive Models

Increased computational power, alongside access to abundant electronic health records (EHRs) and other emergent types of data (e.g., omics, imaging, mHealth), is accelerating the influx of data-driven approaches for biomedical applications. Artificial intelligence and machine learning (AI/ML) methods show exciting potential in healthcare, providing novel predictive methods to improve screening, diagnosis, and treatment [1]. As these predictive models are deployed into real-world settings, several government agencies, including those in the European Union, Canada, and the United States (US), have started to publish guidance around their use [2]. Within the US, as of October 2022, over 500 algorithms are approved and/or under consideration by the Food and Drug Administration (FDA) for classification as Software as a Medical Device (SaMD) [3], [4], with many more awaiting approval.

Yet the promise of these models remains unfulfilled and their widespread adoption has not materialized, in part because of ongoing challenges and concerns around reproducibility [5]. Predictive model reproducibility is hampered by a lack of standard approaches for the model development and application pipeline; incomplete descriptions in publications (e.g., assumptions, preprocessing steps); and barriers presented by privacy-protected datasets and closed-source codebases. Notably, several reviews have shown that the quality of reporting in published articles describing the development or validation of multivariable prediction models in medicine is poor [6], [7]. Using AI/ML models in a clinical setting requires a full understanding of their appropriate contextual usage and abilities [8], [9]. In the absence of detailed and transparent reporting of key study details, it is difficult for the scientific and healthcare communities to objectively judge or externally validate the strengths and weaknesses of a predictive model study [10]. The inability to reproduce AI/ML models also prevents fair comparisons across competing models. Collectively, these issues can be attributed to a lack of agreed-upon methods for the development, analysis, and evaluation of predictive models [11]. There is a need for approaches that will facilitate AI/ML reproducibility in healthcare.

We are addressing this need by developing a web-based platform, the PREdictive Model Index and Exchange Repository (PREMIERE), with the goal of creating a collaborative ecosystem for sharing and evaluating predictive models by capturing every element involved in their design, testing, and deployment. Importantly, PREMIERE ensures that predictive models are FAIR (findable, accessible, interoperable, reusable) and reproducible [12]. As a first step in PREMIERE, we developed an Automated Metadata Pipeline (AMP) that facilitates reproducibility assessment of a given predictive model. Based on a review of existing ML checklists, standards [13], and representations of datasets and models (i.e., model cards and data cards) [14], we established a comprehensive set of information and metadata requisite for scientific reproducibility of different types of ML models. Leveraging the Predictive Model Markup Language (PMML) as a common data language for ML model representation, we implemented extensions to capture and validate the metadata, automatically assessing the completeness of a provided model for reproducibility. In this paper, we present PREMIERE’s AMP for streamlining the task of logging model metadata, dataset descriptions, evaluation methods, and performance metrics, illustrated through use cases involving different ML models across different scenarios and tasks.
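To make the idea of an automated completeness assessment concrete, the following is a minimal sketch, not the AMP implementation itself. It assumes that reproducibility metadata is stored in PMML `Extension` elements (which carry `name` and `value` attributes in the PMML schema) and that a model is scored by the fraction of required fields present. The field names in `REQUIRED_FIELDS`, the function `assess_completeness`, and the example document are hypothetical illustrations; the example is a fragment, not a complete valid PMML file.

```python
import xml.etree.ElementTree as ET

# Hypothetical required metadata fields; not PREMIERE's actual schema.
REQUIRED_FIELDS = (
    "dataset_description",
    "preprocessing_steps",
    "evaluation_method",
    "performance_metrics",
)


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}Extension' -> 'Extension'."""
    return tag.rsplit("}", 1)[-1]


def assess_completeness(pmml_text, required=REQUIRED_FIELDS):
    """Report which required metadata fields appear as non-empty Extensions."""
    root = ET.fromstring(pmml_text)
    found = {}
    for elem in root.iter():
        if local_name(elem.tag) == "Extension":
            name = elem.get("name")
            value = elem.get("value")
            # Count a field only if it has a name and a non-blank value.
            if name and value and value.strip():
                found[name] = value
    missing = [field for field in required if field not in found]
    score = (len(required) - len(missing)) / len(required)
    return {"score": score, "missing": missing}


# Hypothetical PMML fragment with two of the four required fields.
EXAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header>
    <Extension name="dataset_description" value="Hypothetical EHR cohort"/>
    <Extension name="evaluation_method" value="5-fold cross-validation"/>
  </Header>
</PMML>"""

report = assess_completeness(EXAMPLE_PMML)
print(report)
```

In this sketch, a missing or blank field lowers the score and is listed by name, so a model submitter can see exactly which details still need to be supplied before the model can be considered reproducible.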
