Machine Learning Applications in Nursing-Affiliated Research: A Systematic Review

Eun Joo Kim; Seong Kwang Kim

doi:10.7475/kjan.2025.0327

Invited Article

Eun Joo Kim¹, Seong Kwang Kim²

Korean Journal of Adult Nursing 2025;37(3):189-214.
Published online: August 29, 2025

DOI: https://doi.org/10.7475/kjan.2025.0327

¹Associate Professor, Department of Nursing, Gangneung-Wonju National University, Wonju, Korea

²Ph.D. Student, Department of Nursing, Gangneung-Wonju National University, Wonju, Korea

Corresponding author: Seong Kwang Kim, Department of Nursing, Gangneung-Wonju National University, 150 Namwon-ro, Heungop-myeon, Wonju 26403, Korea. Tel: +82-33-760-8650, Fax: +82-33-760-8641, E-mail: ksk1677@naver.com

• Received: March 27, 2025 • Revised: May 22, 2025 • Accepted: July 30, 2025

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Purpose
This study analyzed the methodological characteristics of machine learning (ML) applications in nursing research, evaluated their reporting quality against standardized guidelines, and assessed progress toward clinical implementation.
Methods
A PRISMA-compliant systematic review (PROSPERO CRD42024595877) searched nine English- and Korean-language databases through September 27, 2024. Included studies applied ML to a nursing question and had at least one nursing-affiliated author. Two reviewers independently extracted data following the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework. Reporting quality was appraised using the TRIPOD+AI checklist.
Results
Of 125 included studies, supervised learning predominated (93.6%), with random forest, logistic regression, and support vector machines as common algorithms. The most frequent performance metrics were the area under the receiver operating characteristic curve and accuracy. Mean TRIPOD+AI compliance was 50.4% (standard deviation=9.37), with reporting quality lowest for data preparation (48.0%) and class imbalance handling (22.4%). Research focused on predicting pressure injuries, falls, and readmissions. Only seven studies described clinical deployment, often citing ethical or workflow barriers.
Conclusion
While ML studies in nursing are increasing and show strong discriminatory accuracy, their impact is limited by inconsistent reporting, limited external validation, and rare clinical deployment. Translating these algorithms into practice requires adopting comprehensive reporting guidelines like TRIPOD+AI, documenting each CRISP-DM phase, and integrating nurse-centered decision-support pathways.
Key Words: Nursing; Machine learning; Systematic review

INTRODUCTION

1. Background

Artificial intelligence (AI) encompasses technologies that enable computers and machines to mimic human capabilities such as learning, understanding, problem-solving, decision-making, creativity, and autonomy [1]. AI is recognized as a defining technology of the Fourth Industrial Revolution [2], with the field continually achieving remarkable milestones—from Google DeepMind’s AlphaGo in 2016 to OpenAI’s ChatGPT in 2022 [3].

Machine learning (ML), a branch of AI, focuses on training algorithms to develop predictive or classification models based on data, allowing for learning and inference without explicit programming [1]. Enhanced computing power and the availability of large-scale datasets have fueled the rapid advancement and widespread adoption of ML across diverse sectors, including healthcare, finance, manufacturing, and transportation [4]. Due to its exceptional predictive capabilities, interdisciplinary research involving ML has expanded significantly, establishing it as a key technology for tackling contemporary challenges [4,5].

ML applications in nursing research are also becoming more prevalent [6]. For example, prior studies have predicted nursing students’ graduation likelihood using academic performance in their first year, achieving over 80% accuracy, which increased to as high as 99% with three-year longitudinal data, thus enabling automated, personalized assessments for students at risk of attrition [7]. ML has also been used to predict the 30-day readmission probability for heart failure patients based on multiple clinical variables [8], and to forecast nurse turnover rates using personnel data [9]. In addition, automated extraction and analysis of nursing documentation have enhanced administrative decision-making, improving both speed and accuracy [10]. Successful ML implementation allows nurses to dedicate more time to direct patient care, thereby raising the overall quality of nursing services [11].

Nevertheless, several limitations remain regarding the application of ML in nursing research. When predicting rare events—such as specific disease occurrences or serious medical conditions—datasets are often imbalanced, containing significantly fewer event cases than non-events. Such an imbalance can undermine a model’s ability to accurately detect rare cases, compromising both predictive performance and generalizability [12]. Moreover, heterogeneity in data collection methods—including varying formats, measurement techniques, and terminologies—leads to non-standardized datasets [12]. This heterogeneity complicates data preprocessing, diminishes model consistency and reliability [9], and is further aggravated by the challenges of data collection and limited participant recruitment commonly encountered in nursing studies, often resulting in small sample sizes [9]. Collectively, these factors adversely affect the performance and clinical relevance of ML models.
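To make the imbalance problem concrete, the following minimal sketch uses hypothetical numbers (not drawn from any included study) to show how a trivial model that never predicts the rare event can still report high accuracy while detecting no cases.

```python
# Hypothetical cohort (assumption): 1,000 patients, 50 of whom experience the event.
n_total = 1000
n_events = 50

# Labels: 1 = event (e.g., a fall), 0 = no event.
y_true = [1] * n_events + [0] * (n_total - n_events)

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * n_total

correct = sum(t == p for t, p in zip(y_true, y_pred))
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

accuracy = correct / n_total        # share of all predictions that are correct
sensitivity = true_pos / n_events   # share of true events that are detected

print(f"accuracy = {accuracy:.2f}, sensitivity = {sensitivity:.2f}")
```

In this illustration accuracy is 0.95 although sensitivity is 0, which is why accuracy alone can be misleading when events are rare.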

To harness the potential of ML in nursing, systematic analyses of current research are essential. Although the use of ML in nursing studies is expanding, comparison across studies remains difficult due to non-standardized evaluation criteria, inconsistent data handling methods, and variable reporting practices. Furthermore, assessments of practical applicability—including clinical utility, cost-effectiveness, and patient safety—are needed. Accordingly, this systematic review provides a comprehensive evaluation of the current landscape of ML in nursing research. Specifically, this study aims to (1) systematically identify and describe methodological characteristics using the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework; (2) critically assess methodological rigor and reporting transparency according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis for Artificial Intelligence (TRIPOD+AI) checklist; and (3) evaluate practical deployment and clinical readiness. Through this multi-dimensional analysis, we aim to identify key gaps and offer evidence-based recommendations to guide future research toward improved reproducibility, standardization, and, ultimately, clinical impact.

2. Conceptual Framework

In this systematic review, the CRISP-DM methodology [13] was adopted as the conceptual framework for systematically evaluating studies employing ML techniques.

Earlier standardized data analysis methodologies, such as Knowledge Discovery in Databases (KDD) and Sample, Explore, Modify, Model, Assess (SEMMA), also include data preprocessing stages. However, these frameworks tend to focus primarily on analytical techniques rather than comprehensively guiding the entire analysis process [14]. In contrast, CRISP-DM offers a structured workflow with explicit guidance for each phase, ensuring alignment with broader business objectives. A key feature of CRISP-DM is its iterative feedback loop, which allows movement between phases as needed. For instance, if problems are identified during data preparation, the process can return to the business understanding phase to revise objectives accordingly. This flexibility sets CRISP-DM apart from more linear models like KDD and SEMMA, making it especially suitable for practice-oriented data analysis and clinical applications in nursing research.

CRISP-DM is a standardized process model for data mining projects, encompassing six phases: (1) business understanding, (2) data understanding, (3) data preparation, (4) modeling, (5) evaluation, and (6) deployment. In this review, study procedures were structured based on the CRISP-DM methodology, as outlined in Table 1. This conceptual framework enabled a systematic evaluation of nursing literature involving ML techniques.

METHODS

The protocol for this study was registered with PROSPERO (registration number: CRD42024595877) on February 10, 2024.

1. Study Design

This study is a systematic review aimed at identifying and critically appraising the methodological characteristics of ML applications in nursing research. The review was conducted according to the guidelines of the Cochrane Handbook for Systematic Reviews of Interventions 6.4 [15] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [16].

2. Key Questions and Eligibility Criteria

To achieve the study’s overall objective of evaluating the methodological landscape and clinical readiness of nursing ML research, the following key questions were addressed. These were structured to align with the CRISP-DM framework and our central themes of reporting standards and practical application:

In the business understanding phase (phase 1), we examine the research objectives and questions of nursing studies applying ML techniques, and the reasons for applying ML in these studies. In the data understanding phase (phase 2), we investigate the data sources and data quality of nursing studies applying ML techniques. The data preparation phase (phase 3) examines the data preprocessing methods used in nursing studies applying ML techniques and the transparency with which they are reported to ensure reproducibility. In the modeling phase (phase 4), we assess how model training, validation, and testing are performed in nursing studies applying ML techniques. The evaluation phase (phase 5) considers the performance evaluation metrics used in nursing studies applying ML techniques and examines the extent to which they reflect clinical utility beyond algorithmic accuracy. The deployment phase (phase 6) addresses whether ethical considerations have been addressed and the degree to which these models have been practically deployed in clinical settings, bridging the gap from research to practice.

Inclusion criteria were as follows: (1) studies involving nursing research, operationalized as having at least one author affiliated with a nursing-related institution (e.g., nursing school or research center), chosen as an objective and reproducible proxy for identifying research likely to be informed by a nursing perspective; and (2) direct application of ML methodologies (e.g., predictive or classification models, including single-layer Artificial Neural Networks [ANNs]). During data extraction, studies were screened to ensure that only those employing ML techniques were included. Deep learning (DL) studies were identified and excluded based on the use of multi-layer neural networks. Studies were excluded if they: (1) lacked authors with a nursing background or affiliation; (2) did not directly apply ML as a methodology (e.g., studies evaluating ML-based wearable devices without directly applying ML); (3) were not human-subject studies (e.g., animal or plant experiments, or robot development); (4) were review articles; (5) lacked full research results (e.g., abstracts or poster presentations only); or (6) used DL methodologies (e.g., multi-layer neural networks). The exclusion of studies using DL was intentional to ensure methodological coherence. Traditional ML (e.g., random forest, support vector machine [SVM], logistic regression) and DL (e.g., models with multi-layer neural networks such as Convolutional Neural Networks [CNNs] or Recurrent Neural Networks [RNNs]) often differ significantly regarding data types, feature engineering, and model complexity. Combining these distinct paradigms would have introduced substantial heterogeneity, potentially obscuring trends specific to each. Thus, this review focuses on traditional ML techniques, which represent the foundational and most prevalent approach in the nursing literature to date.

3. Literature Search and Selection Process

Two researchers independently conducted the literature search from September 27 to September 28, 2024, including all literature published up to September 27, 2024. To effectively identify nursing-related literature, electronic databases were selected using the Core, Standard, Ideal (COSI) model proposed by the National Library of Medicine (NLM) [17]. Korean databases included KoreaMed, Kmbase, KISS, NDSL, and KISTI, while international databases comprised Cochrane CENTRAL, MEDLINE, and Embase. Additional sources included PubMed (provided by the NLM), specialized databases such as CINAHL and PsycINFO, and broad academic databases like Scopus and Web of Science.

Advanced searches were conducted using the Participant, Intervention, Comparison, Outcome, Time, Setting, and Study Design (PICOTS-SD) framework. The core search strategy combined terms related to participants (using keywords such as “nurs*” to capture variations of nurse/nursing) with terms representing study design focused on ML methodologies. Specifically, searches included “machine learning” together with techniques such as “classif*,” “regress*,” “predict*,” “forecast*,” “cluster*,” “dimensionality reduction,” “reinforcement learning,” or “policy learning” (using OR logic within this group and AND logic to combine with “nurs*”). The final search string was: ((Nurs*) AND ((Machine Learning) AND (Classif* OR Regress* OR Predict* OR Forecast* OR Cluster* OR "Dimensionality Reduction" OR "Reinforcement Learning" OR "Policy Learning"))).
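The Boolean structure of the search string can be sketched programmatically. The snippet below is only an illustration of how the OR group and the AND combinations nest; the function name build_query and its parameters are hypothetical and are not part of the review's protocol.

```python
# Technique terms joined with OR; multi-word phrases are quoted, as in the text.
technique_terms = [
    "Classif*", "Regress*", "Predict*", "Forecast*", "Cluster*",
    '"Dimensionality Reduction"', '"Reinforcement Learning"', '"Policy Learning"',
]


def build_query(population: str, method: str, techniques: list[str]) -> str:
    """Combine the blocks as: population AND (method AND (technique OR ...))."""
    technique_group = " OR ".join(techniques)
    return f"(({population}) AND (({method}) AND ({technique_group})))"


query = build_query("Nurs*", "Machine Learning", technique_terms)
print(query)
```

Keeping the technique terms in a single list makes it straightforward to apply the same string across databases, with only the field restriction (Title/Abstract) adjusted per platform.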

Keywords were searched within Titles and Abstracts; when simultaneous searching in both fields was not possible, Abstract searches were prioritized. The goal was a comprehensive and systematic review of the extensive nursing-related literature. To ensure inclusivity and capture a broad range of relevant terms, truncation with the asterisk (*) was intentionally applied. For example, searching for “Nurs*” retrieved terms such as “nurses,” “nursing,” “nursing student,” and “nurse aide.” Controlled vocabularies like MeSH (for PubMed) or Emtree (for Embase) were not used. With the exception of database-specific adjustments to the Title/Abstract field, the same search string was otherwise used unchanged across all databases (Supplementary Table 1).

The initial database search yielded 540 articles from PubMed, 611 from Embase, 438 from MEDLINE, 72 from the Cochrane Library, 181 from CINAHL, 52 from PsycINFO, 548 from Web of Science, 1,199 from Scopus, and 12 from ScienceON, totaling 3,653 records. After removing 2,102 duplicates using EndNote 21, 1,551 records remained for screening. Titles and abstracts were reviewed, resulting in the exclusion of 1,285 records that were clearly unrelated to the research topic. Because detailed methodological information is often unavailable at this stage, a conservative approach was taken: records were excluded only when no relevance was evident, while ambiguous or potentially related items were retained for full-text screening. Subsequently, 266 full-text articles were assessed against the exclusion criteria, with 114 meeting the inclusion criteria. An additional 11 studies were identified by screening the reference lists of related systematic reviews. Ultimately, 125 studies were included in the final synthesis (Figure 1). The full list of the 125 included studies is provided in Appendix 1, cited in-text with an “A” prefix (e.g., [A1]). The 152 studies excluded at the full-text screening stage, along with reasons for exclusion, are listed in Appendix 2 and are cited with an “E” prefix (e.g., [E1]) when referenced.
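The selection flow described above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Records retrieved per database, as reported in the text.
records = {
    "PubMed": 540, "Embase": 611, "MEDLINE": 438, "Cochrane Library": 72,
    "CINAHL": 181, "PsycINFO": 52, "Web of Science": 548, "Scopus": 1199,
    "ScienceON": 12,
}

identified = sum(records.values())                   # total records identified
screened = identified - 2102                         # after duplicate removal
full_text = screened - 1285                          # after title/abstract screening
included_from_db = 114                               # met inclusion criteria
excluded_full_text = full_text - included_from_db    # excluded at full-text stage
included_total = included_from_db + 11               # plus reference-list screening

print(identified, screened, full_text, excluded_full_text, included_total)
```

The computed values (3,653 identified, 1,551 screened, 266 full texts, 152 excluded, 125 included) agree with the figures reported in the paragraph.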

4. Quality Assessment of Included Studies

The quality of the included studies was assessed using the TRIPOD+AI checklist [18]. TRIPOD+AI is an extension of the original TRIPOD 2015 guidelines, providing a comprehensive 27-item assessment tool specifically designed to ensure transparent reporting in ML-based predictive model studies. This checklist is optimized for evaluating both methodological rigor and the reliability of result interpretation in medical AI research. TRIPOD+AI systematically evaluates key domains, including: standardized reporting of model development and validation processes (e.g., data sources [item 5a], justification of sample size [item 10], and handling of missing data [item 11]), thereby emphasizing methodological transparency; AI-specific methodological considerations such as hyperparameter tuning (item 12c), class imbalance handling (item 13), and algorithmic fairness evaluation (item 14); and clinical applicability, including interpretation of model outcomes (item 15) and user interaction requirements in real-world clinical environments (item 27b), thus facilitating evaluation of nursing utility. Two independent reviewers (reviewer A and reviewer B), both trained in the TRIPOD+AI guidelines, assessed each included study. Discrepancies were discussed face-to-face; if consensus was not reached, a third reviewer (reviewer C) adjudicated.

5. Ethical Considerations

This study is a secondary data analysis of previously published literature, conducted using systematic review methodology. Ethical review approval was requested from the Institutional Review Board (IRB) of the researchers’ affiliated institution (GWNUIRB-R2024-65), which confirmed that this research does not involve human subjects and is therefore exempt from further ethical review.

6. Data Analysis

The selected final articles were descriptively summarized in a case report format using Excel 2016 (Microsoft, Redmond, WA, USA). The case report template consisted of 29 items structured according to the phases of the CRISP-DM methodology, as shown in Table 2. To improve coherence, the RESULTS section is organized in a two-tier hierarchy: (1) Methodological Characteristics of ML Studies, sub-organized by the CRISP-DM stages, and (2) Reported Outcomes and Practical Impact.

RESULTS

1. Methodological Characteristics of ML Studies

1) Quality assessment of included studies

The methodological quality of the 125 included studies was evaluated using the 27-item TRIPOD+AI checklist, which assesses transparent reporting in ML-based predictive model research. The average compliance rate was 50.4% (standard deviation=9.37), with a range from 22.9% to 79.2%. Most studies consistently reported on model development or validation (98.4%), data sources (96.8%), and outcome definitions (98.4%). However, compliance was notably lower for data preparation (48.0%), class imbalance handling (22.4%), and algorithmic fairness (2.4%). No studies fully adhered to abstract reporting guidelines or shared their protocols publicly. These reporting gaps highlight persistent challenges in achieving transparency and reproducibility, particularly in areas such as preprocessing and fairness, which are critical issues for nursing-related applications. Only seven studies reported real-world deployment, further underscoring the limited clinical translation of ML models in nursing research. Notably, the lowest quality scores were observed in the studies by Choi et al. [A8] and Chavan et al. [A92], both with a score of 22.9%. In contrast, the highest quality score was achieved by Shao et al. [A106], at 79.2%.
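For readers unfamiliar with how a checklist compliance rate can be summarized, the following sketch assumes that each item is rated as reported or not reported and that compliance is the percentage of items reported. The ratings are hypothetical, and the review does not specify whether a sample or population standard deviation was used; the sample version is assumed here.

```python
from statistics import mean, stdev

# Hypothetical item-level ratings for three studies (True = item reported).
# A real appraisal would cover every TRIPOD+AI item and sub-item.
ratings = {
    "study_1": [True, True, False, False],
    "study_2": [True, False, False, False],
    "study_3": [True, True, True, False],
}


def compliance_rate(items: list[bool]) -> float:
    """Percentage of checklist items rated as reported."""
    return 100 * sum(items) / len(items)


rates = {study: compliance_rate(items) for study, items in ratings.items()}
values = list(rates.values())

mean_rate = mean(values)   # average compliance across studies
sd_rate = stdev(values)    # sample SD (assumption; see lead-in)

print(rates, mean_rate, sd_rate)
```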

2) General characteristics of the included studies

An analysis of country of origin showed that the United States produced the largest share of studies (n=49, 30.8%), followed by China (n=23, 14.5%) and South Korea (n=17, 10.7%).

Examining the publication years from 2006 to 2024, only a few studies appeared in the early years: 2 (1.6%) in 2006, and just 1 each (0.8%) in 2007 and 2008. A significant increase occurred beginning in 2020, with 11 studies (8.8%) published in 2020, 20 (16.0%) in 2021, 18 (14.4%) in 2022, and 24 (19.2%) in 2023. The highest number, 26 studies (20.8%), was published in 2024.

Analysis of research team composition revealed that studies with 0%–25% nurse involvement were least common (n=16, 12.8%), whereas those with 75%–100% nurse involvement were most frequent (n=42, 33.6%), indicating that nurses comprised at least 25% of the team in the majority of studies.

Among the 125 studies, supervised learning methods were most prevalent (n=117, 93.6%), while unsupervised learning was rare (n=5, 4.0%), and a mixed approach appeared in three studies (2.4%).

In terms of algorithm types, classification algorithms dominated (n=101, 80.8%), followed by regression (n=14, 11.2%) and clustering algorithms (n=7, 5.6%). The most common research objective was pressure injury/ulcer prediction (n=24, 19.2%), followed by readmission- or utilization-related outcomes (n=17, 13.6%), and fall-risk prediction (n=10, 8.0%). The remaining studies (n=74, 59.2%) covered a broad range of topics, including mental health screening, infection detection, workload assessment, and violence prevention.

With respect to journal distribution, CIN: Computers, Informatics, Nursing was the most frequently represented journal (n=7, 5.6%), followed by Applied Clinical Informatics, International Journal of Environmental Research and Public Health, and Journal of the American Medical Informatics Association, each with four studies (3.2%). Additionally, BMC Medical Informatics and Decision Making, International Journal of Medical Informatics, Journal of Advanced Nursing, Journal of Nursing Management, and Nursing Research each published three studies (2.4%) (Table 3).

3) Summary of results according to phase 1: business understanding

Analysis of the 125 included studies identified pressure ulcers, falls, and hospital readmissions as major research foci. Pressure ulcer studies concentrated on early prediction of risk among hospitalized and postoperative patients, aiming to improve nursing quality and patient safety through targeted prevention strategies. Many emphasized the use of ML-based predictive models to identify high-risk patients and enable preventive nursing interventions. Studies related to falls focused on predicting fall risk in both hospitalized patients and nursing home residents, using ML to analyze clinical records, identify key risk factors, and enhance intervention strategies. Research on readmission prediction and management prioritized early identification of high-risk patients to improve management and reduce healthcare costs through timely intervention. These studies often targeted populations such as patients with diabetes, pediatric patients, and individuals requiring post-acute care, developing predictive models based on ML techniques.

Regarding data mining analysis frameworks, the majority of studies (n=100) did not specify an analytic framework (“NI”—no information). Of those that did, CRISP-DM was most commonly mentioned (n=3), followed by KDD (n=2). Other frameworks cited once each included Data, Information, Knowledge, Wisdom (DIKW), the Ahituv Information Flow Model (Ahituv IFM), Plan-Do-Study-Act (PDSA), and the Healthcare Process Modeling to Phenotype Clinician Behaviors Framework (HPM-ExpertSignals). Supplementary Table 2 provides detailed descriptions of the objectives and data mining frameworks employed by all included studies.

4) Summary of results according to phase 2: data understanding

Analysis of the 125 included studies showed that R, Python, and SPSS were the most commonly used tools for data analysis. Additional software packages reported in some studies included SAS (SAS Institute, Cary, NC, USA), Weka (University of Waikato, Hamilton, New Zealand), MATLAB (MathWorks, Natick, MA, USA), MeCab (Nara Institute of Science and Technology, Ikoma, Japan), JMP Pro (SAS Institute), and Modeller (University of California, San Francisco, CA, USA). However, 21 studies did not explicitly specify which software was used. Regarding data sources, electronic medical records and electronic health records (EHRs) were the primary datasets. Other major sources included survey data, student academic records, nursing documentation, and hospital administrative data. Studies utilizing image or audio data were relatively rare.

Exploratory data analysis (EDA) primarily included descriptive statistical methods such as frequency analysis, percentages, mean, and standard deviation. Other methods, such as minimum and maximum values, interquartile range, and data ranges, were less frequently employed. Some studies included natural language processing analyses utilizing text mining, while clustering and dimensionality reduction analyses were occasionally employed. Regarding study populations, patient-centered research was most common, followed by studies targeting nurses, nursing home residents, and nursing students. Studies involving hospital administrators and older adults residing in the community were also present. The study with the largest sample analyzed data from approximately 3.6 million patients, followed by another study analyzing around 1.93 million patient episodes. Another large-scale study included about 1.53 million patients.

In contrast, the smallest study involved around 1,300 qualitative data points collected from 43 patients. Supplementary Table 3 provides detailed information on the software, data sources and types, EDA methods, study populations, and sample sizes for each included study.

5) Summary of results according to phase 3: data preparation

Analysis of data preprocessing techniques across the 125 included studies showed that handling missing data was the most frequently employed approach. Methods for managing missing values included simple or multiple imputation, K-nearest neighbors imputation, omission of missing cases, and various recoding strategies. Standardization and encoding were the next most common preprocessing techniques. Standardization methods included Z-score standardization, scaling, and zero-centering, while encoding approaches comprised label encoding, one-hot encoding, use of dummy variables, and binary recoding. Feature selection methods included variable selection, identification of key predictors, the least absolute shrinkage and selection operator, and recursive feature elimination with cross-validation. Normalization techniques such as min-max normalization and other data normalization strategies were also described.
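Three of the preprocessing steps named above (Z-score standardization, min-max normalization, and one-hot encoding) can be illustrated with a minimal, standard-library sketch. The example values are hypothetical, and the included studies would typically have relied on established statistical or ML packages rather than hand-written functions.

```python
from statistics import mean, pstdev


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Map each category to a 0/1 indicator vector (columns sorted by name)."""
    levels = sorted(set(categories))
    return [[int(c == level) for level in levels] for c in categories]


ages = [40, 50, 60, 70, 80]            # hypothetical continuous predictor
wards = ["ICU", "surgical", "ICU"]     # hypothetical categorical predictor

standardized = z_score(ages)
scaled = min_max(ages)
encoded = one_hot(wards)

print(scaled, encoded)
```

In a predictive modeling workflow, the scaling parameters (mean, standard deviation, minimum, maximum) would normally be estimated on the training partition only and then applied to the test partition, to avoid information leakage.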

The majority of studies used fewer than 50 predictor variables, although studies employing text mining techniques (e.g., term frequency-inverse document frequency or Word2Vec), imaging, or sensor data often involved high-dimensional datasets with more than 100 predictors, sometimes ranging from 800,000 to 1,000,000 dimensions. In certain clinical and nursing studies, the number of independent variables varied depending on the feature engineering process and the study stage.

The most common outcome or target variables were related to pressure ulcers, such as pressure injury risk, occurrence, and hospital-acquired pressure injury. Studies on falls often analyzed fall occurrence and severity or type, from binary outcomes (fall/no fall) to more nuanced classifications. Readmission and healthcare utilization studies targeted outcomes such as readmission within defined timeframes (e.g., 30-day, 90-day), emergency department visits, hospital length of stay, and frequency of hospital visits. Additional studies focused on predicting infections (e.g., sepsis, urinary tract infections), mental and psychological states (depression, anxiety, burnout, suicide risk), and mortality.

For data partitioning, simple proportional splits (such as 70% training/30% testing or 80% training/20% testing) were common. Cross-validation methods, including 10-fold, 5-fold, 3-fold, and leave-one-out cross-validation, were also frequently applied. Supplementary Table 4 provides detailed information on data preprocessing methods, predictor and outcome variables, and data partitioning strategies used in the 125 included studies.
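The two partitioning strategies mentioned above can be sketched as index-generating helpers. This is a simplified illustration with hypothetical function names; it does not stratify by outcome, which may matter for imbalanced data.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and hold out a fraction for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists; each sample is validated once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 70%/30% split of 100 hypothetical records.
train_idx, test_idx = train_test_split_indices(100, test_fraction=0.3)

# 10-fold cross-validation on the same 100 records.
folds = list(k_fold_indices(100, k=10))

print(len(train_idx), len(test_idx), len(folds))
```

Setting k equal to the number of records yields leave-one-out cross-validation, in which every validation fold contains a single record.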

6) Summary of results according to phase 4: modeling

Analysis of the 125 included studies revealed that random forest was the most frequently employed ML algorithm, followed by logistic regression (including several modified forms), SVM (including variants), decision trees (including variants), and eXtreme Gradient Boosting (XGBoost). Other algorithms reported included CatBoost, Gradient Boosting Machine (GBM), LightGBM, Elastic Net, stochastic gradient descent, and Bayesian networks. Regarding hyperparameter optimization and parameter settings, most studies either used default parameters or did not report any details (“NI”). This indicates a general lack of explicit reporting or limited use of advanced hyperparameter tuning. Among studies that described tuning approaches, grid search was the most commonly used method. Supplementary Table 5 contains detailed descriptions of the ML models and hyperparameter optimization methods used in each study.
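Grid search, the tuning approach most often described, evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below shows the idea with a toy scoring function standing in for a cross-validated metric; the parameter names and values are hypothetical and do not come from any included study.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination in the grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a tree ensemble; names are illustrative only.
param_grid = {"n_trees": [100, 300, 500], "max_depth": [3, 5, 7]}


def toy_validation_score(params):
    """Stand-in for a cross-validated metric; peaks at 300 trees, depth 5."""
    return (0.80
            - abs(params["n_trees"] - 300) / 10000
            - abs(params["max_depth"] - 5) / 100)


best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params, best_score)
```

Reporting the grid, the selection metric, and the chosen values alongside the final model is one way to address the reporting gap noted above.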

2. Reported Outcomes and Practical Impact

1) Summary of results according to phase 5: evaluation

Analysis of performance metrics and ML model usage across the 125 studies showed that the most frequently reported metric was the area under the receiver operating characteristic curve (AUC-ROC), cited in 68 studies. Accuracy was reported in 64 studies, while sensitivity and specificity appeared in 52 and 47 studies, respectively. Other reported metrics included F1-score (n=26), precision (n=25), positive predictive value (n=20), recall (n=19), and negative predictive value (n=18).
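The metrics listed above are all derived from the confusion matrix or from the ranking of predicted scores. The following sketch, using hypothetical labels and scores, shows the standard definitions; note that recall is the same quantity as sensitivity, and precision is the same quantity as positive predictive value.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics; assumes no denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # also called precision
    npv = tn / (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted risks for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]   # classification threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, auc)
```

Unlike the threshold-based metrics, AUC-ROC depends only on how the scores rank events relative to non-events, which is why it is often reported together with sensitivity and specificity at a chosen cut-off.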

Random forest was most often identified by individual studies as the highest-performing algorithm in their comparisons (35 studies), either as the exclusive model or within comparative analyses. Logistic regression was the top performer in 15 studies, followed by XGBoost (n=11, including studies in which it was the only model used), GBM (n=8), and SVM-based methods (n=7). Additional algorithms such as M5P tree, ANNs, and Bayesian networks were also used in several studies.

For feature importance, 42 studies provided no information (“NI”) or did not report explicit importance analyses. A generic “feature importance” measure was reported in 19 studies; Shapley additive explanations (SHAP) (n=10), information gain (n=6), permutation importance, recursive feature elimination, and Gini impurity each appeared in multiple studies. Other methods occasionally used included logistic regression coefficients, mutual information, Markov blanket analysis, and normalized importance. Supplementary Table 6 gives detailed accounts of performance metrics, best-performing models, and feature importance techniques for each included study.
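As an illustration of one of the model-agnostic techniques named above, permutation importance measures how much a performance metric drops when the values of a single feature are shuffled. The sketch below uses a hypothetical rule-based predictor and toy data; it is not taken from any included study.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in the metric when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = metric(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict(row):
    """Hypothetical fitted model: uses feature 0 and ignores feature 1."""
    return int(row[0] > 0.5)


X = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 4.0], [0.3, 2.0], [0.7, 6.0]]
y = [0, 1, 0, 1, 0, 1]

importances = permutation_importance(predict, X, y, accuracy)
print(importances)
```

Because the toy predictor ignores the second feature, shuffling it cannot change the predictions, so its importance is exactly zero, whereas shuffling the first feature degrades accuracy.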

2) Summary of results according to phase 6: deployment

Among the 125 studies analyzed, hospital settings were the most frequently represented research environments, followed by home healthcare, nursing homes, community-based settings, and general healthcare contexts. IRB ethical approval was obtained in 95 studies (76.0%), while 30 studies (24.0%) did not report or obtain IRB approval.

Practical deployment and real-world implementation were described in seven studies. Examples included development of a mobile application for early delirium screening in long-term care, establishment of EHR systems for automated patient risk-factor collection, creation of AI-based systems for assessing emergency department visits and hospitalization risk, web-based applications applying ML models for stroke mortality prediction, pressure injury risk prediction for intensive care unit patients, and provision of online decision-support platforms along with real-time telemedicine services. Supplementary Table 7 provides detailed information about each study’s setting, IRB approval status, and deployment cases.

A summary of all findings based on the CRISP-DM framework is provided in Table 4.

DISCUSSION

This review was designed to systematically evaluate the methodological characteristics, reporting quality, and clinical translation of ML in nursing research. Our findings reveal a field facing a critical paradox. While the use of ML is rapidly expanding, its translation into robust, reproducible, and clinically impactful tools is hindered by significant methodological shortcomings. This is starkly illustrated by a mean TRIPOD+AI compliance rate of only 50.4%, which signals substantial gaps in both reporting transparency and methodological rigor [18]. Low reporting rates for data preparation (48.0%), class imbalance handling (22.4%), and algorithmic fairness (2.4%) undermine transparency, particularly in the CRISP-DM data preparation and modeling phases. Many studies also lacked clear frameworks during the business understanding phase, further weakening the prospects for clinical translation. Inadequate reporting of preprocessing and validation increases the risk that overfitting goes undetected and reduces reproducibility, while neglect of fairness considerations may perpetuate social biases [19]. To address these issues, future research should adhere to TRIPOD+AI, publish relevant artifacts (e.g., preprocessing pipelines, fairness audits), and systematically address data imbalance to enhance clinical reliability. Standardized reporting is essential to align the rapid growth of the field with higher methodological quality.

Our analysis reveals that the field is both maturing and globalizing, but this growth is marked by significant imbalance in its geographic and methodological focus, potentially producing a skewed evidence base [20,21]. The marked surge in publications since 2021, driven by improved data access and the impacts of coronavirus disease 2019, demonstrates rapid expansion [22,23]. However, this growth remains concentrated in certain countries and is heavily focused on supervised learning algorithms. This focus may bias the clinical questions addressed and the solution strategies chosen from a CRISP-DM business understanding perspective, thereby limiting global generalizability and methodological diversity. On a positive note, the strong presence of nurses, with about two-thirds of studies (66.4%) including at least 50% nursing authors, remains a notable strength [24]. Direct nursing involvement is critical for ensuring that research addresses authentic clinical problems and that models are designed with practical workflows in mind, thus improving clinical relevance and the likelihood of successful implementation. The prominence of journals like CIN: Computers, Informatics, Nursing underscores the field’s shift toward digital methods but also highlights the need for broader dissemination across a wider spectrum of clinical and general nursing journals.

In the business understanding phase, our findings indicate that research has overwhelmingly focused on fundamental, high-impact clinical challenges: pressure ulcers, falls, and hospital readmissions. These topics not only appear frequently in the literature but also represent core nursing-sensitive outcomes where ML has a clear potential to enhance patient safety and optimize care quality. For example, the sustained emphasis on pressure ulcer prediction, from early explorations [25] to recent, more sophisticated applications [26], reflects an ongoing effort to shift from reactive treatment to proactive prevention. Similarly, the evolution of fall prediction models from initial systems [27] to tailored applications in nursing homes [28] demonstrates the field’s progression toward targeting high-risk populations. Studies of readmission prediction, spanning patient groups from adults with diabetes [29] to pediatric populations [30], further illustrate the strategic use of ML to advance system-level goals such as cost reduction and continuity of care. Despite this clear clinical focus, a substantial gap exists in the formal use of data mining frameworks. Although CRISP-DM is the most frequently cited methodology, indicating recognition of the need for structured approaches [13], its principles were rarely applied thoroughly. This reveals a frequent disconnect between business understanding and data understanding, undermining the foundations for effective modeling and deployment.

A central paradox emerged in the Data Preparation and Modeling phases: although studies increasingly employ sophisticated techniques, progress was undermined by a persistent lack of reporting transparency, which threatens reproducibility. In the data preparation phase, for instance, researchers used a wide array of methods, ranging from standard encoding to advanced multiple imputation [31,32], and analyzed complex variables for predicting outcomes such as infections and mental health. In the modeling phase, random forests remained the most popular algorithm, recognized for their robust performance with complex data [33], alongside a wide array of other models from interpretable logistic regression to various boosting algorithms. However, this methodological sophistication was not matched by reporting rigor. Our TRIPOD+AI analysis identified transparency gaps that directly compromise reproducibility: for example, only 48.0% of studies reported on missing data handling (Item 11), and just 22.4% described their approach to class imbalance (Item 13), which is a crucial aspect for many clinical outcomes. This deficit extended to the modeling phase, where hyperparameter tuning was often limited to basic grid search [34], and no study reported plans for model recalibration (Item 12f) to address performance drift after deployment. This disconnect between methodological application and transparent reporting severely restricts replication and obscures risks, such as those related to class imbalance in rare but important clinical outcomes. Bringing these methodological threads together, we propose that future nursing ML studies explicitly map every methodological decision to the relevant CRISP-DM phase and publish corresponding artifacts, such as problem definition sheets, EDA dashboards, preprocessing pipelines, tuning logs, and drift-monitoring plans. This level of transparency will enhance reproducibility, facilitate peer auditing, and accelerate clinical translation.
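To make the two reporting items discussed above concrete, the following minimal sketch shows what a publishable preprocessing summary might contain: the missing-data rate per predictor and inverse-frequency class weights for an imbalanced outcome. The cohort, the field names, and the `report` structure are hypothetical. The weighting formula, n_samples / (n_classes × class count), is one common heuristic and is not necessarily the method used by the reviewed studies.

```python
from collections import Counter


def missing_rate(records, field):
    """Share of records in which `field` is absent or None."""
    return sum(r.get(field) is None for r in records) / len(records)


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical cohort: 20 patients, 2 falls, albumin missing for 5 patients.
records = [
    {
        "age": 60 + i,
        "albumin": None if i % 4 == 0 else 3.5,
        "fall": int(i in (3, 11)),
    }
    for i in range(20)
]

# A small, shareable summary of preprocessing-relevant facts.
report = {
    "missing_rate": {f: missing_rate(records, f) for f in ("age", "albumin")},
    "class_weights": balanced_class_weights([r["fall"] for r in records]),
}
```

Here the rare outcome (2 of 20 patients) receives a weight of 5.0 and the common class about 0.56, so both classes contribute equally to a weighted loss. Publishing a summary of this kind with the model would let readers see how missingness and imbalance were handled.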

In the evaluation phase, our analysis shows that nursing ML research continues to prioritize algorithmic performance over clinical interpretability, thus limiting translational potential. The reliance on global performance metrics, primarily AUC-ROC and accuracy, demonstrates an emphasis on overall model correctness. This pattern aligns with the frequent identification of ensemble methods like random forest as the top-performing algorithm, praised for their high predictive accuracy on complex nursing datasets [33]. Yet, this focus on aggregate performance can obscure clinical utility. For example, greater attention should be paid to metrics such as sensitivity and specificity, which often hold more clinical significance (e.g., minimizing false negatives for high-risk conditions). This narrow emphasis is compounded by a lack of model interpretability: most studies limited themselves to simple performance comparisons, with relatively little use of tools such as SHAP to explain predictions. This failure to prioritize explainability remains a major barrier to building clinical trust and ensuring that model decisions are safe and equitable. By focusing narrowly on a limited set of performance metrics—without sufficient regard for explainability or fairness audits—current practice falls short of fully satisfying the goals of the CRISP-DM evaluation phase, which should integrate model assessment with broader objectives for patient safety and health equity.
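The point about aggregate metrics can be illustrated with a small worked example. The sketch below uses invented labels for 100 hypothetical patients, 10 of whom experience the event, and computes accuracy, sensitivity, and specificity from the confusion-matrix counts.

```python
def clinical_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical predictions: the model detects only 2 of the 10 events
# and raises 1 false alarm among the 90 patients without the event.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 2 + [0] * 8 + [1] * 1 + [0] * 89

metrics = clinical_metrics(y_true, y_pred)
```

With these invented numbers, accuracy is 0.91 while sensitivity is only 0.20: the model looks strong on the aggregate metric yet misses 8 of 10 events, which is the kind of clinical shortfall that accuracy alone can hide.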

The most significant gap identified in this review lies in the deployment phase, as the vast majority of studies do not progress beyond model evaluation to real-world clinical implementation. Notably, only seven of the 125 analyzed studies reported any form of practical deployment, highlighting a critical research-to-practice gap. This scarcity reflects the immense challenges of clinical integration, which go far beyond model development and require addressing complex safety, reliability, and ethical issues [35]. While hospital-based research remains dominant, recent expansion into home and community care settings is a promising trend. Nevertheless, to bridge the divide between high-performing models and tangible patient impact, future work must prioritize implementation science. This includes developing user-centered evaluation standards and strengthening research on embedding ML tools into diverse clinical workflows to genuinely improve care quality and patient safety.

Concrete solutions include participatory design, real-time missing-data pipelines, and cluster randomized controlled trials. To translate predictive performance into bedside impact, we propose a three-tiered roadmap: (1) Participatory co-design, which involves engaging bedside nurses in early prototype testing to align alert frequency with cognitive load; (2) Integration with EHR and Clinical Decision Support, by leveraging Fast Healthcare Interoperability Resources-based Application Programming Interfaces so that model outputs populate existing decision-support widgets instead of separate dashboards; and (3) Prospective hybrid trials, combining A/B-tested usability endpoints with cluster randomized outcome metrics to evaluate both adoption and effectiveness. Key barriers include data governance concerns, alert fatigue, and algorithmic bias. Mitigation strategies may include federated learning to address data privacy, threshold-adaptive alerting, and continuous fairness audits.
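One simple form that a recurring fairness audit could take is a comparison of sensitivity (true-positive rate) across patient subgroups, sometimes called an equal-opportunity difference. The sketch below is illustrative only: the labels, the subgroup codes "A" and "B", and the 0.2 review tolerance are hypothetical choices, not values proposed by the reviewed studies.

```python
from collections import defaultdict


def sensitivity_by_group(y_true, y_pred, groups):
    """True-positive rate within each subgroup, using only true events."""
    true_pos = defaultdict(int)
    events = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            events[g] += 1
            true_pos[g] += int(p == 1)
    return {g: true_pos[g] / events[g] for g in events}


# Hypothetical audit batch: six patients in each of two subgroups.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["A"] * 6 + ["B"] * 6

rates = sensitivity_by_group(y_true, y_pred, groups)
gap = max(rates.values()) - min(rates.values())

MAX_GAP = 0.2  # hypothetical tolerance; a real threshold needs clinical input
needs_review = gap > MAX_GAP
```

In this invented batch the model detects 75% of events in subgroup A but only 25% in subgroup B, so the audit flags the model for review. Running such a check on each new batch of predictions is one way to operationalize continuous auditing after deployment.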

In summary, when viewed through the CRISP-DM framework, methodological weaknesses accumulate across phases—from insufficient problem formalization and superficial data exploration, to opaque preprocessing, limited hyperparameter optimization, narrow evaluation, and minimal deployment planning. These limitations constrain the real-world impact of nursing ML. Addressing them will require transparent protocols, fairness-aware analytics, robust tuning and monitoring, and rigorous clinical trials to ensure that predictive models ultimately translate into improved patient outcomes and nursing practice.

Despite the review’s methodological rigor, several limitations should be acknowledged. First, the definition of “nursing research” was operationalized by requiring at least one author with a nursing affiliation. While this provided an objective and reproducible screening criterion, it has limitations as a proxy for direct relevance to nursing practice. This approach may have excluded valuable interdisciplinary studies where ML was applied to nursing-sensitive outcomes (e.g., patient falls, pressure injuries) but conducted by teams lacking a formally affiliated nursing researcher. Conversely, it may have included studies where a nursing author’s involvement was minimal and the research focus was not central to clinical nursing. Future reviews could use a more nuanced, two-stage approach: an initial broad search for nursing-sensitive outcomes, followed by a content-based assessment of each study’s direct applicability to nursing practice, though this would introduce greater subjectivity into the selection process. Second, while the review team’s nursing background ensured clinical relevance, their interpretations of ML methods and performance may reflect a nursing-centered perspective; researchers from computer science or ML fields might have offered different evaluations of model selection, hyperparameter tuning, or performance metrics. Third, although the TRIPOD+AI checklist is suitable for evaluating clinical prediction models, it may not fully capture the methodological nuances of nursing research, particularly in studies involving exploratory or unsupervised approaches. The moderate mean compliance rate of 50.4% underscores broader issues in methodological rigor and reporting transparency. Fourth, our deliberate exclusion of DL studies, while necessary for methodological consistency, means that this review does not represent the entire landscape of AI in nursing. The rapidly growing body of research utilizing CNNs for medical imaging or RNNs for sequential EHR analysis falls outside the scope of this review. Consequently, our findings and conclusions are specific to traditional ML applications, and a separate, dedicated systematic review is needed to characterize the unique methodologies and challenges of DL in the nursing field.

Nevertheless, this review has notable strengths. It is the first synthesis to apply the TRIPOD+AI checklist and PRISMA framework to 125 nursing ML studies, anchored in a pre-registered PROSPERO protocol and structured by the CRISP-DM model. The search strategy encompassed nine international and five Korean databases, thereby minimizing language and regional bias. Dual independent screening and quality appraisal further reduced reviewer subjectivity, while the large sample size enabled robust identification of reporting and algorithmic trends spanning two decades. By linking checklist findings to phase-specific recommendations, this review provides actionable guidance for future nursing ML research and editorial policy development.

CONCLUSION

This study systematically reviewed and analyzed the application of ML in nursing, highlighting both its achievements and limitations. The use of ML in nursing research has recently increased rapidly, with a predominant focus on patient safety and healthcare quality improvement—particularly in areas such as pressure ulcer prediction, fall prevention, and hospital readmission management. Algorithms such as random forest, XGBoost, and logistic regression were widely employed, with performance metrics like AUC-ROC and accuracy most commonly used for evaluation. However, several factors limit comparability and reproducibility across studies, including inadequate reporting of data preprocessing methods, inconsistent performance evaluation criteria, and insufficient attention to algorithm fairness and ethical considerations. The generally low compliance rate with the TRIPOD+AI checklist underscores the need to improve transparency and reliability in nursing ML research. Additional shortcomings included insufficient detail on hyperparameter tuning and model performance evaluation processes, as well as limited real-world deployment of ML tools. Consequently, there is an urgent need to standardize research design and reporting practices to enhance the quality of ML studies in nursing. Adherence to structured reporting guidelines, such as TRIPOD+AI, can significantly improve transparency and reproducibility. Furthermore, future research should prioritize practical deployment in diverse clinical settings, ongoing model performance optimization, and fairness assurance to strengthen clinical applicability. By adopting systematic and standardized approaches, future nursing ML research can enhance practical relevance and contribute to improved patient-centered nursing care quality.

CONFLICTS OF INTEREST

The authors declared no conflict of interest.

AUTHORSHIP

Study conception and/or design acquisition - EJK and SKK; analysis - EJK and SKK; interpretation of the data - EJK and SKK; and drafting or critical revision of the manuscript for important intellectual content - EJK and SKK.

FUNDING

None.

ACKNOWLEDGEMENT

None.

DATA AVAILABILITY STATEMENT

No new data were created or analyzed during this study. Data sharing is not applicable to this article.

SUPPLEMENTARY MATERIAL

Supplementary materials can be found via https://doi.org/10.7475/kjan.2025.0327.

Supplement Table 1.

Presenting the PICOTS-SD Process Applied in the Screening Step

kjan-2025-0327-Supplemental-Table-1.pdf

Supplement Table 2.

Analysis of Business Understanding phase based on CRISP-DM methodology in selected studies (N=125)

kjan-2025-0327-Supplemental-Table-2.pdf

Supplement Table 3.

Results Based on Data Understanding in CRISP-DM (N=125)

kjan-2025-0327-Supplemental-Table-3.pdf

Supplement Table 4.

Results Based on Data Preparation in CRISP-DM (N=125)

kjan-2025-0327-Supplemental-Table-4.pdf

Supplement Table 5.

Results Based on Modeling in CRISP-DM (N=125)

kjan-2025-0327-Supplemental-Table-5.pdf

Supplement Table 6.

Results Based on Evaluation in CRISP-DM (N=125)

kjan-2025-0327-Supplemental-Table-6.pdf

Supplement Table 7.

Results Based on Deployment in CRISP-DM (N=125)

kjan-2025-0327-Supplemental-Table-7.pdf

Figure 1.

PRISMA flow diagram.

Table 1.

CRISP-DM Process Model Descriptions

No.	Phases	Short descriptions
1	Business understanding	- Determine business objectives, assess situation, define data mining goals, develop project plan.
		Understanding business goals and translating them into data mining objectives.
2	Data understanding	- Initial data collection, data description, data exploration, data quality assessment.
		Collecting data, familiarizing oneself with data, identifying quality issues, and gaining initial insights.
3	Data preparation	- Data selection, data cleaning, data construction, data integration, data formatting.
		All activities required to construct the final dataset from initial raw data.
4	Modeling	- Select modeling techniques, design tests, build models, evaluate models.
		Selecting, applying, and optimizing modeling techniques.
5	Evaluation	- Evaluate results, review processes, determine next steps.
		Evaluating the model from the perspective of achieving business objectives and reviewing the entire process.
6	Deployment	- Plan deployment, monitor and maintain the model, produce final reports, review project.
		Integrating the model into actual business processes and ensuring organizational usage of the outcomes.

CRISP-DM=Cross-Industry Standard Process for Data Mining.

Table 2.

Data Extraction Plan

No.	Phases	Data extraction details
1	General characteristics	1) Journal name, 2) Year of publication, 3) Authors, 4) Proportion of nursing-affiliated authors (determined based on institutional affiliation), 5) Country, 6) Type of ML (supervised/unsupervised/reinforcement), 7) Type of algorithm used (prediction/classification/clustering, etc.)
2	Business understanding	8) Research objective, 9) Research design and methodology (KDD, CRISP-DM, etc.)
3	Data understanding	10) Tools and software used for data analysis, 11) Data source, 12) EDA methods, 13) Target of ML application, 14) Sample size
4	Data preparation	15) Data preprocessing techniques (normalization, standardization, encoding, etc.), 16) Predictor (explanatory/training) variables, 17) Number of predictor variables, 18) Target variable (the variable to be predicted, classified, or analyzed), 19) Data split ratio
5	Modeling	20) Applied ML algorithms and modeling techniques, 21) Hyperparameter tuning methods
6	Evaluation	22) Confusion matrix, 23) Performance evaluation metrics for classification and regression models, 24) Performance evaluation results for each model, 25) Best model, 26) Method of analyzing variable importance or the impact on the target variable (feature importance, SHAP value, etc.)
7	Deployment	27) Research environment, 28) IRB approval, 29) Whether the model was actually deployed

CRISP-DM=Cross-Industry Standard Process for Data Mining; EDA=exploratory data analysis; IRB=Institutional Review Board; KDD=Knowledge Discovery in Databases; ML=machine learning; SHAP=Shapley additive explanations.

Table 3.

General Characteristics of the Selected Studies (N=125)

Characteristics	Categories	n (%)
Location^†	Asia	71 (35.9)
	North America	59 (29.8)
	Europe	22 (11.1)
	Oceania	4 (2.0)
	South America	2 (1.0)
Publication year	Before 2020	26 (20.8)
	2020	11 (8.8)
	2021	20 (16.0)
	2022	18 (14.4)
	2023	24 (19.2)
	2024	26 (20.8)
Proportion of nurses on the research team	0.0–0.25	16 (12.8)
	0.25–0.5	26 (20.8)
	0.5–0.75	41 (32.8)
	0.75–1.0	42 (33.6)
ML type	Supervised learning	117 (93.6)
	Unsupervised learning	5 (4.0)
	Supervised and unsupervised learning	3 (2.4)
Algorithm type	Classification	101 (80.8)
	Regression	14 (11.2)
	Clustering	7 (5.6)
	Classification, association rule mining	2 (1.6)
	Regression, dimensionality reduction	1 (0.8)
Research objective	Pressure-injury/ulcer	24 (19.2)
	Readmission/utilization	17 (13.6)
	Fall risk	10 (8.0)
	Others	74 (59.2)
Journal	CIN: Computers, Informatics, Nursing	7 (5.6)
	Applied Clinical Informatics	4 (3.2)
	International Journal of Environmental Research and Public Health	4 (3.2)
	Journal of the American Medical Informatics Association	4 (3.2)
	BMC Medical Informatics and Decision Making	3 (2.4)
	International Journal of Medical Informatics	3 (2.4)
	Journal of Advanced Nursing	3 (2.4)
	Journal of Nursing Management	3 (2.4)
	Nursing Research	3 (2.4)
	BMC Nursing	2 (1.6)
	Journal of Biomedical Informatics	2 (1.6)
	Journal of Emergency Nursing	2 (1.6)
	Journal of Medical Internet Research	2 (1.6)
	Journal of Nursing Scholarship	2 (1.6)
	Journal of the American Medical Directors Association	2 (1.6)
	International Journal of Nursing Studies	2 (1.6)
	Healthcare	2 (1.6)
	Innovation in Applied Nursing Informatics	2 (1.6)
	Nurse Education Today	2 (1.6)
	Archives of Psychiatric Nursing	2 (1.6)
	Others	69 (55.2)

ML=machine learning.

^†The number of “location” values exceeds the 125 included studies because location data were compiled for each author, and a single study could involve authors from multiple locations.

Table 4.

Summary of the Findings of This Study

Sections	Key findings
Quality appraisal of the studies	The TRIPOD+AI appraisal showed moderate compliance (≈50%). Core methods were well reported (>85%), but transparency and ethical aspects were weak (<25%).
General characteristics of the selected studies	USA (30.8%), China (14.5%), and Korea (10.7%) dominated. Most studies used supervised learning (93.6%), especially classification tasks (80.8%).
Problem definition for research objective	Top topics were pressure injury, fall, and readmission. Most studies lacked formal data-mining frameworks; CRISP-DM was the most used among those that did.
Data collection and exploration	R, Python, and SPSS were the most used tools. EMR/EHR and survey data dominated. Sample sizes ranged from 5 to over 3.5 million cases.
Data preparation	Common preprocessing included standardization, normalization, and imputation. Label encoding and one-hot encoding were frequent. Most studies used <50 predictors.
Model building	Random forest was the most used algorithm, followed by logistic regression, SVM, and XGBoost. Hyperparameter tuning was often omitted; Grid Search was most common when used.
Evaluation and review	AUC-ROC, accuracy, sensitivity, and F1-score were most reported. RF was most often top-performing; 42 studies did not report variable importance.
Deployment	Hospitals were the most common setting. IRB approval was reported in 76.0% of studies. Only seven studies described actual deployment.

AUC-ROC=area under the receiver operating characteristic curve; CRISP-DM=Cross-Industry Standard Process for Data Mining; EHR=electronic health record; EMR=electronic medical record; IRB=Institutional Review Board; RF=random forest; SVM=support vector machine; TRIPOD+AI=Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis+Artificial Intelligence extension; XGBoost=eXtreme gradient boosting.

REFERENCES

1. IBM. What is artificial intelligence (AI)? [Internet]. Armonk, NY: IBM; 2025 [cited 2025 March 21]. Available from: https://www.ibm.com/topics/artificial-intelligence
2. Ham S. The fourth industrial revolution and the changing world of work: an occupational health perspective. J Korean Soc Occup Environ Hyg. 2024;34(2):134-8. https://doi.org/10.15269/JKSOEH.2024.34.2.134
3. Miao Q, Zheng W, Lv Y, Huang M, Ding W, Wang FY. DAO to HANOI via DeSci: AI paradigm shifts from AlphaGo to ChatGPT. IEEE/CAA J Autom Sin. 2023;10(4):877-97. https://doi.org/10.1109/JAS.2023.123561
4. Bertolini M, Mezzogori D, Neroni M, Zammori F. Machine learning for industrial applications: a comprehensive literature review. Expert Syst Appl. 2021;175:114820. https://doi.org/10.1016/j.eswa.2021.114820
5. Hong EJ. Beyond ChatGPT: the future of generative AI – Part 2 [Internet]. Seoul: Samsung SDS; 2023 [cited 2025 March 21]. Available from: https://www.samsungsds.com/kr/insights/future_of_generative_ai_2.html
6. Ruksakulpiwat S, Thorngthip S, Niyomyart A, Benjasirisan C, Phianhasin L, Aldossary H, et al. A systematic review of the application of artificial intelligence in nursing care: where are we, and what's next? J Multidiscip Healthc. 2024;17:1603-16. https://doi.org/10.2147/jmdh.S459946
7. Hannaford L, Cheng X, Kunes-Connell M. Predicting nursing baccalaureate program graduates using machine learning models: a quantitative research study. Nurse Educ Today. 2021;99:104784. https://doi.org/10.1016/j.nedt.2021.104784
8. Yu MY, Son YJ. Machine learning-based 30-day readmission prediction models for patients with heart failure: a systematic review. Eur J Cardiovasc Nurs. 2024;23(7):711-9. https://doi.org/10.1093/eurjcn/zvae031
9. Kim SK, Kim EJ, Kim HK, Song SS, Park BN, Jo KW. Development of a nurse turnover prediction model in Korea using machine learning. Healthcare (Basel). 2023;11(11):1583. https://doi.org/10.3390/healthcare11111583
10. Topaz M, Murga L, Gaddis KM, McDonald MV, Bar-Bachar O, Goldberg Y, et al. Mining fall-related information in clinical notes: comparison of rule-based and novel word embedding-based machine learning approaches. J Biomed Inform. 2019;90:103103. https://doi.org/10.1016/j.jbi.2019.103103
11. Westbrook JI, Duffield C, Li L, Creswick NJ. How much time do nurses have for patients? A longitudinal study quantifying hospital nurses' patterns of task time distribution and interactions with health professionals. BMC Health Serv Res. 2011;11:319. https://doi.org/10.1186/1472-6963-11-319
12. Wang W, Kiik M, Peek N, Curcin V, Marshall IJ, Rudd AG, et al. A systematic review of machine learning models for predicting outcomes of stroke with structured data. PLoS One. 2020;15(6):e0234722. https://doi.org/10.1371/journal.pone.0234722
13. Shearer C. The CRISP-DM model: the new blueprint for data mining. J Data Warehousing. 2000;5(4):13-22.
14. Shafique U, Qaiser H. A comparative study of data mining process models (KDD, CRISP-DM and SEMMA). Int J Innov Sci Res. 2014;12(1):217-22.
15. Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane handbook for systematic reviews of interventions version 6.4. London: Cochrane; 2023.
16. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. https://doi.org/10.1136/bmj.n71
17. Bidwell S, Jensen MF. What is a search protocol? In: Bidwell S, Jensen MF, editors. Etext on health technology assessment (HTA) information resources [Internet]. Bethesda, MD: National Library of Medicine; 2003 [cited 2024 October 16]. Available from: https://www.nlm.nih.gov/archive/20060905/nichsr/ehta/chapter3.html#1
18. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. https://doi.org/10.1136/bmj-2023-078378
19. Alruwaili AR, Alrowaili TE, Alanazi N, Alanazi B, Al Rowaili EW, Alotaibi M, et al. Comprehensive analysis of the role of nursing in promoting healthcare equity. J Ecohumanism. 2024;3(8):4153-61. https://doi.org/10.62754/joe.v3i8.5070
20. Choi J, Lee H, Kim-Godwin Y. Decoding machine learning in nursing research: a scoping review of effective algorithms. J Nurs Scholarsh. 2025;57(1):119-29. https://doi.org/10.1111/jnu.13026
21. Lin CP, Chen LA. Application of artificial intelligence models in nursing research. Hu Li Za Zhi. 2024;71(5):14-20. Chinese. https://doi.org/10.6224/jn.202410_71(5).03
22. Yang L, Wang Y, Mu X, Liao Y. A visualized and bibliometric analysis of nursing research during the COVID-19 pandemic. Medicine (Baltimore). 2024;103(32):e39245. https://doi.org/10.1097/md.0000000000039245
23. Holtz BE, Urban FA, Oesterle J, Blake R, Henry A. The promise of remote patient monitoring. Telemed J E Health. 2024;30(12):2776-81. https://doi.org/10.1089/tmj.2024.0521
24. Jacobson AF, Warner AM, Fleming E, Schmidt B. Factors influencing nurses' participation in clinical research. Gastroenterol Nurs. 2008;31(3):198-208. https://doi.org/10.1097/01.SGA.0000324112.63532.a2
25. Kim TY, Lang N. Predictive modeling for the prevention of hospital-acquired pressure ulcers. AMIA Annu Symp Proc. 2006;2006:434-8.
26. Lee JH, Yu JY, Shim SY, Yeom KM, Ha HA, Jekal SY, et al. Development of a pressure injury machine learning prediction model and integration into clinical practice: a prediction model development and validation study. Korean J Adult Nurs. 2024;36(3):191-202. https://doi.org/10.7475/kjan.2024.36.3.191
27. Yokota S, Endo M, Ohe K. Establishing a classification system for high fall-risk among inpatients using support vector machines. Comput Inform Nurs. 2017;35(8):408-16. https://doi.org/10.1097/cin.0000000000000332
28. Shao L, Wang Z, Xie X, Xiao L, Shi Y, Wang ZA, et al. Development and external validation of a machine learning-based fall prediction model for nursing home residents: a prospective cohort study. J Am Med Dir Assoc. 2024;25(9):105169. https://doi.org/10.1016/j.jamda.2024.105169
29. Kwon JY, Karim ME, Topaz M, Currie LM. Nurses "Seeing Forest for the Trees" in the age of machine learning: using nursing knowledge to improve relevance and performance. Comput Inform Nurs. 2019;37(4):203-12. https://doi.org/10.1097/cin.0000000000000508
30. Zhou H, Albrecht MA, Roberts PA, Porter P, Della PR. Using machine learning to predict paediatric 30-day unplanned hospital readmissions: a case-control retrospective analysis of medical records, including written discharge documentation. Aust Health Rev. 2021;45(3):328-37. https://doi.org/10.1071/ah20062
31. Hasan MK, Alam MA, Roy S, Dutta A, Jawad MT, Das S. Missing value imputation affects the performance of machine learning: a review and analysis of the literature (2010–2021). Inform Med Unlocked. 2021;27:100799. https://doi.org/10.1016/j.imu.2021.100799
32. Sharma V. A study on data scaling methods for machine learning. Int J Glob Acad Sci Res. 2022;1(1):31-42. https://doi.org/10.55938/ijgasr.v1i1.4
33. Breiman L. Random forests. Mach Learn. 2001;45(1):5-32. https://doi.org/10.1023/A:1010933404324
34. Subasi N. Comprehensive analysis of grid and randomized search on dataset performance. Eur J Eng Appl Sci. 2024;7(2):77-83. https://doi.org/10.55581/ejeas.1581494
35. Naik N, Hameed BMZ, Shetty DK, Swain D, Shah M, Paul R, et al. Legal and ethical consideration in artificial intelligence in healthcare: who takes responsibility? Front Surg. 2022;9:862322. https://doi.org/10.3389/fsurg.2022.862322

List of studies included in the systematic review
List of studies excluded in the systematic review

Figure 1. PRISMA flow diagram.

No.	Phases	Tasks	Short descriptions
1	Business understanding	Determine business objectives, assess situation, define data mining goals, develop project plan.	Understanding business goals and translating them into data mining objectives.
2	Data understanding	Initial data collection, data description, data exploration, data quality assessment.	Collecting data, familiarizing oneself with data, identifying quality issues, and gaining initial insights.
3	Data preparation	Data selection, data cleaning, data construction, data integration, data formatting.	All activities required to construct the final dataset from initial raw data.
4	Modeling	Select modeling techniques, design tests, build models, evaluate models.	Selecting, applying, and optimizing modeling techniques.
5	Evaluation	Evaluate results, review processes, determine next steps.	Evaluating the model from the perspective of achieving business objectives and reviewing the entire process.
6	Deployment	Plan deployment, monitor and maintain the model, produce final reports, review project.	Integrating the model into actual business processes and ensuring organizational usage of the outcomes.

No.	Phases	Data extraction details
1	General characteristics	1) Journal name, 2) Year of publication, 3) Authors, 4) Proportion of nursing-affiliated authors (determined based on institutional affiliation), 5) Country, 6) Type of ML (supervised/unsupervised/reinforcement), 7) Type of algorithm used (prediction/classification/clustering, etc.)
2	Business understanding	8) Research objective, 9) Research design and methodology (KDD, CRISP-DM, etc.)
3	Data understanding	10) Tools and software used for data analysis, 11) Data source, 12) EDA methods, 13) Target of ML application, 14) Sample size
4	Data preparation	15) Data preprocessing techniques (normalization, standardization, encoding, etc.), 16) Predictor (explanatory/training) variables, 17) Number of predictor variables, 18) Target variable (the variable to be predicted, classified, or analyzed), 19) Data split ratio
5	Modeling	20) Applied ML algorithms and modeling techniques, 21) Hyperparameter tuning methods
6	Evaluation	22) Confusion matrix, 23) Performance evaluation metrics for classification and regression models, 24) Performance evaluation results for each model, 25) Best model, 26) Method of analyzing variable importance or the impact on the target variable (feature importance, SHAP value, etc.)
7	Deployment	27) Research environment, 28) IRB approval, 29) Whether the model was actually deployed
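
The data preparation items in the extraction plan (normalization, standardization, encoding, data split ratio) can be illustrated with a minimal Python sketch. It is not taken from any reviewed study: the function names, the toy values, and the 80/20 split are assumptions chosen only for illustration, and it uses the standard library rather than any particular ML package.

```python
import random
import statistics


def standardize(values):
    """Z-score standardization: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD, assumed here for simplicity
    return [(x - mean) / sd for x in values]


def min_max_normalize(values):
    """Min-max normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def one_hot_encode(labels):
    """One-hot encoding: one 0/1 column per category, in sorted order."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle rows reproducibly, then split them by the given ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data (not from any reviewed study).
ages = [60.0, 70.0, 80.0, 90.0]
wards = ["ICU", "medical", "ICU", "surgical"]

z_ages = standardize(ages)            # mean 0, SD 1
scaled_ages = min_max_normalize(ages)  # smallest value 0, largest value 1
ward_columns = one_hot_encode(wards)   # columns: ICU, medical, surgical
train_rows, test_rows = train_test_split(list(range(10)), test_ratio=0.2)
```

Both scaling helpers divide by the spread of the data, so a constant column would need separate handling before use.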

Characteristics	Categories	n (%)
Location^†	Asia	71 (35.9)
	North America	59 (29.8)
	Europe	22 (11.1)
	Oceania	4 (2.0)
	South America	2 (1.0)
Publication year	Before 2020	26 (20.8)
	2020	11 (8.8)
	2021	20 (16.0)
	2022	18 (14.4)
	2023	24 (19.2)
	2024	26 (20.8)
Proportion of nurses on the research team	0.0–0.25	16 (12.8)
	0.25–0.5	26 (20.8)
	0.5–0.75	41 (32.8)
	0.75–1.0	42 (33.6)
ML type	Supervised learning	117 (93.6)
	Unsupervised learning	5 (4.0)
	Supervised and unsupervised learning	3 (2.4)
Algorithm type	Classification	101 (80.8)
	Regression	14 (11.2)
	Clustering	7 (5.6)
	Classification, association rule mining	2 (1.6)
	Regression, dimensionality reduction	1 (0.8)
Research objective	Pressure-injury/ulcer	24 (19.2)
	Readmission/utilization	17 (13.6)
	Fall risk	10 (8.0)
	Others	74 (59.2)
Journal	CIN: Computers, Informatics, Nursing	7 (5.6)
	Applied Clinical Informatics	4 (3.2)
	International Journal of Environmental Research and Public Health	4 (3.2)
	Journal of the American Medical Informatics Association	4 (3.2)
	BMC Medical Informatics and Decision Making	3 (2.4)
	International Journal of Medical Informatics	3 (2.4)
	Journal of Advanced Nursing	3 (2.4)
	Journal of Nursing Management	3 (2.4)
	Nursing Research	3 (2.4)
	BMC Nursing	2 (1.6)
	Journal of Biomedical Informatics	2 (1.6)
	Journal of Emergency Nursing	2 (1.6)
	Journal of Medical Internet Research	2 (1.6)
	Journal of Nursing Scholarship	2 (1.6)
	Journal of the American Medical Directors Association	2 (1.6)
	International Journal of Nursing Studies	2 (1.6)
	Healthcare	2 (1.6)
	Innovation in Applied Nursing Informatics	2 (1.6)
	Nurse Education Today	2 (1.6)
	Archives of Psychiatric Nursing	2 (1.6)
	Others	69 (55.2)

Sections	Key findings
Quality appraisal of the studies	The TRIPOD+AI appraisal showed moderate compliance (≈50%). Core methods were well reported (>85%), but transparency and ethical aspects were weak (<25%).
General characteristics of the selected studies	USA (30.8%), China (14.5%), and Korea (10.7%) dominated. Most studies used supervised learning (93.6%), especially classification tasks (80.8%).
Problem definition for research objective	The most common topics were pressure injury, readmission, and falls. Most studies did not use a formal data-mining framework; among those that did, CRISP-DM was the most common.
Data collection and exploration	R, Python, and SPSS were the most used tools. EMR/EHR and survey data dominated. Sample sizes ranged from 5 to over 3.5 million cases.
Data preparation	Common preprocessing included standardization, normalization, and imputation. Label encoding and one-hot encoding were frequent. Most studies used <50 predictors.
Model building	Random forest was the most used algorithm, followed by logistic regression, SVM, and XGBoost. Hyperparameter tuning was often omitted; Grid Search was most common when used.
Evaluation and review	AUC-ROC, accuracy, sensitivity, and F1-score were most reported. RF was most often top-performing; 42 studies did not report variable importance.
Deployment	Hospitals were the most common setting. IRB approval was reported in 76.0% of studies. Only seven studies described actual deployment.
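
The evaluation metrics named in the summary above (accuracy, sensitivity, F1-score, and AUC-ROC) can all be derived from predicted labels and scores. The following Python sketch shows one way to compute them, assuming a binary outcome coded 0/1; the function names, the toy labels and scores, and the 0.5 decision threshold are hypothetical and not drawn from the reviewed studies.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), precision, and F1-score."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted risk scores for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

Accuracy, sensitivity, and F1-score depend on the chosen threshold, whereas AUC-ROC is computed from the scores alone, which is one reason the two kinds of metric are usually reported together.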

Table 1. CRISP-DM Process Model Descriptions

CRISP-DM=Cross-Industry Standard Process for Data Mining.

Table 2. Data Extraction Plan

Table 3. General Characteristics of the Selected Studies (N=125)

ML=machine learning; †The number of “location” values exceeds the 125 included studies because location data were compiled for each author, and a single study could involve authors from multiple locations.
