International Journal of Women's Health, Volume 18

Machine Learning Models for Predicting Liver Metastasis at Diagnosis and Overall Survival in Ovarian Cancer: A SEER-Based Study

Authors Li C, Huang L, Jiang R

Received 21 January 2026

Accepted for publication 4 May 2026

Published 26 May 2026 Volume 2026:18 597887

DOI https://doi.org/10.2147/IJWH.S597887

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Matteo Frigerio



Chao Li,1 Lihong Huang,1 Rui Jiang2

1Department of Obstetrics and Gynecology, Wuhan Fourth Hospital, Wuhan, Hubei, 430000, People’s Republic of China; 2Department of Gastrointestinal Surgery, Wuhan Fourth Hospital, Wuhan, Hubei, 430000, People’s Republic of China

Correspondence: Chao Li, Email [email protected]

Purpose: Ovarian cancer (OC) is one of the most common gynecological tumors, and the liver is the most common site of distant metastasis in OC. However, machine learning (ML) models that predict liver metastasis at diagnosis and the prognosis of OC patients with liver metastasis are lacking. This study therefore aimed to develop such predictive models.
Methods: This retrospective study was based on the Surveillance, Epidemiology, and End Results (SEER) database, from which patients diagnosed with OC between 2010 and 2020 were extracted. The dataset was partitioned into a training cohort (60%) and a validation cohort (40%). The primary endpoints were liver metastasis at diagnosis in OC patients and 12-, 36-, and 60-month overall survival (OS) in OC patients with liver metastasis. After feature selection with the Boruta algorithm, 9 ML diagnostic models and 5 prognostic models were constructed. Diagnostic models were evaluated by area under the curve (AUC), accuracy, kappa, sensitivity, specificity, positive predictive value, and negative predictive value; prognostic models were evaluated by AUC and Brier score.
Results: Of 27,065 OC patients, 1,053 had liver metastases at diagnosis. Histological type, T stage, grade, age, N stage, CA125, laterality, and race were associated with liver metastasis at diagnosis. Histological type, chemotherapy, surgery, radiotherapy, lung metastasis, bone metastasis, age, tumor grade, and marital status were associated with OS in patients with OC liver metastasis. Among the 9 diagnostic models, KNN had the highest AUC in the training cohort (0.863), whereas Ridge regression performed best in the validation cohort, with the highest AUC (0.758), sensitivity (0.929), and negative predictive value (0.994). For 12-month OS, the RSF model had the highest AUC in both the training (0.876) and validation cohorts, and RSF showed the best comprehensive performance among the 5 prognostic models. T stage was the most discriminative feature for diagnosing liver metastasis. For OS at 12, 36, and 60 months, the most discriminative prognostic features were chemotherapy, histological type, and age, respectively. Surgery and chemotherapy were associated with improved OS.
Conclusion: The Ridge regression and RSF had favorable predictive performance in the diagnostic and prognostic models, respectively, compared with the other tested models; this may further help clinicians identify patients with liver metastasis at the time of OC diagnosis and select appropriate treatment options.

Keywords: ovarian cancer, liver metastases, machine learning, diagnosis, prognosis

Introduction

Ovarian cancer (OC) is the third most common malignant tumor of the female reproductive system. In 2022, a total of 324,398 new cases of OC were diagnosed globally, and 206,839 deaths were attributed to OC.1 With advances in medical science and technology, the survival rate of OC patients has increased significantly, and the number of OC patients with metastasis is also rising.2 The common metastatic routes of OC include hematogenous metastasis, lymphatic metastasis, seeding metastasis, and direct invasion.3 More than two-thirds of OC patients have distant metastases at the time of diagnosis.4 The liver is the most common site of distant metastasis of OC, and nearly 57% of distant metastases are liver metastases.5 Patients with liver metastatic OC at diagnosis have a poorer prognosis, with a median survival time of only 30 months.6 In addition, liver-related events caused by liver metastasis, such as liver failure, have a significant negative impact on the prognosis of OC patients.7 Therefore, identifying liver metastasis at the time of OC diagnosis and predicting the subsequent survival of these patients are of great clinical importance. The Tumor, Node, Metastasis (TNM) staging system and pathological classification have long been used as prognostic assessment systems for liver metastases of OC.8–12 However, these systems lack basic demographic information and include only a limited set of variables. Some studies have found that age, race, and treatment are all predictive factors for liver metastases of OC.10 This study therefore aims to develop a more accurate clinical model that incorporates a broader set of variables.

In terms of model development, the nomogram is currently the most commonly used predictive model for liver metastases of OC, but its overall predictive performance is modest [area under the curve (AUC)=0.764].13 Because it can handle large volumes of data and produce accurate predictions, machine learning (ML) is increasingly applied in the medical field.14,15 In this retrospective study, we extracted demographic, pathological, and survival information of OC patients from the Surveillance, Epidemiology, and End Results (SEER) database. By comparing the performance of multiple ML models, we aimed to build a reliable model for (1) diagnosing the presence of liver metastasis at the time of initial diagnosis, and (2) predicting overall survival among OC patients with liver metastasis. The results are intended to provide targeted references for the treatment of OC patients with liver metastasis.

Materials and Methods

Study Population

This retrospective study used data from the SEER database. The SEER database, established by the National Cancer Institute (NCI) in 1973, is one of the most representative large-scale tumor registry databases in North America. It collects a large amount of data related to evidence-based medicine, providing systematic evidence support and valuable first-hand information for clinicians' evidence-based practice and clinical research. In this study, SEER*Stat software was used to identify patients diagnosed with OC in the SEER database between 2010 and 2020.

The study population had to meet the following inclusion criteria: (1) diagnosed with OC (C56.9) according to the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3); (2) aged between 18 and 75 years at diagnosis; (3) complete clinical, pathological, and follow-up data (such as liver metastasis status and survival time). The exclusion criteria were: (1) missing information on the degree of tumor differentiation; (2) insufficient follow-up time (<1 month); (3) unknown AJCC T or N stage, or AJCC T0 stage. Finally, a total of 27,065 participants were included in this study. The participant selection process is shown in Figure 1. Ethical approval and the requirement for informed consent forms were waived in this study, as the data were accessed from SEER, a publicly available database.

Flowchart of ovarian cancer patient screening from SEER database, detailing exclusions and final study population.

Figure 1 Flow chart of patient screening.

Data Collection

The primary endpoints of this study were liver metastasis at diagnosis in OC patients and the 12-, 36-, and 60-month overall survival (OS) of OC patients with liver metastases. OS refers to the time from diagnosis to death from any cause. The following variables were extracted from the SEER database: age, race (black, white, other), marital status (married, single), laterality (only one side, bilateral), stage T (T1, T2, T3), stage N (N0, N1), grade (I, II, III, IV), histologic type (serous, endometrioid, clear cell, mucinous, other), cancer antigen-125 (CA-125) (negative, positive), surgery (no, yes), chemotherapy (no/unknown, yes), radiotherapy (no/unknown, yes), bone metastasis (no, yes), brain metastasis (no, yes), lung metastasis (no, yes), and liver metastasis (no, yes). The variables considered for predicting liver metastasis at diagnosis were limited to age, race, marital status, laterality, stage T, stage N, grade, histologic type, and CA-125; all variables were considered for predicting OS.

Feature Selection and Validation Strategy

To reduce overfitting, we carried out feature selection to remove irrelevant or redundant features. The Boruta algorithm was used to systematically evaluate the importance of each feature. Variables with P<0.01 were ultimately included in the diagnostic and prognostic models.

The overall dataset collected from the SEER database was randomly divided into a training cohort and a validation cohort at a ratio of 6:4. For the diagnostic models, accuracy, kappa, sensitivity, specificity, positive predictive value, negative predictive value, and AUC were used to evaluate the reliability of the nine ML models. For the prognostic models, the AUC and Brier score were used to assess the reliability of the five ML models. After a comprehensive comparison of the different ML models, we selected the model with the best predictive ability as the final predictive model. To further confirm the applicability of the selected model, we evaluated it in the validation cohort.
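The diagnostic metrics listed above all follow from the four cells of a confusion matrix. The following is a minimal Python sketch of those formulas, not the authors' R code; the function name `diagnostic_metrics` and the counts are hypothetical and chosen only to mimic a dataset in which positive cases are rare.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the metrics named above from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the row and column margins
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, not taken from the study: 40 positives, 960 negatives
metrics = diagnostic_metrics(tp=36, fp=240, fn=4, tn=720)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With rare positives, a model of this kind can combine high sensitivity and a very high negative predictive value with a low positive predictive value and a low kappa, which is why several metrics are reported together.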

In the diagnostic models, the importance of each feature was ranked using SHapley Additive exPlanations (SHAP). In the prognostic models, feature importance was ranked using the Brier score.

ML Algorithms

R (version 4.4.3) was used to build the ML prediction models.

Ridge regression is a biased estimation regression method designed for the analysis of collinear data. In essence, it is a modified least squares estimation method. By giving up the unbiasedness of least squares and accepting some loss of fit to the training data, it yields regression coefficients that are more stable and more reliable.
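The shrinkage described above can be illustrated with the simplest possible case. The sketch below is an assumption-laden illustration rather than the study's model: it uses squared-error loss with a single centred predictor and no intercept, for which the ridge estimate has the closed form Σxy / (Σx² + λ). The function name `ridge_slope`, the toy data, and the penalty value are hypothetical.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge estimate for one centred predictor, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    # lam = 0 recovers ordinary least squares; lam > 0 shrinks toward zero
    return sxy / (sxx + lam)


# Toy data generated from y = 2x (hypothetical, for illustration only)
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

ols_estimate = ridge_slope(x, y, lam=0.0)
ridge_estimate = ridge_slope(x, y, lam=10.0)
print(ols_estimate, ridge_estimate)
```

The penalised estimate is deliberately biased toward zero; in exchange, it varies less from sample to sample, which is the trade-off the paragraph above describes.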

Logistic regression is a generalized linear model used mainly for binary classification, and it can be extended to multi-class classification. The model is fitted on the training cohort and is then used to classify new observations, here the validation cohort.

The Neural Network (NN) is a mathematical or computational model that imitates the structure and function of a biological neural network (the central nervous system of animals, especially the brain). It is used to estimate or approximate functions and is mainly composed of an input layer, hidden layers, and an output layer.

Support Vector Machine (SVM) is a classic supervised learning algorithm used to solve binary and multi-class classification problems. Its core idea is to find an optimal hyperplane in the feature space for classification, with the maximum margin. SVM can perform linear or non-linear classification, regression, and even outlier detection tasks.

Light Gradient Boosting Machine (LightGBM) is a tree-based ensemble learning method that uses gradient boosting to combine multiple weak learners into a strong model. It can be used for both classification and regression problems.

Decision Tree (DT) is a model that presents decision rules and classification results in a tree-shaped data structure. As an inductive learning algorithm, it recursively partitions the known data to build a tree-shaped model that can predict unknown data. Each path from the root node (the attribute that contributes the most to the final classification result) to a leaf node (the final classification result) represents a decision rule.

Random Forest (RF) is an ensemble learning algorithm based on multiple decision trees. It can be used not only for classification problems but also for regression problems. Randomness is introduced during the construction process of the RF, which helps to reduce overfitting.

Extreme Gradient Boosting (XGBoost) is an additive ensemble algorithm formed by combining weighted basis functions, and it fits data well. Unlike traditional gradient boosting decision trees, XGBoost adds a regularization term to the loss function. Because the derivatives of some loss functions are difficult to work with directly, XGBoost approximates the loss function with its second-order Taylor expansion.

The K-Nearest Neighbor (KNN) algorithm can be used for both classification and regression. Each n-dimensional input vector corresponds to a point in the feature space. KNN measures the distances between this point and the training samples, and the output is the majority category label (classification) or the average value (regression) of its k nearest neighbors.
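The KNN procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `knn_predict`, the choice of Euclidean distance, k=3, and the toy points are assumptions, not details reported by the study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with two classes (0 and 1)
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]

near_first_cluster = knn_predict(train_x, train_y, (1.1, 1.0))
near_second_cluster = knn_predict(train_x, train_y, (3.0, 3.0))
print(near_first_cluster, near_second_cluster)
```

Because KNN effectively memorises the training samples, it can score very well on the training cohort while generalising less well, which is consistent with the gap between training and validation AUC reported in the Results.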

The Cox Proportional Hazards (Coxph) model is used to analyze the impact of covariates on survival time in survival data. As a semi-parametric method, it does not require specific assumptions about the baseline hazard function. Instead, it uses the hazard ratio (HR) to compare the risks among different covariate groups.

Lasso regression is mainly designed to address the problems that traditional linear regression encounters when dealing with high-dimensional data. In a high-dimensional space, traditional Ordinary Least Squares (OLS) regression may face issues such as difficulties in variable selection and model overfitting. By introducing a tuning parameter (λ), Lasso penalizes the absolute values of the coefficients, compelling some unimportant coefficient values to become zero.
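The way the Lasso penalty sets coefficients exactly to zero can be seen in a simplified setting. The sketch below assumes a single standardized predictor (or an orthonormal design) and the objective ½(b_ols − b)² + λ|b|, for which the Lasso solution is the soft-thresholded least squares coefficient. The function name `soft_threshold`, the coefficients, and λ = 1.0 are hypothetical and are not taken from the study.

```python
def soft_threshold(b, lam):
    """Shrink a least squares coefficient by lam, setting small ones to zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # coefficients with |b| <= lam are removed from the model


# Hypothetical unpenalised coefficients for four features
ols_coefs = [2.5, -0.25, 0.75, -1.75]
lasso_coefs = [soft_threshold(b, lam=1.0) for b in ols_coefs]
print(lasso_coefs)
```

Larger values of λ remove more features, so λ acts as the tuning parameter that controls how sparse the fitted model is.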

Random Survival Forest (RSF) is a survival analysis method based on the random forest. It constructs a large number of survival trees and aggregates the predictions of the individual trees to produce the final prediction.

Statistical Analysis

R 4.4.3 software was used for data description and statistical analysis. Normally distributed quantitative data are shown as mean (standard deviation [SD]) and were analyzed by Student's t-test. Non-normal quantitative data are presented as median (interquartile range [IQR]) and were analyzed by the Mann–Whitney U-test. Categorical variables are described as frequency (%) and were analyzed by the chi-square test. The Boruta algorithm was used to screen important features. Variables with P<0.01 were included in the diagnostic and prognostic models. To explore the impact of surgery, chemotherapy, and radiotherapy on OS in OC patients, propensity score matching (PSM) was performed on age, histological type, grade, and the other two treatments (for example, surgery and radiotherapy were matched when analyzing chemotherapy), followed by Kaplan–Meier survival analysis. Differences were considered statistically significant at P<0.05.
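The Kaplan–Meier analysis mentioned above estimates the survival probability as a running product over the observed death times. The following Python sketch shows that product-limit calculation under stated assumptions; the function name `kaplan_meier` and the follow-up data are hypothetical, and the study itself used R.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for six patients (not from the study)
times = [2, 3, 3, 5, 8, 12]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored patients contribute to the risk set until their last follow-up but never trigger a drop in the curve, which is how the estimator uses incomplete follow-up without treating it as a death.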

Results

Population Features

A total of 27,065 OC patients were included in this study. Among them, 1,053 patients had liver metastasis at diagnosis and 26,012 did not. Table 1 shows the demographic and clinical characteristics of OC patients with and without liver metastasis at diagnosis. Compared with patients without liver metastasis at diagnosis, those with liver metastasis were older (P<0.001), more likely to be black (P=0.046), more likely to be single (P=0.031), and more likely to have bilateral tumors (P<0.001), a more advanced T stage (P<0.001), a more advanced N stage (P<0.001), a higher pathological grade (P<0.001), serous histology (P<0.001), and positive CA125 (P<0.001).

Table 1 Demographic and Clinicopathological Characteristics of All Included Patients

Risk and Prognostic Factors for OC Liver Metastasis at Diagnosis

Feature selection was carried out using the Boruta algorithm, and the results are shown in Figure 2. Histological type, T stage, grade, age, N stage, CA125, laterality, and race were associated with liver metastasis at diagnosis in OC patients. Histological type, chemotherapy, surgery, radiotherapy, lung metastasis, bone metastasis, age, grade, and marital status were associated with OS in OC patients with liver metastasis.

Two graphs showing feature importance by Boruta for diagnostic and prognostic models.

Figure 2 Feature Importance by Boruta. (A) Diagnostic model. (B) Prognostic model. Green boxplots represent features with high importance that passed statistical significance testing. Red boxplots indicate low-importance features that failed to meet statistical significance. Blue boxplots represent shadow features.

Predictive Performance of the ML Models for Diagnosis and Prognosis

Diagnosis Model

Figure 3 shows the receiver operating characteristic (ROC) curves of the nine ML models in the training cohort and the validation cohort. In the training cohort, KNN had the highest AUC (0.863; Table 2). In the validation cohort, Ridge had the highest AUC (0.758; Table 3). However, given the imbalance between positive and negative events in the dataset, the AUC alone is insufficient to characterize model performance. Accuracy, kappa, sensitivity, specificity, positive predictive value, and negative predictive value were therefore used to complement the ROC analysis and further evaluate the strengths and weaknesses of each model. Among the 9 ML models, Ridge regression showed comparatively favorable comprehensive performance: in the validation cohort, its AUC (0.758), sensitivity (0.929), and negative predictive value (0.994) were the highest (Table 3).
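The AUC reported above has a direct probabilistic reading: it is the proportion of positive-negative pairs in which the positive case receives the higher predicted score, with ties counted as one half. The sketch below computes it that way in Python; the function name `auc_score`, the labels, and the scores are hypothetical and unrelated to the study data.

```python
def auc_score(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: 2 positive cases and 4 negative cases
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
auc = auc_score(labels, scores)
print(auc)
```

Because this quantity depends only on how cases are ranked, it says nothing about how many false positives a given decision threshold produces, which is one reason threshold-based metrics are reported alongside it when positive cases are rare.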

Table 2 Performance Metrics of Machine Learning Algorithms in Diagnosis Model on the Training Cohort

Table 3 Performance Metrics of Machine Learning Algorithms in Diagnosis Model on the Validation Cohort

Two ROC curve graphs comparing machine learning models' sensitivity and specificity.

Figure 3 The ROC curves of diagnostic models based on machine learning. (A) Training cohort. (B) Validation cohort.

Abbreviation: ROC, receiver operating characteristic.

Prognostic Model

Five ML algorithms, namely Coxph, LightGBM, Lasso, RSF, and XGBoost, were used for survival analysis and were evaluated by AUC and Brier score. Among the five models, RSF showed the best comprehensive performance. The time-dependent AUCs of the five ML models in the training and validation cohorts for 12-, 36-, and 60-month OS are shown in Table 4. For 12-month OS, the RSF model had the highest AUC in the training cohort (0.876), and similar results were found in the validation cohort. For 36-month OS, XGBoost had the highest AUC in the training cohort (0.814), and RSF had the highest AUC in the validation cohort (0.720). For 60-month OS, XGBoost had the highest AUC in the training cohort (0.827), and Coxph had the highest AUC in the validation cohort (0.689) (Figure 4 and Table 4). Overall, the AUC values of most models decreased over time in both the training and validation cohorts (Figure 5). Table 5 shows the Brier scores. At 12, 36, and 60 months, XGBoost had the lowest Brier score in the training cohort (0.095, 0.173, and 0.154, respectively), and RSF had the lowest Brier score in the validation cohort (0.124, 0.215, and 0.187, respectively). The Brier scores of each model first increased and then decreased over time (Figure 6).
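The Brier score used above is the mean squared difference between the predicted survival probability at a given horizon and the observed status at that horizon, so lower values indicate better predictions. The Python sketch below is a simplified illustration that assumes every patient's status at the horizon is known; time-dependent Brier scores for censored data commonly add inverse probability of censoring weights, which are omitted here. The function name `brier_score` and all values are hypothetical.

```python
def brier_score(predicted_survival, alive_at_horizon):
    """Mean squared error between predicted survival and observed status.

    alive_at_horizon: 1 if the patient was alive at the horizon, 0 if dead.
    Simplifying assumption: no patient is censored before the horizon.
    """
    pairs = list(zip(predicted_survival, alive_at_horizon))
    return sum((p - a) ** 2 for p, a in pairs) / len(pairs)


# Hypothetical 12-month predictions for four patients
observed = [1, 1, 0, 0]
informative = brier_score([0.9, 0.8, 0.3, 0.1], observed)
uninformative = brier_score([0.5, 0.5, 0.5, 0.5], observed)
print(round(informative, 4), uninformative)
```

A model that always predicts 0.5 scores 0.25 in this simplified setting, which gives a rough reference point when reading the values in Table 5.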

Table 4 Time-Dependent Area Under Curve of Machine Learning-Based Survival Models

Table 5 Time-Dependent Brier Score of Machine Learning-Based Survival Model

Three ROC curve graphs for machine learning models at different time points: 12, 36 and 60 months.

Figure 4 The time-dependent ROC curves of prognostic models based on machine learning in the validation cohort. (A) 12-month OS, (B) 36-month OS, (C) 60-month OS.

Abbreviations: ROC, receiver operating characteristic; OS, overall survival.

Two line graphs showing AUC over time for five ML models: Coxph, Lasso, LightGBM, RSF and XGBoost.

Figure 5 The time-dependent AUC of prognostic models based on machine learning. (A) Training cohort. (B) Validation cohort.

Abbreviation: AUC, area under the curve.

Brier scores over time for models: Coxph, Lasso, LightGBM, RSF, XGBoost shown in two line graphs.

Figure 6 The time-dependent Brier score of prognostic models based on machine learning. (A) Training cohort. (B) Validation cohort.

Feature Importance in the Machine Learning Models

The Ridge regression model, which showed favorable performance, was selected for the subsequent SHAP importance analysis. As shown in Figure 7A, T stage was the most discriminative feature for identifying liver metastasis at diagnosis, followed by histological type, N stage, and grade. Considering the AUC and Brier score in both the training cohort and the validation cohort, the RSF model showed the best comprehensive performance among the prognostic models. As shown in Figure 7B, for OS at 12 months, chemotherapy was the most discriminative prognostic feature, followed by histological type, surgery, and age. For OS at 36 months, histological type was the most discriminative prognostic feature, followed by chemotherapy, surgery, and age. For OS at 60 months, age was the most discriminative prognostic feature, followed by histological type, grade, and surgery.

Two bar charts showing feature importance for liver metastasis diagnosis and prognosis over time.

Figure 7 Feature importance ranking. (A) Ridge regression. (B) Random Survival Forest.

Survival Analysis

The results of PSM are shown in Tables 6–8. As depicted in Figure 8, patients who underwent surgery had a significantly higher survival probability than those who did not (P<0.0001). Similarly, chemotherapy recipients had a higher survival probability than those who did not receive chemotherapy (P<0.0001). However, patients who did not receive radiotherapy had a higher survival probability than those who did.

Table 6 Comparison of Patient Characteristics According to Chemotherapy Before and After Propensity Score Matching

Table 7 Comparison of Patient Characteristics According to Surgery Treatment Before and After Propensity Score Matching

Table 8 Comparison of Patient Characteristics According to Radiotherapy Before and After Propensity Score Matching

Three survival probability graphs showing effects of chemotherapy, surgery and radiotherapy over time.

Figure 8 Kaplan-Meier (K-M) survival analysis after propensity score matching. (A) OS of patients with different surgical treatment; (B) OS of patients with/without chemotherapy; (C) OS of patients with/without radiotherapy.

Abbreviation: OS, overall survival.

Discussion

In this study, multiple ML models were built on SEER data to diagnose liver metastasis at the time of OC diagnosis and to predict the prognosis of OC patients with liver metastasis, and the importance of the model features was examined. Ridge regression showed favorable performance among the diagnostic models, while the RSF model demonstrated relatively better overall performance among the prognostic models.

With the rapid development of artificial intelligence technology, ML is increasingly applied in the medical field. Zhou et al used six ML models to predict the recurrence of OC and found that ML algorithms outperformed traditional logistic regression analysis, with XGBoost showing the best performance (accuracy 0.95).16 Feng et al established a diagnostic model for OC and found that its AUC reached 0.948 and that its diagnostic efficiency was significantly superior to the traditional method of detecting CA125 alone.17 A SEER-based study found that the RSF performed better in predicting the 1-year, 3-year, and 5-year survival rates of epithelial ovarian cancer, with AUCs of 0.926, 0.748, and 0.836, respectively.18 The RSF algorithm is a popular ensemble ML tool that is widely applied to clinical decision support and prognosis prediction tasks.19,20 In this study, RSF also demonstrated relatively favorable predictive performance among the five ML models. The model has a high predictive value, providing clinicians with more accurate predictions to inform clinical decisions.

In this study, the proportion of OC patients with liver metastasis at diagnosis was 3.89% (1,053 of 27,065). Yuan et al analyzed the SEER data of OC patients from 2010 to 2014 and found that 3.61% of OC patients had liver metastasis.11 Another study based on the SEER database identified 1,744 patients with liver metastasis of ovarian cancer from 2010 to 2016, accounting for 6.7% of all OC patients.10 A further study found that 5.67% of OC patients had liver metastasis.21 These differences in reported proportions may be related to differences in inclusion and exclusion criteria.

In this study, the Boruta algorithm was used to screen out eight factors related to liver metastasis at diagnosis: histological type, stage T, grade, age, stage N, CA125, laterality, and race. According to the importance order of the SHAP diagram, the features with prominent contributions were stage T, stage N, histological type, and grade. Previous work has shown that increases in the stage T and stage N of malignant tumors indicate an increase in tumor volume and a wider involvement of adjacent tissues and lymph nodes, which reflects further progression of the malignancy.22 The stage T and stage N have been associated with liver metastasis in patients with various cancers. In patients with primary epithelial ovarian cancer, the stage T and stage N were also found to be associated with the occurrence of liver metastasis.23 The stage T and stage N were likewise risk factors for liver metastasis in gallbladder cancer patients.24 The main histological types of ovarian cancer include serous carcinoma, endometrioid carcinoma, clear cell carcinoma, and mucinous carcinoma.25 High-grade serous carcinoma is the most common type of ovarian cancer, usually with a high degree of invasiveness and metastatic potential, and it is more likely to develop liver metastasis.26,27 The grade of tumors is usually determined based on the degree of differentiation of tumor cells. High-grade ovarian cancer often has a higher frequency of gene mutations and more complex genomic alterations.28,29 These alterations may enable cancer cells to acquire stronger abilities of migration, invasion, and angiogenesis, thus promoting the occurrence of liver metastasis.30

In addition, chemotherapy was the most important feature for the 12-month OS of OC patients with liver metastasis. For these patients, chemotherapy is one of the important treatment methods.31 Chemotherapy drugs such as cisplatin can effectively kill cancer cells, control tumor growth, and prolong the survival of OC patients with liver metastasis.32 In this study, surgery was associated with improved OS. Multiple studies have found that surgery is beneficial for the survival of OC patients.33,34 However, in this study, radiotherapy was not associated with improved OS (patients who did not receive radiotherapy had a higher survival probability), which may be related to the small sample size of patients who received radiotherapy.

Our study has several limitations. First, it is important to clarify that our diagnostic model addresses the presence of liver metastasis at the time of initial diagnosis, not the future development of liver metastasis. Second, the SEER database only includes cancer information from the U.S. population, which inevitably introduces selection bias. Data from other countries are needed to validate the generalizability of our findings. Third, the SEER database lacks detailed information on certain clinical variables, including routine blood parameters (eg, white blood cell count, red blood cell distribution width) and comorbidities. Moreover, the granularity of available variables is limited, which may affect the feature importance and predictive performance of our model. Future studies with more rigorous temporal ordering of predictors are needed. Fourth, due to the absence of key biological data (such as biomarkers, genetic or molecular profiles), we could not fully account for underlying biological mechanisms that might influence the outcomes. Fifth, as a retrospective study, our analysis is subject to inherent biases, including information bias and unmeasured confounding. Prospective studies with more comprehensive data collection and randomized designs are needed to confirm our results. Finally, we acknowledge that the clinical utility of our models is currently limited by the lack of external validation and the modest predictive performance observed in the validation cohort.

Conclusion

Histological type, stage T, grade, age, stage N, CA125, laterality, and race were associated with liver metastasis at diagnosis in OC patients. Meanwhile, marital status, grade, age, bone metastasis, lung metastasis, radiotherapy, surgery, chemotherapy, and histological type were associated with OS in OC patients with liver metastasis at diagnosis. The Ridge regression and RSF had favorable predictive performance in the diagnostic and prognostic models, respectively, compared with the other tested models. These findings may provide supportive evidence to assist clinicians in identifying OC patients at risk of liver metastasis at diagnosis and facilitate individualized clinical decision-making.

Data Sharing Statement

The datasets generated during and/or analysed during the current study are available in the SEER repository, https://seer.cancer.gov/.

Ethics Approval and Informed Consent

This study was approved by the Ethics Committee of Wuhan Fourth Hospital. The requirement for informed consent was waived. All procedures were conducted in accordance with the principles of the Declaration of Helsinki.

Acknowledgments

We are grateful to the SEER database for providing freely accessible data for research analysis.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Disclosure

The authors declare that they have no competing interests.

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.