Clinical Epidemiology, Volume 17

A Claims-Based Algorithm for Identifying Hidradenitis Suppurativa Severity

Authors Schneeweiss MC, Anand P, Mostaghimi A, Landon J, Shay D, Davies OMT, Kumar AM, Shang A, Tran T, Glynn RJ, Lin KJ, Wyss R

Received 18 June 2025

Accepted for publication 6 October 2025

Published 7 November 2025 Volume 2025:17 Pages 935–944

DOI https://doi.org/10.2147/CLEP.S547935

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 3

Editor who approved publication: Professor H Sorensen



Maria C Schneeweiss,1–3 Priyanka Anand,1,2 Arash Mostaghimi,2,3 Joan Landon,1,2 Denys Shay,1,4 Olivia MT Davies,2,3 Anusha Mohan Kumar,2,3 Aijing Shang,5 Tanja Tran,6 Robert J Glynn,1,2 Kueiyu Joshua Lin,1,2,7,8 Richard Wyss1,2

1Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA; 2Harvard Medical School, Boston, MA, USA; 3Department of Dermatology, Brigham and Women’s Hospital, Boston, MA, USA; 4Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; 5UCB, Basel, Switzerland; 6UCB, Brussels, Belgium; 7Clinical Phenotyping and Outcome Validation Program, Mass General Brigham Center for Integrated Healthcare Data Research, Boston, MA, USA; 8Division of General Internal Medicine, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Correspondence: Maria C Schneeweiss, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 1620 Tremont St. Suite 3030, Boston, MA, 02120, USA, Tel +1 617 278-0930, Fax +1 617 232-8602, Email [email protected]

Purpose: Information on severity of hidradenitis suppurativa (HS) is not available in administrative claims databases. Accurately identifying HS severity in claims data is important for identifying treatment effect modification by severity. We sought to develop and validate a claims-based algorithm to identify patients with mild, moderate, or severe HS.
Methods: Mass General Brigham (MGB) electronic health records (EHR) were linked to Medicaid claims data in the US from October 2016 through December 2019 to identify 350 patients aged 10 years and older with an ICD-10 diagnosis code for HS (L73.2). Chart review determined HS severity. A multinomial LASSO regression within a 70% training sample determined the most influential claims-based variables out of 30 candidates associated with mild, moderate, or severe HS. This model was used to calculate the positive predictive values (PPVs) for each level of HS within the hold-out testing sample.
Results: The study cohort was predominantly female (81%) aged 18–45 years (74%) with 26% White and 21% Black patients. We identified 72 patients with mild/uncertain HS, 173 with moderate HS, and 105 with severe HS. One ICD-10 diagnosis of HS had a PPV of 89%, which was further improved to 100% when also requiring the concurrent use of a systemic medication for HS. The PPV was 20% for mild/uncertain, 54% for moderate and 67% for severe HS. When combining severity into mild/moderate versus severe, the PPV was 71%, indicating that among those classified as severe, 71% were truly severe.
Conclusion: The claims-based algorithm had reasonable performance in identifying severe HS but had limitations in distinguishing moderate from mild HS. The algorithm performed best when distinguishing combined mild and moderate HS from severe HS.

Keywords: claims data, validation, hidradenitis suppurativa, severity, ICD-10, algorithm, medicaid

Introduction

Hidradenitis suppurativa (HS) is a chronic inflammatory skin disease characterized by follicular occlusion leading to nodule and abscess formation that can progress to sinus tracts and subsequent scarring.1 HS has a significant disease burden and high impact on quality of life, especially for severe recurrent forms, yet claims data do not directly capture information on HS disease severity.1–3 Current treatment guidelines are based on Hurley stage severity (mild HS is Hurley stage I, moderate HS is Hurley stage II, and severe HS is Hurley stage III) determined by lesion morphology, physical exam findings and disease recurrence.4

Secondary healthcare data, including claims data, can provide valuable evidence, particularly on disease epidemiology and the safety of systemic immunomodulating agents.5 However, these data sources lack the clinical information needed to define important sub-populations, such as subgroups by HS severity (mild, moderate, severe), and to assess effect modification by severity. To generate evidence for these subgroups and understand effect modification by disease severity in large claims datasets, it is critical to develop algorithms that can accurately identify patients with HS and group them by severity using claims data.

We sought to develop a claims-based model for classifying patients by HS severity in claims data.

The abstract of this paper was presented as a poster at the 2025 Annual Meeting of the American Academy of Dermatology with interim findings [Journal of the American Academy of Dermatology, https://eposters.aad.org/abstracts/63547].

Methods

Data Source

The study used Medicaid claims data from October 1, 2015, to December 31, 2019, deterministically linked with electronic health records (EHR) of Mass General Brigham (MGB) with a linkage success rate of 98.5%.6 The MGB network consists of 2 tertiary care hospitals, 3 community hospitals, and >35 primary care centers. All EHR were fully accessible electronically and included the complete documentation of all patient-physician interactions consisting of structured fields, laboratory test results, and free text clinical notes from medical, surgical, pathological, radiological, and pharmacy services from both inpatient and outpatient settings. The linked data include information on patient demographics (age, sex, self-reported race and ethnicity from both Medicaid claims and EHR data, location), hospital admissions, emergency room visits, outpatient visits, and outpatient surgical visits for all Medicaid beneficiaries. Diagnoses are coded using the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) system and procedures using the Current Procedural Terminology (CPT-4).7,8 The data contain a longitudinal record of outpatient medication prescribing (EHR) and dispensing (claims), with information on the drug dispensed coded by the National Drug Code (NDC), the strength, the quantity dispensed, and the number of days the supply is anticipated to last. All records are associated with a date of service so that a longitudinal timeline can be established for all patients.

Informed consent was not required as all patient information was de-identified. The IRB of the Brigham and Women’s Hospital approved this study (Protocol #: 2023P000623) and the work was conducted in the secure environment of the MGB Center for Integrated Healthcare Data Research and its Clinical Phenotyping and Outcome Validation Program.

Study Population

We identified all patients who had at least one ICD-10-CM code for HS (L73.2) in Medicaid claims between October 1, 2016, and December 31, 2019 (Figure 1). The date of this diagnosis code was considered the index date. If a patient had more than one HS diagnosis code during the study period, the date of one of those codes was randomly selected as the index date to include patients at varying stages of disease duration and severity. To ensure that this index HS diagnosis code originated from MGB providers, we required that patients also have an MGB visit with an ICD-10-CM code for HS within 30 days before or after the index date. Further, we excluded patients <10 years of age and required at least one year of continuous insurance enrollment (with an allowable gap of up to 30 days) before the index date (baseline period).
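The eligibility rules above can be sketched in code. This is a minimal illustration rather than the study's actual programming: the function names, the representation of enrollment as (start, end) date pairs, and the treatment of uncovered days at the edges of the baseline window as gaps are all assumptions made for the example.

```python
import random
from datetime import date, timedelta


def continuously_enrolled(spans, index_date, lookback_days=365, max_gap_days=30):
    """Check baseline enrollment before index_date, tolerating short gaps.

    spans: iterable of (start, end) dates, both inclusive (assumed layout).
    """
    window_start = index_date - timedelta(days=lookback_days)
    covered_until = window_start  # first day not yet known to be covered
    for start, end in sorted(spans):
        if end < window_start or start > index_date:
            continue  # span lies entirely outside the baseline window
        if (start - covered_until).days > max_gap_days:
            return False  # uncovered stretch exceeds the allowable gap
        covered_until = max(covered_until, end + timedelta(days=1))
    # Any uncovered stretch right before the index date must also be short
    return (index_date - covered_until).days <= max_gap_days


def pick_index_date(hs_code_dates, rng):
    """Randomly select one HS diagnosis date as the index date."""
    return rng.choice(sorted(hs_code_dates))


# Hypothetical patient: two enrollment spans separated by a 19-day gap
spans = [(date(2016, 1, 1), date(2016, 12, 31)),
         (date(2017, 1, 20), date(2017, 12, 31))]
index = pick_index_date([date(2017, 6, 1), date(2017, 9, 15)], random.Random(0))
print(index, continuously_enrolled(spans, index))
```

Selecting the index date at random, rather than always taking the first code, is what spreads the cohort across different points in the disease course.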

Figure 1 Flowchart for study population.

Abbreviations: HS, hidradenitis suppurativa; MGB, Mass General Brigham; N, number remaining at each step.

The study size needed to estimate a positive predictive value (PPV) of 85% with 5% uncertainty was calculated to be 196 patients.9,10 To increase the chances of having sufficient patients in the “mild” category and to plan for a 30% hold-out sample, we randomly sampled 350 patients as the final study population. To assess whether the MGB population for analysis was representative of the full Medicaid population, characteristics were compared between those sampled and all other patients. Patient characteristics showed reasonable agreement between both groups. The MGB patient population had slightly more prior HS diagnoses, more systemic medication use, and more dermatologist visits compared to the general Medicaid patient population. This is in line with slightly more severe patients being referred to a tertiary care center with specialty HS clinics and is reflected in the higher number of severe patients and fewer mild patients seen at MGB (Table S1).
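The required study size of 196 can be reproduced with the usual normal-approximation formula for estimating a single proportion, n = z² p(1 − p) / d². The text does not state the formula or the confidence level, so the 95% level (z = 1.96) and the function name below are assumptions; with p = 0.85 and d = 0.05 they give 195.9, which rounds up to 196.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Patients needed to estimate proportion p within +/- margin.

    Uses the normal approximation; z=1.96 assumes a 95% confidence level.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_required = sample_size_for_proportion(p=0.85, margin=0.05)
print(n_required)  # 196
```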

EHR Review and Establishment of the Reference Standard

A detailed EHR review included all structured data and free text information from medical, surgical, pathological, pharmacy, and laboratory service or care from both inpatient and outpatient settings of encounters from one year before to two weeks after the index date. First, EHR charts were reviewed to determine if a patient with a claims code for HS (ICD-10 L73.2) truly had HS. Next, HS stage was assigned as mild, moderate or severe based on Hurley staging by clinician, physician global assessment (PGA), patient global assessment (PtGA), medications, surgical interventions, key words recorded by the treating clinician and physical exam findings based on EHR review by at least two physicians from the study team (Table S2). If there was differing information, more weight was given to the higher severity. For example, if a patient had physical exam findings of mild HS (a single lesion, without sinus tracts), but had received treatments for moderate HS (oral targeted antibiotics), they were classified as having moderate HS. In cases where the EHR-based HS severity remained unclear, a dermatology-pharmacoepidemiology faculty member reviewed all available data to make the final determination.
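The rule for conflicting chart information can be illustrated with a small sketch. It simplifies "more weight was given to the higher severity" to "take the highest severity indicated by any source", which is an assumption; the source names and the function are hypothetical, and the reviewers' actual judgment was not reduced to code.

```python
SEVERITY_ORDER = {"mild": 1, "moderate": 2, "severe": 3}


def resolve_severity(indications):
    """Return the highest severity among per-source assessments, or None.

    indications: dict mapping a source name (hypothetical keys) to
    "mild", "moderate", "severe", or any other value for "not assessable".
    """
    known = [s for s in indications.values() if s in SEVERITY_ORDER]
    if not known:
        return None  # unclear: left for the faculty reviewer to adjudicate
    return max(known, key=SEVERITY_ORDER.get)


# The example from the text: mild exam findings, treatment typical of moderate HS
example = {"physical_exam": "mild", "treatment": "moderate"}
print(resolve_severity(example))  # moderate
```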

Validation of ICD-10-CM Code L73.2

Three algorithms for identifying HS were developed in claims data using a combination of ICD-10-CM diagnosis code L73.2 (diagnosis made by any provider), dermatologist visit (diagnosis made by a dermatologist), and use of biologic agents. Performance of the algorithms was quantified in terms of positive predictive value (PPV).

Development of a Claims-Based Model to Classify HS Severity

All patients who were determined to not have HS based on the EHR review but had a claims data code of HS were classified as uncertain and grouped into a class of mild/uncertain HS.

The study cohort was split into a training set (70%, n=245) and a hold-out testing set (30%, n=105). All potential predictors of HS severity were measured during the baseline period in the claims data and included demographic characteristics (age, sex, race), healthcare utilization, baseline comorbidities, medications, and procedures (Table 1). Medication use data included in the prediction model were based on pharmacy dispensing of those drugs, which increased the likelihood that patients started them. We did not study the prescribing of drugs that did not lead to a fill. The variables were entered into a multinomial least absolute shrinkage and selection operator (LASSO) model to select features predictive of HS severity by selecting the lambda tuning parameter (degree of regularization) that optimizes 20-fold cross-validated predictive performance within the training dataset.
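The data partitioning described above can be sketched as follows. This only illustrates the 70/30 split and the assignment of training patients to 20 cross-validation folds; the seed, the function names, and the round-robin fold assignment are assumptions, and the LASSO fitting itself (done with a statistical package) is not shown.

```python
import random


def split_train_test(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patients and split them into training and hold-out sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def assign_folds(train_ids, k=20):
    """Map each training patient to one of k cross-validation folds."""
    return {pid: i % k for i, pid in enumerate(train_ids)}


train_ids, test_ids = split_train_test(range(350))
folds = assign_folds(train_ids)
print(len(train_ids), len(test_ids))  # 245 105
```

With 245 training patients and 20 folds, each fold holds 12 or 13 patients, so every candidate lambda is evaluated on small validation sets; the hold-out set is never touched during tuning.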

Table 1 Baseline Characteristics for Patients with at Least One ICD-10 Diagnosis Code for HS Grouped by HS Severity

The multinomial regression identified three separate sets of linear predictors for each level of severity and the predicted probabilities of each patient falling into the three levels were calculated. The model with the best cross-validated performance within the training set was then applied to the testing set to assign predicted probabilities of each severity level for each patient. The severity level for which the patient had the highest predicted probability was assigned to that patient as the predicted HS severity level. For example, if the predicted probability for being mild, moderate, or severe HS was estimated to be 0.2, 0.3, and 0.5, respectively, the patient was classified as having severe HS. Performance of the model, in terms of c-statistic and PPV within the hold-out testing set (n=105), was calculated based on the assigned predicted HS severity.
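The classification step can be sketched as below. In a multinomial logistic model the three linear predictors are converted to probabilities with the softmax function, and the patient is assigned the level with the highest probability. The function names are illustrative, and the linear predictor values would come from the fitted coefficients, which are not reproduced here.

```python
import math

LEVELS = ("mild/uncertain", "moderate", "severe")


def softmax(linear_predictors):
    """Convert one linear predictor per level into probabilities summing to 1."""
    m = max(linear_predictors)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in linear_predictors]
    total = sum(exps)
    return [e / total for e in exps]


def classify(probabilities):
    """Assign the severity level with the highest predicted probability."""
    best = max(range(len(LEVELS)), key=lambda i: probabilities[i])
    return LEVELS[best]


# The example from the text: probabilities 0.2, 0.3 and 0.5
print(classify([0.2, 0.3, 0.5]))  # severe
```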

Sensitivity Analyses

Performance of models selected using ordinal LASSO and binomial (mild vs moderate or severe, and mild or moderate vs severe) regression were also tested. Additional analyses were conducted among those with the Hurley stage explicitly recorded in their EHR, as well as excluding those patients with uncertain HS.

Results

We identified 350 patients aged 10 years and older with at least one ICD-10-CM code for HS in both Medicaid and the MGB medical records after applying all exclusions (Figure 1). Of the 350 patients identified with HS by a single ICD-10 claims code, 312 (89.1%) were confirmed by our expert review as having HS. The diagnosis was uncertain in 38 (10.9%) patients.

Among the 350 identified HS cases, 72 (20.6%) had mild/uncertain HS, 173 (49.4%) had moderate HS and 105 (30.0%) had severe HS (Table 1). Most patients were aged 18–45 years (74.4%) with fewer aged 45 years or older (18.6%) or aged 10–17 years (7.4%). The cohort was predominantly female (80%). Black race was most prevalent in patients with severe HS (29.5% Black and 21.0% White), with 49.5% of severe patients not reporting their race. Among treatments, topical treatment use was similar in mild (33.3%) and moderate HS (33.5%) and somewhat higher in severe HS (43.8%). Systemic antibiotic use was more frequent in severe (52.4%) and moderate HS (38.2%) than in mild HS (25.0%). Isotretinoin and acitretin use was uncommon and occurred almost exclusively in severe HS. Any biologic use was highest in severe HS (14.3%), with a limited number of users in the moderate HS group (N<11). Generic incision and drainage procedures were frequently observed in moderate HS (18.5%), while HS-specific surgeries were exclusively recorded in patients with severe HS (10.5%). Comorbid inflammatory bowel disease (IBD) occurred in 3.7% of the 350 patients, with most cases being among those with severe HS (Table 1). Overall, the Gagne combined comorbidity score (CCS) was low and a majority of patients had a CCS <3. Among patients with mild HS, 15.3% had a CCS of 3 or more, compared to 10.4% and 11.4% among those with moderate or severe HS, respectively.

HS Claims Data Code Validation

Among 350 patients with at least one ICD-10 diagnosis code for HS in Medicaid claims data, 312 had HS confirmed in the MGB chart review resulting in a PPV of 89%. 130 patients had a single diagnostic code for HS made by a dermatologist resulting in a PPV of 96%, and 20 patients had a single code for HS followed by biologic treatment resulting in a PPV of 100% (Table 2).
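The PPVs reported here are simple proportions: the number of patients confirmed by chart review divided by the number flagged by the claims algorithm. A minimal sketch using the counts given in the text (the function name is illustrative):

```python
def ppv(confirmed, flagged):
    """Positive predictive value: confirmed cases among those flagged."""
    if flagged <= 0:
        raise ValueError("flagged must be positive")
    return confirmed / flagged


# Single ICD-10 code from any provider: 312 of 350 confirmed
print(f"{ppv(312, 350):.1%}")  # 89.1%
# Single code followed by biologic treatment: all 20 confirmed
print(f"{ppv(20, 20):.0%}")  # 100%
```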

Table 2 Positive Predictive Values for Identifying HS in Medicaid Claims Data

HS Severity Model and Its Validation

Using a multinomial LASSO model, we identified pre-index date predictors during the baseline period associated with each severity level of HS (Table 3). The strongest claims data-based predictors for severe HS included isotretinoin or acitretin use, HS-specific surgery, any biologic use, comorbid acne, adalimumab use, and 3 or more prior HS diagnoses (inpatient or outpatient). Having no prior diagnosis of HS was inversely associated with severe HS. Moderate HS predictors included dermatologist-diagnosed HS and having 3 or more dermatologist visits. The strongest predictors for mild HS included young age (10–17 years) and no prior HS diagnoses (inpatient or outpatient). Having an incision and drainage or at least 3 prior HS diagnoses was inversely associated with having mild HS (Table 3). CCS was not associated with HS severity.

Table 3 Relevant Pre-Index-Visit Patient Characteristics Identified by the Multinomial LASSO Model for Mild, Moderate, and Severe HS

In this model the PPV for mild/uncertain (n=72) was 20%, for moderate (n=173) was 54% and for severe (n=105) was 67%. Specificity was 90% or higher for mild/uncertain (96%) and severe (90%) but only 31% for moderate disease (Table 4).
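For a three-level classifier, each level's PPV and specificity are computed one-versus-rest from the table of true by predicted severity. The sketch below shows the arithmetic; the counts in it are made up for illustration and are not the study's hold-out results, and the function name is an assumption.

```python
def one_vs_rest_metrics(confusion, labels, target):
    """PPV, sensitivity and specificity for one level against all others.

    confusion[true_label][predicted_label] holds patient counts.
    """
    tp = confusion[target][target]
    fp = sum(confusion[t][target] for t in labels if t != target)
    fn = sum(confusion[target][p] for p in labels if p != target)
    tn = sum(confusion[t][p] for t in labels for p in labels
             if t != target and p != target)
    return {
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


LABELS = ("mild", "moderate", "severe")
# Hypothetical counts for 100 patients (NOT data from this study)
CONFUSION = {
    "mild": {"mild": 2, "moderate": 16, "severe": 2},
    "moderate": {"mild": 3, "moderate": 40, "severe": 7},
    "severe": {"mild": 1, "moderate": 11, "severe": 18},
}

severe = one_vs_rest_metrics(CONFUSION, LABELS, "severe")
print(round(severe["ppv"], 2))  # 0.67
```

When most patients are predicted to be moderate, as in these made-up counts, the moderate level collects many false positives and its specificity drops, while the rarely predicted levels keep a high specificity.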

Table 4 Measurement Characteristics for the Multinomial LASSO Model Identifying Mild/Uncertain, Moderate, and Severe HS in Claims Data*

Sensitivity Analyses

In a sensitivity analysis, we combined severity into “mild/moderate” versus “severe” and achieved a PPV of 71%, meaning that among those classified as severe, 71% were truly severe. The NPV was 80%, meaning that among those classified as mild/moderate, 80% were truly mild/moderate (Table 4). When grouping “mild/uncertain” versus “moderate/severe”, the PPV was 80% and the NPV was only 20%, with a specificity of <10% based on very few mild/uncertain cases (Table S3).

After excluding 38 patients with uncertain HS, the PPV was 1% for mild (n=34), 62% for moderate, and 65% for severe (Table S4). Limiting to only 95 patients who had their HS severity recorded by Hurley stage in the EHR free text notes resulted in a sample of 39 patients with mild, 36 with moderate and 20 with severe HS. The resulting PPV was 60% for mild, 53% for moderate, and 50% for severe (Table S5). The predictors selected for severity when defined by Hurley stage recorded in the chart were very similar to those chosen with our assessment tool. While medications were used as part of the definition for severity in our severity assessment measure, these medications were still chosen as predictors when only relying on recorded Hurley stage (Table S6).

Using an ordinal regression model instead of a multinomial resulted in similar PPVs: 20% for mild/uncertain, 53% for moderate, and 61% for severe (Table S7).

Discussion

In this US-based validation study, our claims-based model to classify HS patients by severity demonstrated reasonable performance in identifying severe HS, but had limitations in distinguishing between moderate and mild HS. In part, this was due to the very small number of mild cases. For researchers studying the safety and effectiveness of HS treatments, grouping mild and moderate HS together versus severe HS resulted in the best performance with a PPV of 71%. For researchers aiming to identify HS populations in claims data, we found that in the absence of EHR data for confirmation, a single ICD-10 code for HS performed well, achieving a PPV of 89%.

This study is among the first to validate claims-based algorithms for identifying HS patients and to develop a model for categorizing HS severity in claims data. For researchers seeking to assess HS severity using claims data without EHR linkage, the high PPV of a single ICD-10 code supports the accurate identification of a true HS population. We further excluded “uncertain HS” patients and found no meaningful difference, which supports the generalizability of this algorithm for researchers applying it in claims data without access to EHR for HS confirmation. Given the low prevalence of HS in the population, the study size of 350 patients makes it the largest validation study of this kind. From 30 claims-based patient characteristics used as candidates for predicting HS severity, the multinomial LASSO model selected characteristics well in line with clinical observations for mild, moderate, and severe HS. For example, severe HS patients often require biologics and/or specific surgery and usually have frequent visits for managing their disease.

These results must be interpreted in the context of the study design. First, while the study is the largest of its kind, the study size of 350 is still small, which can limit the ability to accurately differentiate three severity categories. This may be a reason why the predictive performance metrics are less impressive for the mild HS category.11 One of the three categories, mild/uncertain HS, is substantially smaller than the other two, making it difficult to derive algorithms with robust performance. This is well illustrated in the sensitivity analyses. When categorizing mild/uncertain (n=72) versus moderate/severe HS (n=278), the PPV was 80% for correctly categorizing moderate/severe HS but only 20% for mild/uncertain HS. However, in the Hurley stage analysis, the population was smaller in total (n=95) but more evenly distributed across mild (n=39), moderate (n=36) and severe (n=20) HS, resulting in a more even performance across the severity categories (60%, 53%, and 50%, respectively). Second, in clinical practice, HS is often diagnosed in its more advanced stage while milder forms may often be underdiagnosed and preliminarily coded as cutaneous abscess, carbuncle/furuncle, severe acne, or follicular cyst.12 A recent study using administrative data from Germany found that 34.7% of patients with HS had at least one misdiagnosis before their first HS diagnosis.13 Additionally, MGB, an academic medical center, and its network of outpatient clinics may have a higher proportion of referral patients and therefore few patients with truly mild HS may be seen. Further, the findings are representative of the United States healthcare system. Finally, even in detailed EHR it is often not easy to determine the severity of HS, which is different from primary data collection, for example in RCTs. The standard for HS severity assessment is the Hurley staging, yet this was only recorded in 27% of the EHR charts in clinical practice. Additionally, while ultrasound can be used to assess HS diagnosis and staging, international consensus on its use was not available at the time of the study and was not considered.14

In conclusion, a single diagnosis for HS can accurately capture the existence of HS in claims data. Patients with severe HS can be reasonably differentiated using our algorithm in claims data; however, it performed insufficiently to fully capture mild HS. This algorithm can accurately distinguish mild/moderate HS versus severe. A validation study with even larger HS populations, particularly with mild HS, would help to understand whether the low PPV for categorizing mild HS can be improved.

IRB Approval

The Brigham and Women’s Hospital’s institutional review board approved this study, and a signed data licensing agreement was in place.

Data Sharing Statement

Although data cannot be shared with third parties, reasonable requests for re-analysis of the data will be considered by the corresponding author.

Acknowledgments

We thank Costello Medical, UK, for publication coordination within UCB.

Funding

The study was funded by a research grant from UCB to the Brigham and Women’s Hospital.

Disclosure

Drs. Anand, Shay, and Davies have no financial disclosures to report.

Dr. M. Schneeweiss has received a research grant to the Brigham and Women’s Hospital from UCB pharma.

Dr. Mostaghimi reports personal fees from Pfizer, Digital Diagnostics, 3Derm, AbbVie, Sun Pharma, Equillium, ASLAN, Boehringer Ingelheim, ACOM, Olaplex, Legacy Healthcare, Pelage, Q32 Bio, Bioniz, Concert, Lilly, Hims, and participation in clinical trials with Incyte, Aclaris, Concert, and Lilly outside the submitted work.

Aijing Shang is an employee of UCB receiving UCB shares.

Dr. Kumar reports receiving personal fees (advisory board) from Boehringer Ingelheim during the conduct of the study.

Dr. Tran is an employee of UCB receiving stock and/or stock options.

Dr. Shang is an employee of UCB receiving stock and/or stock options.

Dr. Lin has received research grants from Takeda and AbbVie for studies unrelated to this work.

Dr. Glynn has received research support for unrelated work from grants to the Brigham & Women’s Hospital from Amarin, Kowa, Novartis, and Pfizer.

References

1. Jemec GB. Clinical practice. Hidradenitis suppurativa. N Engl J Med. 2012;366(2):158–164. doi:10.1056/NEJMcp1014163

2. Marvel J, Vlahiotis A, Sainski-Nguyen A, Willson T, Kimball A. Disease burden and cost of hidradenitis suppurativa: a retrospective examination of US administrative claims data. BMJ Open. 2019;9(9):e030579. doi:10.1136/bmjopen-2019-030579

3. McKenzie SA, Harview CL, Truong AK, et al. Physical symptoms and psychosocial problems associated with hidradenitis suppurativa: correlation with Hurley stage. Dermatol Online J. 2020;26(9). doi:10.5070/D3269050156

4. Alikhan A, Sayed C, Alavi A, et al. North American clinical management guidelines for hidradenitis suppurativa: a publication from the United States and Canadian hidradenitis suppurativa foundations: part II: topical, intralesional, and systemic medical management. J Am Acad Dermatol. 2019;81(1):91–101. doi:10.1016/j.jaad.2019.02.068

5. Schneeweiss S, Schneeweiss M. Concepts of designing and implementing pharmacoepidemiology studies on the safety of systemic treatments in dermatology practice. JID Innov. 2023;3(6):100226. doi:10.1016/j.xjidi.2023.100226

6. Lin KJ, Singer DE, Glynn RJ, Murphy SN, Lii J, Schneeweiss S. Identifying patients with high data completeness to improve validity of comparative effectiveness research in electronic health records data. Clin Pharmacol Ther. 2018;103(5):899–905. doi:10.1002/cpt.861

7. Fowles JB, Lawthers AG, Weiner JB, Garnick DW, Petrie DS, Palmer RH. Agreement between physicians’ office records and medicare part B claims data. Health Care Financ Rev. 1995;16(4):189–199.

8. Glynn RJ, Monane M, Gurwitz JH, Choodnovskiy I, Avorn J. Agreement between drug treatment data and a discharge diagnosis of diabetes mellitus in the elderly. Am J Epidemiol. 1999;149(6):541–549. doi:10.1093/oxfordjournals.aje.a009850

9. Daniel WW. Biostatistics: A Foundation for Analysis in the Health Sciences. 7th ed. New York: John Wiley & Sons; 1999.

10. Select Statistical Services. Population proportion – sample size calculator. 2021. Available from: https://select-statistics.co.uk/calculators/sample-size-calculator-population-proportion/. Accessed September 10, 2021.

11. de Jong VMT, Eijkemans MJC, van Calster B, et al. Sample size considerations and predictive performance of multinomial logistic prediction models. Stat Med. 2019;38(9):1601–1619. doi:10.1002/sim.8063

12. Saunte DML, Jemec GBE. Hidradenitis suppurativa: advances in diagnosis and treatment. JAMA. 2017;318(20):2019–2032. doi:10.1001/jama.2017.16691

13. Kirsten N, Petersen J, Hagenstrom K, Augustin M. Epidemiology of hidradenitis suppurativa in Germany - an observational cohort study based on a multisource approach. J Eur Acad Dermatol Venereol. 2020;34(1):174–179. doi:10.1111/jdv.15940

14. Wortsman X, Alfageme F, Dini V, et al. International consensus statement on the use of ultrasound in hidradenitis suppurativa. J Eur Acad Dermatol Venereol. 2025. doi:10.1111/jdv.20600

Creative Commons License © 2025 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.