SEPLINE: Socioeconomic Position in Epidemiological Research—A National Guideline on Danish Registry Data

Cathrine F Hjorth; Thora M Kjærulff; Mette Kielsholm Thomsen; Deirdre Cronin-Fenton; Susanne O Dalton; Maja H Olsen

doi:10.2147/CLEP.S520772

Clinical Epidemiology, Volume 17

Expert Opinion

SEPLINE: Socioeconomic Position in Epidemiological Research—A National Guideline on Danish Registry Data

Authors Hjorth CF , Kjærulff TM, Thomsen MK , Cronin-Fenton D , Dalton SO , Olsen MH

Received 24 February 2025

Accepted for publication 28 May 2025

Published 4 July 2025 Volume 2025:17 Pages 593–624

DOI https://doi.org/10.2147/CLEP.S520772

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 3

Editor who approved publication: Dr Thomas Ahern


Cathrine F Hjorth,¹ Thora M Kjærulff,² Mette K Thomsen,¹,³ Deirdre Cronin-Fenton,¹ Susanne O Dalton,⁴⁻⁶ Maja H Olsen,⁴ on behalf of the SEPLINE Group

The SEPLINE Group includes: Anne Dahl Sørensen, Cathrine F. Hjorth, Danni Chen, Deirdre Cronin Fenton, Eeva-Liisa Røssell Johansen, Emma Neble Larsen, Frederik Nicolai Foldager, Gitte Valentin, Henrik Bøggild, Henrik Toft Sørensen, Henry Jensen, Ingelise Andersen, Jan Wohlfahrt, Jarl Christian Quitzau, Julie A. Schmidt, Kirubakaran Balasubramaniam, Lars Børty Nielsen, Lau Caspar Thygesen, Linda Ejlskov, Line Virgilsen, Maja Halgren Olsen, Marie Mørk Josiasen, Merete Osler, Mette Bender, Mette Kielsholm Thomsen, Michael Green, Nasrin Tayyari, Nynne Bech Utoft, Oleguer Plana-Ripoll, Peter Haastrup, Peter Vedsted, Pia Kjær Kristensen, Susanne Oksbjerg Dalton, Susanne Fogh Jørgensen, Søren Korsgaard Martiny, Thomas Maribo, Thomas Wolff Rosenqvist, Thora Majlund Kjærulff, Tinne Laurberg, Trine Allerslev Horsbøl, Ulla Arthur Hvidtfeldt, Ulrik Deding

PhD student, Research Unit for General Practice, Aarhus, Denmark, and Department of Public Health, Aarhus University, Denmark; Postdoc, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; PhD Student, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; Professor, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; Postdoc, Department of Public Health, Aarhus University and Steno Diabetes Center Aarhus, Denmark; PhD Student, Cancer Survivorship, Danish Cancer Institute, Denmark; PhD Student, Department of Orthopedic Surgery, Aarhus University & Aarhus University Hospital, Denmark; Researcher, DEFACTUM, Central Denmark Region, Denmark; Associate Professor, Public Health and Epidemiology, Department of Health Science and Technology, Aalborg University, Gistrup, Denmark & Research Data and Biostatistics, Aalborg University Hospital, Denmark; Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; Epidemiologist, The Danish Healthcare Quality Institute (DHQI), Denmark; Associate Professor, Section of Social Medicine, Department of Public Health, University of Copenhagen, Denmark; Chief Epidemiologist, Cancer Epidemiology and Surveillance, Danish Cancer Institute & Department of Clinical Medicine, Aalborg University, Denmark; Chief Advisor, Statistics Denmark, Denmark; Postdoc, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; Associate Professor, Research Unit of General Practice, University of Southern Denmark, Denmark; Assistant Professor, Center for Clinical Data Science, Aalborg University Hospital and Aalborg University, Denmark; Professor, National Institute of Public Health, University 
of Southern Denmark, Denmark; Assistant professor, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; Senior Researcher, Research Unit for General Practice, Aarhus, Denmark; Postdoc, Cancer Survivorship, Danish Cancer Institute, Denmark; Department of Public Health, Aarhus University, Denmark; Professor, Center for Clinical Research and Prevention, Bispebjerg and Frederiksberg Hospitals and Section of Epidemiology, University of Copenhagen, Denmark; Assistant Professor, Section of Social Medicine, University of Copenhagen, Denmark; Postdoc, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital & CASTLE – Cancer Survivorship and Treatment Late Effects, Department of Oncology, Copenhagen University Hospital, Rigshospitalet, Denmark; Department of Obstetrics and Gynecology, Duke University School of Medicine, US; Researcher and Assistant Professor, DEFACTUM, Central Denmark Region and Aalborg University, Denmark; PhD Student, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; Professor, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark & National Centre for Register-based Research, Department of Public Health, Aarhus University, Denmark; Associate Professor, Research Unit of General Practice, University of Southern Denmark, Denmark; Professor, Research Unit for General Practice, Aarhus and Medical Diagnostic Center, University Clinic for Innovative Patient Pathways, Department of Clinical Medicine, Aarhus University, Denmark; Associate Professor, Department of Orthopedic Surgery, Aarhus University and Aarhus University Hospital, Denmark; Professor, Cancer Survivorship, Danish Cancer Institute & Danish Research Center for Equality in Cancer (COMPAS), Department of Clinical 
Oncology and Palliative Care, Zealand University Hospital & Institute of Clinical Medicine, Faculty of Health, Copenhagen University, Denmark; Assistant Professor, Research Unit for Screening and Epidemiology, Department of Biochemistry and Immunology, Lillebaelt Hospital and Department of Regional Health Research, University of Southern Denmark, Denmark; PhD Student, Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Denmark; Professor, Department of Public Health, Aarhus University and DEFACTUM, Central Denmark Region, Denmark; PhD student, Center for Clinical Research and Prevention, Bispebjerg and Frederiksberg, Denmark; Postdoc, National Institute of Public Health, University of Southern Denmark, Denmark; Associate Professor, Steno Diabetes Center Aarhus, Denmark; Associate Professor, National Institute of Public Health, University of Southern Denmark, Denmark; Senior Scientist, Work, Environment and Cancer, Danish Cancer Institute, Denmark; Postdoc, Epidemiologist, Department of Surgery, Odense University Hospital, & Department of Clinical Research, University of Southern Denmark, Denmark; ¹Department of Clinical Epidemiology, Department of Clinical Medicine, Aarhus University and Aarhus University Hospital, Aarhus, Denmark; ²National Institute of Public Health, University of Southern Denmark, Odense, Denmark; ³CASTLE – Cancer Survivorship and Treatment Late Effects, Department of Oncology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark; ⁴Cancer Survivorship, Danish Cancer Institute, Copenhagen, Denmark; ⁵Department of Clinical Oncology and Palliative Care, Zealand University Hospital, Næstved, Denmark; ⁶Institute of Clinical Medicine, Faculty of Health, Copenhagen University, Copenhagen, Denmark

Correspondence: Cathrine F Hjorth, Email [email protected]; Maja H Olsen, Email [email protected]

Background: Socioeconomic differences in health have become an increasing public health concern and priority, leading to a growing number of studies investigating the relationship between socioeconomic position and health outcomes. However, variability in methodological practices hampers the comparability of findings and leads to inefficiencies, as researchers invest substantial resources in selecting appropriate variables and methods. To address these challenges, the SEPLINE initiative was established to develop a methodological guideline aimed at enhancing the comparability, quality, and feasibility of socioeconomic research using Danish registry data.
Methods: The guideline was developed through a consensus-driven approach involving an interdisciplinary group of stakeholders from Danish universities, research institutions, and data warehouses. The guideline addresses socioeconomic position as an exposure based on data from Danish registries, with the cancer continuum applied as a case outcome to illustrate its application. The development process included two collaborative workshops informed by a pre-workshop questionnaire. Workshop I (spring 2024) focused on socioeconomic indicators, data collection, and data management, featuring expert presentations and group discussions. Workshop II (fall 2024) addressed analytical methods, including causal inference challenges and income/wealth assessment methods. Insights from these workshops were integrated into iterative refinements of the guideline.
Conclusions and Implications: The guideline provides a structured framework for conducting socioeconomic epidemiological research using Danish registry data, offering specific information on data sources and recommendations about variable selection, measurement timing, and data handling. While tailored to Danish registry-based cancer research, the guideline’s methodological principles have broader applicability to other diseases and international contexts. By emphasizing transparency, theoretical grounding, and methodological rigor, SEPLINE aims to advance the study of social determinants of health. Researchers are encouraged to use the guideline as a relevant starting point and adapt it to their specific study populations and research questions, ensuring its relevance across diverse settings.

Keywords: socioeconomic position, social epidemiology, methodology, social determinants of health, socioeconomic inequality, disparity, guideline, registry-based research

A corrigendum for this paper has been published.

Introduction

Over the past decades, addressing socioeconomic differences in health has gained growing scientific, clinical, and political attention, increasing the demand for studies investigating the impact of socioeconomic position on health outcomes. The complexity of the field has led to the application of varying methods for data collection, data management, categorization, and analyses of socioeconomic indicators, which limits the comparability of results. The choice of method is often determined by local traditions and data accessibility but is also guided by the specific research question. As information about these processes is often inadequately reported in scientific papers, the reproducibility of findings is compromised. New researchers, in particular, may spend considerable time identifying suitable variables and how to apply them in observational studies.

To improve the comparability, quality, and feasibility of socioeconomic epidemiological research, the SEPLINE initiative was launched in 2024 as a collaborative effort among subject-matter experts from Danish universities and institutions. The primary outcome of this collaboration is the development of this guideline, which offers methodological recommendations and practical guidance for selecting and analyzing socioeconomic indicators from Danish population-based registries. Focusing on socioeconomic position as an exposure, the guideline is tailored to research involving the adult population. While cancer is used as a case example to illustrate key points, the methodological considerations are broadly applicable to research in other diseases and contexts. Developed through a consensus-driven process and grounded in existing evidence, the guideline aims to serve as a starting point for socioeconomic research. As the guideline does not encompass all aspects of this complex research field, researchers should carefully assess whether the suggestions provided are appropriate for their specific research question and study population.

Methods

The guideline is based on data from Danish national registries available at Statistics Denmark as of October 2024.

The Danish Welfare State

Denmark is a welfare state with universal access to education and healthcare for all residents, alongside a comprehensive social security system. The educational system is state-funded and offers free tuition at all levels from primary school to university.¹ Education is mandatory for 10 years for children aged about 6–15 years, corresponding to primary and lower secondary education. To further support students, the Danish government provides financial aid through the State Educational Grant and Loan Scheme to cover basic living expenses. Despite these provisions, a strong correlation persists between parental socioeconomic factors and the highest educational level attained, indicating ongoing barriers to social mobility.² The Danish social security system seeks to reduce poverty and improve economic equality through income redistribution and social benefit programs, including unemployment benefits, child allowances, housing subsidies, sick leave compensation, and pensions. Although income inequality, as measured by the Gini coefficient, has increased in Denmark since the late 1980s, Denmark remains among the Organisation for Economic Co-operation and Development (OECD) countries with the lowest levels of income inequality and poverty.³ Most healthcare services are provided with no out-of-pocket payments, including general practitioner visits, hospital care, certain vaccinations, cancer screening programs, and rehabilitation. Other services are only partially subsidized and require co-payment, eg, adult dental care, prescription medication, eye exams, physiotherapy, chiropractic care, and psychological treatment.
While the public healthcare system remains the dominant provider, the use and provision of private health services and private health insurance have expanded.¹ Although addressing inequality in health has been a key public health priority for decades, considerable socioeconomic differences in health outcomes and life expectancy persist in Denmark and other nations.⁴⁻⁶

Danish Registry Data

The Danish public system has a long tradition of comprehensive data collection and a unique ability to link data across systems using a universal personal identification number, called the CPR number.⁷ Since 1968, all Danish residents have been given a CPR number at birth or when granted a residence permit. Information on many life events, including date of birth, immigration, educational attainment, employment and income, marriages and households, retirement, emigration, and death, is systematically registered in national administrative registers. Moreover, medical information is collected in national registers covering healthcare contacts, diagnoses, therapies, drug prescriptions, etc.¹ All these registries are linkable by the CPR number. The data are primarily collected for administrative purposes; however, Danish research institutions can be permitted to use the data for research under certain conditions. Statistics Denmark is the data holder of national registries that include socioeconomic information, and data extracts for research are accessed in pseudonymized form on the secure servers of Statistics Denmark. Through its website, Statistics Denmark offers a documentation system describing each variable, including data breaks and validity.⁸ Researchers are encouraged to explore the website along with the information provided in this guideline. A list of the registers referred to in this guideline can be found in Table S1.

Guideline Development

This guideline was developed during two collaborative full-day workshops using a consensus-driven approach with broad representation across institutions, geographical regions, seniority levels, and subject-matter expertise. Invitations were sent to research group leaders within the field as well as data-holder consultants from Danish universities and institutions, encouraging representation ranging from junior researchers to senior professors and chief consultants. Open invitations were also distributed via social media and the Danish Comprehensive Cancer Center's website. Participants included epidemiologists, sociologists, data consultants, clinicians, data managers, and statisticians. Participation was free of charge and open to researchers and other professionals with an interest in the topic.

Upon registration, participants received a questionnaire on their applied methodological practices, including variable selection, categorization, and data management. The responses revealed substantial variation in the choice of variables, time of measurement, handling of missing values, and categorizations. These findings formed the foundation for a preliminary guideline serving as the focal point for Workshop I.

Workshop I, held in spring 2024, focused on the choice of socioeconomic indicators, data collection, and data management. The workshop included two lectures. Chief Adviser Jarl Christian Quitzau from Statistics Denmark presented an overview of registry-based data sources for socioeconomic indicators and pitfalls in their use. Assistant Professor Linda Ejlskov from Aarhus University presented findings of a study emphasizing the analytical impact of different income measures.⁹ Building on the preliminary guideline, participants engaged in group discussions to draft concrete recommendations for single indicators. Where multiple, potentially conflicting, recommendations emerged, all options were documented and systematically compared. The group recommendations were presented and discussed in a plenary session. After the workshop, group notes and plenary feedback were summarized in a revised version of the guideline. This version also included a proposed theoretical model and a new chapter on analytical methods. The updated guideline was circulated to all participants for review ahead of Workshop II.

Workshop II, held in fall 2024, focused on analytical methods. Epidemiologist Michael Green from Duke University gave a lecture on causal inference issues when working with multiple indicators of socioeconomic position. PhD student Søren Korsgaard Martiny from Aarhus University presented methodological considerations on rank-based income and wealth assessments. Participants then worked in groups to refine the theoretical model and continued building upon the work initiated in Workshop I. Outcomes from group discussions were again presented in a plenary session, with the aim of reaching consensus. Where consensus could not be reached, particularly when methodological choices were context-dependent, alternative approaches were included in the guideline, with their respective advantages and disadvantages clearly described. Before publication, the revised guideline was circulated for several rounds of review, ensuring that all SEPLINE group members endorsed the final content.

Socioeconomic Determinants of Health

Several public health theories describe how health is shaped by the social and structural conditions in which individuals are born, live, grow, work, and age. During the 19th century, with the foundation of social medicine, lack of education, poverty, and poor housing and working conditions were argued to be key determinants of health.¹⁰⁻¹² Since then, research has demonstrated a systematic socioeconomic gradient in health, with lower socioeconomic position being associated with poorer health outcomes at every level of this gradient. Concurrently, frameworks on the social determinants of health evolved to encompass distal and upstream societal mechanisms and emphasize the dynamic interplay between structural, contextual, and individual-level factors.¹⁰⁻¹⁷

In line with many current social epidemiological theories, the theoretical approach used in this guideline employs a life course perspective. This approach asserts that societal, contextual, and individual factors interact throughout life and shape different socioeconomic conditions (Figure 1). These conditions encompass both resource-based stratifying mechanisms in a society, such as material and social resources, and prestige-based mechanisms, such as occupational prestige. Socioeconomic position is a conceptualization of these conditions and refers to an individual’s or group’s social and economic position within the structure of society.¹⁰

Figure 1 Theoretical model of the relationship between structural, contextual, and individual factors and the trajectory of cancer. Adapted from Olsen MH, Kjær TK, Dalton SO, Danish Cancer Society Research Center. Social Inequality in Cancer in Denmark: White Paper. Danish Cancer Society; 2023.¹⁸

Structural factors include politics, legislation, and organization such as health policies, redistributive policies, housing policies, and structures of the educational system and healthcare system.
Contextual factors encompass social norms, cultural practices and prestige, as well as the broader social, economic, and physical environment of our neighborhoods. These factors include—but are not limited to—access to and quality of education and healthcare.
Individual factors include genetics and personality, early life conditions and morbidity, ethnicity, social support and resources, and capability and competencies, such as familial socioeconomic circumstances, social networks, and cognitive skills.

As illustrated in Figure 1, structural, contextual, and individual factors shape an individual’s socioeconomic position and influence different health-related factors and the resources needed to navigate society and the healthcare system. It is through the combined interaction of these factors that socioeconomic differences in, for example, cancer outcomes are believed to arise.¹⁸,¹⁹ For instance, the design of healthcare initiatives, such as national screening programs, cancer treatments, or follow-up care, may interact with an individual’s socioeconomic position or health resources, contributing to socioeconomic differences in timely diagnosis, treatment adherence, and ultimately cancer survival.¹⁸

Measures of Socioeconomic Position

Socioeconomic position is a latent and multifaceted construct that cannot be measured by a single indicator. Instead, it is approximated through various indicators that serve as proxies for overlapping, yet distinct dimensions of socioeconomic position. This guideline encompasses educational level, labor market affiliation or occupation, income, and wealth, alongside closely related sociodemographic factors such as marital or cohabitation status, ethnicity, and area-level characteristics. Although these indicators are correlated, each captures unique aspects of an individual’s socioeconomic position at different stages of life and potentially has differential effects on health.¹⁵,²⁰ Figure 2 illustrates some of these multiple dimensions, which can develop over the life course. The non-overlapping areas of the circles illustrate the unique contributions of each indicator, while the overlapping areas represent shared effects and a core dimension of socioeconomic position.²⁰ For example, as educational level is correlated with later income level, observed health effects from income may be influenced by and share effects with education. Yet, the educational level may to a larger degree reflect knowledge and cultural resources, whereas income and wealth are more directly associated with the availability of economic resources.

Figure 2 The multidimensionality of indicators of socioeconomic position. Adapted from Green MJ, Popham F. Interpreting mutual adjustment for multiple indicators of socioeconomic position without committing mutual adjustment fallacies. BMC Public Health. 2019;19(1):19. Creative Commons.²⁰

Given the unique aspects of each indicator, they are not interchangeable proxies for socioeconomic position. The choice of specific indicators should, therefore, be carefully aligned with the research question. Moreover, investigating different socioeconomic indicators and their interactions may elucidate the complex, multidimensional nature of socioeconomic position and its influence on health outcomes. Such an approach may provide valuable insights to guide public health strategies and interventions. This guideline covers information about the selected individual socioeconomic indicators. In some settings, it may be meaningful to combine these individual indicators into a composite measure or socioeconomic index to reflect the multidimensionality of socioeconomic position; however, such approaches are not addressed in this guideline. Note that for all socioeconomic indicators, individuals with missing values may constitute a vulnerable group. Hence, it is recommended to explore this subpopulation separately in relation to the outcome.

Education

Educational attainment is considered a comprehensive indicator of socioeconomic position, which may encompass individual-level factors and reflect the familial, contextual, and structural conditions in which one is raised. Individual-level factors include early cognitive development and abilities, physical and mental health status, intellectual capacity, motivation to learn, and personal interests and values. Familial conditions include the educational level, economic resources, living standard, and health of the parents or household, as well as the level of cognitive stimuli and support in early life and adolescence. Contextual and structural conditions include the quality and availability of learning environments, accessibility and structure of secondary and higher educational institutions, as well as social norms, expectations, and prestige.¹⁵,²¹ With regard to health studies, educational attainment and the abilities acquired through education are related to the level of health literacy, which encompasses the ability to understand, evaluate, and act upon health information, communicate and actively engage with healthcare professionals, and navigate the healthcare system.²²,²³ Education is often defined as the highest attained (ie, completed) level of education. Although an individual’s educational level is typically attained in early adulthood, it may influence future job opportunities and income level and, therefore, also reflect aspects of these factors, such as available economic resources.¹⁵,²¹

Data Source and Variables

Different data sources are available to determine educational attainment, including registers on completed education programs and a longitudinal registry of ongoing activities (Table 1). The age at index date can help determine the most appropriate data sources and variables. For example, because a higher proportion of individuals below the age of 30 are still in education, it may be more appropriate for these age groups to use information from the student registry covering ongoing education (ie, the KOTRE registry; Figure S1) or the parental educational level.

Table 1 Data Sources for Educational Level in the Danish Registries

In Denmark, information on education has been systematically registered in student authorization registries since 1974 and maintained by Statistics Denmark.²⁴ For some individuals, the educational data come from other sources, as indicated by the variable HF_KILDE. Information on education attained prior to 1974 derives from the qualification registry and from a census conducted in 1970 based on self-reported information (HF_KILDE = 2). For immigrants, the available information on education can be self-reported from surveys (HF_KILDE = 3, 17) or imputed by Statistics Denmark (HF_KILDE = 9, 10, 18). Before 2018, the information came from surveys conducted in 1999, 2006, and 2016. Since 2018, information on education completed outside Denmark by individuals who have immigrated within the past year has been collected annually through The Immigrant Survey. These data are obtained from Danish language courses and the Danish Agency for Labor Market and Recruitment. However, since these sources do not cover all immigrants, Statistics Denmark supplements the data by sending a questionnaire each December via digital post to those for whom information is unavailable.

For non-respondents, educational attainment is imputed. The imputation is based on population data from the past four years of surveys and incorporates background variables from other registries; observations with unknown educational levels are imputed using a random forest algorithm. The imputed data should therefore be used with caution. Their limitations and uncertainties are more pronounced at the individual level, whereas aggregated data are generally reliable because errors in the imputed data are likely to balance out in group averages. Thus, when analyzing the educational level across larger population groups, such as comparing entire municipalities or individuals in and out of the labor market, the imputed data remain useful. Analyses involving individual-level education data are less reliable; in such cases, it may be advisable to exclude the imputed data altogether or analyze them separately (Table 2). Detailed information about the education imputation models is available in Table S2.
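
The HF_KILDE source codes described above can be used to separate self-reported and imputed education records from the rest before analysis. The following Python sketch is illustrative only: the record layout, the identifiers, the function names, and the choice to label any code not mentioned in the text as "registry_or_other" are assumptions, not part of the Statistics Denmark documentation.

```python
# Illustrative sketch: group education records by their HF_KILDE source code.
# The code groups below are the ones named in the text; all else is assumed.
CENSUS_1970 = {2}
IMMIGRANT_SURVEY = {3, 17}
IMPUTED = {9, 10, 18}


def source_group(hf_kilde):
    """Return a coarse label for the origin of an education record."""
    if hf_kilde is None:
        return "missing"
    if hf_kilde in CENSUS_1970:
        return "self_reported_census"
    if hf_kilde in IMMIGRANT_SURVEY:
        return "self_reported_survey"
    if hf_kilde in IMPUTED:
        return "imputed"
    # Assumption: codes not named in the text are lumped together here.
    return "registry_or_other"


def split_imputed(records):
    """Split records into (non_imputed, imputed) for separate analyses."""
    non_imputed, imputed = [], []
    for rec in records:
        if source_group(rec.get("HF_KILDE")) == "imputed":
            imputed.append(rec)
        else:
            non_imputed.append(rec)
    return non_imputed, imputed


# Hypothetical records; the ids are made up, and HF_KILDE = 1 stands in
# for a code that the text does not describe.
records = [
    {"id": "a", "HF_KILDE": 1},
    {"id": "b", "HF_KILDE": 9},
    {"id": "c", "HF_KILDE": 17},
]
kept, imputed_only = split_imputed(records)
print(len(kept), len(imputed_only))
```

Keeping the imputed records in a separate list, rather than dropping them, leaves both options from the text open: excluding them or analyzing them on their own.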

Table 2 Data Management of Educational Data

The highest attained educational level (HFAUDD) variable can be formatted by using formats available on the secure servers of Statistics Denmark. Each year, the format is updated with codes for new educational programs and corrections, eg, if a specific education program has changed level. The resulting categorization differs considerably depending on the format chosen. The format most suitable for the Danish population is provided in Table 3.

Categorization

The categorization suggested in Table 3 is aligned with international classifications from the International Standard Classification of Education (ISCED) and OECD, enabling global comparisons.²⁵ Depending on the study’s statistical power, it may be informative to treat missing values as a separate category, as this group could provide valuable insights. If the statistical power is insufficient, a complete-case analysis can be applied instead. The choice of categorization should take into account that the length of education has different meanings for different birth cohorts. In 1972, compulsory education was extended from 7 to 9 years for students in 7th grade in 1971/72 and onwards, typically those born in 1958 or later. However, students born before 1958 increasingly attended voluntary 8th to 10th grades. In 1975, a unified school system was introduced, and a selective type of middle school (in Danish: “Realskolen”) was abolished. This educational program covered 6th to 10th grade for students who met specific grade requirements and prepared them for careers or higher secondary education. The last group completed the program (in Danish: “Realeksamen”) in the summer of 1978. Depending on the study population and research question, researchers may consider assigning individuals who completed this program a medium instead of short education.
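
The two choices discussed above, how to handle missing education and how to account for birth cohort, can be sketched in a few lines of Python. This is a hedged illustration: the function names and the category labels ("short", "medium") are hypothetical, and the 1958 cut-off encodes only the typical birth year given in the text, not actual individual schooling.

```python
# Illustrative sketch of two data-management choices for educational level.
# Function names and category labels are assumptions made for this example.


def typical_compulsory_years(birth_year):
    """Typical compulsory schooling length: 9 years for those born in 1958
    or later, 7 years for earlier cohorts (per the 1972 extension).
    Earlier cohorts often attended voluntary grades, so this is not the
    length of schooling actually completed."""
    return 9 if birth_year >= 1958 else 7


def prepare_education(levels, missing_as_category):
    """Either recode missing values (None) to a separate 'missing' category
    or drop them, which corresponds to a complete-case analysis."""
    if missing_as_category:
        return [lvl if lvl is not None else "missing" for lvl in levels]
    return [lvl for lvl in levels if lvl is not None]


levels = ["short", None, "medium"]
print(prepare_education(levels, missing_as_category=True))
print(prepare_education(levels, missing_as_category=False))
print(typical_compulsory_years(1957), typical_compulsory_years(1958))
```

Exposing the missing-data choice as an explicit argument makes it easy to run both variants and report which one was used.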

Table 3 Categorization of Educational Level Using Danish Registry Data

Attention Points

Causality Issues

Educational level can be influenced by early life health, sex, ethnicity, parental socioeconomic position, and possibly cohabitation/parenthood status. As such, researchers should be aware of both unmeasured confounding and reverse causality.

Missing Information

The information on education from the census in 1970 was discarded for individuals aged 50 years and above in 1970 due to poor quality. Education is, therefore, missing for most people born in or before 1920 or available for a highly selected group only (Figure S2).
In 2018, 51% of the adult population with missing information on education were immigrants. Be aware of the source from which the information is collected and consider whether imputed values should be excluded.
There is a data gap with missing information on education attained between 1970 (census information) and 1974 (start of student authorization registries) (Figure S2).

Inaccuracy and Misclassification

By 31 December 2018, available information on the highest attained educational level was self-reported for 22% of the adult population (18% from the census in 1970 and 3.8% from surveys among immigrants). The proportion increases with earlier birth year.²⁶ Compared to information obtained from registries, the self-reported information may be less precise.²⁴
Education attained outside Denmark is not systematically registered.
The educational level is not necessarily proportional to income levels, eg, some blue-collar workers have a higher income than certain groups of academics (Figure S3).
Besides primary education, most educations either qualify for further schooling or prepare for a specific profession. However, some educations, eg, some introductory courses, do not provide qualifications. Information about the attained qualifying level for each educational level can be found in the variable KOMP, coded as 1=qualifying general education, 3=vocational education, and 7=higher education. Non-qualifying education and, eg, privately paid courses are not registered.
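As a small illustration of the KOMP coding described above, the three codes can be held in a lookup table. This is only a sketch: it assumes that the codes are stored as integers, and the fallback label for codes not listed in the text is hypothetical.

```python
# KOMP codes as listed in the text above.
KOMP_LABELS = {
    1: "qualifying general education",
    3: "vocational education",
    7: "higher education",
}


def komp_label(code: int) -> str:
    """Return the label for a KOMP code.

    Codes not mentioned in the text get a hypothetical placeholder label
    rather than being guessed at.
    """
    return KOMP_LABELS.get(code, "not listed")


labels = [komp_label(c) for c in (1, 3, 7, 9)]
```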

Birth Cohort/Life Course Aspects

The educational level in the population has increased notably over time, and the length of education may have different meanings for different birth cohorts (Figure S4).
The level of some education programs has changed over time. For example, the educational level for nurses has changed from short further education (category 40) to medium further education (category 50). For this reason, it is recommended to use the latest accessible format at Statistics Denmark.
For younger cohorts, it might be appropriate to differentiate between the non-qualifying and qualifying educations. General upper secondary education (category 20) often requires further education to qualify for employment, while vocational education and training programs (category 30) are designed to directly prepare students for the labor market.

Labor Market Affiliation and Occupation

This chapter offers guidance on measuring labor market affiliation and occupation type, helping researchers assess the relevance of each indicator to their specific research questions. Labor market affiliation provides insights into an individual’s social and economic standing, reflecting income and job stability, access to resources, and social networks. It reflects the dynamic interplay between work, economic resources, and social structures, making it a crucial factor in understanding social determinants of health, social mobility, and policy impacts.

Compared to education, labor market affiliation is more sensitive to economic cycles, making it particularly useful for identifying vulnerable socioeconomic groups. Labor market affiliation indicates one’s engagement with income-generating activities, including employment status and job stability. However, depending on the categorization used, it may not capture socioeconomic differences among the large group of employed individuals. In contrast, occupation type conveys information on education, skill levels, social prestige, and insight into roles and responsibilities, but this is not accurately reflected in registry data, leading to potential misclassification. Additionally, shifts in industry demands may alter the socioeconomic implications of certain occupations over time, and cultural or regional differences in occupational prestige further complicate cross-context comparisons.¹⁵,²⁷

Data Sources and Variables

Three main data sources for labor market affiliation are summarized in Table 4.

Table 4 Data Sources for Labor Market Affiliation and Occupation in the Danish Registries

The Employment Classification Module (AKM)

In AKM, employed individuals are registered according to their main occupation during a given year, defined as the activity that generated the largest amount of income.²⁸ The registry comprises an annual primary activity status (variable: BESKST), occupation type (variable: SOCIO13), and a Danish version of the International Standard Classification of Occupations for the economically active (DISCO). The registry goes back to 1976, with several changes over time, including new criteria for classifying students with part-time jobs and adjustments to how self-employed individuals are categorized. These changes have been applied to historical data, with older classifications phased out from 2014.

The Integrated Database for Labour Market Research (IDA)

IDA registers the labor market affiliation of all Danish residents on 30 November each year, among those residing in Denmark on 1 January the following year. Due to changes in the coding system, two variables may be needed to assess labor market affiliation in the IDA database. Before 2008, the information is collected in the variable PSTILL, which refers to the primary affiliation during the given year. Until 1995, employed individuals are subclassified according to their profession code. From 1996, employed individuals are classified based on their work function in accordance with recommendations from the International Labour Organization and DISCO.²⁸ Since 2008, the information has been collected in the variable PSOC_STATUS_KODE, with different subcategories referring to employment types, unemployment, social assistance groups, etc. Therefore, the two variables are not directly comparable.
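The variable break in IDA can be made explicit in code, so that a study period spanning 2008 is not pooled unintentionally. The following Python sketch assumes only what is stated above (PSTILL before 2008, PSOC_STATUS_KODE from 2008); the function names are hypothetical.

```python
def ida_affiliation_variable(year: int) -> str:
    """Name of the IDA variable holding labor market affiliation for a given year."""
    return "PSTILL" if year < 2008 else "PSOC_STATUS_KODE"


def directly_comparable(year_a: int, year_b: int) -> bool:
    """True if both years rely on the same IDA variable.

    Years on either side of the 2008 break use different coding systems
    and therefore need explicit harmonization before pooling.
    """
    return ida_affiliation_variable(year_a) == ida_affiliation_variable(year_b)


variables_needed = sorted({ida_affiliation_variable(y) for y in range(2005, 2011)})
```

For a study period of 2005 to 2010, `variables_needed` contains both variable names, which signals that harmonization is required.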

The Danish Registry for Evaluation of Marginalization (DREAM)

DREAM encompasses all individuals who have received social benefits or other public transfer income, though not benefits related to the Social Services Act.²⁹ From 2008, DREAM also includes monthly employment information and thereby covers all employed individuals. The database is longitudinal, with one new variable added per week containing a code for the type of benefit received (variable: y_YYMM, where Y=year without century and M=month). The codes can, for instance, indicate if an individual has been unemployed, on leave, in early retirement, on sick leave, on social assistance, enrolled in eligible student education, or in job activation. DREAM also monitors transitions to receiving a state retirement pension, emigration from Denmark, and deaths occurring before reaching the state retirement pension age. If the variable is empty for a given week, the individual neither received any benefits during that week nor matched any other specified code (referred to as “empty”). In the following, an empty week is taken to indicate that the individual was working or self-supporting. Be aware that this assumption may lead to misclassification, particularly for young individuals who are supported by their parents or are not entitled to unemployment benefits. To address this issue, linking salary or income data can help verify employment status. From 2008, the variable Branche_YYYY_MM can be used to verify whether weeks with no recorded code (“empty”) are due to employment.
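The handling of empty weeks described above can be sketched as a small decision rule in Python. This is an illustration only: the function name, the parameter names, and the returned labels are hypothetical, and `branche_recorded` stands for whether the Branche variable for the corresponding month holds a value for the person.

```python
from typing import Optional


def classify_dream_week(code: Optional[str], year: int,
                        branche_recorded: Optional[bool] = None) -> str:
    """Classify one DREAM week for one person.

    code             -- the weekly code; None or blank means "empty"
    year             -- calendar year of the week
    branche_recorded -- from 2008 only: is an industry code recorded for
                        the corresponding month? None if not looked up.
    """
    is_empty = code is None or str(code).strip() == ""
    if not is_empty:
        return "registered code"
    # Verification via the Branche variable is only possible from 2008.
    if year >= 2008 and branche_recorded is not None:
        if branche_recorded:
            return "empty, employment verified"
        return "empty, employment not verified"
    # Otherwise fall back on the assumption stated in the text.
    return "empty, assumed working or self-supporting"


example = classify_dream_week("", 2012, branche_recorded=False)
```

Weeks labeled as not verified are the ones most at risk of misclassification and may warrant a sensitivity analysis or linkage to income data.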

Weekly benefit information is generated whenever an individual has received a benefit for at least one day. However, only one type of weekly benefit can be recorded, necessitating automatic prioritization in cases of overlapping data. For example, unemployment benefits take precedence over social assistance, while sick leave benefits take precedence over unemployment benefits.
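The prioritization rule can be illustrated with a short Python sketch. It encodes only the two precedence examples given above (sick leave over unemployment benefits, and unemployment benefits over social assistance); the benefit labels are hypothetical, and the actual ranking applied in DREAM covers many more benefit types.

```python
# Highest priority first; only the ordering stated in the text is encoded.
PRIORITY = ["sick_leave", "unemployment_benefit", "social_assistance"]


def recorded_benefit(benefits_in_week: set) -> str:
    """Return the single benefit that would be recorded for the week.

    Raises ValueError for an empty set or for benefit types whose rank
    is not given in the text, rather than guessing their priority.
    """
    if not benefits_in_week:
        raise ValueError("no benefit received in this week")
    unknown = benefits_in_week - set(PRIORITY)
    if unknown:
        raise ValueError(f"priority not defined for: {sorted(unknown)}")
    for benefit in PRIORITY:
        if benefit in benefits_in_week:
            return benefit


overlap = recorded_benefit({"social_assistance", "unemployment_benefit"})
```

Because only one benefit is kept per week, lower-priority benefits received in the same week are not visible in the weekly variable.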

The data sources differ in when labor market affiliation can be measured (Table 5). Because AKM and IDA are annual, it is recommended to use data from the year prior to index, whereas DREAM allows labor market affiliation to be measured closer to index, eg, the week before index. However, this should be done with caution, as labor market affiliation may change in the time leading up to index and cause reverse causality issues. Also, DREAM data are sensitive to weekly changes, which is why some previous studies have categorized individuals according to their main status in the 3–12 months leading up to diagnosis.³⁰,³¹
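One way to derive a main status over a window before index is to take the most frequent weekly status, sketched below in Python. This is an assumption about how such a summary could be computed, not the method of the cited studies: the status labels, the default window of 26 weeks, and the tie-break (the tied status observed most recently) are all hypothetical choices.

```python
from collections import Counter


def main_status(weekly_status: list, weeks_before_index: int = 26) -> str:
    """Most frequent status in the last `weeks_before_index` weeks.

    weekly_status is ordered chronologically and ends with the week
    before index. Ties go to the tied status seen most recently.
    """
    if weeks_before_index < 1:
        raise ValueError("window must cover at least one week")
    window = weekly_status[-weeks_before_index:]
    if not window:
        raise ValueError("no weekly data before index")
    counts = Counter(window)
    top = max(counts.values())
    tied = {status for status, n in counts.items() if n == top}
    for status in reversed(window):
        if status in tied:
            return status


# Hypothetical history: 40 weeks employed, then 12 weeks on sick leave
history = ["employed"] * 40 + ["sick_leave"] * 12
status_26_weeks = main_status(history, weeks_before_index=26)
status_13_weeks = main_status(history, weeks_before_index=13)
```

In this example the 26-week window gives "employed" while the 13-week window gives "sick_leave", which shows how a short window close to index can pick up changes that may themselves be related to the index event.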

Table 5 Data Management of Labor Market Affiliation and Occupation Data

Table 6 Categorization of Labor Market Affiliation Using Danish Registry Data