Level of agreement between self-rated and clinician-rated instruments when measuring major depressive disorder in the Thai elderly: a 1-year assessment as part of the THAISAD study
Authors Wongpakaran N, Wongpakaran T, Wannarit K, Saisavoey N, Pinyopornpanish M, Lueboonthavatchai P, Apisiridej N, Srichan T, Ruktrakul R, Satthapisit S, Nakawiro D, Hiranyatheb T, Temboonkiat A, Tubtimtong N, Rakkhajeekul S, Wongtanoi B, Tanchakvaranont S, Bookkamana P, Srisutasanavong U, Nivataphand R, Petchsuwan D
Received 27 October 2013
Accepted for publication 3 January 2014
Published 25 February 2014 Volume 2014:9 Pages 377–382
Nahathai Wongpakaran,1 Tinakon Wongpakaran,1 Kamonporn Wannarit,2 Nattha Saisavoey,2 Manee Pinyopornpanish,1 Peeraphon Lueboonthavatchai,3 Nattaporn Apisiridej,4 Thawanrat Srichan,5 Ruk Ruktrakul,5 Sirina Satthapisit,6 Daochompu Nakawiro,7 Thanita Hiranyatheb,7 Anakevich Temboonkiat,8 Namtip Tubtimtong,9 Sukanya Rakkhajeekul,9 Boonsanong Wongtanoi,10 Sitthinant Tanchakvaranont,11 Putipong Bookkamana,12 Usaree Srisutasanavong,1 Raviwan Nivataphand,3 Donruedee Petchsuwan4
1Faculty of Medicine, Chiang Mai University, Chiang Mai, Kingdom of Thailand; 2Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Kingdom of Thailand; 3Faculty of Medicine, Chulalongkorn University, Bangkok, Kingdom of Thailand; 4Trang Hospital, Trang, Kingdom of Thailand; 5Lampang Hospital, Lampang, Kingdom of Thailand; 6Khon Kaen Hospital, Khon Kaen, Kingdom of Thailand; 7Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Kingdom of Thailand; 8Phramongkutklao Hospital, Bangkok, Kingdom of Thailand; 9Faculty of Medicine, Naresuan University, Pitsanulok, Kingdom of Thailand; 10Srisangwal Hospital, Mae Hong Son, Kingdom of Thailand; 11Queen Savang Vadhana Memorial Hospital, Chonburi, Kingdom of Thailand; 12Faculty of Science, Chiang Mai University, Chiang Mai, Kingdom of Thailand
Purpose: Whether self-reporting and clinician-rated depression scales correlate well with one another when applied to older adults has not been well studied, particularly among Asian samples. This study aimed to compare the level of agreement among measurements used in assessing major depressive disorder (MDD) among the Thai elderly and the factors associated with the differences found.
Patients and methods: This was a prospective, follow-up study of elderly patients diagnosed with MDD and receiving treatment in Thailand. The Mini International Neuropsychiatric Interview (MINI), 17-item Hamilton Depression Rating Scale (HAMD-17), 30-item Geriatric Depression Scale (GDS-30), 32-item Inventory of Interpersonal Problems scale, Revised Experience of Close Relationships scale, ten-item Perceived Stress Scale (PSS-10), and Multidimensional Scale of Perceived Social Support were used. Follow-up assessments were conducted after 3, 6, 9, and 12 months.
Results: Among the 74 patients, the mean age was 68±6.02 years, and 86% had MDD. Regarding the level of agreement found between GDS-30 and MINI, Kappa ranged between 0.17 and 0.55, while for Gwet's AC1 the range was 0.49 to 0.91. The level of agreement was found to be lowest at baseline, and increased during follow-up visits. The correlation between HAMD-17 and GDS-30 scores was 0.17 (P=0.16) at baseline, then 0.36 to 0.41 in later visits (P<0.01). The PSS-10 score was found to be positively correlated with GDS-30 at baseline, and predicted the level of disagreement found between the clinicians and patients when reporting on MDD.
Conclusion: The level of agreement between the GDS, MINI, and HAMD was found to be different at baseline when compared to later assessments. Patients who produced a low GDS score were given a high rating by the clinicians. An additional self-reporting tool such as the PSS-10 could, therefore, be used in such under-reporting circumstances.
Keywords: late-life depression, measurement, correlation
Major depressive disorder (MDD) is a mood disorder commonly found among the elderly; it is characterized by a loss of pleasure, sadness, sleep disturbance, a poor sense of self, guilty feelings, and cognitive impairment. It can lead to impaired social, occupational, and everyday functioning. MDD also causes clinically significant distress and, in some people, suicidal behaviors. The prevalence of late-life depression varies, but evidence suggests that MDD rates in community settings may be around 5.5% to 5.9%.1–3 A recent meta-analysis reported that the median prevalence of MDD in long-term care facilities is around 10%.4
MDD can be diagnosed using reliable screening tools and diagnostic criteria. There are two main types of screening instruments used when assessing depression: self-rating and clinician-rating instruments. The different measures used when screening for or diagnosing MDD may have an influence on its prevalence.5 Self-rating instruments are easy for patients to use if they have the ability to understand the content, though as cognitive impairment is one of the conditions commonly presented among elderly people with MDD, there may be some degree of uncertainty surrounding the use of a self-rating instrument in such cases. Clinician-rated instruments can be used without a patient’s cooperation; however, the level of interpretation of a patient’s own feelings and thoughts may be limited. While the Geriatric Depression Scale (GDS) is a widely used self-rating screening measure, the Hamilton Depression Rating Scale (HAMD) is a commonly used clinician-rated instrument, and is used to assess depression in the elderly.6,7
While discrepancies between self-rating and clinician-rated scales have been found at baseline and may have an impact on treatment outcomes for adult depression, the information gathered thus far on geriatric depression is limited,8,9 with little or no data available regarding this issue among elderly populations in Asian countries. The primary aim of this study was to investigate the level of agreement between the 30-item Geriatric Depression Scale and the diagnoses made using Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) criteria when trying to detect MDD in elderly Thais. The researchers also wanted to examine the association between the severity of depressive symptoms and stress, social support, interpersonal problems, and attachment levels.
Material and methods
This research was part of the Thai Study of Affective Disorder (THAISAD) research project, a prospective, 12-month follow-up study of Thai people diagnosed with depressive disorders and receiving treatment at eleven hospitals across Thailand. Ethical approval for this study was provided by the Joint Research Ethics Committee of Thailand, and the Ethics Committee of the Ministry of Public Health of Thailand. Details regarding the methodology used in the THAISAD project and the participants’ characteristics can be viewed elsewhere.10
Participants and procedures
In total, 74 participants aged 60 years and over were included out of the 371 adult participants taking part in the main study (Figure 1). All of the participants had been diagnosed with MDD, dysthymia, or both (double depression), according to DSM-IV criteria using the Mini International Neuropsychiatric Interview (MINI, version 5).11 All participants were assessed at baseline (month 0), and after 3, 6, 9, and 12 months.
Only those participants able to give informed consent were recruited. All the participants were treated with antidepressants (ie, selective serotonin reuptake inhibitors, serotonin norepinephrine reuptake inhibitors), hypnotics or anxiolytics, and/or psychotherapy, and then monitored over the course of 12 months. The medication or treatment choices available were considered by psychiatrists in line with standard guidelines and treatment recommendations. Participants who had severe medical comorbidities, a history of cognitive disorders (mild cognitive disorder and dementia) according to the Mini-Mental State Examination–Thai 2002 (MMSE–Thai 2002)12 instrument, who were unable to understand the researchers’ words, and/or had a history of psychiatric comorbidity (ie, alcohol dependence, anxiety disorder, organic mental disorder, psychotic disorders, or bipolar disorders) were excluded from the study.
The MINI was conducted by research nurses or psychiatrist investigators (all except PB). The Clinical Global Impression scale (CGI) and HAMD instruments were administered by psychiatrists. All three measures were administered to all participants at every visit. At baseline, upon remission, and at the end of the study, all participants were assessed using a socio-demographic questionnaire, outcome variables, and psychosocial instruments (see the Instruments section below). Self-rating measures were completed by the participants on their own, or by having the questions read to them by research assistants.
The clinician-rated measurements used included the Clinical Global Impression – Severity scale (CGI-S), a seven-point scale that requires clinicians to rate the severity of the condition being assessed (ranging from 1 [normal/not at all] to 7 [extremely ill]). Depression severity and psychosocial factors were assessed using: i) the 17-item Hamilton Depression Rating Scale (HAMD-17), a clinician-rated scale; ii) the Thai Depression Inventory, a 20-item, four-point rating scale which assesses the severity of depressive symptoms among respondents, ranging from 1 (most severe) to 4 (normal); iii) the Thai GDS, a 30-item, true–false questionnaire used to assess depressive symptoms in participants aged 60 years or over;13 iv) the 32-item Inventory of Interpersonal Problems scale, an instrument which uses a five-point Likert scale to assess the severity of interpersonal problems in the participants’ daily lives;14 v) the short version of the Revised Experience of Close Relationships scale, an 18-item instrument which uses a seven-point Likert scale when asking respondents how anxious or close they feel toward a partner and other close friends/relatives;15 vi) the ten-item Perceived Stress Scale (PSS-10), a five-point rating scale on which participants report how frequently they feel stressed;16 and vii) the Multidimensional Scale of Perceived Social Support (MSPSS), a 12-item, seven-point Likert rating scale on which participants report how they feel about the social support they receive.17 All Thai versions of the measurements have demonstrated good reliability and validity. The psychometric properties of the instruments used are described in another published article.10
MMSE–Thai 2002 is a clinician-rated measure used to evaluate a participant’s cognitive impairment and level of dementia, and was developed for use with the Thai population from a model first developed by Folstein et al.10,12,18 Using this instrument, there are three different cut-off scores for impairment, depending on the patients’ formal education levels, these being: 22 (out of 30) for elementary school level, 17 (out of 30) for below elementary school level, and 14 (out of 23) for those who did not go to school, or are illiterate.
In this study, descriptive statistics were used to describe the socio-demographic and clinical characteristics of the participants. Pearson’s correlation was calculated to assess the association between the GDS and HAMD scores. To compare the level of agreement between clinician diagnoses using the MINI and the GDS cut-off score (>24), Gwet’s AC1 and Cohen’s Kappa were calculated,19 using AgreeStat software version 2011.3 (Advanced Analytics, Gaithersburg, MD, USA).20,21 Univariate analysis was used to find predictors for the severity of suicidality, with statistical analysis conducted using SPSS for Windows, version 17 (IBM Corporation, Armonk, NY, USA).
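The two agreement coefficients named above can be illustrated with a minimal sketch. This is not the AgreeStat implementation used in the study; it only applies the standard two-rater, two-category formulas, and the ratings below are hypothetical values invented for illustration (the function name `agreement_coefficients` is likewise an assumption).

```python
from typing import Sequence, Tuple

GDS_CUTOFF = 25  # a GDS-30 score of 25 or more (>24) counts as screen-positive


def agreement_coefficients(
    rater_a: Sequence[bool], rater_b: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (observed agreement, Cohen's Kappa, Gwet's AC1) for binary ratings."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a = sum(rater_a) / n  # proportion rated positive by rater A
    p_b = sum(rater_b) / n  # proportion rated positive by rater B

    # Cohen's Kappa: chance agreement from the product of the marginals.
    pe_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (
        float("nan") if pe_kappa == 1 else (p_observed - pe_kappa) / (1 - pe_kappa)
    )

    # Gwet's AC1: chance agreement from the mean positive proportion.
    pi = (p_a + p_b) / 2
    pe_ac1 = 2 * pi * (1 - pi)  # never exceeds 0.5, so no division by zero
    ac1 = (p_observed - pe_ac1) / (1 - pe_ac1)
    return p_observed, kappa, ac1


# Hypothetical data for 20 patients (NOT study data).
mini_positive = [True, True] + [False] * 18       # clinician diagnosis
gds_scores = [27, 20, 26] + [10] * 17             # self-rated GDS-30 scores
gds_positive = [score >= GDS_CUTOFF for score in gds_scores]

p_o, kappa, ac1 = agreement_coefficients(mini_positive, gds_positive)
print(f"agreement={p_o:.2f}  kappa={kappa:.3f}  AC1={ac1:.3f}")
```

With these invented ratings the raters agree on 18 of 20 patients, yet Kappa is far lower than AC1 because positive ratings are rare; this illustrates how the two coefficients can diverge when the marginal distributions are skewed.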
Among the 74 elderly patients (71.6% female), the mean age was 68±6.02 years, and 86% had MDD (Table 1). The mean GDS-30 at baseline was 17.84±6.84, while for the follow-up evaluations the GDS-30 scores were 11.69±7.82, 11.28±7.90, 9.41±7.42, and 9.89±7.41 after 3, 6, 9, and 12 months, respectively (Table 2).
Table 1 Participants’ socio-demographic and clinical characteristics
Table 2 Severity of depression as rated by the GDS-30 and HAMD-17 at different times
In terms of the level of agreement between the MINI and GDS (cut-off score, 25 or more for severe depression) when diagnosing MDD, the Cohen’s Kappa values were 0.172, 0.551, 0.528, 0.102, and 0.359 at months 0 (baseline), 3, 6, 9, and 12, respectively. According to Altman’s criteria, the values produced at baseline and at month 9 were poor (<0.20).22 However, when using Gwet’s AC1, the values were 0.487, 0.778, 0.782, 0.841, and 0.907 at months 0 (baseline), 3, 6, 9, and 12, respectively. Based on Altman’s criteria, these values could be rated as moderate to very good.22 The Gwet’s AC1 values produced were consistent with the percentage level of agreement (Table 3). The findings demonstrate that levels of agreement (calculated by both Cohen’s Kappa and Gwet’s AC1) were low at baseline.
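As an illustration of how the reported coefficients map onto Altman’s descriptive bands, the following sketch labels each value. The band boundaries follow Altman’s commonly cited scale (poor, fair, moderate, good, very good); the treatment of values falling exactly on a boundary, and the function name `altman_band`, are assumptions made here.

```python
def altman_band(coefficient: float) -> str:
    """Map an agreement coefficient to Altman's descriptive band."""
    if coefficient <= 0.20:
        return "poor"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "good"
    return "very good"


# Values as reported in the text for months 0, 3, 6, 9, and 12.
MONTHS = [0, 3, 6, 9, 12]
KAPPA = [0.172, 0.551, 0.528, 0.102, 0.359]
AC1 = [0.487, 0.778, 0.782, 0.841, 0.907]

kappa_bands = [altman_band(value) for value in KAPPA]
ac1_bands = [altman_band(value) for value in AC1]

for month, k_band, a_band in zip(MONTHS, kappa_bands, ac1_bands):
    print(f"month {month:>2}: Kappa {k_band:<9} AC1 {a_band}")
```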
Table 3 Level of agreement between the MINI and GDS (cut-off 25) for each visit
The HAMD-17 and GDS-30 scores were not significantly correlated at baseline (Pearson’s correlation coefficient 0.17, P=0.15), but were significantly correlated at every follow-up visit, with coefficients of 0.40 (P<0.001), 0.36 (P=0.002), 0.41 (P<0.001), and 0.38 (P=0.001) after 3, 6, 9, and 12 months, respectively (Table 4).
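For readers who want to see how a correlation coefficient relates to its significance test, here is a minimal sketch. It computes Pearson’s r from paired scores and converts a given r into the usual t statistic with n − 2 degrees of freedom. The sample size of 74 used for the baseline check is an assumption (it presumes all participants contributed both scores), and the function names are illustrative.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's product-moment correlation for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Baseline correlation reported in the text; n = 74 is an assumption.
t_baseline = t_statistic(0.17, 74)
print(f"t = {t_baseline:.2f} on 72 degrees of freedom")
```

Under that assumption the baseline t statistic is about 1.46, below the two-sided 5% critical value of roughly 1.99 for 72 degrees of freedom, which is in line with the non-significant baseline result.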
When exploring the level of discordance at baseline, the MMSE and PSS scores were found to be significant predictors of discordance (MMSE: unstandardized coefficient [B] =−0.367, standard error [SE] 0.16, 95% confidence interval [CI] =−0.681 to −0.053, P=0.022; PSS: B =0.211, SE 0.08, 95% CI =0.049–0.374, P=0.011). No significant predictor was found at the 3- and 9-month follow-ups, though PSS was of borderline significance at month 6 (P=0.050). Interestingly, MSPSS was a predictor at month 12 (B =2.143, SE 0.82, 95% CI =3.76–6.72, P=0.010).
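The relationship between a coefficient, its standard error, and its 95% confidence interval can be sketched as follows. This assumes a normal-approximation (Wald) interval, B ± 1.96 × SE; the source does not state how its intervals were computed, and because the published SEs are rounded to two decimals the result can only be approximate. The function name `wald_ci` is illustrative.

```python
from typing import Tuple


def wald_ci(b: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval: b - z*se to b + z*se."""
    return b - z * se, b + z * se


# First baseline coefficient reported in the text: B = -0.367, SE = 0.16.
lower, upper = wald_ci(-0.367, 0.16)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

For the first baseline coefficient this gives an interval of about −0.681 to −0.053, matching the interval reported in the text.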
To our knowledge, this is the first study to have compared the self-reporting GDS with clinician-rated interview instruments for depression among elderly patients. The low agreement at the first assessment and at the fourth assessment (month 9) may have arisen because the participating patients’ cognitive functions were poor upon admission for treatment, owing to their depression. As a result, they may have responded in an inconsistent way. Uher et al, using a large sample, found that self-reporting instruments are consistent with clinician-rated methods, and so may be used interchangeably; however, the participants in that study were not elderly.8 Patient factors need to be taken into account when symptoms are reported, as patients may have a tendency to under- or over-rate.23 In this study, the first assessment produced the lowest level of agreement between the two measures, the self-reporting and clinician-rated instruments; thereafter, the level of agreement was higher, except during the fourth assessment, when Cohen’s Kappa was at its lowest (in contrast to Gwet’s AC1). This may reflect limitations of the formula underlying Cohen’s Kappa, compared with that of Gwet’s AC1, rather than the actual level of agreement.19
It is interesting to note that the factors predicting disagreement varied at each visit. It is conceivable that MMSE (or “poor cognitive function”) predicted discordance in the first evaluation, while the perception of stress was also a predictor of disagreement, as shown by the fact that when it reduced in the follow-up visits, the level of disagreement decreased accordingly. This highlights the fact that a clinician’s judgment when diagnosing using DSM does not necessarily reflect how the patients really feel.24 Factors found to be associated with discordance in other studies include older age, being male, having lower impairment levels, and the severity of the symptoms being experienced. In addition, personality factors, such as high levels of neuroticism and low levels of extraversion and agreeableness, may be associated with a greater endorsement of depressive symptoms.24,25
In large population-based studies, self-reporting questionnaires may be used before applying clinician-rated tools. The GDS can still be used to screen for depression (nowadays, GDS-15 is widely used);26,27 however, as the results here show, it is difficult to rely on the GDS tool alone, particularly at baseline. Therefore, using another questionnaire, such as the Perceived Stress Scale, alongside the GDS might be of use, as it may alert clinicians to the possibility that patients are under-reporting symptoms, prompting the clinicians to carry out further assessments.
When reporting on MDD in this study, the level of agreement between patients and clinicians was found to be different at baseline when compared to later assessment periods. Patients who produced a low GDS score were rated as highly depressed by the clinicians when using either MINI or HAMD-17. Use of an additional self-reporting tool, such as the Perceived Stress Scale, may therefore be useful in such under-reporting circumstances.
The sample used for this study was of a modest size, so generalizability of the findings may be limited. As this study was part of the THAISAD project, which studied 371 adult and elderly patients with depressive disorders across eleven sites, researchers preferred using HAMD to the Montgomery–Åsberg Depression Rating Scale (MADRS), as they were more familiar with the HAMD instrument. MADRS may be more suitable for use among the elderly, because it has fewer physically related questions than the HAMD tool; nevertheless, it is generally accepted that HAMD and MADRS correlate well.
All authors contributed to the conception of the study and participated in the protocol design, the acquisition of data, and the coordination of the study at the different sites. PB and TW participated in the analysis of the available data. NW drafted the manuscript and the remaining authors revised it critically for important intellectual content. All authors read and approved the final manuscript and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
This study was funded by the National Research Council of Thailand, and was coordinated and supported by the Medical Research Network of the Consortium of Thai Medical School. Additional funding for the research was provided by the Faculty of Medicine, Chiang Mai University.
The authors report no conflicts of interest in this work.
1. Mojtabai R, Olfson M. Major depression in community-dwelling middle-aged and older adults: prevalence and 2- and 4-year follow-up symptoms. Psychol Med. 2004;34(4):623–634.
2. Wongpoom T, Sukying C, Udomsubpayakul U. Prevalence of depression among the elderly in Chiang Mai province. J Psychiatr Assoc Thailand. 2011;56(2):103–116.
3. Byers AL, Yaffe K, Covinsky KE, Friedman MB, Bruce ML. High occurrence of mood and anxiety disorders among older adults: The National Comorbidity Survey Replication. Arch Gen Psychiatry. 2010;67(5):489–496.
4. Seitz D, Purandare N, Conn D. Prevalence of psychiatric disorders among older adults in long-term care homes: a systematic review. Int Psychogeriatr. 2010;22(7):1025–1039.
5. Wongpakaran N. Geriatric psychiatry in Thailand. J Psychiatr Assoc Thailand. 2008;53(Suppl):s39–s46.
6. Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr Res. 1982;17(1):37–49.
7. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56–62.
8. Uher R, Perlis RH, Placentino A, et al. Self-report and clinician-rated measures of depression severity: can one replace the other? Depress Anxiety. 2012;29(12):1043–1049.
9. Rane LJ, Fekadu A, Wooderson S, Poon L, Markopoulou K, Cleare AJ. Discrepancy between subjective and objective severity in treatment-resistant depression: prediction of treatment outcome. J Psychiatr Res. 2010;44(15):1082–1087.
10. Wongpakaran T, Wongpakaran N, Pinyopornpanish M, et al. Baseline characteristics of depressive disorders in Thai outpatients: findings from the Thai Study of Affective Disorders (Thai SAD). Neuropsychiatr Dis Treat. 2014;10:217–223.
11. Sheehan D, Lecrubier Y, Sheehan K, et al. The Mini-International Neuropsychiatric Interview (MINI): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59(Suppl 20):22–33, quiz 34–57.
12. Thai Cognitive Test Development Committee 1999. Mini-Mental State Examination-Thai 2002. Bangkok: Institute of Geriatric Medicine, Department of Medical Services, Ministry of Public Health, Thailand; 2002.
13. Train The Brain Forum (Thailand). Thai Geriatric Depression Scale. Siriraj Hosp Gaz. 1994;46(1):1–9.
14. Wongpakaran T, Wongpakaran N, Sirithepthawee U, et al. Interpersonal problems among psychiatric outpatients and non-clinical samples. Singapore Med J. 2012;53(7):481–487.
15. Wongpakaran T, Wongpakaran N. A short version of the revised ‘experience of close relationships questionnaire’: investigating non-clinical and clinical samples. Clin Pract Epidemiol Ment Health. 2012;8:36–42.
16. Wongpakaran N, Wongpakaran T. The Thai version of the PSS-10: An Investigation of its psychometric properties. Biopsychosoc Med. 2010;4:6.
17. Wongpakaran N, Wongpakaran T. A revised Thai Multi-Dimensional Scale of Perceived Social Support. Span J Psychol. 2012;15(3):1503–1509.
18. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189–198.
19. Wongpakaran N, Wongpakaran T, Wedding D, Gwet KL. A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples. BMC Med Res Methodol. 2013;13:61.
20. Gwet KL. Handbook of Inter-Rater Reliability. The Definitive Guide to Measuring the Extent of Agreement Among Raters. 2nd ed. Gaithersburg, MD: Advanced Analytics, LLC; 2010.
21. Gwet KL. AgreeStat2011.2. 2011; http://agreestat.com/agreestat. Accessed March 9, 2012.
22. Altman DG. Practical Statistics for Medical Research. London: Chapman and Hall; 1992.
23. Carter JD, Frampton CM, Mulder RT, Luty SE, Joyce PR. The relationship of demographic, clinical, cognitive and personality variables to the discrepancy between self and clinician rated depression. J Affect Disord. 2010;124(1–2):202–206.
24. Eaton WW, Neufeld K, Chen LS, Cai G. A comparison of self-report and clinical diagnostic interviews for depression: diagnostic interview schedule and schedules for clinical assessment in neuropsychiatry in the Baltimore epidemiologic catchment area follow-up. Arch Gen Psychiatry. 2000;57(3):217–222.
25. Enns MW, Larsen DK, Cox BJ. Discrepancies between self and observer ratings of depression. The relationship to demographic, clinical and personality variables. J Affect Disord. 2000;60(1):33–41.
26. Wongpakaran N, Wongpakaran T, Van Reekum R. The use of GDS-15 in detecting MDD: a comparison between residents in a Thai long-term care home and geriatric outpatients. J Clin Med Res. 2013;5(2):101–111.
27. Wongpakaran N, Wongpakaran T. Prevalence of major depressive disorders and suicide in long-term care facilities: a report from northern Thailand. Psychogeriatrics. 2012;12(1):11–17.