Journal of Blood Medicine, Volume 5

Applying spatial epidemiology to hematological disease using R: a guide for hematologists and oncologists

Authors Kohno K, Narimatsu H, Otani K, Sho R, Shiono Y, Suzuki I, Kato Y, Fukao A, Kato T

Received 20 November 2013

Accepted for publication 16 January 2014

Published 5 March 2014 Volume 2014:5 Pages 31–36

DOI https://doi.org/10.2147/JBM.S57944

Review by Single anonymous peer review


Kei Kohno,1 Hiroto Narimatsu,2 Katsumi Otani,2 Ri Sho,2 Yosuke Shiono,1 Ikuko Suzuki,1 Yuichi Kato,1 Akira Fukao,2 Takeo Kato,1

1Department of Neurology, Hematology, Metabolism, Endocrinology, and Diabetology, Yamagata University School of Medicine; 2Department of Public Health, Yamagata University Graduate School of Medicine, Yamagata, Japan

Abstract: “Spatial statistics” is an academic field that deals with the statistical analysis of spatial data, and it has been applied to econometrics and various other policy fields. Hematologists and oncologists can now apply these methods readily, because capable software has become available at little or no cost. To encourage physicians to use these methods, this review introduces them and demonstrates the analyses with sample data using R and FleXScan, both of which can be downloaded free of charge from their websites. It is demonstrated that physicians can use spatial analysis to study hematological diseases. In addition, applying the technique presented to the investigation of patient prognoses may enable generation of data that are also useful for solving health policy-related problems, such as the optimal distribution of medical resources.

Keywords: leukemia, malignant lymphoma, Tango's index, spatial regression model

Introduction

“Spatial statistics” is an academic field that deals with the statistical analysis of spatial data. In epidemiology, Snow created a cholera map in the 19th century to reveal the spatial unevenness in the distribution of cholera patients during an outbreak in London, and he used it as the basis for establishing preventive measures. This was an early form of what today is called spatial clustering, and its modern applications have developed into spatial epidemiology, which is directed toward risk assessment for infectious and various other diseases.1 “Spatial statistics” has also been applied to many other fields, including econometrics and various policy fields.2

Implementing spatial statistics requires software that supports specialized statistical techniques. In recent years, R3 and FleXScan,4 both of which support spatial statistics, have become available free of charge. It is also essential to use a geographic information system (GIS). A GIS links text, numbers, images, and similar information to a map reproduced on a computer, so that information tied to locations can be integrated, analyzed, and presented as an easy-to-understand map; it has been widely used in disaster management and in business settings. Both commercial GIS software, such as ArcGIS (ESRI; Redlands, CA, USA), and free software, such as Quantum GIS (QGIS Development Team; Open Source Geospatial Foundation Project; http://qgis.osgeo.org),5 are available, so clinicians now have the environment they need to conduct spatial epidemiological research.

Regional clustering can help elucidate the etiology of hematological and oncological diseases, such as adult T-cell leukemia.6 The study of regional clustering is expected to lead to the identification of risk factors and a better understanding of the pathology of these diseases. Because the uneven distribution of diseases is also thought to depend on the availability of medical services capable of diagnosing hematological diseases properly, spatial analysis of hematological diseases would also be useful in the field of health policy.7,8

Yamagata Prefecture, which is located about 300 km north of Tokyo and has a population of about 1.2 million, boasts a regional cancer registry of the highest precision in Japan, and it is one of the few prefectures where the incidence of cancer can be comprehensively understood. Data from this registry were therefore used to carry out spatial analysis of hematological diseases with spatial statistics packages, as a guide for hematologists and oncologists. To encourage physicians to use these methods, this review introduces the methods and demonstrates the analyses using R and FleXScan with sample data.

Software used for statistical analysis

R version 2.14.2 (R Foundation for Statistical Computing, Vienna, Austria) and the packages “spdep”, “DCluster”, and “classInt” were used. R can be downloaded from its website.3 FleXScan software version 3.1 (FleXScan; National Institute of Public Health, Tokyo, Japan) was used to conduct global clustering tests using Tango’s index.9 Its users’ guide can also be downloaded from the website.4

For regression analysis in an econometric model,7 the incidences of diseases in each municipality and the number of hospitals that employ full-time hematologists were used. The hospital data were collected from interviews with hematology physicians and from the hospitals’ websites.

The age-adjusted disease incidence was calculated using the 1985 model population of Japan10 and the 2008 model population of Yamagata Prefecture.11 The detailed method of spatial analysis using R has been described elsewhere.7,8
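As a rough illustration of what an age-adjusted incidence involves, the sketch below assumes direct standardization, in which age-specific rates are weighted by a model (standard) population. The text does not state which standardization procedure was applied, and the function name, the three age bands, and all numbers here are hypothetical rather than registry data.

```python
def age_adjusted_rate(cases, person_years, standard_pop, per=100_000):
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    if not (len(cases) == len(person_years) == len(standard_pop)):
        raise ValueError("all inputs must have one entry per age group")
    # Weight each age-specific rate (cases / person-years) by the standard population.
    weighted = sum(
        (c / n) * w for c, n, w in zip(cases, person_years, standard_pop)
    )
    return per * weighted / sum(standard_pop)


# Hypothetical three-band example (made-up numbers, not registry data).
cases = [2, 10, 30]                        # observed cases per age band
person_years = [40_000, 50_000, 30_000]    # local population at risk per band
standard_pop = [50_000, 30_000, 20_000]    # model population per band

adjusted = age_adjusted_rate(cases, person_years, standard_pop)
crude = 100_000 * sum(cases) / sum(person_years)
print(f"crude: {crude:.1f}, age-adjusted: {adjusted:.1f} per 100,000")
```

In this made-up example the local population is older than the standard population, so the adjusted rate comes out lower than the crude rate. This is the kind of difference that separates the crude and age-adjusted analyses.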

Data used for analysis

The data related to hematological malignant diseases including malignant lymphoma, leukemia, and multiple myeloma between 2000 and 2008 were provided by the cancer registry of Yamagata Prefecture. The data included type of disease, date of onset of disease, age, sex, and the cities where the patients lived. The cancer registry in Yamagata Prefecture is of sufficient quality; in 2008, rates of death certificate notification and death certificate only were 18.5% and 5.9%, respectively.12 The data from the registry are included in the IARC (International Agency for Research on Cancer) Scientific Publications entitled “Cancer Incidence in Five Continents”.13

Preparing datasets: first step

As the first step, the dataset must be prepared as a “csv file”. Microsoft Excel® (Microsoft; Redmond, WA, USA) is used to prepare a table with the following columns: the names or identifiers of the regions, the x and y coordinates on a plane rectangular coordinate system, the longitude and latitude, the population, the incidences of diseases, and the explanatory variable.

The example dataset is shown in Figure 1 and Table 1. It includes the names of the municipalities in Yamagata Prefecture as the region names, the x, y coordinates of the municipalities on a plane rectangular coordinate system, the longitudes and latitudes of the municipalities, the population, and the incidences of diseases. As the explanatory variable, the number of doctors in the municipalities was included. The age-adjusted disease incidence was used; in Figure 1, it was calculated using the 1985 model population of Japan10 and the 2008 model population of Yamagata Prefecture.11 This dataset was saved as a “csv file” (“blood.csv” in this review).
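To make the expected layout concrete, the following sketch parses a small table of this shape. The column names (`name`, `x`, `y`, `lon`, `lat`, `population`, `cases`, `ageadj`, `dr`) and the three rows are hypothetical placeholders, since the actual header of “blood.csv” appears only in Table 1; the sketch is written in Python purely to show the structure and is not part of the R workflow.

```python
import csv
import io

# Hypothetical stand-in for "blood.csv"; the header and values are invented.
SAMPLE = """name,x,y,lon,lat,population,cases,ageadj,dr
A,0.0,0.0,140.30,38.20,250000,120,9.5,12
B,10.0,5.0,140.40,38.30,90000,50,10.1,3
C,25.0,-8.0,139.80,38.70,130000,60,8.7,4
"""

NUMERIC = ("x", "y", "lon", "lat", "population", "cases", "ageadj", "dr")


def load_regions(handle):
    """Read one region per row, converting every numeric column to float."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"name": raw["name"]}
        for key in NUMERIC:
            row[key] = float(raw[key])
        rows.append(row)
    return rows


regions = load_regions(io.StringIO(SAMPLE))
for region in regions:
    print(region["name"], region["population"], region["ageadj"])
```

With a real file, the same function could be called as `load_regions(open("blood.csv", newline=""))`, provided the header matches these assumed column names.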

Figure 1 Preparing the dataset.

Table 1 Example dataset for analysis using R, in “csv file” format.
Notes: The dataset includes the names of the municipalities in Yamagata Prefecture as the region names, the x, y coordinates of the municipalities on a plane rectangular coordinate system, the longitudes and latitudes of the municipalities, the population, and the incidences of diseases.
Abbreviations: ageadj, age-adjusted; Dr, doctor.

Preparing for the analysis: second step

These are the instructions that were used for the analysis:

  • Go to the Excel Save menu
  • Save your worksheet file as a “csv file” (“blood.csv”) in the R working directory (the working directory can be set from the preferences menu of R)
  • Close Excel
  • Start R by double clicking on the desktop icon
  • R shows the prompt symbol (>), then expects input commands
  • Select “Packages” from the main menu, select “Install package(s)”, choose a CRAN (Comprehensive R Archive Network; http://cran.r-project.org) site, and select the “spdep” and “DCluster” packages to download and install.

Conducting the analysis using R: third step

The instructions for spatial analysis with Pearson’s chi-squared test and Tango’s test using R are shown in Figure 2. Tango’s test indicates the presence of disease clustering in hematological diseases.
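Because the commands in Figure 2 are not reproduced in the text, the sketch below only illustrates the quantity that a Pearson chi-squared test of homogeneity computes: observed regional counts are compared with the counts expected if every region shared the overall rate. The function name and the three-region numbers are hypothetical, and the P value is deliberately left out.

```python
def pearson_chi2(observed, population):
    """Chi-squared statistic against counts expected under one common rate."""
    total_cases = sum(observed)
    total_pop = sum(population)
    # Expected count in each region is proportional to its population.
    expected = [total_cases * n / total_pop for n in population]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    return stat, degrees_of_freedom, expected


# Hypothetical example: three equally sized regions with unequal case counts.
observed = [30, 10, 20]
population = [1000, 1000, 1000]
stat, df, expected = pearson_chi2(observed, population)
print(f"chi-squared = {stat:.2f} on {df} degrees of freedom")
```

A large statistic suggests that incidence is not homogeneous across regions, but it does not use the regions' locations. That gap is what a spatial test such as Tango's is meant to fill.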

FleXScan is another useful tool for detecting disease clustering. The results of global clustering tests using Tango’s index by FleXScan are shown in Figure 3. Instructions are available in the users’ guide, which can be downloaded from the website.4 A map of Yamagata Prefecture can be downloaded from the website of freemap (http://www.freemap.jp).
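For readers who want to see what a global clustering index measures, the sketch below implements one common form of Tango's index, C = (r − p)ᵀA(r − p). Here r is each region's share of cases, p is its share of the population, and A holds closeness weights exp(−d/λ) based on the distance d between region centroids. The scale parameter `lam`, the Monte Carlo P value, the function names, and the four-region data are assumptions made for illustration; this is not the FleXScan procedure, whose settings are not given in the text.

```python
import math
import random


def tango_index(cases, population, coords, lam):
    """One common form of Tango's index with closeness a_ij = exp(-d_ij / lam)."""
    total_cases = sum(cases)
    total_pop = sum(population)
    # Difference between each region's share of cases and share of population.
    diff = [c / total_cases - n / total_pop for c, n in zip(cases, population)]
    stat = 0.0
    for i, point_i in enumerate(coords):
        for j, point_j in enumerate(coords):
            closeness = math.exp(-math.dist(point_i, point_j) / lam)
            stat += diff[i] * closeness * diff[j]
    return stat


def tango_pvalue(cases, population, coords, lam, nsim=199, seed=0):
    """Monte Carlo P value: reallocate all cases in proportion to population."""
    rng = random.Random(seed)
    observed = tango_index(cases, population, coords, lam)
    regions = range(len(cases))
    hits = 0
    for _ in range(nsim):
        simulated = [0] * len(cases)
        for idx in rng.choices(regions, weights=population, k=sum(cases)):
            simulated[idx] += 1
        if tango_index(simulated, population, coords, lam) >= observed:
            hits += 1
    return observed, (hits + 1) / (nsim + 1)


# Hypothetical data: two pairs of neighbouring regions on a line, equal populations.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
population = [1000, 1000, 1000, 1000]
clustered = [20, 20, 5, 5]      # excess cases in two adjacent regions
alternating = [20, 5, 20, 5]    # same counts, but spatially scattered

c_clustered, p_clustered = tango_pvalue(clustered, population, coords, lam=2.0)
c_alternating = tango_index(alternating, population, coords, lam=2.0)
print(c_clustered, c_alternating, p_clustered)
```

The two case vectors contain the same counts, yet the index is larger when the excess sits in adjacent regions. For age-adjusted analyses, the population shares would be replaced by shares of age-adjusted expected counts.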

Figure 2 Instructions for Pearson’s chi-squared test and Tango’s test using R with the “spdep” and “DCluster” packages.

Figure 3 Disease cluster analysis by Tango’s index using crude and age-adjusted disease incidences by region of Yamagata Prefecture.
Notes: Crude (A) and age-adjusted disease incidences using the 1985 model population of Japan (B) and the 2008 population of Yamagata Prefecture (C), by region of Yamagata Prefecture. Disease clusters using crude incidences are shown for Tsuruoka, Sakata, Obanazawa, Mogami, Funagata, Mamurogawa, Okura, Mikawa, Shonai, and Yuza (P=0.048). Disease clusters using age-adjusted disease incidences and the 1985 model population of Japan are shown for Yamagata, Kaminoyama, and Takahata (P=0.001). Disease clusters using the age-adjusted disease incidences and the 2008 population of Yamagata Prefecture are shown for Kaminoyama (P=0.001). Points and lines indicate municipalities and their contiguous areas, respectively. Disease clusters are shown by black dots with red lines.

The impact of medical supply on disease incidence can be examined by spatial regression analysis using R with the package “spdep”. With spatial data, one can test whether disease incidence, as the objective variable, is related to the explanatory variables. The instructions are shown in Figure 4. Detailed information on spatial statistics and on the method of spatial analysis using R has been described elsewhere.7,8,14
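The model specification behind Figure 4 is not given in the text. One widely used spatial autoregressive form is the spatial lag model, y = ρWy + Xβ + ε, in which W is a row-standardized neighbour matrix, so that Wy is the average incidence among each region's neighbours. The sketch below, which assumes that form, computes only this spatially lagged variable. The contiguity list, the incidence values, and the function names are hypothetical, and estimating ρ and β is left to a dedicated package.

```python
def row_standardized_weights(neighbors):
    """Give each region's neighbours equal weights that sum to 1."""
    weights = {}
    for region, nbrs in neighbors.items():
        if not nbrs:
            raise ValueError(f"region {region!r} has no neighbours")
        share = 1.0 / len(nbrs)
        weights[region] = {nbr: share for nbr in nbrs}
    return weights


def spatial_lag(values, weights):
    """Spatially lagged variable Wy: the weighted mean of neighbouring values."""
    return {
        region: sum(w * values[nbr] for nbr, w in row.items())
        for region, row in weights.items()
    }


# Hypothetical contiguity: region B borders both A and C.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
incidence = {"A": 10.0, "B": 20.0, "C": 60.0}

W = row_standardized_weights(neighbors)
lagged = spatial_lag(incidence, W)
print(lagged)
```

If incidence in a region tends to track this neighbourhood average after accounting for the explanatory variables, the autoregressive coefficient ρ is nonzero. An ordinary regression that ignores W can then give misleading results.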

Figure 4 Instructions for spatial auto-regression analysis using R with the package “spdep”.

Usefulness of spatial statistics in hematology and oncology

In this review, spatial statistical analysis was implemented in the field of hematology using the latest techniques. All of the tools used are available free of charge. It was demonstrated that hematology/oncology physicians can implement such an analysis in various settings using these tools. One advantage of the technique is that hypotheses about spatial clustering can be tested statistically, whereas in the past such clustering could only be estimated visually by plotting the disease incidence.9 The method is thus useful in that it enables scientific validation of the impressions of patient clustering that clinicians often form in daily clinical practice.

The present analysis showed that, after adjustment for age, hematological malignancies in Yamagata Prefecture clustered significantly in Yamagata City and its environs. However, in interpreting this result, consideration must be given to the role of health care providers. Care for hematological malignancies is highly specialized, and diagnosis is difficult in medically underserved regions, such as residential areas far from a hospital with a specialist physician, so there is concern that the incidence of disease might be underestimated there. Even this point can be assessed with the technique of spatial analysis presented. Although the present data show that the number of hematologists in a municipality is not clearly related to incidence, it would be possible to assess, for each disease, explanatory variables other than the number of specialist physicians in the area, such as the number of hospitals or the number of outpatient visits to specialist hematological departments in each municipality.

A method for analyzing spatial clustering of hematological malignancies has been shown. Although the present analysis was performed at the municipality level, GIS data for even smaller districts could be used to perform a more detailed spatial epidemiological analysis.15,16 A limitation is that comprehensive cancer information can be obtained only in places with a highly precise cancer registry, such as Yamagata Prefecture. This limitation, however, is expected to be resolved by the expansion of the cancer registration system.

The etiology of most hematological diseases has not been elucidated. Investigation of these epidemiological aspects may potentially contribute to a better understanding of the etiology of these diseases. In addition, applying the technique presented to the investigation of patient prognoses may enable generation of data that are also useful for solving health policy-related problems, such as the optimal distribution of medical resources.

Acknowledgments

This work was supported by the Institute for Regional Innovation, Yamagata University.

The authors are grateful to Professor Hiroshi Suzuki (Niigata Seiryo University, Niigata, Japan) for critical reading of the manuscript and providing useful discussion. The authors are also grateful to Hidenori Sato (Yamagata University) for support of spatial analysis using R.

Disclosure

The authors declare that they have no conflict of interest in this work.


References

1. Stevenson M, Stevens KB, Rogers DJ. Spatial Analysis in Epidemiology. 1st ed. Oxford, UK: Oxford University Press; 2008.

2. Diggle PJ, Ribeiro PJ. Model-based Geostatistics. New York, NY, USA: Springer; 2007.

3. The R Project for Statistical Computing. (Home page on the Internet). Available from: http://www.r-project.org. Accessed February 12, 2013.

4. National Institute of Public Health. (Home page on the Internet). Available from: http://www.niph.go.jp/soshiki/gijutsu/download/index.html. Accessed February 12, 2013.

5. Web page of The Quantum GIS project. (Home page on the Internet). Available from: http://www.qgis.org. Accessed February 12, 2013.

6. Takatsuki K. Adult T-cell leukemia. Intern Med. 1995;34(10):947–952.

7. Furuya T. [Statistical Analysis of Spatial Data using R]. Tokyo, Japan: Asakura Shoten; 2011. Japanese.

8. Web page of Data Sciences for the Resilient Society. (Home page on the Internet). Available from: http://web.sfc.keio.ac.jp/~maunz/wiki/index.php?%B6%F5%B4%D6%A5%C7%A1%BC%A5%BF%A4%CE%C5%FD%B7%D7%CA%AC%C0%CF. Accessed February 20, 2013.

9. Tango T. A class of tests for detecting ‘general’ and ‘focused’ clustering of rare diseases. Stat Med. 1995;14(21–22):2323–2334.

10. [The 1985 model population of Japan]. (Home page on the Internet). Available from: http://www.mhlw.go.jp/toukei/saikin/hw/jinkou/suii06/fuhyo.html. Accessed February 12, 2013. Japanese.

11. [Home page of Yamagata Prefectural Government]. Available from: http://www.pref.yamagata.jp/ou/kikakushinko/020052/tokei/jinkel.html. Accessed February 13, 2013. Japanese.

12. [Home page of the cancer registry in Yamagata Prefecture 2012]. Available from: https://www.pref.yamagata.jp/kenfuku/kenko/gan/7090005gantouroku.html. Accessed June 11, 2012. Japanese.

13. Cancer Incidence in Five Continents. Volume IX. IARC Scientific Publications No 160. Lyon, France: International Agency for Research on Cancer; 2007.

14. Tango T. Statistical Methods for Disease Clustering. New York, NY, USA: Springer; 2010.

15. [National Land Numerical Information Download Service Japan]. (Home page on the Internet). Available from: http://nlftp.mlit.go.jp/ksj/. Accessed March 12, 2013. Japanese.

16. [Portal Site of Official Statistics of Japan]. Available from: http://www.e-stat.go.jp/SG1/estat/eStatTopPortalE.do. Accessed March 12, 2013. Japanese.

Creative Commons License © 2014 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.