Measuring the presence and incidence of cholera in Hindustan: Data from primary sources for the colonial era

Cholera caused one of the most severe pandemics faced by humankind in the 19th and 20th centuries. Exploring a collection of medical reports, we build a harmonized dataset covering 90 years of cholera's spread in its cradle, Hindustan, from 1814 to 1904.
Published in Social Sciences and Economics

While cholera persists in certain regions of the world, its legacy echoes through the pandemics of the 19th and 20th centuries, which left an indelible mark on our collective consciousness and historical narratives. Beyond the key contributions of historians like Charles Rosenberg or Patrice Bourdelais, traces of its impact also permeate literature and contemporary popular culture. For instance, Mary Shelley's Frankenstein monster has its origins in the tragic toll of cholera deaths in Switzerland, the very setting where the book was conceived (D'Arcy Wood, 2015). Despite this significant imprint on our history and cultural fabric, the connection between past cholera outbreaks and their present-day repercussions on health and societal development remains poorly understood. The dataset we present catalogs the historical presence and prevalence of cholera in Hindustan, the cradle of the disease, during the colonial era. Below, we discuss how these data could enrich our understanding of the interplay between the disease and its effects on health and economic dynamics.

Our dataset spans 1814 to 1904 and results from harmonizing various primary sources, predominantly military and medical records. This compilation makes it possible to track both the presence and the incidence of the disease. Using the administrative districts of the colonial era, we offer multilevel disaggregation, which facilitates comparisons with other major datasets from the same period. We also include a conversion to modern administrative boundaries. This conversion simplifies the study of contemporary questions, allowing the long-run societal and economic effects of the disease to be evaluated.

Our methodology

Our dataset covers the former colonial region of Hindustan, presently partitioned among India, Pakistan, Nepal, and Bangladesh. It is composed of two distinct parts, each corresponding to the primary sources used in its construction. The first part runs from 1814 to 1824 and relies extensively on two complementary medical reports on cholera in early 19th-century Hindustan, written by James Jameson in 1820 and William Scot in 1849. This part measures cholera incidence separately among locals and within British military camps, with detailed temporal and spatial granularity. Data are aggregated on a monthly basis.

The second part of the dataset covers the period from 1825 to 1905. It relies on a combination of historical maps illustrating cholera's impact in Hindustan, overlaid onto modern maps of India, Pakistan, Nepal, and Bangladesh. This part also draws on medical articles and WHO reports concerning the disease. Because of data limitations, however, it primarily documents the presence of cholera rather than its incidence.

One of our key contributions is to harmonize location names across concurrent publications. Because these names were often transmitted orally, the task was challenging but fundamental. Thanks to Walter Hamilton's book, published in 1820, we were able to assign geographical coordinates to most of the locations described by Jameson and Scot. For difficult locations with no match in Hamilton, we extracted as much information as possible about their vicinity from the original document and used Google Earth's line function to triangulate their position. In parallel, we used a sufficiently detailed historical map from Rogers (1926) along with Google Maps for localization. For homonymous localities, we determined the location using Google Maps: we entered all homonyms together with all other locations cited in the same paragraph of the source, and selected the homonym closest to those other locations. Through Scot's and Jameson's publications, we identified approximately two hundred individual cities, towns, and villages, with only ten locations remaining unidentified.
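The homonym rule described above was applied by hand in Google Maps, but its logic can be sketched in a few lines. The following is a minimal illustration, not the authors' actual procedure: the function names are hypothetical, "closest" is assumed to mean the smallest mean great-circle distance to the other places cited in the same paragraph, and the coordinates are approximate values chosen for illustration rather than taken from the dataset.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def pick_homonym(candidates, context_points):
    """Return the name of the candidate closest, on average, to the context points.

    candidates: dict mapping a label to a (lat, lon) tuple, one entry per homonym.
    context_points: list of (lat, lon) tuples for places cited in the same paragraph.
    """
    if not candidates or not context_points:
        raise ValueError("need at least one candidate and one context point")

    def mean_distance(name):
        point = candidates[name]
        return sum(haversine_km(point, p) for p in context_points) / len(context_points)

    return min(candidates, key=mean_distance)


# Illustrative, approximate coordinates (assumptions, not dataset values).
aurangabad_candidates = {
    "Aurangabad (Deccan)": (19.88, 75.34),
    "Aurangabad (Bihar)": (24.75, 84.37),
}
context = [(25.59, 85.14), (24.80, 85.00)]  # roughly Patna and Gaya

best = pick_homonym(aurangabad_candidates, context)
print(best)
```

Using the mean distance is only one possible reading of "closest"; the minimum distance to any cited place, or the median, would be equally defensible choices.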

What we obtained:

We have produced a unique database that traces the presence of cholera for each year from 1814 to 1904 at a very fine-grained geographical level: we cover up to 2192 unique clusters of observation, below the level of modern districts. For the period 1814 to 1825, the data are even provided at a monthly frequency, including incidence for British military camps and for locals.

Coverage is not uniform across space and time. For a given year, some geographical entities are not covered at all (there are no data), while others have a coverage rate ranging from 7.7% to 100%. In other words, for each geographical entity with an observation, the data we provide may come from as few as 7.7% of the localities included in that entity, or from all of them. Any future use of the dataset should take this into account. Coverage, at least for the first part of the database, is clearly better for India and Pakistan than for Nepal, for instance.
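To make the coverage rate concrete, here is a minimal sketch of how such a share could be computed for one geographical entity in one year. The function name and the locality labels are hypothetical, and the example of 13 localities is an assumption chosen only because one observed locality out of 13 yields a rate of about 7.7%; the dataset itself may define and store coverage differently.

```python
def coverage_rate(localities_in_entity, localities_with_data):
    """Share of an entity's localities that have at least one observation.

    Returns None when the entity has no observed locality at all, to keep
    "not covered" distinct from a low but positive coverage rate.
    """
    entity = set(localities_in_entity)
    if not entity:
        raise ValueError("entity must contain at least one locality")
    observed = entity & set(localities_with_data)
    if not observed:
        return None
    return len(observed) / len(entity)


# Hypothetical entity made of 13 localities, only one of which has data.
entity_localities = [f"locality_{i}" for i in range(1, 14)]
observed_localities = ["locality_4"]

rate = coverage_rate(entity_localities, observed_localities)
print(f"{rate:.1%}")
```

A user who wants to restrict an analysis to well-documented entities could, for example, filter on this rate before aggregating.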

Despite this, our method allows us to recover the spread of the pandemics and their dynamics. In Figure 1, we show how our reconstructed data fit the historical maps produced by Leonard Rogers in 1926.

Figure 1: Example comparison of our data with historical maps from Rogers (1926)


What can be done next: 

We believe that our dataset holds significant potential to shed new light on existing research. For instance, one could re-examine the productivity gains from Indian railways between 1874 and 1912 (Bogart and Chaudhary, 2013), since researchers can now discern how the geographic spread of the epidemic influenced these trends. Supplementing our dataset with complementary sources might also reveal more about the long-term impact of epidemics, possibly elevating cholera's significance to a level akin to that of the plague, as explored by Siuda and Sunde (2021). Furthermore, our documentation of historical cholera prevalence could help in understanding cultural persistence in India, in areas such as the societal status of women and broader cultural traits. Ultimately, our dataset is a resource for examining the enduring effects of epidemics across various contexts.
