Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation


Material Information

Title:
Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation
Physical Description:
Mixed Material
Language:
English
Creator:
Tatem, Andrew J.
Adamo, Susana
Bharti, Nita
Burgert, Clara R.
Castro, Marcia
Dorelien, Audrey
Fink, Gunter
Publisher:
BioMed Central
Publication Date:

Notes

Abstract:
The use of Global Positioning Systems (GPS) and Geographical Information Systems (GIS) in disease surveys and reporting is becoming increasingly routine, enabling a better understanding of spatial epidemiology and the improvement of surveillance and control strategies. In turn, the greater availability of spatially referenced epidemiological data is driving the rapid expansion of disease mapping and spatial modeling methods, which are becoming increasingly detailed and sophisticated, with rigorous handling of uncertainties. This expansion has, however, not been matched by advancements in the development of spatial datasets of human population distribution that accompany disease maps or spatial models. Where risks are heterogeneous across population groups or space or dependent on transmission between individuals, spatial data on human population distributions and demographic structures are required to estimate infectious disease risks, burdens, and dynamics. The disease impact in terms of morbidity, mortality, and speed of spread varies substantially with demographic profiles, so that identifying the most exposed or affected populations becomes a key aspect of planning and targeting interventions. Subnational breakdowns of population counts by age and sex are routinely collected during national censuses and maintained in finer detail within microcensus data. Moreover, demographic and health surveys continue to collect representative and contemporary samples from clusters of communities in low-income countries where census data may be less detailed and not collected regularly. Together, these freely available datasets form a rich resource for quantifying and understanding the spatial variations in the sizes and distributions of those most at risk of disease in low income regions, yet at present, they remain unconnected data scattered across national statistical offices and websites. 
In this paper we discuss the deficiencies of existing spatial population datasets and their limitations on epidemiological analyses. We review sources of detailed, contemporary, freely available and relevant spatial demographic data focusing on low income regions where such data are often sparse and highlight the value of incorporating these through a set of examples of their application in disease studies. Moreover, the importance of acknowledging, measuring, and accounting for uncertainty in spatial demographic datasets is outlined. Finally, a strategy for building an open-access database of spatial demographic data that is tailored to epidemiological applications is put forward. Keywords: Population, Epidemiology, Demography, Disease mapping
General Note:
Publication of this article was funded in part by the University of Florida Open-Access publishing Fund. In addition, requestors receiving funding through the UFOAP project are expected to submit a post-review, final draft of the article to UF's institutional repository at the University of Florida community, with research, news, outreach, and educational materials.
General Note:
Tatem et al. Population Health Metrics 2012, 10:8 http://www.pophealthmetrics.com/Content/10/1/8 ; Pgs. 1-14
General Note:
doi:10.1186/1478-7954-10-8 Cite this article as: Tatem et al.: Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation. Population Health Metrics 2012 10:8.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
All rights reserved by the source institution.
System ID:
AA00013234:00001




Full Text


Tatem et al. Population Health Metrics 2012, 10:8
http://www.pophealthmetrics.com/Content/10/1/8


POPULATION HEALTH METRICS

Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation

Andrew J Tatem1,2,3*, Susana Adamo4, Nita Bharti5, Clara R Burgert6, Marcia Castro7, Audrey Dorelien8, Gunter Fink7,
Catherine Linard9,10, Mendelsohn John", Livia Montana7, Mark R Montgomery2'16, Andrew Nelson13,
Abdisalan M Noor14, Deepa Pindolial24, Greg Yetman4 and Deborah Balk"




* Correspondence: andytatem@gmail.com
1Department of Geography, University of Florida, Gainesville, USA
2Emerging Pathogens Institute, University of Florida, Gainesville, USA
Full list of author information is available at the end of the article
© 2012 Tatem et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.









Introduction
The spatial modeling and mapping of diseases is increasingly being undertaken to derive health metrics, guide intervention strategies, and advance epidemiological understanding [1]. This has been driven by a recognition of the spatial, temporal, and demographic heterogeneities in disease risk (Table 1), and has resulted in significant recent methodological advances (for example [2,3]). Moreover, the need for estimates of populations at risk to guide funding allocations within the Millennium Development Goals (MDGs) framework continues to drive the development of disease mapping and modeling approaches.
Given the high degrees of individual and local heterogeneity within geographic regions or administrative units, effective policy design requires a detailed knowledge of the spatial distribution of relevant population attributes of interest, including size, age, gender, income,


Table 1 Heterogeneities in disease risks


Heterogeneity type: Spatial
Background information and examples: Understanding relevant spatial heterogeneities underlies our ability to map host risk of pathogen exposure. Predictions of disease importation or emergence are limited by our ability to distinguish disease-specific hotspots from continuous risk surfaces. Spatial variation in risk is defined by the specific biology of each host-pathogen relationship. Epidemiologically relevant spatial heterogeneities can be highly specific to each infection and must be correctly identified within the proper context of the ecology and landscape of each host-pathogen relationship. Spatial heterogeneities that impact risk profiles for exposure to a pathogen include large-scale environmental factors, such as temperature, access to water, and rainfall abundance, which can affect host susceptibility (e.g. within the African meningitis belt [4]), host exposure (e.g. proximity to malaria vector habitats [5]), and pathogen viability (e.g. cholera survival in the environment [6]). Within a population, the transmission events of infections drive the spatial progression of an outbreak after the initial exposure to the pathogen has already taken place. Transmission events are rarely observed and risk profiles must be constructed using proxies for transmission, again highlighting characteristics specific to each host-pathogen relationship. Risk profiles for directly transmitted diseases focus on host contacts between infectious and susceptible individuals. Important components of these contacts are host density, susceptibility, and mobility. Each of these factors can also be defined across spatial scales, from within-household contact patterns to settlement-level risk factors. Urban and rural residence can be thought of as a basic (yet dichotomized) spatial heterogeneity that is closely associated with density and landscape, but typically urbanization has not been defined in spatial terms. Similarly, transmission of vector-mediated infections is impacted by spatial heterogeneities at the household and community level determined by host density, prevention measures, vector mobility and vector abundance. Spatial patterns of environmentally mediated infections will also be determined by the host-pathogen relationship.

Heterogeneity type: Temporal
Background information and examples: Epidemiologically important temporal heterogeneities will also be specific to each infection. For emerging infections, long-term changes in host settlements, habitat loss, and changing levels of interaction between humans and animal species can define the risk of disease emergence over time [7] (e.g. Ebola, SARS, monkeypox, HIV, H1N1 and H5N1 influenza). In other situations, seasonal and environmental factors may determine the population-level risk of pathogen exposure (e.g. malaria vector habitats, hyperendemic areas of meningitis). Short-term risk of infection, or transmission of a pathogen within a population, is determined by the biology of the relationships between the host, pathogen and vector. These relationships establish the host susceptibility and infectious periods, and therefore the risk of transmission events. Population-level susceptibility profiles (natural or derived) vary across temporal scales with respect to prior exposure and preventative measures. Temporal likelihood of transmission will be determined by length of exposure, and changes in abundance and susceptibility of the host and vector. Exposure and contact rates (density, migration) over the course of a day (as in commuter patterns for influenza [8]) are additional examples of temporal heterogeneities in transmission likelihood and risk across temporal scales.

Heterogeneity type: Demographic and Socioeconomic
Background information and examples: Susceptibility and transmissibility of infectious disease vary across differing demographic and socioeconomic groups due to differences in immunity, mobility, contact patterns and health status. Small-scale variations in socioeconomic and demographic factors can have a large influence on the geographical variation of infections compared to environmental factors. Age represents one of the most significant factors, with risk of morbidity and mortality of many diseases varying substantially across age groups. These include large variations in mortality and morbidity by age for malaria [9] and for clinical attack risk for dengue [10]. Heterogeneities in susceptibility and transmissibility also exist between the sexes, especially during childbearing age for women, when pregnancy increases the risks of death for both the mother and fetus, and are important for diseases such as congenital rubella syndrome (CRS) [11]. At a population scale, differences in vital rates such as birth rates create heterogeneities in disease risk across space and time, as evidenced by rotavirus in the US [12]. For macro-parasite infections, such as helminths, in addition to environmental risk factors, the population at risk often depends on socioeconomic profiles and access to key infrastructure (housing quality, adequate sanitation and drinking water). For micro-parasite infections with human-to-human transmission, risk is again associated with individual socioeconomic attributes, but also with community/neighborhood attributes. In other words, the concentration of poverty or poor sanitation services increases risk, as evidenced by cholera outbreaks [13]. Finally, in addition to information on poverty status, knowledge of nutritional status is important; malnutrition can increase (i) susceptibility to many infectious diseases, (ii) the period of infectiousness (by reducing immune function and delaying recovery) and (iii) disease-associated mortality [14].


Disease morbidity, mortality, and speed of spread vary substantially with demographic profiles, with clear risk groups and vulnerable populations existing. These
have important implications for planning and targeting intervention strategies. The risk of pathogen infection to host populations exists at two spatial levels. First,
there is a probability of initial exposure of a population to a pathogen, which defines the population risk. Second, there is a probability of transmission of a
disease within a population, which defines the individual risk. Within these epidemic and endemic classifications, the implications for interventions vary across
disease landscapes dependent upon the host-pathogen relationships.








nutritional status, vaccination rates, or child mortality. From a public health perspective, detailed spatial datasets not only allow investigation of the relationship between policy inputs and individual-specific outcomes, but also make it possible to build detailed and realistic predictive models and derive suites of health metrics. Disease mapping and spatial modeling studies have become increasingly detailed and sophisticated, with rigorous handling of uncertainties built in, but are limited when it comes to estimating populations at risk. Detailed spatial datasets on population distribution now exist, but maps of other demographic and socioeconomic characteristics to identify vulnerable subgroups remain lacking. To quantify population attributes, recent high-impact studies have therefore had to overlook subnational demographic variations and rely on simple national-scale adjustments, e.g. [5,15-17].
The availability of high-resolution population data has increased through a series of population mapping efforts over the past 15 years. Initially restricted to a few countries, estimates of population numbers have been made available for the globe over the past decades through the combined efforts of projects like the Gridded Population of the World (GPW) [18], the Global Rural Urban Mapping Project (GRUMP) [19], LandScan [20], and AfriPop [21] (www.afripop.org). All databases are in the public domain and allow individuals, companies, researchers, and policymakers to access population data either by administrative units or by user-specified geographic boundaries of interest. While the generation of these comprehensive population databases clearly constitutes a major achievement from a scientific perspective, two main factors limit the degree to which these databases can be used for research as well as for policy and planning: limited time frames and limited information on population attributes of interest.
The first limitation is mostly the result of the infrequent collection of detailed population data as well as the effort required in assembling global datasets at any given point in time. Given that most countries independently collect full censuses only once per decade and data sharing is complicated by a large set of issues, most current population databases contain data only on a five- or 10-year basis. When analyses warrant data for noncensus years, national growth rates [22], subnational growth rates from National Statistical Offices, or interpolation between data points may be applied to produce estimates for intermediate years, as annual population fluctuations are generally limited.
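
As a minimal sketch of the interpolation option mentioned above, the following assumes a constant exponential growth rate between two census counts. The function name, the district, and its counts are hypothetical illustrations, not values or methods taken from any of the cited databases.

```python
import math


def interpolate_population(p0, t0, p1, t1, year):
    """Estimate the population in `year` from two census counts.

    Assumes a constant exponential growth rate between the census
    years t0 and t1 (an assumption of this sketch).
    """
    if p0 <= 0 or p1 <= 0:
        raise ValueError("census counts must be positive")
    if t0 == t1:
        raise ValueError("census years must differ")
    # Annual growth rate implied by the two census counts.
    rate = math.log(p1 / p0) / (t1 - t0)
    return p0 * math.exp(rate * (year - t0))


# Hypothetical district counted at 100,000 in 2002 and 130,000 in 2012.
estimate_2007 = interpolate_population(100_000, 2002, 130_000, 2012, 2007)
print(round(estimate_2007))
```

The same function extrapolates beyond the second census when `year` is later than `t1`, although such estimates become less reliable the further they move from the observed counts.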
The second constraint is more critical: little is known in detail about the characteristics of the underlying populations being mapped. From a planning or research perspective, these characteristics can be of critical importance, as outlined in Table 1. Various freely available datasets exist to facilitate mapping improvements and add significant value to epidemiological analyses, but these remain scattered across different sources and require processing to be integrated into mapping. Here we review these sources of more detailed, contemporary, freely available, and relevant spatial demographic data, focusing on low-income regions of the world where disease burden is highest, and put forward a strategy for building an open-access database to link the various datasets, tailored to epidemiological applications.

Usages of spatial demographic data in epidemiology
Population distribution datasets constitute an essential denominator required for many infectious disease studies. It is well known that disease transmission is focal and heterogeneous (Table 1), partially due to the clustered nature of population distribution. The epidemiology of many diseases makes surveillance-based methods (reliant upon reporting from health facilities) for estimating populations at risk and disease burden problematic, particularly in low-income regions [23-25], while spatial heterogeneity in human population distribution can produce significant effects on transmission [3,26]. Cartographic and spatial modeling approaches have proven to be effective in addressing these factors (e.g., [27-29]). Such approaches can help characterize large-scale patterns of disease spread to evaluate intervention impact [3] and produce consistent measures of morbidity of known fidelity, which often represent the only plausible method in many African countries where surveillance data are incomplete, unreliable, and inconsistent [23,30,31]. As the precision and detail of disease risk mapping and modeling improve, spatial population datasets that capture these patterns are therefore required if populations at risk are to be more accurately quantified and disease spread among populations is to be realistically modeled for prediction and prevention purposes. Uses of gridded population count data in epidemiological studies are documented in Tatem et al. [1] and Linard and Tatem [32], and here we focus on studies that have attempted to incorporate spatial data on population attributes.
Applications of gridded population datasets in epidemiology have involved estimating numbers of clinical cases, modeling the spatial progression of an epidemic, risk mapping and assessing the effects of urbanization, and the study of diseases ranging from dengue and yellow fever to HIV and leprosy. The majority of spatial modeling approaches to infectious diseases have been based on the environmental correlates of infection, due in part to the availability of high spatial resolution environmental data and the relative paucity of spatial socioeconomic and demographic data. The most widespread uses of gridded population datasets in an epidemiological context have been in the study of malaria. Global spatial demographic datasets have been used to estimate populations at risk of malaria, which forms a fundamental metric for decision-makers at national and international levels [30,33]. While approaches for mapping malaria have become increasingly sophisticated (e.g., [2]), those for mapping population distributions have not kept pace, especially in low-income regions [1], where detailed spatial information on population composition is rarely available or utilized.
Previous studies that have aimed to enumerate vulnerable population subgroups at risk for different diseases have relied solely on simplistic national-level adjustments. The malaria burden in children under 5 years old was recently estimated based on a Zambia-wide survey and LandScan population data adjusted by a national-level estimate of the proportion of under-5 children [34]. Similarly, the numbers at risk of malaria globally in different age groups were estimated by applying national-level adjustments to GRUMP data [5,29,35,36]. Models of disease prevalence were overlaid onto population density maps adjusted by national-level proportions, again to quantify school-age children and young adults at risk of schistosomiasis [17,37-40] and hookworm [38,41-43] and the number of pregnant women infected with hookworm in sub-Saharan Africa [44]. Specific estimates of populations at risk of malaria for pregnant women and children have also been derived from these maps, by combining GRUMP data with national-scale age, sex, and fertility data from the United Nations Population Division [15,45,46]. Finally, the numbers of children under 5 with anemia in West Africa were estimated using similar techniques [16]. The problems of overlooking subnational variations in population through the national-level adjustments applied in all of these examples are illustrated in the next section.
Spatio-temporal transmission models aim to simulate contacts between infectious and susceptible individuals and estimate the spatial spread of the disease. This helps to identify areas and times at risk of disease and assists in planning targeted interventions [47,48]. Sophisticated spatially explicit models have been developed to study the spatial progression of infectious diseases. Many such spatially explicit models have made use of gridded population datasets as input data [3,49]. Gridded population data have also been used to develop agent-based simulation models at the regional level [28,50,51] and at the global level [52,53]. Whatever the spatial approach for modeling, population data are essential, as these models generally require the generation of a virtual society with an appropriate distribution and composition of people [3]. Gridded data are preferred by these models in that the gridding process removes the irregularity associated with the native administrative units in which these data were initially reported and thereby makes the data more flexible for use with a variety of other spatial units or features. In addition, global (or continent-level) gridded population data provide valuable input datasets mainly because of their wide coverage, consistent spatial resolution, and availability in the public domain. Notably missing, as discussed above, is information on population attributes. This represents a limitation for models that can be substantially improved through the incorporation of realistic population attributes to build 'synthetic' populations. Previous studies have had to rely on national-level statistics or the

Figure 1 For Tanzania in 2007: (a) P. falciparum malaria transmission classes (adapted from Hay [5]), measured by P. falciparum Parasite Rate (PfPR), (b) percentage of residents under 5 years of age by ward, (c) percentage differences in estimates of number of children under 5 at risk of the highest transmission class by national- vs. ward-level adjustments.









Table 2 Estimates of numbers of children under 5 at risk of P. falciparum malaria in Tanzania using the two differing demographic methods described in the text

Transmission class   U5PAR1: UN nationwide adjustment   U5PAR2: Census unit adjustments   Percentage change from U5PAR1 to U5PAR2
PfPR <5%             770547                             650174                            -15.62175961
PfPR 5-40%           4315638                            3383040                           -21.6097365
PfPR >40%            773992                             630518                            -18.5368841

U5PAR = Under-5 population at risk.

application of census-derived attributes from one country to multiple others.
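
The flexibility that gridding provides can be illustrated with a small sketch. It assumes the simplest possible allocation rule, spreading each administrative unit's total evenly over the grid cells that fall inside it, which is an illustrative assumption rather than the method of any particular dataset named above. The unit IDs, totals, and function names are hypothetical.

```python
def grid_population(unit_ids, unit_totals):
    """Spread each unit's total evenly over the cells labelled with its ID.

    unit_ids is a 2D list giving the administrative unit of every cell;
    unit_totals maps each unit ID to its reported population.
    """
    cell_counts = {}
    for row in unit_ids:
        for unit in row:
            cell_counts[unit] = cell_counts.get(unit, 0) + 1
    return [[unit_totals[unit] / cell_counts[unit] for unit in row]
            for row in unit_ids]


def window_total(grid, rows, cols):
    """Sum the gridded population over an arbitrary block of cells."""
    return sum(grid[r][c] for r in rows for c in cols)


# Hypothetical 3 x 3 grid covering three irregular administrative units.
unit_ids = [[1, 1, 2],
            [1, 2, 2],
            [3, 3, 2]]
unit_totals = {1: 300.0, 2: 800.0, 3: 100.0}

grid = grid_population(unit_ids, unit_totals)
# Population of the two western columns, a zone that cuts across all units.
west = window_total(grid, range(3), range(2))
print(west)
```

Once the counts sit on a regular grid, totals can be drawn for any zone of interest, such as a transmission class or a catchment area, without reference to the original unit boundaries.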

Improving estimates of children under 5 years at risk of Plasmodium falciparum malaria
The lack of availability of subnational spatial datasets on population groups that are particularly vulnerable to P. falciparum malaria has meant that simple national-level adjustments have been applied in influential studies to estimate the spatial distribution of, for example, children under 5 [15] or pregnant women [45,46] at risk. To illustrate the importance of mapping vulnerable populations to a level of spatial detail approaching that now used in disease mapping, here we compare estimates of under-5 children at risk of P. falciparum malaria in Tanzania in 2007 using transmission risk classes (Figure 1(a)) [5] overlaid onto a population distribution map (www.afripop.org) adjusted to represent children under 5 by (i) applying a single nationwide percentage adjustment as defined by the UN's World Population Prospects [22] (as undertaken in [15]; for Tanzania, the percentage under 5 is estimated to be 17.9%) and (ii) applying per-district proportions of under-5 children derived from ward-level census data (Figure 1(b)). Table 2 shows the differences in estimates of under-5 children residing in each transmission class, with large percentage differences found in each transmission class. We do not examine the spatial patterns of differences in detail here, as this is beyond the scope of this analysis, but they remain an important area for exploration [1]. Overall, the adjustments from finer-scale age distributions indicate that national-level estimates overestimate numbers at risk. Whereas Table 2 summarizes the overestimates by transmission class, Figure 1(c) shows the spatial pattern of the misestimation in the highest transmission class. It shows the percentage differences obtained in estimates of under-5 children at risk of the PfPR >40% transmission level (mapped in Figure 1(a)) resulting from use of the ward-level map of children under 5 years rather than from applying a single nationwide adjustment. Most of these wards show differences above 25%, and several have discrepancies of greater than 100%. These malaria transmission maps and the populations at risk estimates derived from them are increasingly being used to guide planning, policy, and control. Such substantial changes in estimates of populations at risk, achievable through the use of improved spatial demographic composition data, illustrate the urgent need to develop spatial databases of vulnerable populations.
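
The percentage changes reported in Table 2 follow directly from the two sets of estimates. The short sketch below recomputes them from the U5PAR1 and U5PAR2 counts given in the table; the variable and function names are illustrative choices for this sketch.

```python
# Under-5 populations at risk from Table 2:
# (U5PAR1, UN nationwide adjustment; U5PAR2, census unit adjustments).
table2 = {
    "PfPR <5%": (770547, 650174),
    "PfPR 5-40%": (4315638, 3383040),
    "PfPR >40%": (773992, 630518),
}


def percentage_change(old, new):
    """Percentage change when moving from estimate `old` to estimate `new`."""
    return 100.0 * (new - old) / old


changes = {cls: percentage_change(u5par1, u5par2)
           for cls, (u5par1, u5par2) in table2.items()}

for cls, change in changes.items():
    print(f"{cls}: {change:.2f}%")
```

A negative value means the nationwide adjustment gave the larger estimate, which is the case in all three transmission classes.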

Spatial demographic data to meet needs
From an epidemiological and health metrics perspective, fundamental characteristics are age and sex. The most commonly needed age-sex specific groups in developing countries are: infants, children under 5, women of childbearing ages, and the elderly (Table 1). More specific needs might require the population of pregnant women, young adults, or urban residents. Even though these numbers can generally be approximated by multiplying total population numbers by estimated national population fractions, the large and important heterogeneities in population composition generated by migration and differential mortality and birth rates within countries and regions, and particularly between urban and rural residents, are likely to induce substantial degrees of imprecision in resultant output metrics (see previous section). The problem becomes even more severe when researchers or policy makers are primarily interested in nondemographic aspects of the population. In many cases, the main variable of interest may be a fraction of the population with certain health or behavioral characteristics: the number of children not vaccinated, the number of women without access to contraceptives, the number of children not going to school or not receiving formal health care. Many of these characteristics are not census-based, but rather can be ascertained through survey data, an aspect that we shall discuss in further detail below. Clearly it is not feasible for global population databases to generate on-demand maps for each of these factors on a regular basis; nevertheless, the potential to leverage current freely available population databases appears large. Table 3 documents the principal datasets that are available without cost across multiple countries to achieve this.
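
The imprecision introduced by national population fractions can be seen in a toy calculation. The three units, their populations, their under-5 fractions, and their risk status below are all hypothetical values chosen for illustration; only the two approaches being compared come from the text.

```python
# Hypothetical units: (name, total population, fraction under 5, inside risk zone?)
units = [
    ("unit A", 50_000, 0.20, False),
    ("unit B", 20_000, 0.15, True),
    ("unit C", 30_000, 0.16, True),
]

total_population = sum(pop for _, pop, _, _ in units)
# Single national fraction implied by the unit-level data.
national_fraction = sum(pop * frac for _, pop, frac, _ in units) / total_population

# Approach (i): apply the national fraction to every unit in the risk zone.
national_estimate = sum(pop * national_fraction
                        for _, pop, _, at_risk in units if at_risk)
# Approach (ii): apply each unit's own fraction.
local_estimate = sum(pop * frac for _, pop, frac, at_risk in units if at_risk)

percent_change = 100.0 * (local_estimate - national_estimate) / national_estimate
print(f"national: {national_estimate:.0f}, local: {local_estimate:.0f}, "
      f"change: {percent_change:.1f}%")
```

In this made-up case the units in the risk zone have fewer young children than the country as a whole, so the national fraction overestimates the number at risk. If the subgroup were instead concentrated inside the risk zone, the same approach would underestimate it.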
Census data form the basis of existing spatial demographic databases [19,20], and such population and housing censuses are undertaken for almost all countries in the world, including developing countries, generally every 10 years (the dates of past and upcoming planned censuses are available here: http://unstats.un.org/unsd/demographic/sources/census/censusdates.htm), but these databases provide only population counts. A range of other population-attribute information is collected during population censuses such as age, gender, urban/rural residence, and migration information, and, for the






Table 3 Sources of freely available spatial demographic data

Data (standard survey name)/source: Census; National Statistical Offices
Time intervals: Typically 10 years
Typical spatial coverage: Census enumerator area or coarser level
Typical strata: Urban/rural, race or ethnic groups (often)
Relevant variables: Sex, age, education, migration status, household and dwelling characteristics

Data (standard survey name)/source: Census Microdata; https://international.ipums.org/international/
Time intervals: Typically 10 years
Typical spatial coverage: Admin 1-3
Typical strata: Urban/rural
Relevant variables: Household and dwelling characteristics, sex, age, education, migration status, children ever born, children surviving

Data (standard survey name)/source: DHS (Demographic and Health Survey): household, women 15-49, men 15-59, children born in the last five years; http://www.measuredhs.com/
Time intervals: Varies by country, typically every 5 years
Typical spatial coverage: National, Admin 1/region, GPS coordinates of cluster locations for most recent surveys (last 15 years)
Typical strata: Urban/rural
Relevant variables: Household and dwelling characteristics, sex, age, education, maternal and child health, fertility and full birth history, family planning, domestic violence, biomarkers, nutrition

Data (standard survey name)/source: MICS (Multi-indicator cluster survey); http://www.unicef.org/statistics/index_24302.html
Time intervals: UNICEF (round 2, 1999-2001; round 3, 2005-2007; round 4 is in the field, 2009-present)
Typical spatial coverage: National, Admin 1
Typical strata: Urban/rural
Relevant variables: Household and dwelling characteristics, sex, age, education, status, maternal and child health, child labor, domestic violence, summary birth history, anthropometry

Data (standard survey name)/source: LSMS (Living Standard Measure Survey) (Integrated Household Budget Survey and many others that are locally adapted); http://research.worldbank.org/lsms/lsmssurveyFinder.htm
Time intervals: Irregular
Typical spatial coverage: National, Admin 1, some GPS coordinates
Typical strata: Urban/rural
Relevant variables: Household and dwelling characteristics, sex, age, education, migration status, consumption, expenditures, income, nutrition, anthropometry, summary birth history

Data (standard survey name)/source: MIS (Malaria Indicator Survey); http://www.measuredhs.com/; http://www.malariasurveys.org
Time intervals: Varies by country, typically every 3 years
Typical spatial coverage: National, Admin 1/region, GPS coordinates of cluster locations for some surveys (last five years)
Typical strata: Urban/rural
Relevant variables: Household and dwelling characteristics, sex, age, education, biomarkers







Tatem et al. Population Health Metrics 2012, 10:8
http://www.pophealthmetrics.com/Content/10/1/8





Table 3 Sources of freely available spatial demographic data (Continued)

AIS (AIDS Indicator
Survey)


hltp://v
neasur


is.com/


DHS
and Health Survey)
Household, women
15-49, men 15-59,
children born in the
last five years

http://www.
measuredhs.cornm/


MICS (Multi-indicate
cluster survey)

http://www.unicef.
org/statistics/
index 24302.htm


Varies by country,
typically every 3 years


Varies by country,
typically every 5 years


UNICEF (Round
2, 1999-2001; round
3 2005-2007; round 4
is in the field
2009-present)


National, Admin
I/region, GPS coordinates
of cluster locations for
some surveys (last
eight years)


National, Admin l/rec
GPS coordinates of
cluster locations for
most recent surveys
(last 15 years)


National, Admi


Urban/rura


urnan/rura


Urban/rura


Household and
dwelling characteristics,
sex, age,
education, biomarkers


nousenola ana
dwelling characteristics, sex,
age, education, maternal
and child health, fertility and
full birth history,
family planning,
domestic violence,
biomarkers, nutrition



Household and
dwelling characteristics, sex,
age, education, status,
maternal and child health,
child labor, domestic
violence, summary birth
history, anthropometry


LSMS (Living Standard
Measure Survey)
(Integrated Household
Budget Survey and
many others that are
locally adapted)


MIS (Malaria Indicate
Survey)

http://www.
measuredhs.com/

http://www.
malariasuJveys.org/



AIS (AIDS Indicator
Survey)
http://www.
measuredhscom/


regular


National, Admin 1,
some GPS coordinates


Varies by
country, typically
every 3 years


Varies by
country, typically
every 3 years


National, Admin
I/region, GPS coordinates
of cluster locations for
some surveys (last five years)



National, Admin
I/region, GPS coordinates
of cluster locations for
some surveys (last
eight years)


Urban/rura


Urban/rura


Urban/rura


Household and
dwelling characteristics, sex,
age, education, migration
status, consumption,
expenditures,
income, nutrition,
anthropometry, summary
birth history


Household and
dwelling characteristics, sex,
age, education, biomarkers


Household and
dwelling characteristics, sex,
age, education, biomarkers


Page 7 of 14


http://iresear
worldbank.or
IsmssurveyFir


Isms/
er.htm









Figure 2 Maps showing the availability of useful demographic datasets for deriving subnational estimates of population attributes.
(a) Numbers of census microdata records maintained at the International Public Use Microdata Series repository
(https://international.ipums.org/international/), (b) combined numbers of Demographic and Health Surveys (DHS), Malaria
Indicator Surveys (MIS), and AIDS Indicator Surveys (AIS) conducted for each country, (c) combined numbers of DHS, MIS,
and AIS with GPS cluster coordinates available.



majority of countries, made available in some form on national
statistical office websites. This information supplies a series of
single population characteristics at whatever level of geographic
detail is made available by the National Statistical Office. Often,
however, this information is available through data tables
aggregated at coarse administrative levels, and full-detail datasets
can be difficult to obtain. In addition to the aggregated full census
data, large samples of household-level records derived from censuses
(census microdata) are available; these provide age and sex
structure, as well as many other compositional measures, reported
generally by administrative level 1 (e.g., province) or 2 (e.g.,
district). These data keep information about households intact so that





Table 4 Components of relational spatial demographic database based on freely available datasets

Feature                    Example dataset   Example dataset source
National boundaries        SALB              www.unsalb.org
Administrative boundaries  GADM              www.gadm.org
DHS boundaries             MEASURE DHS       www.measuredhs.com
Coastlines                 GBWD              http://dds.cr.usgs.gov/srtm/
Water bodies               SWBD              http://dds.cr.usgs.gov/srtm/version2_1/SWBD/
Land cover                 GlobCover         www.ionia1.esrin.esa.int
Protected areas            WDPA              www.wdpa.org
Urban extent               MODIS             http://www.sage.wisc.edu/people/schneider/research/data.htm
Settlement locations       NGA Geonames      www.earth-info.nga.mil/gns/html
Elevation and slope        SRTM              www.srtm.csi.cgiar.org
Infrastructure             -                 www.ciesin.columbia.edu/confluence/display/roads/GlobalRoadsData


combinations of variables can be made. The largest re-
pository of such data is the International Public Use Micro-
data Series (https://international.ipums.org/international/)
and the data held there are mapped in Figure 2(a).
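
Because microdata keep household records intact, any combination of
variables can be tabulated after the fact. The following is a minimal
sketch of that idea in Python. The records, the field names (admin1,
urban, members, age, sex) and the age bands are hypothetical and do not
follow the IPUMS-International codebook.

```python
from collections import Counter

# Hypothetical person records nested in households; field names and values
# are illustrative only and do not follow any real microdata codebook.
households = [
    {"admin1": "Province A", "urban": True,
     "members": [{"age": 34, "sex": "F"}, {"age": 3, "sex": "M"}]},
    {"admin1": "Province A", "urban": False,
     "members": [{"age": 61, "sex": "M"}, {"age": 4, "sex": "F"}]},
    {"admin1": "Province B", "urban": False,
     "members": [{"age": 27, "sex": "F"}, {"age": 1, "sex": "F"}]},
]


def age_group(age):
    """Collapse single years of age into broad groups (an arbitrary choice)."""
    if age < 5:
        return "0-4"
    if age < 15:
        return "5-14"
    if age < 50:
        return "15-49"
    return "50+"


def cross_tabulate(records):
    """Count persons by (admin unit, urban/rural, age group, sex)."""
    counts = Counter()
    for hh in records:
        stratum = "urban" if hh["urban"] else "rural"
        for person in hh["members"]:
            key = (hh["admin1"], stratum, age_group(person["age"]), person["sex"])
            counts[key] += 1
    return counts


table = cross_tabulate(households)
for key in sorted(table):
    print(key, table[key])
```

A real application would also carry the sample weights supplied with
the microdata, which this sketch omits.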
While census aggregates and census microdata sam-
ples are typically large enough to cover small or moder-
ately sized geographic areas, they are only carried out
approximately every 10 years and are limited in content.
Survey data offer much richer content on shorter time
intervals but are limited in spatial coverage (e.g., Figure 2
(b)). In most high- and low-income countries, geo-
referenced household-based surveys are collected on a regu-
lar basis. These surveys contain detailed local population




characteristics for a finite number of locations, which
could be used to generate characteristic-prevalence sur-
faces for a given country and year. Overlaying these
characteristic surfaces with population estimates would
likely become an invaluable tool both for researchers
and policymakers.
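
The overlay described above amounts to a cell-by-cell multiplication of
a prevalence surface with a gridded population estimate. A minimal
sketch, assuming two small hypothetical grids on the same cell layout
(all values are invented for illustration):

```python
# Toy 2 x 3 grids on an identical cell layout; values are invented.
population = [[120.0, 80.0, 0.0],
              [300.0, 45.0, 10.0]]         # persons per grid cell
unvaccinated_share = [[0.10, 0.25, 0.0],
                      [0.05, 0.40, 0.5]]   # estimated fraction per cell


def overlay(pop_grid, share_grid):
    """Multiply two equally shaped grids cell by cell."""
    if len(pop_grid) != len(share_grid) or any(
            len(p) != len(s) for p, s in zip(pop_grid, share_grid)):
        raise ValueError("grids must have the same shape")
    return [[p * s for p, s in zip(p_row, s_row)]
            for p_row, s_row in zip(pop_grid, share_grid)]


counts = overlay(population, unvaccinated_share)      # persons affected per cell
total = sum(sum(row) for row in counts)               # persons affected overall
print(counts)
print(total)
```

In practice both surfaces carry uncertainty, so the product should be
read as a point estimate only.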
Data on a rich variety of population attributes can be
obtained from a range of international household survey
programs, each of which is listed in Table 3. These pro-
vide subnational urban (or rural) age and sex structures,
educational compositions, employment information, and
countless other socioeconomic and health indicators at
the level of subnational regions. Large household survey


Figure 3 Design of a relational spatial demographic database. Table 4 provides details on each layer.


programs such as the Multiple Indicator Cluster Surveys
(MICS) and Demographic and Health Surveys (DHS)
listed in Table 3 are reasonably well standardized and
cover many low-income countries (Figure 2(b)), with
multiple rounds in each country. Additionally, in most
recent DHS data sets, the survey clusters have been geo-
coded (Figure 2(c)). In order to protect the confidential-
ity of survey respondents, cluster locations are randomly
displaced by up to 2 km in urban areas and 5 km in
rural areas. Moreover, in rural areas, the DHS cluster
locations provided can represent large and potentially
heterogeneous areas. The existing approaches using
DHS data have not taken advantage of spatial modeling
to expand the use of DHS data below the survey region
level (usually administrative level 1). The geocoded clus-
ter data from the DHS allow for data regrouping to dif-
ferent levels of representativeness while still respecting
the sample frame. Although MICS now regularly collects
geocoded data, these data are not made available, so MICS
results cannot be mapped at a finer level than the survey
region.
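
The effect of the confidentiality displacement can be explored by
simulation. The sketch below moves a cluster coordinate by a random
bearing and distance within the limits quoted above (2 km urban, 5 km
rural). Drawing the distance uniformly and the planar approximation are
assumptions made for illustration; this is not the DHS program's
published procedure.

```python
import math
import random

MAX_DISPLACEMENT_KM = {"urban": 2.0, "rural": 5.0}  # limits quoted in the text
KM_PER_DEGREE_LAT = 111.32                          # approximate conversion


def displace(lat, lon, stratum, rng):
    """Return a randomly displaced copy of a cluster coordinate."""
    max_km = MAX_DISPLACEMENT_KM[stratum]
    distance = rng.uniform(0.0, max_km)        # assumption: uniform in distance
    bearing = rng.uniform(0.0, 2.0 * math.pi)  # random direction
    dlat = distance * math.cos(bearing) / KM_PER_DEGREE_LAT
    dlon = distance * math.sin(bearing) / (
        KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


def distance_km(lat1, lon1, lat2, lon2):
    """Planar approximation, adequate for distances of a few kilometres."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dy = (lat2 - lat1) * KM_PER_DEGREE_LAT
    dx = (lon2 - lon1) * KM_PER_DEGREE_LAT * math.cos(mean_lat)
    return math.hypot(dx, dy)


rng = random.Random(42)                    # fixed seed for repeatability
true_lat, true_lon = 12.6, -8.0            # hypothetical cluster location
moved = displace(true_lat, true_lon, "rural", rng)
print(moved, distance_km(true_lat, true_lon, *moved))
```

One practical consequence is that covariates are better summarized over
a buffer of the maximum displacement radius around each published point
than read off at the point itself.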
Each of the datasets described here and listed in Table 3 reports
demographic information aggregated by named administrative units.
Rarely, however, are spatial data on the boundaries of these units
provided with the data. For spatial analysis, therefore, GIS boundary
datasets must be found that match the reported administrative units.
This is often a nontrivial task given regular boundary changes over
time, alternative names, and mismatches with national boundaries. The
initiation of open access repositories of standardized administrative
boundary datasets (e.g., GADM: http://www.gadm.org/) and documented
histories of changes (e.g., http://www.statoids.com/) simplifies such
operations. Moreover, DHS also shares the geography for its surveys on
request.
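
Matching reported unit names to a boundary dataset can be partly
automated. The sketch below normalizes spellings, applies a hand-made
alias table, and falls back on approximate string matching from
Python's standard difflib module. The function names, the alias table
and the 0.85 cutoff are illustrative assumptions, and every automatic
match would still need manual review.

```python
import difflib
import unicodedata


def normalize(name):
    """Lower-case, strip accents and punctuation so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def match_units(survey_names, boundary_names, aliases=None, cutoff=0.85):
    """Map each survey unit name to a boundary unit name, or None if unmatched."""
    alias_lookup = {normalize(k): v for k, v in (aliases or {}).items()}
    lookup = {normalize(b): b for b in boundary_names}
    result = {}
    for name in survey_names:
        key = normalize(name)
        if key in alias_lookup:            # known alternative name
            result[name] = alias_lookup[key]
        elif key in lookup:                # same name after normalization
            result[name] = lookup[key]
        else:                              # approximate match, if close enough
            close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
            result[name] = lookup[close[0]] if close else None
    return result


# Illustrative inputs: "S\u00e9gou" is the accented spelling of Segou.
boundary_names = ["S\u00e9gou", "Tombouctou", "Kayes"]
survey_names = ["Segou", "Timbuktu", "KAYES", "Gao"]
matches = match_units(survey_names, boundary_names,
                      aliases={"Timbuktu": "Tombouctou"})
print(matches)
```

Names left as None, such as a unit absent from the boundary file, flag
the cases where boundary histories need to be consulted.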

Designing a spatial demographic database
The datasets described in the previous section are presently
scattered across disparate sources (Table 3). To better fulfill the
needs of disease modeling and cartographic-style derivations of
health metrics, we propose the construction of a spatial database.
The construction of this database would involve not only the housing
of the disparate demographic datasets in a central open access
location, but also their linkage to GIS datasets to enable the
construction of spatial datasets representing a variety of
epidemiologically relevant variables. The recent development of
spatially enabling tools for database servers, such as PostGIS
(http://postgis.refractions.net), which provides support for
geographic objects in object-relational databases, provides the ideal
framework for construction of the database. The database would be
hosted on a central server and accessed through an interactive web
portal. Table 4 outlines the spatial datasets that would be included
in a database to spatially reference the datasets in Table 3 and
provide additional information to increase mapping capabilities. The
framework spatial data are open-access GIS datasets that can be
reused by multiple organizations for different purposes. Figure 3
outlines how these datasets link together in the relational spatial
database. The key objectives of this database would be to:

1. Provide disaggregated spatially-referenced data on
population sizes and characteristics such as age, sex,
urban/rural location, and education
2. Facilitate data sharing between differing platforms
and demographic mapping projects
3. Provide a high degree of transparency,
documentation, and flexibility with respect to data
sources and the treatment of uncertainty
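
To make the relational idea concrete, the sketch below sets up three
linked tables: administrative units, data sources, and disaggregated
population counts. It uses Python's built-in sqlite3 module purely as a
stand-in, with boundaries held as well-known text (WKT) strings, whereas
the proposal above envisages PostGIS with true geometry columns. All
table names, column names and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admin_unit (
    unit_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    level      INTEGER NOT NULL,   -- 0 = national, 1 = province, ...
    parent_id  INTEGER REFERENCES admin_unit(unit_id),
    boundary   TEXT                -- WKT text here; a geometry column in PostGIS
);
CREATE TABLE source (
    source_id  INTEGER PRIMARY KEY,
    kind       TEXT NOT NULL,      -- e.g. census, microdata, DHS
    year       INTEGER NOT NULL,
    citation   TEXT
);
CREATE TABLE population_count (
    unit_id    INTEGER NOT NULL REFERENCES admin_unit(unit_id),
    source_id  INTEGER NOT NULL REFERENCES source(source_id),
    sex        TEXT NOT NULL,
    age_group  TEXT NOT NULL,
    residence  TEXT NOT NULL,      -- urban or rural
    persons    INTEGER NOT NULL,
    PRIMARY KEY (unit_id, source_id, sex, age_group, residence)
);
""")

# Invented example rows.
conn.executemany("INSERT INTO admin_unit VALUES (?, ?, ?, ?, ?)", [
    (1, "Country X", 0, None, None),
    (2, "Province A", 1, 1, "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"),
    (3, "Province B", 1, 1, "POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))"),
])
conn.execute("INSERT INTO source VALUES (1, 'census', 2009, 'hypothetical census table')")
conn.executemany("INSERT INTO population_count VALUES (?, ?, ?, ?, ?, ?)", [
    (2, 1, "F", "0-4", "urban", 1200),
    (2, 1, "M", "0-4", "urban", 1250),
    (2, 1, "F", "0-4", "rural", 3100),
    (3, 1, "F", "0-4", "rural", 900),
    (3, 1, "M", "5-14", "rural", 2000),
])

# Children under 5 per province, traceable to the source table.
under_five = conn.execute("""
    SELECT a.name, SUM(p.persons)
    FROM population_count p JOIN admin_unit a USING (unit_id)
    WHERE p.age_group = '0-4'
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
print(under_five)
```

Keeping the source as its own table is one way to serve the third
objective, since every count remains linked to its provenance.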

The database is designed to encourage data sharing and to be usable
across different nodes, with standardized representations. For
example, whilst the GRUMP and AfriPop project outputs take different
forms and use differing modeling techniques, each is built upon
standard representations (national boundaries, coastlines,
administrative units) and aims to use the most contemporary
population data available. A standardized database framework would
encourage sharing of new and improved datasets between projects,
benefitting a range of user groups. With differing levels of access
control, new datasets can be reviewed and processed before release to
a wider user community, and access to particular datasets can be
restricted where necessary.
Documentation of all aspects of the data and database structure is
key to ensuring ease of use, integration with epidemiological
applications, and accessibility to a wide user community. This will
focus on database version control, the development of a data
dictionary with full documentation of the datasets archived within
it, and metadata accompanying the GIS-related datasets. The
distinction between spatial metadata and a data dictionary must be
made: they are very different, and both are necessary. A data
dictionary is needed to understand the shortened names and values of
particular variables, for example, whereas the metadata speak to the
spatial lineage and quality of the data. Some institutional or
database history is sometimes warranted, for example, when data
collection for a given variable has changed. Ensuring that this
documentation is oriented toward the user through full explanation of
assumptions made and quality issues with the data provided will be
important. Moreover, the construction of a library of



tools and techniques for analyzing and using the data within the
database, shared through a forum and other mechanisms such as a code
repository, will facilitate ease of use. Finally, the provision of
quantitative measures of data quality and uncertainty will be of
great importance. This could range from basic information on the
timeliness, resolution, spatial uncertainty, and standard errors of
datasets, which enables informed interpretation of resulting mapped
products, to the more rigorous handling and measurement of
uncertainties (see next section).
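
The difference between a data dictionary and spatial metadata can be
illustrated with two small record types. This is only a sketch: the
class names, field names and example values are hypothetical and are
not drawn from any existing metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """Explains what a shortened variable name and its coded values mean."""
    variable: str
    label: str
    codes: dict


@dataclass
class SpatialMetadata:
    """Describes where a GIS layer came from and how good it is."""
    layer: str
    source: str
    reference_year: int
    resolution: str
    lineage: list = field(default_factory=list)


def decode(entry, value):
    """Translate a coded value using the data dictionary."""
    return entry.codes.get(value, "undocumented code")


# Invented examples of each kind of documentation.
residence = DictionaryEntry(
    variable="urb",
    label="Urban/rural residence",
    codes={1: "urban", 2: "rural"},
)
boundaries = SpatialMetadata(
    layer="admin1_boundaries",
    source="hypothetical boundary repository",
    reference_year=2010,
    resolution="administrative level 1",
    lineage=["downloaded", "reprojected", "names harmonized"],
)

print(decode(residence, 2))
print(boundaries.lineage)
```

The dictionary answers what a value means, while the metadata record
answers how far a layer can be trusted, which is why both are needed.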

New methods, data, and future challenges
We have so far outlined datasets and a basic framework for combining
them to meet immediate disease modeling and health metric demographic
needs. Several opportunities exist, however, to supplement and
improve the scope and accuracy of the resources available. Here we
outline these briefly, along with future challenges likely to be
faced in implementation.

Urban populations
The health divide between urban and rural populations has been well
documented [54], as have the increasing levels of urbanization around
the world [55]. While much has been done in the past 10 years to
refine population data that delineate urban areas [19,56,57], much
less is known about populations within cities, even within relatively
coarse divisions (city center, suburban, peri-urban) or by slum
dwellers and others (much also remains to be learned about
populations within rural areas). Africa, in particular, will undergo
rapid urbanization in the coming decades, yet the data record to
understand the demographic variation and health conditions in these
cities, let alone changes in disease transmission that may result
from urban change, is largely absent. If it is important to
understand what is happening within urban areas, even the currently
available cluster-level data of the DHS program (Figure 2(c)) are
inadequate. While the DHS program and some other surveys have focused
on collecting larger urban samples, there remains a need for large
samples of urban populations to permit city-specific analyses.
Additionally, the definition of urban is not standard across the DHS
countries. There is reason to believe that even greater heterogeneity
of health and socioeconomic characteristics exists within urban areas.
As with the demographic datasets discussed already,
there are a variety of disparate spatial datasets on urban
populations that could be brought together to get a bet-
ter perspective. City population counts for cities with
populations of 100,000 and above are produced by the
UN Population Division World Urbanization Prospects
[55]. Alternative sources of city and settlement


population sizes include the City Population website
(www.citypopulation.de), while other projects are focused on mapping
city extents, to which these counts could be matched (e.g., [19,58],
www.afripop.org). There still exist significant gaps, however, such
as time-series of urban spatial extents, which would facilitate the
development of ways to forecast changes in urban extents. Also,
information on properly defined neighborhoods within cities is
important, such as within-household and within-neighborhood
population density, but so are other contexts (e.g., schools).

Subnational spatial and temporal projections
Most low-income countries do not produce population projections, or
forecasts, at a subnational level. Even the United Nations Population
Division's urban population projections [55] do not include
city-level population projections. Yet the demographic inputs for
generating subnational estimates and projections are increasingly
becoming available. Subnational projections are now being undertaken
at least for very large countries (for example, India and China) and
for small and large cities in the developing world [59]. For the
latter, the forecasting method departs from the traditional
cohort-component method and instead uses longitudinal data on cities
and subnational estimates of demographic rates (urban fertility,
mortality, and migration) derived from survey and census microdata in
an econometric model of city growth [60]. These new approaches depend
on harnessing old data in a spatial framework. The spatial framework
allows disparate units to be linked in new ways, enabling new
estimates and projections. Because these methods are largely
probabilistic and derived from modeling exercises, the uncertainty
associated with these estimates should also be characterized.

Quantifying uncertainty
The variety of ages, spatial resolutions, and sample sizes of input
demographic data translates to great variations in accuracies and
uncertainties of any output gridded demographic data products, and
this is rarely acknowledged [1]. The most basic level of
quantification and communication of this uncertainty to users
involves the provision of information on input datasets and methods
used in construction, such as is undertaken for GPW, GRUMP [19], and
AfriPop [21] (www.afripop.org). Ideally, a more rigorous
quantification of the uncertainty inherent in output gridded
demographic datasets should be undertaken. The rigorous handling and
propagation of uncertainty through a mapping process is now regularly
undertaken in disease risk mapping within a Bayesian framework (e.g.,
[2,16,17]), resulting in full posterior prediction distributions for
each grid cell, providing flexibility in the derivation of differing
uncertainty



metrics and enabling the production of accompanying
uncertainty maps. Undertaking an equivalent approach
for deriving accompanying uncertainty maps for demo-
graphic datasets would require consideration of the in-
put datasets and output requirements. For instance, this
could take the form of estimating the uncertainty in
gridded population distribution mapping from census
data summarized by administrative unit. Here, two of
the major sources of uncertainty are the age of the cen-
sus data in relation to the output prediction year and the
size of the administrative units relative to the population
sizes within them and output grid cell size. While uncer-
tainty in temporal projection of census data is relatively
well studied, the spatial aspect remains unexplored. For
example, gridding a 50,000 km² administrative unit containing
1 million children under age 5 to 30 arc second resolution results in
greater uncertainty about the population size and composition
residing in each grid square than does the same resolution gridding
of a 1,000 km² administrative unit containing 10,000 children under
age 5. Based on simulating all possible permutations of grid square
composition, bound by the limits imposed by the original
administrative unit size and vulnerable population totals, per-grid
square measures of spatial uncertainty in composition could be
derived. Secondly, the availability of the DHS cluster locations
opens up the possibility of estimating surfaces of variables with
associated uncertainty, and this is discussed below.
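
One crude way to see why the larger unit is less constrained is to
count how many different ways its children could be spread over its
grid cells. The sketch below does this for the two units in the
example, using the standard count of ways to place n indistinguishable
people into k cells, C(n + k - 1, k - 1). Treating a 30 arc second
cell as roughly 1 km² and using this count as a proxy for spatial
uncertainty are both simplifying assumptions, not the simulation
approach proposed above.

```python
import math

CELL_AREA_KM2 = 1.0  # assumption: a 30 arc second cell is roughly 1 km2


def log10_allocations(persons, cells):
    """log10 of C(persons + cells - 1, cells - 1), computed via log-gamma."""
    log_ways = (math.lgamma(persons + cells)
                - math.lgamma(cells)
                - math.lgamma(persons + 1))
    return log_ways / math.log(10)


def summarize(area_km2, children):
    """Return (number of cells, mean children per cell, log10 allocations)."""
    cells = int(round(area_km2 / CELL_AREA_KM2))
    return cells, children / cells, log10_allocations(children, cells)


large_unit = summarize(50_000, 1_000_000)   # figures from the example in the text
small_unit = summarize(1_000, 10_000)
print(large_unit)
print(small_unit)
```

Both units have a similar mean number of children per cell, but the
number of admissible arrangements is vastly greater for the larger
unit, which is the sense in which its per-cell values are less
constrained by the census total.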

Gridding household survey data
The availability of the GPS coordinates of DHS clusters (Figure 2(c))
has prompted several studies to utilize geostatistical approaches to
derive continuous estimated surfaces of variables of interest.
However, survey data are designed to be nationally representative
and, as such, their sampling frames may not lend themselves to finely
resolved geographic grids. The DHS program has been a leader in
collecting and providing geocoded information of the survey clusters
in addition to their standard data files, in which data can be
tabulated by first-order subnational regions as well as urban/rural
classification. Early examples have demonstrated the value of such
approaches for deriving continuous maps of variables of interest from
geolocated DHS cluster data. For example, Gemperli et al. [61]
investigated spatial patterns of malaria endemicity as well as
socioeconomic risk factors on infant mortality in Mali using a
Bayesian hierarchical geostatistical model. Meanwhile, Soares
Magalhaes and Clements [16] used a similar approach for anemia
mapping. However, these approaches did not take account of the DHS
sampling design or the random spatial displacement that cluster data
undergo, and overcoming these issues should be a priority for future
applications [62]. Apart from utilizing the cluster location, the
subnational regions supply information that can be used with more
finely resolved grids. One approach that uses the spatial coverage of
census aggregates combined with the attribute breadth in survey data
is that of the Poverty Mapping efforts [63]. But this is not
necessarily the only approach to consider.
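
As a point of reference for what gridding cluster data involves, the
sketch below interpolates cluster-level prevalence onto a coarse grid
with inverse-distance weighting (IDW). IDW is a deliberately simple
deterministic substitute for the Bayesian geostatistical models cited
above: it ignores the sampling design and the cluster displacement,
and it yields no uncertainty estimate. The coordinates and prevalence
values are invented.

```python
import math


def idw(clusters, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (cx, cy, value) tuples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cx, cy, value in clusters:
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:
            return value          # exactly at a cluster: use its value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical clusters: (x km, y km, prevalence of some indicator).
clusters = [(0.0, 0.0, 0.2), (10.0, 0.0, 0.6), (0.0, 10.0, 0.4)]

# Estimate on a 3 x 3 grid of cell centres spaced 5 km apart.
grid = [[idw(clusters, float(x), float(y)) for x in range(0, 11, 5)]
        for y in range(0, 11, 5)]
for row in grid:
    print([round(v, 3) for v in row])
```

Every interpolated value is a weighted average of the cluster values,
so the surface can never exceed the observed range, one of several
reasons model-based approaches are preferred for inference.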


Migration and mobility mapping
Very little is known about migration and mobility within countries,
which may occur seasonally and periodically as well as permanently,
except through case studies and qualitative place-specific analyses.
Disease modeling and health metric derivation, as well as demographic
analyses, increasingly require information on migration and mobility
[64-66]. These data are the weak link of the demographic record; even
the stock estimates of subnational migration have been largely
ignored. Disease modelers often want to know about daily movements
rather than decadal ones, but the decadal moves may be important to
evaluate for changes to place-specific vulnerability of residents.
Decadal moves should be examined more closely with existing survey
and census microdata; characterizing more frequent moves will require
data collection methods that depart from the standard demographic
tool kit. Use of new data, such as spatial locations derived from GPS
tracking devices [67] and cell phone usage [68], may show promise.
However, to be useful, methods for using these data in combination
with more standard demographic data will be necessary.


Conclusions
Growing trends in research and funding for disease mapping and
spatial modeling to derive health metrics and guide strategies are
increasing the need for spatial demographic data of similar scope and
quality for use in estimating sizes and characteristics of
populations at risk. However, existing spatial demographic databases
are often based on coarse resolution and outdated input and lack any
consideration of population attribute mapping. These drawbacks are
likely contributing to substantial uncertainties in disease modeling
and health metric outputs [1]. Here we have shown that datasets to
rectify this exist but remain scattered across multiple repositories
and websites, requiring collation into a central open-access database
to become more widely used and build on the strengths of each data
type, overcoming temporal, spatial, and attribute limitations. We
have put forward a basic database design here to achieve this and lay
the foundations for mapping population attributes and for improving
spatial demographic data in disease studies.




Competing interests
The authors declare that they have no competing interests.


Authors' contributions
AJT conceived and designed the manuscript. All authors contributed to
writing the manuscript and have read and approved the final manuscript.


Acknowledgments
AJT is supported by a grant from the Bill & Melinda Gates Foundation
(#49446), which also supports DP. AJT acknowledges funding support from
the RAPIDD program of the Science & Technology Directorate, Department
of Homeland Security, and the Fogarty International Center, National
Institutes of Health. CL is supported by a grant from the Fondation Philippe
Wiener Maurice Anspach. This paper is the result of a working group
meeting held in May 2011 in New York, funded by the RAPIDD program of
the Science & Technology Directorate, Department of Homeland Security,
and the Fogarty International Center, National Institutes of Health. AD
received financial support from the National Institute for Child Health and
Human Development, Grant No. R24HD047879. AMN is supported by a
Wellcome Trust Intermediate Research Fellowship (#095127). SA is
supported in part by NASA under contract NNG08HZ11C for the continued
operation of the Socioeconomic Data and Applications Center (SEDAC). CRB
is supported by the US Agency for International Development-funded
MEASURE Demographic and Health Survey.

Author details
1 Department of Geography, University of Florida, Gainesville, USA.
2 Emerging Pathogens Institute, University of Florida, Gainesville, USA.
3 Fogarty International Center, National Institutes of Health, Bethesda, USA.
4 Center for International Earth Science Information Network (CIESIN),
Columbia University, New York, USA. 5 Ecology and Evolutionary Biology,
Princeton University, Princeton, USA. 6 Demographic and Health Surveys,
International Health and Development Division, ICF International,
Washington DC, USA. 7 Department of Global Health and Population, Harvard
School of Public Health, Boston, USA. 8 Office of Population Research and
Woodrow Wilson School of Public and International Affairs, Princeton
University, Princeton, USA. 9 Biological Control and Spatial Ecology,
Universite Libre de Bruxelles, Brussels, Belgium. 10 Fonds National de la
Recherche Scientifique (F.R.S.-FNRS), Brussels, Belgium. 11 Research and
Information Services of Namibia, Windhoek, Namibia. 12 Population Council,
New York, USA. 13 International Rice Research Institute, Los Banos,
Philippines. 14 Malaria Public Health and Epidemiology Group, Centre for
Geographic Medicine, KEMRI - University of Oxford - Wellcome Trust Research
Programme, Nairobi, Kenya. 15 School of Public Affairs, Baruch College, City
University of New York, New York, USA. 16 Department of Economics, Stony
Brook University, New York, USA.

Received: 28 October 2011 Accepted: 27 April 2012
Published: 16 May 2012


References
1. Tatem A, Campiz N, Gething P, Snow R, Linard C: The effects of spatial
population dataset choice on estimates of population at risk of disease.
Popul Health Metr 2011, 9:4.
2. Patil AP, Gething PW, Piel FB, Hay SI: Bayesian geostatistics in health
cartography: the perspective of malaria. Trends Parasitol 2011, 27:246-253.
3. Riley S: Large-scale spatial-transmission models of infectious disease.
Science 2007, 316:1298-1301.
4. Molesworth AM, Thomson MC, Connor SJ, Cresswell MP, Morse AP, Shears
P, Hart CA, Cuevas LE: Where is the meningitis belt? Defining an area at
risk of epidemic meningitis in Africa. Trans R Soc Trop Med Hyg 2002,
96:242-249.
5. Hay SI, Guerra CA, Gething PW, Patil AP, Tatem AJ, Noor AM, Kabaria CW,
Manh BH, Elyazar IRF, Brooker SJ, et al: World malaria map: Plasmodium
falciparum endemicity in 2007. PLoS Med 2009, 6:e1000048.
6. Vezzulli L, Pruzzo C, Huq A, Colwell RR: Environmental reservoirs of Vibrio
cholerae and their role in cholera. Environ Microbiol Rep 2010, 2:27-33.
7. Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, Gittleman JL, Daszak P:
Global trends in emerging infectious diseases. Nature 2008, 451:990-994.


8 Viboud C, Bjornstad ON, Smith DL, Simonsen L, Miller MA, Grenfell BT
Synchrony, waves, and spatial hierarchies in the spread of influenza.
Science 2006, 312:447-451
9 Smith DL, Guerra CA. Snow RW. Hay Sl Standardizing estimates of the
Plasmodium falciparum parasite rate. Mo!ar J2007, 6:131
10 Egger JR, Coleman PG Age and clinical dengue illness. Energ Inf ct Dis
2007 13:924-925
11 Miller E, Cradock Watson JE, Pollock TM Consequences of confirmed
maternal rubella at successive stages of pregnancy. Loncet 1982, 2:781-784
12 Pitzer VE, Viboud C, Simonsen L, Steiner C, Panozzo CA, Alonso WJ, Miller
MA, Glass RI, Glasser pW, Parashar UD, Grenfell BI' Demographic variability,
vaccination, and the spatiotemporal dynamics of rotavirus epidemics.
Science 2009, 325:290-294
13 Talavera A, Perez EM Is cholera disease associated with poverty? j infect
Dev Ctries 2009, 3:408-411
14 Allison SP Malnutrition, disease and outcome. Nurition 2000, 16:590-593
15 Grthing PW, Kiiui VC, Alegana VA, Okiro EA, Noor AM. Snow RW' Estimating
the number of paediatric fevers associated with malaria infection
presenting to Africa's public health sector in 2007. PioS Med 2010, 7:
el 000301
16 Soares Magalhaes P, Clemenlts ACA Mapping the risk of anaemia in
preschool-age children: the contribution of malnutrition, malaria and
helminth infections in West Africa. P /oS Med 2011, 8:e1000438
17 Schur N, Hurlimann E, Garba A, Traore MS, Ndir 0, Ratard RC, Tchuente LT,
Kri.sensen TK, Utzinger J, Vounatsou P Geostatistical model-based
estimates of schistosomiasis prevalence among individuals aged
<20 years in West Africa. PLoS Negi Trop Dis 2011, 5:e 194
18 Deichmann U, Balk D, Yerman G Transforming population data for
interdisciplinary usages: from census to grid. 2001, Documentation for
GPW Version 2 available only at httpy/sedac ciesin columbia edu/pluegpw
GPW documentation pdf
19 Balk Di, Deichmann U, Yetman G, Pozzi F, Hay SI, Nelson A Determining
global population distribution: methods, applications and data. Adv
Parastol 2006, 62:119-156
20 Dobson JE, Bright EA, Coleman PR, Durfee RC, Worley BA LandScan: a
global population database for estimating populations at risk.
Photogranrn Eng Remote Sens 2000, 66:849-857
21 Linard C, Gilbert M, Snow RW, Noor AM, Tatem AJ Population distribution,
settlement patterns and accessibility across Africa in 2010. PLoS One
202, 7:e3 743
22 United Nations Population Division' Wrblid population prospects 200
revision New York United Nations; 2010
23 Gething PW, Noor AM, Gikandi PW, Ogara EAA, Hay SI, Nixon MS, Snow RW,
Atkinson PM Improving imperfect data from health management
information systems in Africa using space-time geostatistics. PLoS Med
2006, 3:e271
24 Health Metrics Network Statistics save lives: strengthening country health
information systems. Geneva: WHO Health Metrics Network; 2005
25 Murray CJL, Lopez AD, Wibulpolprasert S Monitoring global health: Time
for new solutions. Br Med J 2004, 329:1096-1100
26 Kubiak RJ, Arinaminpathy N, McLean AR Insights into the evolution and
emergence of a novel infectious disease. PLoS Comput Biol 2010,
6:e1000947
27 Brooker S, Hay SI, Bundy DA Tools from ecology: useful for evaluating
infection risk models? Trends Parasitol 2002, 18:70-74
28 Ferguson NM, Cummings DA, Cauchemez S, Fraser C, Riley S, Meeyai A,
Iamsirithaworn S, Burke DS Strategies for containing an emerging
influenza pandemic in Southeast Asia. Nature 2005, 437:209-214
29 Hay SI, Okiro EA, Gething PW, Patil AP, Tatem AJ, Guerra CA, Snow RW
Estimating the global clinical burden of Plasmodium falciparum malaria
in 2007. PLoS Med 2010, 7:e1000290
30 World Health Organization The World Malaria Report. Geneva: World Health
Organization; 2008
31 Cibulskis RE, Bell D, Christophel EM, Hii J, Delacollette C, Bakyaita N, Aregawi
MW Estimating trends in the burden of malaria at country level. Am J Trop
Med Hyg 2007, 77:133-137
32 Linard C, Tatem AJ Large-scale spatial population databases in infectious
disease research. Int J Health Geogr 2012, 11:7
33 Johansson EW, Newby H, Renshaw M, Wardlaw T Malaria and children:
progress in intervention coverage. New York: United Nations Children's Fund
(UNICEF)/The Roll Back Malaria Partnership (RBM); 2007


34 Riedel N, Vounatsou P, Miller JM, Gosoniu L, Chizema-Kawesha E, Mukonka
V, Steketee RW Geographical patterns and predictors of malaria risk in
Zambia: Bayesian geostatistical modelling of the 2006 Zambia national
malaria indicator survey (ZMIS). Malar J 2010, 9:37
35 Guerra CA, Howes RE, Patil AP, Gething PW, Van Boeckel TP, Temperley WH,
Kabaria CW, Tatem AJ, Manh BH, Elyazar IRF, et al The international limits
and population at risk of Plasmodium vivax transmission in 2009. PLoS
Negl Trop Dis 2010, 4:e774
36 Guerra CA, Gikandi PW, Tatem AJ, Noor AM, Smith DL, Hay SI, Snow RW
The limits and intensity of Plasmodium falciparum transmission:
implications for malaria control and elimination worldwide. PLoS Med
2008, 5:e38
37 Brooker S, Miguel E, Waswa P, Namunyu R, Moulin S, Guyatt H, Bundy D
The potential of rapid screening methods for Schistosoma mansoni in
western Kenya. Ann Trop Med Parasitol 2001, 95:343-351
38 Brooker S, Beasley M, Ndinaromtan M, Madjiouroum EM, Baboguel M,
Djenguinabe E, Hay SI, Bundy DA Use of remote sensing and a
geographical information system in a national helminth control
programme in Chad. Bull World Health Organ 2002, 80:783-789
39 Kabatereine N, Brooker S, Tukahebwa E, Kazibwe F, Onapa A Epidemiology
and geography of Schistosoma mansoni in Uganda: implications for
planning control. Trop Med Int Health 2004, 9:372
40 Clements ACA, Firth S, Dembele R, Garba A, Toure S, Sacko M, Landoure A,
Bosque-Oliva E, Barnett AG, Brooker S, Fenwick A Use of Bayesian
geostatistical prediction to estimate local variations in Schistosoma
haematobium infection in western Africa. Bull World Health Organ 2009,
87:921-929
41 Brooker SJ, Clements ACA, Hotez PJ, Hay SI, Tatem AJ, Bundy DAP, Snow
RW The co-distribution of Plasmodium falciparum and hookworm
among African schoolchildren. Malar J 2006, 5:99
42 Pullan RL, Gething PW, Smith JL, Mwandawiro CS, Sturrock HJ, Gitonga CW,
Hay SI, Brooker S Spatial modelling of soil-transmitted helminth
infections in Kenya: a disease control planning tool. PLoS Negl Trop Dis
2011, 5:e958
43 Brooker S, Clements AC, Bundy DA Global epidemiology, ecology and
control of soil-transmitted helminth infections. Adv Parasitol 2006,
62:221-261
44 Brooker S, Hotez PJ, Bundy DA Hookworm-related anaemia among
pregnant women: a systematic review. PLoS Negl Trop Dis 2008, 2:e291
45 Dellicour S, Tatem AJ, Guerra CA, Snow RW, ter Kuile FO Quantifying the
number of pregnancies at risk of malaria in 2007: a demographic study.
PLoS Med 2010, 7:e1000221
46 van Eijk A, Hill J, Alegana V, Kirui V, Gething P, ter Kuile F, Snow R Coverage
of malaria protection in pregnant women in sub-Saharan Africa: a
synthesis and analysis of national survey data. Lancet Infect Dis 2011,
11:190-207
47 Fischer E, Pahan D, Chowdhury S, Richardus J The spatial distribution of
leprosy cases during 15 years of a leprosy control program in
Bangladesh: an observational study. BMC Infect Dis 2008, 8:126
48 Kalipeni E, Zulu LC HIV and AIDS in Africa: a geographic analysis at
multiple spatial scales. GeoJournal 2010, doi:10.1007/s10708-010-9358-6
49 Chao DL, Halloran ME, Longini IM Jr Vaccination strategies for epidemic
cholera in Haiti with implications for the developing world. Proc Natl
Acad Sci U S A 2011, 108:7081-7085
50 Ferguson NM, Cummings DAT, Fraser C, Cajka JC, Cooley PC, Burke DS
Strategies for mitigating an influenza pandemic. Nature 2006,
442:448-452
51 Rakowski F, Gruziel M, Bieniasz-Krywiec L, Radomski JP Influenza epidemic
spread simulation for Poland a large scale, individual based model
study. Physica A: Statistical Mechanics and its Applications 2010,
389:3149-3165
52 Rao DM, Chernyakhovsky A, Rao V Modeling and analysis of global
epidemiology of avian influenza. Environ Model Softw 2009, 24:124-134
53 Balcan D, Colizza V, Goncalves B, Hu H, Ramasco JJ, Vespignani A
Multiscale mobility networks and the spatial spreading of infectious
diseases. Proc Natl Acad Sci 2009, 106:21484-21489
54 Dye C Health and urban living. Science 2008, 319:766-769
55 United Nations Population Division World urbanization prospects, 2009
revision. New York: United Nations; 2009


56 Tatem AJ, Hay SI Measuring urbanization pattern and extent for malaria
research: a review of remote sensing approaches. J Urban Health 2004,
81:363-376
57 Tatem AJ, Noor AM, Hay SI Assessing the accuracy of satellite derived
global and national urban maps in Kenya. Remote Sens Environ 2005,
96:87-97
58 Schneider A, Friedl MA, Potere D Mapping global urban areas using
MODIS 500-m data: New methods and datasets based on 'urban
ecoregions'. Remote Sens Environ 2010, 114:1733-1746
59 Balk D, Montgomery M, McGranahan G, Kim D, Mara V, Todd M, Buettner T,
Dorelien A Mapping urban settlements and the risks of climate change
in Africa, Asia and South America. In Population dynamics and climate
change. Edited by Martine G, Guzman J-M, McGranahan G, Schensul D,
Tacoli C. New York: UNPD; 2009:88-103
60 Kim D Econometric modeling of city population growth in developing
countries New York State University of; 2011
61 Gemperli A, Vounatsou P, Kleinschmidt I, Bagayoko M, Lengeler C, Smith T
Spatial patterns of infant mortality in Mali: the effect of malaria
endemicity. Am J Epidemiol 2004, 159:64-72
62 Chin B, Montana L, Basagana X Spatial modeling of geographic
inequalities in child mortality across Nepal. Health Place 2011, 17:929-936
63 Elbers C, Lanjouw J, Lanjouw P Micro-level estimation of poverty and
inequality. Econometrica 2003, 71:355-386
64 Prothero RM Population movements and tropical health. Global Change
and Human Health 2002, 3:20-32
65 Stoddard S, Morrison A, Vazquez Prokopec G, Paz-Soldan V, Kochel T, Kitron
U, Elder J, Scott T The role of human movement in the transmission of
vector-borne pathogens. PLoS Negl Trop Dis 2010, 3:e481
66 Tatem AJ, Smith DL International population movements and regional
Plasmodium falciparum malaria elimination strategies. Proc Natl Acad Sci
2010, 107:12222-12227
67 Paz-Soldan V, Stoddard S, Vazquez-Prokopec G, Morrison A, Elder J, Kitron U,
Kochel T, Scott T Assessing and Maximizing the Acceptability of GPS
Device Use for Studying the Role of Human Movement in Dengue Virus
Transmission in Iquitos, Peru. Am J Trop Med Hyg 2010, 82:723-730
68 Tatem A, Qiu Y, Smith D, Sabot O, Ali A, Moonen B The use of mobile
phone data for the estimation of the travel patterns and imported
Plasmodium falciparum rates among Zanzibar residents. Malar J 2009,
8:287

doi:10.1186/1478-7954-10-8
Cite this article as: Tatem et al. Mapping populations at risk: improving
spatial demographic data for infectious disease modeling and metric
derivation. Population Health Metrics 2012, 10:8



Full Text

REVIEW Open Access

Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation

Andrew J Tatem1,2,3*, Susana Adamo4, Nita Bharti5, Clara R Burgert6, Marcia Castro7, Audrey Dorelien8, Gunter Fink7, Catherine Linard9,10, Mendelsohn John11, Livia Montana7, Mark R Montgomery12,16, Andrew Nelson13, Abdisalan M Noor14, Deepa Pindolia1,2,14, Greg Yetman4 and Deborah Balk15

Abstract

The use of Global Positioning Systems (GPS) and Geographical Information Systems (GIS) in disease surveys and reporting is becoming increasingly routine, enabling a better understanding of spatial epidemiology and the improvement of surveillance and control strategies. In turn, the greater availability of spatially referenced epidemiological data is driving the rapid expansion of disease mapping and spatial modeling methods, which are becoming increasingly detailed and sophisticated, with rigorous handling of uncertainties. This expansion has, however, not been matched by advancements in the development of spatial datasets of human population distribution that accompany disease maps or spatial models.

Where risks are heterogeneous across population groups or space or dependent on transmission between individuals, spatial data on human population distributions and demographic structures are required to estimate infectious disease risks, burdens, and dynamics. The disease impact in terms of morbidity, mortality, and speed of spread varies substantially with demographic profiles, so that identifying the most exposed or affected populations becomes a key aspect of planning and targeting interventions. Subnational breakdowns of population counts by age and sex are routinely collected during national censuses and maintained in finer detail within microcensus data. Moreover, demographic and health surveys continue to collect representative and contemporary samples from clusters of communities in low-income countries where census data may be less detailed and not collected regularly. Together, these freely available datasets form a rich resource for quantifying and understanding the spatial variations in the sizes and distributions of those most at risk of disease in low income regions, yet at present, they remain unconnected data scattered across national statistical offices and websites.

In this paper we discuss the deficiencies of existing spatial population datasets and their limitations on epidemiological analyses. We review sources of detailed, contemporary, freely available and relevant spatial demographic data focusing on low income regions where such data are often sparse and highlight the value of incorporating these through a set of examples of their application in disease studies. Moreover, the importance of acknowledging, measuring, and accounting for uncertainty in spatial demographic datasets is outlined. Finally, a strategy for building an open-access database of spatial demographic data that is tailored to epidemiological applications is put forward.

Keywords: Population, Epidemiology, Demography, Disease mapping

*Correspondence: andy.tatem@gmail.com
1 Department of Geography, University of Florida, Gainesville, USA
2 Emerging Pathogens Institute, University of Florida, Gainesville, USA
Full list of author information is available at the end of the article

© 2012 Tatem et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Tatem et al. Population Health Metrics 2012, 10:8
http://www.pophealthmetrics.com/Content/10/1/8

Introduction

The spatial modeling and mapping of diseases is increasingly being undertaken to derive health metrics, guide intervention strategies, and advance epidemiological understanding [1]. This has been driven by a recognition of the spatial, temporal, and demographic heterogeneities in disease risk (Table 1), and has resulted in significant recent methodological advances (for example [2,3]). Moreover, the need for estimates of populations at risk to guide funding allocations within the Millennium Development Goals (MDGs: www.un.org/millenniumgoals) framework continues to drive the development of disease mapping and modeling approaches.

Table 1 Heterogeneities in disease risks
(Columns: heterogeneity type; background information and examples)

Spatial
Understanding relevant spatial heterogeneities underlies our ability to map host risk of pathogen exposure. Predictions of disease importation or emergence are limited by our ability to distinguish disease-specific hotspots from continuous risk surfaces. Spatial variation in risk is defined by the specific biology of each host-pathogen relationship. Epidemiologically relevant spatial heterogeneities can be highly specific to each infection and must be correctly identified within the proper context of the ecology and landscape of each host-pathogen relationship. Spatial heterogeneities that impact risk profiles for exposure to a pathogen include large-scale environmental factors, such as temperature, access to water, and rainfall abundance, which can affect host susceptibility (e.g. within the African meningitis belt [4]), host exposure (e.g. proximity to malaria vector habitats [5]), and pathogen viability (e.g. cholera survival in the environment [6]). Within a population, the transmission events of infections drive the spatial progression of an outbreak after the initial exposure to the pathogen has already taken place. Transmission events are rarely observed and risk profiles must be constructed using proxies for transmission, again highlighting characteristics specific to each host-pathogen relationship. Risk profiles for directly transmitted diseases focus on host contacts between infectious and susceptible individuals. Important components of these contacts are host density, susceptibility, and mobility. Each of these factors can also be defined across spatial scales, from within-household contact patterns to settlement-level risk factors. Urban and rural residence can be thought of as a basic (yet dichotomized) spatial heterogeneity that is closely associated with density and landscape, but typically urbanization has not been defined in spatial terms. Similarly, transmission of vector-mediated infections is impacted by spatial heterogeneities at the household and community level determined by host density, prevention measures, vector mobility and vector abundance. Spatial patterns of environmentally mediated infections will also be determined by the host-pathogen relationship.

Temporal
Epidemiologically important temporal heterogeneities will also be specific to each infection. For emerging infections, long-term changes in host settlements, habitat loss, and changing levels of interaction between humans and animal species can define the risk of disease emergence over time [7] (e.g. Ebola, SARS, monkeypox, HIV, H1N1 and H5N1 influenza). In other situations, seasonal and environmental factors may determine the population level risk of pathogen exposure (e.g. malaria vector habitats, hyperendemic areas of meningitis). Short-term risk of infection, or transmission of a pathogen within a population, is determined by the biology of the relationships between the host, pathogen and vector. These relationships establish the host susceptibility and infectious periods, and therefore the risk of transmission events. Population level susceptibility profiles (natural or derived) vary across temporal scales with respect to prior exposure and preventative measures. Temporal likelihood of transmission will be determined by length of exposure, and changes in abundance and susceptibility of the host and vector. Exposure and contact rates (density, migration) over the course of a day (as in commuter patterns for influenza [8]) are additional examples of temporal heterogeneities in transmission likelihood and risk across temporal scales.

Demographic and Socioeconomic
Susceptibility and transmissibility of infectious disease vary across differing demographic and socioeconomic groups due to differences in immunity, mobility, contact patterns and health status. Small-scale variations in socioeconomic and demographic factors can have a large influence on the geographical variation of infections compared to environmental factors. Age represents one of the most significant factors, with risk of morbidity and mortality of many diseases varying substantially across age groups. These include large variations in mortality and morbidity by age for malaria [9] and for clinical attack risk for dengue [10]. Heterogeneities in susceptibility and transmissibility also exist between the sexes, and especially during childbearing age for women, when pregnancy increases the risks of death for both the mother and fetus, and are important for diseases such as congenital rubella syndrome (CRS) [11]. At a population scale, differences in vital rates such as birth rates create heterogeneities in disease risk across space and time, as evidenced by rotavirus in the US [12]. For macro-parasite infections, such as helminths, in addition to environmental risk factors, the population at risk often depends on socioeconomic profiles and access to key infrastructure (housing quality, adequate sanitation and drinking water). For micro-parasite infections with human-to-human transmission, risk is again associated with individual socioeconomic attributes, but also with community/neighborhood attributes. In other words, the concentration of poverty or poor sanitation services increases risk, as evidenced by cholera outbreaks [13]. Finally, in addition to information on poverty status, knowledge of nutritional status is important; malnutrition can increase (i) susceptibility to many infectious diseases, (ii) the period of infectiousness (by reducing immune function and delaying recovery) and (iii) disease associated mortality [14].

Table 1 note: Disease morbidity, mortality, and speed of spread vary substantially with demographic profiles, with clear risk groups and vulnerable populations existing. These have important implications for planning and targeting intervention strategies. The risk of pathogen infection to host populations exists at two spatial levels. First, there is a probability of initial exposure of a population to a pathogen, which defines the population risk. Second, there is a probability of transmission of a disease within a population, which defines the individual risk. Within these epidemic and endemic classifications, the implications for interventions vary across disease landscapes dependent upon the host-pathogen relationships.

Given the high degrees of individual and local heterogeneity within geographic regions or administrative units, effective policy design requires a detailed knowledge of the spatial distribution of relevant population attributes of interest, including size, age, gender, income, nutritional status, vaccination rates, or child mortality. From a public health perspective, detailed spatial datasets not only allow investigation of the relationship between policy inputs and individual-specific outcomes, but also make it possible to build detailed and realistic predictive models and derive suites of health metrics. Disease mapping and spatial modeling studies have become increasingly detailed and sophisticated, with rigorous handling of uncertainties built in, but are limited when it comes to estimating populations at risk. Detailed spatial datasets on population distributions now exist, but maps of other demographic and socioeconomic characteristics to identify vulnerable subgroups remain lacking. To quantify these spatial variations in population attributes, recent high-impact studies have had to overlook subnational demographic variations in characteristics and rely on applying simple national-scale adjustments, e.g. [5,15-17].

The availability of high-resolution population data has increased dramatically through a series of global population mapping efforts over the past 15 years. Initially restricted to a few countries, location-specific population numbers have been made available for the globe over the past decades through the combined efforts of projects like the Gridded Population of the World (GPW) [18], the Global Rural Urban Mapping Project (GRUMP) [19], LandScan [20], and AfriPop [21] (www.afripop.org). All databases are in the public domain and allow individuals, companies, researchers, and policymakers to access population data either by administrative units or by user-specified geographic boundaries of interest. While the generation of these comprehensive population databases clearly constitutes a major achievement from a scientific perspective, two main factors limit the degree to which these databases can be used for research as well as for policy and planning: limited time frames and limited information on population attributes of interest.
The first limitation is mostly the result of the irregular collection of detailed population data as well as the effort required in compiling global datasets at any given point in time. Given that most countries independently collect full censuses only once per decade and data sharing is complicated by a large set of copyright issues, most current population databases contain population data only on a five- or 10-year basis. When analyses warrant data for noncensus years, national growth rates [22], subnational growth rates from National Statistical Offices, or interpolation between available data points may be applied to produce estimates for intermediate years, as annual population fluctuations are generally limited.

The second constraint is more critical: little is known about characteristics of the underlying populations being mapped in detail. From a planning or research perspective, these factors can be of critical importance, as outlined in Table 1. Various freely available datasets exist to facilitate mapping improvements and add significant value to epidemiological analyses, but these remain scattered across different sources and require processing to be integrated into mapping. Here we review these sources of more detailed, contemporary, freely available, and relevant spatial demographic data, focusing on low-income regions of the world where disease burden is highest, and put forward a strategy for building an open-access database to link the various datasets, tailored to epidemiological applications.

Usages of spatial demographic data in epidemiology

Population distribution datasets constitute an essential denominator required for many infectious disease studies. It is well known that disease transmission is spatially focal and heterogeneous (Table 1), partially due to the clustered nature of population distribution. The epidemiology of many diseases makes surveillance-based methods (reliant upon reporting from health facilities) for estimating populations at risk and disease burden problematic, particularly in low-income regions [23-25], while spatial heterogeneity in human population distribution can produce significant effects on transmission [3,26]. Cartographic and spatial modeling approaches have proven to be effective in tackling these factors (e.g., [27-29]). Such approaches can help characterize large-scale patterns of disease spread to evaluate intervention impact [3] and produce globally consistent measures of morbidity of known fidelity, which often represent the only plausible method in many African countries where surveillance data is incomplete, unreliable, and inconsistent [23,30,31]. As the precision and detail of disease risk mapping and modeling improves, spatial population datasets that capture these patterns are therefore required if populations at risk are to be more accurately quantified and disease spread among populations is realistically modeled for prediction and prevention purposes. Uses of gridded population count data in epidemiological studies are documented in Tatem et al. [1] and Linard and Tatem [32], and here we focus on studies that have attempted to incorporate spatial data on population subgroups.

Applications of gridded population datasets in epidemiology have involved estimating numbers of clinical cases, modeling the spatial progression of an epidemic, risk mapping and assessing the effects of urbanization, and the study of diseases ranging from dengue and yellow fever to HIV and leprosy. The majority of spatial modeling approaches of infectious diseases have been based on the environmental correlates of infection, due in part to the availability of high spatial resolution environmental data and relative paucity of spatial socioeconomic and demographic data. The most widespread uses
of gridded population datasets in an epidemiological context have been in the study of malaria. Global spatial demographic datasets have been used to estimate populations at risk of malaria, which forms a fundamental metric for decision-makers at national and international levels [30,33]. While approaches for mapping malaria have become increasingly sophisticated (e.g., [2]), those for mapping population distributions have not kept pace, especially in low-income regions [1], where detailed spatial information on population composition is rarely available or utilized.

Previous studies that have aimed to enumerate vulnerable population subgroups at risk for different diseases have relied solely on simplistic national-level adjustments. The malaria burden in children under 5 years old was recently estimated based on a Zambia-wide survey and LandScan population data adjusted by a national-level estimate of the proportion of under-5 children [34]. Similarly, the numbers at risk of malaria globally in different age groups were estimated by applying national-level adjustments to GRUMP data [5,29,35,36]. Models of disease prevalence were overlaid onto population density maps, again adjusted by national-level proportions, to quantify school-age children and young adults at risk of schistosomiasis [17,37-40] and hookworm [38,41-43] and the number of pregnant women infected with hookworm in sub-Saharan Africa [44]. Specific estimates of populations at risk of malaria for pregnant women and children have also been derived from these maps, by combining GRUMP data with national-scale age, sex, and fertility data from the United Nations Population Division [15,45,46]. Finally, the numbers of children under 5 with anemia in West Africa were estimated using similar techniques [16]. All of these examples overlook subnational variations in population by applying national-level adjustments; the problems this causes are illustrated with an example in the next section.
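The overlay approach used in the studies above can be sketched in a few lines. The following is a minimal illustration, not the method of any cited study: it assumes two aligned grids (population counts and risk classes) held as nested lists, and a single national-level proportion for the subgroup of interest. The function name `population_at_risk_by_class`, the grid values, and the class labels are all hypothetical; only the 17.9% under-5 share is taken from the text (the UN figure quoted for Tanzania).

```python
from collections import defaultdict


def population_at_risk_by_class(pop_grid, class_grid, proportion):
    """Sum the subgroup population in each risk class over two aligned grids.

    pop_grid   -- rows of total population counts per cell (None = no data)
    class_grid -- rows of risk-class labels for the same cells (None = no data)
    proportion -- single national-level share of the subgroup (e.g. under-5s)
    """
    totals = defaultdict(float)
    for pop_row, class_row in zip(pop_grid, class_grid):
        for pop, risk_class in zip(pop_row, class_row):
            if pop is None or risk_class is None:
                continue  # skip cells with no population or no risk estimate
            totals[risk_class] += pop * proportion
    return dict(totals)


# Hypothetical 2 x 3 grids; all values are illustrative only.
pop_grid = [[100, 200, None],
            [50, 0, 400]]
class_grid = [["low", "high", "high"],
              ["low", None, "high"]]

# 0.179 is the national under-5 share quoted in the text for Tanzania.
at_risk = population_at_risk_by_class(pop_grid, class_grid, proportion=0.179)
for risk_class, total in sorted(at_risk.items()):
    print(f"{risk_class}: {total:.1f}")
```

Because one proportion is applied to every cell, any within-country variation in age structure is invisible to this calculation, which is the limitation the text describes.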
Figure 1 For Tanzania in 2007: (a) P. falciparum malaria transmission classes (adapted from Hay [5], measured by P. falciparum Parasite Rate (PfPR)), (b) percentage of residents under 5 years of age by ward, (c) percentage differences in estimates of number of children under 5 at risk of the highest transmission class by national- vs. ward-level adjustments.

Spatio-temporal transmission models aim to simulate contacts between infectious and susceptible individuals and estimate the spatial spread of the disease. This helps to identify areas and times at risk of disease and assists in planning targeted interventions [47,48]. Sophisticated spatially explicit models have been developed to study the spatial progression of infectious diseases. Many such spatially explicit models have made use of gridded population datasets as input data [3,49]. Gridded population data have also been used to develop agent-based simulation models at the regional level [28,50,51] and at the global level [52,53]. Whatever the spatial approach for modeling, population data are essential to these models, which generally require the generation of a virtual society with an appropriate distribution and composition of people [3]. Gridded data are preferred for these models because the gridding process removes the irregularity associated with the native administrative units in which these data were initially reported and thereby makes the data more flexible for use with a variety of other spatial units or features. In addition, global (or continent-level) gridded population data provide valuable input datasets mainly because of their wide coverage, consistent spatial resolution, and availability in the public domain. Notably missing, as discussed above, is information on population attributes. This represents a limitation for models that can be substantially improved through the incorporation of realistic population attributes to build 'synthetic' populations. Previous studies have had to rely on national-level statistics or the
application of census-derived attributes from one country applied to multiple others (e.g., [28]).

Improving estimates of children under 5 years at risk of Plasmodium falciparum malaria

The lack of availability of subnational spatial datasets on specific population groups that are particularly vulnerable to P. falciparum malaria has meant that simple national-level adjustments have been applied in influential studies to estimate the spatial distributions of, for example, children under 5 [15] or pregnant women [45,46] at risk. To illustrate the importance of mapping vulnerable populations to a level of spatial detail approaching that now used in disease mapping, here we compare estimates of under-5 children at risk of P. falciparum malaria in Tanzania in 2007 using transmission risk classes (Figure 1(a)) [5] overlaid onto a population distribution map (www.afripop.org) adjusted to represent children under 5 by (i) applying a single nationwide percentage adjustment as defined by the UN's World Population Prospects [22] (as undertaken in [15]; for Tanzania, percentage under 5 is estimated to be 17.9%) and (ii) applying per-district proportions of under-5 children derived from ward-level census data (Figure 1(b)). Table 2 shows the difference in estimates of under-5 children residing in each transmission class, with large percentage differences found in each transmission class.
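The comparison between methods (i) and (ii) can be illustrated with a small sketch. This is not the authors' code: it assumes the population within one transmission class has already been totaled per census unit, and the unit names, populations, and unit-level under-5 shares are hypothetical. Only the 17.9% national share and the Table 2 figures used in the final check come from the text.

```python
def u5par_national(unit_pop, national_share):
    """Method (i): apply one nationwide under-5 share to the total population."""
    return sum(unit_pop.values()) * national_share


def u5par_by_unit(unit_pop, unit_share):
    """Method (ii): apply each census unit's own under-5 share."""
    return sum(pop * unit_share[unit] for unit, pop in unit_pop.items())


def percent_change(old, new):
    """Percentage change from the old estimate to the new one."""
    return (new - old) / old * 100.0


# Hypothetical census units within a single transmission class.
unit_pop = {"unit_a": 10000, "unit_b": 20000, "unit_c": 5000}
unit_share = {"unit_a": 0.15, "unit_b": 0.12, "unit_c": 0.20}
NATIONAL_SHARE = 0.179  # UN under-5 share for Tanzania quoted in the text

par1 = u5par_national(unit_pop, NATIONAL_SHARE)  # U5PAR1 analogue
par2 = u5par_by_unit(unit_pop, unit_share)       # U5PAR2 analogue
print(f"U5PAR1={par1:.0f}  U5PAR2={par2:.0f}  "
      f"change={percent_change(par1, par2):.1f}%")

# The same helper applied to the PfPR < 5% row of Table 2.
print(f"Table 2, PfPR < 5%: {percent_change(770547, 650174):.2f}%")
```

A negative change means the nationwide adjustment gives the larger estimate, which is the direction the paper reports for all three transmission classes.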
We do not examine the spatial patterns of differences here, as this is beyond the scope of this analysis, but they remain an interesting area for exploration [1]. Overall, the adjustments from finer-scale age distributions indicate that national-level estimates substantially overestimate numbers at risk. Whereas Table 2 summarizes the overestimates by transmission classes, Figure 1(c) shows the spatial pattern of the misestimation in the highest transmission class. It shows the percentage differences obtained in estimates of under-5 children at risk of PfPR > 40% transmission level (mapped in Figure 1(a)) resulting from use of the ward-level map of children under 5 years rather than from applying a single nationwide adjustment. Most of these wards show differences above 25%, and several have discrepancies of greater than 100%. These malaria transmission maps and the populations at risk estimates derived from them are increasingly being used to guide planning, policy, and control. Such substantial differences in estimates of populations at risk, achievable through the use of improved spatial demographic composition data, illustrate the urgent need to develop spatial databases of vulnerable populations.

Spatial demographic data to meet needs

From an epidemiological and health metrics perspective, fundamental characteristics are age and sex. The most commonly needed age-sex specific groups in developing countries are: infants, children under 5, women of childbearing ages, and the elderly (Table 1). More specific needs might require the population of pregnant women, young adults, or urban children. Even though these numbers can generally be approximated by multiplying total population numbers by estimated national population fractions, the large epidemiologically important heterogeneity in population composition generated by migration and differential mortality and birth rates within countries and regions, and particularly between urban and rural residents, is likely to induce substantial degrees of imprecision in resultant output metrics (see previous section). The problem becomes even more severe when researchers or policymakers are primarily interested in nondemographic aspects of the population.
In many cases, the main variable of interest may be a fraction of the population with certain health or behavioral characteristics: the number of children not vaccinated, the number of women without access to contraceptives, the number of children not going to school or not receiving formal health care. Many of these characteristics are not census-based, but rather can be ascertained through survey data, an aspect that we shall discuss in further detail below. Clearly it is not feasible for global population databases to generate on-demand maps for each of these factors on a regular basis; nevertheless, the potential to leverage current freely available population databases appears large. Table 3 documents the principal datasets that are readily available without cost across multiple countries to achieve this.

Table 2 Estimates of numbers of children under 5 at risk of P. falciparum malaria in Tanzania using the two differing demographic methods described in the text

Transmission class    U5PAR1: UN nationwide adjusted    U5PAR2: Census unit adjustments    Percentage change from U5PAR1 to U5PAR2
PfPR < 5%             770547                            650174                             -15.62175961
PfPR = 5-40%          4315638                           3383040                            -21.6097365
PfPR > 40%            773992                            630518                             -18.5368841

U5PAR = Under-5 population at risk.

Table 3 Sources of freely available spatial demographic data

Census
  Source: National Statistical Offices
  Time intervals: Typically 10 years
  Typical spatial coverage: Census enumerator area or coarser level
  Typical strata: Urban/rural, race or ethnic groups (often)
  Relevant variables: Sex, age, education, migration status, household and dwelling characteristics

Census Microdata
  Source: https://international.ipums.org/international/
  Time intervals: Typically 10 years
  Typical spatial coverage: Admin 1-3
  Typical strata: Urban/rural
  Relevant variables: Household and dwelling characteristics, sex, age, education, migration status, children ever born, children surviving

DHS (Demographic and Health Survey): household, women 15–49, men 15–59, children born in the last five years
  Source: http://www.measuredhs.com/
  Time intervals: Varies by country, typically every 5 years
  Typical spatial coverage: National, Admin 1/region, GPS coordinates of cluster locations for most recent surveys (last 15 years)
  Typical strata: Urban/rural
  Relevant variables: Household and dwelling characteristics, sex, age, education, maternal and child health, fertility and full birth history, family planning, domestic violence, biomarkers, nutrition

MICS (Multi-indicator cluster survey)
  Source: UNICEF, http://www.unicef.org/statistics/index_24302.html
  Time intervals: Round 2, 1999–2001; round 3, 2005–2007; round 4 is in the field, 2009–present
  Typical spatial coverage: National, Admin 1
  Typical strata: Urban/rural
  Relevant variables: Household and dwelling characteristics, sex, age, education, status, maternal and child health, child labor, domestic violence, summary birth history, anthropometry

LSMS (Living Standard Measure Survey) (Integrated Household Budget Survey and many others that are locally adapted)
  Source: http://iresearch.worldbank.org/lsms/lsmssurveyFinder.htm
  Time intervals: Irregular
  Typical spatial coverage: National, Admin 1, some GPS coordinates
  Typical strata: Urban/rural
  Relevant variables: Household and dwelling characteristics, sex, age, education, migration status, consumption, expenditures, income, nutrition, anthropometry, summary birth history

MIS (Malaria Indicator Survey)
  Source: http://www.measuredhs.com/, http://www.malariasurveys.org/
  Time intervals: Varies by country, typically every 3 years
  Typical spatial coverage: National, Admin 1/region, GPS coordinates of cluster locations for some surveys (last five years)
  Typical strata: Urban/rural
  Relevant variables: Household and dwelling characteristics, sex, age, education, biomarkers

AIS (AIDS Indicator Survey)
  Source: http://www.measuredhs.com/
  Time intervals: Varies by country, typically every 3 years
  Typical spatial coverage: National, Admin 1/region, GPS coordinates of cluster locations for some surveys (last eight years)
  Typical strata: Urban/rural
  Relevant variables: Household and dwelling characteristics, sex, age, education, biomarkers

Census data form the basis of existing spatial demographic databases [19,20], and such population and housing censuses are undertaken for almost all countries in the world, including developing countries, generally every 10 years (the dates of past and upcoming planned censuses are available here: http://unstats.un.org/unsd/demographic/sources/census/censusdates.htm), but these provide only population counts. A range of other population-attribute information is generally collected during population censuses, such as age, gender, urban/rural residence, and migration information, and, for the majority of countries, made available in some form on national statistical office websites. This information supplies a series of single population characteristics at whatever level of geographic detail is made available by the National Statistical Office. Often, however, this information is available only through data tables aggregated at coarse administrative levels, and full-detail datasets can be difficult to obtain. An addition to the aggregated full census data are large samples of household-level records derived from censuses (census microdata) that provide age and sex structure, as well as many other compositional measures, reported generally by administrative level 1 (e.g., province) or 2 (e.g., district). These data keep information about households intact so that combinations of variables can be made. The largest repository of such data is the International Public Use Microdata Series (https://international.ipums.org/international/), and the data held there are mapped in Figure 2(a).

Figure 2 Maps showing the availability of useful demographic datasets for deriving subnational estimates of population attributes. (a) Numbers of census microdata records maintained at the International Public Use Microdata Series repository (https://international.ipums.org/international/), (b) combined numbers of Demographic and Health Surveys (DHS), Malaria Indicator Surveys (MIS), and AIDS Indicator Surveys (AIS) conducted for each country, (c) combined numbers of DHS, MIS, and AIS with GPS cluster coordinates available.

While census aggregates and census microdata samples are typically large enough to cover small or moderately sized geographic areas, they are only carried out approximately every 10 years and are limited in content. Survey data offer much richer content on shorter time intervals but are limited in spatial coverage (e.g., Figure 2(b)). In most high- and low-income countries, georeferenced household-based surveys are collected on a regular basis. These surveys contain detailed local population characteristics for a finite number of locations, which could be used to generate characteristic-prevalence surfaces for a given country and year. Overlaying these characteristic surfaces with population estimates would likely become an invaluable tool both for researchers and policymakers.

Data on a rich variety of population attributes can be obtained from a range of international household survey programs, each of which is listed in Table 3. These provide subnational urban (or rural) age and sex structures, educational compositions, employment information, and countless other socioeconomic and health indicators at the level of subnational regions. Large household survey programs such as the Multiple Indicator Cluster Surveys (MICS) and Demographic and Health Surveys (DHS) listed in Table 3 are reasonably well standardized and cover many low-income countries (Figure 2(b)), with multiple rounds in each country. Additionally, in most recent DHS datasets, the survey clusters have been geocoded (Figure 2(c)). In order to protect the confidentiality of survey respondents, cluster locations are randomly displaced by up to 2 km in urban areas and 5 km in rural areas. Moreover, in rural areas, the DHS cluster locations provided can represent large and potentially heterogeneous areas. The existing approaches using DHS data have not taken advantage of spatial modeling to expand the use of DHS data below the survey region level (usually administrative level 1). The geocoded cluster data from the DHS allow for data regrouping to different levels of representativeness while still respecting the sample frame. While MICS now regularly collect geocoded data, these datasets are not available to enable mapping of the data at a finer level than the survey region level.

Each of the datasets described here and listed in Table 3 reports demographic information aggregated by named administrative units. Rarely, however, are spatial data on the boundaries of these units provided with the data. For spatial analysis, therefore, GIS boundary datasets must be found that match the reported administrative units. This is often a nontrivial task given regular boundary changes over time, alternative names, and mismatches with national boundaries. The initiation of open access repositories of standardized administrative boundary datasets (e.g., GADM: http://www.gadm.org/) and documented histories of changes (e.g., http://www.statoids.com/) simplifies such operations. Moreover, DHS also shares the geography for their surveys on request.

Designing a spatial demographic database

The datasets described in the previous section are presently scattered across disparate sources (Table 3). To better fulfill the needs of disease modeling and cartographic-style derivations of health metrics, we propose the construction of a spatial database. The construction of this database would involve not only the housing of the disparate demographic datasets in a central open access location, but also their linkage to GIS datasets to enable the construction of spatial datasets representing a variety of epidemiologically relevant variables. The recent development of spatially enabling tools for database servers, such as PostGIS (http://postgis.refractions.net), which provides support for geographic objects in object-relational databases, provides the ideal framework for construction of the database. The database would be hosted on a central server and accessed through an interactive web portal. Table 4 outlines the spatial datasets that would be included in a database to spatially reference the datasets in Table 3 and provide additional information to increase mapping capabilities. The framework spatial data are open-access GIS datasets that can be reused by multiple organizations for different purposes. Figure 3 outlines how these datasets link together in the relational spatial database. The key objectives of this database would be to:

1. Provide disaggregated spatially-referenced data on population sizes and characteristics such as age, sex, urban/rural location, and education
2. Facilitate data sharing between differing platforms and demographic mapping projects
3. Provide a high degree of transparency, documentation, and flexibility with respect to data sources and the treatment of uncertainty

The database is designed to encourage data sharing, built in a manner that can be replicated across different nodes, with standardized, agreed-upon representations.

Figure 3 Design of a relational spatial demographic database. Table 4 provides details on each layer.

Table 4 Components of relational spatial demographic database based on freely available datasets

Feature                      Example dataset    Example dataset source
National boundaries          SALB               www.unsalb.org
Administrative boundaries    GADM               www.gadm.org
DHS boundaries               MEASURE DHS        www.measuredhs.com
Coastlines                   GBWD               http://dds.cr.usgs.gov/srtm/
Water bodies                 SWBD               http://dds.cr.usgs.gov/srtm/version2_1/SWBD/
Land cover                   GlobCover          www.ionia1.esrin.esa.int
Protected areas              WDPA               www.wdpa.org
Urban extents                MODIS              http://www.sage.wisc.edu/people/schneider/research/data.html
Settlement locations         NGA Geonames       www.earth-info.nga.mil/gns/html
Elevation and slope          SRTM               www.srtm.csi.cgiar.org
Infrastructure               gRoads             www.ciesin.columbia.edu/confluence/display/roads/GlobalRoadsData
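The relational linkage between boundary units and demographic tables might be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for a PostGIS server, omits geometry columns entirely, and the table names, column names, unit names, and population counts are all hypothetical rather than part of the proposed design.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with two linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE admin_unit (
            unit_id         INTEGER PRIMARY KEY,
            name            TEXT NOT NULL,
            admin_level     INTEGER NOT NULL,
            boundary_source TEXT NOT NULL   -- e.g. which boundary dataset was used
        );
        CREATE TABLE population_count (
            unit_id     INTEGER NOT NULL REFERENCES admin_unit(unit_id),
            data_source TEXT NOT NULL,      -- e.g. census or survey
            year        INTEGER NOT NULL,
            sex         TEXT NOT NULL,
            age_group   TEXT NOT NULL,
            persons     INTEGER NOT NULL
        );
        """
    )
    # Made-up units and counts, for illustration only.
    conn.executemany(
        "INSERT INTO admin_unit VALUES (?, ?, ?, ?)",
        [(1, "Unit A", 2, "GADM"), (2, "Unit B", 2, "GADM")],
    )
    conn.executemany(
        "INSERT INTO population_count VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "census", 2002, "female", "0-4", 120),
            (1, "census", 2002, "male", "0-4", 130),
            (1, "census", 2002, "female", "5-9", 110),
            (2, "census", 2002, "female", "0-4", 80),
            (2, "census", 2002, "male", "0-4", 90),
        ],
    )
    return conn


def under5_by_unit(conn):
    """Sum the under-5 counts per administrative unit via a join."""
    rows = conn.execute(
        """
        SELECT a.name, SUM(p.persons)
        FROM population_count AS p
        JOIN admin_unit AS a ON a.unit_id = p.unit_id
        WHERE p.age_group = '0-4'
        GROUP BY a.name
        ORDER BY a.name
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(under5_by_unit(build_demo_database()))
```

Keeping the boundary units and the demographic records in separate tables joined on a shared identifier is one way to let the same counts be re-linked when boundary datasets are revised; in a spatially enabled server the unit table would additionally carry the boundary geometry.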
The GRUMP and AfriPop projects illustrate the value of such standardized representations: whilst their outputs take different forms and use differing modeling techniques, each is built upon standard representations (national boundaries, coastlines, administrative units) and aims to use the most detailed and contemporary population data available. A standardized database framework would encourage sharing of new and improved datasets between projects, benefitting a range of user groups. By building in differing levels of access control, new datasets can be reviewed and processed before release to a wider user community, and datasets that remain copyrighted can be controlled in their accessibility.

Documentation of all aspects of the data and database structure is key to ensuring ease of use, integration with epidemiological applications, and accessibility to a wide user community. This will focus on database version control, the development of a data dictionary with full documentation of the datasets archived within it, and metadata accompanying the GIS-related datasets. The distinction between spatial metadata and a data dictionary must be made: they are quite different and both are necessary. A data dictionary is needed to understand the shortened names and values of particular variables, for example, whereas the metadata speak to the spatial lineage and quality of the data. Institutional or database history is sometimes warranted, for example, when data collection for a given variable has changed. Ensuring that this documentation is oriented toward the user through full explanation of assumptions made and quality issues with the data provided will be important. Moreover, the construction of a library of tools and techniques for analyzing and using the data within the database using a forum and other mechanisms, such as a code repository, will facilitate ease of use. Finally, the provision of quantitative measures of data quality and uncertainty will be of great importance. This could range from basic information on the timeliness, resolution, spatial uncertainty, and standard errors of datasets that enable informed interpretation of resulting mapped products to the more rigorous handling and measurement of uncertainties (see next section).

New methods, data, and future challenges

We have so far outlined datasets and a basic framework for compiling them to meet immediate disease modeling and health metric demographic needs. Several opportunities and possibilities exist, however, with which to supplement and improve the scope and accuracy of resources available. Here we outline these briefly, along with future challenges likely to be faced in implementation.

Urban populations

The health divide between urban and rural populations has been well documented [54], as have the increasing levels of urbanization around the world [55]. While much has been done in the past 10 years to refine population data that delineate urban areas [19,56,57], much less is known about populations within cities, even within relatively coarse divisions (city center, suburban, periurban) or by slum dwellers and others (much also remains to be learned about population distribution within rural areas). Africa, in particular, will undergo rapid urbanization in the coming decades [55], yet the data record to understand the demographic variation and health conditions in these cities, let alone changes in disease transmission that may result from urban change, is largely absent. If it is important to understand what is happening within urban areas, even the currently available cluster-level data of the DHS program (Figure 2(c)) are inadequate. While the DHS program and some other surveys have focused on collecting larger urban samples, there remains a need for large samples of urban populations to permit city-specific analyses. Additionally, the definition of urban is not standard across the DHS countries.
There is reason to believe that even greater heterogeneity of health and socioeconomic characteristics exists within urban areas.

As with the demographic datasets discussed already, there are a variety of disparate spatial datasets on urban populations that could be brought together to get a better perspective. City population counts for cities with populations of 100,000 and above are produced by the UN Population Division World Urbanization Prospects [55]. Alternative sources of city and settlement population sizes include the City Population website (www.citypopulation.de), while different projects are focused on mapping city extents, which these counts could be matched to (e.g., [19,58], www.afripop.org). There still exist significant gaps, however, such as time-series of urban spatial extents, which would facilitate the development of ways to forecast changes in urban extents. Also, information on properly defined neighborhoods within cities is important, such as within-household and within-neighborhood population density, but so are other contexts (e.g., schools).

Subnational spatial and temporal projections

Most low-income countries do not produce population projections, or forecasts, at a subnational level. Even the United Nations Population Division's urban population projections [55] do not produce city-level population projections. Yet the demographic inputs for generating subnational estimates and projections are increasingly becoming available. Subnational projections are now being undertaken at least for very large countries (for example, India and China) and for small and large cities in the developing world [59]. For the latter, the forecasting method departs from the traditional cohort-component method and instead uses longitudinal data on cities and subnational estimates of demographic rates (urban fertility, mortality, and migration) derived from survey and census microdata in an econometric model of city growth [60]. These new approaches depend on harnessing old data in a spatial framework. The spatial framework allows disparate units to be linked in new ways, yielding new estimates and projections. Because these methods are largely probabilistic and derived from modeling exercises, the uncertainty associated with these estimates should also be characterized.

Quantifying uncertainty

The variety of ages, spatial resolutions, and sample sizes of input demographic data translates to great variations in accuracies and uncertainties of any output gridded demographic data products, and this is rarely acknowledged [1]. The most basic level of quantification and communication of this uncertainty to users involves the provision of information on input datasets and methods used in construction, such as is undertaken for GPW, GRUMP [19], and AfriPop [21] (www.afripop.org). Ideally, a more rigorous quantification of the uncertainty inherent in output gridded demographic datasets should be undertaken. The rigorous handling and propagation of uncertainty through a mapping process is now regularly undertaken in disease risk mapping within a Bayesian framework (e.g., [2,16,17]), resulting in full posterior prediction distributions for each grid cell, providing flexibility in the derivation of differing uncertainty metrics and enabling the production of accompanying uncertainty maps. Undertaking an equivalent approach for deriving accompanying uncertainty maps for demographic datasets would require consideration of the input datasets and output requirements. For instance, this could take the form of estimating the uncertainty in gridded population distribution mapping from census data summarized by administrative unit. Here, two of the major sources of uncertainty are the age of the census data in relation to the output prediction year and the size of the administrative units relative to the population sizes within them and output grid cell size. While uncertainty in temporal projection of census data is relatively well studied, the spatial aspect remains unexplored. For example, gridding a 50,000 km² administrative unit containing 1 million children under age 5 to 30 arcsecond resolution results in greater uncertainty about the population size and composition residing in each grid square than does the same resolution gridding of a 1,000 km² administrative unit containing 10,000 children under age 5. Based on simulating all possible permutations of grid square composition, bound by the limits imposed by the original administrative unit size and vulnerable population totals, per-grid square measures of spatial uncertainty in composition that relate directly to the gridding methodology could be derived. Secondly, the availability of the DHS cluster locations opens up the possibility of estimating surfaces of variables with associated uncertainty, and this is discussed below.

Gridding household survey data

The availability of the GPS coordinates of DHS clusters (Figure 2(c)) has prompted several studies to utilize geostatistical approaches to derive continuous estimated surfaces of variables of interest. However, survey data are generally collected to be nationally representative and, as such, their sampling frames may not lend themselves to finely resolved geographic grids. The DHS program has been a leader in collecting and providing geocoded information on the survey clusters in addition to their standard data files, in which data can be tabulated by first-order subnational regions as well as urban/rural classification. Early examples have demonstrated the value of such approaches for deriving continuous maps of variables of interest from geolocated DHS cluster data. For example, Gemperli et al. [61] investigated spatial patterns of malaria endemicity as well as socioeconomic risk factors on infant mortality in Mali using a Bayesian hierarchical geostatistical model. Meanwhile, Soares Magalhaes and Clements [16] used a similar approach for anemia mapping. However, these approaches did not take account of the DHS sampling design or the random spatial displacement that cluster data undergo, and overcoming these issues should be a priority for future applications [62]. Apart from utilizing the cluster location, the subnational regions supply information that can be used with more finely resolved grids. One approach that uses spatial coverage of census aggregates combined with the attribute breadth in survey data is that of the Poverty Mapping efforts [63]. But this is not necessarily the only approach to consider.

Migration and mobility mapping

Very little is known about migration and mobility within countries, which may occur seasonally and periodically as well as permanently, except through case studies and qualitative place-specific analyses. Disease modeling and health metric derivation, as well as demographic analyses, increasingly require information on migration and mobility [64-66]. These data are the weak link of the demographic record: even the stock estimates of subnational migration have been largely ignored. Disease modelers often want to know about daily movements rather than decadal ones, but the decadal moves may be important to evaluate for changes to place-specific vulnerability of residents. Decadal moves should be examined more closely with existing survey and census microdata; characterizing more frequent moves will require data collection methods that depart from the standard demographic toolkit. Use of new data, such as spatial locations derived from GPS tracking devices [67] and cell phone usage [68], may show promise. However, to be useful, methods for using these data in combination with more standard demographic data will be necessary.

Conclusions

Growing trends in research and funding for disease mapping and spatial modeling to derive health metrics and guide strategies are increasing needs for spatial demographic data of similar scope and quality for use in estimating sizes and characteristics of populations at risk. However, existing spatial demographic databases are often based on coarse resolution and outdated input and lack any consideration of population attribute mapping. These drawbacks are likely contributing to substantial uncertainties in disease modeling and health metric outputs [1]. Here we have shown that datasets to rectify this exist but remain scattered across multiple repositories and websites, requiring collation into a central open-access database to become more widely used and build on the strengths of each data type, overcoming temporal, spatial, and attribute limitations. We have put forward a basic database design here to achieve this and lay the foundations for undertaking detailed mapping of population attributes for providing spatial demographic data in disease studies.


Competing interests
The authors declare that they have no competing interests.

Authors' contributions
AJT conceived and designed the manuscript. All authors contributed to writing the manuscript and have read and approved the final manuscript.

Acknowledgments
AJT is supported by a grant from the Bill & Melinda Gates Foundation (#49446), which also supports DP. AJT acknowledges funding support from the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. CL is supported by a grant from the Fondation Philippe Wiener-Maurice Anspach. This paper is the result of a working group meeting held in May 2011 in New York, funded by the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. AD received financial support from the National Institute for Child Health and Human Development, Grant No. R24HD047879. AMN is supported by a Wellcome Trust Intermediate Research Fellowship (#095127). SA is supported in part by NASA under contract NNG08HZ11C for the continued operation of the Socioeconomic Data and Applications Center (SEDAC). CRB is supported by the US Agency for International Development-funded MEASURE Demographic and Health Survey.
Author details
1 Department of Geography, University of Florida, Gainesville, USA. 2 Emerging Pathogens Institute, University of Florida, Gainesville, USA. 3 Fogarty International Center, National Institutes of Health, Bethesda, USA. 4 Center for International Earth Science Information Network (CIESIN), Columbia University, New York, USA. 5 Ecology and Evolutionary Biology, Princeton University, Princeton, USA. 6 Demographic and Health Surveys, International Health and Development Division, ICF International, Washington DC, USA. 7 Department of Global Health and Population, Harvard School of Public Health, Boston, USA. 8 Office of Population Research and Woodrow Wilson School of Public and International Affairs, Princeton University, Princeton, USA. 9 Biological Control and Spatial Ecology, Université Libre de Bruxelles, Brussels, Belgium. 10 Fonds National de la Recherche Scientifique (F.R.S.-FNRS), Brussels, Belgium. 11 Research and Information Services of Namibia, Windhoek, Namibia. 12 The Population Council, New York, USA. 13 The International Rice Research Institute, Los Banos, Philippines. 14 Malaria Public Health and Epidemiology Group, Centre for Geographic Medicine, KEMRI-University of Oxford-Wellcome Trust Research Programme, Nairobi, Kenya. 15 School of Public Affairs, Baruch College, City University New York, New York, USA. 16 Department of Economics, Stony Brook University, New York, USA.

Received: 28 October 2011 Accepted: 27 April 2012 Published: 16 May 2012

References
1. Tatem A, Campiz N, Gething P, Snow R, Linard C: The effects of spatial population dataset choice on estimates of population at risk of disease. Popul Health Metrics 2011, 9:4.
2. Patil AP, Gething PW, Piel FB, Hay SI: Bayesian geostatistics in health cartography: the perspective of malaria. Trends Parasitol 2011, 27:246–253.
3. Riley S: Large-scale spatial-transmission models of infectious disease. Science 2007, 316:1298–1301.
4. Molesworth AM, Thomson MC, Connor SJ, Cresswell MP, Morse AP, Shears P, Hart CA, Cuevas LE: Where is the meningitis belt? Defining an area at risk of epidemic meningitis in Africa. Trans R Soc Trop Med Hyg 2002, 96:242–249.
5. Hay SI, Guerra CA, Gething PW, Patil AP, Tatem AJ, Noor AM, Kabaria CW, Manh BH, Elyazar IRF, Brooker SJ, et al: World malaria map: Plasmodium falciparum endemicity in 2007. PLoS Med 2009, 6:e1000048.
6. Vezzulli L, Pruzzo C, Huq A, Colwell RR: Environmental reservoirs of Vibrio cholerae and their role in cholera. Environ Microbiol Rep 2010, 2:27–33.
7. Jones KE, Patel NG, Levy MA, Storeyguard A, Balk D, Gittleman JL, Daszak P: Global trends in emerging infectious diseases. Nature 2008, 451:990–994.
8. Viboud C, Bjornstad ON, Smith DL, Simonsen L, Miller MA, Grenfell BT: Synchrony, waves, and spatial hierarchies in the spread of influenza. Science 2006, 312:447–451.
9. Smith DL, Guerra CA, Snow RW, Hay SI: Standardizing estimates of the Plasmodium falciparum parasite rate. Malar J 2007, 6:131.
10. Egger JR, Coleman PG: Age and clinical dengue illness. Emerg Infect Dis 2007, 13:924–925.
11. Miller E, Cradock-Watson JE, Pollock TM: Consequences of confirmed maternal rubella at successive stages of pregnancy. Lancet 1982, 2:781–784.
12. Pitzer VE, Viboud C, Simonsen L, Steiner C, Panozzo CA, Alonso WJ, Miller MA, Glass RI, Glasser JW, Parashar UD, Grenfell BT: Demographic variability, vaccination, and the spatiotemporal dynamics of rotavirus epidemics. Science 2009, 325:290–294.
13. Talavera A, Perez EM: Is cholera disease associated with poverty? J Infect Dev Ctries 2009, 3:408–411.
14. Allison SP: Malnutrition, disease and outcome. Nutrition 2000, 16:590–593.
15. Gething PW, Kirui VC, Alegana VA, Okiro EA, Noor AM, Snow RW: Estimating the number of paediatric fevers associated with malaria infection presenting to Africa's public health sector in 2007. PLoS Med 2010, 7:e1000301.
16. Soares Magalhaes RJ, Clements ACA: Mapping the risk of anaemia in preschool-age children: the contribution of malnutrition, malaria and helminth infections in West Africa. PLoS Med 2011, 8:e1000438.
17. Schur N, Hurlimann E, Garba A, Traore MS, Ndir O, Ratard RC, Tchuente LT, Kristensen TK, Utzinger J, Vounatsou P: Geostatistical model-based estimates of schistosomiasis prevalence among individuals aged < 20 years in West Africa. PLoS Negl Trop Dis 2011, 5:e1194.
18. Deichmann U, Balk D, Yetman G: Transforming population data for interdisciplinary usages: from census to grid. 2001, Documentation for GPW Version 2 available only at http://sedac.ciesin.columbia.edu/plue/gpw/GPWdocumentation.pdf.
19. Balk DL, Deichmann U, Yetman G, Pozzi F, Hay SI, Nelson A: Determining global population distribution: methods, applications and data. Adv Parasitol 2006, 62:119–156.
20. Dobson JE, Bright EA, Coleman PR, Durfee RC, Worley BA: LandScan: a global population database for estimating populations at risk. Photogramm Eng Remote Sens 2000, 66:849–857.
21. Linard C, Gilbert M, Snow RW, Noor AM, Tatem AJ: Population distribution, settlement patterns and accessibility across Africa in 2010. PLoS One 2012, 7:e31743.
22. United Nations Population Division: World population prospects, 2010 revision. New York: United Nations; 2010.
23. Gething PW, Noor AM, Gikandi PW, Ogara EAA, Hay SI, Nixon MS, Snow RW, Atkinson PM: Improving imperfect data from health management information systems in Africa using space-time geostatistics. PLoS Med 2006, 3:e271.
24. Health Metrics Network: Statistics save lives: Strengthening country health information systems. Geneva: WHO Health Metrics Network; 2005.
25. Murray CJL, Lopez AD, Wibulpolprasert S: Monitoring global health: Time for new solutions. Br Med J 2004, 329:1096–1100.
26. Kubiak RJ, Arinaminpathy N, McLean AR: Insights into the evolution and emergence of a novel infectious disease. PLoS Comput Biol 2010, 6:e1000947.
27. Brooker S, Hay SI, Bundy DA: Tools from ecology: useful for evaluating infection risk models? Trends Parasitol 2002, 18:70–74.
28. Ferguson NM, Cummings DAT, Cauchemez S, Fraser C, Riley S, Meeyai A, Iamsirithaworn S, Burke DS: Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature 2005, 437:209–214.
29. Hay SI, Okiro EA, Gething PW, Patil AP, Tatem AJ, Guerra CA, Snow RW: Estimating the global clinical burden of Plasmodium falciparum malaria in 2007. PLoS Med 2010, 7:e100029.
30. World Health Organization: The World Malaria Report. Geneva: World Health Organization; 2008.
31. Cibulskis RE, Bell D, Christophel EM, Hii J, Delacollette C, Bakyaita N, Aregawi MW: Estimating trends in the burden of malaria at country level. Am J Trop Med Hyg 2007, 77:133–137.
32. Linard C, Tatem AJ: Large-scale spatial population databases in infectious disease research. Int J Heal Geogr 2012, 11:7.
33. Johansson EW, Newby H, Renshaw M, Wardlaw T: Malaria and children: progress in intervention coverage. New York: United Nations Children's Fund (UNICEF)/The Roll Back Malaria Partnership (RBM); 2007.


34. Riedel N, Vounatsou P, Miller JM, Gosoniu L, Chizema-Kawesha E, Mukonka V, Steketee RW: Geographical patterns and predictors of malaria risk in Zambia: Bayesian geostatistical modelling of the 2006 Zambia national malaria indicator survey (ZMIS). Malar J 2010, 9:37.
35. Guerra CA, Howes RE, Patil AP, Gething PW, Van Boeckel TP, Temperley WH, Kabaria CW, Tatem AJ, Manh BH, Elyazar IRF, et al: The international limits and population at risk of Plasmodium vivax transmission in 2009. PLoS Negl Trop Dis 2010, 4:e774.
36. Guerra CA, Gikandi PW, Tatem AJ, Noor AM, Smith DL, Hay SI, Snow RW: The limits and intensity of Plasmodium falciparum transmission: implications for malaria control and elimination worldwide. PLoS Med 2008, 5:e38.
37. Brooker S, Miguel E, Waswa P, Namunyu R, Moulin S, Guyatt H, Bundy D: The potential of rapid screening methods for Schistosoma mansoni in western Kenya. Ann Trop Med Parasitol 2001, 95:343–351.
38. Brooker S, Beasley M, Ndinaromtan M, Madjiouroum EM, Baboguel M, Djenguinabe E, Hay SI, Bundy DA: Use of remote sensing and a geographical information system in a national helminth control programme in Chad. Bull World Health Organ 2002, 80:783–789.
39. Kabatereine N, Brooker S, Tukahebwa E, Kazibwe F, Onapa A: Epidemiology and geography of Schistosoma mansoni in Uganda: implications for planning control. Trop Med Int Health 2004, 9:372.
40. Clements ACA, Firth S, Dembele R, Garba A, Toure S, Sacko M, Landoure A, Bosque-Oliva E, Barnett AG, Brooker S, Fenwick A: Use of Bayesian geostatistical prediction to estimate local variations in Schistosoma haematobium infection in western Africa. Bull World Health Organ 2009, 87:921–929.
41. Brooker SJ, Clements ACA, Hotez PJ, Hay SI, Tatem AJ, Bundy DAP, Snow RW: The co-distribution of Plasmodium falciparum and hookworm among African schoolchildren. Malar J 2006, 5:99.
42. Pullan RL, Gething PW, Smith JL, Mwandawiro CS, Sturrock HJ, Gitonga CW, Hay SI, Brooker S: Spatial modelling of soil-transmitted helminth infections in Kenya: a disease control planning tool. PLoS Negl Trop Dis 2011, 5:e958.
43. Brooker S, Clements AC, Bundy DA: Global epidemiology, ecology and control of soil-transmitted helminth infections. Adv Parasitol 2006, 62:221–261.
44. Brooker S, Hotez PJ, Bundy DA: Hookworm-related anaemia among pregnant women: a systematic review. PLoS Negl Trop Dis 2008, 2:e291.
45. Dellicour S, Tatem AJ, Guerra CA, Snow RW, ter Kuile FO: Quantifying the number of pregnancies at risk of malaria in 2007: a demographic study. PLoS Med 2010, 7:e1000221.
46. van Eijk A, Hill J, Alegana V, Kirui V, Gething P, ter Kuile F, Snow R: Coverage of malaria protection in pregnant women in sub-Saharan Africa: a synthesis and analysis of national survey data. Lancet Infect Dis 2011, 11:190–207.
47. Fischer E, Pahan D, Chowdhury S, Richardus J: The spatial distribution of leprosy cases during 15 years of a leprosy control program in Bangladesh: an observational study. BMC Infect Dis 2008, 8:126.
48. Kalipeni E, Zulu LC: HIV and AIDS in Africa: a geographic analysis at multiple spatial scales. GeoJournal 2010, doi:10.1007/s10708-010-9358-6.
49. Chao DL, Halloran ME, Longini IM Jr: Vaccination strategies for epidemic cholera in Haiti with implications for the developing world. Proc Natl Acad Sci USA 2011, 108:7081–7085.
50. Ferguson NM, Cummings DAT, Fraser C, Cajka JC, Cooley PC, Burke DS: Strategies for mitigating an influenza pandemic. Nature 2006, 442:448–452.
51. Rakowski F, Gruziel M, Bieniasz-Krywiec L, Radomski JP: Influenza epidemic spread simulation for Poland - a large scale, individual based model study. Physica A: Statistical Mechanics and its Applications 2010, 389:3149–3165.
52. Rao DM, Chernyakhovsky A, Rao V: Modeling and analysis of global epidemiology of avian influenza. Environ Model Softw 2009, 24:124–134.
53. Balcan D, Colizza V, Goncalves B, Hu H, Ramasco JJ, Vespignani A: Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci 2009, 106:21484–21489.
54. Dye C: Health and urban living. Science 2008, 319:766–769.
55. United Nations Population Division: World urbanization prospects, 2009 revision. New York: United Nations; 2009.
56. Tatem AJ, Hay SI: Measuring urbanization pattern and extent for malaria research: a review of remote sensing approaches. J Urban Health 2004, 81:363–376.
57. Tatem AJ, Noor AM, Hay SI: Assessing the accuracy of satellite derived global and national urban maps in Kenya. Remote Sens Environ 2005, 96:87-97.
58. Schneider A, Friedl MA, Potere D: Mapping global urban areas using MODIS 500-m data: New methods and datasets based on 'urban ecoregions'. Remote Sens Environ 2010, 114:1733-1746.
59. Balk D, Montgomery M, McGranahan G, Kim D, Mara V, Todd M, Buettner T, Dorelien A: Mapping urban settlements and the risks of climate change in Africa, Asia and South America. In Population dynamics and climate change. Edited by Martine G, Guzman J-M, McGranahan G, Schensul D, Tacoli C. New York: UNPD; 2009:88-103.
60. Kim D: Econometric modeling of city population growth in developing countries. New York: State University of; 2011.
61. Gemperli A, Vounatsou P, Kleinschmidt I, Bagayoko M, Lengeler C, Smith T: Spatial patterns of infant mortality in Mali: the effect of malaria endemicity. Am J Epidemiol 2004, 159:64-72.
62. Chin B, Montana L, Basagana X: Spatial modeling of geographic inequalities in child mortality across Nepal. Health Place 2011, 17:929-936.
63. Elbers C, Lanjouw J, Lanjouw P: Micro-level estimation of poverty and inequality. Econometrica 2003, 71:355-386.
64. Prothero RM: Population movements and tropical health. Global Change and Human Health 2002, 3:20-32.
65. Stoddard S, Morrison A, Vazquez-Prokopec G, Paz-Soldan V, Kochel T, Kitron U, Elder J, Scott T: The role of human movement in the transmission of vector-borne pathogens. PLoS Negl Trop Dis 2010, 3:e481.
66. Tatem AJ, Smith DL: International population movements and regional Plasmodium falciparum malaria elimination strategies. Proc Natl Acad Sci 2010, 107:12222-12227.
67. Paz-Soldan V, Stoddard S, Vazquez-Prokopec G, Morrison A, Elder J, Kitron U, Kochel T, Scott T: Assessing and Maximizing the Acceptability of GPS Device Use for Studying the Role of Human Movement in Dengue Virus Transmission in Iquitos, Peru. Am J Trop Med Hyg 2010, 82:723-730.
68. Tatem A, Qiu Y, Smith D, Sabot O, Ali A, Moonen B: The use of mobile phone data for the estimation of the travel patterns and imported Plasmodium falciparum rates among Zanzibar residents. Malar J 2009, 8:287.

doi:10.1186/1478-7954-10-8
Cite this article as: Tatem et al.: Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation. Population Health Metrics 2012 10:8.


Creator:
Tatem, Andrew J
Adamo, Susana
Bharti, Nita
Burgert, Clara R
Castro, Marcia
Dorelien, Audrey
Fink, Gunter
Linard, Catherine
Mendelsohn, John
Montana, Livia
Montgomery, Mark R
Nelson, Andrew
Noor, Abdisalan M
Pindolia, Deepa
Yetman, Greg
Balk, Deborah
Language:
en
Type:
Journal article
Available:
2012-05-16
Publisher:
BioMed Central Ltd
Status:
Peer reviewed
Copyright Holder:
Andrew J Tatem et al.; licensee BioMed Central Ltd.
License:
http://creativecommons.org/licenses/by/2.0
Access Rights:
Open access
Bibliographic Citation:
Population Health Metrics. 2012 May 16;10(1):8
Identifier:
http://dx.doi.org/10.1186/1478-7954-10-8
Review
Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation
Authors:
Andrew J Tatem (1, 2, 3; corresponding author): andy.tatem@gmail.com
Susana Adamo (4): sadamo@ciesin.columbia.edu
Nita Bharti (5): nbharti@princeton.edu
Clara R Burgert (6): CBurgert@icfi.com
Marcia Castro (7): mcastro@hsph.harvard.edu
Audrey Dorelien (8): dorelien@Princeton.EDU
Gunter Fink: gfink@hsph.harvard.edu
Catherine Linard (9, 10): linard.catherine@gmail.com
John Mendelsohn (11): john@raison.com.na
Livia Montana: Montana.livia@gmail.com
Mark R Montgomery (12, 16): mmontgomery@popcouncil.org
Andrew Nelson (13): dr.andy.nelson@gmail.com
Abdisalan M Noor (14): anoor@nairobi.kemri-wellcome.org
Deepa Pindolia: dpindolia@gmail.com
Greg Yetman: gyetman@ciesin.columbia.edu
Deborah Balk (15): deborah.balk@baruch.cuny.edu

Affiliations:
1. Department of Geography, University of Florida, Gainesville, USA
2. Emerging Pathogens Institute, University of Florida, Gainesville, USA
3. Fogarty International Center, National Institutes of Health, Bethesda, USA
4. Center for International Earth Science Information Network (CIESIN), Columbia University, New York, USA
5. Ecology and Evolutionary Biology, Princeton University, Princeton, USA
6. Demographic and Health Surveys, International Health and Development Division, ICF International, Washington DC, USA
7. Department of Global Health and Population, Harvard School of Public Health, Boston, USA
8. Office of Population Research and Woodrow Wilson School of Public and International Affairs, Princeton University, Princeton, USA
9. Biological Control and Spatial Ecology, Université Libre de Bruxelles, Brussels, Belgium
10. Fonds National de la Recherche Scientifique (F.R.S.-FNRS), Brussels, Belgium
11. Research and Information Services of Namibia, Windhoek, Namibia
12. The Population Council, New York, USA
13. The International Rice Research Institute, Los Banos, Philippines
14. Malaria Public Health and Epidemiology Group, Centre for Geographic Medicine, KEMRI University of Oxford Wellcome Trust Research Programme, Nairobi, Kenya
15. School of Public Affairs, Baruch College, City University New York, New York, USA
16. Department of Economics, Stony Brook University, New York, USA
Source: Population Health Metrics
ISSN: 1478-7954
Publication date: 2012
Volume: 10
Issue: 1
First page: 8
URL: http://www.pophealthmetrics.com/Content/10/1/8
DOI: 10.1186/1478-7954-10-8
PubMed ID: 22591595
History: received 28 October 2011; accepted 27 April 2012; published 16 May 2012
Copyright: 2012 Tatem et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: Population, Epidemiology, Demography, Disease mapping
Abstract
The use of Global Positioning Systems (GPS) and Geographical Information Systems (GIS) in disease surveys and reporting is becoming increasingly routine, enabling a better understanding of spatial epidemiology and the improvement of surveillance and control strategies. In turn, the greater availability of spatially referenced epidemiological data is driving the rapid expansion of disease mapping and spatial modeling methods, which are becoming increasingly detailed and sophisticated, with rigorous handling of uncertainties. This expansion has, however, not been matched by advancements in the development of spatial datasets of human population distribution that accompany disease maps or spatial models.
Where risks are heterogeneous across population groups or space or dependent on transmission between individuals, spatial data on human population distributions and demographic structures are required to estimate infectious disease risks, burdens, and dynamics. The disease impact in terms of morbidity, mortality, and speed of spread varies substantially with demographic profiles, so that identifying the most exposed or affected populations becomes a key aspect of planning and targeting interventions. Subnational breakdowns of population counts by age and sex are routinely collected during national censuses and maintained in finer detail within microcensus data. Moreover, demographic and health surveys continue to collect representative and contemporary samples from clusters of communities in low-income countries where census data may be less detailed and not collected regularly. Together, these freely available datasets form a rich resource for quantifying and understanding the spatial variations in the sizes and distributions of those most at risk of disease in low income regions, yet at present, they remain unconnected data scattered across national statistical offices and websites.
In this paper we discuss the deficiencies of existing spatial population datasets and their limitations on epidemiological analyses. We review sources of detailed, contemporary, freely available and relevant spatial demographic data focusing on low income regions where such data are often sparse and highlight the value of incorporating these through a set of examples of their application in disease studies. Moreover, the importance of acknowledging, measuring, and accounting for uncertainty in spatial demographic datasets is outlined. Finally, a strategy for building an open-access database of spatial demographic data that is tailored to epidemiological applications is put forward.
Introduction
The spatial modeling and mapping of diseases is increasingly being undertaken to derive health metrics, guide intervention strategies, and advance epidemiological understanding [1]. This has been driven by a recognition of the spatial, temporal, and demographic heterogeneities in disease risk (Table 1), and has resulted in significant recent methodological advances (for example [2,3]). Moreover, the need for estimates of populations at risk to guide funding allocations within the Millennium Development Goals (MDGs: http://www.un.org/millenniumgoals) framework continues to drive the development of disease mapping and modeling approaches.
Table 1
Heterogeneities in disease risks

Heterogeneity type: Spatial
Background information and examples: Understanding relevant spatial heterogeneities underlies our ability to map host risk of pathogen exposure. Predictions of disease importation or emergence are limited by our ability to distinguish disease-specific hotspots from continuous risk surfaces. Spatial variation in risk is defined by the specific biology of each host-pathogen relationship. Epidemiologically relevant spatial heterogeneities can be highly specific to each infection and must be correctly identified within the proper context of the ecology and landscape of each host-pathogen relationship. Spatial heterogeneities that impact risk profiles for exposure to a pathogen include large-scale environmental factors, such as temperature, access to water, and rainfall abundance, which can affect host susceptibility (e.g. within the African meningitis belt [4]), host exposure (e.g. proximity to malaria vector habitats [5]), and pathogen viability (e.g. cholera survival in the environment [6]). Within a population, the transmission events of infections drive the spatial progression of an outbreak after the initial exposure to the pathogen has already taken place. Transmission events are rarely observed and risk profiles must be constructed using proxies for transmission, again highlighting characteristics specific to each host-pathogen relationship. Risk profiles for directly transmitted diseases focus on host contacts between infectious and susceptible individuals. Important components of these contacts are host density, susceptibility, and mobility. Each of these factors can also be defined across spatial scales, from within-household contact patterns to settlement-level risk factors. Urban and rural residence can be thought of as a basic (yet dichotomized) spatial heterogeneity that is closely associated with density and landscape, but typically urbanization has not been defined in spatial terms. Similarly, transmission of vector-mediated infections is impacted by spatial heterogeneities at the household and community level determined by host density, prevention measures, vector mobility and vector abundance. Spatial patterns of environmentally mediated infections will also be determined by the host-pathogen relationship.

Heterogeneity type: Temporal
Background information and examples: Epidemiologically important temporal heterogeneities will also be specific to each infection. For emerging infections, long-term changes in host settlements, habitat loss, and changing levels of interaction between humans and animal species can define the risk of disease emergence over time [7] (e.g. Ebola, SARS, monkeypox, HIV, H1N1 and H5N1 influenza). In other situations, seasonal and environmental factors may determine the population-level risk of pathogen exposure (e.g. malaria vector habitats, hyperendemic areas of meningitis). Short-term risk of infection, or transmission of a pathogen within a population, is determined by the biology of the relationships between the host, pathogen and vector. These relationships establish the host susceptibility and infectious periods, and therefore the risk of transmission events. Population-level susceptibility profiles (natural or derived) vary across temporal scales with respect to prior exposure and preventative measures. Temporal likelihood of transmission will be determined by length of exposure, and changes in abundance and susceptibility of the host and vector. Exposure and contact rates (density, migration) over the course of a day (as in commuter patterns for influenza [8]) are additional examples of temporal heterogeneities in transmission likelihood and risk across temporal scales.

Heterogeneity type: Demographic and Socioeconomic
Background information and examples: Susceptibility and transmissibility of infectious disease vary across differing demographic and socioeconomic groups due to differences in immunity, mobility, contact patterns and health status. Small-scale variations in socioeconomic and demographic factors can have a large influence on the geographical variation of infections compared to environmental factors. Age represents one of the most significant factors, with risk of morbidity and mortality of many diseases varying substantially across age groups. These include large variations in mortality and morbidity by age for malaria [9] and for clinical attack risk for dengue [10]. Heterogeneities in susceptibility and transmissibility also exist between the sexes, especially during childbearing age for women, when pregnancy increases the risks of death for both the mother and fetus, and are important for diseases such as congenital rubella syndrome (CRS) [11]. At a population scale, differences in vital rates such as birth rates create heterogeneities in disease risk across space and time, as evidenced by rotavirus in the US [12]. For macro-parasite infections, such as helminths, in addition to environmental risk factors, the population at risk often depends on socioeconomic profiles and access to key infrastructure (housing quality, adequate sanitation and drinking water). For micro-parasite infections with human-to-human transmission, risk is again associated with individual socioeconomic attributes, but also with community/neighborhood attributes. In other words, the concentration of poverty or poor sanitation services increases risk, as evidenced by cholera outbreaks [13]. Finally, in addition to information on poverty status, knowledge of nutritional status is important; malnutrition can increase (i) susceptibility to many infectious diseases, (ii) the period of infectiousness (by reducing immune function and delaying recovery) and (iii) disease-associated mortality [14].

Table footnote: Disease morbidity, mortality, and speed of spread vary substantially with demographic profiles, with clear risk groups and vulnerable populations existing. These have important implications for planning and targeting intervention strategies. The risk of pathogen infection to host populations exists at two spatial levels. First, there is a probability of initial exposure of a population to a pathogen, which defines the population risk. Second, there is a probability of transmission of a disease within a population, which defines the individual risk. Within these epidemic and endemic classifications, the implications for interventions vary across disease landscapes dependent upon the host-pathogen relationships.
Given the high degrees of individual and local heterogeneity within geographic regions or administrative units, effective policy design requires a detailed knowledge of the spatial distribution of relevant population attributes of interest, including size, age, gender, income, nutritional status, vaccination rates, or child mortality. From a public health perspective, detailed spatial datasets not only allow investigation of the relationship between policy inputs and individual-specific outcomes, but also make it possible to build detailed and realistic predictive models and derive suites of health metrics. Disease mapping and spatial modeling studies have become increasingly detailed and sophisticated, with rigorous handling of uncertainties built in, but are limited when it comes to estimating populations at risk. Detailed spatial datasets on population distributions now exist, but maps of other demographic and socioeconomic characteristics to identify vulnerable subgroups remain lacking. In attempting to quantify these population attributes, recent high-impact studies have had to overlook subnational demographic variations and rely on simple national-scale adjustments, e.g. [5,15-17].
The availability of high-resolution population data has increased dramatically through a series of global population mapping efforts over the past 15 years. Initially restricted to a few countries, location-specific population numbers have been made available for the globe over the past decades through the combined efforts of projects like the Gridded Population of the World (GPW) [18], the Global Rural Urban Mapping Project (GRUMP) [19], LandScan [20], and AfriPop [21] (http://www.afripop.org). All databases are in the public domain and allow individuals, companies, researchers, and policymakers to access population data either by administrative units or by user-specified geographic boundaries of interest. While the generation of these comprehensive population databases clearly constitutes a major achievement from a scientific perspective, two main factors limit the degree to which these databases can be used for research as well as for policy and planning: limited time frames and limited information on population attributes of interest. The first limitation is mostly the result of the irregular collection of detailed population data as well as the effort required in compiling global datasets at any given point in time. Given that most countries independently collect full censuses only once per decade and data sharing is complicated by a large set of copyright issues, most current population databases contain population data only on a five- or 10-year basis. When analyses warrant data for noncensus years, national growth rates [22], subnational growth rates from National Statistical Offices, or interpolation between available data points may be applied to produce estimates for intermediate years, as annual population fluctuations are generally limited.
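As a rough illustration of how estimates for noncensus years might be produced, the sketch below interpolates between two census counts and projects a count forward with a growth rate. Both functions assume constant exponential growth, which is only one of several possible choices; the function names and the counts in the demonstration are hypothetical and are not drawn from any of the datasets named above.

```python
import math


def interpolate_population(pop_start, year_start, pop_end, year_end, year):
    """Estimate a population for `year` from two census counts.

    Assumes a constant exponential growth rate between the two censuses.
    """
    if pop_start <= 0 or pop_end <= 0:
        raise ValueError("census counts must be positive")
    if year_end == year_start:
        raise ValueError("census years must differ")
    # Implied annual growth rate between the two census years.
    rate = math.log(pop_end / pop_start) / (year_end - year_start)
    return pop_start * math.exp(rate * (year - year_start))


def project_population(pop_base, annual_growth_rate, years_ahead):
    """Project a count forward using a supplied (e.g. national) growth rate."""
    return pop_base * math.exp(annual_growth_rate * years_ahead)


if __name__ == "__main__":
    # Hypothetical administrative unit counted in 2000 and 2010.
    print(interpolate_population(1000, 2000, 1210, 2010, 2005))
    # Hypothetical projection three years past the last census at 2.5% per year.
    print(project_population(1210, 0.025, 3))
```

The same functions can be applied unit by unit when subnational growth rates are available, so that each administrative unit carries its own rate rather than a single national one.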
The second constraint is more critical: little is known about characteristics of the underlying populations being mapped in detail. From a planning or research perspective, these factors can be of critical importance, as outlined in Table 1. Various freely available datasets exist to facilitate mapping improvements and add significant value to epidemiological analyses, but these remain scattered across different sources and require processing to be integrated into mapping. Here we review these sources of more detailed, contemporary, freely available, and relevant spatial demographic data, focusing on low-income regions of the world where disease burden is highest, and put forward a strategy for building an open-access database to link the various datasets, tailored to epidemiological applications.
Usages of spatial demographic data in epidemiology
Population distribution datasets constitute an essential denominator required for many infectious disease studies. It is well known that disease transmission is spatially focal and heterogeneous (Table 1), partially due to the clustered nature of population distribution. The epidemiology of many diseases makes surveillance-based methods (reliant upon reporting from health facilities) for estimating populations at risk and disease burden problematic, particularly in low-income regions [23-25], while spatial heterogeneity in human population distribution can produce significant effects on transmission [3,26]. Cartographic and spatial modeling approaches have proven to be effective in tackling these factors (e.g., [27-29]). Such approaches can help characterize large-scale patterns of disease spread to evaluate intervention impact [3] and produce globally consistent measures of morbidity of known fidelity, which often represent the only plausible method in many African countries where surveillance data are incomplete, unreliable, and inconsistent [23,30,31]. As the precision and detail of disease risk mapping and modeling improve, spatial population datasets that capture these patterns are therefore required if populations at risk are to be more accurately quantified and disease spread among populations is to be realistically modeled for prediction and prevention purposes. Uses of gridded population count data in epidemiological studies are documented in Tatem et al. [1] and Linard and Tatem [32], and here we focus on studies that have attempted to incorporate spatial data on population subgroups.
Applications of gridded population datasets in epidemiology have involved estimating numbers of clinical cases, modeling the spatial progression of an epidemic, risk mapping and assessing the effects of urbanization, and the study of diseases ranging from dengue and yellow fever to HIV and leprosy. The majority of spatial modeling approaches of infectious diseases have been based on the environmental correlates of infection, due in part to the availability of high spatial resolution environmental data and relative paucity of spatial socioeconomic and demographic data. The most widespread uses of gridded population datasets in an epidemiological context have been in the study of malaria. Global spatial demographic datasets have been used to estimate populations at risk of malaria, which forms a fundamental metric for decision-makers at national and international levels [30,33]. While approaches for mapping malaria have become increasingly sophisticated (e.g., [2]), those for mapping population distributions have not kept pace, especially in low-income regions [1], where detailed spatial information on population composition is rarely available or utilized.
Previous studies that have aimed to enumerate vulnerable population subgroups at risk for different diseases have relied solely on simple national-level adjustments. The malaria burden in children under 5 years old was recently estimated based on a Zambia-wide survey and LandScan population data adjusted by a national-level estimate of the proportion of under-5 children [34]. Similarly, the numbers at risk of malaria globally in different age groups were estimated by applying national-level adjustments to GRUMP data [5,29,35,36]. Models of disease prevalence were overlaid onto population density maps, again adjusted by national-level proportions, to quantify school-age children and young adults at risk of schistosomiasis [17,37-40] and hookworm [38,41-43] and the number of pregnant women infected with hookworm in sub-Saharan Africa [44]. Specific estimates of populations at risk of malaria for pregnant women and children have also been derived from these maps, by combining GRUMP data with national-scale age, sex, and fertility data from the United Nations Population Division [15,45,46]. Finally, the numbers of children under 5 with anemia in West Africa were estimated using similar techniques [16]. The problems that arise from overlooking subnational variations in population through such national-level adjustments, common to all of the examples outlined here, are illustrated in the next section.
Spatio-temporal transmission models aim to simulate contacts between infectious and susceptible individuals and estimate the spatial spread of the disease. This helps to identify areas and times at risk of disease and assists in planning targeted interventions [47,48]. Sophisticated spatially explicit models have been developed to study the spatial progression of infectious diseases. Many such spatially explicit models have made use of gridded population datasets as input data [3,49]. Gridded population data have also been used to develop agent-based simulation models at the regional level [28,50,51] and at the global level [52,53]. Whatever the spatial approach for modeling, population data are essential, as these models generally require the generation of a virtual society with an appropriate distribution and composition of people [3]. Gridded data are preferred by these models in that the gridding process removes the irregularity associated with the native administrative units in which these data were initially reported and thereby makes the data more flexible for use with a variety of other spatial units or features. In addition, global (or continent-level) gridded population data provide valuable input datasets mainly because of their wide coverage, consistent spatial resolution, and availability in the public domain. Notably missing, as discussed above, is information on population attributes. This represents a limitation for models that can be substantially improved through the incorporation of realistic population attributes to build 'synthetic' populations. Previous studies have had to rely on national-level statistics or the application of census-derived attributes from one country applied to multiple others (e.g., [28]).
Improving estimates of children under 5 years at risk of Plasmodium falciparum malaria
The lack of availability of subnational spatial datasets on specific population groups that are particularly vulnerable to P. falciparum malaria has meant that simple national-level adjustments have been applied in influential studies to estimate the spatial distributions of, for example, children under 5 [15] or pregnant women [45,46] at risk. To illustrate the importance of mapping vulnerable populations to a level of spatial detail approaching that now used in disease mapping, here we compare estimates of under-5 children at risk of P. falciparum malaria in Tanzania in 2007 using transmission risk classes (Figure 1(a)) [5] overlaid onto a population distribution map (http://www.afripop.org) adjusted to represent children under 5 by (i) applying a single nationwide percentage adjustment as defined by the UN's World Population Prospects [22] (as undertaken in [15]; for Tanzania, percentage under 5 is estimated to be 17.9%) and (ii) applying per-district proportions of under-5 children derived from ward-level census data (Figure 1(b)).
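The comparison described above can be sketched in a few lines. The example below is a minimal illustration, not the authors' processing chain: it treats the population map as a list of grid cells, each tagged with a census unit and a transmission class, and sums the under-5 population per class using either the single nationwide share (17.9%, as quoted in the text) or a per-unit share. The cell values, unit names, and per-unit shares are hypothetical.

```python
from collections import defaultdict

# Nationwide under-5 share for Tanzania quoted in the text.
NATIONAL_U5_SHARE = 0.179


def u5_at_risk(cells, unit_share=None, national_share=NATIONAL_U5_SHARE):
    """Sum the under-5 population at risk per transmission class.

    cells      : iterable of dicts with keys "population", "unit", "risk_class"
    unit_share : optional dict mapping census unit -> under-5 share; when
                 omitted, the single national share is applied to every cell
    """
    totals = defaultdict(float)
    for cell in cells:
        if unit_share is None:
            share = national_share
        else:
            share = unit_share[cell["unit"]]
        totals[cell["risk_class"]] += cell["population"] * share
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical grid cells and per-unit under-5 shares.
    cells = [
        {"population": 1000, "unit": "A", "risk_class": "PfPR > 40%"},
        {"population": 2000, "unit": "B", "risk_class": "PfPR > 40%"},
        {"population": 500, "unit": "B", "risk_class": "PfPR < 5%"},
    ]
    shares = {"A": 0.20, "B": 0.15}
    print("national adjustment:", u5_at_risk(cells))
    print("per-unit adjustment:", u5_at_risk(cells, unit_share=shares))
```

Because the per-unit shares multiply each cell before aggregation, any spatial correlation between age structure and transmission class carries through to the totals, which is what a single national share cannot capture.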
Figure 1. For Tanzania in 2007: (a) P. falciparum malaria transmission classes (adapted from Hay [5]), measured by P. falciparum Parasite Rate (PfPR), (b) percentage of residents under 5 years of age by ward, (c) percentage differences in estimates of number of children under 5 at risk of the highest transmission class by national- vs. ward-level adjustments.
Table
2 shows the difference in estimates of under-5 children residing in each transmission class, with large percentage differences found in each transmission class. We do not examine the spatial patterns of differences here, as this is beyond the scope of this analysis, but they remain an interesting area for exploration
1
. Overall, the adjustments from finer-scale age distributions indicate that national-level estimates substantially overestimate numbers at risk. Whereas Table
2 summarizes the overestimates by transmission classes, Figure
1(c) shows the spatial pattern of the misestimation in the highest transmission class. It shows the percentage differences obtained in estimates of under-5 children at risk of PfPR > 40% transmission level (mapped in Figure
1(a)) resulting from use of the ward-level map of children under 5 years rather than from applying a single nationwide adjustment. Most of these wards show differences above 25%, and several have discrepancies of greater than 100%. These malaria transmission maps and the population-at-risk estimates derived from them are increasingly being used to guide planning, policy, and control. Such substantial differences in estimates of populations at risk, achievable through the use of improved spatial demographic composition data, illustrate the urgent need to develop spatial databases of vulnerable populations.
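The two adjustment methods compared above can be sketched in a few lines. The following is a minimal illustration only: the ward names, populations, and under-5 fractions are invented for the example and are not taken from the Tanzanian census, and the function names are hypothetical. Only the nationwide fraction of 17.9% comes from the text.

```python
# Hypothetical wards: (name, total population at risk, census under-5 fraction).
# These values are invented for illustration, not drawn from the Tanzanian data.
WARDS = [
    ("ward_a", 12000, 0.21),
    ("ward_b", 30000, 0.12),
    ("ward_c", 8000, 0.16),
]

NATIONAL_U5_FRACTION = 0.179  # nationwide figure quoted in the text for Tanzania


def u5_national(wards, national_fraction):
    """Method (i): apply one nationwide under-5 fraction to every ward."""
    return {name: pop * national_fraction for name, pop, _ in wards}


def u5_local(wards):
    """Method (ii): apply each ward's own census-derived under-5 fraction."""
    return {name: pop * frac for name, pop, frac in wards}


def percent_difference(national, local):
    """Percentage difference of the local estimate relative to the national one."""
    return {k: 100.0 * (local[k] - national[k]) / national[k] for k in national}


national = u5_national(WARDS, NATIONAL_U5_FRACTION)
local = u5_local(WARDS)
for ward, diff in percent_difference(national, local).items():
    print(f"{ward}: national={national[ward]:.0f} local={local[ward]:.0f} diff={diff:+.1f}%")
```

Wards whose under-5 share lies above the national figure are underestimated by method (i), and those below it are overestimated, which is the kind of per-ward discrepancy mapped in Figure 1(c).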
Table 2. Estimates of numbers of children under 5 at risk of P. falciparum malaria in Tanzania using the two differing demographic methods described in the text

| Transmission Class | U5PAR1: UN Nationwide adjusted | U5PAR2: Census unit adjustments | Percentage change from U5PAR1 to U5PAR2 |
|---|---|---|---|
| PfPR < 5% | 770547 | 650174 | −15.62175961 |
| PfPR = 5–40% | 4315638 | 3383040 | −21.6097365 |
| PfPR > 40% | 773992 | 630518 | −18.5368841 |

U5PAR = Under-5 population at risk.
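The percentage changes reported in Table 2 follow directly from the two sets of estimates. As a small check, the sketch below recomputes them from the table values. The variable and function names are illustrative, and the pooled figure across all classes is derived here rather than reported in the table.

```python
# Estimates copied from Table 2: (transmission class, U5PAR1, U5PAR2).
TABLE_2 = [
    ("PfPR < 5%", 770547, 650174),
    ("PfPR = 5-40%", 4315638, 3383040),
    ("PfPR > 40%", 773992, 630518),
]


def percent_change(old, new):
    """Percentage change from the nationwide estimate to the census-unit estimate."""
    return 100.0 * (new - old) / old


for label, u5par1, u5par2 in TABLE_2:
    print(f"{label}: {percent_change(u5par1, u5par2):.2f}%")

# Pooled change across the three classes (derived here, not listed in Table 2).
total_u5par1 = sum(row[1] for row in TABLE_2)
total_u5par2 = sum(row[2] for row in TABLE_2)
print(f"All classes: {percent_change(total_u5par1, total_u5par2):.2f}%")
```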
Spatial demographic data to meet needs
From an epidemiological and health metrics perspective, the fundamental population characteristics are age and sex. The most commonly needed age-sex specific groups in developing countries are infants, children under 5, women of childbearing age, and the elderly (Table 1). More specific needs might require the population of pregnant women, young adults, or urban children. Even though these numbers can generally be approximated by multiplying total population numbers by estimated national population fractions, the large, epidemiologically important heterogeneity in population composition generated by migration and differential mortality and birth rates within countries and regions, and particularly between urban and rural residents, is likely to induce substantial imprecision in the resulting output metrics (see previous section). The problem becomes even more severe when researchers or policy makers are primarily interested in nondemographic aspects of the population. In many cases, the main variable of interest may be a fraction of the population with certain health or behavioral characteristics: the number of children not vaccinated, the number of women without access to contraceptives, the number of children not going to school or not receiving formal health care. Many of these characteristics are not census-based, but rather can be ascertained through survey data, an aspect that we shall discuss in further detail below. Clearly, it is not feasible for global population databases to generate on-demand maps for each of these factors on a regular basis. Nevertheless, the potential to leverage current freely available population databases appears large. Table 3 documents the principal datasets that are readily available without cost across multiple countries to achieve this.
Table 3. Sources of freely available spatial demographic data

| Data (standard survey name)/source | Time intervals | Typical spatial coverage | Typical strata | Relevant variables |
|---|---|---|---|---|
| Census; National Statistical Offices | Typically 10 years | Census enumerator area or coarser level | Urban/rural, race or ethnic groups (often) | Sex, age, education, migration status, household and dwelling characteristics |
| Census Microdata; https://international.ipums.org/international/ | Typically 10 years | Admin 1-3 | Urban/rural | Household and dwelling characteristics, sex, age, education, migration status, children ever born, children surviving |
| DHS (Demographic and Health Survey); household, women 15–49, men 15–59, children born in the last five years; http://www.measuredhs.com/ | Varies by country, typically every 5 years | National, Admin 1/region, GPS coordinates of cluster locations for most recent surveys (last 15 years) | Urban/rural | Household and dwelling characteristics, sex, age, education, maternal and child health, fertility and full birth history, family planning, domestic violence, biomarkers, nutrition |
| MICS (Multiple Indicator Cluster Survey); http://www.unicef.org/statistics/index_24302.html | UNICEF (Round 2, 1999–2001; round 3, 2005–2007; round 4 is in the field, 2009–present) | National, Admin 1 | Urban/rural | Household and dwelling characteristics, sex, age, education, status, maternal and child health, child labor, domestic violence, summary birth history, anthropometry |
| LSMS (Living Standards Measurement Study) (Integrated Household Budget Survey and many others that are locally adapted); http://iresearch.worldbank.org/lsms/lsmssurveyFinder.htm | Irregular | National, Admin 1, some GPS coordinates | Urban/rural | Household and dwelling characteristics, sex, age, education, migration status, consumption, expenditures, income, nutrition, anthropometry, summary birth history |
| MIS (Malaria Indicator Survey); http://www.measuredhs.com/, http://www.malariasurveys.org/ | Varies by country, typically every 3 years | National, Admin 1/region, GPS coordinates of cluster locations for some surveys (last five years) | Urban/rural | Household and dwelling characteristics, sex, age, education, biomarkers |
| AIS (AIDS Indicator Survey); http://www.measuredhs.com/ | Varies by country, typically every 3 years | National, Admin 1/region, GPS coordinates of cluster locations for some surveys (last eight years) | Urban/rural | Household and dwelling characteristics, sex, age, education, biomarkers |
Census data form the basis of existing spatial demographic databases
19
20
, and such population and housing censuses are undertaken for almost all countries in the world, including developing countries, generally every 10 years (the dates of past and upcoming planned censuses are available here:
http://unstats.un.org/unsd/demographic/sources/census/censusdates.htm), but these databases provide only population counts. A range of other population-attribute information is generally collected during population censuses, such as age, gender, urban/rural residence, and migration information, and, for the majority of countries, made available in some form on national statistical office websites. This information supplies a series of single population characteristics at whatever level of geographic detail is made available by the National Statistical Office. Often, however, this information is available only through data tables aggregated at coarse administrative levels, and full-detail datasets can be difficult to obtain. In addition to the aggregated full census data, large samples of household-level records derived from censuses (census microdata) provide age and sex structure, as well as many other compositional measures, reported generally by administrative level 1 (e.g., province) or 2 (e.g., district). These data keep information about households intact so that variables can be combined. The largest repository of such data is the International Public Use Microdata Series (
https://international.ipums.org/international/) and the data held there are mapped in Figure
2(a).
Figure 2.
Maps showing the availability of useful demographic datasets for deriving subnational estimates of population attributes. (a) Numbers of census microdata records maintained at the International Public Use Microdata Series repository (
https://international.ipums.org/international/), (b) combined numbers of Demographic and Health Surveys (DHS), Malaria Indicator Surveys (MIS), and AIDS Indicator Surveys (AIS) conducted for each country, (c) combined numbers of DHS, MIS, and AIS with GPS cluster coordinates available.
While census aggregates and census microdata samples are typically large enough to cover small or moderately sized geographic areas, they are only carried out approximately every 10 years and are limited in content. Survey data offer much richer content on shorter time intervals but are limited in spatial coverage (e.g., Figure
2(b)). In most high- and low-income countries, geo-referenced household-based surveys are collected on a regular basis. These surveys contain detailed local population characteristics for a finite number of locations, which could be used to generate characteristic-prevalence surfaces for a given country and year. Overlaying these characteristic surfaces with population estimates would likely become an invaluable tool both for researchers and policymakers.
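The overlay described above amounts to a cell-wise product of a characteristic-prevalence surface and a gridded population surface. The sketch below shows this under simple assumptions: both grids are already co-registered to the same cells, the values are invented for illustration, and the function names `overlay` and `total` are hypothetical.

```python
# Hypothetical 2 x 3 grids on identical cells; values invented for illustration.
population = [
    [1200.0, 300.0, 0.0],
    [4500.0, 800.0, 50.0],
]
# Estimated fraction of residents with the characteristic of interest
# (for example, children not vaccinated). None marks a cell with no estimate.
prevalence = [
    [0.10, 0.25, 0.40],
    [0.05, 0.20, None],
]


def overlay(pop_grid, prev_grid):
    """Cell-wise product of population and prevalence; None where either is missing."""
    same_rows = len(pop_grid) == len(prev_grid)
    if not same_rows or any(len(p) != len(q) for p, q in zip(pop_grid, prev_grid)):
        raise ValueError("grids must have identical shape")
    return [
        [None if (p is None or q is None) else p * q for p, q in zip(pop_row, prev_row)]
        for pop_row, prev_row in zip(pop_grid, prev_grid)
    ]


def total(grid):
    """Sum of all cells that carry an estimate."""
    return sum(v for row in grid for v in row if v is not None)


affected = overlay(population, prevalence)
print(f"Estimated number with the characteristic: {total(affected):.0f}")
```

Cells without a prevalence estimate are carried through as missing rather than treated as zero, so that gaps in survey coverage remain visible in the output.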
Data on a rich variety of population attributes can be obtained from a range of international household survey programs, each of which is listed in Table
3. These provide subnational urban (or rural) age and sex structures, educational compositions, employment information, and countless other socioeconomic and health indicators at the level of subnational regions. Large household survey programs such as the Multiple Indicator Cluster Surveys (MICS) and Demographic and Health Surveys (DHS) listed in Table
3 are reasonably well standardized and cover many low-income countries (Figure
2(b)), with multiple rounds in each country. Additionally, in the most recent DHS datasets, the survey clusters have been geocoded (Figure
2(c)). In order to protect the confidentiality of survey respondents, cluster locations are randomly displaced by up to 2 km in urban areas and 5 km in rural areas. Moreover, in rural areas, the DHS cluster locations provided can represent large and potentially heterogeneous areas. The existing approaches using DHS data have not taken advantage of spatial modeling to expand the use of DHS data below the survey region level (usually administrative level 1). The geocoded cluster data from the DHS allow for data regrouping to different levels of representativeness while still respecting the sample frame. While MICS now regularly collect geocoded data, these datasets are not available to enable mapping of the data at finer level than the survey region level.
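To give a sense of what the displacement limits mean for analysis, the sketch below simulates a random offset of a cluster coordinate bounded by the 2 km (urban) and 5 km (rural) limits quoted above. This is not the DHS procedure itself, which is not described here: the uniform draw of distance and bearing, the flat-earth conversion to degrees, the example coordinate, and the function names are all assumptions made for illustration.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0
MAX_DISPLACEMENT_KM = {"urban": 2.0, "rural": 5.0}  # limits quoted in the text


def displace(lat, lon, residence, rng):
    """Move a point a random distance (up to the limit) in a random direction.

    Assumption: distance and bearing are drawn uniformly; the real DHS
    procedure may differ.
    """
    distance = rng.uniform(0.0, MAX_DISPLACEMENT_KM[residence])
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Small-distance approximation: convert km offsets to angular offsets.
    dlat = (distance * math.cos(bearing)) / EARTH_RADIUS_KM
    dlon = (distance * math.sin(bearing)) / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


rng = random.Random(42)
true_lat, true_lon = -6.80, 39.28  # hypothetical cluster location
new_lat, new_lon = displace(true_lat, true_lon, "urban", rng)
print(f"offset = {haversine_km(true_lat, true_lon, new_lat, new_lon):.2f} km")
```

Any covariate extracted at a displaced location (for example, a population or land cover value) therefore carries positional error of up to these distances, which is one reason the text cautions against treating cluster coordinates as exact.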
Each of the datasets described here and listed in Table
3 reports demographic information aggregated by named administrative units. Rarely, however, are spatial data on the boundaries of these units provided with the data. For spatial analysis, therefore, GIS boundary datasets must be found that match the reported administrative units. This is often a nontrivial task given regular boundary changes over time, alternative names, and mismatches with national boundaries. The initiation of open access repositories of standardized administrative boundary datasets (e.g., GADM:
http://www.gadm.org/), and documented histories of changes (e.g.,
http://www.statoids.com/) simplifies such operations. Moreover, DHS also shares the geography for their surveys on request.
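Part of the matching task described above is reconciling the unit names reported in a table with those in a boundary dataset. The sketch below illustrates one simple approach: normalizing spelling and consulting a table of alternative names. The unit identifiers, the alias table, and the function names are hypothetical, and a real workflow would also need to handle boundary changes over time, which this sketch does not.

```python
import unicodedata


def normalize(name):
    """Lowercase and strip accents and punctuation so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(kept.split())


# Hypothetical boundary-dataset units: identifier -> official name.
BOUNDARY_UNITS = {"TZ01": "Dar es Salaam", "TZ02": "Kilimanjaro", "TZ03": "Pwani"}
# Hypothetical table of alternative names -> official name.
ALIASES = {"Coast": "Pwani", "Dar-es-Salaam": "Dar es Salaam"}


def build_index(units, aliases):
    """Map every normalized official or alternative name to a unit identifier."""
    name_to_id = {name: unit_id for unit_id, name in units.items()}
    index = {normalize(name): unit_id for name, unit_id in name_to_id.items()}
    for alias, official in aliases.items():
        index[normalize(alias)] = name_to_id[official]
    return index


def match(reported_names, index):
    """Split reported names into those matched to a unit and those left over."""
    matched, unmatched = {}, []
    for name in reported_names:
        unit_id = index.get(normalize(name))
        if unit_id is None:
            unmatched.append(name)
        else:
            matched[name] = unit_id
    return matched, unmatched


index = build_index(BOUNDARY_UNITS, ALIASES)
matched, unmatched = match(["DAR ES SALAAM", "Coast", "Kilimanjáro", "Morogoro"], index)
print(matched)
print("needs manual review:", unmatched)
```

Names that fail to match are returned separately rather than guessed, since an incorrect join between attribute data and boundaries is harder to detect later than a missing one.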
Designing a spatial demographic database
The datasets described in the previous section are presently scattered across disparate sources (Table
3). To better fulfill the needs of disease modeling and cartographic-style derivations of health metrics, we propose the construction of a spatial database. The construction of this database would involve not only the housing of the disparate demographic datasets in a central open access location, but also their linkage to GIS datasets to enable the construction of spatial datasets representing a variety of epidemiologically relevant variables. The recent development of spatially enabling tools for database servers, such as PostGIS (
http://postgis.refractions.net), which provides support for geographic objects in object-relational databases, provides the ideal framework for construction of the database. The database would be hosted on a central server and accessed through an interactive web portal. Table
4 outlines the spatial datasets that would be included in the database to spatially reference the datasets in Table
3 and provide additional information to increase mapping capabilities. The framework spatial data are open-access GIS datasets that can be reused by multiple organizations for different purposes. Figure
3 outlines how these datasets link together in the relational spatial database. The key objectives of this database would be to:
1. Provide disaggregated spatially-referenced data on population sizes and characteristics such as age, sex, urban/rural location, and education
2. Facilitate data sharing between differing platforms and demographic mapping projects
3. Provide a high degree of transparency, documentation, and flexibility with respect to data sources and the treatment of uncertainty
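The relational linkage between boundary units and demographic attributes can be illustrated with a small in-memory example. The sketch below uses SQLite from the Python standard library purely as a stand-in for a PostGIS-enabled server, so it stores no geometries. The table names, column names, and all inserted values are hypothetical and are not part of the proposed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admin_unit (
    unit_id         TEXT PRIMARY KEY,
    name            TEXT NOT NULL,
    admin_level     INTEGER NOT NULL,
    boundary_source TEXT              -- e.g. the boundary dataset the unit came from
);
CREATE TABLE population_attribute (
    unit_id   TEXT NOT NULL REFERENCES admin_unit(unit_id),
    source    TEXT NOT NULL,          -- e.g. census or survey
    year      INTEGER NOT NULL,
    age_group TEXT NOT NULL,
    sex       TEXT NOT NULL,
    persons   INTEGER NOT NULL
);
""")

# Invented records for two hypothetical administrative units.
conn.executemany(
    "INSERT INTO admin_unit VALUES (?, ?, ?, ?)",
    [("U1", "Unit One", 2, "GADM"), ("U2", "Unit Two", 2, "GADM")],
)
conn.executemany(
    "INSERT INTO population_attribute VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("U1", "census", 2002, "0-4", "F", 510),
        ("U1", "census", 2002, "0-4", "M", 530),
        ("U1", "census", 2002, "5+", "F", 2900),
        ("U1", "census", 2002, "5+", "M", 2860),
        ("U2", "census", 2002, "0-4", "F", 200),
        ("U2", "census", 2002, "0-4", "M", 210),
        ("U2", "census", 2002, "5+", "F", 1800),
        ("U2", "census", 2002, "5+", "M", 1790),
    ],
)

# Join attributes to units and derive the under-5 count and total per unit.
rows = conn.execute("""
SELECT a.name,
       SUM(CASE WHEN p.age_group = '0-4' THEN p.persons ELSE 0 END) AS under5,
       SUM(p.persons) AS total
FROM admin_unit AS a
JOIN population_attribute AS p ON p.unit_id = a.unit_id
GROUP BY a.unit_id
ORDER BY a.unit_id
""").fetchall()

for name, under5, total in rows:
    print(f"{name}: under-5 share = {under5 / total:.3f}")
```

Keeping attributes in a long table keyed by unit, source, and year allows census, microdata, and survey estimates for the same unit to sit side by side, with the unit identifier providing the link to whichever boundary geometry is stored for it.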
Table 4. Components of relational spatial demographic database based on freely available datasets

| Feature | Example dataset | Example dataset source |
|---|---|---|
| National boundaries | SALB | http://www.unsalb.org |
| Administrative boundaries | GADM | http://www.gadm.org |
| DHS boundaries | MEASURE DHS | http://www.measuredhs.com |
| Coastlines | GBWD | http://dds.cr.usgs.gov/srtm/ |
| Water bodies | SWBD | http://dds.cr.usgs.gov/srtm/version2_1/SWBD/ |
| Land cover | GlobCover | http://www.ionia1.esrin.esa.int |
| Protected areas | WDPA | http://www.wdpa.org |
| Urban extents | MODIS | http://www.sage.wisc.edu/people/schneider/research/data.html |
| Settlement locations | NGA Geonames | http://www.earth-info.nga.mil/gns/html |
| Elevation and slope | SRTM | http://www.srtm.csi.cgiar.org |
| Infrastructure | gRoads | http://www.ciesin.columbia.edu/confluence/display/roads/Global+Roads+Data |
Figure 3.
Design of a relational spatial demographic database. Table
4 provides details on each layer.
The database is designed to encourage data sharing and is built in a manner that can be replicated across different nodes, with standardized, agreed-upon representations. For example, whilst the GRUMP and AfriPop project outputs take different forms and use differing modeling techniques, each is built upon standard representations (national boundaries, coastlines, administrative units) and aims to use the most detailed and contemporary population data available. A standardized database framework would encourage sharing of new and improved datasets between projects, benefitting a range of user groups. By building in differing levels of access control, new datasets can be reviewed and processed before release to a wider user community, and access to datasets that remain copyrighted can be controlled.
Documentation of all aspects of the data and database structure is key to ensuring ease of use, integration with epidemiological applications, and accessibility to a wide user community. This will center on database version control, the development of a data dictionary with full documentation of the datasets archived within it, and metadata accompanying the GIS-related datasets. The distinction between spatial metadata and a data dictionary must be made: they are quite different, and both are necessary. A data dictionary is needed to understand the shortened names and values of particular variables, for example, whereas the metadata speak to the spatial lineage and quality of the data. Some institutional or database history is sometimes warranted, for example, when data collection for a given variable has changed. Ensuring that this documentation is oriented toward the user through full explanation of assumptions made and quality issues with the data provided will be important. Moreover, the construction of a library of tools and techniques for analyzing and using the data within the database, using a forum and other mechanisms such as a code repository, will facilitate ease of use. Finally, the provision of quantitative measures of data quality and uncertainty will be of great importance. This could range from basic information on the timeliness, resolution, spatial uncertainty, and standard errors of datasets, which enables informed interpretation of resulting mapped products, to the more rigorous handling and measurement of uncertainties (see next section).
New methods, data, and future challenges
We have so far outlined datasets and a basic framework for compiling them to meet immediate disease modeling and health metric demographic needs. Several opportunities exist, however, to supplement and improve the scope and accuracy of the resources available. Here we outline these briefly, along with future challenges likely to be faced in implementation.
Urban populations
The health divide between urban and rural populations has been well documented
54
, as have the increasing levels of urbanization around the world
55
. While much has been done in the past 10 years to refine population data that delineate urban areas
19
56
57
, much less is known about populations within cities even within relatively coarse divisions (city center, suburban, peri-urban) or by slum dwellers and others (much also remains to be learned about population distribution within rural areas). Africa, in particular, will undergo rapid urbanization in the coming decades
55
, yet the data record to understand the demographic variation and health conditions in these cities, let alone changes in disease transmission that may result from urban change, is largely absent. If it is important to understand what is happening within urban areas, even the currently available cluster-level data of the DHS program (Figure
2(c)) are inadequate. While the DHS program and some other surveys have focused on collecting larger urban samples, there remains a need for large samples of urban populations to permit city-specific analyses. Additionally, the definition of urban is not standard across the DHS countries. There is reason to believe that even greater heterogeneity of health and socioeconomic characteristics exists within urban areas.
As with the demographic datasets discussed already, there are a variety of disparate spatial datasets on urban populations that could be brought together to get a better perspective. City population counts for cities with populations of 100,000 and above are produced by the UN Population Division World Urbanization Prospects
55
. Alternative sources of city and settlement population sizes include the City Population website (
http://www.citypopulation.de), while different projects are focused on mapping city extents, which these counts could be matched to (e.g.,
19
58
,
http://www.afripop.org). Significant gaps remain, however, such as time series of urban spatial extents, which would facilitate the development of ways to forecast changes in urban extents. Information on properly defined neighborhoods within cities, such as within-household and within-neighborhood population density, is also important, as are other contexts (e.g., schools).
Subnational spatial and temporal projections
Most low-income countries do not produce population projections, or forecasts, at a subnational level. Even the United Nations Population Division’s urban population projections
55
do not produce city-level population projections. Yet the demographic inputs for generating subnational estimates and projections are increasingly becoming available. Subnational projections are now being undertaken at least for very large countries (for example, India and China) and for small and large cities in the developing world
59
. For the latter, the forecasting method departs from the traditional cohort-component method and instead uses longitudinal data on cities and subnational estimates of demographic rates (urban fertility, mortality, and migration) derived from survey and census microdata in an econometric model of city growth
60
. These new approaches depend on harnessing old data in a spatial framework. The spatial framework allows disparate units to be linked in new ways, yielding new estimates and projections. Because these methods are largely probabilistic and derived from modeling exercises, the uncertainty associated with these estimates should also be characterized.
Quantifying uncertainty
The variety of ages, spatial resolutions, and sample sizes of input demographic data translates to great variations in accuracies and uncertainties of any output gridded demographic data products, and this is rarely acknowledged
1
. The most basic level of quantification and communication of this uncertainty to users involves the provision of information on input datasets and methods used in construction, such as is undertaken for GPW, GRUMP
19
, and AfriPop
21
(
http://www.afripop.org). Ideally, a more rigorous quantification of the uncertainty inherent in output gridded demographic datasets should be undertaken. The rigorous handling and propagation of uncertainty through a mapping process is now regularly undertaken in disease risk mapping within a Bayesian framework (e.g.,
2
16
17
), resulting in full posterior prediction distributions for each grid cell, providing flexibility in the derivation of differing uncertainty metrics and enabling the production of accompanying uncertainty maps. Undertaking an equivalent approach for deriving accompanying uncertainty maps for demographic datasets would require consideration of the input datasets and output requirements. For instance, this could take the form of estimating the uncertainty in gridded population distribution mapping from census data summarized by administrative unit. Here, two of the major sources of uncertainty are the age of the census data in relation to the output prediction year and the size of the administrative units relative to the population sizes within them and the output grid cell size. While uncertainty in temporal projection of census data is relatively well studied, the spatial aspect remains unexplored. For example, gridding a 50,000 km² administrative unit containing 1 million children under age 5 to 30 arc second resolution results in greater uncertainty about the population size and composition residing in each grid square than does the same resolution gridding of a 1,000 km² administrative unit containing 10,000 children under age 5. Based on simulating all possible permutations of grid square composition, bounded by the limits imposed by the original administrative unit size and vulnerable population totals, per-grid square measures of spatial uncertainty in composition that relate directly to the gridding methodology could be derived. Secondly, the availability of the DHS cluster locations opens up the possibility of estimating surfaces of variables with associated uncertainty, and this is discussed below.
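The contrast between the two administrative units in the example above can be made concrete with some simple arithmetic. The sketch below is a back-of-envelope illustration, not the permutation-based simulation proposed in the text: it assumes that a 30 arc second cell covers roughly 1 km² (approximately true near the equator), and it takes the feasible range of a cell's count to run from zero to the unit total, which is the only bound the census total alone imposes. The function name is hypothetical.

```python
# Assumption: a 30 arc second grid cell covers roughly 1 km^2 (near the equator).
CELL_AREA_KM2 = 1.0


def gridding_summary(unit_area_km2, under5_total, cell_area_km2=CELL_AREA_KM2):
    """Number of cells, mean count per cell, and the range a single cell could take
    when only the administrative unit total is known."""
    n_cells = round(unit_area_km2 / cell_area_km2)
    return {
        "n_cells": n_cells,
        "mean_per_cell": under5_total / n_cells,
        "feasible_range": (0, under5_total),
    }


large = gridding_summary(50_000, 1_000_000)  # first unit in the text's example
small = gridding_summary(1_000, 10_000)      # second unit in the text's example

for label, s in (("large unit", large), ("small unit", small)):
    low, high = s["feasible_range"]
    print(f"{label}: {s['n_cells']} cells, mean {s['mean_per_cell']:.0f} per cell, "
          f"cell count anywhere in [{low}, {high}]")
```

Under these assumptions the mean count per cell is of the same order in both units, but the range of values any one cell could take is one hundred times wider in the larger unit, which is the sense in which the census total constrains each grid square far less.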
Gridding household survey data
The availability of the GPS coordinates of DHS clusters (Figure
2(c)) has prompted several studies to utilize geostatistical approaches to derive continuous estimated surfaces of variables of interest. However, survey data are generally collected to be nationally representative and, as such, their sampling frames may not lend themselves to finely resolved geographic grids. The DHS program has been a leader in collecting and providing geocoded information of the survey clusters in addition to their standard data files in which data can be tabulated by first-order subnational regions as well as urban/rural classification. Early examples have demonstrated the value of such approaches for deriving continuous maps of variables of interest from geolocated DHS cluster data. For example, Gemperli et al.
61
investigated the effects of malaria endemicity and socio-economic risk factors on spatial patterns of infant mortality in Mali using a Bayesian hierarchical geostatistical model. Meanwhile, Soares Magalhaes and Clements
16
used a similar approach for anemia mapping. However, these approaches did not take account of the DHS sampling design or the random spatial displacement that cluster data undergo, and overcoming these issues should be a priority for future applications
62
. Apart from utilizing the cluster location, the subnational regions supply information that can be used with more finely resolved grids. One approach that uses spatial coverage of census aggregates combined with the attribute breadth in survey data is that of the Poverty Mapping efforts
63
. But this is not necessarily the only approach to consider.
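To show the basic idea of turning point-located cluster values into a continuous surface, the sketch below uses inverse distance weighting (IDW). This is a deliberately simpler, deterministic technique than the Bayesian geostatistical models cited above: it yields no uncertainty estimates and takes no account of the sampling design or of cluster displacement. The cluster coordinates and values are invented, planar coordinates are assumed, and the function names are hypothetical.

```python
import math


def idw(clusters, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (cx, cy, value) clusters."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for cx, cy, value in clusters:
        distance = math.hypot(x - cx, y - cy)
        if distance == 0.0:
            return value  # exactly at a cluster: return its observed value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def grid_surface(clusters, xs, ys, power=2.0):
    """Evaluate the IDW estimate on a regular grid (rows follow ys, columns follow xs)."""
    return [[idw(clusters, x, y, power) for x in xs] for y in ys]


# Hypothetical clusters: planar x, y in km and an observed proportion.
clusters = [(0.0, 0.0, 0.20), (2.0, 0.0, 0.40), (1.0, 3.0, 0.10)]
surface = grid_surface(clusters, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.5, 3.0])
for row in surface:
    print(" ".join(f"{v:.2f}" for v in row))
```

A model-based approach would replace the fixed distance weights with an estimated spatial covariance and would return a predictive distribution for each cell rather than a single value.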
Migration and mobility mapping
Very little is known about migration and mobility within countries, which may occur seasonally and periodically as well as permanently, except through case studies and qualitative place-specific analyses. Disease modeling and health metric derivation, as well as demographic analyses, increasingly require information on migration and mobility
64
65
66
. These data are the weak link of the demographic record – even the stock estimates of subnational migration have been largely ignored. Disease modelers often want to know about daily movements rather than decadal ones, but the decadal moves may be important to evaluate for changes to place-specific vulnerability of residents. Decadal moves should be examined more closely with existing survey and census microdata; characterizing more frequent moves will require data collection methods that depart from the standard demographic tool kit. Use of new data, such as spatial locations derived from GPS tracking devices
67
and cell phone usage
68
may show promise. However, to be useful, methods for using these data in combination with more standard demographic data will be necessary.
Conclusions
Growing trends in research and funding for disease mapping and spatial modeling to derive health metrics and guide strategies are increasing the need for spatial demographic data of similar scope and quality for use in estimating the sizes and characteristics of populations at risk. However, existing spatial demographic databases are often based on coarse-resolution and outdated input data and lack any consideration of population attribute mapping. These drawbacks are likely contributing to substantial uncertainties in disease modeling and health metric outputs
1
. Here we have shown that datasets to rectify this exist but remain scattered across multiple repositories and websites, requiring collation into a central open-access database to become more widely used and build on the strengths of each data type, overcoming temporal, spatial, and attribute limitations. We have put forward a basic database design here to achieve this and lay the foundations for undertaking detailed mapping of population attributes for providing spatial demographic data in disease studies.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AJT conceived and designed the manuscript. All authors contributed to writing the manuscript and have read and approved the final manuscript.
Acknowledgments
AJT is supported by a grant from the Bill & Melinda Gates Foundation (#49446), which also supports DP. AJT acknowledges funding support from the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. CL is supported by a grant from the Fondation Philippe Wiener Maurice Anspach. This paper is the result of a working group meeting held in May 2011 in New York, funded by the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. AD received financial support from the National Institute for Child Health and Human Development, Grant No. R24HD047879. AMN is supported by a Wellcome Trust Intermediate Research Fellowship (#095127). SA is supported in part by NASA under contract NNG08HZ11C for the continued operation of the Socioeconomic Data and Applications Center (SEDAC). CRB is supported by the US Agency for International Development-funded MEASURE Demographic and Health Survey.
References
1. Tatem A, Campiz N, Gething P, Snow R, Linard C. The effects of spatial population dataset choice on estimates of population at risk of disease. Popul Health Metrics 2011;9:4. doi:10.1186/1478-7954-9-4
2. Patil AP, Gething PW, Piel FB, Hay SI. Bayesian geostatistics in health cartography: the perspective of malaria. Trends Parasitol 2011;27:246–253. doi:10.1016/j.pt.2011.01.003. PMCID: 3109552. PMID: 21420361
3. Riley S. Large-scale spatial-transmission models of infectious disease. Science 2007;316:1298–1301. doi:10.1126/science.1134695. PMID: 17540894
4. Molesworth AM, Thomson MC, Connor SJ, Cresswell MP, Morse AP, Shears P, Hart CA, Cuevas LE. Where is the meningitis belt? Defining an area at risk of epidemic meningitis in Africa. Trans R Soc Trop Med Hyg 2002;96:242–249. doi:10.1016/S0035-9203(02)90089-1. PMID: 12174770
5. Hay SI, Guerra CA, Gething PW, Patil AP, Tatem AJ, Noor AM, Kabaria CW, Manh BH, Elyazar IRF, Brooker SJ, et al. World malaria map: Plasmodium falciparum endemicity in 2007. PLoS Med 2009;6:e1000048. PMCID: 2659708. PMID: 19323591
6. Vezzulli L, Pruzzo C, Huq A, Colwell RR. Environmental reservoirs of Vibrio cholerae and their role in cholera. Environ Microbiol Rep 2010;2:27–33. doi:10.1111/j.1758-2229.2009.00128.x
7. Jones KE, Patel NG, Levy MA, Storeyguard A, Balk D, Gittleman JL, Daszak P. Global trends in emerging infectious diseases. Nature 2008;451:990–994. doi:10.1038/nature06536. PMID: 18288193
8. Viboud C, Bjornstad ON, Smith DL, Simonsen L, Miller MA, Grenfell BT. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science 2006;312:447–451. doi:10.1126/science.1125237. PMID: 16574822
9. Smith DL, Guerra CA, Snow RW, Hay SI. Standardizing estimates of the Plasmodium falciparum parasite rate. Malar J 2007;6:131. doi:10.1186/1475-2875-6-131. PMCID: 2072953. PMID: 17894879
10. Egger JR, Coleman PG. Age and clinical dengue illness. Emerg Infect Dis 2007;13:924–925. doi:10.3201/eid1306.070008. PMCID: 2792851. PMID: 17553238
11. Miller E, Cradock-Watson JE, Pollock TM. Consequences of confirmed maternal rubella at successive stages of pregnancy. Lancet 1982;2:781–784. PMID: 6126663
12. Pitzer VE, Viboud C, Simonsen L, Steiner C, Panozzo CA, Alonso WJ, Miller MA, Glass RI, Glasser JW, Parashar UD, Grenfell BT. Demographic variability, vaccination, and the spatiotemporal dynamics of rotavirus epidemics. Science 2009;325:290–294. doi:10.1126/science.1172330. PMCID: 3010406. PMID: 19608910
13. Talavera A, Perez EM. Is cholera disease associated with poverty? J Infect Dev Ctries 2009;3:408–411. PMID: 19762952
14. Allison SP. Malnutrition, disease and outcome. Nutrition 2000;16:590–593. doi:10.1016/S0899-9007(00)00368-3. PMID: 10906565
15. Gething PW, Kirui VC, Alegana VA, Okiro EA, Noor AM, Snow RW. Estimating the number of paediatric fevers associated with malaria infection presenting to Africa's public health sector in 2007. PLoS Med 2010;7:e1000301. doi:10.1371/journal.pmed.1000301. PMCID: 2897768. PMID: 20625548
16. Soares Magalhaes RJ, Clements ACA. Mapping the risk of anaemia in preschool-age children: the contribution of malnutrition, malaria and helminth infections in West Africa. PLoS Med 2011;8:e1000438. doi:10.1371/journal.pmed.1000438. PMCID: 3110251. PMID: 21687688
17. Schur N, Hurlimann E, Garba A, Traore MS, Ndir O, Ratard RC, Tchuente LT, Kristensen TK, Utzinger J, Vounatsou P. Geostatistical model-based estimates of schistosomiasis prevalence among individuals aged <20 years in West Africa. PLoS Negl Trop Dis 2011;5:e1194. doi:10.1371/journal.pntd.0001194. PMCID: 3114755. PMID: 21695107
18. Deichmann U, Balk D, Yetman G. Transforming population data for interdisciplinary usages: from census to grid. 2001. Documentation for GPW Version 2 available only at http://sedac.ciesin.columbia.edu/plue/gpw/GPW documentation.pdf
19. Balk DL, Deichmann U, Yetman G, Pozzi F, Hay SI, Nelson A. Determining global population distribution: methods, applications and data. Adv Parasitol 2006;62:119–156. PMCID: 3154651. PMID: 16647969
20. Dobson JE, Bright EA, Coleman PR, Durfee RC, Worley BA. LandScan: a global population database for estimating populations at risk. Photogramm Eng Remote Sens 2000;66:849–857
21. Linard C, Gilbert M, Snow RW, Noor AM, Tatem AJ. Population distribution, settlement patterns and accessibility across Africa in 2010. PLoS One 2012;7:e31743. doi:10.1371/journal.pone.0031743. PMCID: 3283664. PMID: 22363717
22. United Nations Population Division. World population prospects, 2010 revision. New York: United Nations; 2010
23. Gething PW, Noor AM, Gikandi PW, Ogara EAA, Hay SI, Nixon MS, Snow RW, Atkinson PM. Improving imperfect data from health management information systems in Africa using space-time geostatistics. PLoS Med 2006;3:e271. doi:10.1371/journal.pmed.0030271. PMCID: 1470663. PMID: 16719557
24. Health Metrics Network. Statistics save lives: Strengthening country health information systems. Geneva: WHO Health Metrics Network; 2005
25. Murray CJL, Lopez AD, Wibulpolprasert S. Monitoring global health: Time for new solutions. Br Med J 2004;329:1096–1100. doi:10.1136/bmj.329.7474.1096
26. Kubiak RJ, Arinaminpathy N, McLean AR. Insights into the evolution and emergence of a novel infectious disease. PLoS Comput Biol 2010;6:e1000947. doi:10.1371/journal.pcbi.1000947. PMCID: 2947978. PMID: 20941384
27. Brooker S, Hay SI, Bundy DA. Tools from ecology: useful for evaluating infection risk models? Trends Parasitol 2002;18:70–74. doi:10.1016/S1471-4922(01)02223-1. PMCID: 3166848. PMID: 11832297
28. Ferguson NM, Cummings DAT, Cauchemez S, Fraser C, Riley S, Meeyai A, Iamsirithaworn S, Burke DS. Strategies for containing an emerging influenza pandemic in Southeast Asia. Nature 2005;437:209–214. doi:10.1038/nature04017. PMID: 16079797
29. Hay SI, Okiro EA, Gething PW, Patil AP, Tatem AJ, Guerra CA, Snow RW. Estimating the global clinical burden of Plasmodium falciparum malaria in 2007. PLoS Med 2010;7:e100029
30. World Health Organization. The World Malaria Report. Geneva: World Health Organization; 2008
31. Estimating trends in the burden of malaria at country 
levelCibulskisREBellDChristophelEMHiiJDelacolletteCBakyaitaNAregawiMWAmJTrop Med Hyg200777133137Large-scale spatial population databases in infectious disease researchLinardCTatemAJInt J Heal Geogr201211710.1186/1476-072X-11-7JohanssonEWNewbyHRenshawMWardlawT Malaria and children. progress in intervention coverage New York: United Nations Children's Fund (UNICEF)/The Roll Back Malaria Partnership (RBM)2007Geographical patterns and predictors of malaria risk in Zambia: Bayesian geostatistical modelling of the 2006 Zambia national malaria indicator survey (ZMIS)RiedelNVounatsouPMillerJMGosoniuLChizema-KaweshaEMukonkaVSteketeeRWMalar J201093710.1186/1475-2875-9-37284558920122148The international limits and population at risk of Plasmodium vivax transmission in 2009GuerraCAHowesREPatilAPGethingPWVan BoeckelTPTemperleyWHKabariaCWTatemAJManhBHElyazarIRFPLoS Negl Trop Dis20104e77410.1371/journal.pntd.0000774291475320689816The limits and intensity of Plasmodium falciparum transmission: implications for malaria control and elimination worldwideGuerraCAGikandiPWTatemAJNoorAMSmithDLHaySISnowRWPLoS Med20085e3810.1371/journal.pmed.0050038225360218303939The potential of rapid screening methods for Schistosoma mansoni in western KenyaBrookerSMiguelEWaswaPNamunyuRMoulinSGuyattHBundyDAnn Trop Med Parasitol20019534335110.1080/0003498012006343711454244Use of remote sensing and a geographical information system in a national helminth control programme in ChadBrookerSBeasleyMNdinaromtanMMadjiouroumEMBaboguelMDjenguinabeEHaySIBundyDABull World Health Organ200280783789256766012471398Epidemiology and geography of Schistosoma mansoni in Uganda: implications for planning controlKabatereineNBrookerSTukahebwaEKazibweFOnapaATrop Med Int Health2004937210.1046/j.1365-3156.2003.01176.x14996367Use of Bayesian geostatistical prediction to estimate local variations in Schistosoma haematobium infection in western 
AfricaClementsACAFirthSDembeleRGarbaAToureSSackoMLandoureABosque-OlivaEBarnettAGBrookerSFenwickABull World Health Organ20098792192910.2471/BLT.08.058933278936120454483The co-distribution of Plasmodium falciparum and hookworm among African schoolchildrenBrookerSJClementsACAHotezPJHaySITatemAJBundyDAPSnowRWMalar J200659910.1186/1475-2875-5-99163572617083720Spatial modelling of soil-transmitted helminth infections in Kenya: a disease control planning toolPullanRLGethingPWSmithJLMwandawiroCSSturrockHJGitongaCWHaySIBrookerSPLoS Negl Trop Dis20115e95810.1371/journal.pntd.0000958303567121347451Global epidemiology, ecology and control of soil-transmitted helminth infectionsBrookerSClementsACBundyDAAdv Parasitol200662221261197625316647972Hookworm-related anaemia among pregnant women: a systematic reviewBrookerSHotezPJBundyDAPLoS Negl Trop Dis20082e29110.1371/journal.pntd.0000291255348118820740Quantifying the number of pregnancies at risk of malaria in 2007: a demographic studyDellicourSTatemAJGuerraCASnowRWter KuileFOPLoS Med20107e100022110.1371/journal.pmed.1000221281115020126256Coverage of malaria protection in pregnant women in sub-Saharan Africa: a synthesis and analysis of national survey datavan EijkAHillJAleganaVKiruiVGethingPter KuileFSnowRLancet Infect Dis20111119020710.1016/S1473-3099(10)70295-4311993221273130The spatial distribution of leprosy cases during 15 years of a leprosy control program in Bangladesh: an observational studyFischerEPahanDChowdhurySRichardusJBMC Infect Dis2008812610.1186/1471-2334-8-126256493418811971HIV and AIDS in Africa: a geographic analysis at multiple spatial scalesKalipeniEZuluLCGeoJournal201010.1007/s10708-010-9358-6Vaccination strategies for epidemic cholera in Haiti with implications for the developing worldChaoDLHalloranMELonginiIMsuf JrProc Natl Acad Sci U S A20111087081708510.1073/pnas.1102149108308414321482756Strategies for mitigating an influenza 
pandemicFergusonNMCummingsDATFraserCCajkaJCCooleyPCBurkeDSNature200644244845210.1038/nature0479516642006Influenza epidemic spread simulation for Poland a large scale, individual based model studyRakowskiFGruzielMBieniasz-KrywiecLRadomskiJPPhysica A: Statistical Mechanics and its Applications20103893149316510.1016/j.physa.2010.04.029Modeling and analysis of global epidemiology of avian influenzaRaoDMChernyakhovskyARaoVEnviron Model Softw20092412413410.1016/j.envsoft.2008.06.011Multiscale mobility networks and the spatial spreading of infectious diseasesBalcanDColizzaVGoncalvesBHuHRamascoJJVespignaniAProc Natl Acad Sci2009106214842148910.1073/pnas.0906910106279331320018697Health and urban livingDyeCScience200831976676910.1126/science.115019818258905United Nations Population DivisionWorld urbanization prospects, 2009 revisionNew York: United Nations2009Measuring urbanization pattern and extent for malaria research: a review of remote sensing approachesTatemAJHaySIJ Urban Health20048136337610.1093/jurban/jth124317384115273262Assessing the accuracy of satellite derived global and national urban maps in KenyaTatemAJNoorAMHaySIRemote Sens Environ200596879710.1016/j.rse.2005.02.001335006822581985Mapping global urban areas using MODIS 500-m data: New methods and datasets based on 'urban ecoregions'SchneiderAFriedlMAPotereDRemote Sens Environ20101141733174610.1016/j.rse.2010.03.003 Mapping urban settlements and the risks of climate change in Africa, Asia and South America BalkDMontgomeryMMcGranahanGKimDMaraVToddMBuettnerTDorelienAPopulation dynamics and climate changeNew York: UNPDeditor Martine G, Guzman J-M, McGranahan G, Schensul D, Tacoli C200988103KimD Econometric modeling of city population growth in developing countries New York: State University of2011Spatial patterns of infant mortality in Mali: the effect of malaria endemicityGemperliAVounatsouPKleinschmidtIBagayokoMLengelerCSmithTAm J Epidemiol2004159647210.1093/aje/kwh00114693661Spatial modeling of geographic 
inequalities in child mortality across NepalChinBMontanaLBasaganaXHealth Place20111792993610.1016/j.healthplace.2011.04.00621555234Micro-level estimation of poverty and inequalityElbersCLanjouwJLanjouwPEconometrica20037135538610.1111/1468-0262.00399Population movements and tropical healthProtheroRMGlobal Change and Human Health20023203210.1023/A:1019636208598The role of human movement in the transmission of vector-borne pathogensStoddardSMorrisonAVazquez-ProkopecGPaz-SoldanVKochelTKitronUElderJScottTPLoS Negl Trop Dis20103e481International population movements and regional Plasmodium falciparum malaria elimination strategiesTatemAJSmithDLProc Natl Acad Sci2010107122221222710.1073/pnas.1002971107290144620566870Assessing and Maximizing the Acceptability of GPS Device Use for Studying the Role of Human Movement in Dengue Virus Transmission in Iquitos, PeruPaz-SoldanVStoddardSVazquez-ProkopecGMorrisonAElderJKitronUKochelTScottTAmJTrop Med Hyg201082723730The use of mobile phone data for the estimation of the travel patterns and imported Plasmodium falciparum rates among Zanzibar residentsTatemAQiuYSmithDSabotOAliAMoonenBMalar J2009828710.1186/1475-2875-8-287280011820003266