Citation
High-resolution reconstruction of the United States human population distribution, 1790 to 2010

Material Information

Title:
High-resolution reconstruction of the United States human population distribution, 1790 to 2010
Series Title:
Scientific Data
Creator:
Jawitz, James
Publisher:
Nature Publishing Group
Publication Date:
Language:
English
Physical Description:
Journal Article

Notes

Abstract:
Where do people live, and how has this changed over timescales of centuries? High-resolution spatial information on historical human population distribution is of great significance to understand human-environment interactions and their temporal dynamics. However, the complex relationship between population distribution and various influencing factors coupled with limited data availability make it a challenge to reconstruct human population distribution over timescales of centuries. This study generated 1-km decadal population maps for the conterminous US from 1790 to 2010 using parsimonious models based on natural suitability, socioeconomic desirability, and inhabitability. Five models of increasing complexity were evaluated. The models were validated with census tract and county subdivision population data in 2000 and were applied to generate five sets of 22 historical population maps from 1790–2010. Separating urban and rural areas and excluding non-inhabitable areas were the most important factors for improving the overall accuracy. The generated gridded population datasets and the production and validation methods are described here.
Acquisition:
Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by James Jawitz.

Record Information

Source Institution:
University of Florida Institutional Repository
Holding Location:
University of Florida
Rights Management:
All rights reserved by the submitter.

Downloads

This item is only available as the following downloads:


Full Text

PAGE 1

Data Descriptor: High-resolution reconstruction of the United States human population distribution, 1790 to 2010

Yu Fang (1) & James W. Jawitz (1)

Where do people live, and how has this changed over timescales of centuries? High-resolution spatial information on historical human population distribution is of great significance to understand human-environment interactions and their temporal dynamics. However, the complex relationship between population distribution and various influencing factors coupled with limited data availability make it a challenge to reconstruct human population distribution over timescales of centuries. This study generated 1-km decadal population maps for the conterminous US from 1790 to 2010 using parsimonious models based on natural suitability, socioeconomic desirability, and inhabitability. Five models of increasing complexity were evaluated. The models were validated with census tract and county subdivision population data in 2000 and were applied to generate five sets of 22 historical population maps from 1790–2010. Separating urban and rural areas and excluding non-inhabitable areas were the most important factors for improving the overall accuracy. The generated gridded population datasets and the production and validation methods are described here.

Design Type(s): data integration objective; source-based data analysis objective
Measurement Type(s): Population Distribution Characteristic
Technology Type(s): computational modeling technique
Factor Type(s): temporal_interval
Sample Characteristic(s): Homo sapiens; contiguous United States of America; State of California; State of Maryland; population; water body; protected area; elevation; dense settlement biome

(1) Soil and Water Sciences Department, University of Florida, Gainesville, Florida 32611, USA. Correspondence and requests for materials should be addressed to J.W.J. (email: jawitz@ufl.edu).

OPEN
Received: 29 September 2017
Accepted: 28 February 2018
Published: 24 April 2018

www.nature.com/scientificdata
SCIENTIFIC DATA | 5:180067 | DOI: 10.1038/sdata.2018.67

PAGE 2

Background & Summary

Human actions have caused substantial alterations to the Earth, transforming the landscape, affecting ecosystem patterns and processes, driving biodiversity loss, altering global hydrological and biogeochemical cycles, amplifying resource exploitation and environmental deterioration, and contributing to climate change [1–3]. Such ecological and societal consequences vary across space. Human population density is considered to be a useful indicator of the type and intensity of the human-environment interaction, with higher population density leading to higher levels of impacts [4–7]. Therefore, it is vital to create reliable spatially explicit, high-resolution estimates of the human population distribution to advance our understanding of coupled human and natural systems, to provide support to policy decision-making, and to achieve ecological and socioeconomic sustainability [8].

Census data have been routinely collected and applied; however, such data are ascribed to defined administrative units, leading to abrupt changes in population at the administrative boundary, and masking of spatial heterogeneity within administrative units [9,10]. For the conterminous US, census data are available at the county level from 1790 to 2010 (ref. 11). These data could provide strong support for improved understanding of human-environment interactions, if more refinement was possible within the administrative units. Dasymetric mapping is an areal interpolation to disaggregate data from a set of areal units into finer units using ancillary data [12]. The modern revolution in geospatial data availability greatly facilitates dasymetric mapping and the creation of more accurate data on population distribution [13], including the Gridded Population of the World (GPW) [14], the Global Rural Urban Mapping Project (GRUMP) [15], LandScan [16], and WorldPop [17].
A recent challenge has been to map the human population distribution in low-income nations, where detailed updated census data or high-resolution geospatial data are lacking [18–21]. Similar challenges from insufficient data availability occur in estimating historical population distributions. The high-resolution mapping efforts described above were all developed based on modern census data since 1990, and are of limited utility for long-term (e.g., multi-decadal) dynamic analyses [22,23]. A notable exception is the History Database of the Global Environment (HYDE) [24]. While the temporal range from HYDE is vast (from 1000 BC to 2005 AD), the 10-km spatial resolution is relatively low, constrained by input census data.

Major influencing factors that have been considered in human population mapping include land cover, night lights, topography, urban areas, and roads [25–27]. Land use/land cover data, derived from remote sensing images, best reflect population density and have been suggested as an important source for human population mapping [19,28]. However, such imagery data only became available beginning with the launch of the first land satellite, Landsat-1, in 1972. Several global land cover datasets [29–33] have used census data as an input or a proxy for human activities to simulate the spatial map of anthropogenically managed land (e.g. cropland, pasture). However, it would be circular and infeasible to use such derived land cover data to model population distribution.

The goal of this study is to generate spatially explicit human population distribution maps for the conterminous US that could be used to advance studies of anthropogenic effects on the environment, and to provide support for policy decision-making. Recognizing that the spatial resolution of census administrative boundaries is the principal factor affecting map accuracy [21], we use county-level census data, as this is the highest resolution data consistently available from 1790 to 2010. Despite the scarcity of reliable historical land cover maps, the separation of urban and rural settlement areas could significantly improve the accuracy of population distribution mapping [19], since 80.7% of the US population lived in urban areas, which covered only about 3.1% of the total land area, according to the 2010 US census [34].
Power-law scaling relationships [35] between urban area and population were applied to estimate historical urban areas. Additional data including elevation, water bodies, and protected areas were used to allocate population to urban and rural areas within counties. Five models of increasing complexity, from one (M1) to eight variables (M5), were developed to model population distribution. The models were validated with measured data from 2000 based on comparison with census data at the tract and county subdivision levels, and applied to generate five sets of 22 historical decadal population maps from 1790–2010 (Historical population dataset for the conterminous US, Data Citation 1). The model-generated urban extents were also assessed in two fast-growing regions in which historical data were available [36,37]: San Francisco/Sacramento and Baltimore/Washington DC.

Methods

Data collection

Census data at four levels of spatial resolution, mainly obtained from the National Historical Geographic Information System (NHGIS) (https://www.nhgis.org/) [38], were used: total and urban population at county level (1790–2010), county subdivision population (1980–2010), census tract population (1990–2010), and population for urban areas (2000 and 2010). Note that American Indians were not fully included in the census prior to 1900 (ref. 39). Our results thus underestimated populations where American Indians settled. County, county subdivision, census tract, and urban area boundary shapefiles were also obtained from NHGIS. Census division boundaries and population for selected urban areas in earlier decades were obtained from the US Census Bureau (Fig. 1).

Water body data (medium resolution, at 1:100,000-scale), including lake/pond features, swamps/marshes, reservoirs, playas, estuaries, and ice mass, were completed in 2001 and derived from the

PAGE 3

National Hydrography Dataset (http://nhd.usgs.gov/). Protected areas data (version 1.3), designated to preserve biological diversity and other natural, recreation and cultural uses, were released in 2012 and obtained from National Gap Analysis Program (http://gapanalysis.usgs.gov/padus/). Digital elevation model data (Fig. 2) were released in 2013 and derived from NASA Shuttle Radar Topography Mission Version 3.0 (http://www2.jpl.nasa.gov/srtm/), with a resolution of 90 m. Historical data on urban spatial extents for San Francisco/Sacramento and Baltimore/Washington DC region [36,37] were from the USGS Land Cover Institute (LCI).

Level of spatial units

The above data were used to compute population for each inhabited pixel ($k$) in each decade ($t$) from 1790 to 2010 (excluding 1960). The data were variously applied over the following spatial units: urban area ($\upsilon$), census tract ($c_1$), county subdivision ($c_2$), county ($i$), census division ($\delta$), and Region ($\rho$). There were 9 census divisions ($\delta$: D1–D9) and 7 Regions ($\rho$: R1–R7) with constant boundaries over time, while the number of other spatial units varied with the expansion of human settlements and the modification of administrative boundaries. For example, the number of counties increased from 292 in 1790 to 3109 in 2010. Also, in 2010, there were 7,754,146 pixels (1 km²), 72,271 census tracts, 35,532 county subdivisions, and 3,535 urban areas in the conterminous US.

Population count determination

Decennial census data on total population ($P_T$) and aggregated urban population ($P_U$) for each urban area ($\upsilon$) determined the remaining rural population ($P_R$) by difference, for each county ($i$) from 1790 to 2010 (excluding 1960, for which digital urban population data are missing):

$$P_R = P_T - \sum_{\upsilon} P_{U,\upsilon} \tag{1}$$

The number of counties containing urban population increased from 19 in 1790 to 2424 in 2010. For counties with more than one urban area, we obtained the population for individual urban areas from the US Census Bureau. Note a population threshold of 2500 was used for identifying urban areas (https://www.census.gov/geo/reference/ua/urban-rural-2010.html).

Areal extent delineation

The 2000 census includes full coverage of areal extent ($A_U$) for all of the 3610 urban areas. However, areal extents are incomplete for 1990 and largely unknown in prior decades. The 1990 NHGIS data cover only 396 large urban areas with population larger than 50,000.

Figure 1. The major spatial input data for the population downscaling models, and associated administrative unit boundaries (census divisions D1 to D9, state, county, and urban areas). Un-inhabitable areas include designated protected areas, water bodies, and highlands with elevation, z, larger than 3500 m.

Thus, the 2000 data were used as a baseline for

PAGE 4

historical backward projections of urban areal extents. The areal extent of each urban area ($\upsilon$) in a census division ($\delta$) prior to 2000 was estimated on the basis of a power law scaling relationship with its population $P_{U,\upsilon}$ [35]:

$$A_{U,\upsilon} = \alpha_\delta \, P_{U,\upsilon}^{\beta_\delta}, \quad \text{for } \upsilon \in \delta \tag{2}$$

where $\alpha_\delta$ and $\beta_\delta$ are the proportionality coefficient and scaling factor for census division $\delta$, estimated by fitting log($A_U$) and log($P_U$) in 2000 according to equation (2). Strong linear relationships between log($A_U$) and log($P_U$) have been identified based on modern satellite imagery and census data [40–43], and also observed in ancient settlements, for example, the Pre-Hispanic settlements in the Basin of Mexico [44]. Our hypothesis was that this scaling was stable and could be used to reconstruct historical settlement patterns. Empirical studies [45] have revealed spatial variation of $\beta$ values: 0.375 for England and Wales, 0.914 for Japan, 1.38 for China, and larger than 2/3 for US urbanized areas (population larger than 50,000). Spatial variation of $\beta$ was incorporated here by developing such relationships for each census division, $\delta$.

Figure 2. Topo-demography relationships. Topography and boundaries for regions R0 to R7 for the conterminous US (a), and relationships between log population density and county mean elevation in 2000 for the seven regions (b). Region 4 is divided into three sub-regions by elevation. Solid lines are linear regressions with slope m.

However, the lack of corresponding $A_U$ and $P_U$ data for urban areas before 2000 precluded further

PAGE 5

evaluation of the temporal stationarity of $\beta$. Thus, the scaling relationships were considered stable for simplicity, with constant parameter values over time. The extent of rural areas ($A_R$) within each county ($i$) was determined as the difference between the county total area ($A_T$) and the aggregated extents of all urban areas in the county:

$$A_R = A_T - \sum_{\upsilon} A_{U,\upsilon} \tag{3}$$

Two simplifying assumptions were further made to delineate the extent of each urban area over time: 1) urbanization was monotonic, with historically established urban areas still extant in 2000; 2) urban areas developed outward from their center, so regions farthest from the center would urbanize last. The urban extents for each urban area in each decade were delineated by concentrically shrinking inward towards their center moving backward with time from 2000, with the remainder considered as rural areas.

Influence coefficient calculation

Inhabitability, topographic suitability, and socio-economic desirability were the three major factors considered to influence human population distribution. The calculation method for the influence coefficient ($w$) of each factor for each pixel ($k$) is explained below. Note the influence coefficients for inhabitability ($w^0_k$) and topographic suitability ($w^1_k$) were steady over time, while the influence coefficient for socio-economic desirability ($w^2_k$) changed with the urbanization process.
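The backward projection of urban extent in equations (2) and (3) can be sketched as a log-log least-squares fit followed by a subtraction. The snippet below is a minimal illustration, not the authors' implementation (which used R and ArcGIS); the function names and the synthetic year-2000 values are assumptions made only for this example.

```python
import numpy as np


def fit_power_law(area_km2, population):
    """Fit A = alpha * P**beta by least squares on log-transformed values.

    Returns (alpha, beta). Assumes strictly positive inputs.
    """
    beta, log_alpha = np.polyfit(np.log(population), np.log(area_km2), 1)
    return float(np.exp(log_alpha)), float(beta)


def project_urban_area(population, alpha, beta):
    """Equation (2): areal extent implied by a (historical) urban population."""
    return alpha * np.asarray(population, dtype=float) ** beta


def rural_area(county_area_km2, urban_areas_km2):
    """Equation (3): county area minus the summed extents of its urban areas."""
    return county_area_km2 - float(np.sum(urban_areas_km2))


# Synthetic year-2000 urban areas for one hypothetical census division.
pop_2000 = np.array([5_000.0, 20_000.0, 80_000.0, 320_000.0])
area_2000 = 0.01 * pop_2000 ** 0.9  # exact power law, illustration only

alpha, beta = fit_power_law(area_2000, pop_2000)

# Back-project the extent of an urban area that had 10,000 residents
# in an earlier decade, holding alpha and beta constant over time.
historical_area = project_urban_area(10_000.0, alpha, beta)
remaining_rural = rural_area(1_000.0, [historical_area])
```

Holding the fitted parameters constant over time mirrors the stationarity assumption stated in the text; the fit itself would be repeated once per census division.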
Inhabitability. Inhabitable zones were defined here to exclude protected areas, water bodies larger than 1 km², and areas with elevation higher than 3500 m (Fig. 1). Protected areas were regarded as non-inhabitable if their status was defined as “Designated”, indicating legal or administrative decree, and if their public access was classified as “Restricted” or “Closed”. Note that protected areas are not constant during the time period of the study, and some populations were displaced in the creation of protected areas in the US. But the total number of such displaced persons is relatively low such that we expect the impact on our final results to be minor. Low-population census tracts were also treated as non-inhabitable areas. Preliminary analyses revealed that areal weighting resulted in overestimates for low population tracts. This simplification was found to reduce the overall model errors (see Table 1 for the specific cutoff tract population for each division). Inhabitable areas were assumed to be steady over time. The influence coefficient of inhabitability, $w^0_k$, for pixel $k$ was set as zero for non-inhabitable areas and one for inhabitable areas.
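The binary inhabitability coefficient described above can be illustrated as a raster mask. This is a sketch under assumptions: the inputs are taken to be already rasterized to the 1-km grid as boolean layers (how water bodies larger than 1 km² are rasterized is not specified in the text), and the function and argument names are hypothetical.

```python
import numpy as np


def inhabitability(protected, water, elevation_m, low_pop_tract,
                   max_elevation_m=3500.0):
    """Return w0: 0.0 for non-inhabitable pixels, 1.0 otherwise.

    protected, water and low_pop_tract are boolean rasters that are
    assumed to have been prepared beforehand; elevation_m is in metres.
    """
    non_inhabitable = (
        np.asarray(protected, dtype=bool)
        | np.asarray(water, dtype=bool)
        | (np.asarray(elevation_m, dtype=float) > max_elevation_m)
        | np.asarray(low_pop_tract, dtype=bool)
    )
    return np.where(non_inhabitable, 0.0, 1.0)


# Four illustrative pixels: ordinary, protected, high elevation, open water.
w0 = inhabitability(
    protected=[False, True, False, False],
    water=[False, False, False, True],
    elevation_m=[200.0, 150.0, 3600.0, 10.0],
    low_pop_tract=[False, False, False, False],
)
```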
Natural suitability. Elevation has a relatively larger variation and better resolution within counties compared to other natural factors, including temperature and precipitation, and plays an important role in influencing population distribution [46]. We aggregated this detailed elevation data to county level instead of census tract and county subdivision because the latter data are only recently available and are thus used here only for validation purposes. Although such aggregation masks variation within counties, especially for the western mountainous regions, we found that a valid relationship developed from a perspective of regionalization could generally reflect the topographic influence. We tested the relationship between county mean elevation and population density using linear, log, and logistic functions using the R statistical software. We chose elevation over slope to represent topographic suitability because we found that the former had a better linear relationship (larger R²) with population density

$$\ln(PD_i) = m_\rho z_i + b_\rho, \quad \text{for } i \in \rho \tag{4}$$

where $i$ and $\rho$ denote county and Region illustrated in Fig. 2, $PD_i$ and $z_i$ are population density and mean elevation for county $i$, and $b_\rho$ and $m_\rho$ are fitting parameters for Region $\rho$.

Table 1. Determination of cutoff population below which low-population census tracts were treated as non-inhabitable areas for each division.

                  MARE, when cutoff population (persons) =      Cutoff population
Division          1000    1500    2000    2500    3000          (persons)
Conterminous US   1.19    1.10    1.07    1.15    1.30
D1                0.59    0.60    0.61    0.64    0.70          1000
D2                4.21    3.97    3.71    3.97    4.55          2000
D3                0.94    0.64    0.53    0.58    0.67          2000
D4                0.44    0.44    0.46    0.54    0.66          1500
D5                0.95    0.72    0.76    0.85    0.99          1500
D6                0.41    0.41    0.45    0.49    0.57          1000
D7                0.54    0.53    0.55    0.60    0.66          1500
D8                0.43    0.44    0.44    0.45    0.49          1500
D9                1.16    1.51    1.51    1.55    1.63          1000

PAGE 6

We calculated the influence coefficient of elevation, $w^1_k$, for pixel $k$ from the following relation

$$w^1_k = e^{m_\rho z_k}, \quad \text{for } k \in \rho \tag{5}$$

where $z_k$ is elevation for pixel $k$. Only inhabitable areas were considered when calculating population density. To minimize the number of model parameters, geographically adjacent census divisions and states with similar $m$ values from equation (4) were combined into seven regions ($\rho$: R1–R7, shown in Fig. 2), that upon integration still retained significant relationships between $z_i$ and $\ln(PD_i)$. Two regions correspond to census divisions (R1 and R2), one includes only one state (R5, Florida), two comprise multiple contiguous states (R3 and R4), and two are composed of one census division plus multiple neighboring states (R6 and R7). No significant relationship was found for Colorado (R0, in Fig. 2), and thus population was not weighted by topographic suitability there. Significant negative linear relationships between $z_i$ and $\ln(PD_i)$ were found for all regions except 100 m < z < 300 m in R4 (Fig. 2). The presence of the city of Atlanta at approximately 300 m disrupted the general pattern that was found in the rest of the hot southeastern coastal plain. The slope, $m$, varied among regions, but all relationships were significant with p-value < 0.001.

Socioeconomic desirability. We considered socioeconomic desirability for urban ($U$) and rural ($R$) areas separately. Urban population density decreases with increasing distance from the urban center. This trend has been described simply with an exponential decay model [47], but with growing attention on fractal cities, an inverse power function has been used more recently [48–50]. Here, we used the inverse power function to describe how the influence coefficient of socioeconomic desirability $w^2_k$ for urban pixels ($k \in U$) changes across space.

$$w^2_k = r_{k\upsilon}^{-\gamma_\delta}, \quad \text{for } k \in U,\ k \in \delta \tag{6}$$

where $r_{k\upsilon}$ is radial distance from the center of urban area $\upsilon$ to the pixel $k$, and $\gamma_\delta$ is density gradient for division $\delta$. We applied the following relation suggested by recent studies [51,52] to link parameter $\gamma$ from socioeconomic desirability with the urban area-population scaling factor $\beta$ from equation (2)

[equation (7) illegible in source]

Like $\beta$, $\gamma$ values were calculated for divisions and regarded as constant over time.
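The topographic and urban-distance weights of equations (4) to (6) can be sketched in a few lines. The example below is illustrative only: the regional slope is fitted to synthetic county values, the density gradient is passed in directly rather than derived from the scaling factor, and the minimum-distance floor and the divide-by-maximum normalization are assumptions that the text does not specify.

```python
import numpy as np


def fit_elevation_slope(county_mean_elev_m, county_pop_density):
    """Equation (4): regress ln(population density) on county mean elevation.

    Returns (m, b) for one region.
    """
    m, b = np.polyfit(county_mean_elev_m, np.log(county_pop_density), 1)
    return float(m), float(b)


def elevation_weight(pixel_elev_m, m):
    """Equation (5): w1 = exp(m * z) for each pixel elevation z."""
    return np.exp(m * np.asarray(pixel_elev_m, dtype=float))


def urban_distance_weight(r_km, gamma, r_min_km=0.5):
    """Equation (6): w2 = r**(-gamma) for urban pixels.

    r_min_km is an assumed floor that avoids division by zero at the
    urban-center pixel; it is not taken from the source.
    """
    r = np.maximum(np.asarray(r_km, dtype=float), r_min_km)
    return r ** (-gamma)


def normalize(w):
    """Scale weights so the maximum is 1 (one possible normalization)."""
    return w / np.max(w)


# Synthetic counties in one hypothetical region: ln(PD) = 5 - 0.002 * z.
elev = np.array([0.0, 500.0, 1000.0, 1500.0])
density = np.exp(5.0 - 0.002 * elev)
m, b = fit_elevation_slope(elev, density)

w1 = elevation_weight([0.0, 1000.0], m)
w2 = normalize(urban_distance_weight([0.0, 1.0, 2.0], gamma=1.0))
```

A negative slope gives lower weights to higher pixels, matching the negative relationships reported for most regions.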
For rural areas, proximity to an urban center is advantageous for economic development. The influence coefficient of socioeconomic desirability $w^2_k$ for rural pixels ($k \in R$) was determined using a gravity model of market potential [53]

$$w^2_k = \frac{\sum_{j=1}^{N} P_{U,j} / r_{kj}^{2}}{\sum_{j=1}^{N} 1 / r_{kj}^{2}}, \quad \text{for } k \in R \tag{8}$$

where $N$ is the number of urban areas that are within the maximum urban influence distance in decade $t$, $D_t$ ($r_{kj} \le D_t$), corresponding to daily per capita travel range which exponentially increased due to transport technology evolution from 30 m in 1790 to 100 km in 2000 (ref. 54). The gravity model was adopted here, based on its wide application in reflecting the accessibility of urban markets [55–57].

Population mapping models

Five population distribution models of increasing complexity were developed, using the influence coefficients described above (normalized to range between 0 and 1). M1 simply allocated census county population homogeneously within counties. M2 separated urban and rural areas, and then homogeneously allocated urban and rural population within urban and rural areas, respectively. M3 excluded non-inhabitable areas, including water bodies, protected areas, highlands, and low population census tracts. M4 extended M3 with the addition of topographic suitability. M5 added socioeconomic desirability. From M3 to M5, the population distribution maps were obtained by multiplying the population raster by the normalized weighting grid, with each subsequent model contributing an additional influence coefficient to the previous model. The most complete model (M5) included all coefficients as shown in the following equation

$$P_k = \frac{w^0_k \,(w^1_k)^{s}\,(w^2_k)^{d}}{\sum_{k \in Z} w^0_k \,(w^1_k)^{s}\,(w^2_k)^{d}}\; P_Z, \quad \text{for } k \in Z\ (U, R) \tag{9}$$

where $Z$ indicates urban ($U$) or rural ($R$) pixel, and the exponents $s$ and $d$ weight the relative importance of topographic suitability and socioeconomic desirability on population mapping. The latter two parameters were calibrated to obtain the highest mapping accuracy with values evaluated between 0.2 and 3 for each division (Tables 2 and 3). Equation 9 was implemented through ModelBuilder in ArcGIS to produce our historical population datasets. The models we used are publicly available and can be freely downloaded from figshare (Historical Population Models, Data Citation 1).

Accuracy assessment

All five models, M1–M5, were applied to map historical population from 1790 to 2010 (excluding 1960) (Data Citation 1). The best way to assess the accuracy of population distribution maps is to compare

PAGE 7

them with census data at a finer level than was used for the model input. While the county level input data were already the finest resolution that was available consistently throughout the temporal range of interest, higher resolution census tract data with full coverage of the conterminous US were available from 1990 to 2010, and these were used as a primary reference here. Note that the spatial resolution of our population products precluded validation for census tracts with area less than 1 km², which represented 12.6% of the tracts and 9.7% of the total population for the conterminous US in 2010. We supplemented the tract analysis with reference data at a coarser level, county subdivision, from 1980 to 2010, to achieve a more comprehensive validation and also to assess the accuracy based on reference data at different geographic scales. We again used only county subdivisions with areas larger than 1 km² (98.8% of the county subdivisions, and greater than 99% of the conterminous US population).

We generated gridded population data based on county level data and then aggregated population grids by census tracts and county subdivisions to compare them to census data at the corresponding level. We applied mean absolute relative error (MARE) to assess the overall performance of the 2000 population products for the conterminous US and each census division

$$\mathrm{MARE} = \frac{1}{N_c} \sum_{c=1}^{N_c} \frac{\left| M_c - O_c \right|}{O_c} \tag{10}$$

where subscript $c$ is reference data geographic unit (census tract or county subdivision), $N_c$ is the number of evaluated units in the conterminous US or census division, $M_c$ is modeled population aggregated for unit $c$, and $O_c$ is observed population from census data.
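The allocation in equation (9) and the error measure in equation (10) fit together as a short pipeline: weight the pixels, distribute the zone total, aggregate to reference units, and compare. The sketch below uses plain NumPy arrays in place of rasters; all names and the four-pixel example values are hypothetical and are not drawn from the published dataset.

```python
import numpy as np


def allocate(zone_population, w0, w1, w2, s=1.0, d=1.0):
    """Equation (9): distribute a zone's population over its pixels.

    The zone is the urban or rural part of one county; w0, w1 and w2 are
    the per-pixel influence coefficients for that zone.
    """
    weight = np.asarray(w0, float) * np.asarray(w1, float) ** s \
        * np.asarray(w2, float) ** d
    return zone_population * weight / weight.sum()


def aggregate(pixel_population, unit_ids, n_units):
    """Sum pixel populations within each reference unit (tract, subdivision)."""
    return np.bincount(unit_ids, weights=pixel_population, minlength=n_units)


def mare(modeled, observed):
    """Equation (10): mean absolute relative error over reference units."""
    modeled = np.asarray(modeled, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(modeled - observed) / observed))


# Four pixels in one zone holding 700 people; the third is non-inhabitable.
pixel_pop = allocate(
    700.0,
    w0=[1.0, 1.0, 0.0, 1.0],
    w1=[1.0, 0.5, 1.0, 0.5],
    w2=[1.0, 1.0, 1.0, 0.5],
)

# Pixels 0-1 fall in reference unit 0, pixels 2-3 in reference unit 1.
modeled_units = aggregate(pixel_pop, np.array([0, 0, 1, 1]), n_units=2)
error = mare(modeled_units, observed=[500.0, 200.0])
```

Because the weights are normalized by their sum within the zone, the allocated pixel values always add back up to the zone total, which is the defining property of dasymetric disaggregation.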
The five models were assessed in terms of both accuracy and effectiveness. We used MARE to compare model accuracy, with lower MARE values indicating better model performance. Model effectiveness was determined from the importance of the added influence factors based on the magnitude of MARE reduction compared to the previous model. We made full use of the available fine-resolution census population data and assessed the accuracy of the model-generated historical population maps for the conterminous US based on census tract data from 1990 to 2010 and county subdivision data from 1980 to 2010.

Furthermore, we assessed the accuracy of our generated historical urban extents for selected regions. There is currently no historical (urban) land use database for the USA from 1790 to present. However, the USGS LCI [36,37] developed historical urban extents for two fast-growing regions: San Francisco/Sacramento and Baltimore/Washington DC. Those investigators mapped urban land use change and produced maps from 1900 to 1996 for San Francisco/Sacramento and from 1792 to 1992 for Baltimore/Washington DC. Therefore, the historical urban extents for these regions reconstructed in this study for the closest corresponding years were compared to the USGS LCI results to test the validity of the scaling relationships and the urban area delineation method. We used the relative overlap of urban pixels between our results and those from USGS LCI as an accuracy indicator. Note that the minimum population threshold used to define urban areas in the USGS LCI studies was 500 people, but we constrained our comparison to include only the areas with more than 2500 people, to be consistent with our threshold for urban areas.

Finally, we note that accuracy assessments for population products are commonly conducted using partial datasets due to the availability of higher resolution reference data. For example, Gaughan et al. [58] used only four urban areas to validate population products for the entire mainland of China, and Sorichetta et al. [18] used six countries for accuracy assessment of population products for 28 countries in Latin America and the Caribbean.
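The relative-overlap indicator used for the urban extent comparison can be written as a one-line raster operation. The sketch below assumes the reference and modeled extents are boolean rasters on the same grid, and takes the reference urban pixels as the denominator; the function name and the 2 × 2 example are hypothetical.

```python
import numpy as np


def overlap_fraction(reference_urban, modeled_urban):
    """Fraction of reference urban pixels that the modeled extent also covers."""
    reference_urban = np.asarray(reference_urban, dtype=bool)
    modeled_urban = np.asarray(modeled_urban, dtype=bool)
    n_reference = int(reference_urban.sum())
    if n_reference == 0:
        raise ValueError("reference raster contains no urban pixels")
    return float((reference_urban & modeled_urban).sum()) / n_reference


# Three reference urban pixels, two of which the model also marks as urban.
reference = [[True, True], [True, False]]
modeled = [[True, False], [True, True]]
fraction = overlap_fraction(reference, modeled)
```

This measure rewards coverage of the reference extent but does not penalize modeled urban pixels outside it, so it is worth reading alongside any sign of overestimated extents.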
Table 2. Determination of the topographical suitability factor, s, for each division.

                  MARE, when s =
Division          0.2   0.4   0.6   0.8   1.0   1.4   1.5   1.8   2.0   2.2   2.4   2.6   2.8   3.0   s
Conterminous US   1.01  1.01  1.01  1.01  1.01  1.01  1.01  1.02  1.02  1.03  1.03  1.04  1.05  1.05
D1                0.58  0.56  0.55  0.55  0.55  0.55  0.55  0.56  0.57  0.58  0.59  0.60  0.61  0.62  1.00
D2                3.74  3.69  3.64  3.60  3.57  3.52  3.49  3.45  3.43  3.41  3.39  3.37  3.36  3.35  3.00
D3                0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  1.00
D4                0.45  0.45  0.45  0.45  0.45  0.45  0.45  0.45  0.46  0.46  0.46  0.46  0.46  0.46  1.00
D5                0.73  0.74  0.75  0.76  0.78  0.80  0.82  0.84  0.86  0.88  0.89  0.91  0.93  0.95  0.20
D6                0.41  0.41  0.41  0.41  0.41  0.41  0.41  0.41  0.41  0.41  0.41  0.42  0.42  0.42  1.00
D7                0.53  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  0.52  1.00
D8                0.68  0.68  0.68  0.68  0.68  0.68  0.69  0.69  0.70  0.70  0.71  0.71  0.72  0.72  1.00
D9                1.17  1.19  1.20  1.22  1.24  1.27  1.29  1.32  1.35  1.37  1.39  1.41  1.44  1.46  0.20
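The calibration summarized in Table 2 amounts to a one-dimensional grid search: evaluate MARE at each candidate exponent and keep the one with the lowest error. The sketch below applies that rule to the D5 and D2 rows of Table 2; the helper name is hypothetical, and because the published MARE values are rounded to two decimals, ties in other rows mean this simple rule will not necessarily reproduce every tabulated choice.

```python
import numpy as np

# Candidate values of s and the MARE rows for divisions D5 and D2 (Table 2).
S_VALUES = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.4, 1.5,
                     1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0])
MARE_D5 = np.array([0.73, 0.74, 0.75, 0.76, 0.78, 0.80, 0.82,
                    0.84, 0.86, 0.88, 0.89, 0.91, 0.93, 0.95])
MARE_D2 = np.array([3.74, 3.69, 3.64, 3.60, 3.57, 3.52, 3.49,
                    3.45, 3.43, 3.41, 3.39, 3.37, 3.36, 3.35])


def best_exponent(candidates, mare_values):
    """Return the candidate with the lowest MARE (first one on ties)."""
    return float(candidates[int(np.argmin(mare_values))])


s_d5 = best_exponent(S_VALUES, MARE_D5)
s_d2 = best_exponent(S_VALUES, MARE_D2)
```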

PAGE 8

Data Records

The reconstructed high-resolution historical human population datasets for the conterminous US from 1790 to 2010 (excluding 1960) are available on figshare, including five sets of population products derived from M1 to M5 (Historical population dataset for the conterminous US, Data Citation 1). The data can be downloaded in Esri grid format for each decade at a resolution of 1 km, with the values representing the human population per pixel cell.

Technical Validation

Population mapping illustration

The spatial details generated by M1 to M5 are illustrated in Fig. 3, with a focus on the northern part of the South Atlantic division (D5), which exhibits significant topographical variation from the coastal plain to the Appalachian Plateau, and includes many cities of different sizes. The census tract (Fig. 3b) and county subdivision data (Fig. 3c) are compared to all five model outputs (Fig. 3d). M1 and M2 produced homogeneous populations within counties, and urban and rural areas, respectively. M3 also produced homogeneous populations within urban and rural areas, but had more low population pixels due to the exclusion of non-inhabitable areas. M4 and M5 generated more spatial details within urban and rural areas for each county. With input from both topographic suitability and socio-economic desirability, M5 produced more detailed information consistent with the census tract population distribution, not only for the less densely populated zone in the northeast but also the high population regions in the southwest.

Population mapping model accuracy

Census tract population data were considered as the primary reference for validation due to their higher resolution, with mean area and population of 120 km² and 4310 persons in 2000. Based on our validation using census tract population for year 2000 (Fig. 4a) for the conterminous US, accuracy increased with model complexity from M1 to M5, with MARE decreasing from 6.96 (M1) to 1.54 (M5), indicating M5 as the most accurate model. By census division ($\delta$: D1–D9), model accuracy showed similar improving trends from M1 to M5, except M2 for D4 and D9. The model performed worst for D2; however, this division also showed improved performance with increased model complexity.
Validation using county subdivision population data (mean area and population of 221 km² and 7932 persons) resulted in similar overall results, but with higher accuracy, indicated by smaller MARE values for Models 1 to 5 (Fig. 4b). The MARE decreased from M1 (3.11) to M4 (1.22), then increased marginally (1.23) for M5, showing M4 as the most accurate model. Similar trends were found for most divisions, except D2 and D5, with M3 as the most accurate model, and also D3, where model accuracy decreased from M2 (1.14) to M5 (1.22). Note that small or no differences in MARE were found between the most accurate models and M5 for these divisions. D1 and D8 presented worse model performance than the other divisions.

Population mapping model effectiveness

The improved accuracy found above was achieved by the increased complexity from M1 to M5. Models 1 to 5 increased in complexity by adding additional input data. M1 required only one dataset (county total population, $P_T$), M2 required the addition of urban population ($P_U$) and extent of urban areas ($A_U$) for a total of 3 variables, M3 added inhabitable area for 4 variables, M4 also included topography and its relative importance ($s$) for 6 variables, and M5 added socioeconomic desirability and its relative importance ($d$), culminating in 8 variables.

Census-tract-based validation for the conterminous US (Table 4) showed that M3 and M2 resulted in the largest reduction in MARE compared to their next lower-complexity models: 3.58 and 1.53, respectively, revealing the importance of separating urban and rural areas as well as excluding non-inhabitable areas.
Table 3. Determination of the socio-economic desirability factor, d, for each division.

                  MARE, when d =
Division          0.02   0.04   0.06   0.08   0.10   0.20   0.40   0.60   0.80   1.00   1.20   1.40   1.60   1.80   2.00   d
Conterminous US   0.966  0.966  0.966  0.967  0.967  0.968  0.972  0.979  0.987  0.998  1.010  1.025  1.042  1.061  1.081
D1                0.546  0.546  0.545  0.545  0.545  0.545  0.547  0.551  0.556  0.563  0.572  0.581  0.592  0.603  0.616  0.20
D2                3.353  3.359  3.365  3.371  3.378  3.410  3.479  3.552  3.632  3.717  3.809  3.907  4.011  4.123  4.241  0.02
D3                0.518  0.517  0.517  0.517  0.516  0.515  0.513  0.513  0.515  0.519  0.525  0.532  0.540  0.549  0.560  0.40
D4                0.452  0.451  0.451  0.451  0.451  0.449  0.448  0.448  0.449  0.452  0.456  0.462  0.468  0.476  0.485  0.60
D5                0.729  0.728  0.727  0.726  0.725  0.720  0.713  0.707  0.702  0.700  0.699  0.700  0.702  0.706  0.710  1.00
D6                0.408  0.408  0.407  0.407  0.407  0.407  0.407  0.409  0.413  0.418  0.425  0.433  0.443  0.454  0.466  0.20
D7                0.522  0.521  0.521  0.520  0.520  0.518  0.515  0.513  0.514  0.517  0.521  0.527  0.534  0.542  0.551  0.60
D8                0.681  0.680  0.680  0.680  0.680  0.679  0.678  0.679  0.682  0.687  0.692  0.700  0.709  0.719  0.729  0.40
D9                1.169  1.168  1.166  1.165  1.164  1.158  1.147  1.137  1.129  1.122  1.117  1.113  1.111  1.110  1.110  1.80

PAGE 9

In contrast, M4 and M5 resulted in only small reductions in MARE. Also, the additional factors in each model showed varied effects among the nine divisions. From M1 to M2, the largest MARE reduction (15.67) was found for the Mountain Division (D8), demonstrating the importance of differentiating urban and rural areas where tract areas have both high mean and high variance. From M2 to M3, the Middle Atlantic Division (D2) showed the largest MARE reduction (16.12), indicating the significant effect of excluding un-inhabitable areas where population density is high. The MARE reductions from M3 to M4 and M4 to M5 were small in all regions. Overall, the magnitude of MARE reduction decreased as the model complexity increased from M1 to M5 (Table 4). Although the most complex model M5 had the lowest MARE, the model effectiveness [59] considering both model accuracy and model complexity decreased from M3 to M5. We thus considered M3 as the most efficient model.

Figure 3. Comparison of measured and modeled population maps for year 2000. Population map generated by M5 for the conterminous US (a), census tract population (b), county subdivision population (c), and the five model outputs (M1–M5) (d) in the South Atlantic region.

Urban extent dynamics

The historical urban areas were reconstructed based on the oldest available urban area boundaries (2000) using the scaling relationship between urban area and population. We illustrated the urban extents for Baltimore/Washington DC and San Francisco/Sacramento regions from our reconstructed results for the

PAGE 10

closestcorrespondingyearsandtheUSGSLCIresultsinFig.5.Generally,ourreconstructedurbanareas matchedwellwiththeUSGSproductsespeciallysincethe1950s,albeitwithsomedeviationinthe locationcentersfromwhichtheurbanareasdevelopedoutwardintheearlydecades.ForBaltimore/ WashingtonDC,thefractionofUSGSLCIurbanpixelsthatwereoverlappedbyourmodelwere0.88, 0.84,0.73,0.70,and0.92for1990,1970,1950,1900,and1850.ForSanFrancisco/Sacramentoregion,the fractionswere0.78,0.82,0.73,0.65,0.50,and0.62for1990to1900(Fig.5).Oursimpli edreconstruction tendedtooverestimateurbanextentsinbothregions,andthelowerpopulationthresholdsusedtode ne urbanareasfortheUSGSLCIstudiesresultedinmorescatteredsmallurbanareasthaninourresults. Theassessmentofthehistoricalurbanareasintwofast-growingregionsshowedthatourreconstructed urbanextentsre ectthegeneraldynamicpatternofurbanareascomparedtotheUSGSLCI referencedata.ValidationofgeneratedhistoricalpopulationHistoricalpopulationdistributionwasreconstructedfortheconterminousUSfrom1790to2010 (excluding1960).Validationisonlypossiblebeginningwiththeavailabledata.Forcensus-tract-based validation(1990-2010),sinceModels3to5werecalibratedbasedon2000populationdata,theiraccuracy for1990wasunsurprisinglylowerthanthe2000output.Forexample,forM5MAREvalueswere2.1and 1.5fortheconterminousUSin1990and2000,respectively(Table5).Usingthe vemodelstoproject forwardfrom2000to2010fortheconterminousUSalsohadhigherMAREvaluesexceptM2.However, basedoncountysubdivisionvalidation(1980 – 2010),ourpopulationproductsshowedhighaccuracyfor bothbackwardprojectionfrom2000(1990and1980)andforwardprojection(2010),withsimilar accuracyasthe2000products.Ourpopulationproductspriorto1980couldbevalidatedifcensustract, countysubdivision,orothermoredetailedpopulationdatabecomeavailable.UsageNotesThedatasetgeneratedhereprovidedready-to-usehistoricalpopulationmapsintheconterminousUS from1790to2010attheresolutionof1km.Ourstudyshowedthevalidityofapplyingscaling 
relationships between urban area and urban population to reconstruct historical urban areas, and the effectiveness of modeling spatio-temporal population distributions using existing data. Our backward projection was based not just on current population information but also on historical population data obtained from NHGIS, which provided accurate information about human settlement and population density at county level and served as a foundation for further disaggregation of population within administrative units. The high resolution of the input census population data, the separation of urban and rural extents, and the use of inhabitability, elevation, and socioeconomic desirability as influencing factors all contributed to the good accuracy of the final products.

Our final output included five sets of historical population products from 1790 to 2010 reconstructed based on models M1 to M5. Generally, model accuracy improved with increasing complexity. According to the census-tract-based validation, we proposed M5 as the most accurate and M3 as the most efficient, while the county-subdivision-based validation suggested M4 as the most accurate and M2 as the most efficient, largely because the larger size of county subdivisions masked the effect of additional factors included in more complex models. Therefore, we suggest that users consider their study units when selecting the model products. Also, we suggest applying the efficient models when transferring our approach to other regions, and adopting the most accurate models when directly applying our model-generated population products. Additionally, considering the varied model performance among the divisions, we caution users to choose the most appropriate products based on their specific study regions.

Figure 4. Population mapping model performance comparison. (a) comparison based on census tract population data; (b) comparison based on county subdivision population data. Mean absolute relative error (MARE), on a log scale, for most census divisions (grey lines) showed similar trends in improved model performance with increasing model complexity. The orange, red, and blue lines indicate census divisions D1, D2, and D8 for comparison. The dashed line indicates model performance for the entire conterminous US. [Plot axes: models M1 to M5 (x) against MARE from 0.1 to 100 (y, log scale).]
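The accuracy-versus-efficiency reasoning above can be sketched numerically. The following Python sketch assumes the standard definition of mean absolute relative error (the mean of |modeled - observed| / observed over validation units; the article's own equation is not restated in this section) and uses the conterminous US census tract MARE values for 2000 reported in Table 5. The function and variable names are illustrative assumptions and are not part of the published dataset.

```python
from typing import Dict, Sequence


def mare(observed: Sequence[float], modeled: Sequence[float]) -> float:
    """Mean absolute relative error, assuming the standard definition.

    Units with zero observed population are skipped (an assumption made
    here to avoid division by zero; the article may treat them differently).
    """
    pairs = [(o, m) for o, m in zip(observed, modeled) if o != 0]
    if not pairs:
        raise ValueError("no units with non-zero observed population")
    return sum(abs(m - o) / o for o, m in pairs) / len(pairs)


# Conterminous US census tract MARE for 2000, as reported in Table 5.
TRACT_MARE_2000 = {"M1": 6.96, "M2": 5.13, "M3": 1.55, "M4": 1.55, "M5": 1.54}


def mare_reductions(mare_by_model: Dict[str, float]) -> Dict[str, float]:
    """MARE reduction gained by each step up in model complexity."""
    names = list(mare_by_model)
    return {
        f"{a}-{b}": round(mare_by_model[a] - mare_by_model[b], 2)
        for a, b in zip(names, names[1:])
    }


if __name__ == "__main__":
    # Toy check of the error metric on three hypothetical validation units.
    print(mare([100, 200, 50], [110, 180, 50]))
    # Reductions between successive models.
    print(mare_reductions(TRACT_MARE_2000))
```

With the Table 5 values, the reductions come out as 1.83, 3.58, 0.00 and 0.01, matching the conterminous US census tract row of Table 4. Nearly all of the gain is reached by M3, which is the basis for treating M3 as the most efficient model under census-tract-based validation.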
When compared to the large task of modeling human population at a continental scale over a time period of hundreds of years, our models include only a small number of parameters, and we thus consider all five models M1 to M5 as parsimonious. Our models used no more than three weighting coefficients, partly determined by the availability of historical geospatial data, while, for comparison, 10 weighting coefficients were used in LandScan [16]. Similar to our method, LandScan [16] and population modeling for Asia [19] and Africa [20,21] also applied multiple ancillary variables and allocated population using associated weights. However, land cover data, which were not available over the long timescales of interest in this study, were the major input for their modeling. LandScan did not provide details on their modeling methods [19]. The population mapping influence coefficients previously developed [19–21] were based on fixed values of population density for specific land covers, while those we developed as continuous functions of elevation, urban proximity, and market potential (equations (5), (6) and (8)) supported more variation. Further, our reconstruction was based on the same model structure with consideration of the dynamic trends of the model input parameters over time, including the number of urban areas and urban population, increasing urban influence distance over time, and changes of density gradients within urban areas.
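The idea of allocating population using associated weights can be illustrated with a minimal sketch. It assumes, for illustration only, that each grid cell in a county carries a non-negative weight (for example, a product of elevation, urban proximity, and market potential terms) and that the county total is distributed in proportion to those weights, with uninhabitable cells given a weight of zero. The function name, the weights, and the county total below are hypothetical; the article's actual influence functions are its equations (5), (6) and (8), which are not reproduced here.

```python
from typing import List


def allocate_population(county_total: float, weights: List[float]) -> List[float]:
    """Distribute a county total across cells in proportion to cell weights.

    A weight of 0 marks a cell as uninhabitable (hypothetical convention).
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one cell must have a positive weight")
    return [county_total * w / total_weight for w in weights]


if __name__ == "__main__":
    # Hypothetical county of 1,000 people on four 1-km cells;
    # the third cell is uninhabitable (e.g. a water body).
    cells = allocate_population(1000.0, [2.0, 1.0, 0.0, 1.0])
    print(cells)
    # The allocation preserves the county total.
    print(sum(cells))
```

Because the weights are normalized within each county, the gridded values always sum back to the census total for that county, so the census counts remain the controlling input and the weights only shape the within-county pattern.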
Natural factors other than altitude, including temperature and precipitation, were excluded because of their relatively small variation within most counties. However, altitude was found to be an important factor to account for intra-county natural suitability variability. We further emphasize that our validated population products used the same method for the entire conterminous US, with no significant accuracy differences in the East and West. Another important factor that could be considered more explicitly in extensions to this work is transport networks, given their significant role in influencing human population distribution. Transport networks have evolved over time, shifting from rivers and canals, to roads and railroads, and then to air transport [54], but there are no currently available comprehensive historical databases of transportation networks (rail lines, roads), precluding their direct inclusion in our models. However, the effect of the transport technology evolution was indirectly reflected in our models through the growth of the daily travel range in the gravity model of market potential. The availability of temporally dynamic data limited the selection of influencing factors. We aimed to apply consistent techniques over time to make the final products appropriate for dynamic analysis.

Table 4. Model effectiveness assessment results by Division for 2000 population products.

                   Census tract MARE reduction     County subdivision MARE reduction
Division           M1-M2   M2-M3   M3-M4   M4-M5   M1-M2   M2-M3   M3-M4   M4-M5
conterminous US     1.83    3.58    0.00    0.01    1.71    0.12    0.07   -0.02
D1                  0.33    0.00    0.05    0.00    2.10    0.25    1.76   -0.04
D2                  2.40   16.12    0.03   -0.01    0.95    0.07   -0.17    0.00
D3                  0.22    0.81   -0.01    0.00    1.20   -0.01   -0.01   -0.06
D4                 -0.19    2.03    0.00    0.00    1.72    0.02    0.00    0.01
D5                  0.41    1.00    0.01    0.05    1.47    0.05   -0.01   -0.05
D6                  0.12    0.38    0.02   -0.10    0.48    0.34    0.01   -0.01
D7                  4.91    1.44   -0.01    0.05    1.20    0.06    0.05   -0.02
D8                 15.67    1.70    0.00    0.01    8.50    1.92    0.19    0.09
D9                 -2.36    6.89   -0.02    0.00    6.06    0.42    0.04   -0.01

Our historical population products were generated and validated based on census population data. This data-based approach was therefore subject to the limitation of the currently available census data. First, we did not consider American Indians in our population distribution reconstruction since census
data did not include American Indians prior to 1900. It is still a challenge to collect and confirm population numbers for American Indians due to the sparse and inconsistent record. Second, urban areas are defined based on a population threshold of 2500; however, areas with fewer than 2500 people in the past could have held a role similar to modern urban areas. Our reconstruction thus might underestimate the attractiveness of smaller urban areas. Third, lack of high resolution historical data constrained a comprehensive validation dating back to 1790. Our historical population products prior to 1980 can be validated when more detailed census data become available. Despite these limitations, the population data from the US Census Bureau are the best currently available source for data-driven analyses.

Figure 5. Comparison of reconstructed historical urban extents. Urban area distribution in 2000 (a), Baltimore-Washington DC from 1790 to 1790 from this study (top panel) and reference 37 (bottom) (b), and San Francisco/Sacramento from 1900 to 1990 from this study (top) and reference 36 (bottom) (c).

We applied the following major assumptions in this study: monotonic urbanization and homogeneously outward expansion, a constant scaling exponent, and steady inhabitable areas. The first two assumptions were used to reconstruct historical urban areas and the third was applied to determine the influence coefficient of inhabitability. While individual urban development patterns may often deviate from the
first assumption (such as linear expansions along rivers or roads), simplifications were necessary to develop efficient models that could be feasibly applied to the large spatial and temporal scale of our study. This assumption could lead to varied impacts on model accuracy for specific cities with different development patterns. However, we found that the dynamic urban extents we generated reflect the general pattern of urban areas based on comparison with measured data available for two fast-growing regions, Baltimore-Washington DC dating back to 1792 and San Francisco/Sacramento dating to 1900 (Fig. 5). For the second assumption of a constant scaling exponent, currently there is no consistent opinion on how the exponent may change over time. Theoretical considerations from a geometric perspective suggested an exponent of 2/3, with dimensions of 3 for population and 2 for area [35], although there is some controversy on the dimension of population. Temporal analyses based on a model of settlement structure and social networks [44] provided support for the dynamic nature of the exponent: between 2/3 for unstructured settlements and 5/6 for larger and denser settlements with infrastructure networks. However, empirical analysis discovered little variation in exponents for Taiwan at different development phases [41]. Here, we found an exponent of 0.95 for all divisions except Pacific (D9), where it was 0.86 (Fig. 6), based on current urban data; however, a lack of historical urban area data could not support a dynamic analysis of this exponent and we assumed it to be constant. A smaller exponent for unstructured settlements [44] would suggest smaller sizes of urban areas and thus a higher population density in earlier times, which might also explain our overestimation of historical urban extents.

Regarding our third assumption, the total inhabitable area may have expanded with increasing human settlement pressure and advancing technological development. For example, widespread drainage converted wetlands to croplands and settlements [60]. Thus, our model may overestimate historical inhabitable areas and thus underestimate population density in previous decades. However, data availability limitations necessitated simplifying assumptions, which result in inevitable modelling limitations. Future work could improve these models as more data become available.
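The fitted scaling relationships shown in Fig. 6 can be turned into a small calculator. The sketch below assumes the power-law form area = a × population^b with the coefficients printed in Fig. 6 (a = 0.0028, b = 0.95 for Divisions 1 to 8; a = 0.0047, b = 0.86 for Division 9), with area in km2 and population in persons. The function name and the example population are illustrative assumptions, and the sketch does not reproduce how the reconstructed urban extents were placed on the map.

```python
from typing import Dict, Tuple

# (coefficient a, exponent b) from the fits printed in Fig. 6.
SCALING_FITS: Dict[str, Tuple[float, float]] = {
    "D1-D8": (0.0028, 0.95),
    "D9": (0.0047, 0.86),
}


def urban_area_km2(population: float, group: str = "D1-D8") -> float:
    """Estimate urban area (km^2) from urban population via area = a * pop**b."""
    if population <= 0:
        raise ValueError("population must be positive")
    a, b = SCALING_FITS[group]
    return a * population ** b


if __name__ == "__main__":
    # Hypothetical city of one million people under each fit.
    for group in SCALING_FITS:
        print(group, round(urban_area_km2(1_000_000, group), 1))

    # With an exponent below 1, halving the population leaves the city
    # with 0.5**b of its area, i.e. a little more than half.
    ratio = urban_area_km2(500_000) / urban_area_km2(1_000_000)
    print(round(ratio, 3))
```

Because both exponents are below 1, the estimated area grows more slowly than population, so larger cities come out denser. Holding the exponent constant through time, as assumed in the text, means this same curve is applied to the historical urban populations of every decade.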
The spatially explicit historical population data generated here could facilitate advancing our understanding of coupled human and natural systems. With the arrival of the Anthropocene, the scope, intensity, and rate of changes in human-nature interactions have increased dramatically [3]. It is thus becoming increasingly important to understand the dynamics of human-nature interactions at timescales of decades to centuries [8]. For example, from the perspective of water resources, several studies have quantified the geographic relationship between human settlements and rivers [61,62]; however, it remains elusive how such relationships have changed over time. Our population products could help evaluate dynamic human-nature relationships.

Table 5. Model accuracy assessment results by model based on census tract population (1990–2010) and county subdivision population (1980–2010).

         Census tract MARE        County subdivision MARE
Model    1990    2000    2010     1980    1990    2000    2010
M1       4.29    6.96    8.02     2.96    3.03    3.11    5.20
M2       2.94    5.13    4.94     1.46    1.45    1.40    1.42
M3       2.11    1.55    3.56     1.34    1.32    1.29    1.29
M4       2.19    1.55    3.56     1.26    1.20    1.22    1.21
M5       2.13    1.54    3.55     1.27    1.18    1.23    1.24

Figure 6. Scaling relationship between urban population and urban area, 2000. [Plot: urban area (km2, y) against population (persons, x) on log axes. Fitted lines: Division 9, y = 0.0047 x^0.86 (R = 0.85); Divisions 1-8, y = 0.0028 x^0.95 (R = 0.91).]

References
1. Ellis, E. C. et al. Used planet: A global history. Proc. Natl Acad. Sci. USA 110, 7978–7985 (2013).
2. Foley, J. A. et al. Global Consequences of Land Use. Science 309, 570–574 (2005).
3. Vitousek, P. M., Mooney, H. A., Lubchenco, J. & Melillo, J. M. Human domination of Earth's ecosystems. Science 277, 494–499 (1997).
4. Ellis, E. C. & Ramankutty, N. Putting people in the map: anthropogenic biomes of the world. Front Ecol Environ 6, 439–447 (2007).
5. Grimm, N. B. et al. Global change and the ecology of cities. Science 319, 756–760 (2008).
6. Grimm, N. B. et al. The changing landscape: ecosystem responses to urbanization and pollution across climatic and societal gradients. Front Ecol Environ 6, 264–272 (2008).
7. Sanderson, E. W. et al. The Human Footprint and the Last of the Wild. Bioscience 52, 891–904 (2002).
8. Liu, J. et al. Coupled Human and Natural Systems. Ambio 36, 639–649 (2007).
9. Jia, P., Qiu, Y. L. & Gaughan, A. E. A fine-scale spatial population distribution on the High-resolution Gridded Population Surface and application in Alachua County, Florida. Applied Geography 50, 99–107 (2014).
10. Lung, T., Lubker, T., Ngochoch, J. K. & Schaab, G. Human population distribution modelling at regional level using very high resolution satellite imagery. Applied Geography 41, 36–45 (2013).
11. U.S. Census Bureau. Decennial Census https://www.census.gov/history/www/programs/demographic/decennial_census.html (2016).
12. Mennis, J. Generating surface models of population using dasymetric mapping. Professional Geographer 55, 31–42 (2003).
13. Wu, S., Qiu, X. & Wang, L. Population Estimation Methods in GIS and Remote Sensing: A Review. GIScience & Remote Sensing 42, 80–96 (2005).
14. Deichmann, U., Balk, D. & Yetman, G. Transforming population data for interdisciplinary usages: from census to grid http://sedac.ciesin.org/gpw-v2/GPWdocumentation.pdf (2016).
15. Balk, D. L. et al. Determining Global Population Distribution: Methods, Applications and Data. Adv Parasitol 62, 119–156 (2006).
16. Dobson, J. E., Bright, E. A., Coleman, P. R., Durfee, R. C. & Worley, B. A. LandScan: a global population database for estimating populations at risk. Photogramm Eng Remote Sensing 66, 849–857 (2000).
17. Tatem, A. J. WorldPop, open data for spatial demography. Sci. Data 4, 170004 (2017).
18. Sorichetta, A. et al. High-resolution gridded population datasets for Latin America and the Caribbean in 2010, 2015, and 2020. Sci. Data 2, 150045 (2015).
19. Gaughan, A. E., Stevens, F. R., Linard, C., Jia, P. & Tatem, A. J. High resolution population distribution maps for Southeast Asia in 2010 and 2015. PLoS ONE 8, e55882 (2013).
20. Linard, C., Gilbert, M., Snow, R. W., Noor, A. M. & Tatem, A. J. Population Distribution, Settlement Patterns and Accessibility across Africa in 2010. PLoS ONE 7, e31743 (2012).
21. Tatem, A. J., Noor, A. M., von Hagen, C., Di Gregorio, A. & Hay, S. I. High Resolution Population Maps for Low Income Nations: Combining Land Cover and Census in East Africa. PLoS ONE 2, e1298 (2007).
22. Gustafson, E. J., Hammer, R. B., Radeloff, V. C. & Potts, R. S. The relationship between environmental amenities and changing human settlement patterns between 1980 and 2000 in the midwestern USA. Landscape Ecol 20, 773–789 (2005).
23. Linard, C., Gilbert, M. & Tatem, A. J. Assessing the use of global land cover data for guiding large area population distribution modelling. GeoJournal 76, 525–538 (2011).
24. Goldewijk, K. K., Beusen, A. & Janssen, P. Long-term dynamic modeling of global population and built-up area in a spatially explicit way: HYDE 3.1. Holocene 20, 565–573 (2010).
25. Fang, Y. et al. Natural forming causes of China population distribution. Chinese Journal of Applied Ecology 23, 3488–3495 (2012).
26. Song, G., Yu, M., Liu, S. & Zhang, S. A dynamic model for population mapping: a methodology integrating a Monte Carlo simulation with vegetation-adjusted night-time light images. Int J Remote Sens 36, 4054–4068 (2015).
27. Voss, P. R. & Chi, G. Highways and Population Change. Rural Sociology 71, 33–58 (2006).
28. Stevens, F. R., Gaughan, A. E., Linard, C. & Tatem, A. J. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data. PLoS ONE 10, e0107042 (2015).
29. Goldewijk, K. K. Estimating global land use change over the past 300 years: The HYDE Database. Global Biogeochemical Cycles 15, 417–433 (2001).
30. Kaplan, J. O., Krumhardt, K. M. & Zimmermann, N. The prehistoric and preindustrial deforestation of Europe. Quaternary Science Reviews 28, 3016–3034 (2009).
31. Pongratz, J., Reick, C., Raddatz, T. & Claussen, M. A reconstruction of global agricultural areas and land cover for the last millennium. Global Biogeochemical Cycles 22, GB3018 (2008).
32. Ramankutty, N. & Foley, J. A. Estimating historical changes in global land cover: Croplands from 1700 to 1992. Global Biogeochemical Cycles 13, 997–1027 (1999).
33. Ramankutty, N., Foley, J. A., Norman, J. & McSweeney, K. The global distribution of cultivable lands: current patterns and sensitivity to possible climate change. Global Ecol Biogeogr 11, 377–392 (2002).
34. U.S. Census Bureau. 2010 Census Urban Area Facts. https://www.census.gov/geo/reference/ua/uafacts.html (2016).
35. Nordbeck, S. Urban Allometric Growth. Geografiska Annaler Series B-Human Geography 53, 54–67 (1971).
36. Bell, C., Acevedo, W. & Buchanan, J. Dynamic mapping of urban regions: growth of the San Francisco Sacramento region https://landcover.usgs.gov/urban/umap/pubs/urisa_cb.php (1995).
37. Crawford-Tilley, J. S., Acevedo, W., Foresman, T. & Prince, W. Developing a temporal database of urban development for the Baltimore/Washington region https://landcover.usgs.gov/urban/umap/pubs/asprs_jt.php (1996).
38. Minnesota Population Center. National Historical Geographic Information System: Version 11.0 [Database]. University of Minnesota: Minneapolis (2016).
39. U.S. Census Bureau. Censuses of American Indians. https://www.census.gov/history/www/genealogy/decennial_census_records/censuses_of_american_indians.html (2017).
40. Lo, C. Modeling the population of China using DMSP operational linescan system nighttime data. Photogramm Eng Remote Sensing 67, 1037–1047 (2001).
41. Lo, C. & Welch, R. Chinese urban population estimates. Annals of the Association of American Geographers 67, 246–253 (1977).
42. Sutton, P. Modeling population density with night-time satellite imagery and GIS. Computers, Environment and Urban Systems 21, 227–244 (1997).
43. Sutton, P., Roberts, D., Elvidge, C. & Baugh, K. Census from Heaven: An estimate of the global human population using night-time satellite imagery. Int J Remote Sens 22, 3061–3076 (2001).
44. Ortman, S. G., Cabaniss, A. H. F., Sturm, J. O. & Bettencourt, L. M. A. The Pre-History of Urban Scaling. PLoS ONE 9, e87902 (2014).
45. Lee, Y. An allometric analysis of the US urban system: 1960-80. Environ Planning A 21, 463–476 (1989).
46. Cohen, J. E. & Small, C. Hypsographic demography: The distribution of human population by altitude. Proc. Natl Acad. Sci. USA 95, 14009–14014 (1998).
47. Clark, C. Urban Population Densities. Journal of the Royal Statistical Society Series A (General) 114, 490–496 (1951).
48. Batty, M. The Size, Scale, and Shape of Cities. Science 319, 769–771 (2008).
49. Batty, M. & Kim, K. S. Form follows function: reformulating urban population density functions. Urban Stud 29, 1043–1069 (1992).
50. Chen, Y. The distance-decay function of geographical gravity model: Power law or exponential law? Chaos, Solitons & Fractals 77, 174–189 (2015).
51. Chen, Y. A new model of urban population density indicating latent fractal structure. International Journal of Urban Sustainable Development 1, 89–110 (2010).
52. Chen, Y. Characterizing growth and form of fractal cities with allometric scaling exponents. Discrete Dynamics in Nature and Society 2010, 194715 (2010).
53. Krugman, P. R. Development, geography, and economic theory, Vol. 6 (MIT press: Cambridge, Massachusetts, 1997).
54. Grübler, A. The rise and fall of infrastructures: dynamics of evolution and technological change in transport (Physica-Verlag: Heidelberg, 1990).
55. Luo, W. & Wang, F. H. Measures of spatial accessibility to health care in a GIS environment: synthesis and a case study in the Chicago region. Environ. Plann B 30, 865–884 (2003).
56. Simini, F., Gonzalez, M. C., Maritan, A. & Barabasi, A. L. A universal model for mobility and migration patterns. Nature 484, 96–100 (2012).
57. Noulas, A., Scellato, S., Lambiotte, R., Pontil, M. & Mascolo, C. A Tale of Many Cities: Universal Patterns in Human Urban Mobility. PLoS ONE 7, 37027 (2012).
58. Gaughan, A. E. et al. Spatiotemporal patterns of population in mainland China, 1990 to 2010. Sci. Data 3, 160005 (2016).
59. Paudel, R. & Jawitz, J. W. Does increased model complexity improve description of phosphorus dynamics in a large treatment wetland? Ecol. Eng. 42, 283–294 (2012).
60. Dahl, T. E. Wetlands losses in the United States, 1780's to 1980's. Report to the Congress: National Wetlands Inventory, St. Petersburg, FL (USA) (1990).
61. Ceola, S., Laio, F. & Montanari, A. Human-impacted waters: New perspectives from global high-resolution monitoring. Water Resources Res 51, 7064–7079 (2015).
62. Kummu, M., de Moel, H., Ward, P. J. & Varis, O. How close do we live to water? A global analysis of population distance to freshwater bodies. PLoS ONE 6, e20578 (2011).

Data Citation
1. Fang, Y. & Jawitz, J. Figshare http://doi.org/10.6084/m9.figshare.c.3890191 (2018).

Additional Information
Competing interests: The authors declare no competing interests.

How to cite this article: Fang, Y. & Jawitz, J. W. High-Resolution Reconstruction of the United States Human Population Distribution, 1790-2010. Sci. Data 5:180067 doi:10.1038/sdata.2018.67 (2018).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.

© The Author(s) 2018