Population synthesis is the process of creating agent representations of the model population from available data. Sample-based methods are the more traditional approach, but newer methods can also create synthetic populations without a sample.
Sample-based methods involve either synthetic reconstruction or combinatorial optimization (reweighting) of existing datasets on population characteristics, such as census data.
In synthetic reconstruction, the joint distribution of relevant population attributes is used to create a fitted population, from which individual units are generated. The most common method is Iterative Proportional Fitting (IPF), a procedure that repeatedly rescales a contingency table of attribute combinations (usually built from a sample) until it matches fixed marginal totals; the attributes of individual units are then taken from the fitted table. For each agent in the model, a random sample is drawn from the probability distribution of the relevant attributes in the population. This process is repeated until attributes have been assigned to the whole agent population. The result is a synthetic reconstruction of the original population.
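The fit-then-sample procedure can be sketched in Python for a two-attribute table. This is a minimal illustration, not a reference implementation: it assumes NumPy, a strictly positive seed table (for example, counts from a survey sample), and row and column targets with equal totals. The function names `ipf` and `draw_agents` are hypothetical.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, tol=1e-8, max_iter=1000):
    """Rescale a 2-D seed table until its row and column sums match the targets."""
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Row step: scale each row so it sums to its row marginal.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Column step: scale each column so it sums to its column marginal.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns now match exactly; stop once the rows also match.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table

def draw_agents(table, n_agents, rng):
    """Draw (row category, column category) pairs for agents from the fitted table."""
    probs = table.ravel() / table.sum()
    cells = rng.choice(probs.size, size=n_agents, p=probs)
    rows, cols = np.unravel_index(cells, table.shape)
    return list(zip(rows.tolist(), cols.tolist()))
```

For instance, `ipf([[10, 5], [3, 2]], [60, 40], [50, 50])` returns a table whose rows sum to 60 and 40 and whose columns sum to 50 and 50, and `draw_agents(fitted, 100, np.random.default_rng(0))` then assigns an attribute pair to each of 100 agents.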
In combinatorial optimization, a sample population is generated and then repeatedly modified until it meets a threshold for the required constraints. First, a set of randomly selected households is taken from an existing population dataset. Then a random household from this sample and a random household from the larger dataset are assessed for fit: if swapping them gives a better fit, the households are switched. The assessment and potential switch are repeated, and the sample population gradually improves its fit to a set of population constraints. The result is a sample population that fits those constraints.
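A minimal sketch of this swap-and-compare loop follows. It assumes, for illustration only, that each household is a tuple of attribute counts (for example, persons and cars), that the constraints are target totals for those attributes, and that fit is measured by total absolute error; the function names and the `threshold` parameter are hypothetical.

```python
import random

def total_absolute_error(sample, targets):
    """Sum of |observed total - target total| over every constrained attribute."""
    totals = [sum(column) for column in zip(*sample)]
    return sum(abs(obs - tgt) for obs, tgt in zip(totals, targets))

def combinatorial_optimization(pool, targets, n_households,
                               n_iter=10_000, threshold=0, seed=0):
    rng = random.Random(seed)
    # Start from a random selection of households from the larger dataset.
    sample = rng.sample(pool, n_households)
    error = total_absolute_error(sample, targets)
    for _ in range(n_iter):
        if error <= threshold:
            break  # the constraints are met closely enough
        i = rng.randrange(n_households)   # random household in the sample
        candidate = rng.choice(pool)      # random household from the dataset
        trial = sample.copy()
        trial[i] = candidate
        trial_error = total_absolute_error(trial, targets)
        # Keep the switch only if it improves the fit.
        if trial_error < error:
            sample, error = trial, trial_error
    return sample, error
```

Because a switch is accepted only when it strictly reduces the error, the loop can stall in a local optimum; practical implementations often add randomized acceptance rules (such as simulated annealing) to escape them.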
A sample-free method of synthetic reconstruction generates individual units and places them into households or other groupings until the entire population is accounted for. The method draws each individual's attributes at the most disaggregated level available from joint distributions. After all individuals are generated, the population is compared with the joint distributions, and inconsistencies are resolved by shifting attribute values. The last step is to gather the individuals into households or other groupings. Expect this technique to be applied to migratory groups, areas undergoing rapid change, and other underrepresented or marginalized populations.
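The three sample-free steps (generate, correct, group) can be sketched as follows. This is an illustrative simplification, not any published algorithm: individuals are assumed to be `(role, sex)` tuples, the joint distribution is assumed to be given as target counts per attribute combination, and the grouping rule (one adult per household, children dealt out round-robin) is hypothetical, as are all function names.

```python
import random
from collections import Counter

def generate_individuals(target_counts, rng):
    """Draw every individual's full attribute combination from the joint distribution."""
    combos = list(target_counts)
    weights = [target_counts[c] for c in combos]
    # One draw per person in the target population.
    return rng.choices(combos, weights=weights, k=sum(weights))

def fix_inconsistencies(people, target_counts):
    """Shift attribute values so the counts match the joint distribution exactly."""
    people = list(people)
    counts = Counter(people)
    # How many people each over-represented combination must give up.
    excess = {c: counts[c] - t for c, t in target_counts.items() if counts[c] > t}
    movable = []
    for i, combo in enumerate(people):
        if excess.get(combo, 0) > 0:
            movable.append(i)
            excess[combo] -= 1
    # Reassign those people to under-represented combinations.
    for combo, target in target_counts.items():
        for _ in range(target - counts[combo]):  # empty range unless under-represented
            people[movable.pop()] = combo
    return people

def group_into_households(people, rng):
    """Hypothetical rule: one adult per household, children assigned round-robin."""
    adults = [p for p in people if p[0] == "adult"]
    children = [p for p in people if p[0] == "child"]
    rng.shuffle(adults)
    rng.shuffle(children)
    households = [[adult] for adult in adults]
    for i, child in enumerate(children):
        households[i % len(households)].append(child)
    return households
```

Random draws rarely reproduce the targets exactly, which is why the correction step is needed before grouping; after `fix_inconsistencies`, the counts per combination equal the target counts.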
Populations and Generator Tools:
Several population generation tools are currently available, either with existing synthetic populations or for use with new population data.
Uses Iterative Proportional Updating (IPU), which, unlike IPF, controls for attributes of both the agents and the agent groupings (for example, persons and households) at the same time. It is used to create realistic human populations for predicting anatomical, physiological, and phase 1 metabolic variation in response to exposure or dosage.
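The core IPU weighting loop can be sketched in a few lines. This is a minimal illustration assuming NumPy and a frequency matrix in which row `i` is a household and column `j` counts how many units of household type or person type `j` that household contains; the function name, tolerance, and stopping rule are assumptions, not any tool's API.

```python
import numpy as np

def ipu(freq, constraints, n_iter=500, tol=1e-9):
    """Iterative Proportional Updating of household weights (sketch).

    freq[i, j]     -- count of household/person type j in household i
    constraints[j] -- target total for type j
    """
    freq = np.asarray(freq, dtype=float)
    constraints = np.asarray(constraints, dtype=float)
    weights = np.ones(freq.shape[0])
    for _ in range(n_iter):
        for j in range(freq.shape[1]):
            column = freq[:, j]
            weighted_sum = column @ weights
            # Rescale only the households that contain this household/person type.
            mask = column > 0
            weights[mask] *= constraints[j] / weighted_sum
        # Mean relative gap between weighted totals and targets, across all types.
        gap = np.abs(freq.T @ weights - constraints) / constraints
        if gap.mean() < tol:
            break
    return weights
```

Because each household weight is applied to every person in that household, one set of weights matches household-level and person-level totals together. For example, with household types A and B and person types adult and child, households `[1, 0, 1, 0]`, `[0, 1, 1, 1]` and `[0, 1, 2, 0]` with targets `[5, 7, 16, 3]` converge to weights of about `[5, 3, 4]`.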
PopGen for SimTRAVEL
Adds person-level attributes to the census data distributions used in population synthesis. The resulting populations are designed for urban planning and for analysis of transportation, routes, activities, vehicles, emissions, and land use. Arizona State University (ASU) is integrating it into UrbanSim.
Synthetic Populations and Ecosystems of the World (SPEW)
Provides synthetic populations and ecosystems for over 80 countries, generated by simple random sampling from the American Community Survey (ACS), the Integrated Public Use Microdata Series (IPUMS), and other population samples. Its developers at Carnegie Mellon University (CMU) plan to add moment matching and iterative proportional fitting in future versions. Moment matching is a statistical technique that estimates distribution parameters by setting the model's theoretical moments (such as the expected mean and variance) equal to the corresponding moments observed in the data.
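As a small worked example of moment matching (not SPEW code), the sketch below fits a gamma distribution, assumed here purely for illustration as a model for a positive attribute such as household income, by equating its mean and variance with the sample's. The function name is hypothetical.

```python
import statistics

def gamma_method_of_moments(sample):
    """Estimate a gamma distribution's shape k and scale theta by moment matching."""
    mean = statistics.fmean(sample)
    # Second central moment of the data (population form, as in the method of moments).
    var = statistics.pvariance(sample, mu=mean)
    # Gamma moments: E[X] = k * theta and Var[X] = k * theta**2; solve for k and theta.
    theta = var / mean
    k = mean / theta
    return k, theta
```

For the sample `[2, 4, 6]` the mean is 4 and the variance 8/3, so matching moments gives θ = 2/3 and k = 6.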
Virginia Bioinformatics Institute Synthetic Data
Synthetic populations of Portland, Oregon; Montgomery, Virginia; West Africa; and Washington, D.C., which have been applied to studies of infectious disease, incarceration rates, and emergency management.
RTI U.S. Synthetic Household Population
Provides a representation of households and persons in the U.S. population derived from publicly available data sources. The households are placed on a map, capturing variations in distribution within census blocks. The data represent demographic characteristics of the population, including age, gender, race, income, and educational attainment. The map and underlying data are available online for free and can be used to track infectious disease, study transportation networks, or optimize supply chains.
The data for SPEW, the RTI U.S. Synthetic Household Population, and other populations from the Models of Infectious Disease Agent Study (MIDAS) can also be found here: