Regional employment statistics play a vital role in deciding how governments allocate their resources. The more people work in a given area, the more funding that region’s government is likely to receive in order to support essential public services.
Officials in Arlington County began to see a downward trend in their transportation funding due to declining employment numbers, a shift they suspected had more to do with gaps in governmental data than with an actual downturn in their working population. Given its proximity to Washington, DC, the county hosts a wide variety of federal civilian, contract, and military employees. These populations are often required to work across multiple job sites, which makes accurate statistical tracking a major challenge.
The standard formula used by the Metropolitan Washington Council of Governments for calculating local employment, outlined above, was initially proposed in the "10/15/15 Memo on Suggested Approach for Preparing Baseline Employment Estimates."
To fill in these grey areas in federal records, Arlington County demographers commissioned SDAL to develop new, more accurate means of estimating regional employment using external data sources as well as proxies for employment.
Our SDAL team approached Arlington County’s employment data issue from three distinct angles:
- Supplementing Existing Data Sources: researchers worked to construct a comprehensive data set of federal work sites by merging alternative sources such as GSA federal leasing data, Base Realignment and Closure reports, commercial employment statistics, and demographers’ institutional knowledge.
- Geocoding and Address Verification: the team also documented errors in the geocoding of worksite data, compiling a list of recommendations for collecting accurate address information moving forward.
- Developing Surrogates for Employment Data: researchers performed an exploratory study to assess whether measurements at the building level, such as water, sewer, and cell phone usage, could be used in a model to estimate building occupancy as a surrogate for employment.
In fulfilling its first two objectives, the SDAL team has created a cleaner, more comprehensive data set that can be checked against standard federal records on a quarterly basis, allowing Arlington County officials to validate current employment numbers and quantify any bias that may exist in the system.
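One simple way to express the bias check described above is as a signed relative difference between two employment counts for the same work site. The sketch below is illustrative only: the site names and counts are hypothetical, and it assumes the supplemented data set serves as the reference against which the standard federal records are compared. The source does not specify how the county computes bias.

```python
def relative_bias(reference_count, reported_count):
    """Signed relative difference of a reported count versus a reference count.

    Negative values mean the reported figure undercounts the reference.
    """
    if reference_count == 0:
        raise ValueError("reference_count must be non-zero")
    return (reported_count - reference_count) / reference_count


# Hypothetical counts for illustration only: (reference, reported) per site.
# "reference" is assumed to come from the supplemented data set and
# "reported" from the standard federal records for the same quarter.
sites = {
    "site_a": (1200, 1000),
    "site_b": (800, 820),
}

# Bias per work site.
per_site_bias = {
    name: relative_bias(ref, rep) for name, (ref, rep) in sites.items()
}

# Bias of the county-wide total across all listed sites.
total_reference = sum(ref for ref, _ in sites.values())
total_reported = sum(rep for _, rep in sites.values())
overall_bias = relative_bias(total_reference, total_reported)
```

Running the same comparison each quarter would show whether the gap between the two sources is stable or drifting over time.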
Researchers found that surrogate measures of office building occupancy, such as water usage, were more accurate when paired with information about how the building's interior is divided into commercial areas (e.g., retail tenants such as restaurants and hair salons) and when they accounted for baseline water usage from cooling, office kitchens, landscaping, visitors to the building, and restrooms.
As proof of concept, occupancy estimates for two office-only commercial buildings that did not house federal employees were modeled using a simple linear regression model with cooling degree days as the predictor variable and monthly water consumption over a two-year period as the response (dependent) variable. Occupancy estimates came within 10% of the number of employees reported in the federal administration data. The next step is to refine the model by taking into account additional independent variables and by testing it on more diverse commercial space.
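The proof-of-concept regression described above can be sketched as follows: monthly water consumption is regressed on cooling degree days, and the fitted intercept is read as the water use that does not depend on cooling. The source does not say how the team converted the fit into an occupancy figure, so the conversion here (dividing the intercept by a per-person monthly water use) is an assumption made for illustration, not the SDAL method. The function names, the `gallons_per_person_month` parameter, and all data values are hypothetical and synthetic.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def estimate_occupancy(cooling_degree_days, water_gallons, gallons_per_person_month):
    """Estimate occupants from the non-cooling share of monthly water use.

    Assumption (not from the source): the regression intercept is water
    used by occupants, and each occupant uses a fixed amount per month.
    """
    intercept, _slope = fit_simple_ols(cooling_degree_days, water_gallons)
    return intercept / gallons_per_person_month


# Synthetic two-year series (24 months), constructed to be exactly linear
# so the arithmetic is easy to follow; real meter data would be noisy.
cdd = [0, 0, 10, 60, 180, 330, 420, 390, 240, 70, 5, 0] * 2
water = [30000 + 50 * d for d in cdd]

occupancy = estimate_occupancy(cdd, water, gallons_per_person_month=300.0)
```

With real data, an estimate like `occupancy` would be compared against a reported head count, and other baseline uses such as landscaping or retail tenants would need their own terms in the model.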