Census Bureau Economists to Present at Upcoming Allied Social Science Associations and American Economic Association Meeting

By Randy Becker, Center for Economic Studies

U.S. Census Bureau economists will present results from their research at the annual meeting of the Allied Social Science Associations (ASSA) and American Economic Association (AEA) in Chicago Jan. 6-8, 2017. This meeting brings together more than 11,000 economists and scholars in related fields from around the world and showcases ongoing research in economics. Census Bureau economists will also serve as discussants of related papers in their fields of expertise, act as panelists and recruit doctoral candidates interested in careers at the Census Bureau.

This year, the ASSA/AEA meeting includes 18 papers with Census Bureau co-authors showcasing recent findings on the following diverse range of topics:

Labor market outcomes: Wages, benefits and employment continue to be a major area of research at the Census Bureau. We will present papers examining the poor labor market outcomes experienced by the long-term unemployed (Abraham, Haltiwanger, Sandusky and Spletzer); the recent earnings growth of job stayers, job switchers, and those transitioning to/from nonemployment (Hahn, Hyatt and Janicki); and the important role firm characteristics play in earnings inequality (Spletzer and Haltiwanger). Another paper looks at the quality of income data from household surveys for the population age 65 and over, and its impact on poverty measurement (Bee and Mitchell).

Other papers examine the labor market effects of institutional changes, such as:

  • The impact of occupational licensing on wages, benefits, and employment (Gittleman, Klee and Kleiner);
  • The effect of a temporary increase in Medicaid reimbursement rates on nurse practitioner labor supply (Udalova);
  • Employer-sponsored 401(k) plans and the impact of changes to auto-enrollment rules on participation rates (Gideon and Mitchell).

These papers use a variety of data sources, including linked employer-employee data from the Longitudinal Employer-Household Dynamics (LEHD) program, the Current Population Survey (CPS), the Survey of Income and Program Participation (SIPP), the American Community Survey (ACS) and various administrative records sources.

Business cycle and dynamics: Other papers to be presented focus on the business cycle and changing business dynamism.

  • One paper examines whether job-to-job moves of workers contribute to the cyclicality of employment growth at firms of different sizes and wages, with particular attention to the Great Recession (Haltiwanger, Hyatt and McEntarfer).
  • Another paper demonstrates that cyclical fluctuations in part-time work come from changes in the transition rates between full- and part-time employment (within-job changes in hours) rather than between part-time work and unemployment (Warren).
  • Whether the changing pace of business dynamism is due to changes in the volatility of productivity shocks, or in the response of businesses to those shocks, is the focus of another paper (Decker, Haltiwanger, Jarmin and Miranda).
  • The diffusion of the Universal Product Code and its links to productivity growth and the reorganization of retail supply chains will be the subject of another presentation (Basker and Simcoe).

Research and development (R&D) and economic growth: A session on “using data science to examine the link between university R&D and innovation” will feature a few papers by Census Bureau authors.

  • One paper investigates the relationship between federal funding of university-based R&D and entrepreneurship activity and success (Jarmin, Zolas, Goldschlag and Lane).
  • Another paper examines the effect of federal funding of research on the outcomes of underrepresented students, including their propensity to start a business and placement in R&D-performing, high-tech firms (Buffington, Harris, Feng and Weinberg).
  • The employment of recent science, technology, engineering and math (STEM) doctorates and post-doctorates in firms of various types (e.g., research intensive, startups vs. established, high- vs. low-productivity, local vs. out-of-state), and their wage outcomes, is the focus of another paper (Barth, Davis, Marschke, Wang and Zhou).
  • A related paper examines R&D spillovers from geographic proximity to other R&D-performing firms and the mobility of scientists and engineers (Barth, Davis, Freeman, Marschke and Wang).

Spatial issues: Other papers also look at spatial issues.

  • One paper examines the role of parent sorting in the spatial variation in intergenerational mobility (Rothbaum).
  • Another paper uses data from the SIPP and ACS to create wealth estimates for smaller geographies and smaller populations than were previously available (Chenevert, Gottschalck, Klee and Zhang).
  • Using matched employer-employee data from the LEHD program, another paper examines the suitability of Commuting Zone definitions and proposes a data-driven method of defining local labor markets (Foote, Kutzbach and Vilhuber).

More: In addition to these and other papers by Census Bureau co-authors, there will be presentations of research papers based on Census Bureau microdata, written by researchers using the Federal Statistical Research Data Center (FSRDC) network.

Economists at the Census Bureau, and our collaborators in the FSRDCs, play a key role in creating and improving statistical products that are essential to policymakers, researchers and the public. These products come from a variety of sources, such as survey microdata on businesses and households, linked employer-employee data, and confidential microdata from federal and state administrative and statistical agencies. Our economists apply these data to the study of income and labor dynamics, industrial organization, household structure, health and disability, international trade and other topics.

For further details on the papers to be presented at the ASSA/AEA meeting, including a preliminary program with abstracts, see <www.aeaweb.org/conference/2017/preliminary>. Also see <www.census.gov/research/conferences/assa/2017.html> for additional information about the authors and presentations.

For more information on working papers by Census Bureau researchers and FSRDC researchers, see <www.census.gov/research/working_papers/>.

For presentations by Census Bureau researchers at previous ASSA meetings, and at other major professional meetings, see <www.census.gov/research/conferences/>.


How Much Do Startups Impact Employment Growth in the U.S.?

Written by: Jim Lawrence, Economy-Wide Statistics Division

The U.S. Census Bureau releases data every year describing changes among businesses operating in the United States. The accompanying infographic shows the number of jobs created by startups annually between 2004 and 2014, as well as job creation from startups as a percentage of total U.S. employment for 2006 and 2014. Figure 1 shows job creation from startups as a percentage of total U.S. employment for each year between 2004 and 2014.

Figure 1: Job Creation From Startups as a Percentage of Total U.S. Employment From 2004 to 2014


The percentages in Figure 1 might not fully demonstrate the role of startups in overall employment growth in the United States. This is in part because these percentages compare job creation, which is a ‘flow’ measure, to the employment level, a ‘stock’ measure. A flow measure such as annual job creation captures change over the course of a year, and each year’s net flows are added to the level of employment. The stock of employment in the United States in any given year, therefore, represents the accumulation of net employment flows for each of the years preceding it.
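The stock-flow relationship can be made concrete with a minimal Python sketch. The starting level and yearly net flows below are hypothetical round numbers chosen for illustration, not Business Dynamics Statistics data; the point is only that each year’s stock is the previous stock plus that year’s net flow.

```python
from itertools import accumulate


def employment_levels(initial_level: int, net_flows: list[int]) -> list[int]:
    """Return the employment stock at the end of each year.

    Each year's stock equals the previous year's stock plus that year's
    net flow (job creation minus job destruction), so every stock is the
    starting level plus all net flows up to that year.
    """
    # accumulate(..., initial=x) yields x, x + f1, x + f1 + f2, ...;
    # drop the first element so only end-of-year levels remain.
    return list(accumulate(net_flows, initial=initial_level))[1:]


# Hypothetical example: a starting stock of 100 jobs and three years of net flows.
print(employment_levels(100, [5, -2, 3]))  # [105, 103, 106]
```

A single year’s flow, such as the 5 above, looks small next to a stock of 100 even though it accounts for the entire change in that year, which is why a flow-to-stock ratio can understate the role of startups.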

An alternative way to assess the impact of job creation from startups is to compare it to another flow measure. One option would be to compare it to gross job creation, defined as the total number of jobs created in a given year by all establishments combined, including establishments that constitute startups. We define an establishment as a single physical location at which business is conducted or services or industrial operations are performed. It is not necessarily identical to a company or enterprise, which may consist of one or more establishments.

The converse of gross job creation is gross job destruction, which is defined as the total number of jobs destroyed in a given year by all establishments combined. By definition, establishments that constitute startups do not destroy jobs. The difference between gross job creation and gross job destruction, known as net job creation, offers yet another benchmark for job creation in a given year. For example, according to the 2014 Business Dynamics Statistics, gross job creation in the United States in 2014 was 16.0 million and gross job destruction was 13.3 million. The difference, 2.7 million, represents net job creation for the year.

As to whether job creation from startups should be compared to gross or net job creation, an argument can be made for gross job creation. Startups only create jobs and do not destroy them, so they do not have a net job creation number associated with them in the way that nonstartup firms do. In this sense, job creation from startups is a gross number and should therefore be compared to gross job creation rather than net job creation.
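The two benchmarks can be written as a small Python sketch. The 2014 gross job creation and destruction figures are the ones quoted above, entered in thousands of jobs so the arithmetic stays in integers; the function names are illustrative, not part of any Census Bureau tool.

```python
def net_job_creation(gross_creation: int, gross_destruction: int) -> int:
    """Net job creation: gross job creation minus gross job destruction."""
    return gross_creation - gross_destruction


def share_percent(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    if whole == 0:
        raise ValueError("whole must be nonzero")
    return 100.0 * part / whole


# 2014 Business Dynamics Statistics figures quoted in the text, in thousands of jobs.
GROSS_CREATION_2014 = 16_000
GROSS_DESTRUCTION_2014 = 13_300

print(net_job_creation(GROSS_CREATION_2014, GROSS_DESTRUCTION_2014))  # 2700, i.e., 2.7 million
```

Passing a startup job-creation count to `share_percent` with gross job creation as `whole` gives the kind of percentage plotted in Figure 2; passing total employment instead gives the flow-to-stock percentage plotted in Figure 1.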

Figure 2 shows job creation from startups as a percentage of gross job creation for 2004 to 2014.

Figure 2: Job Creation From Startups as a Percentage of Gross Job Creation From 2004 to 2014


The percentages in Figure 2 range from a high of 19.0 percent in 2005 to a low of 13.9 percent in 2012. These percentages are much higher than those in Figure 1, which range from a high of 3.0 percent in 2006 to a low of 1.9 percent in 2013. The considerably higher percentages in Figure 2 demonstrate that job creation from startups, viewed in the context of gross job creation, makes a larger contribution to employment growth in the United States than Figure 1 implies.

If you want to learn more about employment growth, job creation and other statistics on business in the United States, visit the Business Dynamics Statistics website.


Reaching the Foreign-Born: An Examination of Mode of Response by Nativity in the American Community Survey

Written by: Thomas Gryn, Foreign-Born Population Branch, Population Division

The American Community Survey is collected using a variety of response modes. From 2005 to 2012, the survey was administered by mail, telephone (Computer Assisted Telephone Interview [CATI]), or in person (Computer Assisted Personal Interview [CAPI]). In 2013, the American Community Survey added an internet response option. This entry will summarize how offering the internet option affected response patterns of the foreign-born population.

American Community Survey response modes vary across the population, especially for subgroups such as the foreign-born population. A “foreign-born” person is anyone who was not a U.S. citizen at birth, including those who became U.S. citizens through naturalization. A “native” is anyone who was a U.S. citizen at birth, including those born in the United States, as well as those born in Puerto Rico, U.S. Island Areas (the U.S. Virgin Islands, Guam, American Samoa, and the Commonwealth of the Northern Mariana Islands) or born abroad to a U.S. citizen parent or parents.

Typically, native respondents to the American Community Survey have been more likely to respond by mail than by telephone or in person, while foreign-born respondents have relied more heavily on interviewer-administered modes, especially in-person interviews. Figure 1 shows that in 2012, 59 percent of natives responded by mail, 8 percent by telephone, and 33 percent by in-person interview. In contrast, 42 percent of the foreign-born responded by mail, 8 percent by telephone, and 50 percent by in-person interview. Their lower rate of response to initial mail contacts is one reason why the foreign-born are sometimes referred to as a “hard-to-reach” population.










In 2013, natives were more likely than the foreign-born to respond by internet. For both the native and foreign-born population, introduction of the internet mode appears to have reduced the percentage responding by mail, while the telephone and in-person response rates appear mostly unchanged.

Looking only at foreign-born householders in Figure 2, naturalized citizens were more likely than noncitizens to respond by mail or internet. For both naturalized citizens and noncitizens, introduction of the internet mode appears to have reduced the percentage responding by mail, while in-person responses appear mostly unchanged.


Additional research indicated that response patterns by age and sex were similar for natives and the foreign-born. Among householders below the poverty line, the foreign-born were more likely to use in-person modes and less likely to respond by mail or internet than natives. Finally, among foreign-born householders, higher English-language ability corresponded with greater internet response and lower use of in-person modes.

In conclusion, the foreign-born respond less often by mail and internet and more often by in-person interview than the native-born. For both the native and foreign-born population, introduction of the internet mode appears to have reduced mail response, while in-person and telephone responses appear mostly unchanged. This suggests that respondents who previously responded by mail are now using the internet option. “Hard-to-reach” populations such as the foreign-born that respond more by telephone and through in-person interviews will likely continue to respond at high rates using those modes, even with the introduction of the internet mode of response.


Census Bureau Awards Cooperative Agreements to Georgetown University and Purdue University

Written by John Abowd, associate director, Research and Methodology Directorate

Today, the U.S. Census Bureau awarded two cooperative agreements to research teams at Georgetown University and Purdue University. These teams of university-based researchers are at the forefront of the emerging field of privacy-preserving data analysis, and their efforts will assist the Census Bureau in ensuring we continue to be a leader in protecting confidential information.

The Georgetown University project will help develop methods for publishing data that satisfy both formal mathematical privacy requirements and legal standards for privacy protection. Their research, combined with ongoing research at the Census Bureau, will provide improvements to existing methods that protect privacy by avoiding the release of any information that would identify an individual or business in public statistics.

The projects complement new initiatives within the Census Bureau to strengthen our disclosure avoidance methods, especially as they apply to the detailed publications that result from our flagship products: the 2020 Census, the American Community Survey, and the 2017 Economic Census.

The team from Georgetown University, led by Kobbi Nissim, includes two of the computer scientists who originally developed the theory of differential privacy — the first privacy-preserving data analysis model — as well as leading researchers from Harvard University who specialize in cryptography and information law. Their work for the Census Bureau will help improve the way we understand and implement our statutory mandate to protect the confidentiality of all respondent information in the Big Data era.
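For readers unfamiliar with differential privacy, the textbook example is the Laplace mechanism for releasing a count, sketched below in Python. This is a generic illustration, not the method the Georgetown team or the Census Bureau will adopt; the count and the privacy-loss parameter `epsilon` are hypothetical. Because adding or removing one person changes a count by at most 1, adding Laplace noise with scale 1/epsilon satisfies epsilon-differential privacy; a smaller epsilon means more noise and stronger protection.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` is Laplace-distributed, so only the standard library is needed.
    random.expovariate takes a rate, and rate = 1 / scale gives mean `scale`.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    A count has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical release of a count of 120 with epsilon = 0.5 (seeded for repeatability).
rng = random.Random(2017)
print(private_count(true_count=120, epsilon=0.5, rng=rng))
```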

The Purdue University project will investigate methods to improve the usefulness of anonymized data by studying systems where automated techniques perform many of the tasks currently performed directly by data analysts preparing the publication products. These privacy-preserving automated techniques have the potential to produce high-quality publishable data without compromising the privacy of respondents, even inside the Census Bureau. This research is complementary to our ongoing research on methods that strengthen traditional disclosure avoidance techniques.

Chris Clifton, a computer scientist with an extensive research record in data anonymization, leads the team from Purdue University. He is a past program director in the National Science Foundation’s Directorate for Computer and Information Science and Engineering. The team’s research is expected to help the Census Bureau better understand how to preserve the suitability of our data products for their many uses once we adopt modern privacy-preserving anonymization methods better adapted to the Big Data era.

Both awards are three-year collaborative efforts that will give us time to research, test and further refine innovative methods that strengthen the confidentiality protection mandated by Title 13 of the U.S. Code.

The Census Bureau’s mission is to serve as the leading source of quality data about the nation’s people and economy. We honor privacy, protect confidentiality, share our expertise globally, and conduct our work openly. These new cooperative agreements provide complementary approaches to innovative methods and procedures for executing the dual statutory mandates in Title 13 U.S.C. — collect data in order to publish statistics and maintain the confidentiality of respondent information.

Moving forward, the Census Bureau intends to use its Cooperative Agreement Authority to enter into partnerships with leading experts in order to produce innovative work and to ensure that we remain the leading source of quality data about the nation’s people and economy. We will use this important tool to engage with leading experts in academia, the research community and nonprofit organizations. Our goal is to find the best sources of data, the best methods to analyze these data, and the best tools to provide data to the public.


Research on Plant Dynamics in the Manufacturing Sector

By Lucia Foster and Scott Ohlmacher, Center for Economic Studies

Why do some manufacturing plants grow and thrive while others falter? Economists using U.S. Census Bureau plant-level microdata have approached this complex question from at least three different angles. First, they looked at microeconomic patterns at the plant level. Second, they examined the growth, survival and exit of manufacturing plants throughout the business cycle. Finally, they documented the long-term, secular trends in manufacturing.

Each of these is the subject of recent papers by researchers using Census Bureau microdata from the Annual Survey of Manufactures (ASM), the Census of Manufactures (CM), and the Longitudinal Business Database (LBD). While this blog post highlights these papers, an extensive literature on this subject using Census Bureau microdata exists (many of these papers can be found in the CES Working Paper series).

Turning first to the microeconomic patterns, the growth, survival and exit of manufacturing plants depend upon their profitability. Profitability is influenced by many factors internal and external to the plant. One important component of this profitability is the plant’s productivity. Empirical evidence using Census Bureau plant-level data reveals differences in productivity across manufacturing plants even within the same narrowly defined industries. Differences in location and production technology are two possible reasons productivity varies across manufacturing plants, even within the same industry.

The Census Bureau collects information on plant characteristics through the ASM and the CM. However, even after controlling for relevant characteristics, important differences in productivity remain. In a review of the productivity literature, Syverson (2011) notes that, within narrowly defined U.S. manufacturing industries, a plant at the 90th percentile of the productivity distribution produces almost twice as much output from the same measured inputs as a plant at the 10th percentile. In discussing possible reasons for these differences, Syverson comments that managers have long been thought to be an important factor, but without data, their importance has been speculative.
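The “almost twice as much output” comparison is a ratio of productivity at two points of the distribution. A minimal Python sketch, assuming one already has plant-level log total factor productivity (TFP) estimates for a single industry (the estimation step itself is not shown, and any inputs are hypothetical), is:

```python
import math
import statistics


def productivity_ratio_90_10(log_tfp: list[float]) -> float:
    """Ratio of 90th- to 10th-percentile TFP, given plants' log TFP values.

    With the same measured inputs, a plant's output is proportional to its
    TFP, so this ratio says how many times more output the 90th-percentile
    plant produces than the 10th-percentile plant.
    """
    # n=10 returns the nine decile cut points (10th through 90th percentiles).
    deciles = statistics.quantiles(log_tfp, n=10)
    # Differencing in logs and exponentiating gives the ratio in levels.
    return math.exp(deciles[-1] - deciles[0])
```

For example, a 90th-10th percentile gap of about 0.65 in log TFP corresponds to a ratio of exp(0.65), or roughly 1.9, which is what “almost twice as much output” means in practice.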

The Management and Organizational Practices Survey (MOPS), a supplement to the ASM, is intended to partly fill the data gap by collecting information on management practices. Evidence from the MOPS suggests that management practices are correlated with productivity in manufacturing plants. Bloom et al. (2013) find that “structured” management practices related to monitoring, targeting and incentives are tightly linked to better performance (including higher productivity). These “structured” practices include monitoring a large number of high-frequency key performance indicators (KPIs), setting realistic production targets, making sure that all levels of the organization at the plant are aware of KPIs and targets, and setting bonus, promotion and dismissal incentives based on those targets. While structured management practices are associated with positive outcomes, many plants do not adopt these practices. Researchers are now looking into why management practices differ across plants even within the same firm.

Brynjolfsson and McElheran (2016) find that the adoption of intensive data-driven decision-making and an increased allocation of decision-making to front-line production workers (versus manager-centric decision-making) are associated with large gains in productivity for plants in industries that are generally capital intensive and use “continuous-flow” operations.

The research above focuses on the supply side of profitability, but the demand side is also important. The challenge here is that the Census Bureau does not collect microlevel information on prices. However, researchers have been able to create proxies for the demand side in a limited sample of manufacturing plants for which the Census Bureau collects both revenue and physical output. Using this sample, Foster, Haltiwanger and Syverson (2016) show that much of the growth of plants is dependent on the demand side. They find that new manufacturing plants have higher physical productivity and lower revenue productivity compared to their more mature counterparts, reflecting that new plants set prices low in order to build up their market and grow.
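The distinction between physical and revenue productivity can be stated compactly. Following the standard convention in this literature (the notation here is ours), revenue productivity for plant i, TFPR, equals the plant’s price relative to its industry, P, times its physical productivity, TFPQ:

```latex
\mathrm{TFPR}_i = P_i \cdot \mathrm{TFPQ}_i
\quad\Longleftrightarrow\quad
\ln \mathrm{TFPR}_i = \ln P_i + \ln \mathrm{TFPQ}_i
```

A new plant with high TFPQ that sets a low price (a low ln P) can therefore show lower revenue productivity than an established plant, which is the pattern described above.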

In terms of business cycles, Foster, Grim and Haltiwanger (2016) examine the growth, survival and exit dynamics of manufacturing plants during recent cycles. Regardless of overall economic conditions, more productive plants generally grow and thrive, while less productive plants shrink and exit. In most recent downturns, this reallocation from less productive to more productive plants accelerates. However, they find that in the Great Recession, this reallocation of economic activity from less productive to more productive plants weakened relative to other downturns. Because this weakening was especially pronounced for young plants, they hypothesize that credit constraints impacted the reallocation. Researchers are also using the MOPS to examine the management and organizational characteristics of manufacturing plants that are better able to weather business cycles.

Finally, researchers have used Census Bureau microdata to better understand long-term trends in manufacturing. Plant-level data are critical to understanding these trends because industry classification schemes have changed over time. Without controlling for these changes, it is unclear what is due to changes in underlying economic activity at plants and what is due to changes in classification.

Pierce and Schott (2016) focus on the decline in U.S. manufacturing employment from 2000 to 2007 and examine its link to China’s accession to the World Trade Organization (WTO) in 2001. Using the LBD and the CM, they examine the response of manufacturing plants while controlling for changes in classification. In addition to changes at the extensive margin (including within-firm relocation of production outside the United States), they find evidence of capital deepening among U.S. manufacturing plants that continued in operation during this period.

One topic that figures in Pierce and Schott’s work is the impact of uncertainty on manufacturing plants’ decisions. They cite anecdotal evidence that uncertainty concerning China’s trade status leading up to its accession to the WTO affected manufacturing plants’ planning decisions. The second wave of the MOPS, which is currently in collection, includes a section on uncertainty. We look forward to research using the MOPS that will enable us to better understand the impact of uncertainty on manufacturing plants’ growth, exit and survival.


Challenges Facing the Disclosure Review Board

Written by: William Wisniewski, Center for Disclosure Avoidance Research

At the U.S. Census Bureau, the Disclosure Review Board is best known as the team that establishes and reviews official Census Bureau disclosure avoidance policies, which ensure that publicly released data products do not reveal confidential information about survey respondents. Yet the board’s members also serve other important and lesser known roles. For example, they work with researchers in the Center for Disclosure Avoidance Research to determine the effectiveness of current disclosure avoidance techniques in protecting data products. In addition, these researchers study and develop new techniques that may be applied to future releases of data products.

This work is critical in meeting the requirements of Title 13 and Title 26 of the U.S. Code, under which the Census Bureau must protect the confidentiality of individual respondents when it releases data to the public.

This seemingly simple mission can often pose challenges. For example, what happens if a researcher wants to release counts and demographic characteristics of individuals in every county in the United States? What if a researcher wants to release an unlimited number of variables in a Public Use File? What should a researcher do if they encounter small cell sizes within their data product?

These types of questions and others, along with their solutions, will be presented in a topic-contributed session titled “Innovations in Disclosure Avoidance at the U.S. Census Bureau” at the 2016 Joint Statistical Meetings on Wednesday, August 3, 2016. We will explain specific issues and walk through some of the methods and techniques used to ensure that the Disclosure Review Board meets its mission: to support the Data Stewardship Executive Policy Committee in its efforts to ensure that the Census Bureau protects the confidentiality of all Title 13 and Title 26 respondent information in publicly released data products.
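One common answer to the small-cell question is cell suppression. The Python sketch below is a simplified illustration with a hypothetical threshold and table; it does not represent the Disclosure Review Board’s actual rules. Cells below the threshold are withheld (primary suppression), and when only one cell in a row with a published total is withheld, a second cell is also withheld so the first cannot be recovered by subtraction (complementary suppression).

```python
from typing import Optional


def suppress_small_cells(counts: dict[str, int], threshold: int) -> dict[str, Optional[int]]:
    """Withhold small cells in one row of a table whose row total is published.

    Returns a copy of `counts` in which withheld cells are replaced by None.
    """
    # Primary suppression: withhold every cell below the threshold.
    released: dict[str, Optional[int]] = {
        cell: (n if n >= threshold else None) for cell, n in counts.items()
    }
    withheld = [cell for cell, value in released.items() if value is None]

    # Complementary suppression: a lone withheld cell could be recovered as
    # (row total - visible cells), so also withhold the smallest visible cell.
    if len(withheld) == 1:
        visible = [cell for cell, value in released.items() if value is not None]
        if visible:
            smallest = min(visible, key=lambda cell: counts[cell])
            released[smallest] = None
    return released


# Hypothetical row of counts with a threshold of 5.
print(suppress_small_cells({"A": 40, "B": 2, "C": 15}, threshold=5))
# {'A': 40, 'B': None, 'C': None}
```

Real tables are harder: a cell usually contributes to several row and column totals at once, so choosing complementary cells becomes an optimization problem rather than a one-row rule.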

Looking to the future, the Disclosure Review Board will continue to face new challenges. Census Bureau researchers and others will likely need to develop, test, and apply new methodologies and techniques to Census Bureau data, particularly as the quantity of potentially linkable data outside of the Census Bureau increases.



Evaluating Possible Administrative Records Uses for the Decennial Census

Written by: Andrew Keller and Scott Konicki

When a household does not respond to the census, the U.S. Census Bureau must send a field worker to that address to complete a nonresponse follow-up interview. For the 2010 Census, 72 percent of American households mailed back a completed census form. The remaining 28 percent, which did not respond by mail, were counted by a census taker who visited their address. In-person interviews are much more costly than getting a response back in the mail. For the 2020 Census, the Census Bureau is researching the possible use of administrative records to provide a status and count for some addresses in the nonresponse follow-up universe; that is, to indicate whether the housing unit is likely to be occupied or vacant, and how many people may live in it. As outlined below, this information would help reduce the number of contacts during the nonresponse follow-up operation.

Over the last four years, the Census Bureau has tested various methods of using administrative records to reduce the nonresponse follow-up workload. All tests used administrative records modeling with varying levels of complexity. In the tests, the administrative records allow us to split the nonresponse follow-up address universe into three categories: (1) units identified as administrative records occupied, (2) units identified as administrative records vacant, and (3) addresses for which no determination could be made.

The figure below shows the contact strategy for administrative records cases in the nonresponse follow-up operation of the 2016 Census Test. When administrative records indicated that an address was vacant, it received no in-person visits during the nonresponse follow-up operation.


Addresses that the administrative records indicated to be occupied received only one visit in the 2016 Census Test. All units in the nonresponse follow-up address universe, whether the administrative records indicated they were vacant or occupied, did receive an additional postcard by mail during the nonresponse follow-up operation. The postcard told people at these addresses how to self-respond by filling out the questionnaire online or by responding through the questionnaire assistance line. In short, there are several ways before and during nonresponse follow-up that the Census Bureau is attempting to obtain and use self-responses before using administrative records determinations.
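The three-way split and the 2016 Census Test contact rules described above can be summarized in a short Python sketch. The model scores `p_occupied` and `p_vacant` and the 0.9 cutoffs are hypothetical stand-ins for the Census Bureau’s administrative records models, which are not specified here; the visit counts follow the text, and the number of visits for no-determination cases is left unspecified because the text does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ARStatus(Enum):
    OCCUPIED = "administrative records occupied"
    VACANT = "administrative records vacant"
    NO_DETERMINATION = "no determination"


@dataclass(frozen=True)
class ContactPlan:
    reminder_postcard: bool
    max_in_person_visits: Optional[int]  # None: regular follow-up, count not given in the text


def classify_address(p_occupied: float, p_vacant: float,
                     occupied_cutoff: float = 0.9, vacant_cutoff: float = 0.9) -> ARStatus:
    """Place a nonresponse follow-up address in one of the three categories.

    The scores and cutoffs are hypothetical; an address gets a status only
    when exactly one of the two scores clears its cutoff.
    """
    occupied = p_occupied >= occupied_cutoff
    vacant = p_vacant >= vacant_cutoff
    if occupied and not vacant:
        return ARStatus.OCCUPIED
    if vacant and not occupied:
        return ARStatus.VACANT
    return ARStatus.NO_DETERMINATION


def contact_plan(status: ARStatus) -> ContactPlan:
    """Contact rules for the 2016 Census Test as described in the text."""
    # Every address in the follow-up universe received an extra postcard.
    if status is ARStatus.VACANT:
        return ContactPlan(reminder_postcard=True, max_in_person_visits=0)
    if status is ARStatus.OCCUPIED:
        return ContactPlan(reminder_postcard=True, max_in_person_visits=1)
    return ContactPlan(reminder_postcard=True, max_in_person_visits=None)


print(contact_plan(classify_address(p_occupied=0.95, p_vacant=0.02)))
# ContactPlan(reminder_postcard=True, max_in_person_visits=1)
```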

The development of possible administrative records models has been guided by comparing models retrospectively against 2010 Census results. Doing so provides a national evaluation of potential administrative records models. However, one difficulty in evaluating the use of administrative records models is accounting for concerns such as undercounts and erroneous enumerations. Although the analysis using 2010 Census results provides a solid basis for assessing model performance, it is not the only way to measure it.

To learn more about Nonresponse Follow-Up Contact Strategy for Administrative Record Cases, please join us at the Joint Statistical Meetings.


Researching Methods for Scraping Government Tax Revenue From the Web

Written by: Brian Dumbacher, Mathematical Statistician, Economic Statistical Methods Division, and Cavan Capps, Big Data Lead, Associate Directorate for Research and Methodology

The Quarterly Summary of State and Local Government Tax Revenue is a sample survey conducted by the U.S. Census Bureau that collects data on tax revenue collections from state and local governments. Much of the data are publicly available on government websites. In fact, instead of responding via questionnaire, some respondents direct survey analysts to their websites to obtain the data. Going directly to websites for those data can reduce respondent burden and aid data review.

It would be useful to have a tool that automatically collects, or scrapes, relevant data from the web. Developing such a tool can be challenging. There are thousands of government websites but very little standardization in terms of structure and publications. A large majority of government publications are in Portable Document Format (PDF), a file type not easily analyzed. Finally, both web and PDF documents have constantly changing formats.

To solve this problem, researchers at the Census Bureau are studying and applying methods for unstructured data, text analytics and machine learning. These methods belong to the realm of “Big Data.” Big Data refers to large and frequently generated datasets representing a variety of structures. As opposed to designed survey data, Big Data are “found” or “organic” data. Typically, these data are created for one purpose, such as a click log, a social media blog or an online PDF report, but are innovatively repurposed for something else, such as inferring behavior. Because the data were not designed for statistical inference, they often pose unique challenges.

The goal of this research is to develop a web crawler with machine learning that performs three tasks:

  1. Crawls through a government website and discovers all PDFs.
  2. Classifies each PDF according to whether it contains relevant data on tax revenue collections.
  3. Extracts the relevant data, organizes it and stores it in a database.

For task 1, we used the open-source web crawler Apache Nutch. In a production environment, the process will scale up by distributing the work over many computers and then combining the results.
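Apache Nutch handles crawling at scale, and its API is not reproduced here. As a much smaller illustration of what task 1 involves, the Python sketch below does a breadth-first crawl that stays on one host and collects links whose path ends in `.pdf`. The `fetch` callable, the single-host restriction and the `max_pages` limit are assumptions for illustration; a real crawler would also need politeness delays, robots.txt handling and retries.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop "#section" fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl_for_pdfs(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one host; returns PDF URLs in discovery order.

    fetch(url) is a caller-supplied (hypothetical) function that returns the
    page's HTML as a string, or None if the page could not be retrieved.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pdfs = []
    pages_fetched = 0
    while queue and pages_fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages_fetched += 1
        if html is None:
            continue
        parser = LinkParser(url)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            parsed = urlparse(link)
            if parsed.netloc != host:
                continue  # stay on the government site being crawled
            if parsed.path.lower().endswith(".pdf"):
                if link not in pdfs:
                    pdfs.append(link)
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return pdfs
```

Passing `fetch` in as a parameter keeps network access out of the traversal logic, so the crawl can be exercised against a dictionary standing in for a website.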

For task 2, we developed a technique to convert PDF documents to text and reorganize the output. A classification model applied to the converted text then determines whether the document has relevant data on tax revenue collections. This model uses the occurrence of key sequences of words, such as “statistical report” and “sales tax income,” along with other text analysis techniques.
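The post does not describe the model itself. As a hedged sketch of a phrase-occurrence classifier, the Python below normalizes converted PDF text, checks whether each key word sequence occurs, and compares a weighted score with a cutoff. Only “statistical report” and “sales tax income” come from the text; the other phrases, the weights in `PHRASE_WEIGHTS` and `THRESHOLD` are hypothetical placeholders for values a trained model would supply.

```python
import re

# Only the first two phrases come from the post; the rest, and every weight,
# are hypothetical placeholders for values a trained model would supply.
PHRASE_WEIGHTS = {
    "statistical report": 1.5,
    "sales tax income": 2.0,
    "tax revenue": 1.0,
    "collections": 0.5,
}
THRESHOLD = 2.5  # hypothetical decision cutoff


def normalize(text):
    """Lowercase the text and keep only alphanumeric tokens, single-spaced."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def phrase_features(text):
    """1 if a key phrase occurs anywhere in the document, else 0."""
    normalized = normalize(text)
    return {
        phrase: int(re.search(r"\b" + re.escape(phrase) + r"\b", normalized) is not None)
        for phrase in PHRASE_WEIGHTS
    }


def score(text):
    """Weighted sum of the phrase-occurrence features."""
    features = phrase_features(text)
    return sum(PHRASE_WEIGHTS[p] * present for p, present in features.items())


def is_relevant(text, threshold=THRESHOLD):
    return score(text) >= threshold
```

Normalizing to lowercase tokens separated by single spaces lets a phrase match even when PDF-to-text conversion splits it across a line break or surrounds it with punctuation.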

For task 3, we are still considering several approaches. Relevant data would probably be found in tables and near key sequences of words. We will explore table identification methods based on the distribution of terminology in the PDF, as well as additional modeling that maps the nonstandard data in PDFs to standard definitions in Census Bureau publications.
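Because task 3 is still at the idea stage, the following is only an illustration of the proximity idea: scan converted text line by line, and when a line contains a known revenue label, take the first dollar amount on that line or within the next `window` lines. The label-to-category table `STANDARD_CATEGORIES` and its category codes are hypothetical stand-ins for a mapping to standard Census Bureau definitions, and the amount pattern assumes figures written with thousands separators.

```python
import re

# Hypothetical mapping from labels seen in government PDFs to made-up
# standard category codes; the real Census Bureau definitions differ.
STANDARD_CATEGORIES = {
    "general sales tax": "SALES_GENERAL",
    "sales tax": "SALES_GENERAL",
    "individual income tax": "INCOME_INDIVIDUAL",
    "motor fuel tax": "MOTOR_FUEL",
}

# Assumes amounts are written with thousands separators, e.g. $1,234,567.89,
# so that bare years such as 2016 are not mistaken for revenue.
AMOUNT = re.compile(r"\$?\d{1,3}(?:,\d{3})+(?:\.\d+)?")


def extract_amounts(lines, window=1):
    """Return (category, label, amount) for each line with a known label,
    using the first amount on that line or on the next `window` lines."""
    results = []
    for i, line in enumerate(lines):
        lowered = line.lower()
        labels = [label for label in STANDARD_CATEGORIES if label in lowered]
        if not labels:
            continue
        label = max(labels, key=len)  # "general sales tax" beats "sales tax"
        for nearby in lines[i : i + window + 1]:
            match = AMOUNT.search(nearby)
            if match:
                amount = float(match.group().lstrip("$").replace(",", ""))
                results.append((STANDARD_CATEGORIES[label], label, amount))
                break
    return results
```

A proximity rule like this is easily fooled when an amount on a following line belongs to a different label, which is one reason table identification is a natural next step.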

The Census Bureau looks forward to continuing this web scraping research and exploring new machine learning algorithms that reduce respondent burden, speed survey processing and improve data collection.

To learn more about the research methods for scraping government tax revenue from the web, please join us at the Joint Statistical Meetings on August 2, 2016.


Reducing Respondent Burden in Counting Juveniles

Written by: Suzanne Marie Dorinski, Economic Statistical Methods Division

The U.S. Census Bureau conducts the Census of Juveniles in Residential Placement every other year for the Office of Juvenile Justice and Delinquency Prevention. The survey collects data from almost 2,400 public and private facilities that hold juveniles charged with or adjudicated for a delinquency or status offense, and it provides a count of juveniles held in publicly and privately run juvenile correctional facilities.

The data collection has two parts: (1) questions about the facility and (2) questions about each charged or adjudicated juvenile held in the facility.

For each juvenile, we ask the following:

  • Gender.
  • Date of birth.
  • Race.
  • Who placed the juvenile in the facility.
  • Most serious offense.
  • State or territory where offense was committed.
  • Adjudication status.
  • Admission date.

Facilities can respond by mail, online or by fax. Those that respond online can enter the data for each juvenile or upload a data file. For the 2013 collection, we suggested that larger facilities upload a data file but did not define what counted as a larger facility.

Our online data collection tool collects paradata for each response. The paradata file captures the values that the facility enters and any changes it makes, and it records the edit messages the facility sees while reporting. Each action has an associated time stamp, so we can tell how long each facility spends online reporting its data.
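As a sketch of how time stamps turn into time spent online, the Python below groups paradata events by facility, sorts them, and sums the gaps between consecutive actions. The event layout `(facility_id, timestamp, action)` and the 30-minute `IDLE_CUTOFF`, which treats longer gaps as breaks between sessions rather than reporting time, are assumptions; the post does not describe how the paradata file is structured or how breaks are handled.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical: gaps longer than this are treated as a break, not reporting time.
IDLE_CUTOFF = timedelta(minutes=30)


def minutes_online(events, idle_cutoff=IDLE_CUTOFF):
    """events: iterable of (facility_id, ISO-8601 timestamp, action) tuples.

    Returns a dict mapping facility_id to minutes spent in the tool.
    """
    by_facility = defaultdict(list)
    for facility_id, stamp, _action in events:
        by_facility[facility_id].append(datetime.fromisoformat(stamp))
    totals = {}
    for facility_id, stamps in by_facility.items():
        stamps.sort()
        active = timedelta()
        # Sum the gaps between consecutive actions, skipping long idle gaps.
        for earlier, later in zip(stamps, stamps[1:]):
            gap = later - earlier
            if gap <= idle_cutoff:
                active += gap
        totals[facility_id] = active.total_seconds() / 60
    return totals
```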

The graphic below shows that the time spent in the data collection tool increased with the number of juvenile records entered online. To reduce the burden on juvenile facilities, we could include this graphic in the next data collection and suggest that facilities with 50 or more juvenile records upload a data file instead of spending hours entering those records in the tool. This kind of information is essential to making the survey easier to complete for staff at juvenile facilities.

[Figure: Time spent in the online data collection tool by number of juvenile records entered online]
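The data behind the graphic are not published in the post. Assuming per-facility pairs of (records entered, minutes online) were available, an ordinary least squares line is one simple way to summarize how time grows with the number of records, and the 50-record suggestion from the text becomes a one-line rule. Both function names are hypothetical, and the straight-line fit is an illustrative choice, not the author's analysis.

```python
def fit_line(records, minutes):
    """Ordinary least squares fit of minutes = intercept + slope * records."""
    n = len(records)
    mean_x = sum(records) / n
    mean_y = sum(minutes) / n
    sxx = sum((x - mean_x) ** 2 for x in records)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(records, minutes))
    slope = sxy / sxx  # requires at least two distinct record counts
    return mean_y - slope * mean_x, slope


UPLOAD_THRESHOLD = 50  # cutoff suggested in the text


def suggested_mode(n_records, threshold=UPLOAD_THRESHOLD):
    """Suggest uploading a file once a facility has `threshold` or more records."""
    return "upload data file" if n_records >= threshold else "enter records online"
```

For instance, `suggested_mode(120)` returns `"upload data file"`, while a facility with 12 records would still be pointed to online entry.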

We have also shared these results with the Office of Juvenile Justice and Delinquency Prevention, which plans to use them to adjust the estimate of respondent burden hours it reports to the Office of Management and Budget each year.

I will provide more suggestions for reducing respondent burden for juvenile residential facilities at the 2016 Joint Statistical Meetings and in the conference proceedings.


Estimating the Reliability of Product Sales Totals in the Economic Census

Written by: Katherine Jenny Thompson, Complex Survey Methods and Analysis Group; Matthew Thompson, Business Register and MEPS Statistical Methods Branch; and Roberta Kurec, Economic Census and Related Surveys Statistical Methods Branch, Economic Statistical Methods Division

The economic census is the U.S. Census Bureau’s official five-year measure of American business and the economy. It provides industry and geographic detail not typically available from other sources of economic statistics, benefiting businesses, policymakers and the American public.

The term “census” is actually a slight misnomer here: the Census Bureau requests data from most large businesses but only from a sample of small businesses. We ask each of these businesses to provide data on sales, shipments, and receipts or revenues for each of its establishments (i.e., for each physical location), as shown in Figure 1.

[Figure 1]

We also ask for the revenues obtained by each establishment from the types of products likely to be produced or sold based on its primary industry. Product statistics are needed by the Bureau of Economic Analysis to benchmark the national accounts, as well as by the Bureau of Labor Statistics in constructing producer price indexes. The North American Product Classification System defines over 8,000 different products that can be reported across the entire census.

As an example, Figure 2 shows a short extract from the product collection for establishments in the “Automobile Dealers” retail trade industry from the 2012 Economic Census. Notice that, at first glance, these products do not seem related to automobile dealers, but they could be sold at automobile dealerships, which is why they are included on the questionnaire. The product list for some establishments can include more than 50 potential products. Additionally, for certain industries the Census Bureau designates “must-have” products. For example, an automobile dealer should report revenue from automobile sales.

[Figure 2: Extract from the 2012 Economic Census product collection for the “Automobile Dealers” industry]
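The “must-have” designation suggests a simple edit check. In the sketch below, `MUST_HAVE` is a hypothetical lookup from industry to required products (only the automobile dealer example comes from the text), and an establishment is flagged when a required product is missing or reported as zero. How the Census Bureau actually applies must-have products in editing is not described in the post.

```python
# Hypothetical lookup; only the automobile dealer example comes from the text.
MUST_HAVE = {
    "automobile dealers": {"automobile sales"},
}


def missing_must_have(industry, reported):
    """Return required products that are absent, blank or reported as zero.

    `reported` maps product name to reported revenue (None if left blank).
    """
    required = MUST_HAVE.get(industry, set())
    positive = {
        product
        for product, revenue in reported.items()
        if revenue is not None and revenue > 0
    }
    return sorted(required - positive)
```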

In most industries, only a few products are frequently reported and many sampled establishments do not report any data on products. This makes it difficult to produce good product statistics and measures of reliability.
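To see how sparse product reporting is, one can tabulate, for each product on the list, the share of sampled establishments that reported it, plus the share that reported no products at all. The record layout below, one dictionary of product-to-revenue per establishment with unreported products simply absent, is an assumption for illustration.

```python
def reporting_rates(establishments, product_list):
    """Share of establishments that reported each product (zeros count as reported)."""
    n = len(establishments)
    return {
        product: sum(1 for products in establishments if product in products) / n
        for product in product_list
    }


def share_reporting_nothing(establishments):
    """Share of establishments that reported no product data at all."""
    return sum(1 for products in establishments if not products) / len(establishments)
```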

For the past two years, the Census Bureau has conducted extensive research into product statistics. The team’s initial research focused on choosing a single missing data treatment method for products in the 2017 Economic Census. This research was presented at the 2015 Joint Statistical Meetings in a topic-contributed session entitled “Evaluating Alternative Imputation Methods for Economic Census Products: The Cook-Off.”
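The post does not list the imputation methods compared in the cook-off. As one example of a missing data treatment, the sketch below implements a ratio form of random hot deck imputation: within each imputation cell, a missing product value is filled by drawing a respondent at random and applying that donor's product-to-total-sales ratio to the recipient's total sales. The cell definition, the record fields and the ratio form are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, seed=0):
    """records: list of dicts with 'cell', 'total_sales' and 'product_sales'
    (None when missing). Returns new dicts with imputed values and an
    'imputed' flag; the input records are not modified."""
    rng = random.Random(seed)
    # Donor pool per cell: product-to-total-sales ratios of respondents.
    donors = defaultdict(list)
    for r in records:
        if r["product_sales"] is not None and r["total_sales"] > 0:
            donors[r["cell"]].append(r["product_sales"] / r["total_sales"])
    out = []
    for r in records:
        new = dict(r, imputed=False)
        if r["product_sales"] is None and donors[r["cell"]]:
            share = rng.choice(donors[r["cell"]])  # random donor from the same cell
            new["product_sales"] = share * r["total_sales"]
            new["imputed"] = True
        out.append(new)
    return out
```

Using the donor's ratio rather than its raw dollar value keeps imputed amounts in proportion to each recipient's size. Recipients in cells with no donors are left missing here; a fuller implementation might collapse cells instead.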

This year, we have been exploring how to estimate the variance of product sales totals. Besides the sampling, imputation and post-stratification components, there are additional challenges: many products lack good predictors and have high expected zero rates, and these problems are compounded by high product nonresponse rates. We believe that it is possible to find a variance estimator with good statistical properties for the well-reported products, but we remain concerned about the others. So far, the team has conducted two separate simulation studies of whether a variance estimator can perform well across many different products. Each study considers only some sources of variability: (1) sampling variance and post-stratification, or (2) product nonresponse and hot deck imputation. We will share these results on August 1, 2016, at the JSM. The next phase of our research will combine the findings from the two studies to develop a single variance estimator for products.
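A simulation study of a variance estimator repeatedly draws samples from a known population, computes the estimate and its estimated variance each time, and compares the average estimated variance with the variance of the estimates across replicates. The sketch below does this for the simplest case only: an expansion estimator under simple random sampling without replacement, for a product with a high zero rate. It leaves out post-stratification, nonresponse and imputation, which is where the post expects difficulty, and the population generator (zero-inflated, with exponential positive values) is hypothetical.

```python
import random
import statistics


def make_population(size, zero_rate, mean_if_positive=1000.0, seed=0):
    """Hypothetical product-sales population: many zeros, skewed positive values."""
    rng = random.Random(seed)
    return [
        0.0 if rng.random() < zero_rate else rng.expovariate(1.0 / mean_if_positive)
        for _ in range(size)
    ]


def variance_relative_bias(population, n, reps=2000, seed=1):
    """Relative bias of the SRSWOR variance estimator for the expansion total."""
    rng = random.Random(seed)
    N = len(population)
    estimates, variance_estimates = [], []
    for _ in range(reps):
        sample = rng.sample(population, n)
        estimates.append(N * statistics.fmean(sample))
        # Textbook estimator: N^2 (1 - n/N) s^2 / n, with s^2 the sample variance.
        variance_estimates.append(N**2 * (1 - n / N) * statistics.variance(sample) / n)
    # Variance of the estimates across replicates approximates the true variance.
    empirical_variance = statistics.variance(estimates)
    return statistics.fmean(variance_estimates) / empirical_variance - 1
```

Under simple random sampling without replacement this estimator is design-unbiased, so `variance_relative_bias(make_population(1000, 0.8), n=50)` should be close to zero apart from Monte Carlo error. Treating imputed values as if they were reported would typically make it understate the variance, which is one reason the imputation component needs its own study.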
