Written by: Rachelle Hill, Center for Economic Studies, and Katie Genadek, University of Minnesota
Time diary surveys collect information about the activities that respondents participate in throughout a pre-selected diary day, including a general description of each activity and the amount of time spent on it. This unique data structure creates novel research opportunities as well as challenges in choosing an appropriate analytic method. In our paper, “Investigating Alternative Methods to Estimate Time Use Behaviors,” we compare four analytic methods for modeling time diary data and demonstrate the importance of considering how different modeling techniques may affect the results. We investigate these alternative methods to help time diary researchers better understand the complexities of choosing an analytic method and its potential impact on the results.
The Bureau of Labor Statistics sponsors the American Time Use Survey (ATUS), which is conducted by the U.S. Census Bureau. This annual, cross-sectional, time diary survey began in 2003 and is conducted throughout the year. The survey captures a respondent’s daily activities from 4 a.m. of the day prior to the survey until 3:59 a.m. of the survey day.
Interviewers record each activity according to a six-digit coding scheme. Activities include everything from biking to doing laundry to looking for a job. This coding scheme protects the respondent’s identity while also condensing the information into a useable structure that allows researchers to investigate their activity of interest. Despite the detailed coding structure, some aspects of time diary data make analysis difficult.
The American Time Use Survey diary is limited to a small window of time, specifically 24 hours. This short period increases the chance that a respondent records no participation in an activity of interest, even if it is one in which they frequently engage. For example, some respondents will report no time spent with extended family members because they did not see them on the diary day, although they see them at other times. In contrast, other respondents will report no time spent with extended family members because they never see them. We refer to this second case below as a true zero.
Figure 1 illustrates the variability in the percentage of zeros across different family members. Using the relationship variables captured in the survey instrument, we limit the sample to parents, members of couples and all respondents; the figure then shows the percentage of each group who spend a given number of minutes with children, spouses/partners and extended family members, respectively. We draw on these similar measures of family time with differing proportions of zeros to compare different analytic methods.
We compare four analytic methods used in time diary data analyses while drawing on different measures of time with family members (including children under 6, all children, spouse/partner, only spouse/partner, parents and extended family members) from the 2003-2010 American Time Use Survey. By comparing measures of similar concepts across model types, we can compare the estimates produced by the different analytic methods.
The four methods we examine are Ordinary Least Squares, Tobit, Double Hurdle and Zero-Inflated Count models. Ordinary Least Squares assumes that the variable of interest is continuous and may be biased when the variable is censored at zero. Tobit accounts for a censored distribution and is often applied in time diary analyses, but it assumes that cases censored at zero are true zeros rather than a mismatch between the diary day and the activity. Double Hurdle predicts both the likelihood of not participating on the diary day and the amount of time spent, but there is some evidence of bias when the covariates are related to the likelihood of not participating. Zero-Inflated Count models effectively model a large proportion of zeros: they predict both the likelihood of participating in an activity and the amount of time spent, and they assume two causes for not reporting time in a given activity.
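To make the “two causes” idea concrete, here is a minimal sketch of the probability function behind one common zero-inflated count model, the zero-inflated Poisson. The choice of a Poisson count distribution is an assumption for illustration; the paper may use a different count distribution, such as a negative binomial. The names `pi_never` (share of respondents who never do the activity) and `lam` (expected count of time units on the diary day for everyone else) are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for an ordinary Poisson count with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi_never, lam):
    """P(Y = k) under a zero-inflated Poisson (illustrative assumption).

    A zero can arise in two ways:
      1. the respondent never does the activity (probability pi_never), or
      2. the respondent does the activity, but the count drawn for the
         diary day happens to be zero (probability (1 - pi_never) * exp(-lam)).
    """
    base = (1.0 - pi_never) * poisson_pmf(k, lam)
    return pi_never + base if k == 0 else base

# Hypothetical values: 30% never participate; the rest average 2 time units.
p_zero_zip = zip_pmf(0, pi_never=0.3, lam=2.0)
p_zero_poisson = poisson_pmf(0, lam=2.0)
print("P(zero), zero-inflated:", round(p_zero_zip, 4))
print("P(zero), plain Poisson:", round(p_zero_poisson, 4))
```

The zero-inflated version always assigns at least as much probability to zero as the plain count model, which is why this family suits outcomes with a large share of zeros.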
In our preliminary results, we find that the model coefficients vary by the proportion of respondents who spend no time with family members. When the proportion of respondents who report no time is smaller, as is the case with parents’ time spent with children, the predictions are nearly the same across the four model types. When the proportion of respondents who report no time is larger, as is the case with respondents’ time with extended family members, the predictions vary considerably. Specifically, we find that Tobit and Double Hurdle estimates are more variable than those from Ordinary Least Squares and Zero-Inflated Count models. Such variability is evidence of the need to consider and evaluate different analytic methods and their effects on reported results.
The next step in our analysis is to explore the four methods using simulated data. We will compare estimates from various possible American Time Use Survey data structures including all true zeros, no true zeros and a mix at different proportions. This comparison will help time diary researchers choose the analytically appropriate method for their research question and better understand the implications of their choice for their results.
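The data structures described above can be sketched with a small simulation. This is not our actual simulation design; it is an illustration under assumed, hypothetical parameters (`share_never`, `p_diary_day`, `mean_minutes`) that generates one-day diaries where zeros come from never-participants (true zeros), from participants who did not do the activity on the diary day, or from a mix of the two.

```python
import random

def simulate_diaries(n, share_never, p_diary_day, mean_minutes=90.0, seed=0):
    """Return (minutes, n_true_zeros) for n simulated one-day diaries.

    All parameters are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    minutes = []
    n_true_zeros = 0
    for _ in range(n):
        if rng.random() < share_never:
            # Never does the activity: a true zero.
            minutes.append(0.0)
            n_true_zeros += 1
        elif rng.random() < p_diary_day:
            # Does the activity and did it on the diary day.
            # Shifted by one minute so reported time is strictly positive.
            minutes.append(1.0 + rng.expovariate(1.0 / mean_minutes))
        else:
            # Does the activity, but not on the diary day: a mismatch zero.
            minutes.append(0.0)
    return minutes, n_true_zeros

def share_zero(minutes):
    """Fraction of diaries reporting no time in the activity."""
    return sum(1 for m in minutes if m == 0.0) / len(minutes)

# Three structures, each built to give roughly 60% observed zeros.
scenarios = {
    "all true zeros": dict(share_never=0.6, p_diary_day=1.0),
    "no true zeros": dict(share_never=0.0, p_diary_day=0.4),
    "mixed": dict(share_never=0.3, p_diary_day=0.4 / 0.7),
}

results = {}
for name, params in scenarios.items():
    minutes, n_true = simulate_diaries(20000, **params)
    results[name] = (share_zero(minutes), n_true / len(minutes))
    print(name, "observed zeros:", round(results[name][0], 3),
          "true zeros:", round(results[name][1], 3))
```

In this sketch the three scenarios produce about the same observed share of zeros even though the share of true zeros differs sharply, which is the situation in which the choice of analytic method matters most.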