Data sources
To estimate our models we stack two child-level datasets constructed from the 2006 and 2011 Nepal Demographic and Health Surveys (DHS). We then merge information from the 2004 and 2010 Nepal Living Standards Surveys (NLSS) into these data. The DHS surveys include our dependent variables for children under five years of age, as well as child, mother and household characteristics that have been shown in past studies to be relevant to explaining child growth. The NLSS includes measures of agricultural activity, access to services, infrastructure, and incomes at the individual and household levels. The NLSS did not visit the same households as the DHS, so we cannot directly match household information. However, both surveys used the same district definitions and identification codes. This allows us to aggregate household observations from the NLSS up to the district level, and then match a set of district-level NLSS variables to DHS households based on district and year combinations. To our knowledge, there is no publicly available crosswalk that would allow a researcher to match children across surveys or to match geographic data at a finer scale (e.g. subdistrict, village, or municipality). Therefore, we do not attempt to produce any matches at scales finer than the district. We match 2004 NLSS data to the 2006 DHS, and 2010 NLSS data to the 2011 DHS. The 2006 DHS includes 5237 children, and the 2011 DHS includes 2335 children. When combined, these datasets provide anthropometric information on 7572 children under age five. A total of 39 children were omitted due to missing values for independent variables, leaving 7533 child-level records for analysis. The validity of our DHS-NLSS matching rests on the assumption that the NLSS measures of community characteristics reliably reflect the more general circumstances surrounding a child subsequently observed in the DHS.
To account for differences in lag lengths and potential observed and unobserved heterogeneity in trends across time and space we use survey year and birth year controls. Use of these data did not require institutional review because respondents previously provided informed consent and were rendered anonymous before the data were released to us for analysis.
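The district-year matching described above can be sketched as follows. This is a minimal illustration and not the authors' code: the field names (`district`, `year`, `hospital_minutes`, `sold_output`), the toy records, and the choice to skip children in unmatched districts are all assumptions made for the example, and only two district indicators are computed.

```python
from collections import defaultdict
from statistics import median

# Pairing of each DHS round with the NLSS round matched to it (from the text).
DHS_TO_NLSS_YEAR = {2006: 2004, 2011: 2010}

def aggregate_nlss(nlss_households):
    """Collapse NLSS household records to one row per (district, NLSS year)."""
    groups = defaultdict(list)
    for hh in nlss_households:
        groups[(hh["district"], hh["year"])].append(hh)
    district_rows = {}
    for key, members in groups.items():
        district_rows[key] = {
            # Median walking time to the nearest hospital, in minutes.
            "hospital_minutes": median(m["hospital_minutes"] for m in members),
            # Share of households that sold some agricultural output.
            "commercial_share": sum(m["sold_output"] for m in members) / len(members),
        }
    return district_rows

def attach_district_data(dhs_children, district_rows):
    """Add district-level NLSS variables to each DHS child record."""
    merged = []
    for child in dhs_children:
        key = (child["district"], DHS_TO_NLSS_YEAR[child["year"]])
        if key in district_rows:  # assumption of this sketch: skip unmatched districts
            merged.append({**child, **district_rows[key]})
    return merged

# Hypothetical toy data, for illustration only.
nlss = [
    {"district": 1, "year": 2004, "hospital_minutes": 30, "sold_output": 1},
    {"district": 1, "year": 2004, "hospital_minutes": 90, "sold_output": 0},
    {"district": 1, "year": 2010, "hospital_minutes": 20, "sold_output": 1},
]
dhs = [
    {"child_id": "a", "district": 1, "year": 2006},
    {"child_id": "b", "district": 1, "year": 2011},
]
matched = attach_district_data(dhs, aggregate_nlss(nlss))
```

Every child in the same district and survey round receives the same district-level values, which is why the multilevel treatment of these variables matters for the standard errors.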
Our dependent variables are the child’s height-for-age z-score (HAZ) and weight-for-height z-score (WHZ). Z-scores measure the dispersion of the indicator as standard deviations around a reference population median, and are calculated as:
$$ {z}_i=\frac{x_i-\overline{x}}{\sigma_x} $$
(1)
where \( x_i \) is the individual observation and \( \overline{x} \) and \( \sigma_x \) are the median and the standard deviation of the reference population. Z-scores were calculated using the WHO’s current Child Growth Standards reference population [21]. Our use of continuous z-score outcomes is noteworthy because many studies use a binary dependent variable to indicate stunting (HAZ < −2.0) or wasting (WHZ < −2.0) [15, 16, 22, 23]. Z-score cutoffs (e.g. −2.0 for stunting and wasting or −3.0 for severe stunting or severe wasting) discard information about the full distribution of outcomes, a fact recognized at the time z-scores were introduced by the WHO [24]. Elsewhere [25-27] it has been argued that the widely accepted −2.0 cutoff is arbitrary, with little biological basis for a threshold. Using a continuous measure in place of a binary indicator allows us to capture the intensity of growth faltering in the population. The z-scores used in this analysis are approximately normally distributed. Plots of z-scores against quantiles of the normal distribution reveal slight departures from normality in the extreme tails, but not to a degree that is detrimental to the analysis or amenable to correction via a monotonic transformation of the data.
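The contrast between the continuous z-score of Eq. (1) and a binary cutoff can be illustrated with a short sketch. The reference median and standard deviation below are hypothetical placeholders, not WHO values, and the function names are chosen for this example only.

```python
def z_score(value, ref_median, ref_sd):
    """Eq. (1): distance from the reference median in reference SD units."""
    return (value - ref_median) / ref_sd

def is_stunted(haz, cutoff=-2.0):
    """Binary stunting indicator based on the conventional HAZ cutoff."""
    return haz < cutoff

# Hypothetical reference values for one age-sex group (NOT actual WHO figures).
REF_MEDIAN_CM = 87.0
REF_SD_CM = 3.0

heights_cm = [81.5, 80.5, 72.0]
haz = [z_score(h, REF_MEDIAN_CM, REF_SD_CM) for h in heights_cm]
stunted = [is_stunted(z) for z in haz]
```

In this toy example the first two children differ by one centimeter but fall on opposite sides of the cutoff, while the second and third children receive the same binary code despite very different z-scores. The continuous outcome retains both distinctions.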
Among immediate determinants, we include a large set of child-level variables that have been shown to be correlated with child growth in Nepal and elsewhere. These include the child’s age (in months), sex, and twin status, as well as two indicators of acute disease symptoms (diarrhea in the two weeks prior to anthropometric measurement and fever in the same period), as these are known to place demands on a body’s physical resources [16]. Given the importance of breastfeeding patterns in determining nutrition, health and physical growth [2, 15], we include a binary variable indicating whether a child was being breastfed at the time of measurement, along with the total number of months of breastfeeding. In further recognition of the importance of a mother’s status and education [17, 27-33], as well as natal and perinatal health in early childhood development [2, 34-37], we also include a set of maternal characteristics that are tied to children. These include a woman’s body mass index (BMI), her age at birth (in years), her education (in years), and a binary indicator of her hand-washing opportunities (coded as one if a place for handwashing with running water was available in the household, and zero otherwise).
We also include the squares of child’s age and breastfeeding duration to allow for the possibility that the relationship between HAZ and these time variables is nonlinear. This could be the case if, for example, households are, on average, better at providing nutrition for younger and older children compared to children in the middle range of ages in our sample, or if breastfeeding after a certain age is a less effective way of delivering nutrition. Including the squares of these terms allows the marginal effect of the variable in question to depend on the value of that variable as well as the estimated coefficients, so that if the relationship between HAZ and the variable changes across the variable’s range, we can detect that difference when we fit the regression.
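To make the role of the squared terms concrete, consider a stylized version of the specification with a single time variable. The coefficient symbols \( \beta_1 \) and \( \beta_2 \) are notation assumed here for illustration, and all other covariates are omitted:

$$ {\mathrm{HAZ}}_i={\beta}_0+{\beta}_1{\mathrm{age}}_i+{\beta}_2{\mathrm{age}}_i^2+\dots \kern1.25em \Rightarrow \kern1.25em \frac{\partial {\mathrm{HAZ}}_i}{\partial {\mathrm{age}}_i}={\beta}_1+2{\beta}_2{\mathrm{age}}_i $$

The marginal effect therefore varies with age and, when \( \beta_2 \neq 0 \), changes sign at \( \mathrm{age}^{\ast} = -\beta_1 / (2\beta_2) \). A negative \( \beta_1 \) combined with a positive \( \beta_2 \), for example, would correspond to the U-shaped pattern described above, in which children in the middle of the age range fare worst.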
At the household level, we account for several underlying determinants. One is membership in the Dalit caste. While the caste system was officially abolished in Nepal in 1962, evidence suggests continued discrimination, which may affect a child’s status in ways not captured by the other variables included at this level [38]. We control for economic status via a wealth index, measured as the household’s quintile on an index of wealth generated by DHS analysts, who apply weights to observed household assets using principal components analysis. Elsewhere, this index has been used as a measure of household socioeconomic status [17, 18, 33]. A substantial body of research suggests that economic wellbeing has a positive effect on children’s nutritional status and growth [15, 31, 32, 39]. We also include indicators of water and fuel sources, the former in recognition of the importance of waterborne diseases to nutrition and health [40, 41], and the latter in recognition of the potential importance of indoor air quality for upper respiratory health and child growth [42, 43]. Indoor air pollution from tobacco smoke and the burning of biomass fuels is common in Nepal and has health effects with implications for child growth [44, 45]. We therefore include an indicator for the type of fuel used (one if the household used biomass to cook; zero otherwise). We also include altitude (in meters above sea level) as a control variable, which we expect to capture multiple factors that could affect growth. Altitude and linear growth are likely to be negatively correlated due to remoteness, and also because the reduced oxygen content of air at altitude may impair growth [46, 47].
We also incorporate community-level basic determinants. Previous multilevel regression work on child mortality and stunting included distance to the nearest health facility, community-level rates of education attainment, and infrastructure [29]. Our expectation is that omitting higher-level factors could lead to mistaken inference regarding point estimates on child- and household-level variables, and mask the importance of non-nutrition interventions of interest to policy makers. Recent work from Nepal, for example, demonstrates the importance of food markets in mitigating the effects of climate on linear growth [10], and the role of transportation infrastructure in moderating food prices [48] and explaining patterns of child growth [49, 50].
All district-level variables are derived from either the NLSS or Nepal census data. Because child- and household-level food consumption variables are not available in the DHS, we measure the percentage of NLSS respondents who reported their food consumption within the last month as inadequate. Food shortages are determined at least partially by factors which affect all households in a district, such as weather, soil characteristics, and food prices. We also include a measure of market access (a commercialization ratio computed as the proportion of NLSS households in a district that reported selling some amount of their agricultural output). We include an indicator of access to healthcare (the median reported distance to the nearest hospital, in minutes on foot) and a measure of community-level hygiene (the percentage of Village Development Committees (VDCs) in a district that were declared open defecation free at the time of the survey). Finally, to control for overall social conditions, we include an ethnicity indicator (the percentage of a district’s population that belongs to a marginalized ethnic or caste group, calculated from census data), and a measure of gender equity (calculated from census data as the ratio of female students to total students in a district). Descriptive statistics for all variables are presented in Table 1. These statistics are included primarily for reference, but some summaries merit particular attention. First, we note the quite low average HAZ values, with a mean of −1.88, implying that the average child is very close to the stunting cutoff, a fact that underscores the urgency of understanding undernutrition in this context. Average levels of maternal education are also extremely low, which is concerning given the importance of this variable in the literature. It is, however, worth noting that the average child is breastfed for about a year, approximately consistent with WHO guidelines, a positive outcome for this particular period in children’s lives.
Table 1
Descriptive statistics for all variables used in the regressions
Variable | Units | Mean | SD | Min | Max |
Child level (n = 7572) |
HAZ | Standard deviations | −1.88 | 1.35 | −5.96 | 4.59 |
WHZ | Standard deviations | −0.79 | 1.08 | −4.94 | 4.07 |
Age | Months | 30.0 | 17.1 | 0 | 59 |
Age² | Months² | 1193 | 1054 | 0 | 3481 |
Twin status | 0/1 indicator | 0.01 | 0.10 | 0 | 1 |
Female | 0/1 indicator | 0.49 | 0.50 | 0 | 1 |
Breastfeeding | 0/1 indicator | 0.59 | 0.49 | 0 | 1 |
Breastfeeding duration | Months | 12.1 | 14.5 | 0 | 59 |
Breastfeeding duration² | Months² | 356 | 605 | 0 | 3481 |
Fever in past two weeks | 0/1 indicator | 0.19 | 0.39 | 0 | 1 |
Diarrhea in past two weeks | 0/1 indicator | 0.13 | 0.34 | 0 | 1 |
Mother’s education | Years | 2.8 | 3.8 | 0 | 14 |
Hand washing access | 0/1 indicator | 0.62 | 0.48 | 0 | 1 |
Mother’s BMI | BMI value | 20.6 | 2.7 | 14.0 | 36.9 |
Mother’s age at birth | Years | 24.9 | 5.9 | 13 | 47 |
Household level (n = 5450) |
Wealth | Quintile (1–5) | 2.7 | 1.4 | 1 | 5 |
Water purification | 0/1 indicator | 0.13 | 0.34 | 0 | 1 |
Altitude | Meters | 836 | 730 | 46 | 3189 |
Ethnicity (Dalit) | 0/1 indicator | 0.16 | 0.37 | 0 | 1 |
Biomass fuel use | 0/1 indicator | 0.87 | 0.34 | 0 | 1 |
District level (n = 75) |
Food short | % of households | 26.6% | 18.7% | 0.0% | 91.7% |
Educational equity | % girls in schools | 48.7% | 3.3% | 37.7% | 53.8% |
Marginal | % of households | 47.0% | 19.1% | 6.1% | 85.7% |
Commercial sales | % of households | 45.1% | 20.9% | 0.0% | 91.7% |
Hospital distance | Minutes on foot | 403 | 642 | 5 | 3600 |
Open defecation prevalence | % of VDCs declared ODF | 11.1% | 21.9% | 0.0% | 100.0% |
Merging data from different surveys conducted over different time frames, as we do here, is not ideal. It is necessary, however, given the limited availability of data and the fact that the DHS does not include the data we need to relate child growth to the local social and economic conditions we emphasize. Certain factors mitigate concerns about this approach. First, districts in Nepal are quite small compared to the top-level subnational administrative units in other countries; as of the 2011 census, the most populous district by far was Kathmandu, with around 1.7 million residents, a population scale more comparable to Indian districts or U.S. counties than to states in either country. At this scale, we are confident that measures of the local conditions we emphasize are relevant for children’s nutritional outcomes, and while we would prefer to use data at the village or municipality level, the data necessary to do this are, to our knowledge, either nonexistent or inaccessible. Second, in a nationally representative survey like the NLSS, we expect sample means and medians at the district level to act as reasonably good estimators of the population analogs, and we restrict our analysis to measures of central tendency, which should reflect general social and economic conditions. We therefore expect that, while our approach may introduce noise, it is unlikely to introduce bias. To test this conjecture, we conducted Kolmogorov-Smirnov tests comparing residuals from regressions which include only variables derived from the DHS to residuals from regressions which include the district data. If the non-DHS variables were systematically correlated with the residuals, we would see differences between these distributions. In all cases, we fail to reject the null hypothesis of no difference at the 5% significance level.
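A comparison of this kind can be sketched with a standard-library implementation of the two-sample Kolmogorov-Smirnov test. This is an illustrative sketch only: the residual values are invented, the p-value uses the asymptotic Kolmogorov distribution (reliable only for large samples such as the several thousand residuals in the analysis), and the software the authors used is not stated. In practice a library routine such as `scipy.stats.ks_2samp` could be used instead.

```python
import math
from bisect import bisect_right

def ks_two_sample(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D with an asymptotic p-value."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs.
    d = max(
        abs(bisect_right(a, v) / n - bisect_right(b, v) / m)
        for v in a + b
    )
    # Asymptotic Kolmogorov distribution; a large-sample approximation.
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return d, 1.0  # the series below is numerically indistinguishable from 1 here
    p = 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)

# Hypothetical residuals from the two specifications (illustrative values only).
resid_dhs_only = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.4]
resid_with_district = [-1.1, -0.5, 0.0, 0.4, 0.8, 1.5]
d_stat, p_value = ks_two_sample(resid_dhs_only, resid_with_district)
reject_null = p_value < 0.05  # 5% significance level
```

A failure to reject, as in this toy example, indicates that the test finds no detectable difference between the two residual distributions. It does not prove that the distributions are identical.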
Empirical strategy
Using multilevel models for z-scores has conceptual and technical advantages. When the level of observation at which the dependent variable occurs is nested within other levels (for example, children nested in households and districts), including higher-level characteristics as child-level predictors can lead to the misstatement (generally understatement) of standard errors, as one value will be replicated across all members of the same group. With a multilevel model, the value is applied once, at the group level, and information from the pooled regression can help generate reliable estimates even for groups with very low numbers of first-level observations [51]. Using multilevel models also allows us to include error terms at each level, which makes it possible to track changes in variance at each level across models. Taken together, these properties give multilevel models a substantial advantage over classical regression models when dealing with hierarchically structured data, like those analyzed here [15].
The specific form of our multilevel regression models is given by Eqs. (2), (3), and (4):
$$ {Z}_i={\alpha}_{jk}+\beta {X}_i+{e}_i\kern1.25em i=1,\dots, I $$
(2)
$$ {\alpha}_{jk}={\gamma}_0^j+{\gamma}_k+{e}_j\kern0.1em \mathrm{for}\kern0.1em j=1,\dots, J,k=1,\dots, K $$
(3)
$$ {\gamma}_k={\lambda}_0^k+{\lambda}_k{D}_k+{e}_k\kern0.1em \mathrm{for}\kern0.2em k=1,\dots, K $$
(4)
where \( Z_i \) is the z-score for child \( i \) in household \( j \) in district \( k \); \( \alpha_{jk} \) and \( \beta \) are intercept and coefficient vectors for individual-level variables \( X_i \); \( \gamma_0^j \) is a household-specific intercept; and \( \gamma_k \) are district-level intercepts, each of which is a function of district-level variables \( D_k \), district-level coefficients \( \lambda_k \), and the district-level intercepts \( \lambda_0^k \). Finally, \( e_i \), \( e_j \), and \( e_k \) are error terms at each level. In this specification, \( \alpha_{jk} \) does not vary with household characteristics, but including a household level allows us to estimate household intercept terms and variance components. The expanded variance terms allow us to account for variance arising at the child, household and district levels. We model a child’s z-score as a function of variables specific to the child (including characteristics of the mother and household). We model variance at the district level as a function of district-level variables. We account for household-level variance, but given the low ratio of children under age five to households, the dataset does not support the inclusion of separate covariates at the household level.
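The nesting in Eqs. (2)-(4) can be made concrete by simulating data from that structure. The sketch below generates data and does not estimate anything. It makes several simplifying assumptions that are not in the text: a single scalar child covariate and district covariate, one common baseline `gamma_0` in place of \( \gamma_0^j \), one common `lambda_0` and `lambda_1` in place of \( \lambda_0^k \) and \( \lambda_k \), and normally distributed errors. All parameter values and the toy input are hypothetical.

```python
import random

def simulate_z_scores(districts, beta, gamma_0, lambda_0, lambda_1,
                      sd_child, sd_household, sd_district, seed=0):
    """Draw z-scores from the three-level structure in Eqs. (2)-(4).

    `districts` is a list of dicts with a district covariate "D" and a list
    "households"; each household is a list of child covariates X (scalars).
    """
    rng = random.Random(seed)
    records = []
    for k, district in enumerate(districts):
        # Eq. (4): district intercept from the district covariate plus e_k.
        gamma_k = lambda_0 + lambda_1 * district["D"] + rng.gauss(0.0, sd_district)
        for j, household in enumerate(district["households"]):
            # Eq. (3): household-in-district intercept with household error e_j.
            alpha_jk = gamma_0 + gamma_k + rng.gauss(0.0, sd_household)
            for x in household:
                # Eq. (2): child z-score from the child covariate plus e_i.
                z = alpha_jk + beta * x + rng.gauss(0.0, sd_child)
                records.append({"district": k, "household": j, "x": x, "z": z})
    return records

# Hypothetical toy input: two districts, three households, four children.
toy = [
    {"D": 0.2, "households": [[12, 40], [30]]},
    {"D": 0.6, "households": [[5]]},
]
sample = simulate_z_scores(toy, beta=-0.01, gamma_0=-1.0, lambda_0=0.0,
                           lambda_1=-1.0, sd_child=1.0, sd_household=0.5,
                           sd_district=0.3)
```

Children in the same household share one draw of the household error, and all children in a district share one draw of the district error, which is the source of the within-group correlation that the multilevel specification is designed to capture.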