The Differential Impacts of Human Capital and Infrastructure on the Sustainable Development Goals

ABSTRACT:   

This study uses country-level data to explore the relationships among human capital, infrastructure, and a country’s progress toward the United Nations Sustainable Development Goals (SDGs). Using confirmatory factor analysis, I develop a new Infrastructure Index and combine it with the World Bank’s Human Capital Index dataset to evaluate the relative impact of these factors on countries’ SDG scores. My findings affirm the integral roles of both human capital and infrastructure in sustainable development. However, a stronger correlation between human capital and the SDG Index suggests that policymakers seeking to advance the sustainability agenda should prioritize investments in human capital over infrastructure. Moreover, the study uncovers nuanced relationships between these indicators and specific SDGs. Human capital has a significant association with SDG 5 (Gender Equality), whereas infrastructure does not. Both human capital and infrastructure affect SDG 1 (No Poverty), with no statistical difference between their effects. Interestingly, while human capital correlates more strongly with SDG 13 (Climate Action), this relationship is negative due to the larger carbon footprint of more developed economies. These findings can inform policy decisions for goal-specific sustainable development strategies.

I. INTRODUCTION: 

The central framework in the global development agenda is the 2030 Agenda for Sustainable Development, which “provides a shared blueprint for peace and prosperity for people and the planet, now and into the future.” All 193 UN Member States have committed to achieving measurable progress on these goals by 2030. The Agenda comprises seventeen interlinked Sustainable Development Goals (SDGs) that encompass a wide range of objectives. The seventeen SDGs are broken down into 169 targets and 232 indicators to measure progress.

Measuring progress

One of the challenges in the SDG framework is measuring progress in order to inform policy. The SDGs are successors to the Millennium Development Goals (MDGs), which consisted of 8 goals and 18 targets, 14 of which could be assessed quantitatively. The MDGs were adopted in 2000, when countries around the world committed to achieving them within 15 years. By the end of 2015, only three and a half of the 14 measurable targets had been achieved. In 2023, we are at the halfway mark of the 2030 Agenda. According to the latest reports, the international community is behind schedule in achieving the SDGs, partially due to the impact of the COVID-19 pandemic.[1] In this context, one of the most important questions is which policy interventions would be most effective in advancing progress toward the SDGs.

What interventions are most effective?

Investments in both human capital and infrastructure are critical for achieving the Sustainable Development Goals. These are interdependent and complementary domains in the international development space. However, policymakers working on specific development objectives are often forced to prioritize one over the other because resources are limited. This research analyzes country-level data from the United Nations and the World Bank to estimate the relationship between a country’s overall SDG Index and its performance on the Human Capital Index and the Infrastructure Index. I will also examine the impact of human capital and infrastructure on SDG 1 (No Poverty), SDG 5 (Gender Equality), and SDG 13 (Climate Action). Below I provide more information about each of the concepts analyzed in this research.

SDG Index

The SDG Index is a composite indicator, published in the annual Sustainable Development Report, that aggregates a country’s performance across all seventeen SDGs. It scores countries on a scale from 0 to 100, and Scandinavian countries, such as Finland, Denmark, Sweden, and Norway, usually achieve the highest rankings, with scores above 80.[2] The 2022 Report includes SDG Index scores for 163 countries, among which the Central African Republic and South Sudan have the lowest scores, below 40.

SDG 1: No Poverty

The first goal in the UN SDG framework calls to “end poverty in all its forms everywhere.” SDG 1 aims to ensure that everyone, regardless of their circumstances, has equal access to opportunities and resources for a quality life. It calls for comprehensive strategies to end poverty that include social protection systems and measures to build the resilience of the poor and those in vulnerable situations. The three main metrics of SDG 1 are: poverty headcount ratio at $1.90/day (%), poverty headcount ratio at $3.20/day (%), and poverty rate after taxes and transfers (%).

SDG 5: Gender Equality

Gender equality is fundamentally important for achieving the Sustainable Development Goals for several reasons. First, it is a matter of human rights: everyone, regardless of gender, should have equal access to health, education, economic opportunities, and political representation. Second, gender equality is pivotal for economic growth, as women constitute half of the world’s potential human capital, and studies consistently show that societies that discriminate by gender tend to experience slower economic growth and slower poverty reduction. SDG 5 (Achieve Gender Equality and Empower All Women and Girls) incorporates the following metrics: the ratio of female-to-male mean years of education received (%), the ratio of female-to-male labor force participation rate (%), seats held by women in national parliament (%), and the gender wage gap (% of male median wage).[3]

SDG 13: Climate Action  

SDG 13 calls for immediate action to combat climate change and its impacts, underscoring the critical need for the global community to address this pressing issue. Recognizing that climate change is not just an environmental issue but also a significant threat to social and economic development, this goal calls for urgent action to reduce greenhouse gas emissions, build resilience, and improve adaptive capacity to climate-induced impacts. The metrics of SDG 13 include: CO₂ emissions from fossil fuel combustion and cement production (tCO₂/capita), CO₂ emissions embodied in imports (tCO₂/capita), CO₂ emissions embodied in fossil fuel exports (kg/capita), and the Carbon Pricing Score at EUR 60/tCO₂ (%, 0 worst to 100 best).[4]

Statistical Performance Index

The Statistical Performance Index (SPI), developed by the World Bank, evaluates the performance of national statistical systems based on five pillars of statistical capacity: data use, data services, data products, data sources, and data infrastructure. The SPI is a weighted average of the statistical performance indicators. In this study, the SPI serves as a control variable that accounts for differences in the quality of national statistical systems across countries.

Human Capital Index

Human capital is sometimes referred to as soft infrastructure.[5] Without thriving human capital, nations cannot achieve their development goals, highlighting its central role in international development. It is widely acknowledged that improvements in human capital lead to increased productivity, which in turn spurs economic growth. Education and health, the two main components of human capital, have a direct impact on a country’s development trajectory. In 2018, the World Bank developed the Human Capital Index as a metric to measure and evaluate the quality and potential of human capital in a country. The HCI enables policymakers to identify strengths, weaknesses, and areas for improvement in human capital development. The HCI is based primarily on three components:

  1. Child survival: This component considers that not all children survive to start formal education and looks at the under-5 mortality rate.
  2. Education: This section combines information on the quality and quantity of education. The number of years a child is expected to complete school by age 18, considering current enrollment rates, measures the quantity of education. The quality is assessed using harmonized test scores from international student achievement testing programs.
  3. Public health: This component uses two proxies for the overall health environment – adult survival rates (the percentage of 15-year-olds who will survive until age 60) and healthy growth among children under 5, measured by stunting rates.[6]

Infrastructure Index

According to the Merriam-Webster dictionary, the prefix “infra-” means “below,” so infrastructure is the “underlying structure” of a country and its economy, the fixed installations it needs in order to function.[7] Public infrastructure provides the basic physical systems and structures, such as water supply, sewers, electrical grids, roads, bridges, and telecommunications, among others. High-quality infrastructure ensures the provision of fundamental necessities, advances safety, and enhances the quality of life. Infrastructure also facilitates the exchange of reliable information, increases productivity, creates job opportunities, and fosters overall economic growth.

Unlike the Human Capital Index, there is no internationally recognized index that indicates the level of public infrastructure in a given country. The objective of UN SDG 9 is to “Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation.”[8] However, for the purposes of this research, it is not the best proxy: it includes indicators such as expenditure on research and development and the female share of graduates from science, technology, engineering, and mathematics (STEM) programs, but omits indicators for access to electricity, water supplies, and the like. There are, however, six SDG indicators across five different Sustainable Development Goals that relate directly to public infrastructure:

1. Access to basic water services: The percentage of the population using at least a basic drinking water service, such as drinking water from an improved source, provided that the collection time is not more than 30 minutes for a round trip, including queuing. (SDG 6: Ensure availability and sustainable management of water and sanitation for all)
2. Access to basic sanitation services: The percentage of the population using at least a basic sanitation service, such as an improved sanitation facility that is not shared with other households. (SDG 6)
3. Access to electricity: The percentage of the population who has access to electricity. (SDG 7: Ensure access to affordable, reliable, sustainable and modern energy for all)
4. Adult population with bank accounts: The percentage of adults, 15 years and older, who report having an account (by themselves or with someone else) at a bank or another type of financial institution, or who have personally used a mobile money service within the past 12 months. (SDG 8: Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all)
5. Internet penetration: The percentage of the population who used the Internet from any location in the last three months, via either a fixed or a mobile network. (SDG 9: Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation)
6. Transportation systems: The percentage of the surveyed population that responded “satisfied” to the question “In the city or area where you live, are you satisfied or dissatisfied with the public transportation systems?” (SDG 11: Make cities and human settlements inclusive, safe, resilient and sustainable)

Hypotheses:

The question driving this research is whether human capital and infrastructure differ in their effects on SDG scores. Accordingly, I have constructed the following hypotheses:

H0: There is no statistical difference in the effects of Human Capital and Infrastructure on the SDG Index.
H1: There is a statistical difference in the effects of Human Capital and Infrastructure on the SDG Index.
H2: There is a statistical difference in the effects of Human Capital and Infrastructure on SDG 1: No Poverty.
H3: There is a statistical difference in the effects of Human Capital and Infrastructure on SDG 5: Gender Equality.
H4: There is a statistical difference in the effects of Human Capital and Infrastructure on SDG 13: Climate Action.

II. METHODS

Merging the data sets

I merge the World Bank Human Capital Index and the UN Sustainable Development 2022 datasets using the country name as the unique identifier. When I drop the rows with missing HCI or SDG Index values, the number of entries in my data frame is reduced from 201 to 141. Part of the reason is that the UN SDG data also includes geographic regions (such as “East and South Asia” or “Latin America and the Caribbean”) and income categories (such as “Low-income Countries” or “Upper-middle-income Countries”) under the Country variable. There are also genuinely missing values in both datasets. Nonetheless, 141 complete rows remain, which is sufficient to proceed with the analysis.
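To make the merge-and-drop step concrete, here is a minimal Python sketch. The country names and values are hypothetical, and the actual analysis used full data frames rather than dictionaries; the point is only that region and income-group rows fall away because they have no counterpart in the HCI data.

```python
# Minimal sketch of the merge-and-drop step, assuming each dataset is a
# dict mapping country name -> index value (names and values are hypothetical).
hci = {"Finland": 0.80, "Chad": 0.30, "Denmark": 0.76}
sdg = {"Finland": 86.5, "Denmark": 85.6, "East and South Asia": 65.9}

def merge_complete(hci, sdg):
    """Keep only countries present in both datasets with non-missing values.

    Region and income-group rows in the SDG data drop out automatically
    because they have no match in the HCI data.
    """
    merged = {}
    for country, hci_value in hci.items():
        sdg_value = sdg.get(country)
        if hci_value is not None and sdg_value is not None:
            merged[country] = {"hci": hci_value, "sdg": sdg_value}
    return merged

rows = merge_complete(hci, sdg)
print(sorted(rows))  # -> ['Denmark', 'Finland']
```

Only countries appearing in both sources with complete values survive, mirroring the reduction from 201 to 141 rows described above.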

Factor Analysis  

Public infrastructure is a broad concept which we cannot easily observe and measure. In statistical terms, it is a latent variable, which refers to “concepts that cannot be measured directly but can be assumed to relate to a number of measurable manifest variables.”[9] I use the factor analysis technique, which allows me to account for various dimensions of the public infrastructure (such as water, electricity, internet, etc.) and output one variable. Factor Analysis is often used for constructing a new index, as it explores and uncovers the underlying relationships between observed manifest variables and unobserved latent variables.
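As a toy illustration of the idea behind factor analysis, the Python sketch below generates six manifest variables from one latent factor (synthetic data, not the paper’s indicators) and recovers factor scores from the correlation structure. It is a simplified principal-factor extraction, not the confirmatory model fitted later.

```python
# Toy sketch: six observed indicators driven by one unobserved latent factor,
# then a crude one-factor extraction from the correlation matrix.
import numpy as np

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)  # the unobserved "infrastructure" variable
# Six manifest indicators = loading * latent + noise (loadings are made up)
X = np.column_stack([latent * l + rng.normal(scale=0.5, size=n)
                     for l in (0.9, 0.8, 0.85, 0.7, 0.75, 0.8)])

R = np.corrcoef(X, rowvar=False)                  # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)              # eigenvalues ascending
loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])  # first-factor loadings
scores = (X - X.mean(0)) / X.std(0) @ eigvecs[:, -1]  # crude factor scores

# The recovered scores track the true latent variable closely
corr = abs(np.corrcoef(scores, latent)[0, 1])
print(round(corr, 2))
```

The high correlation between the recovered scores and the true latent variable shows why a single index can summarize several correlated infrastructure indicators.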

KMO Test

The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a statistic that indicates the proportion of variance in the variables that may be common variance, i.e., driven by underlying factors. KMO values range from 0 to 1, with higher values indicating better suitability for factor analysis. The individual KMO (MSA) value for each variable tells us how well that variable fits with all the others. Variables with a KMO below 0.5 may not be suited for factor analysis, as they do not correlate well with the other variables. As the output below shows, the MSA values of all my variables are 0.8 or above, which brings the overall MSA score to 0.87, a positive sign.
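The overall KMO statistic can be computed directly from the correlations and partial correlations among the variables. The Python sketch below implements the standard formula on synthetic data (the data and variable names are illustrative, not the paper’s).

```python
# Hand-rolled overall KMO (MSA): sum of squared correlations divided by
# the sum of squared correlations plus squared partial correlations.
import numpy as np

def kmo(X):
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    P = -inv / d                              # partial correlation matrix
    mask = ~np.eye(R.shape[0], dtype=bool)    # off-diagonal elements only
    r2, p2 = (R[mask] ** 2).sum(), (P[mask] ** 2).sum()
    return r2 / (r2 + p2)

rng = np.random.default_rng(1)
latent = rng.normal(size=300)
# Five indicators sharing one common factor -> KMO well above the 0.5 cut-off
X = np.column_stack([latent + rng.normal(scale=0.6, size=300) for _ in range(5)])
k = kmo(X)
print(round(k, 2))
```

Indicators that share a strong common factor, as in this synthetic example, produce a KMO well above the 0.5 threshold, which is the situation reported for the six infrastructure variables.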

Kaiser-Meyer-Olkin (KMO) Test results

Model 1

So, I keep all six manifest variables to construct a model that estimates the Infrastructure Index. In the first model, the Comparative Fit Index (CFI) and Tucker-Lewis Index (TLI) are both above the 0.95 threshold, which indicates an appropriate fit. However, the Root Mean Square Error of Approximation (RMSEA) is 0.099, above the maximum threshold of 0.08.
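These cut-offs can be encoded as a simple rule-of-thumb check (a sketch; the thresholds are conventions commonly used in the CFA literature, not hard laws):

```python
# Rule-of-thumb fit check: CFI/TLI at or above 0.95, RMSEA at or below 0.08.
def acceptable_fit(cfi, tli, rmsea):
    return cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.08

# Model 1: good CFI/TLI but RMSEA of 0.099 fails the check
print(acceptable_fit(cfi=0.97, tli=0.96, rmsea=0.099))  # -> False
```

Model 1 fails only on the RMSEA criterion, which motivates the respecification described next.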

Infrastructure Index Model 1: fit estimates

Changing model specifications

Since the fit of the first model is not satisfactory, I change the model specification based on the modification indices and theoretical considerations. A modification index measures how much the overall model chi-square would be expected to decrease if a particular parameter were freely estimated; in other words, it suggests how to improve the model fit. If we look at the modification indices between our variables, we notice that the relationship between sdg8_bank and sdg9_internet is disproportionately stronger than any other in the data, most likely because internet penetration is strongly correlated with the share of the adult population holding bank accounts.

Modification Index table (descending order)

Model 2

So, I add a covariance path to the model that accounts for the dependency between ‘SDG 8 bank accounts’ and ‘SDG 9 Internet’. I also add a path between ‘SDG 6 Water’ and ‘SDG 6 Sanitation’, because the academic literature suggests a strong dependency between these two variables. When I check the fit indices, the new model with two additional paths performs much better than the previous one: both the CFI and TLI values are above 0.99, and the RMSEA has decreased to 0.056.

Infrastructure Index Model 2: fit estimates

The Infrastructure Index

I use the model to estimate the Infrastructure Index for all 141 countries in the dataset. I also apply min-max normalization to transform the index to a new scale from 0 to 1. This method rescales the values by subtracting the minimum and dividing by the range of the original values (i.e., the difference between the maximum and minimum values). The transformed values preserve the same relative proportions, but on a 0-to-1 scale.
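The transformation itself is a one-liner per value; a minimal sketch with toy numbers:

```python
# Min-max normalization as described: subtract the minimum, divide by the range.
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [12.0, 30.0, 48.0]      # toy factor scores, not the actual estimates
print(min_max(raw))           # -> [0.0, 0.5, 1.0]
```

The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.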

Estimating the Impact on the SDG Index 

Both the Human Capital Index and the Infrastructure Index are ratio-level measures. When both independent variables are ratio-level measures, “regression and correlation analysis are the standard techniques for measuring relationships and testing hypotheses.”[10] My main hypothesis concerns whether human capital or infrastructure has the bigger impact on a country’s overall SDG score. So, I construct a multiple regression model with the HCI and the Infrastructure Index as independent variables and then compare the beta coefficients to determine which index has the stronger effect on the SDG score.
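As a sketch of this modeling setup, the regression can be fitted by ordinary least squares. The data below are synthetic stand-ins for hci_ind, infr_ind, and the SDG Index (the real model also includes the Statistical Performance Index), so the coefficients are illustrative only.

```python
# Multiple regression by ordinary least squares on synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(3)
n = 141                                     # same sample size as the study
hci = rng.uniform(0.3, 0.9, n)              # stand-in for the HCI
infr = rng.uniform(0.0, 1.0, n)             # stand-in for the Infrastructure Index
# Simulated SDG scores with made-up "true" effects of 45 and 15
sdg = 40 + 45 * hci + 15 * infr + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), hci, infr])  # intercept + two predictors
coef, *_ = np.linalg.lstsq(X, sdg, rcond=None)
print(np.round(coef, 1))  # approximately [40, 45, 15]
```

With enough data and modest noise, least squares recovers the simulated effects, which is the mechanism the paper relies on when comparing the two indices.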

III. RESULTS

Summary of the Multivariate Regression Model

Multivariate Regression Model

Besides the Human Capital Index and the Infrastructure Index, I also include the Statistical Performance Index as an explanatory variable in the model. As mentioned earlier, it helps account for possible shortcomings in the data. All three variables, and the model as a whole, are highly statistically significant (p < 0.001). The R-squared value is not critical here, because this is a descriptive rather than a predictive model. In any case, the multiple R-squared value is 0.91, meaning that approximately 91% of the variability in the outcome variable is explained by the predictor variables.

Model assessment: regression diagnostics

1. Test for Linearity

Before proceeding with the analysis of the findings, I need to test the model assumptions to validate that linear regression was a suitable approach. First, I look for linearity and equal variance in the Residuals vs Fitted plot below. Upon visual examination, the red line shows no substantial deviations, which supports a linear relationship between the explanatory and response variables.

2. Test for homoscedasticity

In the plot below, we can also observe that the vertical spread of the residuals is roughly constant, meaning the variance of the error term does not change much across fitted values. So, the model passes the test for homoscedasticity as well.

3. Testing for influential observations

Based on the “Residuals vs Leverage” plot below, the model also passes the check for influential observations. Large standardized residuals combined with high leverage would suggest that the model fails to explain some aspects of the data and that individual observations are distorting the fit. Our model has no standardized residual values above 1, and in R, I double-checked and confirmed that no observation has a Cook’s distance above 1.

4. Testing for Normality of the error distribution

We can tell whether the error terms are normally distributed from the Q-Q plot below. We want the residuals to lie as close to the diagonal line as possible. In practice, real data rarely have perfectly normally distributed errors, so some deviations are expected; overall, the model appears to pass the normality check. To double-check, I also apply the Shapiro-Wilk test.

The null hypothesis of the Shapiro-Wilk test is that the data are normally distributed. In this case, the p-value is well above the significance level, so we cannot reject the null hypothesis of normally distributed residuals.

5. VIF Score

Last but not least, since we are dealing with a multiple linear regression model, we need to make sure there is no serious multicollinearity, so we compute variance inflation factors (VIF). “A rough rule of thumb is that variance inflation factors [VIF] greater than 10 give some cause for concern” (Vehkalahti & Everitt, p. 93). As the table below shows, the VIF scores for all three independent variables are below 5. These scores indicate some multicollinearity but are safely within an acceptable range.
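The VIF for each predictor equals the corresponding diagonal element of the inverse correlation matrix of the predictors (equivalently, 1/(1 − R²) from regressing that predictor on the others). The Python sketch below demonstrates this on synthetic data, not the study’s variables.

```python
# VIFs from the diagonal of the inverse predictor correlation matrix.
import numpy as np

def vif(X):
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=500)  # moderately correlated with x1
x3 = rng.normal(size=500)                        # independent predictor
scores = vif(np.column_stack([x1, x2, x3]))
print(np.round(scores, 2))  # all well below the rule-of-thumb threshold of 10
```

Moderate correlation among predictors, as in this example, inflates the VIFs above 1 but keeps them far from the level that signals problematic multicollinearity.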

VIF Scores:

Beta coefficient analysis

Having confirmed that the model meets the regression assumptions, we can proceed with the analysis. To estimate the impact of each individual variable on the SDG Index, we examine the beta coefficients. Standardized beta coefficients allow us to compare the effects of the variables on the same scale, regardless of their units of measurement. Below are the beta coefficients of the model. The coefficient for hci_ind (Human Capital) is larger than the coefficient for infr_ind (Infrastructure), suggesting that human capital has a stronger impact on the outcome variable, the SDG Index.
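The standardization itself is simple: an unstandardized slope multiplied by sd(x)/sd(y) yields the standardized beta, putting predictors with different units on a common scale. A minimal sketch with toy numbers (not the paper’s estimates):

```python
# Standardized beta = unstandardized slope * sd(x) / sd(y).
from statistics import pstdev

def standardized_beta(b, x, y):
    return b * pstdev(x) / pstdev(y)

x = [0.4, 0.5, 0.6, 0.7]      # e.g. a hypothetical 0-1 index
y = [50.0, 60.0, 70.0, 80.0]  # e.g. hypothetical SDG scores
print(round(standardized_beta(100.0, x, y), 2))  # -> 1.0
```

A slope of 100 sounds large, but after rescaling by the standard deviations it corresponds to a one-standard-deviation change in y per standard deviation of x, which is the comparable quantity the paper uses.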

Beta coefficients of the Multivariate Regression model

However, we also need to verify that the difference between the two beta coefficients is statistically significant. I run the linear hypothesis test below, whose null hypothesis is that there is no difference between the effects of the two indices, hci_ind and infr_ind.

Linear hypothesis test

The associated p-value (Pr(>F) = 0.0001545) is far below 0.05, providing strong evidence against the null hypothesis that the coefficients for hci_ind (Human Capital Index) and infr_ind (Infrastructure Index) are the same. So, the data provide strong evidence that the effect of human capital on sdg_ind (the SDG Index) differs from the effect of infrastructure.
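The logic behind such a test can be sketched as a Wald-type statistic: under the null hypothesis that the two coefficients are equal, their difference divided by its standard error follows a t distribution (squared, an F distribution). The numbers below are hypothetical, not the fitted model’s values.

```python
# Wald-type statistic for testing equality of two regression coefficients:
# t = (b1 - b2) / se(b1 - b2), where the variance of the difference is
# var(b1) + var(b2) - 2*cov(b1, b2).
import math

def diff_t_stat(b1, b2, var1, var2, cov12):
    se_diff = math.sqrt(var1 + var2 - 2 * cov12)
    return (b1 - b2) / se_diff

# Hypothetical estimates and (co)variances for illustration only
t = diff_t_stat(b1=0.62, b2=0.28, var1=0.004, var2=0.005, cov12=0.001)
print(round(t, 2))  # -> 4.06
```

A statistic this far from zero corresponds to a very small p-value, which is the kind of result the linear hypothesis test above reports.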

Next, I explore the relationships among the Human Capital Index, the Infrastructure Index, and specific Sustainable Development Goals: SDG 1 (No Poverty), SDG 5 (Gender Equality), and SDG 13 (Climate Action). I construct a multivariate multiple regression model with three response variables: the indicators for SDG 1, SDG 5, and SDG 13.

Response SDG 1:

Based on the initial observation of the model summary, both human capital and infrastructure have a significant effect on poverty. However, we need to check whether there is a statistical difference between the effects of the two variables. Upon closer examination of the two beta coefficients, we find no statistically significant difference between their effects.

Linear hypothesis test

Response SDG 5:

Looking at the response for SDG 5, we notice that the Human Capital Index has a statistically significant impact, whereas the Infrastructure Index does not. The magnitude of the coefficient for hci_ind (52.15) is also much larger than that for infr_ind (-8.30). Based on these observations, we conclude that there is a statistical difference in the effects of human capital and infrastructure on SDG 5.

Response SDG 13:

The summary for the SDG 13 response suggests that, once again, infrastructure does not have a statistically significant effect, while the impact of human capital is significant. So, human capital has a statistically significant effect on SDG 13 (Climate Action) that infrastructure lacks. Note, however, that the human capital coefficient is negative, meaning there is a negative correlation between human capital and SDG 13. This is consistent with the simple correlations among the indicators in our dataset (see the Correlation Matrix table). I discuss these findings further in the conclusion.

Correlation Matrix

IV. CONCLUSION

Our findings confirm that both human capital and infrastructure are essential for the sustainable development of countries; both are fundamentally important predictors of a country’s level of development. That said, based on our results, we can reject the null hypothesis that there is no statistical difference in the effects of human capital and infrastructure on a country’s SDG Index. The statistical analysis indicates a stronger association between human capital and the SDG Index than between infrastructure and the SDG Index. So, policymakers facing the dilemma of choosing between investments in human capital and infrastructure should prioritize human capital if their goal is to advance the country’s overall sustainable development agenda.

However, we also found that the Human Capital Index and the Infrastructure Index can have different levels of impact on specific objectives within the UN SDG framework. Human capital is a statistically significant predictor of a country’s performance on SDG 5 (Gender Equality), whereas infrastructure is not. While both indicators have a significant impact on a country’s performance on SDG 1 (No Poverty), there is no statistically significant difference between their effects on poverty levels. Last but not least, we found that, compared to infrastructure, human capital has a stronger association with SDG 13 (Climate Action); however, the correlation between human capital and a country’s climate performance is negative. This should not come as a surprise, because developed countries with higher Human Capital Index scores have far larger carbon footprints than developing countries.[11] It is another reminder that developed countries should transition to more sustainable solutions.

V. WORKS CITED

Guterres urges countries to recommit to achieving SDGs by 2030 deadline. (2023, April 25). UN News. https://news.un.org/en/story/2023/04/1136017

Johnson, J. B., & Joslyn R. A. (1991). Political Science Research Methods: Second Edition. Congressional Quarterly Inc.

Merriam-Webster. (n.d.). Infrastructure. In Merriam-Webster.com dictionary. https://www.merriam-webster.com/dictionary/infrastructure

Sachs, J.D., Lafortune, G., Kroll, C., Fuller, G., Woelm, F. (2022). From Crisis to Sustainable Development: the SDGs as Roadmap to 2030 and Beyond. Sustainable Development Report 2022. https://dashboards.sdgindex.org/downloads

The Investopedia Team. (2023, February 7). Infrastructure: Definition, Meaning, and Examples. Investopedia. https://www.investopedia.com/terms/i/infrastructure.asp

World Bank Group. (2023). The Human Capital Project: Frequently Asked Questions. World Bank. https://www.worldbank.org/en/publication/human-capital/brief/the-human-capital-project-frequently-asked-questions

The World Bank Group. (2020, September 23). Data Catalog. Human Capital Index. https://datacatalog.worldbank.org/search/dataset/0038030 

The world’s top 1% of emitters produce over 1000 times more CO2 than the bottom 1% – Analysis. (n.d.). International Energy Agency. https://www.iea.org/commentaries/the-world-s-top-1-of-emitters-produce-over-1000-times-more-co2-than-the-bottom-1

United Nations. (n.d.). The 17 goals: Sustainable Development. United Nations. https://sdgs.un.org/goals

Vehkalahti, K., & Everitt, B. S. (2020). Multivariate Analysis for the Behavioral Sciences: Second Edition. CRC Press.


[1] Guterres urges countries to recommit to achieving SDGs by 2030 deadline. (2023, April 25). UN News. 

[2] Sachs, J.D., Lafortune, G., Kroll, C., Fuller, G., Woelm, F. (2022). From Crisis to Sustainable Development: the SDGs as Roadmap to 2030 and Beyond. Sustainable Development Report 2022.

[3] United Nations. (n.d.). The 17 goals: Sustainable Development. United Nations. https://sdgs.un.org/goals

[4] Ibid.

[5] The Investopedia Team. (2023, February 7). Infrastructure: Definition, Meaning, and Examples. Investopedia.

[6] World Bank Group. (2023). The Human Capital Project: Frequently Asked Questions.

[7] Merriam-Webster. (n.d.). Infrastructure. In Merriam-Webster.com dictionary.

[8] United Nations. (n.d.). The 17 goals: Sustainable Development. United Nations.

[9] Vehkalahti, K., & Everitt, B. S. (2020), p. 295

[10] Johnson, J. B., & Joslyn R. A. (1991), p. 319.

[11] The world’s top 1% of emitters produce over 1000 times more CO2 than the bottom 1% – Analysis – IEA. (n.d.). International Energy Agency

COVID-19, Chat-GPT and International Development Assistance

The COVID-19 pandemic created an incredibly challenging situation in the international development space. On the one hand, developing countries were going through an extraordinary crisis, which increased the global demand for international development assistance. On the other hand, COVID-19 slowed down the economies of donor countries, which faced new socio-economic challenges at home. This article summarizes statistical research examining how the Gross National Income (GNI) of the Organization for Economic Co-operation and Development (OECD) member states changed between 2018 and 2021 and how that change correlates with the amount they invested in Official Development Assistance (ODA) during this period. The statistical analysis shows that despite economic challenges and increased demand for social assistance at home, OECD countries actually increased their international development assistance spending.

The artificial intelligence chatbot ChatGPT was first released on November 30, 2022, right around the time I was working on this research. So, I registered on the ChatGPT website and asked it to write a paragraph about the impact of COVID-19 on official development assistance. Within 5 seconds, ChatGPT wrote a persuasive paragraph arguing that “many donor countries have faced economic challenges and have had to redirect funds towards domestic priorities, such as healthcare and support for businesses and individuals affected by the pandemic.” It concluded that “ODA funding has been reduced, which has had a detrimental effect on the ability of recipient countries to address development challenges and achieve their development goals.” The text was fluid, logically coherent, and not plagiarized. However, the analysis of actual data showed that despite its sound reasoning, ChatGPT was wrong.

I collected all the data from the website of the Organization for Economic Co-operation and Development (OECD), an international forum that brings together 38 of the world’s most economically advanced countries to exchange best practices, tackle common problems, and contribute to global peace and development. OECD countries account for 18% of the world’s population but 63% of global GDP and about 95% of Official Development Assistance. I gathered data for the four-year period from 2018 to 2021 and focused on two indicators: 1. Gross National Income, one of the most important measures of a country’s economic performance, and 2. ODA, the amount of money a donor country spends on international development assistance.

Between 2019 and 2020, the GNI of OECD countries shrank by $1.3 trillion, an average reduction of $34 billion per country. However, contrary to GNI, the ODA spending of OECD countries increased by $6.6 billion, which breaks down to a $193 million average per country. Consistent with these findings, average ODA as a percentage of GNI increased from 0.373% in 2019 to 0.399% in 2020.

Conclusion 1 

So, the statistical analysis showed that despite the negative impact of COVID-19 on their domestic economies, the OECD countries increased the amount of money spent on Official Development Assistance. This analysis leaves us with a positive and encouraging message that in times of need, the international community is able to come together and take action for the common good beyond national borders. 

Conclusion 2 

It is also another reminder of the limitations of large language models such as ChatGPT. There are several generative AI models that can produce somewhat original text, images, and other data based on a statistical analysis of information on the internet. For example, ChatGPT, the most successful of these models, was trained on 570GB of information (approximately 385 million pages in Microsoft Word), allowing it to generate seamless, human-like output within seconds. However, due to the associated costs, the model is pre-trained, which means there is a cut-off date for source information. In addition, all the misinformation, fake news, and biased texts found online are also fed into training the model. The engineers at OpenAI are taking measures to tackle the issue of misinformation, but that is a tall order, even with the best intentions. So, while humans can investigate and find the truth, it is a much more elusive task for pre-trained AI models.

P.S. Most of the OECD countries still need to catch up to their commitment to spend 0.7% of their GNI on development assistance.

The Evidence-Based Policymaking Act and Privacy

Abstract:

The Foundations for Evidence-Based Policymaking Act created a framework for the centralization of statistical information collected by dozens of US federal agencies across the country and imposed responsibilities for sharing that data within the government, as well as with researchers and private entities. One of the main expected outcomes of the act is a National Secure Data Service, which will promote collaboration, help avoid duplication, and minimize public expenditure on data collection and processing. Most importantly, it will improve government efficiency by restructuring the national statistical ecosystem to better inform policy decisions. However, the centralization of federal data foretold by EBPA creates new privacy risks and vulnerabilities, which is why, in the 1960s, the similar idea of a National Data Center was rejected in Congress. Back then, the debate around data centralization ended with the passage of the Privacy Act of 1974. A half-century later, the data centralization idea has been approved, but no changes were made to the privacy legislation. This paper argues that, while EBPA is a positive step forward, it needs additional privacy safeguards that could be provided by revising the Privacy Act of 1974, which was last updated in 1988.

1. Background

This section looks at why the government collects data, how its institutional and technical capacity to process data has changed over time, and its consequential impact on the public debate around privacy. 

1.1 Purpose  

The core idea behind the Foundations for Evidence-Based Policymaking Act of 2018 (EBPA) is to create metrics for analyzing the government’s policy decisions and thus improve the federal government’s effectiveness. According to title 44 of the U.S. Code, the term evidence means “information produced as a result of statistical activities conducted for a statistical purpose” (44 USC 3561: Definitions). However, not all statistics are the same, and relying on bad data can do more harm than good. So, EBPA intends to increase not only the quantity of data supplied to inform policy decisions but also its quality.

1.2 Why do governments collect data?

The collection of certain data types is essential for a government to carry out its basic functions. As far back as five to six thousand years ago, ancient governments in Babylonia and Egypt collected primitive forms of census data, mainly for taxation and military recruitment. However, with the emergence of democratic states, census data became a crucial element of political representation. In the United States, holding a decennial census is embedded in the Constitution. Article 1, Section 2 of the Constitution states that “the actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States and within every subsequent Term of ten Years” (The National Constitution Center). The Nation’s Founders intended to apportion the seats in Congress among the States according to their populations. The initial benchmark was one Congressional representative for every 30 thousand residents (Gauthier). (Today, that number hovers around 700,000.) Consequently, census data was necessary to advance democratic governance.

1.3 Changing capacity to process data

However, producing quality data did not come easily, as it requires institutional capacity building, training of professional staff, a certain level of public awareness, and resources to provide for all this. The first census in the U.S. took place in 1790 and counted the total population as 3,929,214 (A Timeline of Census History). Both President George Washington and Secretary of State Thomas Jefferson expressed skepticism, believing the count to be too low. Until 1840, Secretaries of State were put in charge of organizing the decennial census, which was a temporary assignment. In 1849, Congress established a census board to oversee data collection, and the responsibility for census data shifted from the Department of State to the Department of the Interior (DOI). Only in 1902 did the Census Bureau become a permanent agency under the DOI (A Timeline of Census History).

This gradual shift from a temporary ad-hoc group of amateurs to a permanent government bureaucracy happened in parallel with the increasing complexity of census operations and the government’s growing demand for quality data. It is noteworthy that two other federal statistical agencies were established before the Census Bureau: the National Center for Education Statistics, founded in 1867, and the U.S. Bureau of Labor Statistics, founded in 1884. Today, the U.S. has thirteen principal Federal Statistical Agencies and more than 90 federal organizations that engage in statistical activities. See Table 1 for the full list of the thirteen principal statistical agencies.

Table 1: 13 Principal Federal Statistical Agencies in the U.S.:

Agency | Governing body | Founded
Bureau of Economic Analysis | Department of Commerce | 1972
Bureau of Justice Statistics | Department of Justice | 1979
Bureau of Labor Statistics | Department of Labor | 1884
Bureau of Transportation Statistics | Department of Transportation | 1992
Census Bureau | Department of Commerce | 1903
Economic Research Service | Department of Agriculture | 1961
Energy Information Administration | Department of Energy | 1977
National Agricultural Statistics Service | Department of Agriculture | 1961
National Center for Education Statistics | Department of Education | 1867
National Center for Health Statistics | Department of Health and Human Services | 1960
National Center for Science and Engineering Statistics | Independent | 1950
Office of Research, Evaluation and Statistics | Social Security Administration | 1935
Statistics of Income | Department of Treasury | 1916

One of the forces driving the increasing demand for quality data was the transition of the U.S. to a welfare state. By definition, a welfare state means “a state that is committed to providing basic economic security for its citizens by protecting them from market risks associated with old age, unemployment, accidents, and sickness” (Weir). In order to efficiently allocate resources and provide targeted assistance, the state needed more complex and accurate databases on individual citizens. A turning point came with the Social Security Act of 1935, which was part of the New Deal, a series of government programs in response to the Great Depression. At the time, the United States was the only modern industrial country that did not have a social security system.

One of the main provisions of the Social Security Act was the creation of the Social Security Number (SSN), which assigned a unique 9-digit number to every U.S. citizen, as well as to permanent and temporary residents. Over time, the social weight and public perception of the SSN changed. Carolyn Puckett, working for the Office of Research, Evaluation, and Statistics at the Social Security Administration, wrote in 2009 that “created merely to keep track of the earnings history of U.S. workers for Social Security entitlement and benefit computation purposes, it [the SSN] has come to be used as a nearly universal identifier” (Puckett). It turned into the primary method for public services to identify citizens and organize individual records.

1.4 Changing perceptions

As the number of government records about citizens grew in the following years, concerns about the privacy implications began to emerge. In her 2018 book “The Known Citizen: A History of Privacy in Modern America,” Sarah E. Igo, Professor of History and the Dean of Strategic Initiatives for the College of Arts and Science at Vanderbilt University, writes that with the passage of the Social Security Act of 1935, “questions about how thoroughly the state ought to know its own people became less theoretical” (Igo, p. 57). Professor Igo writes that until the 1930s, public perception was that the government tracked only troubled citizens and marginal communities to maintain public order. However, in the New Deal era, the government’s administrative tracking captured even more privileged citizens, and “being known to the government” became “increasingly constitutive of citizenship itself: a necessary exchange for steady employment, increased economic security, and free movement across borders” (Igo, p. 56).

Initial public reactions to the newly instituted Social Security programs were largely positive, especially during the years of World War II, when they enabled the government to efficiently identify and provide assistance to war veterans and wounded warriors. Some people went so far as to tattoo their social security numbers on their bodies to make sure they would not forget their nine digits. However, in the following decades, especially as the economic crisis and war faded into history, public debate about government databases shifted into a new phase. On the one hand, some government bureaucrats and social scientists believed that increasing the quantity and quality of public data records would lead to more efficient social and economic policies. On the other hand, many civil society activists and legal scholars voiced concerns that the swelling volumes of databases on citizens were an invasion of privacy.

1.5. The National Data Center

In this context, the story of the failed National Data Center in the 1960s is especially noteworthy and extremely relevant to the debate around the Evidence-Based Policymaking Act adopted in 2018. It started with a request from a group of social scientists, who in 1965 “recommended that the federal government develop a national data center that would store and make available to researchers the data collected by various statistical agencies” (Kraus, p. 1). The ensuing political turmoil is captured very eloquently in “Statistical Déjà Vu: The National Data Center Proposal of 1965 and Its Descendants,” a paper Rebecca Kraus wrote in 2013. On one side, some social scientists believed that “government programs designed to address social issues, such as civil rights, housing, employment, welfare, education, and poverty” could be improved if the academic community had access to the public data generated by the federal government (Kraus, p. 4). On the other side, privacy advocates were concerned about the potential risks and vulnerabilities such a center would create. The proposal for the National Data Center lost steam in 1970, when the Bureau of the Budget, which led the research behind it, was reorganized into the Office of Management and Budget.[1]

2. The Commission

2.1 Formation of the Commission

In March 2016, Speaker of the House Paul Ryan and Senator Patty Murray put forward the bipartisan Evidence-Based Policymaking Commission Act of 2016, which President Barack Obama signed that same month. It laid the foundation for the establishment of the U.S. Commission on Evidence-Based Policymaking (CEP), directed to “consider how to strengthen government’s evidence-building and policymaking efforts,” to “study how the data that government already collects can be used to improve government programs and policies,” and to present its findings and recommendations to Congress and the President.

2.2 Bipartisan initiative

It is worth underlining the bipartisan nature of this initiative. The two congressional leaders, Democratic Senator Patty Murray of Washington State and Republican Speaker of the House Paul Ryan of Wisconsin, had established a good working relationship back in 2013, when they achieved breakthrough success with the Bipartisan Budget Act of 2013. The bill allowed Congress to avert a government shutdown and, in the long run, to save close to $23 billion. Murray and Ryan had made only small compromises to achieve the breakthrough, and both were applauded for the ensuing agreement. Three years later, they built on this success and initiated the CEP. During the introduction of the commission’s findings, Senator Murray said, “No matter what side of the aisle you’re on, we should all agree that government should work as efficiently as possible for the people it serves” (U.S. Senator Patty Murray). Paul Ryan, in turn, remarked that “Patty and I have long advocated for a way to better measure the federal government’s effectiveness—and this bill puts those efforts into action” (U.S. Senator Patty Murray).

2.3 Composition of the Commission

Consequently, the Commission was composed of individuals without strong political affiliations: mostly academics, some with prior experience in the federal government, one current employee of the U.S. Office of Management and Budget, and three members from the private sector. Two of the commission’s fifteen members were well-known privacy advocates: Paul Ohm, a Professor of Law at the Georgetown University Law Center, and Latanya Sweeney, Professor of the Practice of Government and Technology at the Harvard Kennedy School. Both are well recognized for their research and publications on privacy law and policy. Paul Ohm’s position is that “data can be either useful or perfectly anonymous but never both.” Latanya Sweeney was a graduate student at the Massachusetts Institute of Technology in 1997 when she reidentified Massachusetts Governor Bill Weld by connecting his publicly accessible records to his anonymized medical records (Meyer). This made a big public impact and led to new legal restrictions on the disclosure of protected health information under the Health Insurance Portability and Accountability Act, known as HIPAA. So, within the CEP, Ohm and Sweeney advocated for additional friction in accessing government databases and for adding layers of privacy protections.

Table 2: Members of the U.S. Commission on Evidence-Based Policymaking

# | Name | Affiliation
1 | Katharine G. Abraham (Commissioner and Chair) | University of Maryland
2 | Ron Haskins (Commissioner and Co-Chair) | Brookings Institution
3 | Sherry Glied | New York University
4 | Robert M. Groves | Georgetown University
5 | Robert Hahn | University of Oxford
6 | Hilary Hoynes | University of California, Berkeley
7 | Jeffrey Liebman | Harvard University
8 | Bruce D. Meyer | University of Chicago
9 | Paul Ohm | Georgetown University
10 | Nancy Potok | U.S. Office of Management and Budget
11 | Kathleen Rice Mosier | Faegre Baker Daniels, LLP
12 | Robert Shea | Grant Thornton, LLP
13 | Latanya Sweeney | Harvard University
14 | Kenneth R. Troske | University of Kentucky
15 | Kim R. Wallin | D.K. Wallin, Ltd.

However, on the other side of the debate were social scientists who believed that access to more data would improve both the quality of academic research and the efficiency of the government’s public policy. Consequently, there were many heated debates within the commission. The CEP held its first meeting in July 2016 and presented its final report in September 2017. During this period, it surveyed 209 Federal offices that work with evidence (data), invited 49 witnesses, held meetings with 40 organizations, hosted three public hearings, and reviewed comments from 350 respondents in the Federal Register (Bipartisan Policy Center). When the time came, the commission presented a final document signed unanimously by all its members.

2.4 Recommendations

The final report of the commission, titled The Promise of Evidence-Based Policymaking, was presented to the public on September 7, 2017. It included 22 specific recommendations falling under four categories: 1. Improving Secure, Private, and Confidential Data Access; 2. Modernizing Privacy Protections for Evidence Building; 3. Implementing the National Secure Data Service; 4. Strengthening Federal Evidence-Building Capacity. In the 138-page document, the word “privacy” appears 390 times, “secure” 183 times, and “confidential” 12 times. Overall, the report recognizes that “the country’s laws and practices are not currently optimized to support the use of data for evidence building, nor in a manner that best protects privacy” and suggests several measures to address this issue (Commission on Evidence-Based Policymaking).

2.5 The National Secure Data Service

One of the central ideas in the report is the establishment of a National Secure Data Service (NSDS), a successor of sorts to the idea of the National Data Center from the 1960s. Back then, during one of the Congressional hearings, economist Richard Ruggles remarked that “although the emphasis in the privacy hearings was mainly on the possible danger of centralizing records, they also brought out that in some instances, the centralization of files can result in increasing the protection of individual privacy in situations where there have been flagrant abuses” (Kraus, p. 21). Building on this premise, members of the CEP believed that creating a centralized data center could enhance both the quality of data and privacy standards. The report suggested that the NSDS could learn from the expertise and institutional knowledge of the Center for Administrative Records Research and Applications (CARRA) and the Center for Economic Studies (CES) under the Census Bureau, which have been carrying out similar functions.

3. The Legislation

3.1 Passing into law

The Foundations for Evidence-Based Policymaking Act passed the House of Representatives on November 15, 2017. About eleven months later, the Senate approved the bill, as amended, by a unanimous vote. In January 2019, the President signed the “Foundations for Evidence-Based Policymaking Act of 2018” into law (Legislative Bulletin). The final act, which is about thirty pages long, makes only seven references to privacy, but it creates clear boundaries for the use of public data, assigns responsible parties for the handling and protection of databases, and provides legal penalties for violations of the act’s provisions. Overall, the Act presents several progressive and innovative approaches to handling public data, but whether a sufficient level of privacy protections supplements these new practices requires a closer examination.

3.2 Legal Amendments

It goes without saying that the act was not built in a vacuum but rather supplements a complex system of pre-existing rules and regulations. The full title of the EBPA is: “to amend titles 5 and 44, United States Code, to require Federal evaluation activities, improve Federal data management, and for other purposes.” Title 5 of the U.S. Code concerns “Government Organization and Employees,” and it contains regulations such as the Freedom of Information Act (FOIA), adopted in 1967, and the Privacy Act of 1974. FOIA gives American citizens the right to request access to records from any federal agency, provided the request does not violate certain privacy and confidentiality rules (Branscomb).[2] The Privacy Act of 1974 established “a code of fair information practices that govern the collection, maintenance, use, and dissemination of information about individuals that is maintained in systems of records by federal agencies” (Privacy Act of 1974). EBPA did not make any changes to either FOIA or the Privacy Act but complemented title 5 of the U.S. Code with additional provisions about federal government data handling practices.

Title 44 of the U.S. Code concerns “Public Printing and Documents” and covers all the archives, registries, and records managed by the federal government. Most provisions of the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) also fall under title 44. CIPSEA was part of the broader E-Government Act of 2002, and it established uniform confidentiality standards to protect the data collected by federal statistical agencies. The purpose was to avoid opportunities for triangulating data points and reidentifying respondents based on data shared by various statistical agencies. The Evidence-Based Policymaking Act repealed CIPSEA 2002 and reauthorized it as CIPSEA 2018, with the overall intention of providing more opportunities to use public data for statistical purposes while imposing more responsibilities for risk aversion (Ruyle).

The EBPA also passed into law the “Open, Public, Electronic, and Necessary Government Data Act,” also known as the OPEN Government Data Act. Since 2009, the U.S. General Services Administration has been running the website Data.gov, which publishes machine-readable datasets produced by the executive branch of the national government for public access (Data.Gov). In March 2017, Democratic House representative Derek Kilmer of Washington State proposed the OPEN Government Data Act, which would expand the coverage of data.gov and require “open government data assets made available by federal agencies (excluding the Government Accountability Office, the Federal Election Commission, and certain other government entities) to be published as machine-readable data… when not otherwise prohibited by law” (H.R.1770 – 115th Congress). All in all, EBPA was not an out-of-the-blue, disruptive piece of legislation, but rather another step towards open data and evidence-based policymaking that plugged into the pre-existing legal infrastructure.

3.3 Statistical purpose

A top priority in the text of the EBPA is ensuring that only anonymized aggregate data will be shared, to protect the confidentiality of respondents. One of the most frequently used terms is “statistical purpose” (mentioned 35 times), which according to title 44 of the U.S. Code means “the description, estimation, or analysis of the characteristics of groups, without identifying the individuals or organizations that comprise such groups” (44 USC 3561: Definitions). For example, collecting and processing data on the overall number of traffic incidents in Washington, DC falls under statistical purposes. However, if the data is used to calculate car insurance rates adjusted for individual drivers in Washington, DC, that would be a non-statistical use. For most social research and public policy purposes, aggregate data is sufficient. For example, if the unemployment rate among the Hispanic population is higher than among other groups, the government can initiate a tailored policy approach targeting specifically that group. However, when very large quantities of data are centralized in one place and various bits and pieces are shared on public platforms, it creates opportunities for reverse-tracking data points, making meaningful connections, and reconstructing parts of the database not meant for public disclosure.
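The reconstruction risk described above can be made concrete with a toy example. The records below are entirely hypothetical; the sketch only illustrates how two datasets that are individually "anonymous" can be joined on shared quasi-identifiers (here ZIP code, date of birth, and sex), in the spirit of Sweeney's well-known linkage attack:

```python
# Hypothetical public dataset: names attached, nothing sensitive.
voter_roll = [
    {"name": "A. Smith", "zip": "20001", "dob": "1960-07-31", "sex": "M"},
    {"name": "B. Jones", "zip": "20001", "dob": "1985-02-12", "sex": "F"},
]

# Hypothetical "anonymized" dataset: names removed, quasi-identifiers kept.
health_records = [
    {"zip": "20001", "dob": "1960-07-31", "sex": "M", "diagnosis": "X"},
]

def reidentify(voters, health_rows):
    """Link the two datasets on the shared (zip, dob, sex) quasi-identifiers."""
    matches = []
    for row in health_rows:
        key = (row["zip"], row["dob"], row["sex"])
        candidates = [v for v in voters
                      if (v["zip"], v["dob"], v["sex"]) == key]
        if len(candidates) == 1:  # a unique match reidentifies the record
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches

print(reidentify(voter_roll, health_records))  # -> [('A. Smith', 'X')]
```

Each dataset on its own could satisfy a naive reading of “statistical purpose”; only their combination exposes the individual, which is precisely the combined-disclosure risk EBPA asks agency heads to assess.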

3.4 Risks and Responsibilities

From this standpoint, EBPA puts a big responsibility on the heads of federal agencies and will hold them accountable for determining “risks and restrictions related to the disclosure of personally identifiable information, including the risk that an individual data asset in isolation does not pose a privacy or confidentiality risk but when combined with other available information may pose such a risk.” Additionally, the law establishes the position of Evaluation Officer in each agency, whom the head of the agency will designate without regard to political affiliation. The main function of the Evaluation Officer will be to “continually assess the coverage, quality, methods, consistency, effectiveness, independence, and balance of the portfolio of evaluations, policy research, and ongoing evaluation activities of the agency.”

However, EBPA centralizes the data generated by all federal agencies, so minimizing the risks of reidentification requires interagency coordination. For this purpose, the law expands the functions and the institutional scope of the Interagency Council on Statistical Policy (ICSP), established under section 3504(e)(8) of title 44, which designates the head of the Office of Management and Budget as head of the Council. In the 1980s, the ICSP was an informal group that brought together representatives from federal statistical agencies to coordinate their activities, but it was authorized by statute as a formal council in 1995 (The Structure of the Federal Statistical System). The Paperwork Reduction Act of 1995 put the OMB, namely its Office of Information and Regulatory Affairs (OIRA) division, in charge of coordinating the U.S. federal statistical system (Statistical Programs & Standards). The head of OIRA’s Statistical and Science Policy Office is also the Chief Statistician of the U.S.,[3] who hosts the meetings of the ICSP on a monthly basis. Under the EBPA, heads of statistical units and other officials with appropriate expertise from other federal agencies will also join the ICSP, which will take on more responsibilities.

3.5 Upcoming assessment 

The new law also establishes the position of Chief Data Officer (CDO) in each agency, responsible for “lifecycle data management,” as well as for managing “data assets of the agency, including the standardization of data format, sharing of data assets in accordance with applicable law,” among fourteen other duties outlined in the law. Furthermore, section 3520A of the EBPA provides for the establishment of the Chief Data Officer Council, which also falls under the OMB but is separate from the ICSP. It is a temporary council that brings together representatives from 39 federal agencies. The CDO Council is assigned a number of tasks to complete before January 2025, when it will dissolve (About Us. Federal CDO Council): “1. establish Governmentwide best practices for the use, protection, dissemination, and generation of data; 2. promote and encourage data sharing agreements between agencies; 3. identify ways in which agencies can improve upon the production of evidence for use in policymaking,” etc. So, certain provisions of EBPA are still in the assessment phase, and it will take a couple more years for the act to fully unfold.

4. Privacy

4.1 What is privacy?

The word “privacy” traces its roots to the Latin word privus, meaning separate or single. The Merriam-Webster dictionary offers two definitions of privacy: 1. the quality or state of being apart from company or observation; 2. freedom from unauthorized intrusion. However, there are different approaches to privacy in the scholarly community and, consequently, different definitions. Generally, the significance and value of privacy may change depending on social, political, and cultural circumstances, which has made it an elusive concept to define by consensus. Nonetheless, the debate around privacy has been ongoing since the mid-twentieth century, and it is not likely to end anytime soon, as modern technologies move us into uncharted territories with new friction points.

4.2 Privacy as a human right

A popular perspective views privacy as a fundamental human right, or “the right to be left alone,” protected by law (MacCarthy, 2017). This approach recognizes an individual’s right to personal physical and informational space protected from external intrusion. The United States Constitution provides certain privacy protections. The Fourth Amendment states: “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated.” The Fifth Amendment provides conditional protections for private information by creating the right against self-incrimination. In the U.S. common law system, a number of court cases, such as Griswold v. Connecticut and Lawrence v. Texas, have broadened the scope of these constitutional privacy protections. Additionally, the United States has dozens of laws providing sectoral privacy protections. For example, the aforementioned HIPAA protects medical records, and the Family Educational Rights and Privacy Act legally restricts access to student records.

4.3 Privacy as a harm

Another perspective views privacy as a right to be protected from harm. This approach shifts the focus of the privacy debate from the individual to the level of society and asks what the implications of data collection are for society overall. It leaves a smaller space for personal information protection, which applies only when there is direct and tangible harm to the individual. The privacy-as-harm framework emerged in the 1970s, when one of its pioneers, Richard Posner, wrote that we have two economic goods, “privacy” and “prying,” and that expanding the privacy protections of individuals while contracting the rights of organizations collecting data is against our common interests (Posner, 1978). More recently, Howard Beales and Timothy Muris made a case for the harm framework by highlighting the example of credit score reporting, since “collecting financial information about individuals has made loans more accessible to general public” (Beales & Muris, 2008). In this view, the social benefit of more accessible loans trumps the individual’s right to withhold financial information. The approach thus emphasizes protection from the harmful externalities of data use rather than restrictions on data collection itself.

4.4 Privacy in social context

The most recent addition to the privacy debate was made by Helen Nissenbaum, Professor at Cornell University. In her book “Privacy in Context,” Nissenbaum laid out a new privacy framework that integrates elements from both the human rights and harm frameworks. Nissenbaum builds on the premise that privacy is a social construct, so its interpretation and application may vary depending on the social circumstances. From this vantage point, structured social factors, such as canonical activities, roles, norms, and values, define the optimal degree of data access and visibility (Nissenbaum, p. 17). For example, the doctor you are visiting may have access to your medical records, but an insurance company may not. A basic quality of the social context framework is that privacy does not stop the flow of information but facilitates the flow to some stakeholders while restricting it for others (MacCarthy, 2017). It is hard to disagree with Nissenbaum that privacy is a social construct whose value tends to change across geographic space, time, and other conditional factors. Nowadays especially, data has become omnipresent, and a uniform, rigid approach to all privacy issues cannot be the solution moving forward. Sometimes privacy is an inalienable human right; other times there is a common public interest in sharing certain pieces of information that would otherwise be considered private. So, the social context framework offers a matrix that is broad, structured, and flexible enough to be applied across the privacy landscape.

4.5 Reasonable expectation of privacy

One of the most common reference points in the debate about privacy is the notion of a “reasonable expectation of privacy.” It traces back to the seminal Supreme Court case Katz v. United States, decided in the 1960s, while the debate around the first National Data Center was ongoing. The Court’s decision expanded Fourth Amendment privacy protections to include “what [a person] seeks to preserve as private, even in an area accessible to the public.” In his concurrence, Justice John Harlan established a two-part privacy test, which relies on the subjective expectation of privacy of the individual in question and the objective expectation of privacy held by society as a whole. However, Helen Nissenbaum, along with other contemporary privacy scholars, believes that due to the impact of modern disruptive technologies, the binary approach to privacy of inside/outside, secret/not secret, or expected/not expected is somewhat outdated. Nissenbaum writes that previously “people could count on going unnoticed and unknown in public arenas; they could count on disinterest in the myriad scattered details about them” (Nissenbaum, p. 116), but now it has become far more complicated. New technologies allow capturing myriad details, or data points, about us into centralized databases, adding new layers to the privacy debate, where little details make big differences.

5. Analysis

This section takes a closer look at the implications of EBPA by asking what the risks and opportunities are in the centralization of federal data from a privacy standpoint. 

5.1 Data Centralization

Contrary to one of the top recommendations in the CEP report, the EBPA did not establish a National Secure Data Service (NSDS), but it did create frameworks for interagency coordination and data centralization. We discussed earlier the expanded role of the ICSP and the temporary Chief Data Officers Council. The EBPA also established another temporary body, the Advisory Committee on Data for Evidence Building, which brings together Evaluation Officers, Chief Data Officers, and other managers responsible for data handling across the federal statistical system. The Advisory Committee is currently administered by the Census Bureau and the Bureau of Economic Analysis (BEA) under the Department of Commerce and works closely with the Office of Management and Budget (Advisory Committee on Data for Evidence Building). In its Year 1 report, published in October 2021, the Advisory Committee affirmed the need to establish the NSDS, as proposed by the CEP (Advisory Committee on Data for Evidence Building: Year 1 Report).

5.2 Advantages of the NSDS

This is hardly surprising: from the beginning, one of the top priorities behind the EBPA was the creation of a centralized command-and-control mechanism over all the data the federal government generates. When the CEP started its first meetings for the research behind the EBPA, it had sixteen talking points, four of which concerned the NSDS, including "tiered access with a NSDS" and the role of the NSDS in the federal evidence ecosystem (CEP report, p. 123). In its final recommendations, the CEP proposed that the NSDS would enable the OMB to create higher standards for data collection and protection that could be applied across the country, so that the same national database protection principles would apply to data from small rural communities and large metropolitan areas alike. The NSDS would also help curtail duplicative efforts and improve the efficiency of the statistical agencies, which in turn would decrease spending on federal data and reduce the burden on the public.

5.3 Risks and vulnerabilities

However, the federal government handles very large volumes of data on a routine basis, and concentrating so much statistical information in the hands of a single center creates new privacy risks. First, it may change the public perception of federal data and potentially burden civic life. Second, increasing public access to federal statistics increases the risk to data confidentiality, and the EBPA creates obligations to make significant numbers of datasets publicly available.

5.4 Panopticon view

During the first round of privacy debates in the 1960s, Cornelius Gallagher, a Democratic Congressman from New Jersey, said that the improved government efficiency promised by the idea of a National Data Center "would be paid for at the far greater expense of weakening the right to privacy of all American citizens" (Kraus, p. 11). Privacy scholar Vance Packard concluded his Congressional testimony by noting that "my own hunch is that Big Brother, if he ever comes to these United States, may turn out to be not a greedy power seeker, but rather a relentless bureaucrat obsessed with efficiency" (Kraus, p. 10). As we discussed earlier, privacy is a social construct, and its social value and impact may change depending on the circumstances. For example, we might feel comfortable sharing our medical records with a hospital, educational records with an employer, and income statements with the Internal Revenue Service, but it creates a different reality when someone is able to put it all together. It gives the impression that someone knows as much about you as you do, and that you are no longer in charge of your privacy. This kind of public opinion is detrimental to civic life even when it is not grounded in fact, because perception becomes reality and people begin to inhibit their own self-expression.

Michel Foucault, one of the most influential philosophers of the 20th century, put forward the concept of panopticism. Its central argument is that people change their behavior even when there is only a modest chance that those in a position of power could be watching them. The panopticon was originally an architectural design for prisons proposed by the English philosopher Jeremy Bentham in the late 18th century: the prison floor is laid out in a circle, with a guard sitting at the very center who can see all the inmates while they cannot see the guard. Foucault argued that this creates a power dynamic in which inmates become their own surveillants, because they never know when they are being monitored.

5.5 Privacy: statistics vs. surveillance

However, there is an important distinction between the statistical analysis foreseen by the EBPA and the type of surveillance assumed by the panopticon. Surveillance focuses on specific targets, whereas statistics processes aggregate data, and as mentioned earlier, the EBPA emphasizes heavily that data will be used for statistical purposes only. Part B of the Act, titled "Confidential Information Protection," contains several safeguards against abuse of the federal databases. For example, those who handle the data must take a pledge of confidentiality and are liable under the law for a Class E felony: they can be imprisoned for up to 5 years and/or fined up to $250,000.[4] The EBPA also obliges the statistical agencies to clearly distinguish any information that could be used for non-statistical purposes and to provide public notice about the actual purpose of the data. However, the legislation leaves open what the mechanisms and conditions for public communication will be. A 2019 survey by the Pew Research Center showed that 64% of Americans are concerned about the government's use of public data, while 78% do not understand what the government does with the data it collects (Auxier, et al.). Stronger legal encouragement for executive agencies like the NSDS to prioritize public accountability and engagement would be welcome.

5.6 Privacy vs. confidentiality vs. anonymity  

People working for the NSDS will also face technical challenges in preserving the confidentiality and anonymity of data. First, let us look at the distinctions between privacy, confidentiality, and anonymity. Confidentiality and anonymity are only about a person’s actions and data, but privacy is also about the person (Privacy and Confidentiality). For example, whether someone may ask you personal questions is a matter of privacy. However, whether they can share your responses with another person is a question of confidentiality. Confidentiality implies that the surveyor knows your identity but will not share it outside a certain social group. Anonymity refers to a condition where even the primary surveyor does not know or register your identity. Both confidentiality and anonymity fall under the bigger umbrella of privacy, but neither captures its full meaning.

5.7 Privacy legislation

Privacy regulations, not only in the United States but around the world, do not apply to anonymized data. For example, the European Union's well-known privacy law, the General Data Protection Regulation, states that "The principles of data protection should apply to any information concerning an identified or identifiable natural person… this Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes" (Recital 26: Not Applicable to Anonymous Data). The United States Privacy Act of 1974 also has a specific exemption for a "statistical record," meaning "a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identifiable individual" (Privacy Act of 1974). The EBPA complies with this provision. However, in the half century since the Privacy Act was passed, much has changed in both statistical science and the technical capacity of machines to process data.

5.8 Reidentification

We have mentioned earlier the new methods and techniques for reidentification by triangulating data points from several anonymized datasets. Paul Ohm, Georgetown Law professor and a member of the CEP, wrote in his 2010 paper that "Reidentification science disrupts the privacy policy landscape by undermining the faith we have placed in anonymization… advances in reidentification expose these promises as too often illusory" (Ohm, 2010). To avoid the traps of reconstruction algorithms, statistical experts have developed several data protection mechanisms. For example, the Census Bureau, which is required to publish block-level data, has for many years used various noise-infusion techniques, such as 'swapping,' 'blank-and-impute,' 'partially synthetic data,' and most recently differential privacy (boyd and Sarathy, p. 7). These approaches preserve the integrity of the datasets and maintain their full value for most purposes without compromising confidentiality. However, in some scenarios these methods introduce minor deviations, since the datasets are being manipulated, and the details of the manipulations cannot be shared publicly, because that would undermine the confidentiality of the datasets. Consequently, these disclosure control methods create friction between data users and the Census Bureau.
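To make the idea concrete, here is a minimal sketch of the Laplace mechanism, the basic building block behind differential privacy. The counts and parameters are illustrative assumptions, not the Census Bureau's actual implementation:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a zero-centered Laplace(scale) distribution."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count via the Laplace mechanism.

    Adding Laplace(sensitivity / epsilon) noise gives epsilon-differential
    privacy for a counting query, because adding or removing one person
    changes the true count by at most `sensitivity`.
    """
    return true_count + laplace_noise(sensitivity / epsilon)

random.seed(42)
true_count = 1_000  # hypothetical: residents of a block with some attribute
releases = [dp_count(true_count, epsilon=0.5) for _ in range(10_000)]
average_release = sum(releases) / len(releases)
```

Each individual release is perturbed, so no single published figure pins down any one person's data, yet the noise is unbiased, so aggregate analyses remain approximately valid. This is exactly the trade-off between confidentiality and accuracy that creates the friction with data users described above.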

6. Recommendations

6.1 Study the impact on civic activism

In the early 1970s, the Advisory Committee on Automated Personal Data Systems was established under the Department of Health, Education, and Welfare to research the potentially harmful consequences of automated personal data systems, effective safeguards against those consequences, and "policy and practice relating to the issuance and use of Social Security Numbers" (U.S. Department of Health, Education and Welfare). The Committee published its final report, "Records, Computers, and the Rights of Citizens," in 1973; it had a ripple effect on privacy laws and regulations around the world for decades to come. In the United States, it laid the foundations for the Fair Information Practice Principles, applied by the Federal Trade Commission to the private sector, and influenced the Privacy Act of 1974.

Much has changed since the 1970s, and more changes will come after the EBPA is fully rolled out. The federal government now needs to conduct a similar study to assess the impact of the EBPA and data centralization on civic activism and freedom of expression. The United States is by far the biggest experiment in human history testing the power of a society built on individual liberties, and one of the cornerstones of America's success story is the value it places on freedom of self-expression. Even nominal burdens on privacy and civil liberties could be a very high cost to pay for the promises of the EBPA.

6.2 Revisiting the legislation

The findings of such a report should feed into a revision of the Privacy Act of 1974. The latest change to the legislation was made in 1988, when Congress passed the Computer Matching and Privacy Protection Act, which requires that federal agencies "enter into written agreements with other agencies or non-Federal entities before disclosing records for use in computer matching programs." The official online database of Congress (Congress.gov) lists 1,200 bills with the word privacy in the title. Most of them never passed the House floor, which shows how complicated the legal terrain on privacy is in the United States. Ninety-four privacy bills were introduced in 1973–74, and since then an average of twenty privacy bills have been introduced every year.

Current legislation places a heavy burden on the statistical agencies to meet three competing demands. They have to produce high-quality data, they have to protect the privacy of their respondents, and now they are also obliged to make their datasets publicly available, which forces them to use techniques such as differential privacy. That, in turn, makes certain data consumers unhappy, as the experience of the Census Bureau shows. It would therefore be sensible to relieve the statistical agencies of some of this burden and provide legal tools and justifications for the privacy protections applied to public datasets.

7. Conclusion

Over the years, U.S. federal statistical agencies have accumulated tremendous institutional expertise and technical capacity to produce large-scale, high-quality data. The EBPA is now rallying the federal statistical agencies into a cohesive unit to provide numerical insight into the performance of the executive branch. It will create an administrative mechanism for informing the government's policy decisions, as well as a public accountability mechanism, since large segments of government data will be made publicly accessible. However, it also consolidates the statistical information of the federal government into centralized databases, which creates new privacy risks and vulnerabilities. The EBPA is yet to be fully rolled out, but one of its main expected consequences is the establishment of the NSDS, which will carry enormous weight on its shoulders as it tries to satisfy several competing demands. On both ends of the line, the NSDS will be working with and for the American people, so it is very important to keep them informed and to understand the public impact and expectations. It is the right time for the U.S. government to conduct a study on the impact of centralized, automated databases on civic life, akin to the one conducted in 1973, and to incorporate its findings into updated privacy legislation.

References:

About Us. (2020). Federal CDO Council. https://www.cdo.gov/about-us/

Advisory Committee on Data for Evidence Building. (2022). U.S. Bureau of Economic Analysis (BEA). https://www.bea.gov/evidence

Advisory Committee on Data for Evidence Building: Year 1 Report. (2021, October). Office of Management and Budget. https://www.bea.gov/system/files/2021-10/acdeb-year-1-report.pdf

Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2020, August 17). Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information. Pew Research Center: Internet, Science & Tech. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/

A Timeline of Census History. United States Census Bureau. https://www.census.gov/history/img/timeline_census_history.bmp

Beales, Howard, & Muris, Timothy. “Choice or Consequences: Protecting Privacy in Commercial Information.” 75 U. Chi. L. Rev. 109 2008 pp. 109-120

Bipartisan Policy Center. Frequently Asked Questions Related to the Commission on Evidence-Based Policymaking’s Report. (2019, March). https://bipartisanpolicy.org/download/?file=/wp-content/uploads/2019/03/CEP-FAQs.pdf

boyd, d. & Sarathy, J. "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy"

Branscomb, Anne (1994). Who Owns Information?: From Privacy To Public Access.

Commission on Evidence-Based Policymaking. (2017, September). THE PROMISE OF EVIDENCE-BASED POLICYMAKING. Bipartisan Policy Center. https://bipartisanpolicy.org/download/?file=/wp-content/uploads/2019/03/Appendices-e-h-The-Promise-of-Evidence-Based-Policymaking-Report-of-the-Comission-on-Evidence-based-Policymaking.pdf

Data.Gov. (2022) About. https://data.gov/about/

Dr. Latanya Sweeney’s Home Page. (2021). http://latanyasweeney.org/

Gauthier, J. H. S. (2021). 1790 Overview – History – U.S. Census Bureau. United States Census Bureau. https://www.census.gov/history/www/through_the_decades/overview/1790.html

H.R.1770 – 115th Congress (2017–2018): OPEN Government Data Act. Congress.Gov | Library of Congress. https://www.congress.gov/bill/115th-congress/house-bill/1770

Igo, S. E. (2020). The Known Citizen: A History of Privacy in Modern America. Harvard University Press.

MacCarthy, Mark (2017). "Privacy Policy and Contextual Harm." 13 I/S: Journal of Law and Policy. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3093253

Nissenbaum, Helen (2007). Privacy in Context. Stanford University Press. Kindle Edition

44 USC 3561: Definitions. Office of the Law Revision Counsel (2022). https://uscode.house.gov/view.xhtml?req=(title:44%20section:3561%20edition:prelim)%20OR%20(granuleid:USC-prelim-title44-section3561)&f=treesort&edition=prelim&num=0&jumpTo=true

The National Constitution Center (2022). The Constitution – Full Text. https://constitutioncenter.org/interactive-constitution/full-text

Paul Ohm. (n.d.). PaulOhm.Com. https://www.paulohm.com/

Puckett, C. (2009, July 1). The Story of the Social Security Number. Social Security Administration Research, Statistics, and Policy Analysis. https://www.ssa.gov/policy/docs/ssb/v69n2/v69n2p55.html

U. S. Senator Patty Murray (2017, November 1). Senator Murray, Speaker Ryan Introduce Evidence-Based Policymaking Legislation. https://www.murray.senate.gov/senator-murray-speaker-ryan-introduce-evidence-based-policymaking-legislation/

Legislative Bulletin (2019). The President Signs H.R. 4174, "Foundations for Evidence-Based Policymaking Act of 2018." Social Security Administration. https://www.ssa.gov/legislation/legis_bulletin_021519.html

Meyer, M. (2018, October 31). Law, Ethics & Science of Re-identification Demonstrations. Harvard Law Petrie Flom Center. https://blog.petrieflom.law.harvard.edu/symposia/law-ethics-science-of-re-identification-demonstrations/

Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization” (August 13, 2009). UCLA Law Review, Vol. 57, p. 1701, 2010. Available at SSRN: https://ssrn.com/abstract=1450006

Posner, Richard. “The Right of Privacy.” Georgia Law Review 393 (1978) pp. 393 – 404

Privacy Act of 1974. (2021, April 30). Department of Justice. https://www.justice.gov/opcl/privacy-act-1974

Privacy and Confidentiality. (n.d.). CHOP Research Institute. https://irb.research.chop.edu/privacy-and-confidentiality

“Privacy.” Merriam-Webster.com Dictionary, Merriam-Webster, https://www.merriam-webster.com/dictionary/privacy. Accessed 3 May. 2022.

Public Law No: 115–435. Foundations for Evidence-Based Policymaking Act of 2018 Congress.gov. (2019). https://www.congress.gov/bill/115th-congress/house-bill/4174

Recital 26: Not Applicable to Anonymous Data. General Data Protection Regulation. (2016).

Ruyle, M. (2019, March 1). New Law Offers Reforms to Improve Access to Data, Confidentiality Protections | Amstat News. Magazine of the American Statistical Association. https://magazine.amstat.org/blog/2019/02/01/law-improves-data-confidentiality/

Statistical Programs & Standards. (2021, December 22). The White House. https://www.whitehouse.gov/omb/information-regulatory-affairs/statistical-programs-standards/

The Structure of the Federal Statistical System. (n.d.). The White House. https://obamawhitehouse.archives.gov/omb/inforeg_statpolicy/bb-structure-federal-statistical-system

Understanding Confidentiality and Anonymity. (n.d.). The Evergreen State College. https://www.evergreen.edu/humansubjectsreview/confidentiality

U. S. Department of Health, Education and Welfare. (1973, July). Records, Computers and the Rights of Citizens. DHEW Publication. https://www.justice.gov/opcl/docs/rec-com-rights.pdf

Weir, M. (2001). Welfare State. International Encyclopedia of the Social & Behavioral Sciences. https://doi.org/10.1016/B0-08-043076-7/01094-9


[1] The General Services Administration proposed a similar idea in the 1970s to create an interconnected network of federal government data systems, which did not succeed either.

[2] They have a very user-friendly website operated by the Department of Justice at https://www.foia.gov/

[3] OIRA is also in charge of the cost-benefit analysis laid out in the President’s Executive Order 12866

[4] It is important to note that the Privacy Act of 1974 imposed a fine of not more than $5,000, which in today's money equals around $30,000: "Any member, officer, or employee of the Commission… who knowing that disclosure of the specific material is so prohibited, willfully discloses the material in any manner to any person or agency not entitled to receive it, shall be guilty of a misdemeanor and fined not more than $5,000."

Podcast | Data & Truth with danah boyd

The topic of this episode is data and truth. There is a popular saying that we live in a data-driven world, but where is data driving us? According to some estimates, the amount of data generated over the next three years will exceed the amount created over the past thirty. We have immersed ourselves in zettabytes of data to minimize uncertainty, make sense of the world around us, and validate every step we take. But how reliable is all this data, and can it really help us find the truth? In this episode we look for answers to this and other questions with the prominent scholar danah boyd, whose research examines the intersection of technology and society. She is a partner researcher at Microsoft, the founder of the well-known non-profit research institute Data & Society, and a Distinguished Visiting Professor at Georgetown University, where she taught a graduate course on Data and the Politics of Evidence.

A Critical Review of UNEP’s Food Waste Index

Its Impact and Limitations on Sustainable Consumption Policies

I. Introduction

Sustainable consumption is one of the priority areas in the international development agenda. In 2015, 193 UN member states signed the 2030 Agenda for Sustainable Development, which consists of seventeen interlinked Sustainable Development Goals. It is a comprehensive development framework that also focuses on "responsible consumption and production." However, it is a strategic-level document that did not take into account the operational-level challenges of developing indicators to measure progress towards these goals. In 2021, the United Nations Environment Programme (UNEP) published its first Food Waste Index (FWI) report, which is presented as the most comprehensive report on global food waste and made many news headlines.[1][2] UNEP has done an enormous job building the groundwork for producing global data on food waste, but the organization assigns a low or very low confidence level to nearly 80% of the data used to construct the FWI. In this context, the FWI is not a reliable benchmark either for measuring progress or for informing adequate policy decisions.

II. Background

In September 2015, at the landmark UN Sustainable Development Summit in New York, countries worldwide agreed on a post-2015 global development agenda "to achieve a better and more sustainable future for all people and the world by 2030."[3] They agreed on 17 Sustainable Development Goals, broken down into 169 SDG Targets, which in turn have 232 unique indicators (as of February 2022) to track progress.[4] In particular, SDG 12 focuses on "responsible consumption and production," which is about "decoupling economic growth from environmental degradation, increasing resource efficiency and promoting sustainable lifestyles."[5] There are eight targets under SDG 12, which mainly focus on national policies and large-scale producers, but two of them concern consumer behavior and thus fall within the scope of this research: Target 12.3, to reduce food losses along production and supply chains and halve global per capita food waste at the retail and consumer levels;[6] and Target 12.8, to promote universal understanding of sustainable lifestyles.

SDG Target 12.3 has two indicators: the Food Loss Index, produced by the Food and Agriculture Organization of the UN (FAO), and the Food Waste Index, produced by the UN Environment Programme (UNEP). The Food Loss Index (FLI) measures the percentage of food lost from production up to (but not including) the retail level. The Food Waste Index (FWI) focuses on the percentage of food wasted at the retail and consumption stages. Since the focus of this paper is on sustainable consumption, I will take a closer look at the Food Waste Index, analyze the data behind it, and assess its impact.

After carefully examining the datasets used for the Food Waste Index, I concluded that the existing data are not reliable enough for measuring progress towards SDG Target 12.3 or for advancing tailored policy interventions. These conclusions should not, however, undermine the importance of the food waste issue: every data point, study, and observation demonstrates that there is a significant food waste problem in both economically developed and underperforming countries. It is a major concern, as hundreds of millions of people around the world suffer from malnutrition because their caloric intake falls below minimum energy requirements.[7] That is also why we need to understand the limitations of the currently available data.

III. Data Analysis

UNEP worked with the Waste and Resources Action Programme (WRAP), a non-profit organization based in the United Kingdom, to produce its first Food Waste Index in 2021, which is considered the "most comprehensive report into global food waste in homes."[8] The report was published in 2021, but the numbers represent the situation in 2019. According to the report, 17% of all food that reaches retail ends up in the dumpster. Of that amount, households account for 61% of food waste, the food service industry (restaurants) for 26%, and retail for 13%.[9]

These are staggering numbers. To put them in perspective, they mean that roughly 931 million tonnes of food is wasted every year, more than the total consumption of a country as big as India. If we combine the Food Waste Index with the Food Loss Index, more than a third of all food is either lost or wasted somewhere along the chain, which also accounts for nearly 10% of global carbon emissions. But what if we scratch the surface and look behind the report into the raw data[10] that shaped it? How reliable are the food waste numbers?

The authors of the report acknowledge that collecting data on food waste is very challenging and admit that they have high-quality data from only 14 countries,[11] with medium confidence in reports from 42 countries. The report's dataset lists 233 geographic units (mainly UN Member States) and assigns no estimate, very low confidence, or low confidence to 183 of them, or 79%.[12] The pie chart below presents a visual breakdown of the data source confidence levels:[13]
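The 79% figure follows directly from the dataset counts quoted above and can be checked in one line:

```python
total_units = 233   # geographic units listed in the report's dataset
low_or_none = 183   # assigned no estimate, very low, or low confidence

share = low_or_none / total_units
print(f"{share:.0%}")  # prints 79%
```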

Evidently, there is not much confidence in the credibility of the reported figures. The authors also elaborate that, overall, they were able to collect 152 data points from 54 countries and then extrapolated that data to calculate estimates for geographic areas where data was not available. However, even the credibility of the available data points can be questioned. For example, Poland is assigned a medium confidence level, even though the data source for Poland is a small study by local civil society actors, "The Pilot Study of Characteristics of Household Waste Generated in Suburban Parts of Rural Areas" (Steinhoff-Wrześniewska, Aleksandra), which describes the following setup:

21 households, representing 83 people, were audited. None of them were involved in agricultural production. They were provided with three bags for sorting (bio-waste, hygienic waste, all other waste) and had waste collected in each of the four seasons. It is unclear for how long during each season the measurement took place. Given the small sample size and unknown measurement length, we cannot have high confidence in the estimate.

Poland's population is 38 million, of which only 15 million live in rural areas, while 61% reside in urban centers. A sample of only 21 households from suburban parts of rural Poland, observed over undefined periods of time, cannot be representative of food management habits across the whole country.

The question is whether these numbers can serve as a reliable metric to measure progress or calibrate policy actions. SDG Target 12.3 aims to halve global per capita food waste by 2030. According to UNEP's 2021 Index, average food waste per household equals 79 kg a year in high-income countries, 76 kg in upper middle-income countries, and 91 kg in lower middle-income countries, while the data for low-income countries is insufficient. The 2021 Food Waste Index Report mentions that "The next questionnaire will be sent to Member States in September 2022, and results will be reported to the SDG Global Database by February 2023." What if the next report shows that annual food waste per household in upper middle-income countries is 86 kg? It would lead to the conclusion that food waste in this category of countries is increasing, while in fact the true number could have been decreasing. American biochemist Erwin Chargaff once said: "I thought it was the task of the natural sciences to discover the facts of nature, not to create them." Relying on inaccurate data to measure progress could set in motion mismatched policy interventions and do more harm than good.
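The risk of a noisy measurement reversing the apparent trend can be illustrated with a small simulation. All numbers except the 76 kg baseline are my own assumptions: suppose household food waste in upper middle-income countries is genuinely declining, but each survey round measures it with substantial error because most underlying estimates carry low confidence.

```python
import random

random.seed(7)

true_2019 = 76.0       # kg per household per year, as reported
true_later = 72.0      # hypothetical: a genuine ~5% decline
measurement_sd = 10.0  # assumed survey error, for illustration only

def measure(true_value: float) -> float:
    """One noisy survey estimate of the true food-waste level."""
    return random.gauss(true_value, measurement_sd)

trials = 100_000
apparent_increases = sum(
    measure(true_later) > measure(true_2019) for _ in range(trials)
)
share = apparent_increases / trials
```

Under these assumed error levels, a genuine decline registers as an apparent increase in roughly four out of ten comparisons, which is why low-confidence inputs make the index a shaky yardstick for year-on-year progress.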

IV. Theoretical Framework

There are no easy shortcuts to producing global data such as the Food Waste Index. It requires the formation of a specific global knowledge infrastructure focused on food waste, which entails standardizing measurements and processes, disciplining staff, and synchronizing reporting timelines. Achieving this subject-specific institutional interoperability on a global scale requires significant money and resources. I therefore explain the current shortcomings of the Food Waste Index by looking at the global knowledge infrastructure behind it, relying mainly on two scholarly works for theoretical backing: A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming, by Paul Edwards, and Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life, by Martha Lampland and Susan Leigh Star.

The Food Waste Index is not a legitimate scientific fact, because there is no well-founded knowledge infrastructure behind it. In his book "A Vast Machine," Paul Edwards writes that "an established fact is one supported by an infrastructure"[14] and elaborates that "knowledge infrastructures comprise robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds."[15] Without the infrastructure, we are left with claims that can neither be backed up nor verified.

In the modern world, infrastructures are all around us, and we use them daily without paying much attention unless there is a problem with them or we have to change them.[16] For example, behind the tap water we use lies a complex infrastructure of plumbing and water regulation. In a similar fashion, global data requires an elaborate knowledge infrastructure consisting of national communities of scientists, government bureaucrats, and civil society activists who understand each other and can inform and hold each other accountable. These communities need physical facilities, such as offices and laboratories, as well as legal space to conduct their work with respect to intellectual property.[17] They require mediums of communication, such as conferences, journals, and web portals, to exchange knowledge and stay up to date.

However, most importantly, for these national information ecosystems to reach beyond their borders and co-produce global data, they need standardized methods and measures. The amount of reported food waste can change depending on how countries define food waste, when they measure it, and what factors they take into account. For example, according to the UNEP, “food waste is defined as edible parts and associated inedible parts going directly to the following destinations: landfill, controlled combustion, litter discards/refuse, compost/aerobic digestion, land application, co/anaerobic digestion, sewer, but does not include food waste used for biomaterial/processing, animal feed or not harvested.”[18] In some countries, the associated inedible parts used for compost might not be considered food waste. A more accurate report should also take into account seasonal fluctuations of food waste.
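The UNEP definition above can be read as a simple classification rule over destinations. The following sketch is my own illustration of that rule, not UNEP's methodology; only the destination names come from the definition itself:

```python
# Destinations listed in the UNEP definition that count as food waste.
FOOD_WASTE_DESTINATIONS = {
    "landfill",
    "controlled combustion",
    "litter discards/refuse",
    "compost/aerobic digestion",
    "land application",
    "co/anaerobic digestion",
    "sewer",
}

# Destinations the definition explicitly excludes.
EXCLUDED_DESTINATIONS = {
    "biomaterial/processing",
    "animal feed",
    "not harvested",
}

def is_food_waste(destination: str) -> bool:
    """Return True if material sent to this destination counts as
    food waste under the UNEP definition, False if it is excluded."""
    d = destination.strip().lower()
    if d in EXCLUDED_DESTINATIONS:
        return False
    return d in FOOD_WASTE_DESTINATIONS
```

Under this rule, inedible parts sent to compost count as food waste, while the same material diverted to animal feed does not, which is exactly the kind of boundary that national definitions may draw differently.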

V. UNEP’s Food Waste Index

Bottom line up front, there is no global knowledge infrastructure around food waste and UNEP did not have the resources to build it up in the given time frame. UNEP has been working on food waste reduction since 2013, when it launched the global campaign Think Eat Save, but it became a priority task for UNEP only in 2019, following the UN Environment Assembly Resolution 4/2, which mandated UNEP to accelerate global action on food waste reduction.[19]  

Established in 1972 and headquartered in Nairobi, Kenya, UNEP has around 860 staff members worldwide.[20] The mission statement of UNEP, which is celebrating its 50th anniversary this year, “is to provide leadership and encourage partnership in caring for the environment by inspiring, informing, and enabling nations and peoples to improve their quality of life without compromising that of future generations.”[21] By default, the top priority for UNEP has been to lead the international efforts against climate change.

In 2013, UNEP, in partnership with the Food and Agriculture Organization of the UN (FAO), launched the Save Food Initiative and its subcomponent program “Think Eat Save: Reduce Your Footprint.” The primary goal of the FAO, established in 1945, is to “achieve food security for all and make sure that people have regular access to enough high-quality food to lead active, healthy lives.”[22] In 2011, FAO had released its estimates that nearly one-third of the world’s food was lost or wasted every year, which led to the joint Save Food Initiative with UNEP two years later.

So, until recently, food waste data was entangled with research on food loss and fell under the purview of FAO. The inherent structure of the UN system and its scheme for resource distribution incentivize UN agencies to compete for more responsibilities and programmatic oversight. In a 2019 survey by the UN Office of Internal Oversight Services, 80% of UNEP staff “noted that there was critical competition for donor sources with other UN entities.”[23] This institutional contest between FAO and UNEP could potentially explain why, between 2015 and 2019, no organization was assigned as a custodian for the Food Waste Index.

The first time that food waste showed up in UNEP’s program of work and budget was in the 2018–2019 biennium, approved by the UN Environment Assembly (UNEA) in May 2016.[24] It includes planned work outputs such as “Within sustainable food and agriculture policy frameworks, urban planning and/or existing sustainable consumption strategies, technical and policy guidance provided to public and private actors to measure, prevent and reduce food waste and increase the uptake of sustainable diet strategies and activities,” as well as “Outreach and communication campaigns to raise awareness of citizens (particularly young people) on the benefits of shifting to more sustainable consumption and production practices.” The previous work plan, for 2016–2017, proposed in 2014, made no mention of food waste.[25]

In May 2016, UNEA also adopted a resolution on “Prevention, reduction and reuse of food waste,” which requests the UNEP Executive Director, in cooperation with the Food and Agriculture Organization, to “continue to raise awareness of the environmental dimensions of the problem of food waste, and of potential solutions and good practices for preventing and reducing food waste and promoting food reuse and environmentally sound management of food waste.”[26] However, UNEP became the custodian of the Food Waste Index only in 2019, when it solidified itself as the lead agency on tackling food waste pursuant to UNEA Resolution 4/2.[27]

In 2019, UNEP received a new Executive Director, Inger Andersen, a competent professional who is well versed in both sustainable development and food security issues. She has more than 30 years of experience in international development organizations, including her roles as Vice President of the World Bank for Sustainable Development and Head of the CGIAR Fund Council.[28] CGIAR is the Consortium of International Agricultural Research Centers, which brings together international organizations engaged in research on food security. Her predecessor came from a diplomatic background and was asked to resign as a result of an internal audit. Media reports, citing leaks from the internal audit documents, mentioned that the head of UNEP spent “$500,000 on air travel and hotels in just 22 months, and was away 80% of the time.”[29] Under the new leadership, positive changes took place in the organization, and the Food Waste Index became one of UNEP’s top priorities.

When UNEP was first assigned as a custodian in 2019, the Food Waste Index was still classified as a Tier 3 indicator by the UN’s Inter-agency and Expert Group on SDG Indicators (IAEG-SDGs). The UN breaks down all SDG indicators into three tiers:

Tier 1: “Indicator is conceptually clear, has an internationally established methodology and standards are available, and data are regularly produced by countries for at least 50 per cent of countries and of the population in every region where the indicator is relevant.”

Tier 2: “Indicator is conceptually clear, has an internationally established methodology and standards are available, but data are not regularly produced by countries.”

Tier 3: “No internationally established methodology or standards are yet available for the indicator, but methodology/standards are being (or will be) developed or tested.”

Tier classifications change over time as the quality of data for indicators improves. For example, as of February 2022, the IAEG-SDGs lists 136 Tier 1 indicators, 91 Tier 2 indicators, and 4 indicators that have multiple tiers (different components of the indicator are classified into different tiers),[30] while in September 2016, there were 81 Tier 1 indicators, 57 Tier 2 indicators, and 88 Tier 3 indicators.[31] According to the IAEG reports, the Food Waste Index was upgraded from Tier 3 to Tier 2 within two years.

The work plan of the UN Environment Programme for 2020–2021 has seven subprograms, and collecting data for the Food Waste Index falls under Subprogram 6, on Resource Efficiency. In 2020–2021, UNEP allocated $95.6 million to Subprogram 6, which means roughly $48 million per annum. It had 114 staff members working towards the 20 planned work outputs under the Resource Efficiency subprogram.

These work outputs were mainly geared towards developing the information infrastructure for delivering the SDG indicators. For example: “Resource use assessments and related policy options are developed and provided to countries to support planning and policy-making, including support for the application and monitoring of relevant SDG indicators.” Or: “Database services providing enhanced availability and accessibility of life cycle assessment data are provided through an interoperable global network, methods for environmental and social indicators and the ways to apply them in decision-making.”[32] Most of these programmatic activities concern capacity development, technical assistance, training, policy support, and the like.

As a result of UNEP’s active engagement, the number of countries that follow a common global measurement approach for consistent reporting under SDG 12.3 increases every year. On average, UNEP adds around ten countries a year to its list of countries capable of consistent food waste reporting. This shows that UNEP is on the right track towards building the knowledge infrastructure for a more reliable global Food Waste Index.

UNEP’s methodology for data collection is to send out the Questionnaire on Environment Statistics (Waste Section) to National Statistical Offices and Ministries of Environment. If the respective authorities do not respond, UNEP refers to alternative sources of information. However, we should be clear-eyed that the national executive agencies that collaborate with UNEP are not politically neutral entities, and their responses to questionnaires can be subject to the political interests of their respective governments.[33] So, these agencies might have the capacity to produce reliable numbers, but not the intention. For this reason, it would benefit the credibility of the Food Waste Index if UNEP increased its engagement with civil society organizations that can serve as alternative sources of reporting on food waste.
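UNEP's two-step sourcing rule, official questionnaire response first and alternative sources only as a fallback, amounts to a simple merge over country records. The sketch below is my own illustration of that logic with placeholder names and numbers, not UNEP's actual pipeline:

```python
def merge_food_waste_data(official: dict, alternative: dict) -> dict:
    """For each country, prefer the official questionnaire response;
    fall back to an alternative source only when no response exists.
    Each record is tagged with its provenance so the two source types
    stay distinguishable in the resulting dataset."""
    merged = {}
    for country in set(official) | set(alternative):
        if country in official:
            merged[country] = {"tonnes": official[country],
                               "source": "questionnaire"}
        else:
            merged[country] = {"tonnes": alternative[country],
                               "source": "alternative"}
    return merged

# Placeholder inputs: Country A responded to the questionnaire,
# Country B did not, so its figure comes from an alternative source.
official = {"Country A": 100}
alternative = {"Country A": 90, "Country B": 250}
result = merge_food_waste_data(official, alternative)
```

Tagging each record with its provenance is what makes it possible to say, as the report does, which countries' figures rest on official reporting and which on outside estimates.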

VI. Conclusion

The 2021 Food Waste Index Report does not just provide us with numbers about food waste; it also informs us about the state of the knowledge infrastructure around food waste. The formation of a knowledge infrastructure is a lengthy and complicated process. The institutional resources of the UN system, its global reach, and modern technologies have enabled UNEP to make tremendous progress towards building this infrastructure within a very short period of time. However, it is still unclear when UNEP will be able to produce reliable global data on food waste. UNEP can draw many valuable lessons from its 2021 report on food waste, but the report should not be used as a benchmark for progress, since that could lead to many misplaced conclusions down the road.

Looking into the future, the importance of sustainable consumption will only increase. Over the course of the past century, humanity experienced unprecedented growth in global wealth and food production. Surging food production rates create enormous pressure on the environment, even though hundreds of millions of people are still not getting their fair share. One of the big reasons for this failure is the food waste problem. Unfortunately, until recently the food waste issue was largely neglected, and calculating exactly how much food is wasted remained an elusive target. If UNEP stays consistent with its action plan, the global Food Waste Index will become increasingly reliable as more and more countries are able to plug into the global knowledge infrastructure on food waste. However, there is a lot of work ahead. In the meantime, I would like to reiterate the call from UNEP Executive Director Inger Andersen’s opening message in the 2021 Food Waste Index Report: “let us all shop carefully, cook creatively and make wasting food anywhere socially unacceptable.”


[1] “U.N. Report Says 17% of Food Wasted at Consumer Level.” U.S., Reuters, 4 Mar. 2021,

[2] Merchant, Natalie. “Global Food Waste Twice the Size of Previous Estimates.” World Economic Forum, 26 Mar. 2021.

[3] Sustainable Development. (2022). UN Department of Economic and Social Affairs. https://sdgs.un.org/

[4] “Measuring Progress towards the Sustainable Development Goals.” Our World in Data, SDG Tracker, sdg-tracker.org. Accessed 5 Mar. 2022.

[5] Sustainable consumption and production policies. (2022). UNEP – UN Environment Programme.

[6] UNEP Food Waste Index Report 2021. (2021). UNEP – UN Environment Programme. https://www.unep.org/resources/report/unep-food-waste-index-report-2021

[7] Roser, M. (2019, October 8). Hunger and Undernourishment. Our World in Data. https://ourworldindata.org/hunger-and-undernourishment

[8] “New UNEP Report Developed in Collaboration with WRAP Reveals True Scale of Global Food Waste.” The Waste and Resources Action Programme, 2021, wrap.org.uk/FoodWasteIndex.

[9] UNEP Food Waste Index Report 2021. (2021). UNEP – UN Environment Programme.

[10] SDG Indicators Database. (2021). UN Department of Economic and Social Affairs. https://unstats.un.org/sdgs/UNSDG/IndDatabasePage

[11] According to the UNEP Food Waste Index Report 2021, countries with high-quality data on food waste are Australia, Austria, Canada, China, Denmark, Estonia, Germany, Ghana, Italy, Malta, the Netherlands, New Zealand, Norway, the Kingdom of Saudi Arabia, Sweden, the United Kingdom and the United States.

[12] “Food Waste Index Level 1 Annex.” UNEP- UN Environment Program, 2021, wedocs.unep.org/bitstream/handle/20.500.11822/35355/FWD.xlsx.

[13] Ibid.

[14] Edwards, P. N. (2013). A Vast Machine, p. 22

[15] Edwards, P. N. (2013). A Vast Machine, p. 17

[16] Lampland, Martha, and Susan Leigh Star. Standards and Their Stories.

[17] Ibid.

[18] UNEP Food Waste Index Report 2021. (2021), p. 14

[19] “Promoting Sustainable Practices and Innovative Solutions for Curbing Food Loss and Waste.” United Nations Environment Assembly, UNEP – UN Environment Programme, Mar. 2019, wedocs.unep.org/bitstream/handle/20.500.11822/28499/English.pdf.

[20] UNEP | International Organizations. (2005). IGPN – International Green Purchasing Network. http://www.igpn.org/global/interorg/unep.html

[21] “About UN Environment Programme.” UNEP – UN Environment Programme, http://www.unep.org/about-un-environment. Accessed 5 Mar. 2022.

[22] “About FAO.” Food and Agriculture Organization of the United Nations, http://www.fao.org/about/en. Accessed 5 Mar. 2022.

[23] Ivanova, Maria (Feb 23, 2021). The Untold Story of the World’s Leading Environmental Institution: UNEP at Fifty, p. 62

[24] “Programme of Work and Budget for the Biennium 2018‒2019.” United Nations Environment Assembly, UNEP – UN Environment Program, May 2016

[25] “Proposed Biennial Programme of Work and Budget for 2016–2017.” United Nations Environment Assembly, UNEP – UN Environment Programme, June 2014

[26] “Prevention, Reduction and Reuse of Food Waste.” United Nations Environment Assembly, UNEP – UN Environment Program, May 2016.

[27] “Promoting Sustainable Practices and Innovative Solutions for Curbing Food Loss and Waste.” United Nations Environment Assembly, UNEP – UN Environment Programme, Mar. 2019.

[28] Inger Andersen. (2019). UNEP – UN Environment Program

[29] Carrington, D. (2018, November 20). UN environment chief resigns after frequent flying revelations. The Guardian.

[30] “Tier Classification for Global SDG Indicators.” UN Statistics Division, Feb. 2019,

[31] “Tier Classification for Global SDG Indicators.” UN Statistics Division, Sept. 2016,

[32] “Proposed Programme of Work and Budget for the Biennium 2020–2021.” UN Environment Assembly, p. 98

[33] In her book “Shades of Citizenship,” Melissa Nobles presents a very illuminating discussion about the impact of the political interests of the data collecting agencies on the data they produce

Can AI be creative?

More than 2,000 years ago, Plato made several interesting references to the notion of creativity in the Socratic dialogues. In Meno, Socrates claims that when poets produce truly great poetry, they do it not through knowledge or mastery, but by being divinely “inspired” by the Muses. In another dialogue, Socrates contemplates the origins of new knowledge, which can be interpreted as creative thinking: he wondered how existing knowledge can evolve into new ideas. When asked, “will we say, of a painter, that he makes something?”, Socrates responded, “no, he merely imitates”.

AI can be very good at imitating and learning from the creative works of humans. The painting below of Healy Hall at Georgetown University was produced by the Deep Dream Generator, an AI project sponsored by Google. I put in an image of Healy Hall, chose Van Gogh’s “Starry Night” as an overlay, and the program produced this painting within a minute. I find it aesthetically pleasing, but I understand it is not a completely original work. Nonetheless, do not all students of art learn by imitation? Can Artificial Intelligence learn to be truly creative?

AI-generated Painting of Healy Hall at Georgetown University, Washington D.C.
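Tools of this kind generally rely on neural style transfer, which optimizes a generated image against two terms: a content loss that keeps it close to the source photo, and a style loss that matches Gram matrices of feature maps from the style image. Below is a toy NumPy sketch of that objective, with random arrays standing in for real network features; it illustrates the general technique, not the Deep Dream Generator's actual code:

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Correlations between feature channels; these capture 'style'.
    `features` has shape (channels, height * width)."""
    return features @ features.T / features.shape[1]

def style_transfer_loss(generated, content, style, style_weight=1e-2):
    """Weighted sum of a content loss (match content features directly)
    and a style loss (match Gram matrices of the style features)."""
    content_loss = np.mean((generated - content) ** 2)
    style_loss = np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2)
    return content_loss + style_weight * style_loss

rng = np.random.default_rng(0)
content = rng.normal(size=(8, 64))  # stand-in for content-image features
style = rng.normal(size=(8, 64))    # stand-in for style-image features

# Starting from the content features, the loss is purely the style term;
# a gradient-based optimizer would then nudge the image to lower it.
loss_at_content = style_transfer_loss(content, content, style)
```

In a real system the features come from a pretrained convolutional network rather than random arrays, but the trade-off is the same: the output is, by construction, an imitation steered between two existing works.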

“Creative souls and glory seem,
Submissive and subtle and soft and serene.”

These two lines were produced by another Google project, an AI poem generator, when I put in my keyword, creativity. The algorithm learned to write poems “by reading over 25 million words written by the 19th-century poets.” Compare that to the poem below, written by Lord Byron in 1816 during the First Industrial Revolution.

“As the Liberty lads o’er the sea
Bought their freedom, and cheaply, with blood,
So we, boys, we
Will die fighting, or live free,
And down with all kings but King Ludd!”
– Lord Byron, 1816

Creativity is a challenging concept to define, but it is not difficult to recognize. Clearly, on a creativity scale, AI falls far behind Byron. By the way, Byron was not a Luddite, but he had sympathies for their cause. (The Luddites were a radical anti-technology movement in 19th-century England.) Interestingly, Lord Byron is also the father of Ada Lovelace, who is often described as the world’s first computer programmer. Lovelace is credited with creating the first algorithm intended for use in her friend Charles Babbage’s Analytical Engine. Lovelace also proposed that “until a machine can originate an idea that it wasn’t designed to, it can’t be considered intelligent in the same way humans are.”

In 2001, this approach inspired a group of engineers led by Selmer Bringsjord to come up with the Lovelace test, which many computer scientists consider a better replacement for the outdated Turing test. A computer can pass the Lovelace test only if it produces an outcome it was not programmed to produce, such as a novel idea or an original painting. However, there is one more condition: the software's output should surprise the human designer of the program, who should not be able to tell how the program achieved that outcome.
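The two conditions of the Lovelace test can be made explicit as a simple predicate. The test itself is informal, so the field names below are my own invention; the sketch only pins down its logical structure:

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    """An output produced by a program (field names are illustrative)."""
    was_explicitly_programmed: bool  # condition 1: was the outcome built in?
    designer_can_explain_how: bool   # condition 2: can the designer account for it?

def passes_lovelace_test(artifact: Artifact) -> bool:
    """Both conditions must hold: the outcome was not programmed in,
    AND the designer cannot explain how the program produced it."""
    return (not artifact.was_explicitly_programmed
            and not artifact.designer_can_explain_how)

# A chess engine's moves are explainable search within programmed
# rules, so on this reading they fail both conditions.
chess_engine_move = Artifact(was_explicitly_programmed=True,
                             designer_can_explain_how=True)
```

The conjunction is what makes the test strict: surprising output alone is not enough if the designer can still trace how it was produced.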

To this day, it is an open question whether any AI can pass the Lovelace test. In 1997, World Chess Champion Garry Kasparov (originally from my hometown, Baku) lost to the chess-playing supercomputer Deep Blue. Many people believe that mastering chess is associated with creative thinking. Deep Blue was calculating between 100 and 200 million positions per second on a 64-square chessboard, but it was operating within the boundaries prescribed by its designers. The scientists behind Deep Blue, whose work began at Carnegie Mellon University before moving to IBM, cannot beat the world champion in chess, but their brainchild can. Deep Blue’s victory over Kasparov marked a major milestone in the development of AI, but it did not prove that AI can be creative.

Maybe the challenge is that creativity belongs in the domain of the arts, and we are trying to explain it scientifically. Albert Einstein famously said, “It would be possible to describe everything scientifically, but it would make no sense. It would be a description without meaning—as if you described a Beethoven symphony as a variation of wave pressure.” Sigmund Freud, the founder of psychoanalysis, believed that pain and repression are necessary ingredients for creativity. Does this mean we will have to teach AI to experience pain so that it can be creative?

Humans have been creative since the dawn of time, but ancient cultures across the globe did not have a word to express creativity. The modern notion of human creativity emerged only in the age of Enlightenment in Europe, and it became a popular catchphrase during the 20th century. People applied it to the course of history and identified it as one of the driving forces behind our evolution. Various studies have demonstrated that even some animals have creative potential, but none of them rival human creativity. Now, recent breakthroughs in technology have inspired many ideas about the potential of machines to compete with human creativity. However, there is no conclusive answer, for two reasons: there is no clear philosophical definition of creativity, and AI is rapidly evolving.

References

Devlin, E. (2019, May 2). Create a personalized poem, with the help of AI. Google. https://www.blog.google/outreach-initiatives/arts-culture/poemportraits/

Kaufman, S. B. (2014, May 12). The Philosophy of Creativity. Scientific American Blog Network. https://blogs.scientificamerican.com/beautiful-minds/the-philosophy-of-creativity/

Miller, A. I. (2020, February 1). Machines have learned how to be creative. What does that mean for art? Salon. https://www.salon.com/2020/02/01/machines-have-learned-how-to-be-creative-what-does-that-mean-for-art/

Pearson, J. (2014, July 8). Forget Turing, the Lovelace Test Has a Better Shot at Spotting AI. Vice. https://www.vice.com/en/article/pgaany/forget-turing-the-lovelace-test-has-a-better-shot-at-spotting-ai

Plato. The Republic. (1998). The Project Gutenberg. https://www.gutenberg.org/files/1497/1497-h/1497-h.htm