The Differential Impacts of Human Capital and Infrastructure on the Sustainable Development Goals

ABSTRACT:   

This study looks at country-level data to explore the dynamics among human capital, infrastructure, and a country’s progress toward the United Nations Sustainable Development Goals (SDGs). Utilizing the confirmatory factor analysis method, I develop a new Infrastructure Index and combine it with the World Bank’s dataset on Human Capital Index to evaluate the relative impact of these factors on a country’s SDG scores. My findings affirm the integral roles of both human capital and infrastructure in the sustainable development context. However, a stronger correlation between human capital and the SDG Index suggests that policymakers seeking to advance the sustainability agenda should prioritize investments in human capital over infrastructure. Moreover, the study uncovers nuanced relationships between these indicators and specific SDGs. Human capital has a significant association with SDG 5 (Gender Equality), whereas infrastructure does not. Both human capital and infrastructure affect SDG 1 (No Poverty), with no statistical difference between their effects. Interestingly, while human capital correlates more strongly with SDG 13 (Climate Action), this relationship is negative due to the larger carbon footprint of more developed economies. These findings can inform policy decisions for goal-specific sustainable development strategies.

I. INTRODUCTION: 

The central framework of the global development agenda is the 2030 Agenda for Sustainable Development, which “provides a shared blueprint for peace and prosperity for people and the planet, now and into the future.” It is signed by all UN Member States, and 191 countries have committed to achieving measurable progress on these goals by 2030. The Agenda comprises seventeen interlinked Sustainable Development Goals (SDGs) that encompass a wide range of objectives, broken down into 169 targets and 232 indicators to measure progress.

Measuring progress

One of the challenges in the SDG framework is measuring progress in order to inform policy. The SDGs are successors to the Millennium Development Goals (MDGs), which consisted of 8 goals and 18 targets, 14 of which could be assessed quantitatively. The MDGs were adopted in 2000, and countries from around the world committed to achieving them within 15 years. By the end of 2015, only three and a half of the 14 measurable targets had been achieved. In 2023, we are at the halfway mark of the 2030 Agenda. According to the latest reports, the international community is behind schedule in achieving the SDGs, partly due to the impact of COVID-19.[1] In this context, one of the most important questions is which policy interventions would be most effective in advancing progress toward the SDGs.

What interventions are most effective?

Investments in both human capital and infrastructure are critical for achieving the Sustainable Development Goals. The two are interdependent and complementary domains in the international development space. However, policymakers working on specific developmental objectives are often forced to prioritize one over the other because resources are limited. This research analyzes country-level data from the United Nations and the World Bank to estimate the relationship between a country’s overall SDG Index and its performance on the Human Capital Index and the Infrastructure Index. I also examine the impact of human capital and infrastructure on SDG 1 (No Poverty), SDG 5 (Gender Equality), and SDG 13 (Climate Action). Below I provide more information about each of the concepts analyzed in this research.

SDG Index

The SDG Index is a composite indicator, published in the annual Sustainable Development Report, that aggregates countries’ performance across all seventeen SDGs. It estimates countries’ performance on a scale from 0 to 100, and Scandinavian countries such as Finland, Denmark, Sweden, and Norway usually achieve the highest rankings, with scores above 80.[2] The 2022 Report includes SDG Index scores for 163 countries, among which the Central African Republic and South Sudan have the lowest scores, below 40.

SDG 1: No Poverty

The first goal in the UN SDG framework calls to “end poverty in all its forms everywhere.” SDG 1 aims to ensure that everyone, regardless of their circumstances, has equal access to opportunities and resources for a quality life. It calls for comprehensive strategies to end poverty that include social protection systems and measures to build the resilience of the poor and those in vulnerable situations. The three main metrics of SDG 1 are: poverty headcount ratio at $1.90/day (%), poverty headcount ratio at $3.20/day (%), and poverty rate after taxes and transfers (%).

SDG 5: Gender Equality

Gender equality is fundamentally important for achieving the Sustainable Development Goals for several reasons. First, it is a matter of human rights. Everyone, regardless of gender, should have equal access to health, education, economic opportunities, and political representation. Second, gender equality is pivotal for economic growth, as women constitute half of the world’s potential human capital, and studies consistently show that societies that discriminate by gender tend to experience less economic growth and slower poverty reduction. SDG 5 (Achieve Gender Equality and Empower All Women and Girls) incorporates the following metrics: the ratio of female-to-male mean years of education received (%), the ratio of female-to-male labor force participation rate (%), seats held by women in national parliament (%), and the gender wage gap (% of male median wage).[3]

SDG 13: Climate Action  

SDG 13 calls for immediate action to combat climate change and its impacts. The Goal underscores the critical need for the global community to address the pressing issue of climate change. Recognizing that climate change is not just an environmental issue but also a significant threat to social and economic development, this goal calls for urgent action to reduce greenhouse gas emissions, build resilience, and improve adaptive capacity to climate-induced impacts. The metrics of SDG 13 include CO₂ emissions from fossil fuel combustion and cement production (tCO₂/capita), CO₂ emissions embodied in imports (tCO₂/capita), CO₂ emissions embodied in fossil fuel exports (kg/capita), and the Carbon Pricing Score at EUR60/tCO₂ (%, worst 0-100 best).[4]

Statistical Performance Index

The Statistical Performance Index (SPI) evaluates the performance of national statistical systems based on the aggregate of five pillars of statistical capacity: data use, data services, data products, data sources, and data infrastructure. The SPI is a weighted average of the statistical performance indicators.

Human Capital Index

Human capital is sometimes referred to as soft infrastructure.[5] Without thriving human capital, nations cannot achieve their development goals, highlighting its central role in international development. It is widely acknowledged that improvements in human capital lead to increased productivity, which in turn spurs economic growth. Education and health, the two main components of human capital, have a direct impact on a country’s development trajectory. In 2018, the World Bank developed the Human Capital Index as a metric to measure and evaluate the quality and potential of human capital in a country. The HCI enables policymakers to identify strengths, weaknesses, and areas for improvement in human capital development. The HCI is based primarily on three components:

  1. Child survival: This component considers that not all children survive to start formal education and looks at the under-5 mortality rate.
  2. Education: This section combines information on the quality and quantity of education. The number of years a child is expected to complete school by age 18, considering current enrollment rates, measures the quantity of education. The quality is assessed using harmonized test scores from international student achievement testing programs.
  3. Public health: This component uses two proxies for the overall health environment – adult survival rates (the percentage of 15-year-olds who will survive until age 60) and healthy growth among children under 5, measured by stunting rates.[6]

Infrastructure Index

According to the Merriam-Webster dictionary, infra- means “below,” so infrastructure is the “underlying structure” of a country and its economy, the fixed installations that it needs in order to function.[7] Public infrastructure provides the basic physical systems and structures, such as water supply, sewers, electrical grids, roads, bridges, and telecommunications, among others. High-quality infrastructure ensures the provision of fundamental necessities, advances safety, and enhances the quality of life. Infrastructure also facilitates the exchange of reliable information, increases productivity, creates more job opportunities, and fosters overall economic growth.

Unlike the Human Capital Index, there is no internationally recognized index that indicates the level of public infrastructure in a given country. The objective of UN SDG 9 is to “Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation.”[8] However, for the purposes of this research, it is not the best proxy: it includes indicators such as expenditure on research and development and the female share of graduates from science, technology, engineering, and mathematics (STEM) programs, but does not include indicators for access to electricity, water supply, and similar services. However, there are six SDG indicators across five different Sustainable Development Goals that relate directly to public infrastructure:

1. Access to basic water services
   Description: The percentage of the population using at least a basic drinking water service, such as drinking water from an improved source, provided that the collection time is not more than 30 minutes for a round trip, including queuing.
   Goal: SDG 6: Ensure availability and sustainable management of water and sanitation for all

2. Access to basic sanitation services
   Description: The percentage of the population using at least a basic sanitation service, such as an improved sanitation facility that is not shared with other households.
   Goal: SDG 6: Ensure availability and sustainable management of water and sanitation for all

3. Access to electricity
   Description: The percentage of the population who has access to electricity.
   Goal: SDG 7: Ensure access to affordable, reliable, sustainable and modern energy for all

4. Adult population with bank accounts
   Description: The percentage of adults, 15 years and older, who report having an account (by themselves or with someone else) at a bank or another type of financial institution, or who have personally used a mobile money service within the past 12 months.
   Goal: SDG 8: Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all

5. Internet penetration
   Description: The percentage of the population who used the Internet from any location in the last three months. Access could be via a fixed or mobile network.
   Goal: SDG 9: Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation

6. Transportation systems
   Description: The percentage of the surveyed population that responded “satisfied” to the question “In the city or area where you live, are you satisfied or dissatisfied with the public transportation systems?”
   Goal: SDG 11: Make cities and human settlements inclusive, safe, resilient and sustainable

Hypotheses:

The question driving this research is whether human capital and infrastructure differ in their effects on SDG scores. Accordingly, I have constructed the following hypotheses:

H0: There is no statistical difference in the effects of Human Capital and Infrastructure on the SDG Index.
H1: There is a statistical difference in the effects of Human Capital and Infrastructure on the SDG Index.
H2: There is a statistical difference in the effects of Human Capital and Infrastructure on SDG 1: No Poverty.
H3: There is a statistical difference in the effects of Human Capital and Infrastructure on SDG 5: Gender Equality.
H4: There is a statistical difference in the effects of Human Capital and Infrastructure on SDG 13: Climate Action.

II. METHODS

Merging the data sets

I merge the World Bank Human Capital Index and the UN Sustainable Development 2022 datasets, using the country name as the unique identifier. When I drop the rows with missing HCI or SDG Index values, the number of entries in my data frame decreases from 201 to 141. Part of the reason is that the UN SDG data also includes geographic regions (such as “East and South Asia” or “Latin America and the Caribbean”) and income categories (such as “Low-income Countries” or “Upper-middle-income Countries”) under the Country variable. That said, there are also missing values in both datasets. Nonetheless, we still have 141 complete rows, which is sufficient to proceed with the analysis.
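For transparency, here is a minimal R sketch of this merging step; the file names and column labels (e.g., Country, hci_ind, sdg_ind) are illustrative assumptions rather than the exact names used in the original scripts.

```r
# Minimal sketch of the merging step (file and column names are illustrative)
hci <- read.csv("world_bank_hci.csv")       # World Bank Human Capital Index
sdg <- read.csv("un_sdg_report_2022.csv")   # UN Sustainable Development Report 2022

# Merge on country name and keep only rows with both indices present
df <- merge(hci, sdg, by = "Country")
df <- df[!is.na(df$hci_ind) & !is.na(df$sdg_ind), ]
nrow(df)   # 141 complete observations, as described above
```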

Factor Analysis  

Public infrastructure is a broad concept that we cannot easily observe and measure. In statistical terms, it is a latent variable, which refers to “concepts that cannot be measured directly but can be assumed to relate to a number of measurable manifest variables.”[9] I use the factor analysis technique, which allows me to account for various dimensions of public infrastructure (such as water, electricity, and internet access) and combine them into a single variable. Factor analysis is often used for constructing a new index, as it uncovers the underlying relationships between observed manifest variables and unobserved latent variables.

KMO Test

The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a statistic that indicates the proportion of variance in the variables that might be caused by underlying factors. KMO values range from 0 to 1, with higher values indicating a better fit for factor analysis. The individual KMO values (MSA) for each variable tell us how well each variable fits with all the others. Variables with a KMO below 0.5 might not be suited for factor analysis, as they do not correlate well with the other variables. As we see from the output below, the MSA values of all my variables are 0.8 or above, which brings the overall MSA score to 0.87, a positive sign.

Kaiser-Meyer-Olkin (KMO) Test results
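A minimal sketch of how this statistic can be computed in R with the psych package; the data frame and indicator names are assumptions about the actual script.

```r
# KMO / MSA check on the six infrastructure indicators (names are illustrative)
library(psych)

infra_vars <- df[, c("sdg6_water", "sdg6_sanitation", "sdg7_electricity",
                     "sdg8_bank", "sdg9_internet", "sdg11_transport")]

KMO(infra_vars)   # reports the overall MSA and per-variable MSA values
```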

Model 1

So, I keep all six manifest variables to construct a model that estimates the Infrastructure Index. In the first model, the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI) are both above the 0.95 threshold, which indicates an appropriate fit. However, the Root Mean Square Error of Approximation (RMSEA) is 0.099, above the maximum threshold of 0.08.

Infrastructure Index Model 1: fit estimates
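A sketch of how a one-factor confirmatory factor analysis of this kind can be specified with the lavaan package in R; the variable names mirror the indicators above but are assumptions about the original code.

```r
# Model 1: one latent infrastructure factor measured by six manifest variables
library(lavaan)

model1 <- '
  infra =~ sdg6_water + sdg6_sanitation + sdg7_electricity +
           sdg8_bank + sdg9_internet + sdg11_transport
'
fit1 <- cfa(model1, data = df, std.lv = TRUE)

fitMeasures(fit1, c("cfi", "tli", "rmsea"))      # compare against 0.95 / 0.95 / 0.08
modindices(fit1, sort. = TRUE)                   # candidates for respecification
```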

Changing model specifications

Since the fit of the first model is not satisfactory, I change the specifications of the model based on the modification indices and theoretical considerations. Modification indices measure how much the overall model chi-square would be expected to decrease if a particular parameter were freely estimated. In other words, they suggest how to improve the model fit. If we look at the modification indices between our variables, we notice that the relationship between sdg8_bank and sdg9_internet is disproportionately stronger than any other in our data, most likely due to a strong correlation between internet penetration and the percentage of the adult population with bank accounts.

Modification Index table (descending order)

Model 2

So, I add a path to my model that accounts for the dependency between ‘SDG 8 bank accounts’ and ‘SDG 9 Internet’. I also create a path between ‘SDG 6 Water’ and ‘SDG 6 Sanitation’, because the academic literature suggests a strong dependency between these two variables. When I check the fit indices, the new model with the two additional paths performs much better than the previous one. Both the CFI and TLI values are above 0.99, and the RMSEA has decreased to 0.056.

Infrastructure Index Model 2: fit estimates
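A sketch of the respecified model with the two residual covariances described above, again assuming lavaan and the same illustrative variable names.

```r
# Model 2: same factor plus two residual covariances suggested by theory and the MIs
model2 <- '
  infra =~ sdg6_water + sdg6_sanitation + sdg7_electricity +
           sdg8_bank + sdg9_internet + sdg11_transport

  sdg8_bank  ~~ sdg9_internet    # internet penetration and account ownership
  sdg6_water ~~ sdg6_sanitation  # water and sanitation access
'
fit2 <- cfa(model2, data = df, std.lv = TRUE)
fitMeasures(fit2, c("cfi", "tli", "rmsea"))

# Factor scores serve as the raw Infrastructure Index for each country
df$infr_raw <- as.numeric(lavPredict(fit2))
```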

The Infrastructure Index

I use the model to estimate the Infrastructure Index for all 141 countries in the dataset. I also use min-max normalization to transform the index onto a scale from 0 to 1. This method rescales each value by subtracting the minimum and dividing by the range of the original values (i.e., the difference between the maximum and minimum). The transformed values preserve the relative ordering and proportions of the original values, but on a scale from 0 to 1.
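The rescaling can be done in one line; a sketch assuming the raw factor scores are stored in infr_raw.

```r
# Min-max normalization of the factor scores to a 0-1 scale
df$infr_ind <- (df$infr_raw - min(df$infr_raw)) / (max(df$infr_raw) - min(df$infr_raw))
range(df$infr_ind)   # 0 1
```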

Estimating the Impact on the SDG Index 

Both the Human Capital Index and the Infrastructure Index are ratio-level measures. When the independent variables are ratio-level measures, “regression and correlation analysis are the standard techniques for measuring relationships and testing hypotheses.”[10] My main hypothesis concerns whether human capital or infrastructure makes a bigger impact on the overall SDG score of a country. So, I construct a multivariate regression model with the HCI and the Infrastructure Index as independent variables and then explore the beta coefficients of the model to understand which index has a stronger effect on the SDG score.
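A sketch of this specification in R, with the Statistical Performance Index (spi_ind) included as a control as described in the Results section; the variable names are illustrative.

```r
# Main model: SDG Index regressed on Human Capital, Infrastructure, and SPI
fit_sdg <- lm(sdg_ind ~ hci_ind + infr_ind + spi_ind, data = df)
summary(fit_sdg)

# Standardized (beta) coefficients, to compare effect sizes on a common scale
fit_std <- lm(scale(sdg_ind) ~ scale(hci_ind) + scale(infr_ind) + scale(spi_ind), data = df)
coef(fit_std)
```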

III. RESULTS

Summary of the Multivariate Regression Model

Multivariate Regression Model

Besides the Human Capital Index and the Infrastructure Index, I also include the Statistical Performance Index as an explanatory variable in my model. As mentioned earlier, it helps account for some of the possible shortcomings in the data. All three variables, and the model as a whole, are statistically significant, with p-values close to zero. The R-squared value is not the main focus here because this is a descriptive rather than a predictive model. In any case, the multiple R-squared value is 0.91, which means that approximately 91% of the variability in the outcome variable can be explained by the predictor variables.

Model assessment: regression diagnostics

1. Test for Linearity

Before I proceed further with my analysis of the findings, I need to test the assumptions to validate that a linear regression model is a suitable approach. First, I look for linearity and equal variance in the Residuals vs Fitted plot below. Upon visual examination, there are no substantial deviations in the red line, which suggests that the relationship between our explanatory and response variables is linear.

2. Test for homoscedasticity

In the plot below, we can also observe that the vertical spread of the residuals is roughly constant, which means the variance of the error term does not change much across the fitted values. So, our model passes the test for homoscedasticity as well.

3. Testing for Independence of residuals

Based on the observations from the “Residuals vs Leverage” plot below, our model passes the test for independence of residuals as well. Large residual values on this plot would suggest that the model fails to explain some aspects of the data. Our model does not have any standardized residual values above 1. In R, I double-checked and confirmed that no observations have a Cook’s distance above 1.

4. Testing for Normality of the error distribution

We can tell whether the error terms are normally distributed from the Q-Q plot below. We want the residuals to be as close to the diagonal line as possible. Real data rarely have perfectly normally distributed errors, so some deviations are expected, and overall it appears that our model passes the normality test. To double-check, I also apply the Shapiro-Wilk test.

The null hypothesis for the Shapiro-Wilk test is that the data are normally distributed. In this case, the p-value for the Shapiro-Wilk test is well above the significance level, which means we cannot reject the null hypothesis that the residuals are normally distributed.
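A sketch of these diagnostic checks in base R, assuming the fitted model object fit_sdg from the earlier sketch.

```r
# Standard diagnostic plots: Residuals vs Fitted, Q-Q, Scale-Location, Residuals vs Leverage
par(mfrow = c(2, 2))
plot(fit_sdg)

# Shapiro-Wilk test of normality on the residuals
shapiro.test(residuals(fit_sdg))

# Influence check: flag any observation with a Cook's distance above 1
which(cooks.distance(fit_sdg) > 1)
```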

5. VIF Score

Last but not least, since we are dealing with a multiple linear regression model, we need to make sure there is no problematic multicollinearity, so we apply the VIF test. “A rough rule of thumb is that variance inflation factors [VIF] greater than 10 give some cause for concern” (Vehkalahti & Everitt, p. 93). As we can see from the table below, the VIF scores for all three of our independent variables are below 5. These scores indicate some multicollinearity, but it is safely within an acceptable range.

VIF Scores:
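The variance inflation factors can be obtained with the car package; a minimal sketch under the same naming assumptions.

```r
# Multicollinearity check: VIF values below ~5 are generally considered acceptable
library(car)
vif(fit_sdg)
```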

Beta coefficient analysis

After confirming that the model meets all five assumptions of a multivariate regression model, we can proceed with the analysis. To estimate the impact of each individual variable on the SDG Index, we look at the beta coefficients. The standardized beta coefficients allow us to compare the effects of the variables on the same scale, regardless of the units of measurement. Below are the beta coefficients of our linear multivariate regression model. We notice that the beta coefficient for hci_ind (Human Capital) is larger than the coefficient for infr_ind (Infrastructure). This suggests that human capital has a stronger impact on the outcome variable, the SDG Index.

Beta coefficients of the Multivariate Regression model

However, we also need to make sure the difference between the two beta coefficients is statistically significant. I run the below linear hypothesis test, which is based on the null hypothesis that there is no difference between the effects of the two indices: hci_ind and infr_ind.

Linear hypothesis test

The associated p-value (Pr(>F) = 0.0001545) is far below 0.05, indicating strong evidence to reject the null hypothesis that the coefficients for hci_ind (Human Capital Index) and infr_ind (Infrastructure Index) are the same. So, the data provides strong evidence that the effect of Human Capital on the sdg_ind (SDG Index) is different from the effect of Infrastructure (infr_ind) on sdg_ind.
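A sketch of this coefficient-equality test using car::linearHypothesis, again assuming the fitted model and variable names used above.

```r
# Test H0: the coefficients on Human Capital and Infrastructure are equal
library(car)
linearHypothesis(fit_sdg, "hci_ind = infr_ind")   # reports an F statistic and Pr(>F)
```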

Next, I explore the relationship between the Human Capital Index, the Infrastructure Index, and specific Sustainable Development Goals: SDG 1 (No Poverty), SDG 5 (Gender Equality), and SDG 13 (Climate Action). I construct a multivariate multiple regression model with three outcome variables: the indicators for SDG 1, SDG 5, and SDG 13.
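A sketch of this multivariate multiple regression; the outcome column names such as sdg1_ind are illustrative assumptions.

```r
# Multivariate multiple regression: three SDG outcomes, same set of predictors
fit_goals <- lm(cbind(sdg1_ind, sdg5_ind, sdg13_ind) ~ hci_ind + infr_ind + spi_ind, data = df)
summary(fit_goals)   # prints one response-specific summary per SDG outcome

# Coefficient-equality tests can then be run on each single-response model, e.g. for SDG 1:
fit_sdg1 <- lm(sdg1_ind ~ hci_ind + infr_ind + spi_ind, data = df)
car::linearHypothesis(fit_sdg1, "hci_ind = infr_ind")
```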

Response SDG 1:

Based on the initial observation of the model summary, we can conclude that both human capital and infrastructure have a significant effect on poverty. However, we need to explore further whether there is a statistical difference between the effects of the two variables. Upon closer examination of the two beta coefficients, we find no statistically significant difference between the effects of the two explanatory variables.

Linear hypothesis test

Response SDG 5:

When we look at the response for SDG 5, we notice that the Human Capital Index has a statistically significant impact on SDG 5, whereas the Infrastructure Index does not. The magnitude of the coefficient for hci_ind (52.15) is also larger than that for infr_ind (-8.30). Based on these observations, we can conclude that there is a statistical difference in the effects of human capital and infrastructure on SDG 5.

Response SDG 13:

The summary of the response for SDG 13 suggests that, once again, infrastructure does not have a statistically significant effect, while the impact of human capital is significant. So, we can claim that human capital has a stronger, statistically significant effect on SDG 13 (Climate Action). However, we should also note that the coefficient is negative, which means there is a negative correlation between human capital and SDG 13. This is consistent with the basic correlations of the indicators in our dataset (see the Correlation Matrix table). I discuss these findings further in the conclusion.

Correlation Matrix

IV. CONCLUSION

Our findings confirm once again that both human capital and infrastructure are essential for the sustainable development of countries. They are both fundamentally important factors predicting a country’s level of development. That said, based on our results, we can reject the null hypothesis that there is no statistical difference in the effects of human capital and infrastructure on a country’s SDG Index. The statistical analysis suggests a stronger interdependency between human capital and the SDG Index than between infrastructure and the SDG Index. So, policymakers facing the dilemma of choosing between investments in human capital and infrastructure should prioritize human capital if their goal is to advance the country’s overall sustainable development agenda.

However, we also found that the Human Capital Index and the Infrastructure Index may have different levels of impact on specific objectives within the UN SDG framework. We discovered that human capital is a statistically significant predictor of a country’s performance on SDG 5 (Gender Equality), whereas infrastructure is not. We also established that while both indicators have a significant impact on a country’s performance on SDG 1 (No Poverty), there is no statistically significant difference between the effects of human capital and infrastructure on a country’s poverty levels. Last but not least, we found that, compared to infrastructure, there is a stronger interdependency between human capital and SDG 13 (Climate Action). However, there is a negative correlation between human capital and a country’s performance on climate indicators. This should not come as a surprise, because developed countries with higher Human Capital Index scores have far larger carbon footprints than developing countries.[11] It is another reminder that developed countries should transition to more sustainable solutions.

V. WORKS CITED

Guterres urges countries to recommit to achieving SDGs by 2030 deadline. (2023, April 25). UN News. https://news.un.org/en/story/2023/04/1136017

Johnson, J. B., & Joslyn R. A. (1991). Political Science Research Methods: Second Edition. Congressional Quarterly Inc.

Merriam-Webster. (n.d.). Infrastructure. In Merriam-Webster.com dictionary. https://www.merriam-webster.com/dictionary/infrastructure

Sachs, J.D., Lafortune, G., Kroll, C., Fuller, G., Woelm, F. (2022). From Crisis to Sustainable Development: the SDGs as Roadmap to 2030 and Beyond. Sustainable Development Report 2022. https://dashboards.sdgindex.org/downloads

The Investopedia Team. (2023, February 7). Infrastructure: Definition, Meaning, and Examples. Investopedia. https://www.investopedia.com/terms/i/infrastructure.asp

World Bank Group. (2023). The Human Capital Project: Frequently Asked Questions. World Bank. https://www.worldbank.org/en/publication/human-capital/brief/the-human-capital-project-frequently-asked-questions

The World Bank Group. (2020, September 23). Data Catalog. Human Capital Index. https://datacatalog.worldbank.org/search/dataset/0038030 

The world’s top 1% of emitters produce over 1000 times more CO2 than the bottom 1% – Analysis – IEA. (n.d.). International Energy Agency. https://www.iea.org/commentaries/the-world-s-top-1-of-emitters-produce-over-1000-times-more-co2-than-the-bottom-1

United Nations. (n.d.). The 17 goals: Sustainable Development. United Nations. https://sdgs.un.org/goals

Vehkalahti, K., & Everitt, B. S. (2020). Multivariate Analysis for the Behavioral Sciences: Second Edition. CRC Press.


[1] Guterres urges countries to recommit to achieving SDGs by 2030 deadline. (2023, April 25). UN News. 

[2] Sachs, J.D., Lafortune, G., Kroll, C., Fuller, G., Woelm, F. (2022). From Crisis to Sustainable Development: the SDGs as Roadmap to 2030 and Beyond. Sustainable Development Report 2022.

[3] United Nations. (n.d.). The 17 goals: Sustainable Development. United Nations. https://sdgs.un.org/goals

[4] Ibid

[5] The Investopedia Team. (2023, February 7). Infrastructure: Definition, Meaning, and Examples. Investopedia.

[6] World Bank Group. (2023). The Human Capital Project: Frequently Asked Questions.

[7] Merriam-Webster. (n.d.). Infrastructure. In Merriam-Webster.com dictionary.

[8] United Nations. (n.d.). The 17 goals: Sustainable Development. United Nations.

[9] Vehkalahti, K., & Everitt, B. S. (2020), p. 295

[10] Johnson, J. B., & Joslyn R. A. (1991), p. 319.

[11] The world’s top 1% of emitters produce over 1000 times more CO2 than the bottom 1% – Analysis – IEA. (n.d.). International Energy Agency

Podcast | Voices in the Code: book discussion with the author

Automated decision-making systems or algorithms are playing an increasingly significant role in public administration and civil rights space. In his book “Voices in the Code: A Story About People, Their Values, and the Algorithm They Made,” David Robinson investigates and contextualizes the story of the Kidney Allocation System, which as a result of cross-disciplinary collaboration among surgeons, clinicians, data scientists, public officials, advocates, and patients, over the course of 10 years, evolved into a relatively inclusive and accountable decision-making technology. Through this story, the author discusses the most fundamental issues related to the design and management of public-interest algorithms.

Podcast | Being Present: book discussion with the author

In this episode, I discuss the book titled “Being present: commanding attention at work and at home by managing your social presence,” with the author Dr. Jeanine Turner, a communications scholar at Georgetown University. In a world permeated with new media and communication technologies, managing our social presence and controlling where and how we invest our attention is not an easy task. In this discussion, Dr. Turner offers an overview of her framework with four different approaches on how to manage your social presence in the modern communication space.

This episode was originally published in the Gnovis Journal at Georgetown University and is also available on Spotify, Apple Podcasts, and Google Podcasts.

Impact of climate indicators on the carbon footprint of data centers

by Huseyn Panahov and Ryan Powers

1. Introduction

Carbon emissions are usually associated with the fossil fuel and transportation industries, yet our online activities also have a significant carbon footprint. It may seem counterintuitive, but data centers account for around 2% of all global greenhouse gas emissions. This is roughly in line with the global airline industry, and not far behind the chemical and petrochemical industry. In parallel with the digital revolution, the demand for data centers continues to increase. While many industry leaders in the data center business have pledged to reach zero carbon emissions by 2030, these server farms still need gigantic amounts of energy to operate. In this research we collected data about 41 data centers owned by Google and Oracle. We looked primarily at the power efficiency of the data centers and the climate indicators in the local geography. Our findings show that every 10 degree Fahrenheit drop in temperature translates into roughly a 0.06 point improvement in the Power Usage Effectiveness (PUE) of the data centers. (A PUE of 1.0 is ideal, whereas globally most data centers have a PUE around 1.8.)

2. Background

There are 2,749 data centers from nearly 3,000 service providers in the United States, and about 5,000 more around the rest of the world. With no alternative technology on the horizon, data centers are here to stay and will continue to grow in number. The three most important factors affecting data center energy efficiency are design, power source, and climate. Data center design, and more importantly equipment age, affects power consumption, as older servers and cooling systems operate at lower efficiency. Power typically comes from a combination of renewable and non-renewable energy sources, and facilities that derive a greater share of power from renewable sources are more efficient. Most state-of-the-art facilities built by the largest providers (Google, Oracle, Facebook, etc.) run up to 100% on renewable energy, although this is not the case for the data center population as a whole. Finally, climate affects energy efficiency predominantly because cooler, more temperate climates demand less from a data center’s cooling system.

We sought to measure the energy efficiency of data centers accounting for external climate factors like wind, temperature, and precipitation. Cooling processes to regulate server temperature are the most energy intensive, and our hypothesis was that in colder climates you would observe more efficient energy consumption compared to hotter climates. Next, we present our methodology, data analysis, results, and areas for further research.

3. Methodology

While there are thousands of data centers around the world, most of them do not share information about their energy consumption. We were fortunate to find open information about 22 data centers operated by Google and 19 by Oracle. These two tech industry leaders operate very energy-efficient data centers, which suggests that the impact of local climate factors on an average data center is likely even larger than what we observe in our study.

Data centers require large amounts of energy and electricity to power and cool the servers. Consequently, choosing the right location for a data center is a complex task, which requires consideration of local temperatures, power infrastructure, environmental architecture, in addition to business factors such as land price, legal environment, and skilled workforce.

A number of factors affect a company’s choice of location for a data center. The table below lists some of the most important ones:

Table 1: Decision-making factors for choosing a data center location

Non-environmental factors

1. Availability of trained workforce: On average, a large data center employs between 50 and 500 employees. Operators need a trained workforce who can run the technology and respond to emergencies.
2. Proximity to the customer base: The shorter the distance between the data center and the main customer base, the lower the chances of incidents along the route.
3. Availability and price of land: Large data centers usually require anywhere between 100,000 and 5,000,000 square feet of land.
4. Tax privileges: On average, tech companies invest between $300 million and $3 billion to construct a large data center. They provide both short-term employment opportunities during the construction phase and long-term jobs after the launch.
5. Security: Are there conflicts or other security vulnerabilities in the area?
6. Rule of law: Can tech companies rely on fair judicial procedures?
7. Energy infrastructure: This can be both environmental and non-environmental, but data centers need large amounts of electric power to remain operational 24/7.

Environmental factors

1. Energy infrastructure: Does the existing energy infrastructure rely on renewable power sources or fossil fuels?
2. Potential for producing renewable energy: Wind speed, sunny days, precipitation.
3. Water resources: Besides energy, operating a large data center also requires access to large amounts of water. Water Usage Effectiveness (WUE) is the industry metric for measuring how efficiently data centers use water resources.
4. Average temperature: Average temperature at the location.
5. Temperature variance: How much the temperature varies over various time intervals.

Our study focuses solely on environmental factors, specifically how local climate conditions impact a data center’s power efficiency. We are looking for empirical evidence that data centers located in colder climates have higher power efficiency. Then, building on this analysis, we suggest which climate zones would be optimal locations for large data centers.

Every year, an increasing number of tech companies release sustainability reports that analyze and summarize the environmental impact of their business operations. However, most companies offer only aggregate numbers and do not make publicly available the datasets behind those analyses. Big tech companies such as Amazon and Microsoft do not even share the locations of their data centers, citing safety considerations. Consequently, the availability of data was one of the main factors that shaped this research.

In our project we look at the data centers of two multinational tech companies, Google and Oracle. They have made publicly available both the locations of their data centers and the Power Usage Effectiveness (PUE) indicator for each data center. Power Usage Effectiveness is the industry metric for estimating the power efficiency of a data center. Lower PUE means better power efficiency. The lowest possible PUE is 1.0, which means 100% power efficiency. For most data centers the PUE varies between 1.2 and 3.0, and the industry average is 1.8.

We collected PUE indicators for 38 data centers owned and operated by Google and Oracle and spread across 16 countries and 14 US states. Next, we looked up various climate indicators for each location at the county or city level. As a result, we built a dataset with 17 numeric indicators for each location, accounting for local temperature variance, seasonal temperatures, average temperatures, precipitation, wind speed, cloudiness, and solar power potential. See the table below for the list of variables:

Table 2: List of variables

1. State: Country or US state where the data center is located
2. Database location: Location of the data center
3. Company: Company that owns the data center
4. PUE: Power Usage Effectiveness
5. Temp_variance: The difference between the highest and lowest temperatures (max of high monthly average minus min of low monthly average) in a given location * **
6. Temp_annual: Average annual temperature **
7. Temp_halfyear_warm: Average temperature Apr – Sep (6 months)
8. Temp_halfyear_cold: Average temperature Oct – Mar (6 months)
9. Temp_winter: Average temperature for Dec – Jan – Feb
10. Temp_spring: Average temperature for Mar – Apr – May
11. Temp_summer: Average temperature for Jun – Jul – Aug
12. Temp_fall: Average temperature for Sep – Oct – Nov
13. Rain_annual: Sum of monthly rain averages, measured in inches
14. WindSpeed_annual: Average of monthly wind speeds, measured in mph
15. SolarPower_annual: Average daily incident shortwave solar energy for the whole year, measured in kWh
16. SolarPower_summer: Average daily incident shortwave solar energy for Apr – Sep
17. SolarPower_winter: Average daily incident shortwave solar energy for Oct – Mar
18. Cloudy_annual: % of the time the weather is cloudy in a year
19. Cloudy_summer: % of the time the weather is cloudy in the warmer months: Apr – Sep
20. Cloudy_winter: % of the time the weather is cloudy in the colder months: Oct – Mar

* All temperatures are measured in Fahrenheit. ** For Australia and Chile, the data points were flipped.

4. Data Analysis

4.1 Statistical descriptions

Our dataset has 20 variables, of which 17 are numeric and 3 are strings. We do not have any missing values, because we constructed this dataset by hand. Let us look at basic statistical descriptions of our numeric variables.

Table 3: Statistical description of numeric variables

Based on these descriptions we can tell that climate conditions across the data centers in our dataset are quite diverse. For example, annual rainfall varies between 8 and 73 inches depending on the location. The annual average wind speed in these locations varies between 5 mph and 14 mph, and the annual average temperature varies between 42 and 82 degrees Fahrenheit.

The average temperature across locations is 59 degrees Fahrenheit. However, an average data center houses around 100,000 servers, each emitting about 1,200 BTU of heat per hour, which would raise the indoor temperature by 213 degrees Fahrenheit without a proper heating, ventilation, and air conditioning (HVAC) system. It has generally been considered that the optimal ambient temperature for most technologies, including servers in data centers, is 68-75 degrees Fahrenheit.[1] More recently, some companies have introduced new servers with a higher heat tolerance of 81 degrees Fahrenheit.[2] All things considered, it is reasonable to assume that the optimal indoor temperature for an average data center today is about 72 degrees Fahrenheit. So, even in the coldest locations there is a need for electric power to cool the interior, as well as to power the technology.

The PUE values in our dataset vary from 1.06 to 1.78, with an average of 1.26. So, the average PUE in our dataset is about 30% lower than the industry average of 1.8. This means that, overall, the dependence on climate factors is likely to be even higher for the average data center than for those in our dataset, because higher power efficiency (lower PUE) also means relatively less dependence on climate.

Picture 1: PUE distribution

4.2 Correlations

Now, let us look at the correlations between our numeric variables, which we can visualize with a heatmap. There is a strong negative correlation of -0.73 between SolarPower_annual and Cloudy_annual, which lends credibility to our dataset: naturally, there should be a negative correlation between cloudy weather and solar power potential. Most importantly, however, we want to check the correlations between PUE and the other variables.

We want to identify which variables have the strongest correlation with PUE. We notice a positive correlation between PUE and the annual temperature average (Temp_annual). There is an even stronger correlation, at the level of 0.4, between Temp_halfyear_cold (the average temperature for October through March) and PUE.

Picture 2: Heat-map of numeric variables
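A sketch of how such a correlation matrix and heat-map can be produced in R; the data frame name (dc) and the use of the corrplot package are assumptions about the workflow.

```r
# Correlation matrix of the numeric climate and efficiency variables
num_vars <- Filter(is.numeric, dc)      # dc = data-center dataset (illustrative name)
corr_mat <- cor(num_vars)

round(corr_mat["PUE", ], 2)             # correlations of PUE with every other variable

library(corrplot)                       # heat-map style visualization
corrplot(corr_mat, method = "color", type = "lower", tl.cex = 0.7)
```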

There is also a moderately strong relationship, at the level of 0.4, between Temp_winter (the average temperature for December, January, and February) and PUE. If we look at this relationship separately, we notice that if the average winter temperature in a given location is 30 degrees Fahrenheit or below, the PUE is most likely to be below 1.2.

Picture 3: PUE vs Winter temperature

4.3 Building the model

For our final model, we picked only one independent variable: the temperature for the cold half of the year. We could use more variables, but doing so could introduce multicollinearity and undermine the reliability of our analysis. We built two models: a linear regression and a decision tree model.

Picture 4: Linear and Decision Tree models

The visualization in Picture 4 represents the outputs of our models. The orange line represents the linear regression model, while the red dots represent the outputs of the decision tree model. Below are the performance indicators of our models. Generally, we want to pick the model with the lower Root Mean Square Error (RMSE) and the higher R-squared. In this regard, the decision tree model performs much better than the linear model. However, considering that decision tree models tend to overfit, we could choose either one of the models. (Note: because we have a very small dataset, we did not split it into training and test data.)

R-squared is less important in this case, because we are not building a predictive model, and the difference between the RMSE values is not large, so either model is a reasonable choice.
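A sketch of the two models in R, using a base linear fit and the rpart package for the decision tree; the data frame and variable names are illustrative assumptions.

```r
# Linear regression and decision tree, both with a single predictor
library(rpart)

fit_lm   <- lm(PUE ~ Temp_halfyear_cold, data = dc)
fit_tree <- rpart(PUE ~ Temp_halfyear_cold, data = dc)

# Compare in-sample RMSE (no train/test split because of the small sample)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse(dc$PUE, predict(fit_lm, dc))
rmse(dc$PUE, predict(fit_tree, dc))

coef(fit_lm)   # intercept ~0.943 and slope ~0.006 reported in the Conclusions
```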

5. Conclusions

If we look up the coefficients of the linear regression model, we get the following numbers:

This means that the relationship between PUE and Temp_halfyear_cold, is as follows:

PUE = 0.943 + 0.006 x [Temp_halfyear_cold]

Based on this formula, we can suggest that every 10 degree Fahrenheit decrease in the temperature for the cold half of the year leads to a 0.06 decrease in PUE. An average cold-half-year temperature of 10 degrees Fahrenheit would imply a PUE of about 1.00 (0.943 + 0.006 × 10 = 1.003), a near-perfect level of power efficiency. However, we understand that there are few places on earth with such low temperatures, and they might not be the best locations for data centers for a number of other reasons discussed above.

This analysis provides empirical evidence that data centers achieve better power efficiency, and therefore a lower carbon footprint, in colder climates. It shows a moderately strong relationship between winter temperatures and PUE: every 10 degree increase in temperature could lead to an increase of about 0.06 in PUE, i.e., lower power efficiency.

6. Limitations and future research

Our research was limited by the data we could access. Oracle and Google are two large companies that consistently report PUE metrics, which most companies do not. This limitation prevented us from comparing them to peers such as Facebook, Equinix, and Microsoft. Furthermore, Oracle and Google already have a commitment to sustainable data centers, so we were unable to incorporate companies with perhaps less sustainable practices into our dataset.

The PUE metric itself could also be considered a limitation. It is designed for easy reporting and industry comparison rather than true efficiency measurement. The input data for the calculation can and does vary from company to company, given that no industry regulation mandates how it is measured and reported.

Future research could explore many different avenues. First, our model was not predictive since our dataset was so small. With a larger data set, one could predict the optimal climate for a data center. From there, we could have measured the PUE and carbon emissions differentials by relocating a data center to a more optimal location. Additionally, with more companies represented in the data, we could control for variables like market share, capital expenditures, and investments in renewable energy. Finally, a more robust analysis could identify a superior metric to PUE in measuring and comparing data center efficiency.

7. Sources

Ambient Temperature and Why it Matters for Data Centers. (2022, December 1). History-Computer. https://history-computer.com/ambient-temperature-and-why-it-matters-for-data-centers/

Benoit, R. (2022, February 9). An Updated Look at Data Center Temperature and Humidity. AVTECH. https://avtech.com/articles/4957/updated-look-recommended-data-center-temperature-humidity/

Google. (n.d.). Data Centers. Google. Retrieved December 14, 2022, from https://www.google.com/about/datacenters/

Oracle Cloud Data Center regions and locations. Oracle. (n.d.). Retrieved December 14, 2022, from https://www.oracle.com/cloud/cloud-regions/data-regions/

Siddik, M. A., Shehabi, A., & Marston, L. (2021). The environmental footprint of data centers in the United States. Environmental Research Letters, 16(6), 064017. https://doi.org/10.1088/1748-9326/abfba1

The Weather Year Round Anywhere on Earth – Weather Spark. (n.d.). https://weatherspark.com

United States of America: Data center market overview. Cloudscene. (n.d.). Retrieved December 14, 2022, from https://cloudscene.com/market/data-centers-in-united-states/all


[1] Ambient Temperature and Why it Matters for Data Centers. (2022, December 1). History-Computer.

[2] Benoit, R. (2022, February 9). An Updated Look at Data Center Temperature and Humidity. AVTECH.

COVID-19, Chat-GPT and International Development Assistance

The COVID-19 pandemic created an incredibly challenging situation in the international development space. On the one hand, developing countries were going through an extraordinary crisis, which increased the global demand for international development assistance. On the other hand, COVID-19 slowed down the economies of the donor countries, which faced new socio-economic challenges at home. This article is a summary of statistical research that looks at how the Gross National Income of the Organization for Economic Co-operation and Development (OECD) member states changed between 2018 and 2021 and correlates it with the amount of money they invested in Official Development Assistance (ODA) during this period. The statistical analysis shows that despite economic challenges and increased demand for social assistance at home, OECD countries actually increased their international development assistance spending.

The artificial intelligence chatbot ChatGPT was first released on November 30, 2022, right around the time I was working on this research. So, I registered on the ChatGPT website and asked it to write a paragraph about the impact of COVID-19 on official development assistance. Within five seconds, ChatGPT wrote a persuasive paragraph arguing that “many donor countries have faced economic challenges and have had to redirect funds towards domestic priorities, such as healthcare and support for businesses and individuals affected by the pandemic.” It concluded that “ODA funding has been reduced, which has had a detrimental effect on the ability of recipient countries to address development challenges and achieve their development goals.” The text was fluent, logically coherent, and not plagiarized. However, the analysis of actual data showed that despite its sound reasoning, ChatGPT was wrong.

I collected all the data from the website of the Organization for Economic Co-operation and Development (OECD), an international forum that brings together 38 of the world’s most economically advanced countries to exchange best practices, tackle common problems, and contribute to global peace and development. OECD countries account for 18% of the world population but 63% of global GDP and about 95% of Official Development Assistance. I gathered data for the four-year period from 2018 to 2021 and looked at two main indicators: (1) Gross National Income (GNI), one of the most important indicators of a country’s economic performance, and (2) ODA, the amount of money a donor country spends on international development assistance.

Between 2019 and 2020, the GNI of OECD countries shrank by $1.3 trillion, an average reduction of $34 billion per country. However, contrary to GNI, ODA spending of the OECD countries increased by $6.6 billion, which breaks down to an average of $193 million per country. Consistent with these findings, average ODA as a percentage of GNI increased from 0.373% in 2019 to 0.399% in 2020.

Conclusion 1 

So, the statistical analysis showed that despite the negative impact of COVID-19 on their domestic economies, the OECD countries increased the amount of money spent on Official Development Assistance. This analysis leaves us with a positive and encouraging message that in times of need, the international community is able to come together and take action for the common good beyond national borders. 

Conclusion 2 

It is also another reminder about the limitations of large language models such as ChatGPT. There are several generative AI models that can produce somewhat original text, images, and other data based on a statistical analysis of information on the internet. For example, ChatGPT, the most successful of these models, was trained on 570 GB of text (approximately 385 million pages of Microsoft Word documents), allowing it to generate human-like, seamless outputs within seconds. However, due to the associated costs, the model is pre-trained, which means there is a cut-off date for its source information. In addition, the misinformation, fake news, and biased texts found online are also fed into the training of the model. The engineers at OpenAI are taking measures to tackle the issue of misinformation, but that is a tall order, even with the best intentions. So, while humans can investigate and find the truth, it is a much more elusive task for pre-trained AI models.

P.S. Most of the OECD countries still fall short of their commitment to spend 0.7% of their GNI on development assistance.

Blockchain solution to advance conscious consumerism

Introduction 

A couple of hundred years ago, an average person’s food source radius was around 10 miles. Today our food basket is the product of a complex web of farmers, freighters, trailers, retailers, and suppliers that stretches into supply chains thousands of miles long. In the last few years, many ideas have emerged about how to improve the status quo in the supply chain domain. When the COVID-19 pandemic put global supply chains to the test, it exposed some fundamental shortcomings, especially in the food industry, and emphasized the need for new solutions. One of the most innovative and widely discussed ideas is the application of blockchain technology, which has captured the public imagination since 2009. This paper looks at the application of this new technology in the food industry from the perspective of customers and discusses its potential pros and cons.

Blockchain                                

Blockchain is software that creates a network of computers for the authentication and verification of digital documents. It is also called distributed ledger technology because its core idea is for various blocks of information in a chain of computers to keep tabs on each other, maintaining autonomous ledgers of transactions to avoid duplications or double-spending. The concept of a blockchain-like protocol was first put forward in 1982 by the American computer scientist David Lee Chaum in his dissertation “Computer Systems Established, Maintained, and Trusted by Mutually Suspicious Groups,” where he argued that “a number of organizations who do not trust one another can build and maintain a highly secured computer system,” or network of vaults, they can all trust (Chaum, 1982). However, this idea did not attract much attention from any industry until the release of the Bitcoin cryptocurrency decades later. When the unknown author (or authors), under the pseudonym Satoshi Nakamoto, published the white paper for Bitcoin in October 2008, the blockchain technology that enables it stirred a lot of attention. Even people skeptical about cryptocurrency became intrigued by other potential applications of the technology that powers it.

Supply Chain

One of the most notable books about the global impact of this technology is Blockchain Revolution, written by Don and Alex Tapscott. In the book, first published in 2016, the authors remark: “We often get asked, ‘What is the next big killer app for blockchain?’… There is no better candidate than the global supply chain, an industry that runs two-thirds of the global economy” (Tapscott & Tapscott, p. 5). Modernization of supply chain operations and management is long overdue, and in recent years exciting ideas have emerged about using blockchain applications to address some of the most salient issues in this domain. In a short period, dozens of new startups and several large corporations started looking for ways to leverage the potential of blockchain applications for improving supply chain management. For example, in 2017, Walmart, along with some of its biggest suppliers in the food category, such as Dole, Kroger, McCormick, Nestlé, Tyson Foods, and Unilever, developed a partnership with IBM to use blockchain for food traceability (Sristy, 2017). Since the COVID-19 crisis, which exposed some of the fundamental issues in the status quo of supply chains, especially in the food industry, the appetite for innovative solutions has grown even larger.

What would it mean for consumers? 

Governments and businesses have a common interest in the consumer: the private sector wants to earn repeat business from customers, and the government wants products to be safe for consumption. If blockchain becomes a mainstream solution in supply chains, it would create more transparency and make it easier to track food products, not only for business operators and government regulators but also for customers. According to Walmart, in 2016 it took their food safety specialists 6 days, 18 hours, and 26 minutes to trace where a package of sliced mangoes came from (Sristy, 2017). With the help of blockchain applications, they were able to cut that down to 2 seconds. It is as easy as scanning a barcode, which means even customers in a store can pull out a phone, scan a product, and see its entire life story.

Considering the increasing customer interest in learning more about their food basket, blockchain could mean a win both for consumers and for businesses that want to gain a marketing edge. According to a report released in 2022 by FMI – Food Industry Association and NielsenIQ, 81% of shoppers say “transparency is important or extremely important to them both online and in-store” (NielsenIQ & The Food Industry Association, 2022). This growing demand for more information is why we see more and more labels on products, such as Animal Welfare Approved, Cruelty-Free, Certified Naturally Grown, and Fair Trade. Today, there are hundreds of food certification labels in the US issued by non-profit and public organizations. Most of these entities are well-intentioned, but they do not always have the resources or capacity to enforce high standards. For example, a Peruvian Fairtrade-certified coffee producer told the Financial Times, “There is no way to enforce, control and monitor – in a remote rural area of a developing country – how much a small farmer is paying his temporary workers” (Weitzman, 2006). So blockchain could help these certificate-issuing entities keep customers better informed.

One successful blockchain startup, BanQu, used in 50 countries, makes even the smallest farmers in remote areas bankable: after every transaction they receive a text message with a unique code, which they present to local banks to collect their money. This creates unprecedented visibility for small farmers and significantly reduces the opportunities for fraud. We live in an interconnected world, where small farmers in rural areas of distant countries help shape our daily food basket. BanQu (a wordplay on “bank” and “you”) suggests that with its application, “you will know who harvests your crops, who collects your waste for recycling, and who mines your gems and minerals,” and claims that companies that partner with it perform better on climate action, poverty alleviation, and human rights (Banqu, n.d.). Blockchain could shine a light of transparency along entire supply chains, making visible even the smallest farmers far upstream, all the way down to the final consumer.

Conscious consumerism and consumer activism are trends expected to grow in influence over time. New generations are more likely to leverage their individual purchasing choices and mobilize over social media platforms for collective action against corporate practices they disagree with. According to LendingTree, in 2020, 38% of Americans boycotted a company, mainly over political or COVID-19-related issues. The figure was 51% for Gen Z and 52% for millennials, compared with 22% for baby boomers and 16% for the silent generation (Holmes, 2020). As one example, in 2020, consumers from around the world mobilized behind the #payup campaign, which pushed large brands such as Gap, Levi’s, Zara, Nike, H&M, and Ralph Lauren, among others, to pay wages owed to garment workers in Bangladesh after order cancellations worth north of $15 billion (Bobb, 2020). Thus, there is growing customer interest not only in products but in how those products reach store shelves, and blockchain could give both consumers and civil society organizations an effective way to investigate that process and make informed choices.

Limitations 

Blockchain is a powerful technology with tremendous potential for driving positive change in supply chain management, but it is not a solution that works magically on its own. There are a number of limitations that need to be addressed. First, we tend to hear about the successful applications of blockchain, whereas most blockchain initiatives have failed (Disparte, 2019). Businesses therefore need to study both successes and failures in order to identify the optimal implementation plan.

The second shortcoming is that blockchain is not the only option: there may be less complex remedies for the current issues in supply chains, the argument being that blockchain is a heavy burden with high opportunity costs and low benefits. For example, in 2017, McKinsey released a study titled “Blockchain technology for supply chains—A must or a maybe?”, which suggests that “well-managed central databases with good data management, combined with supply-chain visualization and analytical prowess, can be achieved at scale today,” so there is “a good-enough solution without blockchain” (Alicke et al., 2017). However, blockchain generates more reliable, verifiable real-time data, and the cost of attaining this data decreases as more players adopt the technology.

Lastly, blockchain technology could create closed ecosystems that are not inclusive and tilt the playing field against smaller businesses. In the federal government, the Department of Homeland Security (DHS) has already taken the initiative to lead the efforts against the potential “walled garden” effects of blockchain solutions. According to the Science and Technology Directorate under the DHS, “The challenge with blockchain technology is the potential for the development of ‘walled gardens’ or closed technology platforms that do not support common standards for security, privacy, and data exchange… this would limit the growth and availability of a competitive marketplace of diverse…” (Blockchain Portfolio, 2022). To avoid the trap of “walled gardens,” there is a need for public-private partnerships as well as cross-disciplinary collaboration among legal experts, computer scientists, and business experts.

Conclusion 

In the last several years, we have experienced profound changes in last-mile delivery, where customers can track their orders live with updates at every stage of transit. However, the upstream supply chain networks have mostly stayed the same for decades. Farmers, container carriers, railways, and trucking companies have relied on traditional methods and paper transactions to manage their operations. Blockchain applications could be the turning point for stakeholders in the upstream supply chains to step into the modern age. A distributed ledger can deliver several benefits to businesses, such as higher risk resilience through food traceability. Widespread application of blockchain technologies would also make it easier for federal agencies to enforce higher food standards and respond to crises, such as food-borne illness outbreaks. Finally, customers also stand to benefit from more transparent food supply chains, especially in light of the growing trend of conscious consumerism.

References

Alicke, K., Davies, A., & Leopoldseder, M. (2017, September 12). Blockchain technology for supply chains–A must or a maybe? McKinsey. Retrieved October 11, 2022, from https://www.mckinsey.com/capabilities/operations/our-insights/blockchain-technology-for-supply-chainsa-must-or-a-maybe

Disparte, D. (2019, May 20). Why Enterprise Blockchain Projects Fail. Forbes. Retrieved October 12, 2022, from https://www.forbes.com/sites/dantedisparte/2019/05/20/why-enterprise-blockchain-projects-fail

Banqu. (n.d.). BanQu: About. Supply Chain Software – Blockchain Platform. Retrieved October 11, 2022, from https://banqu.co/

Blockchain Portfolio. (2022, January 8). Blockchain Portfolio | Homeland Security. Retrieved October 10, 2022, from https://www.dhs.gov/science-and-technology/blockchain-portfolio

Bobb, B. (2020, July 10). Garment Workers Are Finally Getting Paid The Billions They’re Owed From Brands Like Gap and Levi’s. Vogue. Retrieved October 11, 2022, from https://www.vogue.com/article/remake-payup-campaign-social-media-garment-workers-wages-gap

Chaum, D. L. (1982). Computer Systems Established, Maintained and Trusted by Mutually Suspicious Groups. University of California, Berkeley.

Holmes, T. (2020, July 20). 38% of Americans Are Currently Boycotting a Company, and Many Cite Political and Coronavirus Pandemic-Related Reasons. Lending Tree. Retrieved October 11, 2022, from https://www.lendingtree.com/credit-cards/study/boycotting-companies-political-pandemic-reasons/

Hyperledger Foundation. (n.d.). Walmart Case Study – Hyperledger Foundation. Hyperledger Foundation. Retrieved October 10, 2022, from https://www.hyperledger.org/learn/publications/walmart-case-study

Iakovou, E. I., & White III, C. (2022, September 14). A data-sharing approach for greater supply chain visibility. Brookings Tech Stream. Retrieved October 6, 2022, from https://www.brookings.edu/techstream/a-data-sharing-approach-for-greater-supply-chain-visibility/

NielsenIQ and FMI – The Food Industry Association. (2022, January 25). Transparency in an evolving omnichannel world. NielsenIQ. Retrieved October 11, 2022, from https://nielseniq.com/global/en/insights/report/2022/transparency-in-an-evolving-omnichannel-world/

Sristy, A. (2017, August). Blockchain in the food supply chain – What does the future look like? Walmart One. Retrieved October 11, 2022, from https://one.walmart.com/content/globaltechindia/en_in/Tech-insights/blog/Blockchain-in-the-food-supply-chain.html

Tapscott, D., & Tapscott, A. (2018). Blockchain Revolution: How the Technology Behind Bitcoin and Other Cryptocurrencies Is Changing the World. Penguin Publishing Group.

Weitzman, H. (2006, September 8). The bitter cost of ‘fair trade’ coffee. Financial Times. Retrieved October 11, 2022, from https://www.ft.com/content/d191adbc-3f4d-11db-a37c-0000779e2340

The Evidence-Based Policymaking Act and Privacy

Abstract:

The Foundations for Evidence-Based Policymaking Act (EBPA) created a framework for centralizing the statistical information collected by dozens of US federal agencies across the country and imposed responsibilities for sharing that data within the government, as well as with researchers and private entities. One of the main outcomes of the act is expected to be a National Secure Data Service, which will promote collaboration, help avoid duplication, and minimize public expenditure on data collection and processing. Most importantly, it will improve government efficiency by restructuring the national statistical ecosystem to better inform policy decisions. However, the centralization of federal data envisioned by the EBPA creates new privacy risks and vulnerabilities, the same concerns that led Congress to reject the similar idea of a National Data Center in the 1960s. Back then, the debate around data centralization ended with the passage of the Privacy Act of 1974. Half a century later, the data centralization idea has been approved, but no changes were made to the privacy legislation. This paper argues that, while the EBPA is a positive step forward, it needs additional privacy safeguards that could be provided by revising the Privacy Act of 1974, which was last updated in 1988.

1. Background

This section looks at why the government collects data, how its institutional and technical capacity to process data has changed over time, and how those changes have shaped the public debate around privacy.

1.1 Purpose  

The core idea behind the Foundations for Evidence-Based Policymaking Act of 2018 (EBPA) is to create metrics for analyzing the government’s policy decisions and thus improve the federal government’s effectiveness. According to title 44 of the U.S. Code, the term evidence means “information produced as a result of statistical activities conducted for a statistical purpose” (44 USC 3561: Definitions). However, not all statistics are the same, and relying on bad data can do more harm than good. So, the EBPA intends to increase not only the quantity of data supplied for informing policy decisions but also its quality.

1.2 Why do governments collect data?

The collection of certain data types is essential for a government to carry out its basic functions. As far back as five to six thousand years ago, ancient governments in Babylonia and Egypt collected primitive forms of census data. Early governments needed census data mainly for taxation and military recruitment. However, with the emergence of democratic states, census data became a crucial element of political representation. In the United States, holding a decennial census is embedded in the Constitution. Article 1, Section 2 of the Constitution states that “the actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States and within every subsequent Term of ten Years” (The National Constitution Center). The Nation’s Founders intended to apportion the seats in Congress among the States according to their populations. The initial benchmark was one representative for every 30,000 residents (Gauthier). (Today, that number hovers around 700,000.) Consequently, census data was necessary to advance democratic governance.

1.3 Changing capacity to process data

However, producing quality data did not come easily, as it requires institutional capacity building, training of professional staff, a certain level of public awareness, and resources to pay for all of this. The first census in the U.S. took place in 1790 and counted a total population of 3,929,214 (A Timeline of Census History). Both President George Washington and Secretary of State Thomas Jefferson were skeptical of the result and believed the population had been undercounted. Until 1840, Secretaries of State were put in charge of organizing the decennial census as a temporary assignment. In 1849, Congress established a census board to oversee data collection, and the responsibility for census data shifted from the Department of State to the Department of the Interior (DOI). Only in 1902 did the Census Bureau become a permanent agency under the DOI (A Timeline of Census History).

This gradual shift from a temporary, ad hoc group of amateurs to a permanent government bureaucracy happened in parallel with the increasing complexity of census operations and the government’s growing demand for quality data. It is noteworthy that two other federal statistical agencies were established before the Census Bureau: the National Center for Education Statistics, founded in 1867, and the U.S. Bureau of Labor Statistics, founded in 1884. Today, the U.S. has thirteen principal federal statistical agencies and more than 90 federal organizations that engage in statistical activities. See Table 1 for the full list of the thirteen principal statistical agencies.

Table 1: 13 Principal Federal Statistical Agencies in the U.S.:

Agency | Governing body | Founded
Bureau of Economic Analysis | Department of Commerce | 1972
Bureau of Justice Statistics | Department of Justice | 1979
Bureau of Labor Statistics | Department of Labor | 1884
Bureau of Transportation Statistics | Department of Transportation | 1992
Census Bureau | Department of Commerce | 1903
Economic Research Service | Department of Agriculture | 1961
Energy Information Administration | Department of Energy | 1977
National Agricultural Statistics Service | Department of Agriculture | 1961
National Center for Education Statistics | Department of Education | 1867
National Center for Health Statistics | Department of Health and Human Services | 1960
National Center for Science and Engineering Statistics | Independent | 1950
Office of Research, Evaluation and Statistics | Social Security Administration | 1935
Statistics of Income | Department of Treasury | 1916

One of the forces driving the increasing demand for quality data was the transition of the U.S. to a welfare state. By definition, a welfare state is “a state that is committed to providing basic economic security for its citizens by protecting them from market risks associated with old age, unemployment, accidents, and sickness” (Weir). In order to efficiently allocate resources and provide targeted assistance, the state needed more complex and accurate databases on individual citizens. A turning point came with the Social Security Act of 1935, part of the New Deal, a series of government programs responding to the Great Depression. At the time, the United States was the only modern industrial country without a social security system.

One of the main provisions of the Social Security Act was the creation of the Social Security Number (SSN), which assigned a unique nine-digit number to every U.S. citizen, as well as to permanent and temporary residents. Over time, the social weight and public perception of the SSN changed. Carolyn Puckett, writing for the Office of Research, Evaluation, and Statistics at the Social Security Administration in 2009, noted that although it was “created merely to keep track of the earnings history of U.S. workers for Social Security entitlement and benefit computation purposes,” the SSN “has come to be used as a nearly universal identifier” (Puckett). It turned into the primary method for public services to identify citizens and organize individual records.

1.4 Changing perceptions

As the volume of records on citizens grew in the following years, concerns about the privacy implications began to emerge. In her 2018 book The Known Citizen: A History of Privacy in Modern America, Sarah E. Igo, Professor of History and Dean of Strategic Initiatives for the College of Arts and Science at Vanderbilt University, writes that with the passage of the Social Security Act of 1935, “questions about how thoroughly the state ought to know its own people became less theoretical” (Igo, p. 57). Professor Igo notes that until the 1930s, the public perception was that the government tracked only troubled citizens and marginal communities to maintain public order. However, in the New Deal era, the government’s administrative tracking captured even more privileged citizens, and “being known to the government” became “increasingly constitutive of citizenship itself: a necessary exchange for steady employment, increased economic security, and free movement across borders” (Igo, p. 56).

Initial public reactions to the newly instituted Social Security programs were largely positive, especially during World War II, when the system enabled the government to efficiently identify and assist war veterans and wounded warriors. Some people went as far as tattooing their social security numbers on their bodies to make sure they would not forget their nine digits. However, in the following decades, especially as the economic crisis and the war receded into history, the public debate about government databases shifted into a new phase. On the one hand, some government bureaucrats and social scientists believed that increasing the quantity and quality of public data records would lead to more efficient social and economic policies. On the other hand, many civil society activists and legal scholars voiced concerns that the swelling volume of databases on citizens amounted to an invasion of privacy.

1.5. The National Data Center

In this context, the story of the failed National Data Center in the 1960s is especially noteworthy and highly relevant to the debate around the Evidence-Based Policymaking Act adopted in 2018. It started with a request from a group of social scientists, who in 1965 “recommended that the federal government develop a national data center that would store and make available to researchers the data collected by various statistical agencies” (Kraus, p. 1). The ensuing political turmoil is captured eloquently in “Statistical Déjà Vu: The National Data Center Proposal of 1965 and Its Descendants,” a paper Rebecca Kraus wrote in 2013. On one side, some social scientists believed that “government programs designed to address social issues, such as civil rights, housing, employment, welfare, education, and poverty” could be improved if the academic community had access to the public data generated by the federal government (Kraus, p. 4). On the other, privacy advocates were concerned about the risks and vulnerabilities such a center would create. The proposal for the National Data Center lost steam in 1970, when the Bureau of the Budget, which led the research behind it, was reorganized into the Office of Management and Budget.[1]

2. The Commission

2.1 Formation of the Commission

In March 2016, Speaker of the House Paul Ryan and Senator Patty Murray put forward the bipartisan Evidence-Based Policymaking Commission Act of 2016, which President Barack Obama signed within the same month. It laid the foundation for the establishment of the U.S. Commission on Evidence-Based Policymaking (CEP), directed to “consider how to strengthen government’s evidence-building and policymaking efforts,” as well as “study how the data that government already collects can be used to improve government programs and policies,” and present its findings and recommendations to the Congress and the President.  

2.2 Bipartisan initiative

It is worth underlining the bipartisan nature of this initiative. The two congressional leaders, Democratic Senator from Washington State Patty Murray and Republican Speaker of the House from Wisconsin Paul Ryan, had established good relations back in 2013, when they achieved a breakthrough with the Bipartisan Budget Act of 2013. That bill allowed Congress to avert a government shutdown and, in the long run, to save close to $23 billion. Murray and Ryan had made only small compromises to reach the agreement, and both were applauded for it. Three years later, they built on this success and initiated the CEP. Introducing the follow-up legislation, Senator Murray said that “No matter what side of the aisle you’re on, we should all agree that government should work as efficiently as possible for the people it serves” (U. S. Senator Patty Murray). Paul Ryan, in turn, remarked that “Patty and I have long advocated for a way to better measure the federal government’s effectiveness—and this bill puts those efforts into action” (U. S. Senator Patty Murray).

2.3 Composition of the Commission

Consequently, the Commission was composed of individuals without strong political affiliations. They were mostly academics, some with prior experience in the federal government, one a current employee of the U.S. Office of Management and Budget, and three from the private sector. Two of the commission’s fifteen members were well-known privacy advocates: Paul Ohm, a Professor of Law at the Georgetown University Law Center, and Latanya Sweeney, Professor of the Practice of Government and Technology at the Harvard Kennedy School. Both are widely recognized for their research and publications on privacy law and policy. Paul Ohm’s position is that “data can be either useful or perfectly anonymous but never both.” Latanya Sweeney was a graduate student at the Massachusetts Institute of Technology in 1997 when she reidentified Massachusetts Governor Bill Weld by connecting his publicly accessible records to his anonymized medical records (Meyer). This had a big public impact and led to new legal restrictions on the disclosure of protected health information under the Health Insurance Portability and Accountability Act, known as HIPAA. Within the CEP, Ohm and Sweeney advocated for additional friction in accessing government databases and for added layers of privacy protection.

Table 2: Members of the U.S. Commission on Evidence-Based Policymaking

No. | Name | Affiliation
1 | Katharine G. Abraham (Commissioner and Chair) | University of Maryland
2 | Ron Haskins (Commissioner and Co-Chair) | Brookings Institution
3 | Sherry Glied | New York University
4 | Robert M. Groves | Georgetown University
5 | Robert Hahn | University of Oxford
6 | Hilary Hoynes | University of California, Berkeley
7 | Jeffrey Liebman | Harvard University
8 | Bruce D. Meyer | University of Chicago
9 | Paul Ohm | Georgetown University
10 | Nancy Potok | U.S. Office of Management and Budget
11 | Kathleen Rice Mosier | Faegre Baker Daniels, LLP
12 | Robert Shea | Grant Thornton, LLP
13 | Latanya Sweeney | Harvard University
14 | Kenneth R. Troske | University of Kentucky
15 | Kim R. Wallin | D.K. Wallin, Ltd.

However, on the other side of the debate were social scientists who believed that access to more data would improve both the quality of academic research and the efficiency of the government’s public policy. Consequently, there were many heated debates within the commission. The CEP held its first meeting in July 2016 and presented its final report in September 2017. During this period, it surveyed 209 federal offices that work with evidence (data), invited 49 witnesses, held meetings with 40 organizations, hosted three public hearings, and reviewed comments from 350 respondents in the Federal Register (Bipartisan Policy Center). In the end, the commission presented a final document signed unanimously by all of its members.

2.4 Recommendations

The final report of the commission, titled The Promise of Evidence-Based Policymaking, was presented to the public on September 7, 2017. It included 22 specific recommendations that fell under four categories: 1. Improving Secure, Private, and Confidential Data Access; 2. Modernizing Privacy Protections for Evidence Building; 3. Implementing the National Secure Data Service; 4. Strengthening Federal Evidence-Building Capacity. In the 138-page document, the word “privacy” is used 390 times, “secure” 183 times, and “confidential” 12 times. Overall, the report recognizes that “the country’s laws and practices are not currently optimized to support the use of data for evidence building, nor in a manner that best protects privacy” and suggests several measures to address this issue (Commission on Evidence-Based Policymaking).

2.5 The National Secure Data Service

One of the central ideas in the report is the establishment of a National Secure Data Service (NSDS), a successor of sorts to the idea of the National Data Center from the 1960s. Back then, during one of the Congressional hearings, economist Richard Ruggles had remarked that “although the emphasis in the privacy hearings was mainly on the possible danger of centralizing records, they also brought out that in some instances, the centralization of files can result in increasing the protection of individual privacy in situations where there have been flagrant abuses” (Kraus, p. 21). Building on this premise, members of the CEP believed that creating a centralized data service could enhance both the quality of data and privacy standards. The report suggested that the NSDS could learn from the expertise and institutional knowledge of the Center for Administrative Records Research and Applications (CARRA) and the Center for Economic Studies (CES) under the Census Bureau, which have been carrying out similar functions.

3. The Legislation

3.1 Passing into law

The Foundations for Evidence-Based Policymaking Act passed the House of Representatives on November 15, 2017. About eleven months later, the Senate approved the bill, as amended, by a unanimous vote. In January 2019, the President signed the “Foundations for Evidence-Based Policymaking Act of 2018” into law (Legislative Bulletin). The final act, which is about thirty pages long, makes only seven references to privacy, but it creates clear boundaries for the use of public data, assigns responsible parties for handling and protecting databases, and establishes legal penalties for violations of its provisions. Overall, the Act presents several progressive and innovative approaches to handling public data, but whether a sufficient level of privacy protection supplements these new practices requires a closer examination.

3.2 Legal Amendments

It goes without saying that the act was not built in a vacuum but rather supplements a complex system of pre-existing rules and regulations. The full title of the EBPA is: “to amend titles 5 and 44, United States Code, to require Federal evaluation activities, improve Federal data management, and for other purposes.” Title 5 of the U.S. Code covers “Government Organization and Employees,” and it contains regulations such as the Freedom of Information Act (FOIA), adopted in 1967, and the Privacy Act of 1974. FOIA gives American citizens the right to request access to records from any federal agency, provided the request does not violate certain privacy and confidentiality rules (Branscomb).[2] The Privacy Act of 1974 established “a code of fair information practices that govern the collection, maintenance, use, and dissemination of information about individuals that is maintained in systems of records by federal agencies” (Privacy Act of 1974). The EBPA did not make any changes to either FOIA or the Privacy Act but complemented title 5 of the U.S. Code with additional provisions on the federal government’s data handling practices.

Title 44 of the U.S. Code is about “Public Printing and Documents” and covers all the archives, registries, and records managed by the federal government. Most provisions of the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) also fall under title 44. CIPSEA was part of the broader E-Government Act of 2002, and it established uniform confidentiality standards to protect the data collected by federal statistical agencies. The purpose was to prevent opportunities for triangulating data points and reidentifying respondents based on data shared by various statistical agencies. The Evidence-Based Policymaking Act repealed CIPSEA 2002 and reauthorized it as CIPSEA 2018, with the overall intention of providing more opportunities to use public data for statistical purposes while imposing more responsibilities for risk mitigation (Ruyle).

The EBPA also passed into law the “Open, Public, Electronic, and Necessary Government Data Act,” also known as the OPEN Government Data Act. Since 2009, the U.S. General Services Administration has been running the website Data.gov, which publishes for public access machine-readable datasets produced by the executive branch of the national government (Data.Gov). In March 2017, Democratic Representative Derek Kilmer of Washington State proposed the OPEN Government Data Act, which would expand the coverage of Data.gov and require “open government data assets made available by federal agencies (excluding the Government Accountability Office, the Federal Election Commission, and certain other government entities) to be published as machine-readable data… when not otherwise prohibited by law” (H.R.1770 – 115th Congress). All in all, the EBPA was not a disruptive, out-of-the-blue piece of legislation but rather another step toward open data and evidence-based policymaking, plugged into the pre-existing legal infrastructure.

3.3 Statistical purpose

A top priority in the text of the EBPA is ensuring that only anonymized, aggregate data will be shared, to protect the confidentiality of respondents. One of the most frequently used terms is “statistical purpose” (mentioned 35 times), which, according to title 44 of the U.S. Code, means “the description, estimation, or analysis of the characteristics of groups, without identifying the individuals or organizations that comprise such groups” (44 USC 3561: Definitions). For example, collecting and processing data on the overall number of traffic incidents in Washington DC falls under statistical purposes. However, if the data is used to calculate car insurance rates adjusted for individual drivers in Washington DC, that would be a non-statistical use. For most social research and public policy purposes, aggregate data is sufficient. For example, if the unemployment rate among the Hispanic population is higher than among other groups, the government can initiate a tailored policy approach targeting specifically that group. However, when very large quantities of data are centralized in one place and various bits and pieces are shared on public platforms, it creates opportunities for reverse-tracking data points, making meaningful connections, and reconstructing parts of the database that were never meant for public disclosure.
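
As a stylized illustration of that risk, the short Python sketch below, using entirely made-up records, shows how an “anonymized” dataset that omits names can still be re-linked to a public dataset through quasi-identifiers such as ZIP code, birth date, and sex; this is the same general mechanism behind Sweeney’s reidentification of Governor Weld’s medical records discussed earlier. All names and fields here are hypothetical.

```python
# Hypothetical data: an "anonymized" release (no names) and a public roll (no diagnoses).
anonymized_records = [
    {"zip": "20001", "birth_date": "1970-07-31", "sex": "M", "diagnosis": "asthma"},
    {"zip": "20008", "birth_date": "1985-01-12", "sex": "F", "diagnosis": "diabetes"},
]
public_roll = [
    {"name": "J. Doe",   "zip": "20001", "birth_date": "1970-07-31", "sex": "M"},
    {"name": "A. Smith", "zip": "20010", "birth_date": "1990-03-05", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

def link(records, roll):
    """Join the two datasets on quasi-identifiers; a unique match reidentifies a person."""
    for record in records:
        matches = [p for p in roll
                   if all(p[k] == record[k] for k in QUASI_IDENTIFIERS)]
        if len(matches) == 1:  # a unique combination is enough to reidentify
            yield matches[0]["name"], record["diagnosis"]

print(list(link(anonymized_records, public_roll)))
# [('J. Doe', 'asthma')] -- the "anonymous" asthma record now has a name attached
```

The more datasets that are centralized and published, the more such joins become possible, which is exactly why the EBPA’s emphasis on statistical purpose alone does not eliminate disclosure risk.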

3.4 Risks and Responsibilities

From this standpoint, EBPA puts a big responsibility on the heads of federal agencies and will hold them accountable for determining “risks and restrictions related to the disclosure of personally identifiable information, including the risk that an individual data asset in isolation does not pose a privacy or confidentiality risk but when combined with other available information may pose such a risk.” Additionally, the law establishes the position of Evaluation Officer in each agency, whom the head of the agency will designate without regard to political affiliation. The main function of the Evaluation Officer will be to “continually assess the coverage, quality, methods, consistency, effectiveness, independence, and balance of the portfolio of evaluations, policy research, and ongoing evaluation activities of the agency.”

However, the EBPA centralizes the data generated by all federal agencies, so minimizing the risks of reidentification requires interagency coordination. For this purpose, the law expands the functions and the institutional scope of the Interagency Council on Statistical Policy (ICSP), established under section 3504(e)(8) of title 44, which designates the head of the Office of Management and Budget as head of the Council. In the 1980s, the ICSP was an informal group that brought together representatives from federal statistical agencies to coordinate their activities, but it was authorized by statute as a formal council in 1995 (The Structure of the Federal Statistical System). The Paperwork Reduction Act of 1995 put the OMB, namely its Office of Information and Regulatory Affairs (OIRA) division, in charge of coordinating the U.S. federal statistical system (Statistical Programs & Standards). The head of OIRA’s Statistical and Science Policy Office is also the Chief Statistician of the U.S.,[3] who hosts the meetings of the ICSP on a monthly basis. Under the EBPA, heads of statistical units or other officials with appropriate expertise from other federal agencies will also join the ICSP, which will take on more responsibilities.

3.5 Upcoming assessment 

The new law also establishes the position of Chief Data Officer (CDO) in each agency, who is “responsible for lifecycle data management,” as well as for managing “data assets of the agency, including the standardization of data format, sharing of data assets in accordance with applicable law,” among fourteen other duties outlined in the law. Furthermore, section 3520A of the EBPA provides for the establishment of the Chief Data Officer Council, which also falls under the OMB but is separate from the ICSP. It is a temporary council that brings together representatives from 39 federal agencies. The CDO Council is assigned a number of tasks to complete before January 2025, when it is set to dissolve (About Us. Federal CDO Council): “1. establish Governmentwide best practices for the use, protection, dissemination, and generation of data; 2. promote and encourage data sharing agreements between agencies; 3. identify ways in which agencies can improve upon the production of evidence for use in policymaking,” etc. So, certain provisions of the EBPA are still in the assessment phase, and it will take a couple more years for its effects to fully unfold.

4. Privacy

4.1 What is privacy?

The word “privacy” traces its roots to the Latin word privus, which means separate or single. The Merriam-Webster dictionary offers two definitions of privacy: 1. the quality or state of being apart from company or observation; 2. freedom from unauthorized intrusion. However, there are different approaches to privacy in the scholarly community and, consequently, different definitions. Generally, the significance and value of privacy may change depending on social, political, and cultural circumstances, which has made it an elusive concept to define by consensus. Nonetheless, the debate around privacy has been ongoing since the mid-twentieth century, and it is not likely to end anytime soon, as modern technologies move us into uncharted territories with new friction points.

4.2 Privacy as a human right

A popular perspective views privacy as a fundamental human right, or “the right to be left alone,” protected by law (MacCarthy, 2017). This approach recognizes an individual’s right to personal physical and informational space protected from external intrusion. The United States Constitution provides certain privacy protections. The Fourth Amendment states: “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated.” The Fifth Amendment provides conditional protections for private information by creating the right against self-incrimination. In the U.S. common law system, a number of court cases, such as Griswold v. Connecticut and Lawrence v. Texas, have broadened the scope of these constitutional privacy protections. Additionally, the United States has dozens of laws providing sectoral privacy protections. For example, the aforementioned HIPAA protects medical records, and the Family Educational Rights and Privacy Act legally restricts access to student records.

4.3 Privacy as a harm

Another perspective views privacy as a right to be protected from harm. This approach shifts the focus of the privacy debate from the individual to society and asks what the implications of data collection are for society overall. It leaves a smaller space for personal information protection, which applies only when there is direct and tangible harm to the individual. The privacy-as-harm framework emerged in the 1970s, when one of its pioneers, Richard Posner, wrote that we have two economic goods, “privacy” and “prying,” and that expanding the privacy protections of individuals while contracting the rights of organizations collecting data is against our common interests (Posner, 1978). More recently, Howard Beales and Timothy Muris made a case for privacy within the harm framework by highlighting the example of credit score reporting, since “collecting financial information about individuals has made loans more accessible to general public” (Beales & Muris, 2008). In this view, the social benefit of more accessible loans trumps the individual’s right to withhold financial information. This approach prioritizes protection from the harmful externalities of data use over restrictions on data collection itself.

4.4 Privacy in social context

The most recent addition to the privacy debate comes from Helen Nissenbaum, Professor at Cornell University. In her book Privacy in Context, Nissenbaum laid out a new privacy framework that integrates elements from both the human rights and harm frameworks. Nissenbaum builds on the premise that privacy is a social construct, so its interpretation and application may vary depending on social circumstances. From this vantage point, structured social factors such as canonical activities, roles, norms, and values define the optimal degree of data access and visibility (Nissenbaum, p. 17). For example, the doctor you are visiting may have access to your medical records, but an insurance company may not. A basic quality of the social context framework is that privacy does not stop the flow of information but facilitates the flow to some stakeholders while restricting it for others (MacCarthy, 2017). It is hard to disagree with Nissenbaum that privacy is a social construct whose value tends to change across geographic space, time, and other conditional factors. Nowadays especially, data has become omnipresent, and a uniform, rigid approach to all privacy issues cannot be the way forward. Sometimes privacy is an inalienable human right; other times, there is a common public interest in sharing certain pieces of information that would otherwise be considered private. The social context framework offers a matrix that is broad, structured, and flexible enough to be applied across the privacy landscape.

4.5 Reasonable expectation of privacy

One of the most common reference points in the privacy debate is the notion of a “reasonable expectation of privacy.” It traces back to the seminal Supreme Court case Katz v. United States, decided in the 1960s while the debate around the first National Data Center was ongoing. The Court’s decision expanded Fourth Amendment privacy protections to include “what [a person] seeks to preserve as private, even in an area accessible to the public.” In his concurrence, Justice John Harlan established a two-part privacy test, which relies on the subjective expectation of privacy of the individual in question and the objective expectation of privacy recognized by society as a whole. However, Helen Nissenbaum, along with other contemporary privacy scholars, believes that due to the impact of modern disruptive technologies, the binary approach to privacy of inside/outside, secret/not secret, or expected/not expected is somewhat outdated. Nissenbaum writes that previously “people could count on going unnoticed and unknown in public arenas; they could count on disinterest in the myriad scattered details about them” (Nissenbaum, p. 116), but now it has become far more complicated. New technologies make it possible to capture myriad details, or data points, about us in centralized databases, adding new layers to the privacy debate, where little details make big differences.

5. Analysis

This section takes a closer look at the implications of EBPA by asking what the risks and opportunities are in the centralization of federal data from a privacy standpoint. 

5.1 Data Centralization

Contrary to one of the top recommendations in the CEP report, the EBPA did not establish a National Secure Data Service, but it did create frameworks for interagency coordination and data centralization. We discussed earlier the expanded role of the ICSP and the temporary Chief Data Officers Council. The EBPA also established another temporary body, the Advisory Committee on Data for Evidence Building, which brings together Evaluation Officers, Chief Data Officers, and other managers responsible for data handling across the federal statistical system. Currently, the Advisory Committee is administered by the Census Bureau and the Bureau of Economic Analysis (BEA) under the Department of Commerce and works closely with the Office of Management and Budget (Advisory Committee on Data for Evidence Building). In its Year 1 report, published in October 2021, the Advisory Committee affirmed the need for the establishment of the National Secure Data Service, as proposed by the CEP (Advisory Committee on Data for Evidence Building: Year 1 Report).

5.2 Advantages of the NSDS

This is hardly surprising because, from the beginning, one of the top priorities behind the EBPA was the creation of a centralized command-and-control mechanism over all the data the federal government generates. When the CEP started its first meetings, it had sixteen talking points, four of which were about the NSDS, including items such as “tiered access with a NSDS” and the role of the NSDS in the federal evidence ecosystem (CEP report, p. 123). In its final recommendations, the CEP proposed that the NSDS would enable the OMB to create higher standards for data collection and protection that could be applied across the country, so the same national data protection principles would apply to data from small rural communities and large metropolitan areas alike. The NSDS would also help curtail duplicative efforts and improve the efficiency of the statistical agencies. Consequently, it would decrease federal spending on data and reduce the burden on the public.

5.3 Risks and vulnerabilities

However, the federal government handles very large volumes of data on a routine basis, and the centralization of so much statistical information in the hands of one center creates new privacy risks. First, it may change the public perception of federal data and potentially create a burden on civic life. Second, increasing public access to federal statistics increases the risk to data confidentiality, and the EBPA creates obligations to make significant numbers of datasets publicly available.

5.4 Panopticon view

During the first round of privacy debates in the 1960s, Cornelius Gallagher, a Democratic Congressman from New Jersey, said that the improved government efficiency promised by the idea of a National Data Center “would be paid for at the far greater expense of weakening the right to privacy of all American citizens” (Kraus, p. 11). Meanwhile, the author Vance Packard concluded his Congressional testimony by noting that “my own hunch is that Big Brother, if he ever comes to these United States, may turn out to be not a greedy power seeker, but rather a relentless bureaucrat obsessed with efficiency” (Kraus, p. 10). As we discussed earlier, privacy is a social construct, and its social value and impact may change depending on the circumstances. For example, we might feel comfortable sharing our medical records with a hospital, educational records with an employer, and income statements with the Internal Revenue Service, but it creates a different reality when someone is able to put it all together. It gives the impression that someone knows as much about you as you do, and that you are no longer in charge of your privacy. This kind of public opinion is detrimental to civic life, even if it is not grounded in fact, because perception becomes reality and people begin to inhibit their own freedom of self-expression.

One of the most influential philosophers of the 20th century, Michel Foucault, put forward the concept of panopticism. Its central argument is that people change their behavior even when there is only a modest chance that those in a position of power could be watching them. Originally, the panopticon was an architectural design for prisons proposed by the English philosopher Jeremy Bentham in the late 18th century. The prison floor is laid out in a circle, with the guard sitting at the very center able to see all the inmates, while they cannot see the guard. Foucault argued that this creates a power dynamic in which inmates become their own surveillants because they never know when they might be monitored.

5.5 Privacy: statistics vs. surveillance

However, there is an important distinction between the statistical analysis envisioned by the EBPA and the type of surveillance assumed by the panopticon approach. Surveillance focuses on specific targets, whereas statistics processes aggregate data, and, as mentioned earlier, the EBPA puts a heavy emphasis on data being used for statistical purposes only. Part B of the Act is titled “Confidential Information Protection” and contains several safeguards against abuses of the federal databases. For example, it requires that those who handle the data take a pledge of confidentiality and makes them liable for a Class E felony, punishable by imprisonment for up to 5 years and/or a fine of up to $250,000.[4] The EBPA also obliges the statistical agencies to clearly distinguish any information that could be used for non-statistical purposes and to provide public notice about the actual purpose of the data. However, the legislation leaves gaps regarding the mechanisms and conditions for public communication. A 2019 survey by the Pew Research Center showed that 64% of Americans are concerned about the government’s use of public data, while 78% do not understand what the government does with the collected data (Auxier et al.). It would be good to have stronger legal encouragement for executive agencies, such as the future NSDS, to prioritize public accountability and engagement.

5.6 Privacy vs. confidentiality vs. anonymity  

People working for the NSDS will also face technical challenges in preserving the confidentiality and anonymity of data. First, let us look at the distinctions between privacy, confidentiality, and anonymity. Confidentiality and anonymity are only about a person’s actions and data, but privacy is also about the person (Privacy and Confidentiality). For example, whether someone may ask you personal questions is a matter of privacy. However, whether they can share your responses with another person is a question of confidentiality. Confidentiality implies that the surveyor knows your identity but will not share it outside a certain social group. Anonymity refers to a condition where even the primary surveyor does not know or register your identity. Both confidentiality and anonymity fall under the bigger umbrella of privacy, but neither captures its full meaning.

5.7 Privacy legislations

In the United States and around the world, privacy regulations generally do not apply to anonymized data. For example, the European Union’s well-known privacy law, the General Data Protection Regulation, states that “The principles of data protection should apply to any information concerning an identified or identifiable natural person… this Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes” (Recital 26: Not Applicable to Anonymous Data). The United States Privacy Act of 1974 also has a specific exemption for a “statistical record,” which means “a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identifiable individual” (Privacy Act of 1974). The EBPA complies with this provision of the privacy law. However, in the half century since the Privacy Act was passed, much has changed in both statistical science and the technical capacity of machines to process data.

5.8 Reidentification

We have already mentioned the new methods and techniques for reidentification by triangulating data points from several anonymized datasets. Paul Ohm, Georgetown Law professor and a member of the CEP, wrote in his 2010 paper that “Reidentification science disrupts the privacy policy landscape by undermining the faith we have placed in anonymization… advances in reidentification expose these promises as too often illusory” (Ohm). To avoid the traps of reconstruction algorithms, statistical experts have developed several data protection mechanisms. For example, for many years the Census Bureau, required to publish blocks of its datasets, has been using various noise-infusion techniques, such as ‘swapping,’ ‘blank-and-impute,’ ‘partially synthetic data,’ and, most recently, differential privacy (boyd and Sarathy, p. 7). These approaches preserve the integrity of the datasets and maintain their full value for most purposes without compromising confidentiality. However, in a small number of scenarios, these methods can introduce minor deviations, since the datasets are deliberately perturbed. The details of these manipulations cannot always be shared publicly, because doing so could undermine the confidentiality of the datasets. Consequently, these disclosure control methods create friction between data users and the Census Bureau.
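
For illustration, here is a minimal Python sketch of one such noise-infusion idea, the Laplace mechanism for differentially private counts. It is a generic textbook construction, not the Census Bureau’s actual production system, and the counts and epsilon values are arbitrary assumptions for the example.

```python
import numpy as np

def private_count(true_count: int, epsilon: float, rng=None) -> float:
    """Release a count with epsilon-differential privacy using the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person changes the
    answer by at most 1), so noise is drawn from Laplace(scale = 1 / epsilon).
    """
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical example: publishing the population of a small census block.
true_count = 37
for eps in (0.1, 1.0, 10.0):  # smaller epsilon -> more noise -> stronger privacy
    print(f"epsilon={eps}: released count is approximately {private_count(true_count, eps):.1f}")
```

The tradeoff described above is visible directly in the sketch: a smaller epsilon protects individuals better but makes the published figure less accurate, which is exactly the friction data users experience.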

6. Recommendations

6.1 Study the impact on civic activism

In the early 1970s, the Advisory Committee on Automated Personal Data Systems was established under the Department of Health, Education, and Welfare to research the potentially harmful consequences of automated personal data systems, effective safeguards to protect against those negative consequences, as well as “policy and practice relating to the issuance and use of Social Security Numbers” (U. S. Department of Health, Education and Welfare). The Committee published its final report, titled “Records, Computers, and the Rights of Citizens,” in 1973, and it had a ripple effect on privacy laws and regulations around the world in the following decades. In the United States, it laid the foundations for the Fair Information Practice Principles, applied by the Federal Trade Commission to the private sector, and influenced the Privacy Act of 1974.

Much has changed since the 1970s, and more changes will come once the EBPA is fully rolled out. The federal government now needs to conduct a similar study to assess the impact of the EBPA and data centralization on civic activism and freedom of expression. The United States is by far the biggest experiment in human history testing the power of a society built on individual liberties, and one of the cornerstones of America’s success story is the value it puts on freedom of self-expression. Even nominal burdens on privacy and civil liberties could be a very high price to pay for the promises of the EBPA.

6.2 Revisiting the legislation

The findings of such a report should feed into a revision of the Privacy Act of 1974. The latest change to that legislation was made in 1988, when Congress passed the Computer Matching and Privacy Protection Act, which requires that federal agencies “enter into written agreements with other agencies or non-Federal entities before disclosing records for use in computer matching programs.” On the official online database of Congress (Congress.gov), there are 1,200 bills with the word privacy in the title. Most of them never passed the House floor, but the number shows how complicated the legal terrain on privacy is in the United States. Ninety-four privacy bills were introduced in 1973-74, and since then roughly twenty privacy bills have been introduced every year on average.

Current legislation puts a heavy burden on the statistical agencies to respond to three competing demands. They have to produce good quality data, but they also have to protect the privacy of their respondents. Now, they are also obliged to make these datasets publicly available, which forces them to use various techniques such as differential privacy. However, that makes certain data consumers unhappy, as we can see from the experience of the Census Bureau. So, it would be good to relieve the statistical agencies of some of this burden and provide legal tools and justifications for the privacy protections applied to public datasets.

7. Conclusion

Over the years, U.S. federal statistical agencies have accumulated tremendous institutional expertise and technical capacity to produce large-scale, high-quality data. Now the EBPA is rallying the federal statistical agencies into a cohesive unit to provide numerical insight into the performance of the executive branch. It will create an administrative mechanism for informing the government’s policy decisions, as well as a public accountability mechanism, since large segments of government data will be made publicly accessible. However, it also consolidates the statistical information of the federal government into centralized databases, which creates new privacy risks and vulnerabilities. The EBPA is yet to be fully rolled out, but one of its main consequences is expected to be the establishment of the NSDS, which will carry enormous weight on its shoulders as it tries to satisfy several competing demands. On both ends of the line, the NSDS will be working with and for the American people, so it is very important to keep the public informed and to understand public impact and expectations. It is the right time for the U.S. government to conduct a study on the impact of centralized, automated databases on civic life, akin to the one conducted in 1973, and to incorporate its findings into updated privacy legislation.

References:

About Us. (2020). Federal CDO Council. https://www.cdo.gov/about-us/

Advisory Committee on Data for Evidence Building. (2022). U.S. Bureau of Economic Analysis (BEA). https://www.bea.gov/evidence

Advisory Committee on Data for Evidence Building: Year 1 Report. (2021, October). Office of Management and Budget. https://www.bea.gov/system/files/2021-10/acdeb-year-1-report.pdf

Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2020, August 17). Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information. Pew Research Center: Internet, Science & Tech. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/

A Timeline of Census History. United States Census Bureau. https://www.census.gov/history/img/timeline_census_history.bmp

Beales, Howard, & Muris, Timothy. “Choice or Consequences: Protecting Privacy in Commercial Information.” 75 U. Chi. L. Rev. 109 2008 pp. 109-120

Bipartisan Policy Center. Frequently Asked Questions Related to the Commission on Evidence-Based Policymaking’s Report. (2019, March). https://bipartisanpolicy.org/download/?file=/wp-content/uploads/2019/03/CEP-FAQs.pdf

boyd, d., & Sarathy, J. “Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau’s Use of Differential Privacy”

Branscomb, Anne (1994). Who Owns Information?: From Privacy To Public Access.

Commission on Evidence-Based Policymaking. (2017, September). THE PROMISE OF EVIDENCE-BASED POLICYMAKING. Bipartisan Policy Center. https://bipartisanpolicy.org/download/?file=/wp-content/uploads/2019/03/Appendices-e-h-The-Promise-of-Evidence-Based-Policymaking-Report-of-the-Comission-on-Evidence-based-Policymaking.pdf

Data.Gov. (2022) About. https://data.gov/about/

Dr. Latanya Sweeney’s Home Page. (2021). http://latanyasweeney.org/

Gauthier, J. H. S. (2021). 1790 Overview – History – U.S. Census Bureau. United States Census Bureau. https://www.census.gov/history/www/through_the_decades/overview/1790.html

H.R.1770 – 115th Congress (2017–2018): OPEN Government Data Act. Congress.Gov | Library of Congress. https://www.congress.gov/bill/115th-congress/house-bill/1770

Igo, S. E. (2020). The Known Citizen: A History of Privacy in Modern America. Harvard University Press.

Mark MacCarthy, (2017). “Privacy Policy and Contextual Harm” 13 I/S: Journal of Law and Policyhttps://papers.ssrn.com/sol3/papers.cfm?abstract_id=3093253

Nissenbaum, Helen (2007). Privacy in Context. Stanford University Press. Kindle Edition

44 USC 3561: Definitions. Office of the Law Revision Counsel (2022).https://uscode.house.gov/view.xhtml?req=(title:44%20section:3561%20edition:prelim)%20OR%20(granuleid:USC-prelim-title44-section3561)&f=treesort&edition=prelim&num=0&jumpTo=true

The National Constitution Center (2022). The Constitution – Full Text. https://constitutioncenter.org/interactive-constitution/full-text

Paul Ohm. (n.d.). PaulOhm.Com. https://www.paulohm.com/

Puckett, C. (2009, July 1). The Story of the Social Security Number. Social Security Administration Research, Statistics, and Policy Analysis. https://www.ssa.gov/policy/docs/ssb/v69n2/v69n2p55.html

U. S. Senator Patty Murray (2017, November 1). Senator Murray, Speaker Ryan Introduce Evidence-Based Policymaking Legislation. https://www.murray.senate.gov/senator-murray-speaker-ryan-introduce-evidence-based-policymaking-legislation/

Legislative Bulletin (2019). The President Signs H.R. 4174, “Foundations for Evidence-Based Policymaking Act of 2018.” Social Security Administration

https://www.ssa.gov/legislation/legis_bulletin_021519.html

Meyer, M. (2018, October 31). Law, Ethics & Science of Re-identification Demonstrations. Harvard Law Petrie Flom Center. https://blog.petrieflom.law.harvard.edu/symposia/law-ethics-science-of-re-identification-demonstrations/

Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization” (August 13, 2009). UCLA Law Review, Vol. 57, p. 1701, 2010. Available at SSRN: https://ssrn.com/abstract=1450006

Posner, Richard. “The Right of Privacy.” Georgia Law Review 393 (1978) pp. 393 – 404

Privacy Act of 1974. (2021, April 30). Department of Justice. https://www.justice.gov/opcl/privacy-act-1974

Privacy and Confidentiality. (n.d.). CHOP Research Institute. https://irb.research.chop.edu/privacy-and-confidentiality

“Privacy.” Merriam-Webster.com Dictionary, Merriam-Webster, https://www.merriam-webster.com/dictionary/privacy. Accessed 3 May. 2022.

Public Law No: 115–435. Foundations for Evidence-Based Policymaking Act of 2018 Congress.gov. (2019). https://www.congress.gov/bill/115th-congress/house-bill/4174

Recital 26: Not Applicable to Anonymous Data. General Data Protection Regulation. (2016).

Ruyle, M. (2019, March 1). New Law Offers Reforms to Improve Access to Data, Confidentiality Protections | Amstat News. Magazine of the American Statistical Association. https://magazine.amstat.org/blog/2019/02/01/law-improves-data-confidentiality/

Statistical Programs & Standards. (2021, December 22).The White House.

https://www.whitehouse.gov/omb/information-regulatory-affairs/statistical-programs-standards/

The Structure of the Federal Statistical System. (n.d.). The White House. https://obamawhitehouse.archives.gov/omb/inforeg_statpolicy/bb-structure-federal-statistical-system

Understanding Confidentiality and Anonymity. (n.d.). The Evergreen State College. https://www.evergreen.edu/humansubjectsreview/confidentiality

U. S. Department of Health, Education and Welfare. (1973, July). Records, Computers and the Rights of Citizens. DHEW Publication. https://www.justice.gov/opcl/docs/rec-com-rights.pdf

Weir, M (2001). Welfare State. International Encyclopedia of the Social & Behavioral Sciences

https://doi.org/10.1016/B0-08-043076-7/01094-9


[1] The General Services Administration proposed a similar idea in the 1970s to create an inter-connected network of federal government data systems, which did not succeed either.

[2] They have a very user-friendly website operated by the Department of Justice at https://www.foia.gov/

[3] OIRA is also in charge of the cost-benefit analysis laid out in the President’s Executive Order 12866

[4] It is important to note that the Privacy Act of 1974 imposed a fine of no more than $5,000, which in today's money equals around $30,000: “Any member, officer, or employee of the Commission… who knowing that disclosure of the specific material is so prohibited, willfully discloses the material in any manner to any person or agency not entitled to receive it, shall be guilty of a misdemeanor and fined not more than $5,000.”

Podcast | Data & Truth with danah boyd

The topic of this episode is data and truth. There is a popular saying that we live in a data-driven world. But where is data driving us? According to some estimates, the amount of data generated over the next three years will exceed the amount created over the past thirty. We have immersed ourselves in zettabytes of data to minimize uncertainty, make sense of the world around us, and validate every step we take. But how reliable is all this data, and can it really help us find the truth? In this episode we look for answers to this and other questions with the prominent scholar danah boyd, whose research examines the intersection between technology and society. She is a partner researcher at Microsoft, the founder of the well-known non-profit research institute Data & Society, and a Distinguished Visiting Professor at Georgetown University, where she taught a graduate course on Data and Politics of Evidence.

A Critical Review of UNEP’s Food Waste Index

Its Impact and Limitations on Sustainable Consumption Policies

I. Introduction

Sustainable consumption is one of the priority areas in the international development agenda. In 2015, 193 UN member states undersigned the 2030 Agenda for Sustainable Development, which consists of seventeen interlinked Sustainable Development Goals. It is a comprehensive development framework that also focuses on “responsible consumption and production.” However, it is a strategic-level document that did not take into account the operational-level challenges of developing indicators to measure progress towards these goals. In 2021, the United Nations Environment Programme published its first Food Waste Index (FWI) report, which is presented as the most comprehensive report on global food waste and made many news headlines.[1][2] UNEP has done an enormous job building the groundwork for producing global data on food waste, but the organization attributes a low or very low confidence level to nearly 80% of the data used to construct the FWI. Given this context, the FWI is not a reliable benchmark for measuring progress or informing adequate policy decisions.

II. Background

In September 2015, at the landmark UN Sustainable Development Summit in New York, countries worldwide agreed on a post-2015 global development agenda “to achieve a better and more sustainable future for all people and the world by 2030.”[3] They agreed on 17 Sustainable Development Goals, which are broken down into 169 SDG Targets, which in turn have 232 unique indicators (as of February 2022) to track progress.[4] In particular, SDG 12 focuses on “responsible consumption and production,” which is about “decoupling economic growth from environmental degradation, increasing resource efficiency and promoting sustainable lifestyles.”[5] There are eight targets under SDG 12, which mainly focus on national policies and large-scale producers, but two of them concern consumer behavior and thus fall within the scope of this research: Target 12.3, reduce food losses along production and supply chains and halve global per capita food waste at the retail and consumer levels;[6] and Target 12.8, promote universal understanding of sustainable lifestyles.

SDG Target 12.3 has two indicators: the Food Loss Index, produced by the Food and Agriculture Organization of the UN, and the Food Waste Index, produced by the UN Environment Programme (UNEP). The Food Loss Index (FLI) measures the percentage of food lost from production up to (but not including) the retail level. The Food Waste Index (FWI) focuses on the percentage of food wasted at the retail and consumption stages. Since the focus of this paper is on sustainable consumption, I will take a closer look at the Food Waste Index, analyze the data behind it, and assess its impact.

After carefully examining the datasets used for the Food Waste Index, I concluded that the existing data are not reliable enough for measuring progress towards SDG Target 12.3 or for advancing tailored policy interventions. However, these conclusions should not undermine the importance of the food waste issue, since every data point, study, and observation demonstrates that there is a significant food waste problem in both economically developed and less developed countries. It is a major concern, as hundreds of millions of people around the world suffer from malnutrition because their caloric intake falls below minimum energy requirements.[7] That is also the reason why we need to understand the limitations of currently available data.

III. Data Analysis

UNEP worked with the Waste and Resources Action Programme (WRAP), a non-profit organization based in the United Kingdom, to produce its first Food Waste Index in 2021, which is considered the “most comprehensive report into global food waste in homes.”[8] The report was published in 2021, but the numbers represent the situation in 2019. According to the report, 17% of all food that reaches the retail level ends up wasted. Of that amount, households account for 61% of food waste, the food service industry (restaurants) for 26%, and retail for 13%.[9]

These are staggering numbers. To put them in perspective, they mean that roughly 931 million tonnes of food is wasted every year, which is more than the total consumption in a country as big as India. If we combine the Food Waste Index with the Food Loss Index, more than a third of all food is either lost or wasted somewhere along the chain, which also accounts for nearly 10% of global carbon emissions. But what if we scratch the surface and look behind the report at the raw data[10] that shaped it? How reliable are the food waste numbers?
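As a rough back-of-the-envelope reconstruction of the “more than a third” figure, the sketch below adds the FAO's separate estimate that roughly 14% of food is lost between harvest and retail (a figure brought in here for illustration, not cited in this paper) to the 17% waste share above. This is a simplification, since the two shares are computed on somewhat different bases.

```python
# Back-of-the-envelope: combining the loss and waste shares.
food_lost_pre_retail = 0.14   # FAO estimate of food lost between harvest and retail (assumption for illustration)
food_wasted = 0.17            # UNEP estimate of food wasted at retail and consumer levels

combined = food_lost_pre_retail + food_wasted
print(f"{combined:.0%}")      # -> 31%, i.e. roughly a third of all food
```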

The authors of the report acknowledge that it is very challenging to collect data on food waste and admit that they have high-quality data from only 14 countries,[11] while they have medium confidence in reports from 42 countries. The dataset behind the report lists 233 geographic units (mainly UN Member States) and assigns no estimate, very low confidence, or low confidence to the data estimates for 183 of them, or 79%.[12] The pie chart below presents a visual breakdown of the data source confidence levels:[13]
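That 79% share follows directly from the two counts quoted above; a minimal check, using only the figures reported in the annex cited in note [12]:

```python
# Share of geographic units whose food waste estimate is missing or of low/very low confidence.
total_units = 233        # geographic units listed in the FWI annex
weak_or_missing = 183    # no estimate + very low confidence + low confidence

print(f"{weak_or_missing / total_units:.1%}")  # -> 78.5%, i.e. roughly 79%
```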

Evidently, there is not much confidence in the credibility of the reported figures. The authors of the report also explain that, overall, they were able to collect 152 data points from 54 countries and then extrapolated that data to calculate estimates for other geographic areas where data was not available. However, even the credibility of the available data points can be questioned. For example, Poland is assigned a medium confidence level, even though the data source for Poland is a small study by local civil society actors. “The Pilot Study of Characteristics of Household Waste Generated in Suburban Parts of Rural Areas” (Steinhoff-Wrześniewska, Aleksandra) reports that:

21 households, representing 83 people, were audited. None of them were involved in agricultural production. They were provided with three bags for sorting (bio-waste, hygienic waste, all other waste) and had their waste collected in each of the four seasons. It is unclear how long the measurement lasted during each season. As a result of the small sample size and the unknown measurement length, we cannot have high confidence in the estimate.

The population of Poland is 38 million, of which only 15 million live in rural areas, while 61% reside in urban centers. A sample of only 21 households from suburban parts of rural Poland, observed over undefined periods of time, is not strongly representative of food management habits across the whole country.

The question is whether these numbers can serve as reliable metrics to measure progress or calibrate policy actions. SDG Target 12.3 aims to halve global per capita food waste by 2030. According to UNEP's 2021 Index, average annual household food waste per capita is 79 kg in high-income countries, 76 kg in upper-middle-income countries, and 91 kg in lower-middle-income countries, while the data for low-income countries is insufficient. The 2021 Food Waste Index Report mentions that “The next questionnaire will be sent to Member States in September 2022, and results will be reported to the SDG Global Database by February 2023.” What if the next report shows annual household food waste of 86 kg per capita in upper-middle-income countries? It would lead to the conclusion that food waste in this category of countries is increasing, while in fact the number could have been decreasing. American biochemist Erwin Chargaff once said: “I thought it was the task of the natural sciences to discover the facts of nature, not to create them.” Relying on inaccurate data for measuring progress could set in motion mismatched policy interventions and do more harm than good.
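To make the point concrete, consider a hypothetical sketch in which each point estimate carries a wide margin of error; the ±20% band below is an assumption chosen purely for illustration, not a figure from the report.

```python
# Hypothetical illustration: when uncertainty bands are wide, two point estimates
# cannot tell us whether food waste actually went up or down.
baseline_kg = 76      # 2019 estimate for upper-middle-income countries (from the report)
follow_up_kg = 86     # hypothetical figure from a future report
margin = 0.20         # assumed +/-20% relative uncertainty, for illustration only

baseline_range = (baseline_kg * (1 - margin), baseline_kg * (1 + margin))      # (60.8, 91.2)
follow_up_range = (follow_up_kg * (1 - margin), follow_up_kg * (1 + margin))   # (68.8, 103.2)

# The intervals overlap, so the apparent increase is not distinguishable from noise.
print(baseline_range[1] >= follow_up_range[0])  # True
```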

IV. Theoretical Framework

There are no easy shortcuts to producing global data such as the Food Waste Index. It requires the formation of a specific global knowledge infrastructure focused on food waste. That entails standardizing measurements and processes, disciplining staff, and synchronizing reporting timelines. Achieving this subject-specific institutional interoperability on a global scale requires significant amounts of money and resources. So, I explain the current shortcomings of the Food Waste Index by looking at the global knowledge infrastructure behind it, and I rely mainly on two scholarly works for theoretical grounding: A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming, by Paul Edwards, and Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life, by Martha Lampland and Susan Leigh Star.

The Food Waste Index is not a legitimate scientific fact because there is no well-founded knowledge infrastructure behind it. In his book A Vast Machine, Paul Edwards writes that “an established fact is one supported by an infrastructure,”[14] and elaborates that “knowledge infrastructures comprise robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds.”[15] If we take away the infrastructure, we are left with claims that can be neither backed up nor verified.

In the modern world, infrastructures are all around us and we use them on a daily basis without paying much attention, unless there is a problem with them or we have to change them.[16] For example, behind the tap water we use there is a complex infrastructure of plumbing and water regulation. In a similar fashion, global data requires an elaborate knowledge infrastructure consisting of national communities of scientists, government bureaucrats, and civil society activists who understand each other and can inform and hold each other accountable. These communities need physical facilities, such as offices and laboratories, as well as legal space to conduct their work with respect to intellectual property.[17] They require mediums of communication, such as conferences, journals, and web portals, to exchange knowledge and stay up to date.

Most importantly, however, for these national information ecosystems to reach beyond their borders and co-produce global data, they need standardized methods and measures. The amount of reported food waste can change depending on how countries define food waste, when they measure it, and what factors they take into account. For example, according to UNEP, “food waste is defined as edible parts and associated inedible parts going directly to the following destinations: landfill, controlled combustion, litter discards/refuse, compost/aerobic digestion, land application, co/anaerobic digestion, sewer, but does not include food waste used for biomaterial/processing, animal feed or not harvested.”[18] In some countries, the associated inedible parts of food used for compost might not be considered food waste. A more accurate report should also take into account seasonal fluctuations in food waste.

V. UNEP’s Food Waste Index

Bottom line up front, there is no global knowledge infrastructure around food waste and UNEP did not have the resources to build it up in the given time frame. UNEP has been working on food waste reduction since 2013, when it launched the global campaign Think Eat Save, but it became a priority task for UNEP only in 2019, following the UN Environment Assembly Resolution 4/2, which mandated UNEP to accelerate global action on food waste reduction.[19]  

Established in 1972 and headquartered in Nairobi, Kenya, UNEP has around 860 staff members worldwide.[20] The mission of UNEP, which is celebrating its 50th anniversary this year, “is to provide leadership and encourage partnership in caring for the environment by inspiring, informing, and enabling nations and peoples to improve their quality of life without compromising that of future generations.”[21] By default, the top priority for UNEP has been to lead the international efforts against climate change.

In 2013, UNEP, in partnership with the Food and Agriculture Organization of the UN (FAO), launched the Save Food Initiative and its subcomponent program “Think Eat Save: Reduce Your Footprint.” The primary goal of the FAO, established in 1945, is to “achieve food security [so] that people have regular access to enough high-quality food to lead active, healthy lives.”[22] In 2011, FAO had released its estimate that nearly one third of the world's food was lost or wasted every year, which led to its joint Save Food Initiative with UNEP two years later.

So, until recently, food waste data was tangled with research into food loss and fell under the prerogative of FAO. The inherent structure of the UN system and its scheme for resource distribution incentivize UN agencies to compete for more responsibilities and programmatic oversight. In a 2019 survey by the UN Office of Internal Oversight Services, 80% of UNEP staff “noted that there was critical competition for donor sources with other UN entities.”[23] This institutional contest between FAO and UNEP could potentially explain why, between 2015 and 2019, no organization was assigned as custodian of the Food Waste Index.

The first time food waste appeared in UNEP's programme of work and budget was in the 2018-2019 biennium, approved by the UN Environment Assembly (UNEA) in May 2016.[24] It includes planned work outputs such as “Within sustainable food and agriculture policy frameworks, urban planning and/or existing sustainable consumption strategies, technical and policy guidance provided to public and private actors to measure, prevent and reduce food waste and increase the uptake of sustainable diet strategies and activities,” as well as “Outreach and communication campaigns to raise awareness of citizens (particularly young people) on the benefits of shifting to more sustainable consumption and production practices.” The previous work plan, for 2016-2017, proposed in 2014, had no mention of food waste.[25]

In May 2016, UNEA also adopted a resolution on the “Prevention, reduction and reuse of food waste,” which requests the UNEP Executive Director, in cooperation with the Food and Agriculture Organization, to “continue to raise awareness of the environmental dimensions of the problem of food waste, and of potential solutions and good practices for preventing and reducing food waste and promoting food reuse and environmentally sound management of food waste.”[26] However, UNEP became the custodian of the Food Waste Index only in 2019, when it solidified itself as the lead agency on tackling food waste pursuant to UNEA Resolution 4/2.[27]

In 2019, UNEP received a new Executive Director, Inger Andersen, a competent professional who is well versed in both sustainable development and food security issues. She has more than 30 years of experience in international development organizations, including roles as World Bank Vice President for Sustainable Development and Head of the CGIAR Fund Council.[28] CGIAR is the Consortium of International Agricultural Research Centers, which brings together international organizations engaged in food security research. Her predecessor came from a diplomatic background and was asked to resign as a result of an internal audit. Media reports, citing leaks from the internal audit documents, mentioned that the head of UNEP had spent “$500,000 on air travel and hotels in just 22 months, and was away 80% of the time.”[29] Positive changes have taken place in the organization under the new leadership, and the Food Waste Index became one of UNEP's top priorities.

When UNEP was first assigned as a custodian in 2019, Food Waste Index was still classified as a Tier 3 indicator by the UN’s Inter-agency and Expert Group on SDG Indicators (IAEG-SDGs). The UN breaks down all SDG indicators into 3 Tiers:

Tier 1: Indicator is conceptually clear, has an internationally established methodology and standards are available, and data are regularly produced by countries for at least 50 per cent of countries and of the population in every region where the indicator is relevant.

Tier 2: Indicator is conceptually clear, has an internationally established methodology and standards are available, but data are not regularly produced by countries.

Tier 3: No internationally established methodology or standards are yet available for the indicator, but methodology/standards are being (or will be) developed or tested.

Tier classifications change over time as the quality of data for indicators improves. For example, as of February 2022, the IAEG-SDGs lists 136 Tier I indicators, 91 Tier II indicators, and 4 indicators that have multiple tiers (different components of the indicator are classified into different tiers),[30] while in September 2016 there were 81 Tier I indicators, 57 Tier II indicators, and 88 Tier III indicators.[31] According to the IAEG reports, the Food Waste Index was upgraded from Tier III to Tier II within two years.

The UN Environment Programme's work plan for 2020-2021 has seven subprograms, and collecting data for the Food Waste Index falls under Subprogram 6, which covers Resource Efficiency. In 2020-2021, UNEP allocated $95.6 million to Subprogram 6, or roughly $48 million per annum, and had 114 staff members working towards the 20 planned work outputs under the Resource Efficiency subprogram.

These work outputs were mainly geared towards developing the information infrastructure for delivering the SDG indicators. For example: “Resource use assessments and related policy options are developed and provided to countries to support planning and policy-making, including support for the application and monitoring of relevant SDG indicators.” Or: “Database services providing enhanced availability and accessibility of life cycle assessment data are provided through an interoperable global network, methods for environmental and social indicators and the ways to apply them in decision-making.”[32] Most of these programmatic activities concern capacity development, technical assistance, training, policy support, and the like.

As a result of UNEP's active engagement, the number of countries that follow a common global measurement approach for consistent reporting under SDG 12.3 increases every year. On average, UNEP adds around 10 countries a year to its list of countries ready for consistent food waste reporting. This shows that UNEP is on the right track in building the knowledge infrastructure for a more reliable global Food Waste Index.

UNEP's methodology for data collection is to send the Questionnaire on Environment Statistics (Waste Section) to National Statistical Offices and Ministries of Environment. If the respective authorities do not respond, UNEP refers to alternative sources of information. However, we should be clear-eyed that the national executive agencies that collaborate with UNEP are not politically neutral entities, and their responses to questionnaires can be subject to the political interests of their respective governments.[33] These agencies might have the capacity to produce reliable numbers, but not the intention. For this reason, it would benefit the credibility of the Food Waste Index if UNEP increased its engagement with civil society organizations that can serve as alternative sources of reporting on food waste.

VI. Conclusion

The 2021 Food Waste Index Report does not just provide us with numbers about food waste; it also informs us about the state of the knowledge infrastructure around food waste. The formation of a knowledge infrastructure is a lengthy and complicated process. The institutional resources of the UN system, its global reach, and modern technologies have enabled UNEP to make tremendous progress towards building this infrastructure within a very short period of time. However, it is still unclear when UNEP will be able to produce reliable global data on food waste. UNEP can draw many valuable lessons from its 2021 report, but the report should not be used as a benchmark for progress, since that could lead to many misplaced conclusions down the road.

Looking into the future, the importance of sustainable consumption will only increase. Over the course of the past century, humanity experienced unprecedented growth in global wealth and food production. Surging food production rates create enormous pressure on the environment, even though hundreds of millions of people are still not getting their fair share. One of the big reasons for this failure is the food waste problem. Unfortunately, until recently the food waste issue was largely neglected, and calculating exactly how much food is wasted has remained an elusive target. If UNEP stays consistent with its action plan, the global Food Waste Index will become increasingly reliable as more and more countries plug into the global knowledge infrastructure on food waste. However, there is a lot of work ahead. In the meantime, I would like to reiterate the call in UNEP Executive Director Inger Andersen's opening message in the 2021 Food Waste Index Report: “let us all shop carefully, cook creatively and make wasting food anywhere socially unacceptable.”


[1] “U.N. Report Says 17% of Food Wasted at Consumer Level.” Reuters, 4 Mar. 2021.

[2] Merchant, Natalie. “Global Food Waste Twice the Size of Previous Estimates.” World Economic Forum, 26 Mar. 2021.

[3] Sustainable Development. (2022). UN Department of Economic and Social Affairs. https://sdgs.un.org/

[4] “Measuring Progress towards the Sustainable Development Goals.” Our World in Data, SDG Tracker, sdg-tracker.org. Accessed 5 Mar. 2022.

[5] Sustainable consumption and production policies. (2022). UNEP – UN Environment Programme.

[6] UNEP Food Waste Index Report 2021. (2021). UNEP – UN Environment Programme. https://www.unep.org/resources/report/unep-food-waste-index-report-2021

[7] Roser, M. (2019, October 8). Hunger and Undernourishment. Our World in Data. https://ourworldindata.org/hunger-and-undernourishment

[8] “New UNEP Report Developed in Collaboration with WRAP Reveals True Scale of Global Food Waste.” The Waste and Resources Action Programme, 2021, wrap.org.uk/FoodWasteIndex.

[9] UNEP Food Waste Index Report 2021. (2021). UNEP – UN Environment Programme.

[10] SDG Indicators Database. (2021). UN Department of Economic and Social Affairs. https://unstats.un.org/sdgs/UNSDG/IndDatabasePage

[11] According to the UNEP Food Waste Index Report 2021, countries with high-quality data on food waste are Australia, Austria, Canada, China, Denmark, Estonia, Germany, Ghana, Italy, Malta, the Netherlands, New Zealand, Norway, the Kingdom of Saudi Arabia, Sweden, the United Kingdom and the United States.

[12] “Food Waste Index Level 1 Annex.” UNEP- UN Environment Program, 2021, wedocs.unep.org/bitstream/handle/20.500.11822/35355/FWD.xlsx.

[13] Ibid

[14] Edwards, P. N. (2013). A Vast Machine, p. 22

[15] Edwards, P. N. (2013). A Vast Machine, p. 17

[16] Lampland, Martha, and Susan Leigh Star. Standards and Their Stories.

[17] Ibid

[18] UNEP Food Waste Index Report 2021. (2021), p. 14

[19] “Promoting Sustainable Practices and Innovative Solutions for Curbing Food Loss and Waste.” United Nations Environment Assembly, UNEP – UN Environment Programme, Mar. 2019, wedocs.unep.org/bitstream/handle/20.500.11822/28499/English.pdf.

[20] UNEP | International Organizations. (2005). IGPN – International Green Purchasing Network. http://www.igpn.org/global/interorg/unep.html

[21] “About UN Environment Programme.” UNEP – UN Environment Programme, http://www.unep.org/about-un-environment. Accessed 5 Mar. 2022.

[22] “About FAO.” Food and Agriculture Organization of the United Nations, http://www.fao.org/about/en. Accessed 5 Mar. 2022.

[23] Ivanova, Maria (Feb 23, 2021). The Untold Story of the World’s Leading Environmental Institution: UNEP at Fifty, p. 62

[24] “Programme of Work and Budget for the Biennium 2018‒2019.” United Nations Environment Assembly, UNEP – UN Environment Program, May 2016

[25] “Proposed Biennial Programme of Work and Budget for 2016–2017.” United Nations Environment Assembly, UNEP – UN Environment Programme, June 2014

[26] “Prevention, Reduction and Reuse of Food Waste.” United Nations Environment Assembly, UNEP – UN Environment Program, May 2016.

[27] “Promoting Sustainable Practices and Innovative Solutions for Curbing Food Loss and Waste.” United Nations Environment Assembly, UNEP – UN Environment Programme, Mar. 2019.

[28] Inger Andersen. (2019). UNEP – UN Environment Program

[29] Carrington, D. (2018, November 20). UN environment chief resigns after frequent flying revelations. The Guardian.

[30] “Tier Classification for Global SDG Indicators.” UN Statistics Division, Feb. 2019,

[31] “Tier Classification for Global SDG Indicators.” UN Statistics Division, Sept. 2016,

[32] “Proposed Programme of Work and Budget for the Biennium 2020‒2021.” UN Environment Assembly, p. 98

[33] In her book “Shades of Citizenship,” Melissa Nobles presents a very illuminating discussion about the impact of the political interests of the data collecting agencies on the data they produce

On Facial Recognition Technology

Why the US Needs a Federal Law on Facial Recognition Technology

Originally published on Intersect: The Stanford Journal of Science, Technology, and Society

Introduction

Since the beginning of the 2000s, Facial Recognition Technology (FRT) has become significantly more accurate and more accessible. Both government and commercial entities use it in increasingly innovative ways. News agencies use it to spot celebrities at big events. Car companies install it on dashboards to alert drivers falling asleep at the wheel. Governments have used it to track Covid-19 patients' compliance with quarantine regimes, or to reunite missing children with their families.[1] However, as the use of the technology has become more widespread, the controversies around it have also grown. The technology offers tremendous opportunities, but there are reasons to be concerned about its impact on privacy and civil liberties if it is not used properly. In this paper, I give a brief introduction to facial recognition technology, look separately at commercial and government applications, and present my argument for why the US needs federal legislation on FRT.

1. The nuts and bolts of FRT

Facial recognition relies on biometric data. The software pinpoints facial landmarks, measures the distances between them, and creates a geometric representation of your face.[2] It is less accurate than other biometric identifiers, such as iris or fingerprint scanning, for two reasons. One, facial images are not always of high quality. Two, unlike other biometric identifiers, facial features can change over time due to aging, plastic surgery, cosmetics, the effects of drug abuse or smoking, and so on.[3] However, FRT has become far more popular because it can be used remotely and is much easier to apply in high-traffic places.
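To make the landmark-geometry idea concrete, here is a minimal sketch in Python. It assumes the landmark coordinates have already been located by some face detector; the points below are made up purely for illustration.

```python
import numpy as np

# Hypothetical (x, y) coordinates of a few facial landmarks, as a detector might return them.
landmarks = np.array([
    [30.0, 40.0],   # left eye
    [70.0, 40.0],   # right eye
    [50.0, 60.0],   # nose tip
    [35.0, 80.0],   # left mouth corner
    [65.0, 80.0],   # right mouth corner
])

# Pairwise distances between landmarks give a simple geometric "signature" of the face.
diffs = landmarks[:, None, :] - landmarks[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=-1))

# Keep each pair once (upper triangle) as a flat feature vector that could be compared across images.
i, j = np.triu_indices(len(landmarks), k=1)
signature = distances[i, j]
print(signature.round(1))
```

Modern systems typically replace hand-crafted distances with embeddings learned by neural networks, but the underlying idea of reducing a face to a comparable numeric template is the same.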

Today, facial recognition is used mainly for two purposes. The first is face verification, also known as “one-to-one” matching. It is used to confirm that you are who you say you are and is commonly applied to unlock a smartphone or replace ID checks.[4] The second is face identification, also known as “one-to-many” matching. It is usually used to search for persons of interest: you start the search with an image of a person you do not know in order to determine their identity.[5]
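The difference between the two modes can be sketched with a toy matcher over numeric face templates. Everything here is a simplification: the template vectors, the names, and the 0.9 similarity threshold are assumptions made up for illustration.

```python
import numpy as np

def similarity(a, b):
    # Cosine similarity between two face templates (vectors).
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

THRESHOLD = 0.9  # arbitrary acceptance threshold, for illustration only

def verify(probe, enrolled):
    # One-to-one: does the probe image match the single enrolled template?
    return similarity(probe, enrolled) >= THRESHOLD

def identify(probe, gallery):
    # One-to-many: which gallery entry is the closest match, if any clears the threshold?
    scores = {name: similarity(probe, template) for name, template in gallery.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= THRESHOLD else None

gallery = {"person_a": [0.9, 0.1, 0.3], "person_b": [0.2, 0.8, 0.5]}  # hypothetical templates
probe = [0.88, 0.15, 0.28]

print(verify(probe, gallery["person_a"]))  # True  (one-to-one check)
print(identify(probe, gallery))            # person_a  (one-to-many search)
```

The key practical difference is that verification compares against a single known template, while identification searches an open-ended gallery, which is why the latter raises much harder questions about database size and consent.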

Another category is facial analysis, where the algorithm analyzes facial features to determine “age, gender, ethnicity, emotions, fitness for certain jobs.”[6] For example, McDonald's has used facial analysis in its Japanese stores to check whether employees are smiling when assisting customers.[7] Walmart is working on a facial analysis system to gauge shoppers' moods while they are in a store.[8] There have been numerous reports that China is using facial analysis to track ethnic Uighurs, a largely Muslim group in the western province of Xinjiang. Reportedly, the technology can distinguish “Uighur/non-Uighur attributes” and allows the Chinese police to track the movements of the minority group.[9] While reports of the Chinese government's crackdown on Uighurs have been confirmed, the credibility of software distinguishing Uighurs purely on facial features is questionable.[10][11]

These news stories give us a good idea of how FRT could evolve, but at this point in time facial analysis software is mainly in the research and trial phase, so this paper will keep its focus on facial recognition. The truth is that even facial recognition technology is prone to mistakes. On several occasions, police have arrested the wrong person because of a mistake by the FRT. In June 2020, the Detroit Police Chief said that the software they use misidentifies people 96% of the time, so they use it only to narrow down their search sample.[12] In 2018, the American Civil Liberties Union tested Amazon's facial recognition software by comparing images of members of Congress against a database of 25,000 mugshots.[13] Amazon's “Rekognition” software falsely identified 28 members of Congress as criminals. (Amazon's software is available for public use and cost the ACLU only $12.33.)

FRT is more likely to make a mistake with women and people with darker skin tones than with white men. In the ACLU test, 40% of the false matches were African Americans, even though they make up only 20% of Congress. A 2018 MIT study of gender and skin-type bias in commercial artificial-intelligence systems found a 34.7% error rate for dark-skinned women and only 0.8% for light-skinned men.[14] There are two likely explanations for this bias: first, darker skin does not reflect light as well as fair skin tones; second, minorities are underrepresented in the image datasets used to train the systems.
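Audits like the ACLU and MIT studies essentially disaggregate error rates by demographic group. A minimal sketch of that bookkeeping, with entirely made-up results, might look like this:

```python
from collections import defaultdict

# Hypothetical audit records: (demographic group, whether the system returned a false match).
results = [
    ("group_a", True), ("group_a", False), ("group_a", False), ("group_a", False),
    ("group_b", True), ("group_b", True), ("group_b", False), ("group_b", False),
]

counts = defaultdict(lambda: {"false_matches": 0, "total": 0})
for group, is_false_match in results:
    counts[group]["total"] += 1
    counts[group]["false_matches"] += int(is_false_match)

# A large gap between the per-group rates is the signature of the bias described above.
for group, c in counts.items():
    rate = c["false_matches"] / c["total"]
    print(f"{group}: false match rate {rate:.0%} ({c['false_matches']}/{c['total']})")
```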

However, this is a changing pattern and every year the FRT is getting better at recognizing people of all skin tones. A big reason for this is that both the quantity and the quality of the facial images are going up. According to the National Institute of Standards and Technology (NIST) under the US Department of Commerce, the best face identification algorithm in 2014 had an error rate of 4.1%, while by 2020 the leading algorithm had an error rate of less than 1%.[15]

2. Commercial use of FRT

The market for FRT emerged only around 2001, but it has been growing dynamically ever since. According to various estimates, it is expected to reach somewhere between 7 and 10 billion USD in 2022. More and more organizations are using FRT to replace ID checks. Schools use it to track attendance and/or keep away unwanted people. It is widely used to group and catalog images and video files. We have already mentioned some of the other innovative ways in which FRT can be used. However, it is important to note that not all uses of the technology have equal social impact, and the US Congress needs to take action and set legal boundaries for commercial use of FRT.

Going back to the two sub-categories of facial recognition discussed earlier, the main issue in the commercial use of FRT concerns face identification. In the case of face verification, or one-to-one searches, there is a set limit to the database, and everyone involved is usually aware that they are part of a facial verification system. The facial features of additional people are usually processed to train the algorithm, but that is less problematic since those images are anonymized. In the case of face identification, or one-to-many searches, there is no set limit to the databank, and many people are not aware that their information is in a given database. This raises questions about consent.

Can companies use the images we share on public platforms online to build their databases without asking for permission? On November 2, 2021, Facebook announced that it was shutting down its facial recognition system and deleting “more than a billion people's individual facial recognition templates.”[16] That is why we no longer see little squares around faces when we scroll over Facebook photos. The decision came six months after Facebook had to pay $650 million for violating the Illinois Biometric Information Privacy Act (BIPA), which bans the collection and storage of Illinois residents' facial geometry without consent.[17] Facebook made an elaborate argument that it inflicted no harm on its users, but it still lost the case, since BIPA clearly states that processing the biometric data of Illinois residents without opt-in consent is illegal.[18]

Most big tech companies in the US have or have had their own facial recognition software, but following the controversies over racial bias, they have restricted their investments in FRT. Within a single week in June 2020, IBM announced that it was getting out of the facial recognition business altogether, while Microsoft and Amazon declared a moratorium on selling their facial recognition technology to law enforcement agencies. However, these tech giants are not the biggest players in the facial recognition market. Table 1 lists 10 of the biggest companies in the FRT market.

Table 1: Some of the biggest companies in the FRT market

Company       | Country     | Founded | Web info
Ayonix        | Japan       | 2007    | https://ayonix.com
Clearview AI  | USA         | 2017    | https://www.clearview.ai/
Clear Secure  | USA         | 2010    | https://www.clearme.com
Cognitec      | Germany     | 2002    | https://www.cognitec.com/
iOmniscient   | Australia   | 2001    | https://iomni.ai
Kairos        | USA         | 2012    | https://www.kairos.com
Megvii        | China       | 2011    | https://en.megvii.com
NVISO         | Switzerland | 2009    | https://www.nviso.ai/en
Oosto*        | Israel      | 2015    | https://oosto.com
SenseTime     | China       | 2014    | https://www.sensetime.com/en

* Formerly AnyVision

In January 2020, a New York Times investigation revealed that Clearview AI, a New York-based company, had built a database of 3 billion images scraped from the internet and was selling its software to 600 law enforcement agencies.[19] A month later, BuzzFeed did a follow-up investigation and found that Clearview “had provided its facial recognition tool to more than 2,200 police departments, government agencies, and companies across 27 countries.”[20] The company is now facing lawsuits in at least eight countries: the United States, Canada, Australia, Germany, the United Kingdom, France, Italy, and Greece.[21] In November 2021, the UK government imposed a $23 million fine on Clearview for violating its national data privacy law. Twitter, Google, and Facebook have also sent cease-and-desist letters requesting that it stop using the public information of their users.[22]

When sued under BIPA, Clearview responded that it would delete the data of all Illinois residents. Currently, on its website, Clearview offers an opt-out form for residents of Illinois and California, which also has legislation similar to BIPA.[23] The United States Congress should pass a federal law, similar to BIPA or California's Consumer Privacy Act, that introduces clearly defined limits on the commercial use of FRT. However, considering that, on the other side of the debate, this technology adds value to the efforts of security agencies, the federal legislation should not be overly restrictive. An opt-out consent model might be a reasonable solution.

We should also consider that, with every passing day, it is becoming easier to build a photo-matching search engine like Clearview. Two weeks after the attack on the US Capitol on January 6, 2021, a website named Faces of the Riot appeared online; it catalogued the faces of 6,000 individuals who were present during the incident, extracted from 827 videos posted on the social media platform Parler. The author of the website, who self-identified as a student in the Washington, DC area, told journalists that he intended to help the police investigation and that he used only open-source software.[24] Thus, a heavily restricted legal environment might not achieve the intended purpose, but could instead create a lucrative black market for FRT. A federal law on the commercial use of FRT should define feasible legal boundaries and find the right balance between the right to privacy and public security efforts.

3. Government use

The number of governments using facial recognition is growing every year. They use it mainly for security and traffic control purposes. However, facial recognition technology and the artificial intelligence behind it are very powerful tools that can be used in many ways that are not always in the public interest. Federal legislative bodies need to intervene to establish standards, impose responsibilities, and delineate restrictions for the public use of FRT.

If facial recognition becomes overly pervasive, then regardless of intent, it could lead to constraints on public freedom. It is important for governments to evaluate the potential impact of facial recognition on civil liberties and to establish ethical principles and regulatory guidelines before expanding the use of FRT. A privacy impact assessment by the International Justice and Public Safety Network, which is composed mainly of seasoned law enforcement officers, mentions that “the mere possibility of surveillance has the potential to make people feel extremely uncomfortable, cause people to alter their behavior, and lead to self-censorship and inhibition.”[25] There are various reports that this is happening in China, where facial recognition is commonplace. German journalist Kai Strittmatter, who has studied China for more than 30 years, writes about the government's use of facial recognition there: “What the Communist Party is doing with all this high-tech surveillance technology now is they're trying to internalize control. … Once you believe it's true, it's like you don't even need the policemen at the corner anymore, because you're becoming your own policeman.”[26] To provide better context, I present a brief overview of government uses of FRT in China and the European Union.

China

According to one estimate, in 2020 there were around 770 million surveillance cameras installed around the world, and roughly 54% of those cameras were in China.[27] Based on the number of cameras per 1,000 people, 16 of the top 20 most surveilled cities are in China.[28] Facial recognition technology is omnipresent in most parts of the country and is used by both government and private entities. For example, at KFC China you can pay by smiling into a camera. According to new guidelines passed by China's Supreme People's Court, since August 1, 2021, commercial venues such as hotels, shopping malls, and airports need to obtain consent from customers to use facial recognition.[29] The new rules also impose restrictions on the use of the technology and responsibilities for protecting the data it collects.[30] The decision of the Supreme People's Court came about two years after residents of Hong Kong staged mass protests against ubiquitous facial recognition and toppled 20 lampposts equipped with cameras.[31] However, there are no restrictions on the government's use of FRT, and it continues to be an integral part of the social credit score system. If a Chinese citizen decides to jaywalk on a street equipped with facial identification cameras, she will receive a private message with a fine, and that will negatively affect her social credit score.

European Union

Two weeks ago, a coalition in the German parliament led by the ruling Social Democratic Party said it wants to ban “biometric recognition in public spaces as well as automated state scoring systems by AI.”[32] In April 2021, the European Commission proposed a new regulation titled Harmonised Rules on Artificial Intelligence, which also suggests a ban on facial recognition, absent certain exceptions for security purposes. According to the proposed regulation, the use of “real time remote biometric identification systems in publicly accessible spaces for the purpose of law enforcement is prohibited unless certain limited exceptions apply.”[33] The exceptions include cases that are strictly necessary for a targeted search for potential victims of a crime, the prevention of a specific imminent threat to life, or the detection or identification of a perpetrator. The act has already been criticized and various improvements have been offered, but it is a great starting point on this very important issue.

The United States

The United States, the world leader in the AI industry, does not have a regulation on the fair use of facial recognition either, but the issue is on the agenda of political debates in Congress. In March 2021, the National Security Commission on Artificial Intelligence, a bipartisan working group, released its final report, in which it recommends that Congress require prior risk assessments “for privacy and civil liberties impacts” of AI systems, including facial recognition. In 2020, the Facial Recognition and Biometric Technology Moratorium Act was proposed but did not pass. Such a moratorium would give time to improve the accuracy of facial recognition technology and to assess its potential implications.

4. Conclusion

One of the biggest concerns in the United States has been the bias of facial recognition software. As discussed earlier, facial recognition systems have been biased against minorities, which has led to several wrongful arrests by police. For example, in 2020 Robert Williams, a resident of Michigan, was detained and kept in the police station overnight because a facial recognition algorithm made a flawed match. Usually these cases get resolved within hours, but they create tremendous inconvenience for innocent people and their families. The United States needs a national law that sets out the legal framework for public use of FRT and addresses all the possible side effects. For example, an effective way to address the bias issue would be third-party testing and approval of the facial recognition software used by police,[34] so that police use only software certified by an independent agency. It is also important that police do not use low-quality images in their queries.[35]

Facial recognition technology is a powerful new tool that requires a comprehensive approach, one that takes into account its impact on the economy, national security, and civic life. It presents incredible opportunities, especially in aiding the work of law enforcement agencies, but finding the right balance between security and civil liberties will be one of the biggest challenges. A federal law is required to regulate both commercial and government use of FRT and to establish quality and credibility standards for facial recognition software. The law should not force the police to work with analog technologies in a digital age,[36] but it should enforce high ethical standards that minimize the potentially negative impact on civic life.


[1] Nagaraj, A. (2020, Feb 14). Indian police use facial recognition app to reunite families with lost children. Reuters

[2] Symanovich, S. (2021, Aug 20). What is facial recognition? How facial recognition works. Norton.

[3] Facial Recognition. (2021, October). INTERPOL.

[4] Nature Editorial, & Castelvecchi, D. (2020, Nov 18). Is facial recognition too biased to be let loose? Nature.

[5] Ibid

[6] Ibid

[7] Kaspersky. (2021, August 23). What is Facial Recognition – Definition and Explanation. Kaspersky.Com

[8] Nothing personal? How private companies are using facial recognition tech. (2020, Jun 8). TechHQ.

[9] Mozur, P. (2019, May 6). One Month, 500,000 Face Scans: How China Is Using A.I. to Profile a Minority. The New York Times 

[10] Crawford, K., Dobbe, R., Dryer, T., & Fried, G. (2019, December). 2019 Report. AI Now Institute. New York University

[11] Rollet, C. (2019, November 11). Hikvision Markets Uyghur Ethnicity Analytics, Now Covers Up. IPVM.

[12] Koebler, J. (2020, June 29). Detroit Police Chief: Facial Recognition Software Misidentifies 96% of the Time. Vice.

[13] Snow, J. (2018, August 3). Amazon’s Face Recognition Falsely Matched 28 Members of Congress With Mugshots. American Civil Liberties Union.

[14] Hardesty, L. (2018, February 12). Study finds gender and skin-type bias in commercial artificial-intelligence systems. MIT News | Massachusetts Institute of Technology.

[15] Crumpler, W. (2020, April 14). How Accurate are Facial Recognition Systems – and Why Does It Matter? Center for Strategic and International Studies.

[16] Pesenti, J. (2021, Nov 3). An Update On Our Use of Face Recognition. Meta.

[17] 740 ILCS 14/ Biometric Information Privacy Act. (2008, October 3). Illinois General Assembly.

[18] MacCarthy, M. (2020, Aug 20). Who thought it was a good idea to have facial recognition software? Brookings.

[19] Hill, K. (2021, November 2). The Secretive Company That Might End Privacy as We Know It. The New York Times.

[20] Mac, R. (2020, May 8). Clearview AI Says It Will No Longer Provide Facial Recognition To Private Companies. BuzzFeed News.

[21] Webster, S. (2021, May 27). Clearview AI Hit With Dozens of Lawsuit in Europe Over Method of Collecting Data. Tech Times.

[22] Julia Horowitz (2020, Jul 3). Tech companies are still selling facial recognition tools to the police. CNN Business

[23] Illinois Opt-Out Request Form. (2021). Clearview AI. Retrieved December 9, 2021, from https://clearviewai.typeform.com/to/HDz8tJ?typeform-source=www.clearview.ai

[24] Greenberg, A. (2021, January 20). This Site Published Every Face From Parler’s Capitol Riot Videos. Wired.

[25] Garvie, C., & Moy, L. M. (2019, May 16). America Under Watch | Face Surveillance in the United States. America Under Watch – Real-Time Facial Recognition in America. https://www.americaunderwatch.com

[26] Davies, D. (2021, Jan 5). Facial Recognition And Beyond: Journalist Ventures Inside China’s ‘Surveillance State’. NPR.

[27] Keegan, M. (2020, August 14). The Most Surveilled Cities in the World. US News.

[28] Bischoff, P. (2021, May 17). Surveillance camera statistics: which cities have the most CCTV cameras? Comparitech.

[29] Dou, E. (2021, July 30). China built the world’s largest facial recognition system. Now, it’s getting camera-shy. Washington Post. https://www.washingtonpost.com/world/facial-recognition-china-tech-data/2021/07/30/404c2e96-f049-11eb-81b2-9b7061a582d8_story.html

[30] Ibid

[31] Fussell, S. (2019, August 30). Why Hong Kong Protesters Are Cutting Down Lampposts. The Atlantic.

[32] Heikkilä, M. (2021, November 24). German coalition backs ban on facial recognition in public places. POLITICO.

[33] Harmonised Rules on Artificial Intelligence. (2021, April 21). European Union Law.

[34] MacCarthy, M. (2021, May 25). Mandating fairness and accuracy assessments for law enforcement facial recognition systems. Brookings.

[35] Hill, K. (2020, August 3). Wrongfully Accused by an Algorithm. The New York Times.

[36] Porter, T. (2019, March 21). The debate on automatic facial recognition continues. Surveillance Camera Commissioner’s Office.