The Evidence-Based Policymaking Act and Privacy

Abstract:

The Foundations for Evidence-Based Policymaking Act of 2018 (EBPA) created a framework for centralizing statistical information collected by dozens of U.S. federal agencies and imposed responsibilities for sharing that data within the government, as well as with researchers and private entities. One of the main expected outcomes of the act is a National Secure Data Service, which would promote collaboration, help avoid duplication, and minimize public expenditure on data collection and processing. Most importantly, it would improve government efficiency by restructuring the national statistical ecosystem to better inform policy decisions. However, the centralization of federal data envisioned by EBPA creates new privacy risks and vulnerabilities; similar concerns led Congress to reject the idea of a National Data Center in the 1960s. Back then, the debate around data centralization ended with the passage of the Privacy Act of 1974. Half a century later, the idea of data centralization has been approved, but no corresponding changes were made to privacy legislation. This paper argues that, while EBPA is a positive step forward, it needs additional privacy safeguards, which could be provided by revising the Privacy Act of 1974, last updated in 1988.

1. Background

This section looks at why the government collects data, how its institutional and technical capacity to process data has changed over time, and how these changes have shaped the public debate around privacy.

1.1 Purpose  

The core idea behind the Foundations for Evidence-Based Policymaking Act of 2018 (EBPA) is to create metrics for analyzing the government’s policy decisions and thereby improve the federal government’s effectiveness. According to title 44 of the U.S. Code, the term evidence means “information produced as a result of statistical activities conducted for a statistical purpose” (44 USC 3561: Definitions). However, not all statistics are equal, and relying on bad data can do more harm than good. EBPA therefore aims to increase not only the quantity of data available to inform policy decisions but also its quality.

1.2 Why do governments collect data?

The collection of certain types of data is essential for a government to carry out its basic functions. As far back as five to six thousand years ago, ancient governments in Babylonia and Egypt collected primitive forms of census data. Early governments needed census data mainly for taxation and military recruitment. With the emergence of democratic states, however, census data became a crucial element of political representation. In the United States, the decennial census is embedded in the Constitution. Article I, Section 2 of the Constitution states that “the actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States and within every subsequent Term of ten Years” (The National Constitution Center). The Founders intended to apportion seats in the House of Representatives among the states according to their populations. The initial benchmark was one representative for every 30,000 residents (Gauthier); today, that number is around 700,000. Census data was therefore necessary to advance democratic governance.

1.3 Changing capacity to process data

However, producing quality data did not come easily, as it requires institutional capacity, trained professional staff, a certain level of public awareness, and the resources to provide for all of this. The first U.S. census took place in 1790 and counted a total population of 3,929,214 (A Timeline of Census History). At the time, both President George Washington and Secretary of State Thomas Jefferson were skeptical of the result and believed the population had been undercounted. Until 1840, the Secretary of State was in charge of organizing each decennial census as a temporary assignment. In 1849, Congress established a census board to oversee data collection, and responsibility for the census shifted from the Department of State to the Department of the Interior (DOI). Only in 1902 did the Census Bureau become a permanent agency under the DOI (A Timeline of Census History).

This gradual shift from a temporary, ad hoc group of amateurs to a permanent government bureaucracy paralleled the increasing complexity of census operations and the government’s growing demand for quality data. Notably, two other federal statistical agencies were established before the Census Bureau: the National Center for Education Statistics, founded in 1867, and the U.S. Bureau of Labor Statistics, founded in 1884. Today, the U.S. has thirteen principal federal statistical agencies and more than 90 federal organizations that engage in statistical activities. Table 1 lists the thirteen principal statistical agencies.

Table 1: The 13 Principal Federal Statistical Agencies in the U.S.

Agency | Governing body | Founded
Bureau of Economic Analysis | Department of Commerce | 1972
Bureau of Justice Statistics | Department of Justice | 1979
Bureau of Labor Statistics | Department of Labor | 1884
Bureau of Transportation Statistics | Department of Transportation | 1992
Census Bureau | Department of Commerce | 1903
Economic Research Service | Department of Agriculture | 1961
Energy Information Administration | Department of Energy | 1977
National Agricultural Statistics Service | Department of Agriculture | 1961
National Center for Education Statistics | Department of Education | 1867
National Center for Health Statistics | Department of Health and Human Services | 1960
National Center for Science and Engineering Statistics | Independent | 1950
Office of Research, Evaluation and Statistics | Social Security Administration | 1935
Statistics of Income | Department of Treasury | 1916

One of the forces driving the increasing demand for quality data was the transition of the U.S. to a welfare state, that is, “a state that is committed to providing basic economic security for its citizens by protecting them from market risks associated with old age, unemployment, accidents, and sickness” (Weir). To allocate resources efficiently and provide targeted assistance, the state needed more complex and accurate databases on individual citizens. The turning point was the Social Security Act of 1935, part of the New Deal, a series of government programs launched in response to the Great Depression. Until then, the United States was the only modern industrial country without a social security system.

One of the main provisions of the Social Security Act was the creation of the Social Security Number (SSN), a unique 9-digit number assigned to U.S. citizens as well as to permanent and temporary residents. Over time, the social weight and public perception of the SSN changed. Carolyn Puckett of the Office of Research, Evaluation, and Statistics at the Social Security Administration wrote in 2009 that “created merely to keep track of the earnings history of U.S. workers for Social Security entitlement and benefit computation purposes, it [the SSN] has come to be used as a nearly universal identifier” (Puckett). It became the primary means by which public services identify citizens and organize individual records.

1.4 Changing perceptions

As the number of government records on citizens grew in the following years, concerns about their privacy implications began to emerge. In her 2018 book “The Known Citizen: A History of Privacy in Modern America,” Sarah E. Igo, Professor of History and Dean of Strategic Initiatives for the College of Arts and Science at Vanderbilt University, writes that with the passage of the Social Security Act of 1935, “questions about how thoroughly the state ought to know its own people became less theoretical” (Igo, p. 57). Igo writes that until the 1930s, the public perception was that the government tracked only troubled citizens and marginal communities in order to maintain public order. In the New Deal era, however, the government’s administrative tracking extended even to more privileged citizens, and “being known to the government” became “increasingly constitutive of citizenship itself: a necessary exchange for steady employment, increased economic security, and free movement across borders” (Igo, p. 56).

Initial public reactions to the newly instituted Social Security programs were largely positive, especially during World War II, when the programs enabled the government to efficiently identify and assist war veterans and wounded warriors. Some people even went as far as tattooing their Social Security numbers on their bodies to make sure they would not forget their nine digits. In the following decades, however, as the economic crisis and the war faded into history, the public debate about government databases entered a new phase. On the one hand, some government bureaucrats and social scientists believed that increasing the quantity and quality of public data records would lead to more efficient social and economic policies. On the other hand, many civil society activists and legal scholars voiced concerns that the swelling volume of databases on citizens amounted to an invasion of privacy.

1.5 The National Data Center

In this context, the story of the failed National Data Center of the 1960s is especially noteworthy and highly relevant to the debate around the Evidence-Based Policymaking Act adopted in 2018. It started with a request from a group of social scientists who in 1965 “recommended that the federal government develop a national data center that would store and make available to researchers the data collected by various statistical agencies” (Kraus, p. 1). Rebecca Kraus captures the ensuing political turmoil eloquently in her 2013 paper “Statistical Déjà vu: The National Data Center Proposal of 1965 and Its Descendants.” On one side, some social scientists believed that “government programs designed to address social issues, such as civil rights, housing, employment, welfare, education, and poverty” could be improved if the academic community had access to the public data generated by the federal government (Kraus, p. 4). On the other side, privacy advocates were concerned about the risks and vulnerabilities such a center would create. The National Data Center proposal lost steam in 1970, when the Bureau of the Budget, which had led the research behind it, was reorganized into the Office of Management and Budget.[1]

2. The Commission

2.1 Formation of the Commission

In March 2016, Speaker of the House Paul Ryan and Senator Patty Murray put forward the bipartisan Evidence-Based Policymaking Commission Act of 2016, which President Barack Obama signed the same month. The act provided for the establishment of the U.S. Commission on Evidence-Based Policymaking (CEP), directed to “consider how to strengthen government’s evidence-building and policymaking efforts” and to “study how the data that government already collects can be used to improve government programs and policies,” and to present its findings and recommendations to Congress and the President.

2.2 Bipartisan initiative

It is worth underlining the bipartisan nature of this initiative. The two congressional leaders, Democratic Senator Patty Murray of Washington State and Republican Speaker of the House Paul Ryan of Wisconsin, had established good relations back in 2013, when they achieved a breakthrough with the Bipartisan Budget Act of 2013. The bill allowed Congress to avert a government shutdown and, in the long run, to save close to $23 billion. Murray and Ryan had made only small compromises to achieve the breakthrough, and both were applauded for the resulting agreement. Three years later, they built on this success and initiated the CEP. During the introduction of the commission’s findings, Senator Murray said that “No matter what side of the aisle you’re on, we should all agree that government should work as efficiently as possible for the people it serves” (U. S. Senator Patty Murray). Paul Ryan, for his part, remarked that “Patty and I have long advocated for a way to better measure the federal government’s effectiveness—and this bill puts those efforts into action” (U. S. Senator Patty Murray).

2.3 Composition of the Commission

Accordingly, the Commission was composed of individuals without strong political affiliations. They were mostly academics, some with prior experience in the federal government, along with one current employee of the U.S. Office of Management and Budget and three members from the private sector (Table 2). Two of the commission’s fifteen members are well-known privacy advocates: Paul Ohm, Professor of Law at the Georgetown University Law Center, and Latanya Sweeney, Professor of the Practice of Government and Technology at the Harvard Kennedy School. Both are well recognized for their research and publications on privacy law and policy. Ohm’s position is that “data can be either useful or perfectly anonymous but never both.” In 1997, as a graduate student at the Massachusetts Institute of Technology, Sweeney reidentified Massachusetts Governor Bill Weld by linking publicly available records to his supposedly anonymized medical records (Meyer). The demonstration had a large public impact and led to new legal restrictions on the disclosure of protected health information under the Health Insurance Portability and Accountability Act (HIPAA). Within the CEP, Ohm and Sweeney advocated for additional friction in accessing government databases and for added layers of privacy protection.

Table 2: Members of the U.S. Commission on Evidence-Based Policymaking

No. | Name | Role | Affiliation
1 | Katharine G. Abraham | Chair | University of Maryland
2 | Ron Haskins | Co-Chair | Brookings Institution
3 | Sherry Glied | Commissioner | New York University
4 | Robert M. Groves | Commissioner | Georgetown University
5 | Robert Hahn | Commissioner | University of Oxford
6 | Hilary Hoynes | Commissioner | University of California, Berkeley
7 | Jeffrey Liebman | Commissioner | Harvard University
8 | Bruce D. Meyer | Commissioner | University of Chicago
9 | Paul Ohm | Commissioner | Georgetown University
10 | Nancy Potok | Commissioner | U.S. Office of Management and Budget
11 | Kathleen Rice Mosier | Commissioner | Faegre Baker Daniels, LLP
12 | Robert Shea | Commissioner | Grant Thornton, LLP
13 | Latanya Sweeney | Commissioner | Harvard University
14 | Kenneth R. Troske | Commissioner | University of Kentucky
15 | Kim R. Wallin | Commissioner | D.K. Wallin, Ltd.

On the other side of the debate were social scientists who believed that access to more data would improve both the quality of academic research and the efficiency of government policy. As a result, there were many heated debates within the commission. The CEP held its first meeting in July 2016 and presented its final report in September 2017. During this period, it surveyed 209 federal offices that work with evidence (data), invited 49 witnesses, held meetings with 40 organizations, hosted three public hearings, and reviewed comments from 350 respondents in the Federal Register (Bipartisan Policy Center). In the end, the commission presented a final report that all of its members signed unanimously.

2.4 Recommendations

The commission’s final report, titled The Promise of Evidence-Based Policymaking, was presented to the public on September 7, 2017. It included 22 specific recommendations in four categories: 1. Improving Secure, Private, and Confidential Data Access; 2. Modernizing Privacy Protections for Evidence Building; 3. Implementing the National Secure Data Service; 4. Strengthening Federal Evidence-Building Capacity. In the 138-page document, the word “privacy” appears 390 times, “secure” 183 times, and “confidential” 12 times. Overall, the report recognizes that “the country’s laws and practices are not currently optimized to support the use of data for evidence building, nor in a manner that best protects privacy” and suggests several measures to address this issue (Commission on Evidence-Based Policymaking).

2.5 The National Secure Data Service

One of the central ideas in the report is the establishment of a National Secure Data Service (NSDS), a successor of sorts to the 1960s idea of a National Data Center. Back then, during one of the congressional hearings, economist Richard Ruggles remarked that “although the emphasis in the privacy hearings was mainly on the possible danger of centralizing records, they also brought out that in some instances, the centralization of files can result in increasing the protection of individual privacy in situations where there have been flagrant abuses” (Kraus, p. 21). Building on this premise, CEP members believed that creating a centralized data service could enhance both data quality and privacy standards. The report suggested that the NSDS could draw on the expertise and institutional knowledge of the Center for Administrative Records Research and Applications (CARRA) and the Center for Economic Studies (CES) at the Census Bureau, which have been carrying out similar functions.

3. The Legislation

3.1 Passing into law

The Foundations for Evidence-Based Policymaking Act passed the House of Representatives on November 15, 2017. In December 2018, the Senate approved the bill, as amended, by unanimous consent, and in January 2019 the President signed the “Foundations for Evidence-Based Policymaking Act of 2018” into law (Legislative Bulletin). The final act, about thirty pages long, makes only seven references to privacy, but it sets clear boundaries for the use of public data, assigns responsibility for handling and protecting databases, and establishes legal penalties for violations of its provisions. Overall, the act introduces several progressive and innovative approaches to handling public data, but whether these new practices are backed by sufficient privacy protections requires closer examination.

3.2 Legal Amendments

The act was not built in a vacuum; rather, it supplements a complex system of pre-existing rules and regulations. The full title of EBPA is: “to amend titles 5 and 44, United States Code, to require Federal evaluation activities, improve Federal data management, and for other purposes.” Title 5 of the U.S. Code covers “Government Organization and Employees” and contains laws such as the Freedom of Information Act (FOIA), which took effect in 1967, and the Privacy Act of 1974. FOIA gives the public the right to request access to records from any federal agency, provided that disclosure does not violate certain privacy and confidentiality rules (Branscomb).[2] The Privacy Act of 1974 established “a code of fair information practices that govern the collection, maintenance, use, and dissemination of information about individuals that is maintained in systems of records by federal agencies” (Privacy Act of 1974). EBPA made no changes to either FOIA or the Privacy Act; instead, it supplemented title 5 of the U.S. Code with additional provisions on federal data-handling practices.

Title 44 of the U.S. Code covers “Public Printing and Documents,” including the archives, registries, and records managed by the federal government. Most provisions of the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA) also fall under title 44. CIPSEA was part of the broader E-Government Act of 2002 and established uniform confidentiality standards to protect data collected by federal statistical agencies. The purpose was to prevent the triangulation of data points and the reidentification of respondents from data shared by various statistical agencies. EBPA repealed CIPSEA 2002 and replaced it with a reauthorized CIPSEA 2018, with the overall intention of creating more opportunities to use public data for statistical purposes while imposing more responsibilities for managing disclosure risks (Ruyle).

EBPA also enacted the “Open, Public, Electronic, and Necessary Government Data Act,” known as the OPEN Government Data Act. Since 2009, the U.S. General Services Administration has run the website Data.gov, which publishes machine-readable datasets produced by the executive branch of the federal government for public access (Data.Gov). In March 2017, Democratic Representative Derek Kilmer of Washington State proposed the OPEN Government Data Act, which would expand the coverage of Data.gov and require “open government data assets made available by federal agencies (excluding the Government Accountability Office, the Federal Election Commission, and certain other government entities) to be published as machine-readable data… when not otherwise prohibited by law” (H.R.1770 – 115th Congress). All in all, EBPA was not a sudden, disruptive piece of legislation but another step toward open data and evidence-based policymaking, built on pre-existing legal infrastructure.

3.3 Statistical purpose

A top priority in the text of EBPA is ensuring that only anonymized, aggregate data is shared, to protect the confidentiality of respondents. One of the most frequently used terms is “statistical purpose” (mentioned 35 times), which, according to title 44 of the U.S. Code, means “the description, estimation, or analysis of the characteristics of groups, without identifying the individuals or organizations that comprise such groups” (44 USC 3561: Definitions). For example, collecting and processing data on the overall number of traffic incidents in Washington, D.C., serves a statistical purpose. However, using the data to calculate car insurance rates for individual drivers in Washington, D.C., would be a non-statistical use. For most social research and public policy purposes, aggregate data is sufficient. For example, if the unemployment rate among the Hispanic population is higher than among other groups, the government can initiate a policy tailored specifically to that group. However, when very large quantities of data are centralized in one place and various pieces are shared on public platforms, it becomes possible to trace data points back to individuals, link records across sources, and reconstruct parts of the database that were not meant for public disclosure.
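The reconstruction risk described above can arise even when every release is an aggregate. The Python sketch below is a minimal illustration using invented records: it shows a simple differencing attack, in which two totals published for groups that differ by exactly one person are subtracted to recover that person’s exact value. The records, the field layout, and the `total_income` helper are hypothetical and are not drawn from EBPA or from any agency’s practice.

```python
# Hypothetical toy records: (respondent, zip_code, annual_income).
# Invented for illustration only; not real data.
records = [
    ("A", "20001", 52_000),
    ("B", "20001", 61_000),
    ("C", "20001", 58_000),
    ("D", "20002", 70_000),
]


def total_income(rows, zip_code):
    """Aggregate statistic: total income of all respondents in one ZIP code."""
    return sum(income for _, zip_, income in rows if zip_ == zip_code)


# Release 1: the total for ZIP 20001 across all respondents.
release_1 = total_income(records, "20001")

# Release 2: the same statistic, published later or by another agency,
# for a group that differs by exactly one respondent ("C").
release_2 = total_income([r for r in records if r[0] != "C"], "20001")

# Neither release names anyone, yet their difference is C's exact income.
print(release_1 - release_2)  # 58000
```

A minimum-cell-size rule alone would not prevent this, since the subtraction works for groups of any size that differ by a single respondent.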

3.4 Risks and Responsibilities

For this reason, EBPA places significant responsibility on the heads of federal agencies and holds them accountable for determining “risks and restrictions related to the disclosure of personally identifiable information, including the risk that an individual data asset in isolation does not pose a privacy or confidentiality risk but when combined with other available information may pose such a risk.” Additionally, the law establishes the position of Evaluation Officer in each agency, to be designated by the head of the agency without regard to political affiliation. The Evaluation Officer’s main function is to “continually assess the coverage, quality, methods, consistency, effectiveness, independence, and balance of the portfolio of evaluations, policy research, and ongoing evaluation activities of the agency.”
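The combination risk quoted above can be made concrete with a small linkage example. The Python sketch below is hypothetical: it joins an invented “de-identified” dataset to an invented public list on shared quasi-identifiers (ZIP code, birth date, and sex, the kind of attributes Latanya Sweeney used in the reidentification described in Section 2.3). Each dataset is harmless in isolation; together, they attach a name to a sensitive record. The field names, records, and the `link` and `qi_key` functions are assumptions made for illustration only.

```python
# Hypothetical illustration of a linkage (re-identification) attack.
# All records and field names are invented.

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# "De-identified" release: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02138", "birth_date": "1950-03-02", "sex": "M", "diagnosis": "X"},
    {"zip": "02139", "birth_date": "1962-11-20", "sex": "F", "diagnosis": "Y"},
]

# Separate public list that includes names and the same attributes.
public_list = [
    {"name": "Person 1", "zip": "02138", "birth_date": "1950-03-02", "sex": "M"},
    {"name": "Person 2", "zip": "02140", "birth_date": "1962-11-20", "sex": "F"},
]


def qi_key(row):
    """Tuple of quasi-identifier values used as the join key."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, public):
    """Return (name, diagnosis) pairs where exactly one public record matches."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(qi_key(person), []).append(person["name"])
    matches = []
    for record in deidentified:
        names = names_by_key.get(qi_key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


print(link(medical, public_list))  # [('Person 1', 'X')]
```

The join is identifying only because the quasi-identifier combination is unique; if several people shared it, the record would stay ambiguous, which is the intuition behind generalizing or suppressing such fields before release.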

Because EBPA centralizes data generated across federal agencies, minimizing the risk of reidentification requires interagency coordination. For this purpose, the law expands the functions and institutional scope of the Interagency Council on Statistical Policy (ICSP), established under section 3504(e)(8) of title 44, which designates the head of the Office of Management and Budget as head of the Council. In the 1980s, the ICSP was an informal group that brought together representatives of federal statistical agencies to coordinate their activities; it was authorized by statute as a formal council in 1995 (The Structure of the Federal Statistical System). The Paperwork Reduction Act of 1995 put OMB, specifically its Office of Information and Regulatory Affairs (OIRA), in charge of coordinating the U.S. federal statistical system (Statistical Programs & Standards). The head of OIRA’s Statistical and Science Policy Office is also the Chief Statistician of the U.S.,[3] who hosts the ICSP’s monthly meetings. Under EBPA, heads of statistical units or other officials with appropriate expertise from other federal agencies will also join the ICSP, which will take on more responsibilities.

3.5 Upcoming assessment 

The new law also establishes the position of Chief Data Officer (CDO) in each agency. CDOs are “responsible for lifecycle data management,” as well as for managing the “data assets of the agency, including the standardization of data format, sharing of data assets in accordance with applicable law,” among fourteen other duties outlined in the law. Furthermore, section 3520A of EBPA provides for a Chief Data Officer Council, which also falls under OMB but is separate from the ICSP. It is a temporary council that brings together representatives from 39 federal agencies. The CDO Council is assigned a number of tasks to complete before it sunsets in January 2025 (About Us. Federal CDO Council): “1. establish Governmentwide best practices for the use, protection, dissemination, and generation of data; 2. promote and encourage data sharing agreements between agencies; 3. identify ways in which agencies can improve upon the production of evidence for use in policymaking,” among others. Certain provisions of EBPA are therefore still in the assessment phase, and it will take a few more years for the act to be fully implemented.

4. Privacy

4.1 What is privacy?

The word “privacy” traces its roots to the Latin word privus, meaning separate or single. The Merriam-Webster dictionary offers two definitions of privacy: 1. the quality or state of being apart from company or observation; 2. freedom from unauthorized intrusion. However, scholars approach privacy in different ways and, consequently, define it differently. The significance and value of privacy may change with social, political, and cultural circumstances, which has made it an elusive concept to define by consensus. Nonetheless, privacy has been a prominent subject of debate since the mid-twentieth century, and the debate is not likely to end anytime soon, as modern technologies move us into uncharted territory with new friction points.

4.2 Privacy as a human right

One popular perspective views privacy as a fundamental human right, or “the right to be left alone,” protected by law (MacCarthy, 2017). This approach recognizes an individual’s right to a personal physical and informational space protected from external intrusion. The United States Constitution provides certain privacy protections. The Fourth Amendment states: “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated.” The Fifth Amendment provides conditional protection for private information through the right against self-incrimination. In the U.S. common law system, a number of court cases, such as Griswold v. Connecticut and Lawrence v. Texas, have broadened the scope of these constitutional privacy protections. Additionally, the United States has dozens of laws providing sectoral privacy protections. For example, the aforementioned HIPAA protects medical records, and the Family Educational Rights and Privacy Act restricts access to student records.

4.3 Privacy as a harm

Another perspective views privacy as a right to be protected from harm. This approach shifts the focus of the privacy debate from the individual to society and asks what the implications of data collection are for society overall. It leaves less room for protecting personal information, which applies only when there is direct and tangible harm to the individual. The privacy-as-harm framework emerged in the 1970s, when one of its pioneers, Richard Posner, wrote that we have two economic goods, “privacy” and “prying,” and that expanding the privacy protections of individuals while contracting the rights of organizations that collect data runs against our common interests (Posner, 1978). More recently, Howard Beales and Timothy Muris made the case for the harm framework using the example of credit reporting, arguing that “collecting financial information about individuals has made loans more accessible to general public” (Beales & Muris, 2008). In this view, the social benefit of more accessible loans outweighs the individual’s right to withhold financial information. The approach thus focuses on protecting data after it is collected rather than on limiting collection itself, emphasizing the right to be protected from the harmful uses of data.

4.4 Privacy in social context

A more recent contribution to the privacy debate was made by Helen Nissenbaum, Professor at Cornell University. In her 2007 book “Privacy in Context,” Nissenbaum laid out a new privacy framework that integrates elements of both the human rights and harm frameworks. Nissenbaum builds on the premise that privacy is a social construct, so its interpretation and application may vary with social circumstances. From this vantage point, structured social factors such as canonical activities, roles, norms, and values define the appropriate degree of data access and visibility (Nissenbaum, p. 17). For example, the doctor you are visiting may have access to your medical records, but an insurance company may not. A basic feature of the social context framework is that privacy does not stop the flow of information but permits it to some stakeholders while restricting it for others (MacCarthy, 2017). It is hard to disagree with Nissenbaum that privacy is a social construct whose value changes across geographic space, time, and other conditions. Especially now that data has become omnipresent, a uniform, rigid approach to all privacy issues cannot be the solution. Sometimes privacy is an inalienable human right; at other times, there is a common public interest in sharing information that would otherwise be considered private. The social context framework therefore offers a matrix that is broad, structured, and flexible enough to be applied across the privacy landscape.

4.5 Reasonable expectation of privacy

One of the most common reference points in the privacy debate is the notion of a “reasonable expectation of privacy.” It traces back to the seminal Supreme Court case Katz v. United States, decided in 1967, while the debate around the first National Data Center was ongoing. The Court’s decision extended Fourth Amendment protections to “what [a person] seeks to preserve as private, even in an area accessible to the public.” In his concurring opinion, Justice John Harlan set out a two-part test: whether the individual has exhibited a subjective expectation of privacy, and whether society is prepared to recognize that expectation as reasonable. However, Helen Nissenbaum, along with other contemporary privacy scholars, believes that, due to the impact of modern disruptive technologies, binary approaches to privacy such as inside/outside, secret/not secret, or expected/not expected are somewhat outdated. Nissenbaum writes that previously “people could count on going unnoticed and unknown in public arenas; they could count on disinterest in the myriad scattered details about them” (Nissenbaum, p. 116), but the situation has now become far more complicated. New technologies make it possible to capture myriad details about us in centralized databases, adding new layers to the privacy debate in which small details make big differences.

5. Analysis

This section takes a closer look at the implications of EBPA by asking what the risks and opportunities are in the centralization of federal data from a privacy standpoint. 

5.1 Data Centralization

Contrary to one of the top recommendations in the CEP report, EBPA did not establish a National Secure Data Service, but it did create frameworks for interagency coordination and data centralization. We discussed earlier the expanded role of the ICSP and the temporary Chief Data Officers Council. EBPA also established another temporary body, the Advisory Committee on Data for Evidence Building. It brings together Evaluation Officers, Chief Data Officers, and other managers responsible for data handling across the federal statistical system. The Advisory Committee is currently administered by the Census Bureau and the Bureau of Economic Analysis (BEA) under the Department of Commerce, and it works closely with the Office of Management and Budget (Advisory Committee on Data for Evidence Building). In its Year 1 report, published in October 2021, the Advisory Committee affirmed the need to establish the National Secure Data Service, as the CEP had proposed (Advisory Committee on Data for Evidence Building: Year 1 Report).

5.2 Advantages of the NSDS

This is hardly surprising, because from the beginning one of the top priorities behind EBPA was a centralized command-and-control mechanism over all the data the federal government generates. When the CEP began its research meetings, it had sixteen talking points, four of which were about the NSDS. These included topics such as “tiered access with a NSDS” and the role of the NSDS in the federal evidence ecosystem (CEP report, p. 123). To return to an earlier discussion, the CEP argued in its final recommendations that the NSDS would enable OMB to set higher standards for data collection and protection that could be applied nationwide. The same data protection principles would then apply to data from small rural communities and large metropolitan areas alike. The NSDS would also help curtail duplicative efforts and improve the efficiency of the statistical agencies. In turn, this would lower federal spending on data and reduce the burden on the public.

5.3 Risks and vulnerabilities

However, the federal government routinely handles very large volumes of data, and centralizing so much statistical information in a single entity creates new privacy risks. First, it may change the public perception of federal data and potentially place a burden on civic life. Second, broader public access to federal statistics increases the risk to data confidentiality, and EBPA obliges agencies to make a significant number of datasets publicly available.

5.4 Panopticon view

During the first round of privacy debates in the 1960s, Cornelius Gallagher, a Democratic Congressman from New Jersey, warned about a National Data Center. He said that the gains in government efficiency it promised “would be paid for at the far greater expense of weakening the right to privacy of all American citizens” (Kraus, p. 11). The writer and social critic Vance Packard concluded his Congressional testimony by noting that “my own hunch is that Big Brother, if he ever comes to these United States, may turn out to be not a greedy power seeker, but rather a relentless bureaucrat obsessed with efficiency” (Kraus, p. 10). As discussed earlier, privacy is a social construct, and its social value and impact may change with circumstances. For example, we might feel comfortable sharing our medical records with a hospital, our educational records with an employer, and our income statements with the Internal Revenue Service. The situation changes, however, when someone is able to put all of it together. It creates the impression that someone knows as much about you as you do and that you are no longer in charge of your privacy. Such a perception is detrimental to civic life even if it is not grounded in fact, because perception becomes reality and people restrain their freedom of self-expression.

Michel Foucault, one of the most influential philosophers of the 20th century, put forward the concept of panopticism. Its central argument is that people change their behavior even when there is only a modest chance that those in positions of power are watching them. The panopticon was originally an architectural design for prisons, proposed by the English philosopher Jeremy Bentham in the late 18th century. The prison is built in a circular form, with the guard positioned at the very center. The guard can see all the inmates, but they cannot see the guard. Foucault argued that this creates a power dynamic in which inmates become their own guards, because they never know when they are being monitored.

5.5 Privacy: statistics vs. surveillance

However, there is an important distinction between the statistical analysis envisioned by EBPA and the kind of surveillance implied by the panopticon. Surveillance focuses on specific targets, whereas statistics processes aggregate data. As mentioned earlier, EBPA places heavy emphasis on using data for statistical purposes only. Part B of the Act, titled “Confidential Information Protection,” contains several safeguards against abuse of federal databases. For example, those who handle the data must take a pledge of confidentiality. A willful unauthorized disclosure is a Class E felony punishable by up to 5 years in prison, a fine of up to $250,000, or both.[4] EBPA also obliges statistical agencies to clearly distinguish any information that could be used for non-statistical purposes and to give public notice of the actual purpose of the data. However, the legislation leaves gaps regarding the mechanisms and conditions for this public communication. A 2019 Pew Research Center survey found that 64% of Americans are concerned about how the government uses the data it collects. It also found that 78% say they understand very little or nothing about what the government does with that data (Auxier et al.). Stronger legal incentives for executive agencies such as the NSDS to prioritize public accountability and engagement would help.

5.6 Privacy vs. confidentiality vs. anonymity  

People working for the NSDS will also face technical challenges in preserving the confidentiality and anonymity of data. First, let us look at the distinctions between privacy, confidentiality, and anonymity. Confidentiality and anonymity are only about a person’s actions and data, but privacy is also about the person (Privacy and Confidentiality). For example, whether someone may ask you personal questions is a matter of privacy. However, whether they can share your responses with another person is a question of confidentiality. Confidentiality implies that the surveyor knows your identity but will not share it outside a certain social group. Anonymity refers to a condition where even the primary surveyor does not know or register your identity. Both confidentiality and anonymity fall under the bigger umbrella of privacy, but neither captures its full meaning.

5.7 Privacy legislation

Privacy regulations, in the United States and elsewhere, generally do not apply to anonymized data. For example, the European Union’s General Data Protection Regulation states that “The principles of data protection should apply to any information concerning an identified or identifiable natural person… this Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes” (Recital 26: Not Applicable to Anonymous Data). The United States Privacy Act of 1974 likewise has a specific exemption for a “statistical record.” It defines such a record as “a record in a system of records maintained for statistical research or reporting purposes only and not used in whole or in part in making any determination about an identifiable individual” (Privacy Act of 1974). EBPA complies with this provision of the privacy law. However, much has changed in the nearly half century since the Privacy Act was passed, both in statistical science and in the technical capacity of computers to process data.

5.8 Reidentification

As noted earlier, new methods make it possible to reidentify individuals by triangulating data points from several anonymized datasets. Paul Ohm, a Georgetown Law professor and a member of the CEP, addressed this in his 2010 paper: “Reidentification science disrupts the privacy policy landscape by undermining the faith we have placed in anonymization… advances in reidentification expose these promises as too often illusory” (Ohm, 2010). To guard against such reconstruction and reidentification attacks, statistical experts have developed several data protection mechanisms. For example, the Census Bureau is required to publish large amounts of its data. For many years it has used noise-infusion techniques such as ‘swapping,’ ‘blank-and-impute,’ ‘partially synthetic data,’ and, most recently, differential privacy (boyd and Sarathy, p. 7). These approaches aim to preserve the usefulness of the datasets for most purposes without compromising confidentiality. However, because the data are deliberately altered, published figures can deviate from the true values in some scenarios. The details of traditional methods such as swapping could not be shared publicly, because that would undermine the confidentiality of the datasets. Differential privacy, by contrast, is designed so that its parameters can be published without weakening the protection. Either way, these disclosure control methods create friction between data users and the Census Bureau.
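To make the triangulation risk concrete, the following is a minimal sketch of a linkage attack, written in Python. It is an illustration only, not a description of how any agency actually releases data. All records, names, and field names are hypothetical. The sketch also assumes that every person on the public list appears in the de-identified release. A person is reidentified whenever their combination of quasi-identifiers (here ZIP code, birth year, and sex) matches exactly one released record.

```python
from collections import defaultdict

# Hypothetical toy data, invented for illustration only.
# A "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "20001", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"zip": "20001", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "20002", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
    {"zip": "20002", "birth_year": 1985, "sex": "M", "diagnosis": "migraine"},
]

# A separately published (hypothetical) list that does include names.
public_list = [
    {"name": "A. Smith", "zip": "20001", "birth_year": 1961, "sex": "F"},
    {"name": "B. Jones", "zip": "20001", "birth_year": 1985, "sex": "M"},
    {"name": "C. Brown", "zip": "20002", "birth_year": 1985, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def qi_key(record):
    """Combination of quasi-identifier values used to join the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released, public_list):
    """Return {name: diagnosis} for people whose quasi-identifiers match
    exactly one released record (assumes each listed person is in the release)."""
    groups = defaultdict(list)
    for rec in released:
        groups[qi_key(rec)].append(rec)

    matches = {}
    for person in public_list:
        candidates = groups.get(qi_key(person), [])
        if len(candidates) == 1:  # a unique combination singles the person out
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


print(link(released, public_list))
# {'A. Smith': 'asthma', 'B. Jones': 'diabetes'}
```

In this toy example, A. Smith and B. Jones are singled out. C. Brown is protected only because two released records share the same combination of attributes. Noise-infusion methods such as swapping aim to break exactly this kind of unique match.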

6. Recommendations

6.1 Study the impact on civic activism

In the early 1970s, the Advisory Committee on Automated Personal Data Systems was established under the Department of Health, Education, and Welfare. Its mandate covered three areas:

- the potentially harmful consequences of automated personal data systems;
- effective safeguards against those consequences;
- “policy and practice relating to the issuance and use of Social Security Numbers” (U. S. Department of Health, Education and Welfare).

The Committee published its final report, “Records, Computers, and the Rights of Citizens,” in 1973, and the report had a ripple effect on privacy laws and regulations around the world for decades. In the United States, it laid the foundations for the Fair Information Practice Principles, which the Federal Trade Commission applies to the private sector. It also influenced the Privacy Act of 1974.

Much has changed since the 1970s, and more will change once EBPA is fully implemented. The federal government now needs to conduct a similar study to assess the impact of EBPA and data centralization on civic activism and freedom of expression. The United States is one of the largest experiments in human history in building a society on individual liberties. One of the cornerstones of America’s success story is the value it places on freedom of self-expression. Even nominal burdens on privacy and civil liberties could be too high a price to pay for the promises of EBPA.

6.2 Revisiting the legislation

The findings of such a study should inform a revision of the Privacy Act of 1974. The last major change to the legislation was made in 1988, when Congress passed the Computer Matching and Privacy Protection Act. It requires that federal agencies “enter into written agreements with other agencies or non-Federal entities before disclosing records for use in computer matching programs.” Congress.gov, the official online database of Congress, lists 1,200 bills with the word privacy in the title. Most of them never passed the House floor, but their number shows how complicated the legal terrain on privacy is in the United States. Ninety-four privacy bills were introduced in 1973-74 alone. Since then, an average of twenty privacy bills have been introduced every year.

Current legislation puts a heavy burden on the statistical agencies to reconcile three competing demands. They must produce high-quality data, and they must protect the privacy of their respondents. They are now also obliged to make these datasets publicly available, which requires disclosure-avoidance techniques such as differential privacy. However, the added noise reduces accuracy for some uses, and this has frustrated certain data users, as the experience of the Census Bureau shows. It would therefore help to relieve the statistical agencies of some of this burden by providing legal tools and justifications for the privacy protections applied to public datasets.
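The tension between accuracy and protection can be illustrated with the textbook Laplace mechanism for differential privacy, sketched here in Python. This is a simplification under stated assumptions, not the Census Bureau’s production disclosure avoidance system. It assumes a simple count query with sensitivity 1, meaning that adding or removing one person changes the count by at most 1. Noise is drawn from a Laplace distribution with scale 1/ε. The true count of 42 and the ε values are hypothetical. A smaller ε (the privacy-loss parameter) gives stronger protection but larger errors.

```python
import random
import statistics


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale): the difference of two independent
    Exp(1) draws is Laplace(0, 1), which is then rescaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: release the count plus noise of scale sensitivity/epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=10_000, seed=0):
    """Average absolute distortion of the released count over many simulated releases."""
    rng = random.Random(seed)
    return statistics.fmean(
        abs(noisy_count(true_count, epsilon, rng) - true_count) for _ in range(trials)
    )


TRUE_COUNT = 42  # hypothetical population of a small area
for epsilon in (0.1, 1.0, 10.0):
    error = mean_abs_error(TRUE_COUNT, epsilon)
    print(f"epsilon={epsilon:>4}: mean absolute error ~ {error:.2f}")
```

The expected absolute value of Laplace noise equals its scale, so the printed errors should come out close to 10, 1, and 0.1 respectively. On a small-area count of 42, an average error of about 10 people is a substantial distortion. This illustrates why some data users object to noise infusion, even though it protects respondents.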

7. Conclusion

Over the years, U.S. federal statistical agencies have accumulated tremendous institutional expertise and technical capacity to produce large-scale, high-quality data. EBPA is now bringing the federal statistical agencies together into a cohesive unit that provides quantitative insight into the performance of the executive branch. It will create an administrative mechanism for informing the government’s policy decisions. It will also create a public accountability mechanism, since large segments of government data will be made publicly accessible. However, EBPA also consolidates the federal government’s statistical information into centralized databases, which creates new privacy risks and vulnerabilities. EBPA has yet to be fully implemented, but one of its main consequences is expected to be the establishment of the NSDS. The NSDS will carry enormous weight on its shoulders, as it will need to satisfy several competing demands. At both ends of the line, the NSDS will be working with and for the American people. It is therefore very important to keep the public informed and to understand the public’s expectations and the impact on them. Now is the right time for the U.S. government to conduct a study on the impact of centralized, automated databases on civic life, akin to the one conducted in 1973, and to incorporate its findings into updated privacy legislation.

References:

About Us. (2020). Federal CDO Council. https://www.cdo.gov/about-us/

Advisory Committee on Data for Evidence Building. (2022). U.S. Bureau of Economic Analysis (BEA). https://www.bea.gov/evidence

Advisory Committee on Data for Evidence Building: Year 1 Report. (2021, October). Office of Management and Budget. https://www.bea.gov/system/files/2021-10/acdeb-year-1-report.pdf

Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2019, November 15). Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information. Pew Research Center: Internet, Science & Tech. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/

A Timeline of Census History. (n.d.). United States Census Bureau. https://www.census.gov/history/img/timeline_census_history.bmp

Beales, Howard, & Muris, Timothy. “Choice or Consequences: Protecting Privacy in Commercial Information.” 75 U. Chi. L. Rev. 109 2008 pp. 109-120

Bipartisan Policy Center. Frequently Asked Questions Related to the Commission on Evidence-Based Policymaking’s Report. (2019, March). https://bipartisanpolicy.org/download/?file=/wp-content/uploads/2019/03/CEP-FAQs.pdf

boyd, d., & Sarathy, J. “Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau’s Use of Differential Privacy.”

Branscomb, Anne (1994). Who Owns Information?: From Privacy To Public Access.

Commission on Evidence-Based Policymaking. (2017, September). The Promise of Evidence-Based Policymaking. Bipartisan Policy Center. https://bipartisanpolicy.org/download/?file=/wp-content/uploads/2019/03/Appendices-e-h-The-Promise-of-Evidence-Based-Policymaking-Report-of-the-Comission-on-Evidence-based-Policymaking.pdf

Data.Gov. (2022) About. https://data.gov/about/

Dr. Latanya Sweeney’s Home Page. (2021). http://latanyasweeney.org/

Gauthier, J. H. S. (2021). 1790 Overview – History – U.S. Census Bureau. United States Census Bureau. https://www.census.gov/history/www/through_the_decades/overview/1790.html

H.R.1770 – 115th Congress (2017–2018): OPEN Government Data Act. Congress.Gov | Library of Congress. https://www.congress.gov/bill/115th-congress/house-bill/1770

Igo, S. E. (2020). The Known Citizen: A History of Privacy in Modern America. Harvard University Press.

MacCarthy, M. (2017). “Privacy Policy and Contextual Harm.” 13 I/S: Journal of Law and Policy. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3093253

Nissenbaum, Helen (2007). Privacy in Context. Stanford University Press. Kindle Edition

44 USC 3561: Definitions. Office of the Law Revision Counsel (2022). https://uscode.house.gov/view.xhtml?req=(title:44%20section:3561%20edition:prelim)%20OR%20(granuleid:USC-prelim-title44-section3561)&f=treesort&edition=prelim&num=0&jumpTo=true

The National Constitution Center (2022). The Constitution – Full Text. https://constitutioncenter.org/interactive-constitution/full-text

Paul Ohm. (n.d.). PaulOhm.Com. https://www.paulohm.com/

Puckett, C. (2009, July 1). The Story of the Social Security Number. Social Security Administration Research, Statistics, and Policy Analysis. https://www.ssa.gov/policy/docs/ssb/v69n2/v69n2p55.html

U. S. Senator Patty Murray (2017, November 1). Senator Murray, Speaker Ryan Introduce Evidence-Based Policymaking Legislation. https://www.murray.senate.gov/senator-murray-speaker-ryan-introduce-evidence-based-policymaking-legislation/

Legislative Bulletin (2019). The President Signs H.R. 4174, “Foundations for Evidence-Based Policymaking Act of 2018.” Social Security Administration. https://www.ssa.gov/legislation/legis_bulletin_021519.html

Meyer, M. (2018, October 31). Law, Ethics & Science of Re-identification Demonstrations. Harvard Law Petrie Flom Center. https://blog.petrieflom.law.harvard.edu/symposia/law-ethics-science-of-re-identification-demonstrations/

Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization” (August 13, 2009). UCLA Law Review, Vol. 57, p. 1701, 2010. Available at SSRN: https://ssrn.com/abstract=1450006

Posner, Richard. “The Right of Privacy.” Georgia Law Review 393 (1978) pp. 393 – 404

Privacy Act of 1974. (2021, April 30). Department of Justice. https://www.justice.gov/opcl/privacy-act-1974

Privacy and Confidentiality. (n.d.). CHOP Research Institute. https://irb.research.chop.edu/privacy-and-confidentiality

“Privacy.” Merriam-Webster.com Dictionary, Merriam-Webster, https://www.merriam-webster.com/dictionary/privacy. Accessed 3 May 2022.

Public Law No: 115–435. Foundations for Evidence-Based Policymaking Act of 2018 Congress.gov. (2019). https://www.congress.gov/bill/115th-congress/house-bill/4174

Recital 26: Not Applicable to Anonymous Data. General Data Protection Regulation. (2016).

Ruyle, M. (2019, March 1). New Law Offers Reforms to Improve Access to Data, Confidentiality Protections | Amstat News. Magazine of the American Statistical Association. https://magazine.amstat.org/blog/2019/02/01/law-improves-data-confidentiality/

Statistical Programs & Standards. (2021, December 22). The White House. https://www.whitehouse.gov/omb/information-regulatory-affairs/statistical-programs-standards/

The Structure of the Federal Statistical System. (n.d.). The White House. https://obamawhitehouse.archives.gov/omb/inforeg_statpolicy/bb-structure-federal-statistical-system

Understanding Confidentiality and Anonymity. (n.d.). The Evergreen State College. https://www.evergreen.edu/humansubjectsreview/confidentiality

U. S. Department of Health, Education and Welfare. (1973, July). Records, Computers and the Rights of Citizens. DHEW Publication. https://www.justice.gov/opcl/docs/rec-com-rights.pdf

Weir, M. (2001). Welfare State. International Encyclopedia of the Social & Behavioral Sciences. https://doi.org/10.1016/B0-08-043076-7/01094-9


[1] The General Services Administration proposed a similar idea in the 1970s, an interconnected network of federal government data systems, which did not succeed either.

[2] The Department of Justice operates a very user-friendly FOIA website at https://www.foia.gov/

[3] OIRA is also in charge of the cost-benefit analysis laid out in the President’s Executive Order 12866.

[4] It is important to note that the Privacy Act of 1974 imposed a fine of not more than $5,000, which equals around $30,000 in today’s money: “Any member, officer, or employee of the Commission… who knowing that disclosure of the specific material is so prohibited, willfully discloses the material in any manner to any person or agency not entitled to receive it, shall be guilty of a misdemeanor and fined not more than $5,000.”