In December 2020, a Committee of Experts (CoE) appointed by the Indian government to deliberate on data governance released its report on the Non-Personal Data Governance Framework (NPD Framework). The CoE has proposed creating national legislation and a regulator, the Non-Personal Data Authority (NPDA), to establish the rights of India, Indian citizens and communities over non-personal data (NPD) collected and created in India. The stated aims of the framework are to (i) generate economic, social and public value through the processing and use of data, (ii) incentivize innovation and encourage start-ups, and (iii) address privacy concerns arising from the processing of non-personal data.
This is not the first time that India has attempted to bring NPD under a governance framework. The Personal Data Protection Bill (PDP Bill), introduced in Parliament in December 2019, mandated the sharing of non-personal or anonymised personal data “to enable better targeting of delivery of services or formulation of evidence-based policies by the Central Government.” Although NPD is not a new area of policymaking, the renewed focus on it is noteworthy: if the proposed framework passes muster, India could become the first country to put in place a comprehensive framework for non-personal data.
The report’s focus on data sharing also marks a significant departure from the Indian government’s earlier data governance efforts. With increased digitalisation, managing the growing complexities arising from the collection, control and movement of data across borders has become a prime concern for governments. So far, the Indian government has focused on privacy-related concerns around the collection, storage, processing and transfer of personal data, an approach that has resulted in data localization mandates for personal data.
The PDP Bill mandates storage of ‘sensitive personal data’ within India and outlines conditions for its transfer outside India. The Bill also gives the Indian government the power to classify any data as ‘critical personal data’ and mandate that it be stored and processed only in India. The use of data localization for specific subsets of personal data (critical/sensitive) highlights the way governments use rights protection measures to establish or expand their discretionary powers over cross-border data flows. Similarly, tailored localization measures for ‘critical commercial data’ or ‘providers of public service’ indicate that security concerns are being used to impose national boundaries on data flows.
Unlike the rights protection focus of the PDP Bill, the NPD framework uses the economic and social “value generation capacity of the data” as the rationale for exerting control over data. The committee’s recommendations on enabling public access to private data to derive social, public and economic value for citizens and communities in India flow from its understanding of data as a resource that can be owned and extracted. Another striking feature of the NPD framework is the set of tailored conditions of access and standards of care for data that has strategic value. Given that the NPD framework seeks access to greater amounts of data while data protection law focuses on limiting data collection and use, it will be interesting to see how the Indian government harmonizes the divergent approaches and distinct objectives of these two regulations going forward.
In this blogpost, we build on our comments to the CoE to highlight some of the flaws in the proposed NPD framework. Although the framework is intended to provide regulatory certainty around the creation and stewardship of, and decision-making on, NPD in India, the complicated structure outlined by the Committee cannot achieve that goal. The NPD framework also raises questions about whether invalid assumptions about data ownership and value are driving the Indian government’s strategy and thinking.
NPD: Definitions, Roles, Obligations and Data-sharing Mechanisms
The report defines NPD as any data other than personal data, but the committee has narrowed this broad definition by distinguishing between two types of NPD based on how they are produced:
- Data that was linked to individuals but has been stripped of personal data i.e., anonymised and aggregated data
- Data that does not pertain to individuals such as industrial data, data from infrastructural sensors or meteorological data
Data Custodians. Any government or private entity that undertakes the collection, storage, processing and use of data is defined as a data custodian, while companies that process data on behalf of clients or data custodians are defined as data processors. Both data custodians and data processors have data-sharing obligations; however, data processors such as cloud service providers are exempted from sharing any NPD that they process on behalf of data custodians.
Data Business. Data custodians or processors that meet certain thresholds of data collection defined by the regulatory authority are classified as data businesses. The thresholds are based on parameters such as gross revenue, number of consumers/households/devices handled, and percentage of revenues from consumer information. All data businesses are required to register in India and must disclose meta-data about their collection, storage and processing practices in meta-data directories that will be managed by the NPDA. These directories will be accessible only to organizations registered in India.
Data Trustees. Although the term is not defined, any government or non-profit private organization, such as a community or industry body, can register as a data trustee with the NPDA and request the creation of High-value Datasets (HVDs). HVDs are datasets created for limited purposes such as the ‘public good’ and the ‘public interest’. Each HVD can have only one data trustee; however, a single data trustee may be responsible for more than one HVD.
Data trustees collect the NPD that constitutes an HVD from various data businesses, including public or private data custodians or processors, which are obligated to contribute to the HVD by sharing subsets of the data they collect. Data trustees also manage requests for access to an HVD, which can be raised by any organization (but not by individuals) registered in India, and may levy a nominal charge as processing and maintenance fees. As stewards of HVDs, data trustees are required to maintain data infrastructure, which refers to “technical-material elements like actual databases, APIs, organisational systems”, and to set up grievance redressal mechanisms.
NPDA. The NPDA has the ultimate decision-making authority over NPD and the entities producing NPD in India. It manages the creation of, and access to, the meta-data directory and is also responsible for determining the appropriateness of a proposed HVD by evaluating its objective and impact. The NPDA also decides who can register as a data trustee, based on an assessment of the potential trustee’s capacity and capability to handle an HVD and of whether it has followed due process before proposing the creation of the HVD.
Case for Regulation Is Not Clear
Under the NPD framework, proprietary information, trade secrets and information that is likely to violate the privacy of individuals or communities are excluded, and data sharing must be specific and targeted at the three purposes defined in the report: sovereign, business and public good. The committee has not made any specific recommendations regarding sharing of data for business purposes, since arrangements for such data sharing between two or more for-profit private entities already exist. Similarly, the report only reiterates the need for data sharing for sovereign purposes such as national security or managing public emergencies, since regulations for such data sharing already exist.
It is not clear what problem the NPD framework is addressing. Is the goal to create rules for the free movement or portability of NPD, or is the objective to prohibit hoarding of data in specific sectors? Or are the obligations and conditions for access to and control of NPD a way to exert Indian sovereignty over data?
Since the committee has not proposed anything new regarding sharing data for business or sovereign purposes, the scope of the data-sharing framework is limited to ‘public good’ purposes, or the benefit of the larger community. While clarifying the purpose and conditions of data sharing is important, the notion of a ‘public good’ purpose does neither. Public good is subjective and can vary depending on the specific context or circumstances. Different communities are also likely to interpret public good differently or seek discrete benefits. Designing data-sharing arrangements around such indeterminate concepts can end up exacerbating contestations over data. Considering that frameworks for sharing data for sovereign and business purposes already exist, and given the lack of clarity on what counts as public good, it is important that the committee consider whether regulation is needed at all before mandating a complex data-sharing framework.
Impact on Competition, Innovation and Security
The committee believes that mandating data sharing will spur competition and innovation because it addresses a key barrier to competition in markets: lack of access to data. However, to be effective, any data-sharing framework needs to provide the right incentives for making data available. Collecting, processing and curating NPD is costly. Data businesses bear these costs and invest resources in collecting NPD because they derive value from such data. The recommendation to provide open access to meta-data and to share data through HVDs could undermine private incentives to produce NPD.
Not only will a mandatory data-sharing regime disincentivize data businesses from building the facilities or services needed to collect and use NPD, it also creates the possibility that their competitors will gain access to their data through the data-sharing mechanisms. Under the NPD framework, the data trustee decides what constitutes an HVD, but there is no oversight of its decisions. Businesses could use a data trustee to gain access to their competitors’ data, and there are no mechanisms to ensure that data trustees do not abuse this power.
The NPD framework not only undermines the ability of data businesses to gain exclusive benefits from the NPD they collect but also saddles them with potential costs. The duty of care makes them responsible for preventing any harm that might result from any use of the NPD they collect. The data-sharing obligations therefore stand in tension with the duty of care assigned to entities collecting NPD in India. For example, mandatory sharing of meta-data and data principals opting out of data anonymization could lead to privacy harms. The NPD framework also does not account for the security implications of creating a centralized meta-data directory, or of sharing data with organizations that have inadequate security practices or lack the technical expertise to handle data securely.
The NPD framework is also likely to create incentives for firms looking to avoid data-sharing obligations to claim that a dataset includes personal or sensitive data, or a trade secret, that cannot be shared. Thus, a mandatory data-sharing regime may actually stymie innovation by discouraging data sharing. Similarly, the categorization of data businesses and the various obligations they must meet to retain access to data collected in India raise the costs of operating in the country. The role of data trustees as stewards of valuable datasets could remove the incentives for companies to collect and create NPD in India, hurting the competitiveness of the Indian data market.
Ownership of Non-Personal Data
At the heart of the report is the question of who owns NPD and the value derived from such data. Contestations over data ownership emerge from different perceptions about how value is created in the data economy. India needs to start by recognizing that data is not a naturally occurring resource that is simply “collected”; rather, data is a byproduct of interactions between the operators of infrastructures that provide products, services and applications, and the users of those products, services and applications. In many cases, the interaction is not with a person at all but with sensors, machines and the environment.
The NPD framework reflects a debate over the source of NPD’s value. One side believes that value emerges only when data is used in a particular context, and only after someone has invested resources to collect, organize and combine it in useful ways. While the cost of reproducing data may be zero, the cost of generating it and making it accessible and useful is not. The other side argues that although companies that own the infrastructures for data collection currently derive value from data, individuals and communities are central to the creation of value from data, and ownership rights should rest with them. The CoE has used this perspective to assign ownership rights to the communities or countries from which data is sourced, and to create a mandatory data-sharing regime to ensure that the benefits from NPD accrue to India and its people.
The problem with this framing is that it assumes that it is easy to ascertain where data is created and to assign corresponding ownership rights, and that participating members of a community have the knowledge, ability and incentives to coordinate efforts to acquire access to NPD. If an airplane engine is manufactured by a US firm but purchased by an Indian airline, which community owns the operational data generated by the engine, which is currently transmitted to and processed by the engine manufacturer? Is it owned by the engine manufacturer, or the home country of the engine manufacturer? Is it owned by the airline, or the home country of the airline? Or is it owned by the people in the markets served by the airline?
The committee fails to recognize that any collection of NPD could involve multiple overlapping communities, e.g., national, provincial, municipal and neighborhood communities, as well as non-geographic groupings such as industry, ethnicity and occupation. Multiple communities could claim ownership over or seek access to the same dataset, and the NPD framework seems to open the data to any and all of these competing or conflicting claims.
Assigning ownership over NPD merely on the basis that it is sourced in India, or derived from some kind of interaction with India and its communities, also has important consequences for the various entities involved in producing NPD. The stewardship functions of the NPDA and data trustees over the meta-data directory and HVDs, respectively, give them the authority to decide who has access to data, under what conditions and for whose benefit. It is also worth noting that the authority to define the terms of access is different from the authority to decide who gets access under those terms. Even if one does not want to call this ownership, their role as stewards gives them the kind of control over access to, use of, and transferability of NPD typically associated with ownership in studies of law and economics.
Given the ambiguities around the term ‘community’, it is not surprising that the community rights conceptualized under the framework can go only so far. The report notes that data custodians have a ‘duty of care’, or the responsibility to ensure that no harm comes to concerned persons or communities in relation to handling NPD. In addition to ensuring that no harms to persons or groups occur through re-identification of NPD, data trustees have a ‘duty of care’ to the concerned community to ensure that HVDs are used only in the interest of the community. While both data custodians and data trustees have been assigned a duty of care towards the “concerned community”, it is not clear who or what constitutes this community.
Similarly, while the role of the data trustee has been created to ensure that HVDs are used only in “the interest of the community”, the framework does not include any mechanism for the community to ensure that its data is being used in its interest or that the data trustee is acting in the public interest. This framing also ignores the fact that the interest of the community could equally be served by enabling access to data or by preventing the privacy risks or discriminatory outcomes that can arise from collecting information at a societal level.
The report notes that “the community (through a non-profit organization – Section 8 company, Society, Trust) should be able to raise a complaint with a regulatory authority about harms emerging from sharing non-personal data about their community.” Not only does this redressal framework ignore the costs of registering a non-profit organization (a Section 8 company, society or trust) in order to raise a complaint, but it also assumes that all communities are equal and will have knowledge of the harms emerging from sharing of NPD and the ability to coordinate efforts to seek redress against those harms.
All claims of data ownership stem from viewing data as property, i.e., as a resource that can be controlled and traded individually or collectively in exchange for social and economic benefits. Community rights or ownership of data is similarly derived from the understanding of data as property, even if rooted in the perspective that data is valuable in the aggregate and that harms from data can also be collective. Given the complexities of ascertaining data ownership, the community rights framing in the NPD report does little apart from inserting yet another claim into the contestations over data.
India should pass its data protection law before developing a separate regulatory framework for NPD. Introduced in Parliament in 2019, the proposed PDP Bill is intended to overhaul India’s data protection regime, which is currently governed by the Information Technology Act, 2000 and the rules thereunder. The PDP Bill was referred to a Joint Parliamentary Committee (“JPC”) in December 2019, which has been consulting government ministries, industry bodies and various stakeholders to get their views on the Bill. After substantial delays, the JPC is due to submit its final recommendations to Parliament soon.
More importantly, recent reports suggest that the JPC is now planning to expand the scope of the PDP Bill from just personal data to ‘encompass overall data protection’, including non-personal data. Broadening the PDP Bill to include NPD would run contrary to the NPD Committee’s recommendation that all NPD-related provisions be removed from the PDP Bill. Greater clarity on the Bill’s scope and compliance requirements, including data localisation and cross-border data transfer restrictions, will be available once the JPC issues its report on the PDP Bill.
If introduced in its current form, the NPD framework is also likely to have the opposite of the intended effect. The incentives to both produce and share data will be undercut, or gamed. The objectives behind introducing the NPD framework and the principles on which the framework is built, such as data ownership and community rights, are murky. We recommend that the CoE think through the incentives for data production and sharing, and do a better job documenting what, if anything, is wrong with existing suppliers and uses of NPD.
Although the NPD data-sharing regime is justified by an appeal to public good purposes, the mandatory provisions could be viewed as an attempt to use data sharing as a barrier to foreign service providers. We recommend that the CoE address the lack of oversight over the role of the data trustee and put in place safeguards against misuse of data-sharing obligations. Given the impact of ill-considered data governance laws on the digital economy, we recommend that the committee abandon the NPD framework and focus instead on creating sector-specific data-sharing frameworks.