
 

ID-Number Policies and Profiling Practices

 

Mireille Hildebrandt, Vrije Universiteit Brussel 

 

In this section we will explore the relationship between ID-number policies and the advanced profiling practices envisioned in scenarios of Ambient Intelligence (AmI). After summarising the key findings of FIDIS research on profiling in the context of AmI (3.6.1), we will discuss the legal-technological infrastructure that enables automatic and autonomic profiling (3.6.2). To this effect we will explore three scenarios of AmI, defined in a FIDIS workshop of 26 January 2007, with regard to the choice between a policy of single or multiple ID numbers to be used in the public and/or private sphere. We will conclude with a comparison of unification (attainable by means of a single-ID-number policy) and interoperability (attainable by means of a multiple-ID-numbers policy) (3.6.3).

 

Key findings of FIDIS research on profiling (workpackage 7)

  1. Profiling is another term for pattern recognition. All living organisms cope with their environment thanks to permanent profiling of, and adaptation to (and of), their environment. Automatic profiling (based on clustering, association rules, etc.) produces a new type of exploratory knowledge, used for decision-making in the context of business, insurance, credit scoring, health(care), anti-money laundering and criminal investigation. The correlations ‘discovered’ in the process of KDD (knowledge discovery in databases) may allow service providers and government authorities to predict the habits and preferences of individual citizens without their being aware of it. This facilitates targeted services, preventive medicine and crime prevention, but it also allows manipulation and discrimination (a minimal illustration follows this list).

 

  2. Data protection regimes focus on personal data, not on the results of KDD. In the case of anonymisation, Directive 95/46/EC may not even be applicable, while anonymisation does not exclude the use of data mining techniques such as KDD. In fact, group profiling mostly builds on inferences made on the basis of large amounts of anonymised data (entirely outside the scope of data protection), while the application of such profiles does impact individual persons and societal checks and balances. The legal status of such inferred group profiles is unclear.

 

  3. The potential impact of the application of group profiles concerns both more and less than the traditional concerns about privacy and security; the focus should be on equality, fairness and liability, next to privacy and security.

 

  4. The implications of profiling for democracy and the rule of law disclose a need to rethink the relation between law, technology and public goods such as privacy, security, equality, fairness and the possibility of attributing liability where harm is caused.

 

  5. Data protection is focused on data minimisation (prohibition of unlimited collection, prohibition of use for other purposes) and seems to run counter to the maximum data collection needed to detect which data are relevant. The paradigms of data minimisation and KDD seem incompatible: because KDD is used to find out which data are relevant, all data are needed. Also, data protection focuses on data instead of knowledge, losing interest once data have been anonymised, while the application of group profiles to individual citizens may have more of an impact than the use of personal data. Transparency is needed, for which reason we have introduced the principle of minimisation of knowledge asymmetry.
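To make the first two findings more concrete, the following minimal sketch (in Python, with entirely hypothetical data) computes the support/confidence measure at the core of association-rule mining, one of the KDD techniques mentioned above: a group profile is ‘discovered’ in anonymised records and then applied to an individual whose data never contributed to the rule.

```python
# Illustrative sketch (hypothetical data): how KDD can infer a group
# profile from *anonymised* records and then apply it to an individual.
# The rule-mining step is a minimal support/confidence computation, the
# core of association-rule algorithms such as Apriori.

from itertools import combinations

# Anonymised records: no names, no addresses, no ID numbers.
records = [
    {"night_shift", "smoker", "high_bmi"},
    {"night_shift", "smoker", "back_pain"},
    {"night_shift", "high_bmi", "back_pain"},
    {"day_shift", "runner"},
    {"night_shift", "smoker", "high_bmi", "back_pain"},
]

def confidence(antecedent: frozenset, consequent: str) -> float:
    """Confidence of the rule 'antecedent -> consequent' over the records."""
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    return sum(consequent in r for r in matching) / len(matching)

# 'Discover' rules whose confidence exceeds a threshold: a group profile.
for items in combinations({"night_shift", "smoker", "high_bmi"}, 2):
    c = confidence(frozenset(items), "back_pain")
    if c >= 0.6:
        print(f"group profile: {set(items)} -> back_pain (confidence {c:.2f})")

# Applying the profile: scoring a *new* individual who matches the
# antecedent, although her data never contributed to the rule.
applicant = {"night_shift", "smoker"}
print("risk flag:", confidence(frozenset(applicant), "back_pain"))
```

Note that no name, address or ID number occurs anywhere in the records, yet the resulting profile may determine the risks and opportunities the applicant is offered.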

 

Enabling legal-technological framework for automatic profiling

Automatic profiling (KDD) requires access to as much data as possible, not because all data are deemed relevant but because KDD is used to establish which data are relevant. In the vision of AmI, which promotes the Internet of Things based on proactive computing, autonomic computing and multi-agent systems (MAS), profiling is the enabling technology. It provides the only way to distinguish noise from information, and without such discrimination the system’s response would become inadequate, causing inefficiency, ineffectiveness, irritation and the risk of dangerous malfunction. This is the reason that, in the vision of AmI, the environment embodies sensor technologies and RFID systems, all interconnected via wireless machine-to-machine (M2M) communications and online databases. The ubiquitous, pervasive and real-time monitoring of each and every move, change in temperature, sound or whatever else should deliver the content for databases that are continuously updated and mined for significant patterns.

 

The identification of such patterns allows the identification of individual subjects on the basis of, for instance, behavioural biometric profiling (BBP), without necessarily identifying the individual’s name or address. This allows continuous (re)identification of an individual as the same individual – also across different contexts, if the pattern recognition is exchanged – while the whole process falls outside the scope of data protection.
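The following minimal sketch (hypothetical feature values and threshold; nearest-neighbour matching stands in for real biometric matching algorithms) illustrates the mechanism: a behavioural pattern is matched against stored templates keyed by arbitrary internal handles, so that a subject is recognised as the same subject without any name or address entering the process.

```python
# Minimal sketch (hypothetical values): re-identifying a subject as
# 'the same subject' from a behavioural pattern alone. Templates are
# keyed by arbitrary internal handles; no name or address is involved.

import math

# Behavioural templates, e.g. keystroke timing or gait features,
# accumulated from earlier observations of (unnamed) subjects.
templates = {
    "subject-0017": [0.21, 0.80, 0.43],
    "subject-0018": [0.65, 0.12, 0.90],
}

def reidentify(observation: list[float], threshold: float = 0.2) -> str | None:
    """Return the internal handle of the closest template, if close enough."""
    handle, best = min(
        templates.items(),
        key=lambda kv: math.dist(kv[1], observation),  # Euclidean distance
    )
    return handle if math.dist(best, observation) <= threshold else None

# The same person passes by again: the environment links her to her
# earlier observations without ever learning who she 'is'.
print(reidentify([0.20, 0.78, 0.45]))   # -> subject-0017
print(reidentify([0.99, 0.99, 0.99]))   # -> None (unknown subject)
```

The handle ‘subject-0017’ functions as a purely behavioural identifier: the process links observations to each other, not to a civil identity.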

 

Insofar as this is a problem, e.g. because a person is unaware of being categorised and unaware of the way the profile influences the risks and opportunities he is offered, one may need a legal right of access to the profiles that are applied. As will be analysed in FIDIS deliverable 7.9 and in the 4th workplan, such a legal right faces two types of obstructions:

 

  1. the profiles, and/or the database in which they are stored, may be protected by an intellectual property right or fall within the scope of trade secret protection; 

  2. even if these profiles can be accessed, the number of profiles that are continuously constructed, applied and reconstructed can only be assessed by means of M2M communication. This generates the problem of how a human person can learn to access and assess the knowledge available on her own device: how to imagine an HMI (Human-Machine Interface) that provides meaningful information for the individual citizen. 

 

Solutions for these obstructions are beyond the scope of this deliverable, but they will definitely need a joint effort of computer engineers, legal experts and policymakers. 

 

However, one could try to imagine a situation in which individual subjects are monitored, profiled and identified in an AmI environment to the extent needed to adapt the environment, without necessitating any kind of transcontextual identification. In workpackage 3 this type of privacy protection has been developed conceptually by describing the identity of a person in terms of roles and partial identities, which allow a person to disclose only those personal data relevant within a specific context, be it home, work, entertainment, travel, taxation, etc. This approach fits the purpose limitation principle and the requirement of consent in case a data controller wants to transfer personal data to another organisation. The point of departure is a type of identity management based on user control. The technique that limits data exchange to data concerning the relevant partial identity is the use of pseudonyms.
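As a minimal sketch of this pseudonym technique, consider deriving a different, stable pseudonym per context from a single master secret held on the user’s device. The HMAC-based derivation shown here is an illustrative assumption, not a scheme prescribed by FIDIS:

```python
# Minimal sketch (assumed derivation scheme): the user derives a
# *different* pseudonym per context (healthcare, taxation, ...) from one
# master secret she alone holds. Without that secret, the pseudonyms are
# computationally unlinkable across contexts.

import hashlib
import hmac
import secrets

master_secret = secrets.token_bytes(32)  # stays on the user's device

def pseudonym(context: str) -> str:
    """Derive a stable, context-bound pseudonym from the master secret."""
    return hmac.new(master_secret, context.encode(), hashlib.sha256).hexdigest()[:16]

# Each context sees a stable identifier, but the identifiers do not match:
print("healthcare :", pseudonym("healthcare"))
print("taxation   :", pseudonym("taxation"))
print("marketing  :", pseudonym("marketing"))
```

Because the derivation is one-way and keyed, a service provider in one context cannot compute or recognise the pseudonyms used in another context – which is precisely the unlinkability that partial identities require.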

 

In the FIDIS workshop on Ambient Law (26 January 2007, deliverable 7.8) three scenarios were identified concerning Ambient Intelligence:

 

Scenario I is user-centric: the user is empowered in AmI, carrying a device with which to control the environment, for example by determining which data can be exchanged between user and environment. This may be a ‘privacy-friendly’ and perhaps a commercial doom scenario. Key concepts are ‘data minimisation’, ‘contextual integrity’ and ‘partial identities’ (pseudonyms).

 

Scenario II is provider-centric: AmI is controlled by the providers of services (and goods, if there still are goods by then). The environment knows exactly who is where and will interact without the consent, and perhaps without the knowledge, of the user. Data flow freely between users and their devices, service providers, and perhaps third parties as well. This may be a ‘user-friendly’ and perhaps a commercial Valhalla scenario. Key concepts are ‘data optimisation’, ‘networked environment’ and ‘distributed intelligence’ (the intelligence flows from the interconnectivity). 

Scenario III is a mix: acknowledging that hiding data can make the environment less intelligent, while unlimited access to data can make individual citizens vulnerable to undesirable profiling, this scenario aims to achieve some kind of balance by minimising knowledge asymmetry. 

These scenarios will be developed further in the report on Ambient Law (D7.9), which is due at a later point in time. To assess the choices to be made in ID-number policies, however, it seems highly relevant to determine what impact the choice between single and multiple identifiers would have on the feasibility of each scenario. Hereunder we briefly indicate the impact of the use of either single or multiple identifiers within each scenario. 

 

User control

In this scenario the user determines whether and which data she will ‘leak’. This will severely limit the capacity of the networked environment to match existing group profiles with data of the user; and it will also restrict the construction and testing of profiles, because the data from which profiles are inferred are incomplete. Some would claim that this will limit, or even preclude, the intelligence of the environment. We should note that not ‘leaking’ one’s data has an impact on the construction of group profiles, which will in the end be less accurate the more people choose to ‘hide’ their data. Hiding one’s data may thus result in inaccurate group profiles being applied to others, causing them irritation or even harm. This is not an argument against hiding one’s data, as it may equally provide grounds against profiling as such.

In the end this scenario may boil down to profiles not being inferred by the environment but programmed on the basis of a person’s deliberate input. This allows for the use of partial identities and pseudonyms, which, combined with unlinkability beyond the relevant context, would protect privacy in the traditional sense of non-disclosure of information. 

A unique identifier would severely impact this scenario, because it allows governments and/or service providers to link the pseudonyms via this one cross-contextual identifier, enabling profilers to combine the profiles attached to the different pseudonyms of the same person into one rich profile. 

To facilitate user control via pseudonyms and unlinkability, government policy should aim for a multiplicity of identifiers (separate ID numbers for healthcare, taxation, the administration of justice, credit rating and marketing).

 

We note that this preference for multiple identifiers does not only concern the fact that they facilitate unlinkability of personal data between contexts; foremost, they facilitate unlinkability of group profiles per context. Considering the sophistication of profiles in some contexts (e.g. predicting the occurrence of disease), the risk of linking profiles across different contexts goes far beyond the risk of linking personal data.
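The linkage risk can be made concrete with a toy example (all records hypothetical): with one cross-contextual identifier, separately held profiles join trivially into a rich profile, whereas per-context numbers leave no common key to join on.

```python
# Sketch of the linkage risk discussed above (hypothetical records).

# Single-identifier policy: both databases key on the same ID number.
health_db = {"ID-123": {"health_profile": "elevated cardiac risk"}}
credit_db = {"ID-123": {"credit_profile": "low credit score"}}

# A trivial join on the shared key yields one rich profile per person.
rich_profiles = {
    uid: {**health_db[uid], **credit_db[uid]}
    for uid in health_db.keys() & credit_db.keys()
}
print(rich_profiles)  # both profiles now attach to the same person

# Multiple-identifier policy: the same person appears under unrelated
# numbers per context, so the same join yields nothing.
health_db2 = {"H-9f2a": {"health_profile": "elevated cardiac risk"}}
credit_db2 = {"C-77b1": {"credit_profile": "low credit score"}}
print(health_db2.keys() & credit_db2.keys())  # empty set: no linkage
```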

 

The transparent consumer-citizen

If control of information flows rests with the providers, we may presume that data as well as profiles will be sold or kept secret to the extent that this generates profits and/or competitive advantages. The environment will be in constant flux, combining real-time monitoring with autonomic profiling and automatic adaptation. At the same time, government agencies may generate and/or buy profiles to anticipate citizens’ preferences as well as their violations of legal rules.

However, in the case of AmI many of the collected data will not necessarily be linked to a person’s name or address. To achieve an intelligent environment, BBP may be one of the most important enabling technologies, allowing pattern recognition and the identification of a person as the same person without a need to identify the person by name or address. This could mean that even in this full-fledged AmI scenario – absent any substantial user control – a person could enjoy the benefits of customised services without being identified in the traditional sense of a unique identifier like a name or address. It also means that the introduction of a unique identifier in this context could still make consumers and citizens substantially more transparent, by facilitating the linkage of profiles to the ID number and the linking of different profiles to each other via this single ID number. Especially if a unique identifier were to be implemented in both the public and the private sphere, this could easily create a Big Brother watching all of us.

Analysing the violation of ‘privacy in public’, Helen Nissenbaum has described the spread of public surveillance technologies that tend to make people transparent in their public behaviour. In her argument defending the need to protect ‘privacy in public’, Nissenbaum has introduced the concept of ‘privacy as contextual integrity’. Her main aim is to object to a universal definition of privacy that restricts privacy to:

 

  1. limiting surveillance of citizens and use of information about them by government agents;

  2. restricting access to sensitive, personal or private information;

  3. curtailing intrusion into places deemed private or personal.

 

Instead of this a-contextual definition of privacy she advocates a more refined understanding, which takes into account:  

 

  1. norms of the appropriateness of a specific information flow, and

  2. norms of the flow or distribution of information.

 

Her basic point is that what should be considered a violation of privacy depends on the context and the (a)symmetry of the power relations involved. Such contextual determination implies flexibility and a keen eye for detail, but it does not mean that ‘context is all’ in the sense that general rules lose their meaning. Norms of appropriateness and norms of distribution need to be inscribed at the constitutional, legislative, administrative and judicial levels: this would acknowledge the fact that privacy is an underdetermined concept with an open texture, though not an undetermined one, and not open to the extent that it can mean anything.
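Nissenbaum’s two kinds of norms lend themselves to a simple formalisation. The following sketch (contexts, attributes and roles are illustrative assumptions) flags an information flow as a violation of contextual integrity when it breaches either the appropriateness norm or the distribution norm of its context:

```python
# Hedged formalisation sketch of Nissenbaum's two kinds of norms
# (contexts, attributes and roles below are illustrative assumptions).

APPROPRIATENESS = {  # which attributes belong in which context
    "healthcare": {"diagnosis", "medication"},
    "taxation":   {"income", "deductions"},
}
DISTRIBUTION = {     # who may pass information to whom, per context
    "healthcare": {("patient", "physician"), ("physician", "specialist")},
    "taxation":   {("citizen", "tax_office")},
}

def violates_contextual_integrity(context, attribute, sender, recipient) -> bool:
    """A flow violates contextual integrity if it breaches either norm."""
    inappropriate = attribute not in APPROPRIATENESS.get(context, set())
    misdistributed = (sender, recipient) not in DISTRIBUTION.get(context, set())
    return inappropriate or misdistributed

# An appropriate flow within the healthcare context:
print(violates_contextual_integrity("healthcare", "diagnosis", "patient", "physician"))  # False
# A diagnosis flowing to a marketer breaches the distribution norm:
print(violates_contextual_integrity("healthcare", "diagnosis", "physician", "marketer"))  # True
```

Inscribing such norms at the constitutional, legislative, administrative and judicial levels, as argued above, amounts to fixing the two tables for each context.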

 

In the mixed scenario the intelligence of the environment is distributed (as it is in the second scenario), but

 

  1. the flow of information is not unlimited (not every exchange of data or profiles is appropriate), and

  2. the transparency of consumer-citizens is countered by transparency of profiles (the flow of information is reciprocal, generating a fair distribution of knowledge and information).

 

In the case of multiple ID numbers this combination of limited data/profile exchange and reciprocal transparency is supported by the multiplicity of partial identities, as in the first scenario. The difference with the first scenario is that different contexts may be interoperable, depending on the appropriateness of the exchange; the difference with the second scenario is that different contexts do not have random access to the data or profiles generated. A difference with both the first and the second scenario is the reciprocal transparency.

In the case of a unique ID number it would be rather easy, as in the first and second scenarios, to cross-link between contexts. This could make the limitation of data/profile exchange more difficult. On the other hand, it could be fairly easy to gain access to all the data and profiles linked to this one ID number, which could facilitate transparency for a citizen insofar as she has access.

 

Unification or interoperability: single or multiple ID-number policies?

The choice between a single ID number and multiple ID numbers can be understood as the choice between unification and interoperability. A single unique identifier has the capacity to link all data and profiles regarding one person, thus unifying all partial identities into one comprehensive profile. This unification makes transparency of the consumer-citizen easy, and may even make transparency easy for the consumer-citizen, if she can claim and manage access to the data connected with her ID number. In fact, a single ID number could facilitate David Brin’s Transparent Society, discarding old-fashioned ideas like privacy and trusting the benefits of absolute reciprocal transparency.

Multiple-ID-number policies make it possible to discriminate between different contexts, providing tailored ID-number policies depending on which type of privacy is appropriate per context. At the same time the reciprocity or distribution of transparency can be tailored, depending on the need for checks and balances per context. This does not necessarily rule out interoperability between contexts (as the first scenario would), because ID numbers may be linked, e.g. via clearing houses, to provide interoperability (facilitating the third scenario). 
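The clearing-house idea can be sketched as follows (the names and the permitted context pairs are illustrative assumptions): context-specific ID numbers remain unlinkable except through a trusted intermediary that first checks whether the requested cross-context exchange is appropriate.

```python
# Minimal sketch of a clearing house (hypothetical names and rules):
# context-specific ID numbers are linkable *only* through a trusted
# intermediary that checks the appropriateness of the exchange.

LINK_TABLE = {  # held only by the clearing house
    ("healthcare", "H-9f2a"): {"taxation": "T-4c11", "marketing": "M-0d52"},
}
APPROPRIATE = {("healthcare", "taxation")}  # permitted context pairs

def resolve(source_ctx: str, source_id: str, target_ctx: str) -> str | None:
    """Translate an ID into the target context, only if the exchange is appropriate."""
    if (source_ctx, target_ctx) not in APPROPRIATE:
        return None  # exchange refused: the contexts stay unlinkable
    return LINK_TABLE.get((source_ctx, source_id), {}).get(target_ctx)

print(resolve("healthcare", "H-9f2a", "taxation"))   # T-4c11 (appropriate)
print(resolve("healthcare", "H-9f2a", "marketing"))  # None (refused)
```

In this design the unlinkability of the multiple-identifier policy is preserved by default, while interoperability is granted selectively, per context pair, rather than built into a single unifying number.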

 

From the perspective of democracy and the rule of law, interoperability, contextual integrity and multiple identifiers seem preferable. They allow a fine-tuned combination of transparency and opacity tools to be built into the technological infrastructure of AmI, avoiding a kind of unification that makes individual citizens transparent to an unprecedented extent, while also avoiding a type of user control that altogether precludes interoperability initiated by someone other than the user.

 

References

 

Brin 1998 

D. Brin, The Transparent Society. Will Technology Force Us to Choose Between Privacy and Freedom?, Reading, Massachusetts: Perseus Books 1998. 

Custers 2004 

B. Custers, The Power of Knowledge. Ethical, Legal, and Technological Aspects of Data Mining and Group Profiling in Epidemiology, Nijmegen: Wolf Legal Publishers 2004.

Gutwirth & De Hert 2005 

S. Gutwirth & P. De Hert, Privacy and Data Protection in a Democratic Constitutional State. Profiling: Implications for Democracy and Rule of Law, FIDIS deliverable 7.4. Brussels, 2005, available at: www.fidis.net

Hildebrandt 2006 

M. Hildebrandt, ‘From Data to Knowledge: The challenges of a crucial technology’, DuD - Datenschutz und Datensicherheit 30 (2006).

Hildebrandt 2007 

M. Hildebrandt, ‘Defining Profiling: A New Type of Knowledge’, in: Profiling the European Citizen. A Cross-disciplinary Perspective, 2007, under review with Springer.

Hildebrandt & Backhouse 2005 

M. Hildebrandt & J. Backhouse, Descriptive analysis and inventory of profiling practices. Brussels, FIDIS deliverable 7.2, 2005, available at: www.fidis.net

Hildebrandt & Gutwirth 2005 

M. Hildebrandt & S. Gutwirth (Eds.), Implications of profiling practices on democracy and rule of law, Brussels: FIDIS Network of Excellence 2005.

ISTAG 2001 

ISTAG, Scenarios for Ambient Intelligence in 2010, Information Society Technology Advisory Group 2001, available at: http://www.cordis.lu/ist/istag-reports.htm

ITU 2005 

ITU, The Internet of Things, Geneva: International Telecommunication Union (ITU) 2005.

Jiang 2002 

X. Jiang, Safeguard Privacy in Ubiquitous Computing with Decentralized Information Spaces: Bridging the Technical and the Social. Privacy Workshop September 29, 2002, University of California, Berkeley, Berkeley, available at: http://guir.berkeley.edu/pubs/ubicomp2002/privacyworkshop/papers/jiang-privacyworkshop.pdf

Nissenbaum 2004 

H. Nissenbaum,  ‘Privacy as Contextual Integrity’, Washington Law Review 79 (2004), pp. 101-140

Ronger, et al. 2005 

P. H. H. Ronger, et al., ‘A Multi-Agent Approach to Interest Profiling of Users’, in: M. Pěchouček, P. Petta & L. Zsolt Varga (Eds.), Multi-Agent Systems and Applications IV: 4th International Central and Eastern European Conference on Multi-Agent Systems, CEEMAS 2005, Budapest, Hungary, September 15-17, 2005, Proceedings, Berlin/Heidelberg: Springer 2005, vol. 3690, pp. 326-335.

Schreurs & Hildebrandt 2005 

W. Schreurs & M. Hildebrandt, Legal Issues. Report on the Actual and Possible Profiling Techniques in the Field of Ambient Intelligence. W. Schreurs, M. Hildebrandt, M. Gasson and K. Warwick. Brussels, FIDIS deliverable 7.3, 2005, available at www.fidis.net.

Tennenhouse 2000 

D. Tennenhouse, ‘Proactive Computing’,  Communications of the ACM 43 (5) (2000), pp. 43-50

Want, et al. 2003 

R. Want, et al., ’Comparing autonomic and proactive computing’, IBM Systems Journal 42 (1) (2003), pp. 129-136

Zarsky 2002-2003 

T. Zarsky, ‘"Mine Your Own Business!": Making the Case for the Implications of the Data Mining of Personal Information in the Forum of Public Opinion’, Yale Journal of Law & Technology 5 (4) (2002-2003), pp. 17-47.

 

 

 
