
D7.2: Descriptive analysis and inventory of profiling practices

1. Executive Summary  Foreword
2. WORKING DEFINITIONS OF PROFILING: SOME DISTINCTIONS
 3. Descriptive analysis of profiling

 

2. Working definitions of profiling: some distinctions

(VUB, Mireille Hildebrandt; input from all partners via the workshop of 2-3 March 2005)

2.1 How to identify profiling? 

 

In this paragraph we will construct some working definitions, meant to open a discussion rather than to close one. While trying to identify profiling we soon discovered that the term refers to rather different things that do not necessarily share common characteristics, but are related to each other in important ways. Pertinent examples are: (1) the term is used both for the construction and for the application of profiles; (2) group profiles have very different characteristics and a different impact from personalised profiles. Instead of trying to find indisputable characteristics that all meanings of profiling share, we will introduce a number of distinctions to enable a more refined analysis.  

 

To clarify the use of some other terms, the glossary provides working definitions of a small set of terms often used in the context of profiling. We introduce three of them at this point to prevent possible confusion:

 

data subject:

data controller:

(end) user:

 

It is important to notice that the profiled data subject can be a person, an organisation, a thing or other ‘object’. In this deliverable we will focus on the use of profiling technologies to identify and represent the identity of persons and groups. 

 

2.2 Profiling as technique, technology and practice 

 

Profiling can provisionally be described as  

 

  1. the process of constructing or applying a profile of an individual or a group.  

 

This process involves techniques (methods) and technologies (combinations of tangible instruments and techniques; hardware and software). To give an example: identifying someone by means of fingerprints is a technique that requires training. At the same time it is a technology, involving hardware (ink and cards and/or electronic imaging devices). Fingerprinting is a good example because it was practised long before the computer took over, which demonstrates that profiling is not new. Interestingly, DNA profiling has relativised the categorical identification that fingerprint experts used to claim.

 

Apart from being a technique and a technology, profiling is also a practice: a specific way of doing things, within specific contexts, with specific purposes. It requires a learning process that integrates explicit with implicit knowledge. This means that profiling is a matter of expertise, of professional training and involves more than the mechanical application of explicit rules and procedures. Tesco’s Clubcard director, Tom Mason, is quoted as admitting: ‘You have to use intuition and creativity as well as statistical know-how, and you have to hope that you have identified the right things to test’.

 

In chapter 5 examples will be given of applications of profiling practices in the fields of marketing, employment, the financial sector, forensics and e-learning, some of which will be further elaborated in the Appendix.  

 

2.3 Profiles as knowledge construct  

 

When checking a dictionary we find three relevant entries for ‘profile’: (1) an outline; (2) a set of data; and (3) a concise biographical sketch. In the more specific literature on the type of profiles we are addressing here, we find that a profile is considered to be a knowledge construct, representing a subject (a person, a thing, an organisation or whatever). This ‘knowledge’ consists of patterns of correlated data and is often built on data collected over a period of time. When referring to profiles constructed on the basis of profiling technologies we could thus define a profile as ‘a set of correlated data that identify and represent a data subject’. When the data subject is a single person we speak of a personalised profile; when the data subject is a group, a category or a cluster we speak of a group profile. This distinction is central to this document; see also par. 2.6.

 

It should be obvious that a profile is not the same as the thing, group or person that is profiled. Certain salient data enable one to draw a picture, an outline, that represents the ‘original’, always framed from a certain perspective. This is an important point: even if profiles are inferred in real time (think of AmI) and change continuously, they will always remain a reference to an original that cannot be reduced to its profile. In the end, however, it is the profile that will ‘regulate your access to, and participation in, the European Information Society’.

 

This also means that, in the case of a person, the identity constructed during the process of profiling must not be confused with the identity that the person being profiled experiences as her sense of self. The profile follows the logic of the recordable data, with the constraints inherent in computer technology. This issue, the inherently reductive character of a profile, is important because profiles may impact privacy and identity in the strong sense (concerning our sense of self). Since profiles will often affect our lives (providing or prohibiting access, enabling selection, inclusion and exclusion) it is of utmost importance to clarify in what ways and on what basis they affect our lives, without conflating profile and profiled person.

 

The issue is also important because in computer science and information theory it is often the case that ‘an old-fashioned semantic associated network is taken to be the essential structure of all human knowledge’.  Knowledge representation (like an ontology) is often understood as a mirror of reality, disregarding the discursive and/or semiotic nature of both knowledge and our perception of reality.  This can give rise to misunderstandings between the different disciplines that constitute the FIDIS consortium, challenging the partners to take the perspective of other disciplines to build more integrated forms of knowledge.

 

2.4 Automated and hand-made profiles 

 

Within the FIDIS network the main focus will be on automated profiling technologies and practices. The advance of automated profiling is connected with the increase in the size of data sets recorded in databases. Two factors contribute to this increase: the increase in the number of records of data subjects and the increase in the number of fields or attributes describing each data subject. This growth, in turn, has been driven by several factors. The increasing availability of computer systems and software applications, the widespread adoption of the Internet and, in certain fields, the compulsory record-keeping mandated by government regulation mean that data are being produced and warehoused at unprecedented rates. As a result, the typical database has grown by between 9 and 9,999 times its original size during the past 5 years, as illustrated in Table 1:

 

 

Table 1: The typical size of some databases

Types of databases    1999             2004             Growth in size

Transactional         100 gigabytes    1 terabyte       9 times
Data warehouse        1 terabyte       100 terabytes    99 times
Data mart             20 gigabytes     1 terabyte       49 times
Mobile data           100 megabytes    10 gigabytes     99 times
Pervasive data        100 kilobytes    1 gigabyte       9,999 times
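
The figures in the ‘Growth in size’ column are consistent with a relative increase, (2004 size - 1999 size) / 1999 size, computed with decimal units (1 terabyte = 1,000 gigabytes). The table does not state this convention, so the following minimal sketch rests on that assumption; the function and variable names are illustrative only.

```python
# Decimal (SI) unit sizes in bytes. This convention is an assumption:
# the table does not say whether decimal or binary units are meant.
UNITS = {
    "kilobytes": 10**3,
    "megabytes": 10**6,
    "gigabytes": 10**9,
    "terabytes": 10**12,
}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * UNITS[unit]


def growth_times(old_bytes, new_bytes):
    """Relative increase: how many times the old size was added on top of it."""
    return (new_bytes - old_bytes) / old_bytes


# Rows of Table 1 as (type, 1999 size, 2004 size).
TABLE_1 = [
    ("Transactional", (100, "gigabytes"), (1, "terabytes")),
    ("Data warehouse", (1, "terabytes"), (100, "terabytes")),
    ("Data mart", (20, "gigabytes"), (1, "terabytes")),
    ("Mobile data", (100, "megabytes"), (10, "gigabytes")),
    ("Pervasive data", (100, "kilobytes"), (1, "gigabytes")),
]

growth = {
    name: growth_times(to_bytes(*old), to_bytes(*new))
    for name, old, new in TABLE_1
}
```

Under this reading a growth of ‘9 times’ corresponds to a database that is ten times its 1999 size.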

 

 

However, as we all know, long before computers made their way into everyday life, criminal investigators composed profiles of their unknown suspects, psychologists compiled profiles of people with similar personality disorders, marketing managers made profiles of different types of potential customers, and recruiting organisations wrote profiles of successful candidates for specific jobs. These profiles were often hand-made, even if based on established techniques and technologies.  

 

In this deliverable the focus will be on automated profiling technologies, which seem to introduce a new type of knowledge construction. The advance of these computerised profiling technologies does not, however, eliminate manual work. First, some types of profiling can only be custom-made, since they do not involve masses of quantifiable data. Second, even when profiles are generated automatically, both the algorithms that enable the automation and the evaluation of the results will require professional handiwork.

 

2.5 Profiling online and offline behaviour 

 

In the field of automatically generated profiles, we must also distinguish between profiling that takes online behaviour as its data input and profiling that concerns any type of offline behaviour (e.g. using RFID tags) or substance (e.g. using biometrics). The growth of online activities (web surfing, chatting, downloading information, subscribing to email newsletters, buying and selling goods, booking hotels, tickets for travel or theatre, and other transactions) has increased the volume of data sets in databases, as mentioned in the previous paragraph. This concerns information explicitly supplied by web users when applying for access or when concluding transactions on the web, but this is not the only (and probably not the main) source of information. Much data is gathered by tracking online behaviour by means of (third party) cookies that record the online activities of as many web users as possible, legally or illegally; this is discussed further in par. 3.3.2 and 3. With the advance of RFID technologies the tracking of offline behaviour may experience an upsurge, mirroring the possibilities of online tracking (scanner data, customer loyalty cards, transaction data of credit cards etc.); see e.g. par. 5.1.1. At the same time physical and anatomical biometrics seem to enable profiling the hardware of human beings to a previously undreamt-of extent, as discussed in par. 3.3.4 and 5.4.

 

2.6 Group-profiling and personalised profiling: identification and representation 

 

Profiles can be seen as knowledge constructs that represent and identify a data subject. Identification in this case does not mean that profiles should be reduced to tools of individuation. Unlike a simple id-token, such as an email address or an attributed social security number, group profiles consist of correlated data that describe a person or group as a certain type of person or group, sharing a certain mix of static and/or dynamic attributes with others and thus belonging to the same group or category, even if not all attributes are shared by all members (see par. 3.2.4). In the case of personalised profiling, building on the processing of biometric or behavioural attributes of one person, a rich, sophisticated representation of a particular person can be constructed.

 

2.6.1 Group-profiling 

When looking at the definitions of identification compiled during the first phase of FIDIS workpackage 2 (the identity of identity), it seems that these definitions are focused on individualising a person or, in other words, disclosing which set of attributes uniquely characterises a person. Identification thus seems based on the difference between one person and everybody else. Even though group profiling is an instrument of identification, it adds another meaning. Instead of discriminating a person from all other persons, group profiling seems to focus on identifying a person with (as part of) a certain group of persons. Identification of a person as belonging to a group could then be defined as:

 

the process of establishing that a subject is an element of a specific set of subjects, by means of the set of correlated attributes that defines the group  

 

However, this seems to presume that the attributes that constitute the group are all shared by every member of the group, which is often not the case. As will be described in paragraph 3.2.4, this is only the case when dealing with a distributive profile, which means that every attribute that defines the group is shared by each member of the group. In most cases we are dealing with non-distributive profiles, which means that the attributes that define the group are not all shared by all members of the group. As should be clear, the application of a non-distributive group profile to a member of the related group can give rise to problems if the non-distributive character of the profile is not taken into consideration.
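
The difference between distributive and non-distributive profiles can be illustrated with a minimal sketch. It assumes, purely for illustration, that a group profile is a mapping from attributes to values and that each member is a record of attributes; the data, attribute names and function names are hypothetical and not taken from any profiling system.

```python
def has_attribute(member, attribute, value):
    """True if this member's record carries the given attribute value."""
    return member.get(attribute) == value


def is_distributive(profile, members):
    """A profile is distributive if every member shares every attribute in it."""
    return all(
        has_attribute(member, attribute, value)
        for member in members
        for attribute, value in profile.items()
    )


def share_per_attribute(profile, members):
    """Fraction of members that share each attribute of the profile."""
    return {
        attribute: sum(has_attribute(m, attribute, value) for m in members)
        / len(members)
        for attribute, value in profile.items()
    }


# Hypothetical group of three members.
members = [
    {"eye_colour": "blue", "smoker": True},
    {"eye_colour": "blue", "smoker": False},
    {"eye_colour": "blue", "smoker": True},
]

# Every member is blue-eyed, so this profile is distributive.
profile_a = {"eye_colour": "blue"}

# Only two of the three members smoke, so this profile is non-distributive.
profile_b = {"eye_colour": "blue", "smoker": True}
```

Applying `profile_b` to the second member would wrongly attribute smoking to her, which is the kind of problem that arises when the non-distributive character of a profile is ignored.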

 

If we link the idea of a profile, as described in par. 2.3 above, to identification technologies, a working definition of a group profile could be:  

 

  1. a group profile is a set of correlated data that identifies a group, and/or when applied identifies a person as a member of a group 

 

Interestingly this group can be a set of people that consider themselves a group, having some kind of interaction as such, but it might also be that a person is identified as belonging to the group of blue-eyed people, or the group of people with an increased risk of developing breast cancer. In that case the term ‘group’ means something entirely different. If we follow Custers on this issue, social science research has traditionally focused on groups that consider themselves a group, while profiling based on data-mining produces groups in the other sense. Group profiling in this sense is a form of categorisation and could be said to create a profile of an abstract person that does not necessarily apply to any particular person.

 

Group profiling will be discussed in par. 3.2. 

 

2.6.2 Personalised profiling 

Group profiling must be distinguished from the construction of profiles of a single person, for instance on the basis of sets of transactions or other data relating to one person. Most user modelling and biometric profiling fall into this category. Personalised profiling is highly relevant for the design of Ambient Intelligence (AmI) applications. Personalised and group profiling can easily be combined: (1) the use of personalised profiles can be combined with the use of group profiles if a person is estimated to belong to the relevant group; (2) databases containing personalised profiles can be mined to construct group profiles.

 

Personalised profiling will be discussed in par. 3.3. 

 

2.7 Construction and application of profiles 

 

Another important distinction that should be made is between the process of constructing a profile, for instance by means of data-mining technologies, and the process of applying a profile: using a profile to identify a person as a specific individual (individuation) and/or as a member of a specific group or category (categorisation). Notwithstanding the importance of this distinction, in practice advanced profiling technologies combine construction and application of profiles: while identifying a data subject these technologies adjust the profile, thus continuously applying and reconstructing profiles – often in real time. 
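
The distinction between construction and application, and the way the two can be combined, can be sketched as follows. This is a deliberately simplified illustration and not a description of any actual data-mining technology: the thresholds, the record format and all function names are assumptions made for the example.

```python
from collections import Counter


def construct_profile(records, min_share=0.6):
    """Construction: keep each (attribute, value) pair that occurs in at
    least min_share of the records."""
    counts = Counter(
        (attribute, value)
        for record in records
        for attribute, value in record.items()
    )
    return {pair for pair, n in counts.items() if n / len(records) >= min_share}


def apply_profile(profile, subject, min_match=0.5):
    """Application: categorise the subject as a member of the group if it
    shares at least min_match of the pairs in the profile."""
    if not profile:
        return False
    shared = sum(1 for attribute, value in profile if subject.get(attribute) == value)
    return shared / len(profile) >= min_match


def apply_and_reconstruct(records, subject):
    """Combined use: apply the current profile and, if the subject is
    categorised as a member, rebuild the profile with the subject included."""
    profile = construct_profile(records)
    is_member = apply_profile(profile, subject)
    if is_member:
        profile = construct_profile(records + [subject])
    return is_member, profile


# Hypothetical records of three customers and one new data subject.
records = [
    {"buys_online": True, "age_band": "30-39"},
    {"buys_online": True, "age_band": "30-39"},
    {"buys_online": True, "age_band": "40-49"},
]
subject = {"buys_online": True, "age_band": "40-49"}

initial_profile = construct_profile(records)
is_member, updated_profile = apply_and_reconstruct(records, subject)
```

In this example the new data subject is categorised as a member on the basis of one shared attribute, and her inclusion in turn removes the age band from the reconstructed profile: applying the profile changes the profile.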

 

Related distinctions concern bottom-up and top-down searches of databases, and the correlating and monitoring of behavioural data. Par. 3.2.3, on data mining, will elaborate on this.

 

 

2.8 Conclusion: profiling and identification 

 

If we take the working definition of par. 2.6 we can define profiling as:  

 

  1. the process of constructing profiles (correlated data), that identify and represent either a person or a group/category/cluster,  

  2. and/or the application of profiles (correlated data) to identify and represent a person as a specific person or as a member of a specific group/category/cluster.

 

To understand the meaning of profiling, we should add the purpose of profiling. Rather than individuation, profiling seems to aim at risk assessment and/or the assessment of opportunities of data subjects. This, however, cannot be taken for granted. If the interests of data user and data subject differ, it may well be that the interests of the data controller, who pays for the whole process, will take precedence. Thus, in the end, what counts are the risks and opportunities for the data users. For this reason the purpose of profiling can best be formulated as:

 

  1. aiming at the assessment of risks and/or opportunities for the data user (inferred from risks and opportunities concerning the data subject). 

 

 

 

In par. 4 the purposes will be further elaborated and analysed. First, in par. 3 we will analyse the processes of group and personalised profiling. 

 

 

 
