FIDIS Deliverable D5.4: Anonymity in electronic government: a case-study analysis of governments' identity knowledge


Ingredients for anonymization techniques

Different techniques are available to anonymize data. The most effective one is to destroy the identifying parts of the data. However, for the applications sketched above, this is not possible, since some linkability has to remain: records belonging to the same person must still be recognizable as such.

The main ingredients for the anonymization of data [BFS 1997, Jaquet-Chiffelle & Jeanneret 2001] come from the field of cryptography. We sketch them here without going into implementation or parameterization details, which can be found, for example, in [Bauer 2000, Stallings 1998]. The core idea is to compute a portable pseudonym that efficiently protects the true identity of the patient.
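As an illustration of what a calculated pseudonym can look like, here is a minimal Python sketch. It is an assumption made for illustration, not the protocol of this deliverable: it computes a keyed hash (HMAC-SHA-256, combining the hash-function and secret-key ingredients listed below) over hypothetical, normalized identifying fields. The function name `pseudonym`, the choice of fields and the key value are all hypothetical. The same patient always yields the same pseudonym, so records remain linkable, while the pseudonym cannot practically be inverted or recomputed from a guessed identity without the secret key.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, first_name: str, last_name: str, birth_date: str) -> str:
    """Return a hex pseudonym for one patient (hypothetical field choice).

    Fields are stripped and upper-cased so that trivial variants
    (case, surrounding blanks) yield the same pseudonym.
    """
    normalized = "|".join(
        field.strip().upper() for field in (first_name, last_name, birth_date)
    )
    # HMAC-SHA-256: a hash-function keyed with a secret, so the pseudonym
    # cannot be recomputed from a guessed identity without the key.
    tag = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return tag.hexdigest()


key = b"hypothetical-shared-secret"  # placeholder, not a real key-management scheme
p1 = pseudonym(key, "Anna", "Muster", "1970-01-01")
p2 = pseudonym(key, " anna ", "MUSTER", "1970-01-01")
p3 = pseudonym(key, "Anna", "Muster", "1970-01-02")
print(p1 == p2)  # True: same patient, same pseudonym, so records stay linkable
print(p1 == p3)  # False: different identifying data
```

A plain, unkeyed hash of such low-entropy fields could be reversed by simply hashing every plausible name and birth date; the secret key is what prevents this dictionary attack in the sketch.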

  1. One-way hash-functions 

A hash-function takes as input any bit-string (or character string) and produces a bit-string of fixed, predefined length. The goal is to produce a “fingerprint”, or hash-code, of the input with characteristics analogous to a human fingerprint: two different persons (usually) have different fingerprints.

Hash-functions must be deterministic: the same input always produces the same hash-code, so two different hash-codes necessarily come from two different input strings. However, a hash-function does not guarantee the converse: two different strings may have the same hash-code. Hash-functions must also be computationally easy, i.e., fast to compute. A typical application in computer science is the hash table, where the hash-code determines the slot into which the data is put.
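These properties can be sketched in Python with the standard `hashlib` module, using SHA-256 (a member of the SHA-2 family) as the example hash-function. The helper names `hash_code`, `slot` and `lookup` and the 8-slot table are illustrative assumptions. Because different keys may share a slot, each slot holds a list of entries.

```python
import hashlib


def hash_code(text: str) -> str:
    """SHA-256 hash-code of a string, as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Deterministic: the same input always yields the same hash-code.
print(hash_code("patient record") == hash_code("patient record"))  # True
# Fixed output length, whatever the input length.
print(len(hash_code("a")), len(hash_code("a" * 10_000)))  # 64 64


def slot(key: str, n_slots: int) -> int:
    """Hash-table slot: the hash-code reduced modulo the number of slots."""
    return int(hash_code(key), 16) % n_slots


# Several keys may land in the same slot, so each slot holds a list (chaining).
table = [[] for _ in range(8)]
for name, value in [("alice", 1), ("bob", 2), ("carol", 3)]:
    table[slot(name, len(table))].append((name, value))


def lookup(table, key):
    """Search only the slot the key hashes to."""
    for stored_key, value in table[slot(key, len(table))]:
        if stored_key == key:
            return value
    return None


print(lookup(table, "bob"))  # 2
```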

In addition, the one-way property requires that there be no practically feasible way to construct an input that produces a given hash-code. A cryptographic hash-function must moreover be collision-resistant: it must be infeasible, within a reasonable amount of time, to find a collision, i.e., two different inputs with the same hash-code. These properties are what make such functions cryptographically meaningful.

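Typical examples: SHA-1, SHA-2, MD5. Practical collisions have since been found for MD5 and SHA-1, so SHA-2 should be preferred wherever collision resistance matters.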
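To illustrate why collision resistance depends on the length of the hash-code, the following sketch truncates SHA-256 to 24 bits and runs a generic birthday search. The truncation and the helper names are illustrative assumptions, not something a real system would do. With an n-bit hash-code, a collision typically appears after roughly 2^(n/2) attempts, i.e., a few thousand for 24 bits; for the full 256-bit SHA-256 the same search would need on the order of 2^128 attempts, which is infeasible.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bits: int) -> int:
    """First n_bits of the SHA-256 digest, as an integer (illustration only)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def find_collision(n_bits: int):
    """Birthday search: hash distinct inputs until two share a truncated hash-code.

    Returns the two colliding inputs and the number of hashes computed.
    """
    seen = {}  # truncated hash-code -> input that produced it
    for i in count():
        data = str(i).encode()  # inputs "0", "1", "2", ... are all distinct
        h = truncated_hash(data, n_bits)
        if h in seen:
            return seen[h], data, i + 1
        seen[h] = data


a, b, tries = find_collision(24)
print(a, b, tries)  # two different inputs with the same 24-bit hash-code
```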

  2. Symmetric cryptography 

Also known as secret-key cryptography. In symmetric cryptography, the same key is used for both encryption and decryption, and very fast algorithms exist for both operations. The sender and the receiver must know the very same key; they share a common secret. Generating such a key is computationally trivial given a good source of random numbers, but securely transporting the key to the other party is a major issue.

Typical examples: AES (Advanced Encryption Standard), IDEA, Triple-DES
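The Python standard library does not include AES, so the following sketch uses a different, much simpler symmetric technique to show the shared-key principle: a one-time-pad-style XOR cipher. The function names are hypothetical. For real data, a vetted AES implementation from an established cryptographic library would be used instead. Note how the sketch makes the key-transport problem explicit: the key is random, as long as the message, must never be reused, and must somehow reach the receiver secretly.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Random key as long as the message, from a cryptographically secure source."""
    return secrets.token_bytes(length)


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the key; the same function encrypts and decrypts."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the message length")
    return bytes(k ^ d for k, d in zip(key, data))


message = b"blood pressure: 120/80"
key = generate_key(len(message))          # shared secret, must be transported safely
ciphertext = xor_cipher(key, message)     # sender encrypts
plaintext = xor_cipher(key, ciphertext)   # receiver decrypts with the SAME key
print(plaintext == message)               # True
```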

  3. Public-key cryptography 

Also known as asymmetric cryptography. In contrast to symmetric cryptography, public-key algorithms use two different keys: one for encryption and another one for decryption. This allows a user to distribute his public key freely, so that anyone can send him an encrypted message, while he keeps secret the private key that allows him to decrypt it. Public-key cryptography also makes it possible to create digital signatures. Its disadvantages are the computationally expensive key generation and, compared with symmetric encryption, the much slower encryption and decryption.

Typical example: RSA. 
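To make the asymmetry concrete, here is a textbook-RSA sketch in Python with the tiny primes 61 and 53, a common classroom example. Keys this small and the unpadded operations are insecure: real RSA uses keys of thousands of bits, padding schemes such as OAEP for encryption and PSS for signatures, and signs a hash of the message rather than the message itself. The function names are illustrative, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
from math import gcd

# Textbook RSA with tiny primes: illustration only, trivially breakable.
p, q = 61, 53
n = p * q                 # public modulus: 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)       # private exponent, modular inverse of e: 2753

public_key = (n, e)       # may be distributed freely
private_key = (n, d)      # kept secret by the receiver


def encrypt(pub, m: int) -> int:
    modulus, exponent = pub
    return pow(m, exponent, modulus)


def decrypt(priv, c: int) -> int:
    modulus, exponent = priv
    return pow(c, exponent, modulus)


def sign(priv, m: int) -> int:
    """Signature: apply the private key to the message."""
    modulus, exponent = priv
    return pow(m, exponent, modulus)


def verify(pub, m: int, s: int) -> bool:
    """Anyone holding the public key can check the signature."""
    modulus, exponent = pub
    return pow(s, exponent, modulus) == m


m = 65                              # message encoded as an integer smaller than n
c = encrypt(public_key, m)
print(c, decrypt(private_key, c))   # 2790 65
s = sign(private_key, m)
print(verify(public_key, m, s))     # True
print(verify(public_key, 66, s))    # False
```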

These ingredients are used for the protocol described in the next section. 

