

Case study: international data collection for orthopedic evaluative research

A second example of an anonymous e-health database is the Memdoc project, initiated by the Institute for Evaluative Research of the University of Bern and presented in [Roeder et al. 2004]. The goal of this project is to build very large registries of orthopedic cases. The pharmaceutical industry has well-known protocols for testing new medicines: first on animals, then on healthy volunteers and finally on sick volunteers. No such possibility exists in orthopedic surgery. Once a hip has been implanted in a patient, we need to wait and see how long it lasts. The goal of a registry is to follow one type of implant (for instance hips) on a large scale (if possible for all patients) in a country (or Europe-wide). The statistical results are used to measure the efficiency of a particular implant type or technique, and to compare physicians or clinics against a benchmark (in order to reduce costs or increase efficiency).

The main difference between this case and the previous one is its international aspect. This has two consequences. First, since nothing can be assumed regarding the IT infrastructure of the hospitals, data is transferred using a web interface. Second, since some European countries restrict the use and export of personal data, there cannot be a central database containing all personal and medical data. The project therefore uses a two-layer architecture, combining a central database and decentralized modules. There is only one central database for all the medical data, in order to have large datasets for statistics, but this database has to be anonymous. To anonymize the data, the project uses decentralized web servers called “modules”. The central server collects all medical data, while the modules store the personal data.

When a physician takes part in a registry, he has to submit his cases to the database. Each time he wants to report something, he connects with a web browser to the module corresponding to his country. He fills out a form containing the personal data of the patient, for instance names, date of birth, address or social security number. This information is stored on the module. The module then creates a new patient on the central server, linked to the personal information by a random number. Finally, the physician fills out forms reporting medical information, which is stored on the central server.
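The submission flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Memdoc implementation: the class names `Module` and `CentralServer`, their methods, the in-memory dictionaries and the sample patient are all hypothetical. The sketch shows the one property the text relies on: personal data stays on the module, medical data goes to the central server, and the only link between the two is a random number.

```python
import secrets


class CentralServer:
    """Hypothetical central server: stores medical data only."""

    def __init__(self):
        # random link number -> list of medical forms
        self.medical_records = {}

    def create_patient(self, link_number):
        self.medical_records[link_number] = []

    def add_medical_form(self, link_number, form):
        self.medical_records[link_number].append(form)


class Module:
    """Hypothetical national module: stores personal data only."""

    def __init__(self, server):
        self.server = server
        # random link number -> personal data of the patient
        self.personal_data = {}

    def register_patient(self, personal):
        # The link is random, so it reveals nothing about the identity.
        link_number = secrets.token_hex(16)
        self.personal_data[link_number] = personal
        self.server.create_patient(link_number)
        return link_number


server = CentralServer()
module = Module(server)

# Step 1: personal data is entered on the module (sample data, invented).
link = module.register_patient(
    {"first_name": "Anna", "last_name": "Muster", "date_of_birth": "1950-04-12"}
)

# Step 2: medical data is reported to the central server under the link.
server.add_medical_form(link, {"implant": "hip", "side": "left"})
```

In this sketch the central server never sees a name or a date of birth; someone holding only `server.medical_records` would need the module's table to re-identify a patient.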

This architecture cleanly separates medical data, which can be used for statistical purposes, from personal data, which is stored on the module and only used to search among one’s own patients. A patient belongs to one single clinic, which means that no physician outside this clinic can access any data of this person. If someone moves (or simply changes clinics), he will have two records on the module, corresponding to two different patients on the server.

The goal of the application is to follow orthopedic implants over a long period. We cannot lose track of patients simply because they change clinics. We need a solution to track such “moving patients”.

  1. The first, easy solution is to allow any physician with access to the module to see any patient. This solution cannot be implemented for evident privacy reasons: anybody could access the data of anyone else.

  2. The second possibility is to check, whenever a new patient is created, whether a patient with a similar identity already exists in the module, and if so to reuse that patient. This solution is not suitable, because any physician could access all the files stored on the server by forging new empty patients to read data already entered by colleagues.

  3. Finally, the chosen solution relies on a one-way hash function (e.g., SHA-2), denoted h(x). For each module, we define a set of identifying attributes. This can be first and last names plus date of birth, first and maiden names plus date of birth, or the social security number of the patient; let us denote this information by ID. Moreover, each module generates a unique key denoted by K. Each patient then receives a code generated by the module as h(ID+K). This code is transferred to the central server and added to the record of the patient. The code stays the same for this patient even if he changes clinics. It is used by the statisticians on the central server, but cannot be used by a physician to access the files of a colleague.
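A minimal sketch of the code h(ID+K) from the third solution, assuming SHA-256 as the SHA-2 variant and plain string concatenation for "+". The function names, the separator, the normalization step (trimming and lower-casing the attributes) and the sample data are assumptions made for illustration; the source does not specify them. A keyed construction such as HMAC (available in Python as the `hmac` module) would be a more conventional way to combine a key with a message, but the sketch follows the formula as written.

```python
import hashlib
import secrets


def make_module_key():
    """Generate the secret key K of a module (assumed: 256 random bits)."""
    return secrets.token_hex(32)


def patient_code(identifying_attributes, module_key):
    """Return h(ID + K) as a hexadecimal string.

    identifying_attributes: the attributes chosen for this module, e.g.
    first name, last name and date of birth.
    """
    # Assumption: normalize the attributes so that differences in spacing
    # or capitalization between clinics do not change the code.
    parts = [str(a).strip().lower() for a in identifying_attributes]
    identity = "|".join(parts)
    data = (identity + module_key).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


K = make_module_key()

# The same patient registered at two clinics of the same module
# receives the same code, so the central server can link the records.
code_clinic_a = patient_code(["Anna", "Muster", "1950-04-12"], K)
code_clinic_b = patient_code([" anna", "MUSTER ", "1950-04-12"], K)

# A different patient receives a different code.
code_other = patient_code(["Anna", "Muster", "1950-04-13"], K)

# Another module, with its own key, produces an unrelated code.
code_other_module = patient_code(["Anna", "Muster", "1950-04-12"], make_module_key())
```

Because K never leaves the module, a party that knows a patient's name and date of birth but not K cannot recompute the code and look it up on the central server. This is the role the key plays in the scheme: without it, the small space of names and birth dates could be hashed exhaustively.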

Such an architecture makes it possible to store personal data locally and to centralize all the medical data. A patient can be followed throughout his life. This helps orthopedic research without harming the privacy of patients. In the light of the central question of this deliverable, namely “Is the identity knowledge in the government growing through the development of e-government” (cf. section 1), this approach guarantees in particular that no identifying information is transferred between countries. The data collected in the central database does not help the government to get more information about an individual.

