D3.8: Study on protocols with respect to identity and identification – an insight on network protocols and privacy-aware communication

Application layer protocols

The protocols at the application layer are familiar to many people, at least by name. HTTP is one of the most popular protocols. It is used when surfing the web, indicated by the prefix “http” in front of web page addresses. Also well known are the SMTP and POP3 protocols, which are used for e-mail access, and of course FTP, which is a file transfer protocol.

DNS is one of the backbone protocols of the modern Internet. It provides mapping between easy-to-remember textual host names like www.fidis.net and not-that-easy-to-remember IP addresses like 80.237.131.150. The importance of the real-time transport protocol RTP is growing steadily, especially now with Voice over IP and similar applications.

HTTP

Functional description

HTTP stands for “HyperText Transfer Protocol”. It is a method used to transfer or convey information on the World Wide Web. Its original purpose was to provide a way to publish and retrieve HTML pages (Wikipedia: Hypertext Transfer Protocol 2007).

The HTTP protocol is standardised by the Internet Engineering Task Force (IETF); the most popular version, HTTP/1.1, is defined in RFC 2616. HTTP is a request/response protocol used between clients (also called user agents) and servers. A client is an entity sending a request to a server. The server answers with a response. The client requests information (or data) from the server by initialising a connection, normally a TCP connection. The server processes the request and answers with a status message (e.g., “HTTP/1.1 200 OK”) and the requested data. Resources which can be accessed by a client are identified by so-called Uniform Resource Identifiers (URIs). A well-known subset of URIs are URLs (Uniform Resource Locators), which are used for HTML documents, for example.

There are eight request methods defined for HTTP, e.g., GET, POST, HEAD, DELETE. The communication between client and server is unencrypted. The protocol information is transported via human-understandable header fields and header values. Here, “human-understandable” means that a human can easily conclude which protocol information is exchanged just by looking at the raw protocol data.
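
The text-based, human-readable nature of the protocol can be illustrated with a minimal Python sketch. The helper names (build_request, parse_status_line) and the host www.example.org are assumptions made for illustration only; no network connection is opened.

```python
def build_request(method, path, host, headers=None):
    """Return the raw text of an HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(raw_response):
    """Split the first response line into (version, status code, reason)."""
    status_line = raw_response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# Hypothetical request and response, written out exactly as they would
# appear on the wire: anyone who can read the bytes can read the headers.
request = build_request("GET", "/index.html", "www.example.org",
                        {"Accept-Language": "de"})
response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
```

Printing the request string shows the method, the resource and every header field in clear text, which is what “human-understandable” means here.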

Identifiers and their uniqueness

HTTP is a stateless protocol. This means that each request and response is independent of any former or later requests or responses. For example, a server will process two subsequent requests by a client for two images which are part of the same HTML page independently of each other. Likewise, the responses by the server are independent. Because of this stateless approach (e.g., no “session” information needs to be stored to link different requests), the very basic HTTP protocol itself contains only the URL as an identifying property. However, there are several ways to integrate additional identifying information:

Cookies
If not explicitly forbidden by the user, any web browser allows web servers to set cookies. Cookies are small text files containing a limited amount of data. They are stored on the client machine of the user. If the user revisits a web site which set a cookie beforehand, that cookie will automatically be transferred to the web server.
To track and identify users, the server can store a unique ID in a cookie. By means of that ID, a server can recognise a user and trace his activities. Because the cookie can store only a small amount of data, the whole trace and related profiling information would be stored in a database at the web site. Normally this tracing would only work within the domain of the web site, as cookies are only sent back to the server which set them. To enable cross-site tracing, advertisement companies such as DoubleClick Inc. (now acquired by Google) send a cookie together with their web advertisements (e.g., pictures), which are displayed on many internet sites. Since these advertisement objects are all served from the same server (even though the site being viewed may be different), the server can read the ID stored within the cookie. As the advertisement company knows which advertisement is shown on which web site, it can trace the surfing behaviour (also known as “clickstream”) of a user even across multiple web sites.
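
The cross-site tracing described above can be sketched as a small simulation. The class AdServer, its method names and the example site URLs are hypothetical; a real advertisement server would receive the ID in a Cookie header and the embedding site in the Referer header.

```python
import uuid
from collections import defaultdict


class AdServer:
    """Toy third-party ad server that recognises browsers by a cookie ID."""

    def __init__(self):
        self.clickstreams = defaultdict(list)  # cookie ID -> embedding sites

    def serve_ad(self, cookie_id, referer):
        # First contact: the browser has no cookie yet, so assign a fresh ID.
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        # The Referer tells the ad server which site embedded the ad.
        self.clickstreams[cookie_id].append(referer)
        return cookie_id  # would be returned to the browser via Set-Cookie


ads = AdServer()
cookie = ads.serve_ad(None, "http://news.example/")
cookie = ads.serve_ad(cookie, "http://shop.example/")
cookie = ads.serve_ad(cookie, "http://health.example/")
```

After the three page views, the single cookie ID is linked to all three sites, although none of them shares a domain with the others.
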
Session IDs

Most servers support URL rewrite, which can be used to append a parameter to each link on web pages. Whenever the user clicks on such a link, the parameter is transmitted along with the requested resource to the server, thus allowing session management. The parameter often has the following form:

http://www.someURL.org/text.html?sid=sdf3s3rf39asdlv974

The last part of the URL, i.e., sid=sdf3s3rf39asdlv974, is the session ID identifying the current session.
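
As a minimal sketch, the session ID can be read from and appended to a URL with Python's standard urllib.parse module. The function names and the default parameter name “sid” are assumptions taken from the example URL above, not a fixed convention.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def extract_session_id(url, param="sid"):
    """Return the session ID carried in the query string, or None."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


def rewrite_link(url, session_id, param="sid"):
    """Append (or overwrite) the session parameter, as URL rewriting does."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query[param] = [session_id]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


sid = extract_session_id("http://www.someURL.org/text.html?sid=sdf3s3rf39asdlv974")
next_link = rewrite_link("http://www.someURL.org/other.html", sid)
```

Because the ID travels in the URL, it also ends up in browser histories, server logs and Referer headers, which makes this form of session tracking particularly visible to third parties.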

Hidden form fields

Forms on web pages can contain hidden fields which are by default not visible to the user. The hidden fields usually cannot be edited from within the browser and contain additional data for the server, like a session ID. This data is transmitted with the rest of the form whenever this form is submitted by the user.
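
Hidden fields are invisible in the rendered page but plainly present in the HTML source. The following sketch uses Python's standard html.parser to list them; the form markup and the class name HiddenFieldParser are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        field_type = (attributes.get("type") or "").lower()
        if tag == "input" and field_type == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")


# Hypothetical order form carrying a session ID in a hidden field.
form_html = """
<form action="/order" method="post">
  <input type="text" name="quantity" value="1">
  <input type="hidden" name="sid" value="sdf3s3rf39asdlv974">
</form>
"""
parser = HiddenFieldParser()
parser.feed(form_html)
```

After parsing, parser.hidden contains only the “sid” field; the visible “quantity” field is ignored.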

This list only contains some examples of how identifying information can be added to the HTTP protocol to allow tracking and tracing of users. Many more possibilities exist, especially when used in conjunction with the web (e.g., HTML documents).

Personal data

HTTP does, to some extent, contain personal data. Header fields like Referer, User-Agent, Accept and Accept-Language might contain information which reveals personal data.

Table 3: HTTP fields which contain personal data

Table 3 shows that HTTP headers contain personal data. Weak though this may seem, fields like the Referer can contain quite sensitive data. The User-Agent can also reveal interesting information: if the operating system of the user is “Linux”, it might identify him as technically interested; if the version of the browser he uses is the latest available version (or even a “test” version), it might identify him as belonging to the group of “early adopters”, etc. The preferred language of the user, as transmitted by the Accept-Language header field, will of course also reveal personal information (like cultural background).

Linkability: identifiability and profiling

The server can identify the user within a session by using cookies, session IDs or hidden form fields. But to map the available session ID to real user data (like the user name, address, buying habits etc.), the user has to provide some personal data to the server, first. This could happen when a user registers an account and logs in later.

Although unlikely to be reliable, identification could also take place by using a combination of available HTTP header fields like User-Agent, Accept and Accept-Language.
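
How such a combination of header fields could serve as a weak identifier is sketched below. The function name, the choice of SHA-256 and the truncation to 16 hex characters are assumptions for illustration; the header values are invented.

```python
import hashlib

# Header fields named in the text as potentially identifying in combination.
FINGERPRINT_HEADERS = ("User-Agent", "Accept", "Accept-Language")


def header_fingerprint(headers):
    """Derive a short pseudo-identifier from selected request headers."""
    material = "\n".join(headers.get(name, "") for name in FINGERPRINT_HEADERS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


browser_a = {
    "User-Agent": "ExampleBrowser/1.0 (Linux)",
    "Accept": "text/html",
    "Accept-Language": "de",
}
# Same browser build, but a different preferred language.
browser_b = {**browser_a, "Accept-Language": "en"}
```

Two clients with identical header values produce the same fingerprint, which is why this kind of identification is unreliable on its own: it only separates users whose configurations differ.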

Avoidance or circumvention of information disclosure

The session-spanning, reliable identification by cookies or session IDs can be avoided if the user does not give any data to the server. If the user does provide data, for example by opening an account at an online store, giving away personal data like address and credit card information and logging in later with the provided data, the server can identify the user. Other than this, identification by the HTTP protocol alone is not likely, as described above.

It has to be noted though that long-living cookies can weaken the privacy of a user. Remember the example of the advertisement companies like DoubleClick or Google, which have a widespread network and which can collect information from many sources. Cookies created by companies like these with a lot of accessible data can lead to the identification of single users. To avoid this, cookies should be deleted each time the browser is closed. In this way, the lifetime of cookies is reduced severely, which makes it harder to identify a user.

In order to enhance privacy, user agents like browsers should only provide what is strictly required in the header fields. For example, the User-Agent field is usually not needed for surfing the web. Furthermore, the Referer should be disabled whenever possible.

Many software implementations exist, both commercial and non-commercial, which help the user to enhance his privacy while surfing the Web. One example of such filtering software is called “Privoxy”. “Privoxy is a web proxy with advanced filtering capabilities for protecting privacy, modifying web page data, managing cookies, controlling access, and removing ads, banners, pop-ups and other obnoxious Internet junk.”

FTP

Functional description

FTP is used to transfer data (files) between a client and a server. Clients connect to an FTP server in order to manipulate files: uploading or downloading them, renaming or deleting them, etc. The FTP protocol runs exclusively over TCP; UDP is not supported. This makes sense, since UDP does not guarantee faultless transmission of data. Commands from the client to the server are sent over one connection to a certain port; data is sent via another connection. The control connection is idle while data is transferred. The objectives of FTP, as outlined by its RFC 959, are described as follows (Wikipedia: File Transfer Protocol 2007):

To promote sharing of files (computer software and/or data).
To encourage indirect or implicit use of remote computers.
To shield a user from variations in file storage systems among different hosts and platforms.
To transfer data reliably, and efficiently.

Some well-known weaknesses of the FTP protocol are:

The user name and password are sent in clear text from the client to the server.
Multiple connections are used.
A relatively high number of commands is needed to initiate file transfer, thus leading to a high latency.
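
The first weakness can be made concrete with a short sketch of the bytes that travel over the control connection during log-in. USER and PASS are the log-in commands defined in RFC 959; the function name and the credentials are hypothetical, and a real client would wait for the server's reply between the two commands.

```python
def ftp_login_commands(user, password):
    """Return the control-connection bytes an FTP client sends to log in."""
    # Both commands are plain ASCII lines terminated by CRLF.
    return f"USER {user}\r\nPASS {password}\r\n".encode("ascii")


wire_bytes = ftp_login_commands("alice", "s3cret")

# An eavesdropper only has to read the captured bytes to recover the password.
sniffed_password = wire_bytes.decode("ascii").split("PASS ", 1)[1].strip()
```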

Many FTP servers enable anonymous FTP, meaning that users can access (parts of) the server without a specified user name and password combination. The user name for an anonymous login is usually “anonymous”.

Identifiers and their uniqueness

The FTP protocol uses persistent connections, i.e., the client and server negotiate one or more ports over which they communicate. At least the connection for the control stream will remain open until the user logs out. Thus the server can trace the user’s actions, like which files have been downloaded. If the user logs in anonymously, the server has to track the user interaction via the client’s IP address, which is visible from the underlying TCP/IP connection. The FTP protocol does not use any specific identifiers, except the log-in.

Personal data

If anonymous FTP access is used, the protocol does not contain any personal data itself, but underlying protocols do (like the IP address). When no anonymous log-in is used, the log-in data (i.e., user name and password) are personal data which is transferred to the server.

Linkability: identifiability and profiling

Users are identified via their user name and password, or, to be more precise, the user name and password are necessary to open a connection, which stays open as long as no time-out occurs. This open connection can be used to profile the user’s actions, i.e., which directories have been opened, which files have been up- or downloaded, etc.

Avoidance or circumvention of information disclosure

The username/password can only be avoided by providing an anonymous log-in. If access to the data has to be controlled, some kind of identifier (authorisation token) has to be presented by the user.

In order to prevent eavesdropping on the connection, FTP can be enhanced by using an encrypted connection (like SSL). Thus the communication between client and server is run through a virtual tunnel, meaning that all requests and responses are encrypted. The FTPS protocol provides such measures. If FTPS cannot be used, IPSec and similar protocols at lower layers are available to protect the data.

SMTP

Figure 2: Example illustrating the protocols involved in an e-mail transmission

SMTP is a text-based mail protocol which offers push services by utilising a store-and-forward mechanism. SMTP is used by clients for sending plain-text messages to servers. Servers can forward text messages to other servers via SMTP. To fetch e-mails, different protocols are used, like POP or IMAP. The general procedure and the different protocols involved are illustrated in Figure 2. In Steps 1 and 3 the SMTP protocol is used, Step 2 is done by the DNS protocol and Step 4 by an e-mail fetching protocol like POP3 or IMAP.

Table 4: Sample SMTP conversation between e-mail client and server

An example of a run of the SMTP protocol between client and server is illustrated in Table 4 (Wikipedia: Simple Mail Transfer Protocol 2007). As can be seen in this example, no authentication takes place. Even though this can be seen as a good property from a privacy point of view, it also poses a problem, since SMTP is a popular and widespread protocol, supported by many mail servers. Spammers can use this to send spam over open SMTP servers, which do not authenticate their clients. These SMTP servers are called “open relays”. To prevent this, SMTP extensions have been developed, like SMTPAuth or SMTP-After-POP. These extensions require some kind of authentication, in order to reduce the misuse of open SMTP servers by unauthorised users.

Identifiers and their uniqueness

The basic SMTP protocol per se contains no identifiers or identifying information. The sender has to state his domain, but the server cannot verify that the sender is really sending from this domain and has valid credentials. So an e-mail sent over SMTP could contain any sender address imaginable. The recipient of the e-mail has to be identified correctly; otherwise the e-mail cannot be delivered.
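
The point that the sender address is merely asserted can be illustrated by writing out the client side of a basic SMTP dialogue. The commands (HELO, MAIL FROM, RCPT TO, DATA, QUIT) are those of basic SMTP; the function name and all addresses are hypothetical, and the sketch is not a reproduction of the conversation in Table 4.

```python
def smtp_client_commands(helo_domain, sender, recipient, body):
    """Return the lines a client sends in a basic, unauthenticated session."""
    return [
        f"HELO {helo_domain}",       # client states its domain (unverified)
        f"MAIL FROM:<{sender}>",     # envelope sender (unverified)
        f"RCPT TO:<{recipient}>",    # recipient must be deliverable
        "DATA",
        body,
        ".",                         # a single dot ends the message
        "QUIT",
    ]


# Nothing in the basic protocol prevents the client from claiming any sender.
commands = smtp_client_commands(
    "mail.example.org", "anyone@example.org", "bob@example.net", "Hello Bob"
)
```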

In contrast to this, however, SMTP might reveal a lot of identifying information if it is used in the usual way. Normally neither sender nor recipient will change their e-mail addresses for every e-mail they send or receive. Therefore one can link several SMTP protocol runs based on the sender or recipient e-mail addresses given. Moreover, it is possible to send more than one e-mail within an SMTP session. The SMTP server can then conclude that all the e-mails transmitted within a single SMTP session are sent by the same user.

If SMTP-After-POP or SMTPAuth is used, the SMTP server can verify the sender by some credentials (like user name and password), thus identifying the sender and validating the given sender address. These authorisation credentials can also be used to link multiple SMTP sessions to the same sender (user).

Personal data

The content of an e-mail can be seen as personal data. If the content is not encrypted, it is sent in clear text and can be “read” by any SMTP server forwarding the e-mail.

Furthermore, the sender and receiver addresses are both personal data, even more so in conjunction.

Linkability: identifiability and profiling

If the users have to authenticate themselves reliably against the server, servers can create certain profiles of the e-mails sent. Every forwarding SMTP server involved in sending an e-mail can log this data, e.g., the originator, the recipient, the date, etc. Furthermore, since the text is sent in plain text, every server involved can “read” the content of the e-mail.

Even if the content is encrypted, the header is sent in plain text, so every server can create profiles for e-mails sent. The connection from the client to the first SMTP server can be secured, e.g., by SSL or similar tunnelling protocols, but this will only secure the integrity and confidentiality to the first server. If the e-mail is forwarded to other servers, a server-to-server SSL connection is required in order to protect the e-mail. It has to be noted that the e-mail still exists in plain text on each server.

Avoidance or circumvention of information disclosure

Providing the real recipient’s address cannot be avoided without (complex) extensions like remailers, which are introduced in the next section. If the user has to authenticate himself against the first SMTP server, the sender’s address can be verified. A user can prevent this by using SMTP servers which require no authentication, i.e., open relay servers.

The content of an e-mail can be secured using either symmetric or asymmetric cryptography, thus obtaining confidentiality and integrity of the data.

POP

POP is an acronym for “Post Office Protocol”; the third and currently most used version is POP3, as defined in RFC 1939. POP uses the TCP protocol to retrieve e-mails from a remote server. POP3, in contrast to SMTP, uses a pull mechanism to get the e-mails from the server. Thus, POP3 supports users with dial-up connections who are not online all the time. E-mails retrieved via POP3 can either be transferred to the client’s computer and then deleted on the server, or they can stay on the server. E-mails retrieved via POP3 may use MIME to carry non-ASCII attachments, like (binary) images.

A client authenticates himself to the POP3 server with a user name and password. This data is normally sent unprotected in plain text. Optional mechanisms like APOP avoid transmitting the password itself: the client instead sends a hash (digest) computed from the password and a timestamp supplied by the server.
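
In RFC 1939, the optional APOP command transmits an MD5 digest of the server's greeting timestamp followed by a shared secret, rather than the password itself. A minimal sketch of that computation follows; the function name, the timestamp and the secret are hypothetical values chosen for illustration.

```python
import hashlib


def apop_digest(server_timestamp, shared_secret):
    """Return the hex MD5 digest of timestamp + secret, as used by APOP."""
    material = (server_timestamp + shared_secret).encode("ascii")
    return hashlib.md5(material).hexdigest()


# Hypothetical greeting timestamp (including angle brackets) and secret.
timestamp = "<1234.567@pop.example.org>"
secret = "correct horse"
digest = apop_digest(timestamp, secret)
```

Because the timestamp changes with every connection, a captured digest cannot simply be replayed later. Note that the user name is still sent in clear text, and MD5 is no longer regarded as a strong hash function.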

Identifiers and their uniqueness

POP3 requires the client to authenticate himself via a user name and password. This can be used as an identifier.

Personal data

If the e-mail is not encrypted, personal data in the e-mail body can be read by the server and each other server forwarding an e-mail. The header cannot be encrypted, thus the sender and recipient identity, which must be valid, are readable for each processing mail server. Thus, a communication can be reconstructed easily.

Linkability: identifiability and profiling

The POP3 server knows about all the communication of its clients. Clients can only protect the text content by encrypting it, but the recipient address must be readable for the mail server in order to deliver the e-mail. Every mail server in between can read out the header, i.e., get to know who sends an e-mail to whom.

Avoidance or circumvention of information disclosure

The encryption of the e-mail body (text content) can provide confidentiality and integrity. The communication between two clients cannot be hidden; at least the recipient address must be valid in order for the mail servers to deliver the e-mail correctly.

In order to send an e-mail without any type of return address, i.e., sending an e-mail which the receiver cannot associate to the sender by means of the header, services exist which strip the header of e-mails and redirect them to the intended target. These services are called (anonymous) remailers.

Simple remailers cannot guarantee privacy for the sender, though. Next-level remailers, called Mixmaster remailers (or “type 2” remailers), are more secure. They use advanced techniques to avoid tracing of e-mails, but using these services usually requires special client software. Security advances compared with normal remailers are, for example, that each sent e-mail has the same size and is encrypted, and that messages are sent through a chain of several remailers before being delivered to the recipient. Further information can be found at (Wikipedia: Mixmaster-Remailer 2007).

Remailers provide anonymity for the sender of a message. In order to get recipient anonymity, another approach is needed. Nymservers accept messages that are addressed to a remailer as the “first” receiver. Additionally, an encrypted data block in the e-mail contains a symmetric key, the address of a second remailer and another encrypted data block. The first remailer, being able to decrypt the data block, can re-encrypt the message with the new symmetric key and send this and the received encrypted data block to the second remailer. This remailer decrypts the encrypted block and obtains a symmetric key, the address of a third remailer and another encrypted block. The system continues until one remailer at the end sends the e-mail to the recipient. The intended recipient decrypts the message with all given symmetric keys and gets the plain text content. This system requires a sophisticated infrastructure, in which key management (i.e., the distribution of the needed symmetric keys) in particular poses a big problem.
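
The layering of symmetric encryption along the remailer chain can be sketched as follows. This is a toy model under strong assumptions: the cipher is an insecure XOR keystream built from SHA-256 (the standard library offers no real symmetric cipher), the hop keys are invented, and the encrypted routing blocks that tell each remailer its key and next hop are left out entirely.

```python
import hashlib


def keystream_xor(key, data):
    """Toy stream cipher (NOT secure): XOR data with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def relay_through_chain(message, hop_keys):
    """Each remailer in turn adds one encryption layer with its own key."""
    for key in hop_keys:
        message = keystream_xor(key, message)
    return message


def recipient_decrypt(ciphertext, hop_keys):
    """The recipient, knowing all keys, removes the layers in reverse order."""
    for key in reversed(hop_keys):
        ciphertext = keystream_xor(key, ciphertext)
    return ciphertext


hop_keys = [b"key-remailer-1", b"key-remailer-2", b"key-remailer-3"]  # hypothetical
plaintext = b"Reply for the pseudonym holder"
delivered = relay_through_chain(plaintext, hop_keys)
recovered = recipient_decrypt(delivered, hop_keys)
```

The sketch also hints at the key management problem named above: the recipient must hold every key used along the chain before the message can be read.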

DNS

The Domain Name System (DNS) is a hierarchical infrastructure for name resolution on the Internet. It allows the mapping of (numerical) IP addresses to user-friendly textual addresses, the well-known host/domain names.

Figure 3: Architecture of the Domain Name System

The DNS system can be visualised as a tree (see Figure 3). Each node and leaf holds at least one resource record, which itself holds information about the associated domain name. The DNS tree can be divided into sub-trees, called zones. Each zone consists of a collection of connected nodes and leaves. A zone is managed by a designated nameserver called the “authoritative nameserver”. If a nameserver is queried for a domain it is not responsible for (i.e., the queried nameserver cannot resolve the request), the nameserver can forward the query to another nameserver and return its answer to the originator (recursive querying). Another approach is to return the addresses of a list of different nameservers to the user when a query cannot be resolved (non-recursive querying). Upon a positive answer from a nameserver, the querying nameserver can cache this answer for further use in order to reduce latency and load (Guha, Francis 2007). A sample DNS query is illustrated in Figure 4.
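
The interplay of referrals and caching can be sketched with a toy in-memory model. The nameserver names and the structure of the NAMESERVERS table are invented for illustration; only the mapping of www.fidis.net to 80.237.131.150 is taken from the text. The resolver follows referrals on behalf of its user, starting at the root, and caches the final answer.

```python
# Toy zone data: each nameserver either knows a record or refers onwards.
NAMESERVERS = {
    "root-ns": {"referrals": {"net": "net-ns"}, "records": {}},
    "net-ns": {"referrals": {"fidis.net": "fidis-ns"}, "records": {}},
    "fidis-ns": {"referrals": {}, "records": {"www.fidis.net": "80.237.131.150"}},
}


class CachingResolver:
    """Resolver that follows referrals for its client and caches answers."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, name):
        if name in self.cache:           # answered locally, no upstream traffic
            return self.cache[name]
        server = "root-ns"
        while True:
            self.upstream_queries += 1
            zone = NAMESERVERS[server]
            if name in zone["records"]:  # authoritative answer found
                self.cache[name] = zone["records"][name]
                return self.cache[name]
            for suffix, next_server in zone["referrals"].items():
                if name == suffix or name.endswith("." + suffix):
                    server = next_server  # follow the referral
                    break
            else:
                raise LookupError(f"cannot resolve {name}")


resolver = CachingResolver()
first = resolver.resolve("www.fidis.net")
second = resolver.resolve("www.fidis.net")  # served from the cache
```

The first lookup contacts three nameservers; the second causes no upstream query at all. From a privacy point of view, the cache is also a record of which names the resolver's users have asked for.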

Figure 4: Sample DNS query

For each top-level domain there is at most one registry; for the German top-level domain “.de”, the registry is DENIC. The registries regulate the assignment of the possible mappings. A domain holder, also called a registrant, has to “lease” a domain from the registry for a certain fee.

With DNS, two types of affected entities must be distinguished:

Owner:
An owner is a legal person having a server with an IP address and a domain name which the owner wants to register with the Domain Name System in order for users (see below) to access the server and/or its services.
User:
A user is an entity (often a person) requiring a mapping, e.g., between domain name and IP address. This is needed for example when surfing the web and using domain names which have to be resolved to IP addresses.

The following sections will differentiate between the two affected entities “owner” and “user”.

Identifiers and their uniqueness

Owners

As the owner of a server, one has to register the requested host name together with the server’s IP address with at least one DNS server so that the human-readable addresses (like URLs) of the domain can be resolved. The registration process requires the usage of real-world information by the owner, like name, address, contact data, etc.

The mandatory registration data is published by the registry in a so called WHOIS database. This database contains information about each and every domain holder, and can be accessed with the help of the WHOIS protocol by everyone.

Relevant for the case where an owner wants to register a mobile device (e.g., a laptop) is the fact that an IP address registered for a given domain name may reveal additional information about the mobile device, especially its current location. If a so-called dynamic DNS service like DynDNS is used, which allows mobile devices with changing IP addresses to be reachable under a static domain name, the associated IP address may reveal information about the current location of that device. Services like DynDNS can normally be queried by everyone without any restriction (Guha, Francis 2007).

Users

A typical DNS query contains the information shown in Table 5.

Table 5: Fields of a DNS query

At first glance none of this data seems to contain any identifying values, at least none which can identify the entity sending the request. But at least the first DNS server knows who is sending the DNS query, because the underlying protocols (UDP over IP) reveal the sender’s IP address. Thus a relation can be built between sender and requested host. In many cases the ISP of the requesting user maintains the first DNS server in order to reduce response time and to optimise performance by using a local cache.
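
That a DNS query message carries no sender identity of its own can be seen by assembling one by hand. The sketch below follows the standard DNS message layout (a 12-byte header followed by the question: name, type, class); the function name and the query ID are arbitrary, and the fields shown need not correspond one-to-one to those listed in Table 5.

```python
import struct


def build_dns_query(hostname, query_id=0x1234, qtype=1, qclass=1):
    """Build a DNS query message (qtype 1 = A record, qclass 1 = Internet)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


packet = build_dns_query("www.fidis.net")
```

The message contains the requested name in clear text but no address of the requester; that information is added only by the IP header when the message is sent.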

Personal data

Owners

As described above, the registration of an IP address and/or domain name requires personal data about the registrant. Mandatory are fields like name, address, administrator contact information etc.

As described above, for mobile devices with a dynamic IP address but a static domain name, the IP address can reveal the location and thus information like movement patterns of the mobile device and consequently of the owner of the device (Guha, Francis 2007).

Users

In the usage of the DNS a user reveals which host-names or IP addresses he is interested in. Depending on them this could be seen as leakage of personal data (e.g., if www.aids.org or similar addresses are queried).

Linkability: identifiability and profiling

Owners

A lot of information can be gained by searching a WHOIS database. Not only is it possible to find out the owner of a certain domain name, but also all the domains a certain person/entity has registered. The location information for mobile users may lead to location profiling.

Users

Primarily the first DNS server a user utilises for resolving domain names can collect a lot of data about the user. This DNS server is queried each time a mapping is needed (e.g., each time a new, unknown domain is used by the user), thus the DNS server can store the requests alongside the IP address of the user. If the DNS server is administrated by the ISP of the user, the ISP can easily link this data directly with all available personal data, like name, address etc.

Avoidance or circumvention of information disclosure

Owners

It does not seem to be necessary for the personal information about the owner of an IP address or domain name to be accessible publicly. At least a proxy should be available which masks this information up until a legitimate interest exists. A proxy could be a lawyer or a company which registers the domain on behalf of the user.

There are efforts to reform the registration process in order to provide more privacy to owners of domains and IP addresses. An ICANN key task force created a proposal in order to give more privacy options to domain name owners. The goal is to make requests for personal information of domain name owners much more expensive than they are now. This could be achieved by giving the users the possibility to list, e.g., a lawyer or a service provider as the contact person (Jesdanun 2007). Roessler gives additional information about DNS, WHOIS and privacy (Roessler 2002).

To avoid the leakage of location information for mobile devices, the DNS system has to be modified in order to prevent arbitrary people from requesting IP addresses to certain (private) domain names (Guha, Francis 2007).

Users

Without further extensions it cannot be avoided that at least the first requested DNS server gets to know the sender of the request, because this DNS server has to resolve the request, meaning it has to know the requested information (like the IP address linked to a given domain name), and it has to send an answer to the correct user, meaning the DNS server needs to know the IP address of the user.

What can be done is the use of anonymisation services like “Tor” or similar approaches which (try to) hide the sender and maybe also the receiver of a message from eavesdroppers. Such services require a sophisticated infrastructure and often result in a high delay.

DNSSEC can be used to digitally sign DNS data, so that the responses a client receives can be authenticated. What DNSSEC adds is primarily more secure name look-ups and a reduced risk of manipulated information and forged domains. But DNSSEC does not encrypt the DNS query itself in any way, so it does not achieve confidentiality. This means there is no privacy in DNS queries:

”Most mobiles access a DNS server provided by the access network, which is typically configured with the DHCP protocol. The DNS server is able to record the names of the online servers contacted by the mobile. Even if the mobile connects to a VPN gateway and uses DNS services via a VPN tunnel, it may still rely on the local DNS server to resolve the VPN gateway name. This means that the DNS server in the access network learns the name of the organization to which the user belongs, and may enable it to identify the mobile with some accuracy. Furthermore, DNS requests are made recursively, which leaks the mobiles approximate location to the remote DNS servers. Thus, even if the actual data connections are forwarded via anonymizing proxies, the source of a DNS request may reveal the mobiles location to the peer endpoint.” (Aura, Zugenmaier 2004)

RTP

RTP is the acronym for Real-time Transport Protocol, which defines a standard for delivering real-time information (like audio or video) over the Internet. In today’s Internet it is of growing importance since it is used as transport protocol for Voice over IP (VoIP).

RTP uses two communication channels, one for the control information (via the “RTP Control Protocol”, RTCP) and one for the data; both normally use UDP. The session establishment, i.e., primarily the call setup and tear-down, is managed by an extra protocol, e.g., SIP (Session Initiation Protocol), best known from Voice over IP usage.

According to RFC 1889, the first RTP standard, the services provided by RTP include (Wikipedia: Real-time Transport Protocol 2007):

Payload-type identification – Indication of what kind of content is being carried;
Sequence numbering – PDU (Protocol Data Unit) sequence number;
Time stamping – allow synchronisation and jitter calculations;
Delivery monitoring.
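
The first three services map directly onto fields of the fixed 12-byte RTP header. The following sketch packs and parses such a header with Python's struct module; the function names and the example values are assumptions, and header extensions, padding and CSRC entries are ignored.

```python
import struct


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Pack a fixed RTP header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6                                   # version bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Extract the fields of the fixed RTP header from raw bytes."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,   # what kind of content is carried
        "sequence": sequence,           # detects loss and reordering
        "timestamp": timestamp,         # synchronisation and jitter
        "ssrc": ssrc,                   # identifies the stream source
    }


# Hypothetical packet header; payload type 0 denotes PCMU audio in the
# common audio/video profile.
header = pack_rtp_header(0, 4711, 160000, 0xDEADBEEF)
fields = parse_rtp_header(header)
```

None of these fields names a person or a host; the identifying information comes from the lower layers and from RTCP.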

The following sections will distinguish between RTP and its control protocol RTCP.

Identifiers and their uniqueness

RTP

The RTP packets themselves do not contain any direct privacy relevant information, only media information. Of course RTP uses UDP, which itself contains the IP address of the sender of the packet.

RTCP

More privacy-relevant information than in the RTP protocol is contained in the RTCP packets, i.e., the source description packets, which can include personal data. RTCP is not mandatory for RTP, but it helps the sender to synchronise and optimise an RTP stream. Although not mandatory, some media providers using RTP may require the use of RTCP.

RTCP packets may contain data such as the names and affiliations of the participants in a communication. This data is user-defined; whether users can control the sending and the content of this information depends on the application. Furthermore, RTCP sends a canonical name (CNAME) with each packet. This CNAME includes the IP address of the sender and the user name of the participant. The IP address is a unique identifier to a certain extent; the user name is at least unique within a session. How unique the user name is over time depends on how often the user changes it.

Personal data

RTP

An RTP packet header contains no personal data, but the packet carries the media data, which may itself contain personal data, such as the voice data of a phone conversation.

RTCP

RTCP packets may contain personal data such as the names and affiliations of the participants in a communication, but providing this data is optional. The CNAME field, however, is mandatory. It includes the IP address of the sender and his user name. Both can be regarded as personal data, especially the IP address, which is often static. Even a dynamic IP address can leak interesting information such as the location of the user. The IP address is also visible in the header of the IP packets that carry the RTCP data. An important issue arises, however, if the user works in an intranet behind network address translation (NAT). NAT replaces the user’s internal IP address (i.e., the IP address in the intranet) with an external IP address, which allows many users to communicate via a single external IP address. But the IP address used in the CNAME field is the internal address, so the receiver obtains both the internal and the external IP address.
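
The NAT issue can be made concrete with a small sketch in Python. It builds an RTCP source description (SDES) packet carrying a single CNAME item, extracts the CNAME again as a receiver would, and compares the host part with the source address seen on the wire. The packet layout follows our reading of RFC 3550 (packet type 202, CNAME item type 1); the function names, the user name and all addresses are hypothetical, and the parser only handles the first chunk of a well-formed packet.

```python
import ipaddress
import struct
from typing import Optional

SDES_PACKET_TYPE = 202  # RTCP packet type for source description
CNAME_ITEM_TYPE = 1     # SDES item type for the canonical name


def build_sdes_cname(ssrc: int, cname: str) -> bytes:
    """Build an SDES packet with one chunk that holds a single CNAME item."""
    text = cname.encode("utf-8")
    chunk = struct.pack("!I", ssrc) + bytes([CNAME_ITEM_TYPE, len(text)]) + text
    chunk += b"\x00"                      # null item terminates the item list
    chunk += b"\x00" * (-len(chunk) % 4)  # pad the chunk to a 32-bit boundary
    length_words = (4 + len(chunk)) // 4 - 1
    header = struct.pack("!BBH", 0x80 | 1, SDES_PACKET_TYPE, length_words)
    return header + chunk


def extract_cname(packet: bytes) -> Optional[str]:
    """Return the CNAME of the first chunk, or None if there is none."""
    _first, packet_type, _length = struct.unpack("!BBH", packet[:4])
    if packet_type != SDES_PACKET_TYPE:
        return None
    pos = 8  # skip the 4-byte header and the 4-byte SSRC of the first chunk
    while pos + 1 < len(packet) and packet[pos] != 0:
        item_type, item_len = packet[pos], packet[pos + 1]
        if item_type == CNAME_ITEM_TYPE:
            return packet[pos + 2:pos + 2 + item_len].decode("utf-8")
        pos += 2 + item_len
    return None


def leaks_internal_address(cname: str, observed_source_ip: str) -> bool:
    """True if the CNAME host is a private IP address that differs from the observed one."""
    host = cname.rpartition("@")[2]
    try:
        cname_ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # host part is a name, not an IP literal
    return cname_ip.is_private and cname_ip != ipaddress.ip_address(observed_source_ip)


packet = build_sdes_cname(0x1234ABCD, "alice@192.168.1.23")
cname = extract_cname(packet)
print(cname, leaks_internal_address(cname, "203.0.113.7"))
```

In this hypothetical exchange the receiver sees the external address 203.0.113.7 on the IP packet and learns the internal address 192.168.1.23 from the CNAME, which is exactly the double disclosure described above.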

Linkability: identifiability and profiling

RTP

Eavesdroppers can build profiles by analysing traffic, which reveals who is talking to whom and when. This is possible because RTP hides neither the sender’s nor the receiver’s IP address, both of which are needed for delivering the communication data. Even if both IP addresses are dynamic, they remain fixed within a (communication) session, so profiling is possible at least for one session.
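
How little an eavesdropper needs for this kind of profiling can be sketched in a few lines of Python. The example assumes that the observer has already reduced the captured packets to triples of capture time, source address and destination address; this record format, the function name and the addresses are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple

# (capture time in seconds, source IP address, destination IP address)
Observation = Tuple[float, str, str]
# (time of first packet, time of last packet, number of packets)
Summary = Tuple[float, float, int]


def summarise_sessions(observations: Iterable[Observation]) -> Dict[Tuple[str, str], Summary]:
    """Group observed packets by communicating pair, regardless of direction."""
    sessions: Dict[Tuple[str, str], Summary] = {}
    for timestamp, src, dst in observations:
        pair = tuple(sorted((src, dst)))
        first, last, count = sessions.get(pair, (timestamp, timestamp, 0))
        sessions[pair] = (min(first, timestamp), max(last, timestamp), count + 1)
    return sessions


observed = [
    (0.00, "192.0.2.10", "198.51.100.5"),
    (0.02, "198.51.100.5", "192.0.2.10"),
    (61.50, "192.0.2.10", "198.51.100.5"),
]
for pair, (first, last, count) in summarise_sessions(observed).items():
    print(pair, first, last, count)
```

The result states which two endpoints communicated, when the exchange started and ended, and how many packets were seen, without touching the payload at all.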

RTCP

RTCP leaks the IP address of the sender of an RTCP packet through the header of the underlying IP packet. Additionally, the CNAME field leaks the sender’s IP address and user name. This eases profiling, because there are two fields which can be observed.

Avoidance or circumvention of information disclosure

RTP

To protect the confidentiality of the media data sent, i.e., the content of the conversation, the RTP payload (data) has to be encrypted. The RTP standard provides support for both RTP and RTCP encryption. To encrypt RTP data packets, the payload may need some padding so that its length is a multiple of the block size of the encryption method used (e.g., DES). This is illustrated in Figure 5, where the padding is done in step 1 and the encryption takes place in step 2 (cf. Perkins 2003). Ciphers typically used for encryption are DES (no longer considered secure), triple DES or AES.

Figure 5: Encryption of the RTP payload
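
The padding step described above can be sketched as follows in Python. The sketch assumes the padding convention of the RTP standard as we understand it: the padding bit in the first header byte is set, and the last padding byte states how many padding bytes were appended. It also assumes that the whole packet is the unit to be encrypted; if only the payload is encrypted, the same arithmetic applies to the payload length. The function names are hypothetical, and the encryption itself (step 2) is deliberately left out.

```python
RTP_HEADER_LEN = 12  # fixed RTP header without CSRC entries
PADDING_BIT = 0x20   # the P bit in the first header byte


def add_rtp_padding(packet: bytes, block_size: int = 8) -> bytes:
    """Pad an RTP packet so that its length is a multiple of the cipher block size."""
    if len(packet) < RTP_HEADER_LEN:
        raise ValueError("packet is shorter than the fixed RTP header")
    pad_len = -len(packet) % block_size
    if pad_len == 0:
        return packet  # already aligned, nothing to do
    # The last padding byte carries the padding length, including itself.
    padding = b"\x00" * (pad_len - 1) + bytes([pad_len])
    return bytes([packet[0] | PADDING_BIT]) + packet[1:] + padding


def remove_rtp_padding(packet: bytes) -> bytes:
    """Undo add_rtp_padding after decryption."""
    if not packet[0] & PADDING_BIT:
        return packet
    pad_len = packet[-1]
    if pad_len == 0 or pad_len > len(packet) - RTP_HEADER_LEN:
        raise ValueError("invalid padding length")
    return bytes([packet[0] & ~PADDING_BIT & 0xFF]) + packet[1:-pad_len]


# Hypothetical packet: 12-byte header plus 13 bytes of media data (25 bytes in total).
original = bytes([0x80, 0]) + b"\x00" * 10 + b"M" * 13
padded = add_rtp_padding(original, block_size=8)  # 8 bytes is the DES block size
print(len(original), len(padded), padded[-1])
```

For AES the block size would be 16 bytes instead of 8; the padded packet is then handed to the cipher in step 2.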

Alternatively SRTP (Secure RTP) can be used to provide confidentiality, and also authentication. Both SRTP and RTP with encryption rely on a secure key exchange via external protocols like MIKEY (Perkins 2003).

Another issue is the confidentiality of the circumstances of a communication, i.e., the information that a conversation is taking place at all. This problem cannot easily be solved as long as IP is used as the underlying network protocol. One possibility is to use anonymisation services like Tor or AN.ON (cf. Section ). However, these solutions add extra latency. As low latency is one of the key factors for the quality of service of real-time communication, this prevents the adoption of anonymisation services in most cases.

RTCP

RTCP packets may contain personal data such as the names and affiliations of the participants in a communication. This might not be a problem for a teleconference within a company, but it could be inappropriate for someone listening to a radio stream. As the transmission of the personal data is optional, applications should let users control the usage of this “feature”.

Figure 6: RTCP encryption

To provide confidentiality for the instructions and information contained in RTCP packets, encryption can be used, as depicted in Figure 6 (cf. Perkins 2003). In the first step a random prefix is added. This is necessary because many fields of an RTCP packet are static and well known, which would otherwise ease attacks. The second step encrypts the random prefix, the RTCP receiver report (RTCP RR, information about the received data which the sender needs for QoS adjustments) and the RTCP source description (RTCP SDES).
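
The two steps can be sketched in Python as follows. The sketch assumes a 32-bit random prefix that is drawn anew for every packet, as specified in RFC 3550, and takes the cipher as a parameter, because the Python standard library does not provide DES or AES. All names are hypothetical, and the placeholder cipher in the example performs no encryption at all; it only stands in for a real cipher so that the data flow can be followed.

```python
import secrets
from typing import Callable

PREFIX_LEN = 4  # 32-bit random prefix, redrawn for every compound packet

Cipher = Callable[[bytes], bytes]


def protect_rtcp(compound_packet: bytes, encrypt: Cipher) -> bytes:
    """Step 1: prepend a random prefix. Step 2: encrypt prefix and packet as one unit."""
    prefix = secrets.token_bytes(PREFIX_LEN)
    return encrypt(prefix + compound_packet)


def unprotect_rtcp(protected: bytes, decrypt: Cipher) -> bytes:
    """Decrypt the unit and discard the random prefix."""
    plaintext = decrypt(protected)
    if len(plaintext) < PREFIX_LEN:
        raise ValueError("decrypted data is shorter than the random prefix")
    return plaintext[PREFIX_LEN:]


def placeholder_cipher(data: bytes) -> bytes:
    """NOT a cipher: returns the data unchanged. Replace with a real block cipher."""
    return data


# Hypothetical compound packet: an empty receiver report (packet type 201, one SSRC).
rtcp_rr = bytes([0x80, 201, 0x00, 0x01]) + (0x1234ABCD).to_bytes(4, "big")
protected = protect_rtcp(rtcp_rr, placeholder_cipher)
recovered = unprotect_rtcp(protected, placeholder_cipher)
print(len(rtcp_rr), len(protected), recovered == rtcp_rr)
```

With a real block cipher the unit may additionally have to be padded to a multiple of the block size before encryption.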

The IP header and the UDP header cannot be encrypted, since they contain information needed for the delivery of the RTCP packets. Traffic analysis is therefore still possible; encryption can only protect the content of the RTCP packets.

