How Grid Computing works

As already mentioned, the Grid requires that resources are shared beyond the local administrative domain. These resources include disk storage, memory, processing units, data, peripherals, scientific instruments, software, licenses or even individuals (e.g., experts). This dynamic collection of institutions, individuals, software and computational resources forms a new administrative domain, which is called a Virtual Organisation (VO). Each entity participating in the VO must clearly and carefully specify what it wants to share, who will be allowed to share it, and what the sharing conditions will be. Examples of VOs are, on the one hand, the suppliers, distributors, stock managers, application service providers and storage service providers working together in supply chain management and, on the other hand, the simulation systems, models, storage service providers, technicians and engineers involved in aerospace engineering. Both examples are collaboration-demanding VOs, the first with strong requirements mainly in data management, security and user-friendliness, the second being computationally and data intensive. As these examples show, the purpose, size, structure, lifetime and scope of VOs may vary. Nevertheless, a thorough technical analysis of their requirements leads to the identification of common concerns that are addressed by a set of services implementing these common core functionalities [Foster et al. (2001)]. These core functions of the Grid include execution management, data management, resource management and virtualisation, security and portals. The following figure depicts an example of an architecture integrating these core functionalities, with a user (individual or application) accessing the Grid through the portal.

 

 


Figure : Example of a user accessing the Grid

 

In order for a user to be granted access to a grid resource, the service provider (SP) requires that a contract - the Service Level Agreement (SLA) - is first signed between the parties involved. The SLA should contain not only the legal description of the parties, but also the offered and requested quality of service (QoS), technical and organisational security measures (Security Service Level Agreements, SSLAs), the expected usage of the resources, the SLA lifetime, compensations, penalties for not meeting service requirements, priority, as well as pricing. In other words, the SLA specifies the terms of the agreement between the SP and the customer, including the charging of the services provided. Examples of the contents of the agreement include the amount of time the SP guarantees that the service will be up and running, the response time of the service, the mean time between failures of the service, the amount of time needed to bring the service back up, and the information rate. The SP publishes the services he can offer and the user-customer publishes the quality of service he wishes to be provided with. A cycle of negotiations takes place at this point, with each of the two sides aiming at making the contract as profitable as possible for itself. As soon as the SLA has been agreed on, the SP must guarantee that the agreed-upon QoS is actually offered during service provision.
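To make the structure of such an agreement more concrete, a minimal sketch in Python is given below; the selection of fields and the violation check are illustrative assumptions and do not follow any particular SLA specification.

from dataclasses import dataclass

@dataclass
class ServiceLevelAgreement:
    """Illustrative subset of SLA terms; the field names are assumptions."""
    provider: str
    customer: str
    uptime_guarantee: float      # e.g. 0.999 = 99.9% availability
    max_response_time_s: float   # maximum response time in seconds
    max_recovery_time_s: float   # time to bring the service back up
    price_per_cpu_hour: float
    penalty_per_violation: float
    lifetime_days: int

@dataclass
class ObservedMetrics:
    uptime: float
    response_time_s: float
    recovery_time_s: float

def violations(sla: ServiceLevelAgreement, observed: ObservedMetrics) -> list:
    """Return the list of SLA terms that the observed metrics violate."""
    issues = []
    if observed.uptime < sla.uptime_guarantee:
        issues.append("uptime below guarantee")
    if observed.response_time_s > sla.max_response_time_s:
        issues.append("response time exceeded")
    if observed.recovery_time_s > sla.max_recovery_time_s:
        issues.append("recovery time exceeded")
    return issues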

When the user wants to access the grid resources he has an SLA for, he accesses the Grid through an interface provided by the application he is running or through an interface the grid administrator provides, the portal. The portal is a web interface that enables the user to submit jobs for execution on remote computing resources, monitor their execution by viewing the job status, manage their execution by suspending, resuming or cancelling them, view the job history, visualise the executed jobs (e.g., simulation results) and manage user accounts and their access rights. The term job in grid computing covers simple processes, such as a database query, as well as complicated processes, such as the simulation of a car crash or the visualisation of a car virtually assembled from its various components.

Once the user submits the job to the Grid, the Execution Management Services (EMS) take action. According to OGSA, the Open Grid Services Architecture [Foster et al. (2005)] developed by the Global Grid Forum (GGF), the EMS are responsible for finding candidate locations for the execution of the requested job, selecting a location based on different policies or service level agreements, preparing the environment at the selected location for the execution, initiating the execution of the job (e.g., registering the service with other services) and managing the execution of the job. Thus, after the job has been submitted by the user to the system, the EMS perform resource discovery and reservation for the timeframe the user has requested. At this point issues such as priority, job queues and the current allocation of resources (workload) determine the scheduling of the job. Ideally the system should be able to match peaks in the usage of some resources with lower usage of others, so that the resources are exploited in the best possible way and the submitted jobs are executed so well that the user gets the impression the job is being executed locally. After resources that satisfy the SLA are found and reserved, these resources are prepared if necessary (e.g., installation of a program on the machine the job is to be executed on) and then the job execution starts.
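The resource discovery and selection steps of the EMS can be pictured with the following sketch; the resource attributes and the least-loaded selection policy are simplifying assumptions and do not correspond to the actual OGSA interfaces.

from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_cpus: int
    free_disk_gb: int
    current_load: float   # 0.0 (idle) .. 1.0 (fully loaded)

@dataclass
class JobRequest:
    cpus: int
    disk_gb: int
    start: float   # requested time window (epoch seconds)
    end: float

def discover(resources, job):
    """Resource discovery: keep only candidates that satisfy the request."""
    return [r for r in resources
            if r.free_cpus >= job.cpus and r.free_disk_gb >= job.disk_gb]

def select(candidates):
    """Selection policy: prefer the least-loaded candidate (one possible policy)."""
    return min(candidates, key=lambda r: r.current_load)

def schedule(resources, job):
    candidates = discover(resources, job)
    if not candidates:
        raise RuntimeError("no resource satisfies the job requirements / SLA")
    chosen = select(candidates)
    # reservation: in a real system the resource would be booked for [start, end]
    chosen.free_cpus -= job.cpus
    chosen.free_disk_gb -= job.disk_gb
    return chosen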

In general, while being processed, a submitted job may require the transfer of data, as well as access to data from different locations, which may be heterogeneous and use various protocols. The Data Management Services are responsible for these operations. While the job is being executed, there is a chance that an execution failure occurs for a number of reasons, including network failure, a program crash or a data server going down. Thus, fault tolerance mechanisms need to be implemented. The action to be taken may be communicated by a Policy Manager Service, which stores and manages the policies. The Execution Management Services can reallocate resources and continue the execution of the job on the new resources, or attempt to reallocate the initial resources. Moreover, the Data Management Services may offer data replication - multiple copies of the data (replicas) being maintained and kept synchronised - so that, in case a database cannot be accessed, a replica is accessed instead and the execution of the job continues unobstructed. Furthermore, balancing overly utilised storage with under-utilised storage improves the performance of the system.
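The replica fallback offered by the Data Management Services might look roughly like the following sketch; the function names and the caller-supplied fetch operation are assumptions made for illustration.

def fetch_with_replicas(content_id, replica_urls, fetch):
    """Try each known replica in turn; the job only fails if all replicas fail.

    `fetch` is a caller-supplied function (url, content_id) -> bytes that
    raises an exception on failure (network error, server down, ...).
    """
    errors = []
    for url in replica_urls:
        try:
            return fetch(url, content_id)
        except Exception as exc:           # replica unreachable or corrupt
            errors.append((url, exc))      # remember the failure, try the next replica
    raise RuntimeError("all replicas failed for %s: %s" % (content_id, errors))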

There is a chance that an SLA violation occurs while the job is being executed, for example the time the service needed to get back up and running exceeded the time agreed in the SLA, or more disk space was used than agreed. The system must then decide, according to the Policy Manager Service and the agreed-upon SLA, what actions to take, such as stopping the execution of the job or charging the user for the extra resources (penalty) according to the charging scheme applied to the user.

As far as the charging of grid usage is concerned, the SLA Management Service must have collected the data related to the consumption of the resources, the metrics for which have been defined in the SLA, by keeping accounting records. Various charging schemes can be applied. These can be based on the service usage or be fixed. Moreover, discounts for the provision of a lower quality of service as well as penalties in cases of SLA violation can be taken into account [Yeo and Buyya (2005)]. It should be noted, however, that the system is only responsible for maintaining the accounting records and applying the respective charging schemes, whereas real money transactions take place using the normal channels (credit card transactions, invoices in the mail, and other payment schemes).
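A combined charging scheme of the kind described above - usage-based charges plus discounts for degraded QoS and penalties for violations caused by the user - could be computed roughly as follows; all rates and record fields are illustrative assumptions.

def compute_charge(cpu_hours, gb_stored_days,
                   price_per_cpu_hour, price_per_gb_day,
                   qos_shortfalls, discount_per_shortfall,
                   user_violations, penalty_per_violation):
    """Usage-based charge with discounts for degraded QoS and penalties
    for SLA violations caused by the user (all rates are illustrative)."""
    usage = cpu_hours * price_per_cpu_hour + gb_stored_days * price_per_gb_day
    discount = qos_shortfalls * discount_per_shortfall   # provider missed the QoS
    penalty = user_violations * penalty_per_violation    # user exceeded the SLA
    return max(usage - discount + penalty, 0.0)

# e.g. 120 CPU hours and 50 GB-days of storage, one QoS shortfall by the provider
print(compute_charge(120, 50, 0.10, 0.02, 1, 2.00, 0, 5.00))   # 11.0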

 

Security Aspects of Grid Computing

As stated in the introduction, Grid Computing is one enabling technology for AmI spaces. A lot of sensor-collected data will be processed for profiling using Grid Computing. Therefore it is important that the processing of this personal data happens in a secure and privacy-preserving way.

As described in the technical section, one of the main visions of Grid computing is that the computing power of the Grid should be provided to its users in much the same way as electrical power is provided by the power grid. In particular, the user need not care about who is providing the resources. Therefore one cannot assume that any kind of trust relationship exists between a Grid user and the providers of the Grid resources.

Hence a fundamental security issue is that the user of Grid computing wants to protect the data to be processed by the Grid resources not only against outsiders (eavesdroppers on the network links etc.) but also against the administrators and owners of the Grid resources.

In general, security is one part of the Open Grid Services Architecture (OGSA) [Foster et al. (2005)] as developed by the Open Grid Forum (OGF). But regarding the OGF documents and specifications, one has to differentiate between the identified security requirements, the security mechanisms offered according to the specifications, and the actually implemented features. The latter are analysed using the Globus Toolkit [Foster et al. (2006)] as reference implementation.

Most of the described security requirements, features and mechanisms are related to authentication and authorisation. The most fundamental assumption is that each user and principal will have a Grid-wide identity that all the other Grid principals can verify [Humphrey et al. (2003)]. In particular, this means that the anonymous usage of Grid resources is not intended. For the use case of Grid computing within (classical) AmI spaces this is not a serious drawback, as the AmI space service provider - and not the person who enters a certain AmI environment - is the user of the Grid.

But it will become a problem for privacy-enhanced AmI spaces if techniques like user-centric identity management are used and the necessary processing of data is outsourced from the personal devices of the user to the Grid. In this case the user of the AmI space will also become the user of the Grid. Therefore all the profiling techniques described in the FIDIS deliverable D7.2 could harm the privacy and the protection of personal data of the user. Hence the further development of the OGSA specifications should consider the anonymous and pseudonymous usage of Grid resources.

In [Humphrey et al. (2003)] different security-related scenarios of Grid usage are illustrated. From these scenarios the OGSA security requirements for Grid computing are derived. Most of these scenarios deal with the problems of protecting the Grid resources (or the Grid at large) against malicious users (outsiders). Although this is an important prerequisite for achieving confidentiality and integrity of the processed data, it is not sufficient (as explained above).

The authors of [Humphrey et al. (2003)] are (to some extent) aware of this fact. Therefore they have specified that “[a]n individual Grid user should have the capability to constrain the manner in which she interacts with the collective Grid services.” Implications of this requirement are:

  1. “Services must recognise the rationale for per-user security configuration and be designed accordingly. 

  2. There must exist an easy mechanism for users to specify such constraints. 

  3. There must be a secure and efficient mechanism to propagate or otherwise convey a particular user’s integrity and confidentiality parameters from the user to the services.” 

In an extension to this, [Humphrey et al. (2003)] also identifies that “a user may want to specify that certain files be encrypted or all the data at a given site be encrypted. The user may also wish to specifically mandate that a server that acts on her behalf store all data related to her encrypted.”

Even though these requirements support the usage of the Grid in a privacy-preserving manner, they still miss some important points. First of all, the confidentiality of the processing of the stored data is not covered. Secondly, it remains unclear who can decrypt the encrypted data - only the Grid user, or also the administrator of the Grid resource?

When it comes to the specifications of the actual architecture, it turns out that the requirements mentioned above are not implemented rigorously. Besides authentication and authorisation - which constitute the main part of the security specifications - only transport-level security is considered. This offers confidentiality against outsiders, but not against insiders.

Also the Grid Security Infrastructure (GSI) - the name of the portion of the Globus Toolkit that implements security functionality - will not achieve more than simple access control of Grid users to resources. In [4] an example is given of how GSI works for that purpose. Moreover, the paper claims that “[f]or simplicity many details of the security process, such as […] privacy are omitted. These functions would be implemented similarly, with the hosting environments using OGSA services to provide the needed functionality.”

Unfortunately, a ‘simple’ adoption of the mechanisms and techniques used for authentication and authorisation is not possible for achieving privacy and the protection of personal data - especially if protection even against the providers of the resources is a requirement. Hence even ‘plain’ confidentiality of the data processed within the Globus Grid is an open issue and is only rudimentarily solved by implementing (optional) transport-level encryption.
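Protecting data against the resource providers themselves would require encrypting it on the user’s side before it ever reaches the Grid (which, however, also prevents the resource from computing on the plaintext). The following minimal sketch, which assumes the third-party Python package ‘cryptography’ and is not related to the Globus Toolkit or OGSA, illustrates the difference to transport-level security:

# Sketch of client-side ("end-to-end") encryption before data is handed to a
# Grid storage resource. Assumes the third-party 'cryptography' package
# (pip install cryptography); this is not part of the Globus Toolkit or OGSA.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # stays with the Grid user, never leaves her machine
cipher = Fernet(key)

plaintext = b"personal data to be stored on a Grid resource"
ciphertext = cipher.encrypt(plaintext)

# Only the ciphertext is uploaded: the resource administrator sees opaque bytes,
# unlike with transport-level security (TLS), where the data is decrypted again
# at the provider's endpoint.
upload_to_grid = ciphertext      # placeholder for the actual transfer

# Later, after downloading the ciphertext again, only the key holder can recover it.
assert cipher.decrypt(ciphertext) == plaintext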

Summing up, one can state that secure and trustworthy computations with personal data, without privacy risks, are not possible using today’s Grid architectures. It seems that protection against the operators of the Grid infrastructure and the various resources is not a big issue for the OGSA. Nevertheless, solutions for that problem are conceivable by means of cryptographic technologies like secure multiparty computation or the application of Trusted Computing.

Another open issue is the conflict between the ‘transparency’ of the Grid - i.e. that a Grid user need not (or even cannot) know who is providing the resources, where the data is, etc. - and the requirements imposed by law when it comes to the processing of personal data.

In [Grimm et al. (2006)] this conflict is illustrated using a medical research Grid application as an example. The patients taking part in the related research have to consent to the processing of their personal data - and they have the right to revoke this consent. Such a revocation requires that all data is deleted in a comprehensible and verifiable way. But this requires that one knows all the involved Grid resources - which fundamentally violates the transparency vision of the Grid. One needs secure logging and auditing to reproduce the spread of the personal data. Even though this is also relevant for OGSA (for accounting reasons), it is unclear how this logging could be done in such a way that the logs or audit trails cannot be manipulated.

The processing of personal data might not only imply problems from a user’s point of view but also from the resource provider’s. The processing of personal data may impose additional duties on him, e.g. to verify that the affected user has given his consent.

One can conclude that the OGSA should be enhanced in a way that allows the formulation of policies which consider the type of data processed by the Grid resources. This would allow a resource provider to express whether he is willing (and able) to process personal data or not, or could be used by a Grid user to inform a resource provider that he should handle certain data according to the rules for personal data.

Besides this, one should rethink whether the current data protection and privacy laws - made with centralised service providers in mind - are still appropriate for the case where personal data is processed by a highly distributed system formed by hundreds of different organisations without any central control.

 

Peer-to-Peer network architectures

The overall goal of Peer-to-Peer (P2P) based systems is to provide and share resources (like computing power, bandwidth or storage) among their participants with a high level of quality of service and in a cost-efficient way. A fundamental principle of the P2P paradigm is equality.

 



Figure : Client-Server based system

Figure : Peer-to-Peer based System

In classical client-server based systems (see the figure ‘Client-Server based system’) a clear distinction can be made between the roles and capabilities of the communicating entities. A clear ‘direction’ of the flow of information can also be identified. One entity (the client) makes a request to another entity (the server). The server then processes the request and sends an appropriate response back to the client. In particular, the clients never communicate directly with each other.

There are two fundamental problems with client-server based architectures centred on one server: scalability and robustness (availability). If the number of clients accessing a given server increases, then the resources of the server (in terms of processing power, storage and bandwidth) need to be increased at nearly the same rate to offer the same level of quality of service (e.g. in terms of latency) to its clients. This increase of server resources is connected with an (often non-linear) increase of costs for the operator of the service. There may also exist ‘physical’ limitations, such as the maximum bandwidth the network connection can provide for a single server.

Therefore very often the server is not a single instance but in fact a distributed system which implements some kind of load balancing among the nodes that form the distributed server system. In this way the quality of service offered to the clients can be kept at a high level. Nevertheless, the whole service infrastructure is still operated by the service provider and he bears the burden of all the costs.

In a P2P based system (see the figure ‘Peer-to-Peer based System’) every participant (also called node or peer) is simultaneously ‘client’ and ‘server’, meaning that it uses resources provided by others (acting as ‘client’) and offers resources to others (acting as ‘server’). In this way the costs of operating and offering a given service are no longer borne exclusively by the service provider, but are shared among the participants. The service provider itself only needs to operate a small part of the infrastructure which forms the whole P2P based service.

P2P based systems in general offer better scalability than client-server based systems. They are also more robust, meaning that the unavailability of some nodes will not result in the unavailability of the whole service. Because of the ‘equality’ paradigm, other nodes of the system can take over the tasks of the crashed nodes. Or in other words: the existence of clients requesting a given service implies the existence of servers offering the requested service.

A P2P based system can be categorised by different properties: 

  1. whether it is a pure P2P based system or a hybrid system

  2. how the data transmission is organised: single source download or multi source download

  3. how information is found: Centralised Directory Model (CDM), Flooded Requests Model (FRM) or Document Routing Model (DRM)

A pure P2P based system strictly sticks to the ‘equality’ paradigm, i.e. all involved nodes are absolutely identical. This kind of system is the most robust among the different types of P2P based systems. This robustness holds against unintentional failures of nodes (like crashes which make them disappear) as well as against purposeful actions aimed, for example, at shutting down the whole network or censoring the available content.

In a hybrid P2P based system the ‘equality’ requirement is less stringent. In such systems one can distinguish different kinds of peers depending on the functionality they offer to the P2P network. The assignment of which node has to offer which kind of special functionality to the network can either be done statically (mostly at the time of deployment of the network) or dynamically (at the runtime of the network). In any case the proper functioning of the network depends on the existence of the special nodes.


Figure : Hybrid P2P based system which uses the Centralised Directory Model for information retrieval

A typical example of a static hybrid P2P based system is a P2P network with some nodes dedicated as (centralised) directory servers. The ‘normal’ peers register with the directory. If a ‘normal’ peer needs a given resource (e.g. a file) it asks the directory for locations of that file. The file transfer itself is then done by direct interaction between the ‘normal’ peers.

The described mechanism of finding information is called the Centralised Directory Model (see the figure above). The well-known file sharing service ‘Napster’ (in its original version) used this model. A disadvantage of the CDM is its limited scalability. Privacy and data protection related problems also arise: because all search requests are managed by a centralised entity, this entity can build profiles of the peers if no additional privacy-enhancing technologies (PETs) are integrated into the P2P network.

Another property of CDM based P2P systems is related to controllability. Whoever controls the centralised directory service controls which information is available to the ‘normal’ peers within the network. Depending on the application this can be either an advantage or a disadvantage of this type of P2P based system. Also note that the very existence of this possibility of control may already impose liabilities (from a legal point of view) on the service operator. This was the case with the original Napster system.
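The interaction with a centralised directory can be sketched as follows; the class and method names are invented for illustration and do not correspond to any concrete protocol such as the original Napster one.

class CentralDirectory:
    """Centralised Directory Model: peers register the files they offer,
    other peers ask the directory where to find a file."""

    def __init__(self):
        self._index = {}   # filename -> set of peer addresses

    def register(self, peer, filenames):
        for name in filenames:
            self._index.setdefault(name, set()).add(peer)

    def lookup(self, filename):
        # The directory only answers "who has it"; the actual file transfer
        # happens directly between the peers.
        return self._index.get(filename, set())

directory = CentralDirectory()
directory.register("peer-a:6881", ["song.mp3", "paper.pdf"])
directory.register("peer-b:6881", ["song.mp3"])
print(directory.lookup("song.mp3"))   # {'peer-a:6881', 'peer-b:6881'}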

 


Figure : Pure P2P based system using the Flooded Request Model for information retrieval

Examples of the more dynamic hybrid P2P based systems are networks in which nodes with outstanding capabilities (e.g. available bandwidth or computing power) are ‘elected’ to offer special functionality to the ‘normal’ peers. Such special functionality could be taking part in the search infrastructure within the network - functionality ‘normal’ peers do not offer. Another common special functionality is connecting nodes which otherwise cannot communicate directly (for instance because of firewall restrictions). The well-known VoIP service provider Skype™ uses these hybrid technologies; the special nodes are called ‘supernodes’.

Besides the already discussed CDM for information retrieval (which implies a hybrid P2P network), the Flooded Requests Model was developed for pure P2P networks (see the figure above). The basic idea is that each search request for a given piece of information is either answered by a given node or forwarded by this node to all its direct neighbours. These neighbouring nodes do the same, so that in the worst case the whole P2P network gets flooded by the search request. Although this search strategy will always find the requested information (if it is available somewhere in the network) without the need for centralised directory nodes, its limited scalability is a serious disadvantage.

The more peers join the network and the more search requests they issue, the worse the performance becomes and the more resources are needed. Therefore the FRM is suited only for relatively small P2P networks with a limited number of participants. But the simplicity of the FRM and the absence of complex routing or synchronisation protocols etc. make it still appropriate for this case.

The original version of Gnutella is a prominent example of a P2P network which uses the FRM. But it also showed the performance problems mentioned above.
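A strongly simplified flooding search with a hop limit (so that requests do not circulate forever) could look like the following sketch; the peer model is an assumption for illustration and does not mirror the actual Gnutella protocol.

class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbours = []

    def search(self, filename, ttl, seen=None):
        """Flooded Requests Model: answer locally and forward to all neighbours."""
        if seen is None:
            seen = set()
        if self.name in seen:
            return set()                    # this peer already saw the request
        seen.add(self.name)
        hits = {self.name} if filename in self.files else set()
        if ttl > 0:
            for neighbour in self.neighbours:   # flood the request further
                hits |= neighbour.search(filename, ttl - 1, seen)
        return hits

# a tiny fully connected network of three peers
a, b, c = Peer("a", []), Peer("b", ["song.mp3"]), Peer("c", [])
a.neighbours, b.neighbours, c.neighbours = [b, c], [a, c], [a, b]
print(a.search("song.mp3", ttl=2))   # {'b'}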

To overcome these scalability and performance problems of the CDM and FRM on the one hand, but preserve the robustness of pure P2P based systems on the other, an alternative approach was developed. This approach is called the Document Routing Model and is based on distributed hash tables. P2P systems which use the DRM are often dynamic hybrid systems.

The basic idea of the DRM is that each peer has a random number (NodeID) assigned to it. Every piece of content (or data) which is to be published also gets a number (ContentID) assigned to it (usually this value is computed by applying a hash function to the data). The peers whose NodeIDs are ‘sufficiently similar’ to the ContentID are responsible for storing the content (or, more generally, for storing information about the location of the content). If a peer now receives a search request for a given ContentID, it tries to ‘route’ this request to those peers it knows about whose NodeIDs are ‘close’ to the ContentID.

The average search performance of the DRM is O(log N), where N is the number of peers (with a worst-case performance of O(N)), making this a mechanism which scales very well with a growing number of peers.
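The core of the DRM - mapping a ContentID to the peer with the ‘closest’ NodeID - can be sketched as follows; the small identifier space and the absolute-distance closeness metric are simplifying assumptions, whereas real DHT systems such as Chord or Kademlia use larger identifiers and more elaborate routing structures.

import hashlib

ID_BITS = 32   # real DHTs use 128 or 160 bits; kept small for the example

def content_id(data):
    """ContentID: hash of the published data, truncated to the ID space."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** ID_BITS)

def responsible_peer(node_ids, cid):
    """The peer whose NodeID is 'closest' to the ContentID stores the
    content (or a pointer to its location)."""
    return min(node_ids, key=lambda nid: abs(nid - cid))

peers = [123456, 987654321, 2400000000, 3100000000]   # random NodeIDs
cid = content_id(b"some published document")
print("stored at peer with NodeID", responsible_peer(peers, cid))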

After a piece of data has been located, it needs to be transferred to the requesting peer. In the simple case this is done by downloading the whole information from exactly one peer. This is called single source file transfer. The biggest disadvantage of this technique is that the download speed is limited by the upload bandwidth of the offering peer. This becomes a problem especially in P2P based systems where the overwhelming majority of peers are normal home users with asymmetric Internet connections (such as ADSL).

To overcome this limitation the multi source file transfer (MFT) was developed. In its simplest form a peer just downloads different parts of the requested content from different peers. More sophisticated algorithms try to optimise the way the initial distribution of the content is organised, to ensure that the content quickly becomes available to all interested peers without overloading the initial contributor.

The figure below gives an example of these techniques. In a first step peer A sends parts of a file to peer B and peer C. Note that these are different parts of the same file. In a next step peers B and C can exchange their parts. Moreover, A could also send an additional part to one of them. If peer D later joins the network, it can simultaneously download three parts of the file from A, B and C.


Figure : Multi source File Transfer within a P2P based system
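Reassembling a file from parts obtained from several peers, as in the figure above, can be sketched as follows; representing each peer as a mapping from part index to part data is an assumption made for illustration.

def multi_source_download(part_count, peers):
    """Fetch each part from some peer that has it and reassemble the file.

    Each entry of `peers` maps part index -> part data for that peer; a real
    client would request the parts over the network and could do so in parallel.
    """
    parts = {}
    for index in range(part_count):
        for peer in peers:                       # pick any peer that holds this part
            if index in peer:
                parts[index] = peer[index]
                break
        else:
            raise RuntimeError("part %d is not available from any peer" % index)
    return b"".join(parts[i] for i in range(part_count))

# peer A has parts 0 and 1, peer B has part 1, peer C has part 2
peers = [{0: b"Hel", 1: b"lo "}, {1: b"lo "}, {2: b"P2P"}]
print(multi_source_download(3, peers))           # b'Hello P2P'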

A common problem with P2P based systems that deal with content distribution is that peers are often interested in downloading some data but not willing to provide upload resources. As a solution to this problem, P2P networks can implement ‘payment’ (economic) or reputation systems. Various kinds of these systems exist. One of them is called the ‘Tit for Tat’ strategy. It means that the amount of available download resources (e.g. in terms of bandwidth) depends on the upload resources a peer provides to others.
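A ‘Tit for Tat’ allocation rule in its simplest form makes the download bandwidth granted to a peer depend on what that peer has uploaded so far; the concrete formula below is an illustrative assumption, not the actual BitTorrent choking algorithm.

def allowed_download_rate(uploaded_bytes, downloaded_bytes, base_rate=16_000):
    """Grant more download bandwidth to peers that contribute uploads.

    Every peer gets a small base rate (so newcomers can bootstrap); beyond
    that, the rate grows with the peer's upload/download ratio.
    """
    ratio = uploaded_bytes / max(downloaded_bytes, 1)
    return int(base_rate * (1 + min(ratio, 4)))   # cap the bonus at 4x the base rate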

One of the relevant P2P networks and protocols which implements multi source file transfer and uses the ‘Tit for Tat’ strategy is called ‘BitTorrent’. The BitTorrent network belongs to the category of hybrid P2P based systems. The basic idea is that all peers who are interested in a certain file form a logical group, which is called a swarm. A member of a given swarm is either a seeder or a normal peer. A seeder owns the whole file, whereas normal peers only have parts of it. In order to coordinate the seeders and the normal peers a third component called a tracker exists. A tracker manages information about which seeders are available for a given file and which peers have which parts. The tracker intelligently responds to requests made by the peers for parts of the file, ensuring that all parts of the file are evenly distributed within the swarm. This intelligent management of swarm resources enables a BitTorrent network to distribute content very quickly and robustly.
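The bookkeeping of such a tracker - which peers belong to a swarm and which parts they hold - can be sketched as follows; the data model and the rarest-part hint are strong simplifications and do not follow the real BitTorrent tracker protocol.

class Tracker:
    """Per-file swarm bookkeeping: which peers participate and which parts they have."""

    def __init__(self, part_count):
        self.part_count = part_count
        self.swarm = {}     # peer address -> set of part indices it holds

    def announce(self, peer, parts):
        self.swarm[peer] = set(parts)

    def seeders(self):
        """Seeders own every part of the file."""
        return [p for p, parts in self.swarm.items()
                if len(parts) == self.part_count]

    def rarest_part(self):
        """Suggest the part held by the fewest peers, so parts spread evenly."""
        counts = {i: sum(i in parts for parts in self.swarm.values())
                  for i in range(self.part_count)}
        return min(counts, key=counts.get)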

The BitTorrent network itself does not offer any search capability. If a client is interested in certain content, it needs a special file (a .torrent file) which contains information about the tracker and some other meta-information necessary to start the download.

In contrast to many other P2P based systems for content distribution (or file sharing), there is not one big BitTorrent network where all the content is distributed. Rather, temporary swarms are created which then work together to share the content. The tracker is always in the position to decide which content it will support and which it will not.

The last-mentioned property makes even ‘serious’ companies very interested in this technology. The computer game producer Blizzard uses BitTorrent to distribute beta versions of its game ‘World of Warcraft’ among interested testers. Also, all well-known Linux distributions use BitTorrent for spreading their software. A new trend is the legally compliant distribution of media content (TV shows, movies etc.) using P2P based technologies, especially BitTorrent. The company Azureus Inc. reached an agreement (content partnership) with BBC Worldwide Limited (a subsidiary of the British Broadcasting Corporation, BBC). This agreement allows Azureus Inc. to offer BBC content (comedies, dramas etc.) to the users of its new BitTorrent based digital media platform called Zudeo.

The widespread usage of the BitTorrent protocol is supported by many different activities, like the integration of a BitTorrent client into the Opera web browser, the development of special hardware chips enabling even limited (embedded) devices to utilise it, etc.

Nevertheless, BitTorrent and other P2P based content distribution systems are still stigmatised as illegal and as supporting unlawful copyright infringement. From a regulatory and legal point of view it is absolutely necessary not to forbid or restrict a whole technology, but rather only certain applications or use cases of this technology.

As argued and explained so far, P2P based systems and networks are emerging technologies which are particularly useful for distributing content in a very cost-effective way. Therefore they are one candidate for the architecture of the communication backbone in upcoming AmI spaces.

Hence it is necessary to take a closer look at the identity and identification, privacy and data protection related risks linked to this technology. They are mainly the same as the ones mentioned in the Grid computing section: if the data is not appropriately secured (e.g. encrypted) then it could be revealed to all participating peers. Note that encryption or integrity protection is not a built-in feature of most of today’s P2P based systems. Besides these confidentiality and integrity problems, a peer (or a colluding group of peers) could even try to profile a user. This is especially an issue with hybrid P2P systems, where some special (centralised) nodes are in a good position to do this.

More research is necessary to develop truly privacy-enhanced P2P based systems. But this requires that society accepts P2P as a useful technology and does not hinder privacy researchers from enhancing P2P systems.

 
