
D2.1: Inventory of Topics and Clusters

Defining the Ontology concept

The specification of conceptualisation in History

The explicit and unambiguous specification of the concepts of a given domain (i.e. the definition of ontologies) has received considerable attention throughout history.

First, in Greek antiquity, philosophers came across the need to conceptualise concepts in their aim of better understanding the nature of Being. This line of inquiry, later given the name Ontology, is defined as the branch of metaphysics relating to the nature and relations of being. At that time, this conceptualisation was mainly done through writing and discourse. Since then, Ontology has at various times received the attention of philosophers.

Then, in the Middle Ages and later in the Renaissance, people began to specify the conceptualisation of a domain more systematically and explicitly by using dictionaries and encyclopaedias (the first reference to the term dictionary can be traced to the 13th century, and the modern encyclopaedia can be dated to the beginning of the 16th century). Dictionaries and encyclopaedias represent a way of specifying a conceptualisation that is based on the definition, in alphabetical order, of the terms or words of a domain (dictionary), or of the subjects of a domain (encyclopaedia).

In the 19th century, classification played a key role in natural science; one can cite the work on the classification of species by Lamarck, Buffon and Darwin, which had considerable influence in this area (and is at the root of genetics). Classification relies on the idea of conceptualising a domain through the identification of a set of characteristics that an object can possess, usually structured hierarchically (examples of classification: the library classification of subjects, or the classification of species in biology).

Computer science showed an early interest in the very explicit specification of concepts. The aim was to make the specification of concepts comprehensible by machines. For instance, as a necessary condition for conducting automatic operations and reasoning, the field of Artificial Intelligence or Advanced Computing started early on to define explicit and formal specifications of knowledge (Aiii, 2004). Examples include Allen Newell’s research on symbolic computation in the mid-1950s, Ted Nelson’s invention of hypertext in the 1960s, Marvin Minsky’s introduction of the concept of frames in the 1970s, and Douglas Lenat’s work in the 1980s on the Cyc framework, which aims at representing common sense.

More recently, with the advent of the Internet, the computer science field has generated a lot of activity around the Semantic Web (Berners-Lee, Hendler and Lassila, 2001). In this context, ontology work relies mainly on the idea of conceptualising a domain in terms of objects and semantic relationships. This trend towards the Semantic Web has stimulated research and use in this domain, even to the extent of creating dedicated standards (such as OWL) for representing ontologies.

On a parallel track, knowledge construction and categorisation have flourished, and new approaches have been invented, such as the combination of hypertextual and collaborative knowledge construction, which is best exemplified by Wiki systems.

What is an “ontology”

Following the discussion above, we define an ontology (with a small “o”) as a particular specification of a conceptualisation of a given domain (Gruber, 1993), and Ontology (with a big “O”) as the discipline which is concerned with the study of ontologies.

Ontologies represent a broad concept that can take a variety of forms of differing complexity: simple textual descriptions (aiming at the explicit description of a set of concepts), inventories of terms and their definitions (such as dictionaries and encyclopaedias), taxonomic hierarchies (such as classifications organised as trees) and, in the most complex case, semantic networks (sets of objects connected to one another by relationships) whose structure is specified in a meta-model (specifying the structure of the objects, and providing some categorisation).

The different categories of ontologies differ in the depth and level of formalisation they adopt for specifying the conceptualisation.

In the case of the simple text-based ontology, the level of formalisation is usually reduced to a minimum, and the description of the structures is totally implicit (in the best case the text provides a structure that helps to make the organisation of the concepts visible). More advanced ontologies are based on structural representations (lexical, syntactic or semiotic). For instance, the knowledge can be organised into synonym sets, each representing one underlying lexical concept. This latter approach is typically adopted by the more advanced dictionary approaches (such as the WordNet system). In that case, however, the structure is not used to represent the semantics. Classifications (taxonomies) start to use the structure to capture certain aspects of the semantics. For instance, the different objects belonging to the same branch share some identical properties. Finally, the more sophisticated forms of ontologies (the ones that are promoted by the Semantic Web, and which aim at being interpreted by machines) operate directly at the level of the semantics. In this case, “semantic ontologies” specify the concepts in terms of semantic definitions of the objects that intervene in the domain and of the semantic relationships that hold amongst them. In addition, in order to make this specification more explicit, a distinction is generally made between the semantic network describing the domain (called the instance ontology) and the meta-model used to describe the different classes of objects and relationships that form this network.
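The distinction between a meta-model and an instance network can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than part of any FIDIS specification: the class names (Person, Pseudonym), the relation name "uses" and the two container types are assumptions chosen only to show how a meta-model can constrain which objects and relationships an instance network may contain.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class MetaModel:
    """Describes the allowed classes of objects and relationship types."""
    classes: Set[str] = field(default_factory=set)
    # relation name -> (class of the source object, class of the target object)
    relations: Dict[str, Tuple[str, str]] = field(default_factory=dict)


@dataclass
class InstanceOntology:
    """A semantic network whose content is validated against a meta-model."""
    meta: MetaModel
    objects: Dict[str, str] = field(default_factory=dict)  # object -> class
    links: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, name: str, cls: str) -> None:
        if cls not in self.meta.classes:
            raise ValueError(f"unknown class: {cls}")
        self.objects[name] = cls

    def add_link(self, source: str, relation: str, target: str) -> None:
        if relation not in self.meta.relations:
            raise ValueError(f"unknown relation: {relation}")
        source_cls, target_cls = self.meta.relations[relation]
        # Both ends must exist and belong to the classes the meta-model expects.
        if self.objects.get(source) != source_cls:
            raise ValueError(f"{source} is not a {source_cls}")
        if self.objects.get(target) != target_cls:
            raise ValueError(f"{target} is not a {target_cls}")
        self.links.append((source, relation, target))


# Hypothetical example domain (names are assumptions, not FIDIS terms).
meta = MetaModel(
    classes={"Person", "Pseudonym"},
    relations={"uses": ("Person", "Pseudonym")},
)
onto = InstanceOntology(meta)
onto.add_object("alice", "Person")
onto.add_object("nick42", "Pseudonym")
onto.add_link("alice", "uses", "nick42")
```

A link that contradicts the meta-model, such as a pseudonym "using" a person, is rejected with a ValueError, which is one simple way in which the structure can carry part of the semantics.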

It is now legitimate to ask which form of ontology is the most adequate for specifying a conceptualisation, in particular given the important difference in complexity between the different forms of ontologies. Indeed, very explicit conceptualisations like the semantic one can require a considerable amount of effort that may not be justified, and can even in some cases prove counterproductive (by reducing flexibility when a domain is continuously changing). On the other hand, overly shallow conceptualisations like the textual one can be ambiguous and lead to partial understanding. They can also be more difficult for information technologies to exploit.

The answer is not simple and varies according to the expectations in terms of the quality of the ontologies (for instance their depth and completeness in the specification of a conceptualisation), the nature of their exploitation (are they to be used in the context of an information system infrastructure?), but also the size of the domain to be conceptualised and the effort and expertise available for their design.

In fact, the different approaches for conceptualising a domain can be considered complementary. The simplest ontologies (typically the textual ones) can be used in a first stage to clarify the domain. More sophisticated conceptualisations (categorisation, semantic representations, etc.) can be used later, once the domain is better understood (thanks to the previous conceptualisation).

However, overly elaborate conceptualisations are not always desirable. They can be unnecessarily complex and create rigidity (for instance when a domain is still in a stage of continuous evolution). They may also require tools and computer resources that are currently not available. For instance, some applications (such as data mining or machine learning applications) may have to manipulate huge amounts of data, making the use of in-depth ontologies impractical and unnecessary.

What is an Ontology used for

It is important to question the reasons why ontologies are built, and indeed what they are actually used for.

One of the main objectives of science is to create very explicit models of the functioning of the world that can later be applied without any other necessary or hidden knowledge (so that a phenomenon can be deterministically reproduced). Ontology work, by providing explicit and well-understood definitions of the concepts, facilitates the specification of scientific models and, in particular, contributes to describing them in a way that is concise (no need to describe concepts that have already been defined) and unambiguous (reducing the risk of multiple or erroneous interpretations).

The second reason, which is at least as important, is the creation of a common language that facilitates the sharing, exchange and reuse of knowledge amongst the community of people (researchers, practitioners, end users) who deal with these concepts. For instance, it is generally accepted that well-formed ontologies can significantly improve communication and provide a basis for shared understanding and reuse of information (Uschold and Gruninger, 1996; Clark and Brennan, 1991).

Finally, another function of Ontology work is to provide a mechanism to stimulate the construction of knowledge in the community, and a means for this community to develop its identity. More specifically, making the most important concepts more visible and making people use the same terms to refer to them (rather than many different terms for the same concept) helps to develop a sense of belonging to this community and a sense of resemblance. This sense of belonging is important, since it leads to important outcomes by increasing organizational citizenship behaviour (loyalty, civic virtue, altruism and courtesy), motivation and commitment, and can be associated with involvement in community activities (Blanchard and Markus, 2002). Resemblance can contribute to the establishment and development of relationships between people, since people tend to establish relationships with others who are similar to themselves (Berscheid and Reis, 1998).

Note: As a consequence of this support for the community process, an ontology should not be considered a finished and static piece of information, but a living body of information that is continuously growing (in particular by integrating new terms and concepts as they emerge) and adapting to the evolution of the focus of the community that is using it. It is therefore important that the mechanisms for adding vocabulary to this language are not seen as an external component of the ontology but are, on the contrary, directly built into it.

Ontology design

The ontology design challenge

The design of good ontological constructions is a non-trivial operation that typically requires a lot of time, effort and resources.

The design of encyclopaedias and dictionaries is known to have represented a major effort, consisting of the identification of a very large number of terms and their definition or, in encyclopaedias, their illustration by examples.

In natural science, the definition of classifications required decades of work, and successful models (such as the classification of species) were considered the major achievements of some famous scientists (Buffon, Lamarck, etc.). Scientists went through a laborious and extensive collection of data about animals, which were later classified using empirical methods (typically by finding similarities in the data that were collected).

More recently, computer science, and more particularly the Semantic Web community, has defined more systematic approaches, methodologies (Holsapple and Joshi, 2002; Guarino and Welty, 2002; Denny, 2002; Prieto-Diaz, 2002; Noy and McGuinness, 2001) and tools to conceptualise a domain in a way that is interpretable by computers. However, as Guarino and Welty (2002) point out, “The process of building or engineering [ontologies] for use in information systems remains an arcane art form” that would need to become a rigorous engineering discipline. Missikoff, Navigli and Velardi (2002) report experiences in which the design of a domain ontology took several months. They also stress the importance of designing ontologies that are good enough to be usable, and that have in particular a good level of coverage (completeness), consensus (being agreed upon) and accessibility (ease of use). Finally, Noy and McGuinness (2001) acknowledge that “there is no single correct ontology-design methodology”.

To conclude, it appears impossible to identify an agreed and usable ontology design methodology. In the next paragraphs we will therefore give some indications of an approach for building ontologies that we believe could be used in the FIDIS context.

The design process

Denny (2002) proposes the following steps to build a domain ontology.

Acquire domain knowledge.
Assemble appropriate information resources and expertise that will define, with consensus and consistency, the terms used formally to describe things in the domain of interest. These definitions must be collected so that they can be expressed in a common language selected for the ontology.
Organize the ontology.
Design the overall conceptual structure of the domain. This will likely involve identifying the domain’s principal concrete concepts and their properties, identifying the relationships among the concepts, creating abstract concepts as organizing features, referencing or including supporting ontologies, distinguishing which concepts have instances, and applying other guidelines of your chosen methodology.
Flesh out the ontology.
Add concepts, relations, and individuals to the level of detail necessary to satisfy the purposes of the ontology.
Check your work.
Reconcile syntactic, logical, and semantic inconsistencies among the ontology elements. Consistency checking may also involve automatic classification that defines new concepts based on individual properties and class relationships.
Commit the ontology.
Incumbent on any ontology development effort is a final verification of the ontology by domain experts and the subsequent commitment of the ontology by publishing it within its intended deployment environment.
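As an illustration of what the "check your work" step can involve, the sketch below runs two very simple structural checks on a taxonomy: it reports concepts whose parent is not defined, and concepts whose chain of parents loops back on itself. This is only an assumed, minimal form of consistency checking and not the procedure described by Denny (2002); the function name check_taxonomy, the dictionary representation and the sample concepts are all hypothetical.

```python
from typing import Dict, List, Optional


def check_taxonomy(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of human-readable problems found in an is-a hierarchy.

    parent_of maps each concept to its parent concept (None for a root).
    """
    problems: List[str] = []

    # Check 1: every parent that is referenced must itself be defined.
    for concept, parent in parent_of.items():
        if parent is not None and parent not in parent_of:
            problems.append(f"{concept}: parent '{parent}' is not defined")

    # Check 2: following the parents of a concept must never revisit a concept.
    for concept in parent_of:
        seen = {concept}
        current = parent_of[concept]
        while current is not None and current in parent_of:
            if current in seen:
                problems.append(f"{concept}: cycle in is-a chain")
                break
            seen.add(current)
            current = parent_of[current]

    return problems


# Hypothetical taxonomy with one dangling parent and one two-concept cycle.
taxonomy = {
    "identity": None,
    "pseudonym": "identity",
    "avatar": "persona",   # 'persona' is never defined
    "a": "b",
    "b": "a",              # 'a' and 'b' point at each other
}
for problem in check_taxonomy(taxonomy):
    print(problem)
```

Logical and semantic inconsistencies generally need richer tooling (for example a reasoner working on a formal ontology language); the sketch only covers the structural end of the spectrum.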

Whilst this approach can be a source of inspiration, we believe it is not totally appropriate in the case of FIDIS, for which the domain is still considered relatively fuzzy and subject to evolution.

In addition, in WP 2, we are interested in using an approach which is more consistent with the WIKI tool, which we believe is particularly well adapted to the collection of resources from the whole FIDIS community.

The construction process that we propose for the ontologies consists of a series of iterations involving the following operations:

Identification of terms, concepts, etc. (corpus analysis)
Categorisation of these terms
Definition of their semantics
Illustration (situating them with examples)
Some tentative formalisation of the conceptual relations

The identification of terms, concepts, etc. (corpus analysis)

The identification of terms and concepts consists of going through content material belonging to the domain and extracting the vocabulary of the most significant terms.
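A very rough way to support this corpus analysis is sketched below in Python. It assumes that raw word frequency is an acceptable first proxy for "significance", which is a simplification: in practice the candidate list would be reviewed by people who know the domain. The stop-word list, the minimum word length and the sample text are hypothetical choices made only for the illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny, assumed stop-word list; a real one would be much longer.
STOPWORDS = {"that", "this", "with", "from", "which", "their", "have", "been"}


def candidate_terms(text: str, top_n: int = 5) -> List[Tuple[str, int]]:
    """Return the top_n most frequent words as candidate domain terms."""
    words = re.findall(r"[a-z]+", text.lower())
    # Ignore very short words and stop-words, then count what remains.
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical extract of domain material.
sample = (
    "Identity management relies on pseudonyms. "
    "A pseudonym hides identity; identity is contextual."
)
for term, count in candidate_terms(sample, top_n=3):
    print(term, count)
```

The sketch treats "pseudonym" and "pseudonyms" as different words; grouping such variants is left out to keep the example short.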

The categorisation

The objective of categorising the different terms and identifying the main concepts is the creation of a tree-like cognitive map of the domain that connects the different terms with one another. The tree representation is very intuitive and user-friendly. This type of representation allows the user to navigate easily among the terms and offers an understandable way to handle the relations between them.
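The tree-like map and the navigation it allows can be sketched with a small Python data structure. The class name TermNode, its methods and the sample terms are hypothetical; the sketch only shows one plausible way to store the categories and to retrieve the path from the root of the map to a given term.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermNode:
    """One term in the tree-like map, with its more specific terms below it."""
    name: str
    children: List["TermNode"] = field(default_factory=list)

    def add(self, name: str) -> "TermNode":
        """Attach a new child term and return it, so calls can be nested."""
        child = TermNode(name)
        self.children.append(child)
        return child

    def path_to(self, name: str) -> Optional[List[str]]:
        """Return the list of terms from this node down to `name`, if found."""
        if self.name == name:
            return [self.name]
        for child in self.children:
            sub_path = child.path_to(name)
            if sub_path is not None:
                return [self.name] + sub_path
        return None


# Hypothetical categorisation of a few terms.
root = TermNode("identity")
authentication = root.add("authentication")
authentication.add("biometrics")
root.add("anonymity")

print(root.path_to("biometrics"))
```

A strict tree gives each term exactly one place in the map; terms that belong under several categories would need cross-links in addition to the tree.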

The definition of the semantics

Each identified term and concept is progressively defined. This consists of providing a basic definition, structuring it, and beginning to link the term with other terms that appear to be related. The semantics of a term is closely related to the context in which the term is used. This definition process relies on the extraction of content from different materials and on the active and collaborative contribution of the different participants.

The illustration (Situating the concept in concrete contexts)

As part of the definition process, a set of illustrative examples is provided in order to better link the term or concept to a concrete context, thus making clearer to the reader what the associated identity issues are.

Formalisation

Some tentative formalisation can be initiated in order to homogenise the definitions of the different terms and their relations with one another (for instance, by identifying the conceptual relations between the different terms). The formalisation, however, is expected to emerge as more definitions of terms are incorporated, rather than be imposed from the outset.
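One way to picture this emergent formalisation is a term entry in which links to other terms start out untyped and are given a relation type only later, once the nature of the relation is understood. The Python sketch below is an assumption about how such an entry could be stored; the class TermEntry, its fields, the relation name "is-related-to-kind" and the sample content are all hypothetical and do not describe the actual FIDIS Wiki.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermEntry:
    """A term with its definition, examples and links to related terms."""
    name: str
    definition: str = ""
    examples: List[str] = field(default_factory=list)
    # related term -> relation type; None means "related, not yet formalised"
    related: Dict[str, Optional[str]] = field(default_factory=dict)

    def link(self, other: str) -> None:
        """Record that another term appears related, without typing the link."""
        self.related.setdefault(other, None)

    def formalise(self, other: str, relation: str) -> None:
        """Give an existing link an explicit relation type."""
        if other not in self.related:
            raise KeyError(f"{other} is not linked to {self.name}")
        self.related[other] = relation

    def unformalised(self) -> List[str]:
        """List the links that still lack a relation type."""
        return [term for term, rel in self.related.items() if rel is None]


# Hypothetical entry: two loose links, one of which is formalised later.
entry = TermEntry("pseudonym", definition="A name used in place of a real name.")
entry.link("identity")
entry.link("anonymity")
entry.formalise("identity", "is-related-to-kind")  # placeholder relation name

print(entry.unformalised())
```

Keeping untyped links as a first-class state means contributors can connect terms early, while the vocabulary of relation types is allowed to settle over several iterations.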

Some tools (ontology editors, WIKI, etc.)

Different tools can help this process of collection and specification.
