Relational structure
The usefulness of a well-structured, domain-specific thesaurus for managing information is widely acknowledged. However, there is a widespread opinion that the traditional thesaurus format does not fully meet current needs. One of the main problems with thesauri seems to be that they provide a poorly differentiated set of relationships between terms, distinguishing only among hierarchical, associative and equivalence relationships. In our thesauri, a more refined set of semantic relationships is being implemented: the standard relationships are enriched with attributes whose semantic content is specified. Augmenting the thesaurus relationships will ensure stronger semantic control, partly because different relationships can hold each other in check, and will open up new possibilities for information retrieval applications. Richer and semantically clearer relations could enable, for example, a better semantic description of Web resources and guide users towards meaningful information discovery on the Web. They will also make the thesauri more usable in artificial intelligence applications.
Hierarchical structure
Thesaurus standards and the scientific literature include three kinds of hierarchical relations (generic, partitive and instance), which are conflated into a single, undifferentiated hierarchical relationship. This is perhaps the most misused relation. Many existing thesauri that claim to be consistent with ISO standards provide relations that are labelled BT/NT but would be better interpreted as associative relations. They are in fact based on a document-retrieval definition of broader/narrower that is pragmatic in nature and oriented towards the function of the search process. In EARTh, only logically based hierarchies will be included. Moreover, we will distinguish the types of hierarchical relation and, as a second step, identify subtypes.
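As a minimal sketch of what a differentiated hierarchy could look like in data, the following Python fragment tags each broader/narrower link with one of the three kinds named above. The class and function names, and the example terms, are illustrative assumptions and are not taken from EARTh; the BTG/NTG, BTP/NTP and BTI/NTI tags are the ones used in ISO 25964 for the three kinds.

```python
from dataclasses import dataclass
from enum import Enum


class HierarchyType(Enum):
    """The three kinds of hierarchical relation, with their ISO 25964 tags."""
    GENERIC = "BTG/NTG"     # genus / species
    PARTITIVE = "BTP/NTP"   # whole / part
    INSTANCE = "BTI/NTI"    # class / individual instance


@dataclass(frozen=True)
class HierarchicalLink:
    broader: str
    narrower: str
    kind: HierarchyType


# Hypothetical example links, for illustration only.
LINKS = [
    HierarchicalLink("water body", "lake", HierarchyType.GENERIC),
    HierarchicalLink("lake", "lake shore", HierarchyType.PARTITIVE),
    HierarchicalLink("lake", "Lake Garda", HierarchyType.INSTANCE),
]


def narrower_terms(term, kind=None):
    """Return narrower terms of `term`, optionally restricted to one kind."""
    return [
        link.narrower
        for link in LINKS
        if link.broader == term and (kind is None or link.kind == kind)
    ]


print(narrower_terms("lake"))
print(narrower_terms("lake", HierarchyType.INSTANCE))
```

Keeping the kind on the link rather than on the term means that a single term can take part in generic, partitive and instance relations at once, while a retrieval application can still choose which kind to follow.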
The associative relation is quite difficult to describe because it covers a heterogeneous and undifferentiated set of relations. ISO 704 defines it as a relation that exists when a thematic connection can be established between concepts by virtue of experience. It can express many kinds of association between terms that are not hierarchically based. Such links should be made explicit in a thesaurus, since they suggest additional terms that can be used in indexing or retrieval. In our thesauri, we are trying to specify the nature of these relations and to differentiate RTs into subtypes (e.g., cause/effect, raw material/product, discipline/practitioner). Strengthening the transversal relational structure, which is based on associative relations, in this way yields a knowledge representation model with a net-like structure. The model emphasizes the system of interrelations, that is, the connecting ties that limit the degree of separation within a conceptual field and that cannot be represented by the tree-like taxonomic-hierarchical model. In our case this is also important for obtaining a system able to deal with the environment, a domain in which the complexity of the systems and the web of interlinkages play a key role.
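The net-like structure described above can be sketched as a graph whose edges carry the RT subtype. The Python fragment below is only an illustration under stated assumptions: the triples are invented examples, the function names are hypothetical, and the links are stored symmetrically for simplicity even though a subtype such as cause/effect is directional.

```python
from collections import defaultdict, deque

# (term, RT subtype, related term): hypothetical example entries.
TYPED_RTS = [
    ("deforestation", "cause/effect", "soil erosion"),
    ("soil erosion", "cause/effect", "sedimentation"),
    ("timber", "raw material/product", "paper"),
    ("ecology", "discipline/practitioner", "ecologist"),
]


def build_graph(triples):
    """Build an undirected graph; each edge keeps its RT subtype."""
    graph = defaultdict(set)
    for a, subtype, b in triples:
        graph[a].add((subtype, b))
        graph[b].add((subtype, a))
    return graph


def separation(graph, start, goal):
    """Number of associative links on the shortest path, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, dist = queue.popleft()
        for _subtype, neighbour in graph.get(term, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = build_graph(TYPED_RTS)
print(separation(graph, "deforestation", "sedimentation"))
print(separation(graph, "deforestation", "paper"))
```

A breadth-first search is used here because it finds the shortest chain of associative links between two terms, which is one simple way to quantify the degree of separation mentioned in the text.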
The equivalence relationship covers at least the following basic types: synonyms, lexical variants and near-synonyms. Synonymy refers to similarity of meaning. It has also been defined as interchangeability between terms, although absolute or perfect synonymy is very unlikely to exist. Classes of synonyms include, for example, dialectal variants, popular and technical term pairs, generic and trade name pairs, variants of different linguistic origin, variant names for emergent concepts, and slang or jargon synonyms. Lexical variants are different word forms for the same expression and derive from morphological and grammatical variation (i.e., orthographic and syntactic variants). For synonyms as well as for lexical variants, we will try to identify different subtypes. The category of near-synonyms as such is not included in the system at this stage.
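One possible way to record typed equivalence is a lookup table that maps each non-preferred form to its preferred term together with the kind of equivalence. The sketch below assumes only the two types retained at this stage (synonyms and lexical variants); the table contents and the names `USE` and `resolve` are hypothetical and are not drawn from the thesaurus itself.

```python
from enum import Enum


class EquivalenceType(Enum):
    SYNONYM = "synonym"
    LEXICAL_VARIANT = "lexical variant"


# non-preferred term -> (preferred term, equivalence type); invented examples.
USE = {
    "sulphur": ("sulfur", EquivalenceType.LEXICAL_VARIANT),
    "ground water": ("groundwater", EquivalenceType.LEXICAL_VARIANT),
    "refuse": ("waste", EquivalenceType.SYNONYM),
}


def resolve(term):
    """Map a term to its preferred form and the kind of equivalence used.

    Terms that are not listed as non-preferred are returned unchanged
    (lower-cased), with None as the equivalence type.
    """
    key = term.strip().lower()
    return USE.get(key, (key, None))


print(resolve("Sulphur"))
print(resolve("waste"))
```

Recording the type alongside the mapping lets an application treat spelling variants and true synonyms differently, for example by expanding queries with synonyms only.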
Thematic structure
In EARTh, a thematic organisation of terms has been developed. A theme or subject is conceived here as a sector of interest that brings together the terms related to it, whereas a tree or faceted structure tends to scatter them under their respective logical categories. We have developed a thematic classification that has been used to classify the terms and that could also be used to manage information in the fields of research, environmental policy and dissemination to users. From a semiotic point of view, this model should allow meaning to be represented according to different second-order perspectives and acceptations. The possibility of applying additional classification models would, in fact, ensure the openness and flexibility of the model. The RT relation can, of course, help to express additional semantic traits.
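The contrast between the logical categories and the thematic view can be illustrated with a small Python sketch. It assumes that each term carries one logical category and any number of themes; the categories, themes and terms shown are invented for the example and do not reproduce the EARTh classification.

```python
from collections import defaultdict

# term -> (logical category, set of themes); hypothetical example data.
TERMS = {
    "landfill": ("built structures", {"waste"}),
    "leachate": ("materials", {"waste", "water"}),
    "recycling": ("processes", {"waste"}),
    "aquifer": ("natural entities", {"water"}),
}


def terms_by_theme(terms):
    """Regroup terms by theme, regardless of their logical category."""
    index = defaultdict(set)
    for term, (_category, themes) in terms.items():
        for theme in themes:
            index[theme].add(term)
    return index


themes = terms_by_theme(TERMS)
print(sorted(themes["waste"]))
print(sorted(themes["water"]))
```

In this sketch the three terms under the "waste" theme sit in three different logical categories, which is the scattering that the thematic view is meant to offset; a term may also belong to more than one theme.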