Markup Languages and Ontologies

XML and Semantics

The eXtensible Markup Language (XML) is widely accepted as the emerging standard for data interchange on the Web. XML allows authors to create their own markup (e.g. <AUTHOR>), which seems to carry some semantics. From a computational perspective, however, a tag like <AUTHOR> carries no more semantics than a tag like <H1>. A computer simply does not know what an author is, or how the concept author is related to, e.g., the concept person. XML may help humans predict what information might lie "between the tags" in the case of <trunk></trunk>, but it can only help. For an XML processor, <trunk>, <i> and <bookTitle> are all equally (and totally) meaningless. Yes, meaningless.
This has direct consequences for commerce on the Web.
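
To make the point about tag names concrete, here is a minimal sketch using Python's standard XML parser. The two documents and the helper function shape are made up for illustration. The sketch parses one document with a "meaningful" tag and one with arbitrary tag names, and checks that the parser builds structurally identical trees for both.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same structure, different tag names.
doc_a = "<book><AUTHOR>Jane Doe</AUTHOR></book>"
doc_b = "<x1><x2>Jane Doe</x2></x1>"


def shape(element):
    """Return the tree with tag names erased: (text, [child shapes])."""
    text = (element.text or "").strip()
    return (text, [shape(child) for child in element])


tree_a = ET.fromstring(doc_a)
tree_b = ET.fromstring(doc_b)

# The parser accepts both documents and treats the tag names as opaque
# labels, so the two trees differ only in those labels.
same_structure = shape(tree_a) == shape(tree_b)
print(same_structure)
```

Any meaning attached to AUTHOR has to come from outside the parser, for example from application code written by someone who knows what an author is.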

Electronic Data-Interchange and E-Commerce

E-commerce requires rich data: retailers require data to flow from wholesalers, and wholesalers require data to flow from producers. Data exchange of this kind is currently very limited, consisting of tab-delimited dumps or product-specific tables. Specific XML formats for each exchange task improve the situation, but one misses the network effect of being able to share 90% of the processing software, because the XML data model is too low-level. Approaches like ebXML or the Meta-Data Coalition try to tackle this problem by "aiming at the development of an open XML-based infrastructure enabling the global use of electronic business information in an interoperable, secure and consistent manner by all parties". However, all these approaches have to represent the different terminologies used in different types of businesses. This leads directly to the use of ontologies:
Sharing information and knowledge between different applications (that is, interoperability) requires a shared set of terms that describes the application domain and is commonly understood. More flexibility is gained if relationships between the terms are defined as well, rather than just a flat set of terms. This helps an application to understand the domain at least partially. Such a set of terms, together with the relationships between them, is called an ontology. A DTD can be regarded as a very primitive ontology; however, it is basically just a set of terms and does not define the relationships between them (for a more complete description of the relationship between ontologies and DTDs, see [Erdmann, 1999]). More advanced applications therefore require a more expressive ontology language. XML itself has no built-in primitives for defining term sets richer than those expressible in DTDs.
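
The difference between a flat set of terms and a set of terms with relationships can be sketched in a few lines of Python. All terms, records, and function names below are hypothetical and chosen only for illustration. The sketch is not a real ontology language, just a single is-a relation. A query for "person" finds nothing when only the flat term set is known, and finds the author once the relationship is available.

```python
# Flat term set, roughly what a DTD provides (hypothetical terms).
flat_terms = {"person", "author", "book"}

# The same terms plus one kind of relationship: "is a".
is_a = {"author": "person"}

# Hypothetical data records, each labelled with one term.
records = [("Jane Doe", "author"), ("An Example Book", "book")]


def is_subconcept(term, ancestor, hierarchy):
    """Follow is-a links upward from term until ancestor is found."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)
        term = hierarchy.get(term)
    return False


def find(concept, hierarchy):
    """Return the names of all records whose term falls under concept."""
    return [name for name, term in records
            if is_subconcept(term, concept, hierarchy)]


with_flat_terms = find("person", {})   # no relationships known
with_ontology = find("person", is_a)   # author is-a person
print(with_flat_terms, with_ontology)
```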

Ontologies

What is an ontology?
An ontology is a specification of a conceptualization.


Ontologies establish a joint terminology between members of a community of interest. These members can be human or automated agents. To represent a conceptualization, a representation language is needed. Several representation languages and systems have been defined. KIF-based Ontolingua, Loom, and Frame-Logic are examples of representation languages based on first-order logic, but with different expressiveness and computational properties.
For applications on the Web, however, it is important to have a language with a standardized syntax. Because XML is emerging as the standard language for data interchange on the Web, it is desirable to exchange ontologies using an XML syntax as well, which simplifies the task of writing parsers. This requirement has led to several ontology languages defined on top of XML. Examples include SHOE, Ontology Exchange Language (XOL), Ontology Markup Language (OML and CKML), Resource Description Framework Schema Language (RDFS), and Riboweb. All of them use XML syntax, but with slightly different tag names.
A newer proposal that extends RDF and RDF Schema is OIL (Ontology Interchange Language). RDF and RDFS are already in use in the library community and have a good chance of becoming an accepted standard. The successor of OIL is DAML+OIL, developed jointly by a group of European and American scientists. Jim Hendler provides an extended, detailed introduction to using ontologies in marking up web pages, with examples from SHOE and DAML+OIL.
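
As an illustration of why an XML syntax simplifies writing parsers, the following sketch reads a small RDF Schema fragment with Python's standard XML library. The classes Person and Author are hypothetical. The code handles only this one way of writing RDF in XML and makes no attempt to be a general RDF parser.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of RDF and RDF Schema.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical schema: Author is declared as a subclass of Person.
schema = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Person"/>
  <rdfs:Class rdf:ID="Author">
    <rdfs:subClassOf rdf:resource="#Person"/>
  </rdfs:Class>
</rdf:RDF>
"""

root = ET.fromstring(schema)

classes = []
subclass_of = {}
for cls in root.findall(f"{{{RDFS}}}Class"):
    name = cls.get(f"{{{RDF}}}ID")
    classes.append(name)
    # Record the parent class named by each rdfs:subClassOf child.
    for parent in cls.findall(f"{{{RDFS}}}subClassOf"):
        subclass_of[name] = parent.get(f"{{{RDF}}}resource").lstrip("#")

print(classes, subclass_of)
```

The generic XML parser does all the syntactic work. What the ontology language adds is the agreed meaning of elements such as rdfs:subClassOf, which an ordinary XML processor does not know.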

Publications about Ontologies:

Research Groups:

Other Resources

SENSUS is a 70,000-node terminology taxonomy, intended as a framework into which additional knowledge can be placed. SENSUS is an extension and reorganization of WordNet. It is reachable, together with other ontology-related projects at ISI, via http://mozart.isi.edu:8003/sensus2/ and other ontology projects are reachable via http://www.isi.edu/natural-language/projects/ONTOLOGIES.html

Ontology and Metadata Editors

The creation of joint ontologies for a number of agents is a challenging task. Distributed development of ontologies, for example, needs tools for synchronizing between the agents involved. But acquiring an ontology from a single agent is also difficult, because it means making explicit something that is usually just implicit. One example of an ontology editor is Protégé, which supports ontology and knowledge acquisition from a single user. An example of a Web-based ontology editor is WebOnto, which supports the joint creation of ontologies over the Web. An example of an ontology editor that already supports RDF Schema is http://paranormal.se/perl/proj/rdf/schema_editor/___welcome.html.

OilEd is a simple ontology editor which allows the user to build ontologies using OIL. The intention behind OilEd is to provide a simple, freeware editor that demonstrates the use of, and stimulates interest in, OIL. OilEd is not intended as a full ontology development environment - it will not actively support the development of large-scale ontologies, the migration and integration of ontologies, versioning, argumentation and many other activities that are involved in ontology construction. Rather, it is the "NotePad" of ontology editors, offering just enough functionality to allow users to build ontologies and to demonstrate how we can use the FaCT reasoner to check those ontologies for consistency.

Ontology-aware metadata tools can considerably simplify the creation of metadata for resources available on the Web.
A simple metadata editor is the Reggie Metadata Editor, a Java-based metadata editor created by the Resource Discovery Unit of DSTC that exports HTML 3.2, HTML 4.0 and RDF. Protégé supports the generation of user interfaces based on ontologies, which makes it well suited for entering ontology-based metadata.

Ontology Interoperability

Once a number of ontologies exist on the Web, it is desirable to make them interoperable. This requires defining mappings between different ontologies. This problem is tackled in a number of projects, e.g. the Stanford Scalable Knowledge Composition (SKC) project and the Bremer Semantic Translation project.
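
In its simplest form, a mapping between two ontologies is a table relating the terms of one to the terms of the other. The sketch below assumes two made-up vocabularies, one for a wholesaler and one for a retailer, and a hypothetical helper translate. It does not reflect the approach of any of the projects named above. It renames the terms of a record and reports those for which no mapping is defined.

```python
# Hypothetical mapping from a wholesaler's terms to a retailer's terms.
wholesaler_to_retailer = {
    "Writer": "Author",
    "Title": "bookTitle",
    "Cost": "Price",
}


def translate(record, mapping):
    """Rename the terms of a record; collect terms that have no mapping."""
    translated, unmapped = {}, []
    for term, value in record.items():
        if term in mapping:
            translated[mapping[term]] = value
        else:
            unmapped.append(term)
    return translated, unmapped


# Hypothetical record expressed in the wholesaler's terms.
record = {"Writer": "Jane Doe", "Title": "An Example Book", "Weight": "300g"}
translated, unmapped = translate(record, wholesaler_to_retailer)
print(translated, unmapped)
```

A one-to-one table like this covers only the easy case. Terms that overlap partially, or that differ in granularity, need richer mapping rules.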
