Warning: This page has not been updated since mid 2003; though it may still be somewhat useful, the interested reader is referred to my Database Project Homepage for all the latest work on the SEEK project.
I began this webpage just to avoid losing track of URLs, and to help keep straight in my own mind all these websites, papers, projects, languages, and systems, their relations with each other, and with my own work. The first section below lists and briefly describes some literature I found, mostly on the web, while the second section discusses workflow languages and related software, systems, and theory. The third section describes some relevant results from the social sciences, and gives some tentative principles to consider when building tools for this area. I plan to gradually elaborate all these sections, and might even add some others. Suggestions for additions, corrections, or improvements, as well as any other comments, are very welcome.
The literature related to this project, one way or another, is so diverse and confused that it is difficult even to roughly classify it into areas. However, the subsections below represent an attempt to do this with ontologies and related topics, starting with two projects that applied ontology technology to ecology. Subsequent subsections discuss various classes of languages and tools that have been proposed for knowledge management, while the final subsection describes some of my own work that is somehow related.
Some interesting work on ecological informatics has been done in collaborations between Brazil and the Software Systems and Processes Group of the AI Department at the University of Edinburgh. The first two items below describe two such projects, the third gives some positive results, while the fourth and fifth focus more on negative results, and the sixth describes some infrastructural support that they developed. Some later, more theoretical research from the Edinburgh group is described in section 1.3 below.
An enormous number of languages, systems, and logics have been proposed for use with information resources (database systems, the web, etc.). Some have a syntax but no semantics, while others have both in a quite rigorous way. An important trade-off that needs to be negotiated is between expressiveness and effectiveness; in general, the more powerful a language is, the more difficult it is to implement effectively. For example, in full first order logic, many important problems are not decidable. Moreover, even when problems are decidable, they can have a very high complexity. Another very important problem is how all these systems relate to one another, and even how to talk about how they relate. A main point will be that all of them are (or should be) institutions (see section 1.4), and that relations among them should be given by institution morphisms.
It is useful to bear in mind that no logic can possibly capture "meanings" in anything remotely like the human sense of that word. For example, a given database might contain a social security number, the number for a secret Swiss bank account, and its balance; their meaning becomes quite different if this data belongs to an FBI agent, a criminal, a terrorist, or an ordinary citizen. No logic can even capture what it means to be a social security number, let alone all the associated nuances, such as how one can be obtained, how it can be used, its historical development, and its cloudy future. Although this seems relatively obvious (at least, to me), it runs contrary to much hype that one sees about the "semantic web" and similar ambitious schemes.
It appears that (at least) three different communities are developing "ontology languages," though they often use different terminologies and tend not to refer to each other's work. One community is that of the semantic web, centered at w3c. A different community is more concerned with databases, and tends to speak of "conceptual modelling languages" rather than "ontology languages," while a third community carries on traditions from AI in knowledge representation, and so tends to speak of "knowledge management" etc.
KIF is from this third community, and was developed at Stanford by Michael Genesereth. KIF is an acronym for "Knowledge Interchange Format," and it contains all of first order predicate logic, plus some "meta-knowledge" features, and some standard data structures. It was originally intended as a standard interchange format for knowledge bases between computers, rather than as a language to be implemented and/or to be used by humans, and its syntax is based on that of Lisp; however, it has a nice formal semantics, and has been used to give formal semantics to other languages. Its great expressive power is also what makes it difficult to implement and inefficient to use. Another language in this family is Stanford's Ontolingua, which again is highly expressive and correspondingly difficult to process.
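As a toy illustration (not part of any actual KIF implementation), the sketch below represents a KIF-style first order sentence as nested Python lists and renders it in KIF's Lisp-like concrete syntax; the predicate names Person and Mortal are invented for the example:

```python
# Toy sketch: a KIF-style sentence as nested Python lists, rendered in
# KIF's Lisp-like concrete syntax. The predicates are invented examples.

def to_kif(expr):
    """Render a nested-list expression in Lisp/KIF syntax."""
    if isinstance(expr, str):
        return expr
    return "(" + " ".join(to_kif(e) for e in expr) + ")"

# "Every person is mortal", in KIF style:
sentence = ["forall", ["?x"], ["=>", ["Person", "?x"], ["Mortal", "?x"]]]

print(to_kif(sentence))
# (forall (?x) (=> (Person ?x) (Mortal ?x)))
```

The point is only that KIF sentences are ordinary s-expressions, which is what makes the format easy for programs to exchange, if not for humans to read.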
An interesting family is that of description languages, or description logics. In general these are proper subsets of first order logic, often the Horn clause subset or something closely related, since that is easily implemented, e.g., with Prolog (though negation can be a problem, since the semantics of negation as failure is not classical; also, one must ignore the blatantly non-logical features of Prolog). There is a Description Logic Community website, which lists 16 different current systems (each with a different language), several hundred research papers, and other resources. Languages in this family subsume many classic (but now extinct) AI knowledge representation languages, e.g., based on frames, semantic networks, and KL-ONE-like features. The downside is that you can't do very much with such systems; mainly they can keep track of classification hierarchies. The best implementations for languages in this family seem to be DLP, FaCT, and RACER (for "Renamed ABox and Concept Expression Reasoner"); all these are recent, sound, complete, efficiently implemented, and fully formalized (unlike many other systems). Cambridge University Press has just published the Description Logic Handbook, which seems quite comprehensive.
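A minimal sketch of the "classification hierarchy" service just mentioned: given declared subclass axioms, decide subsumption by transitive closure. Real reasoners like FaCT and RACER handle far richer concept constructors; the concept names here are invented for illustration.

```python
# Minimal sketch of subsumption over a classification hierarchy.
# Concept names are invented; real description logic reasoners do
# much more (concept constructors, consistency checking, etc.).

SUBCLASS = {  # child -> set of declared parents
    "Dog": {"Mammal"},
    "Mammal": {"Animal"},
    "Animal": {"LivingThing"},
}

def subsumes(general, specific):
    """True if every instance of `specific` must be a `general`."""
    if general == specific:
        return True
    return any(subsumes(general, p) for p in SUBCLASS.get(specific, ()))

print(subsumes("LivingThing", "Dog"))  # True
print(subsumes("Dog", "Animal"))       # False
```

Even this trivial service becomes non-trivial (and is the reasoner's real job) once subsumptions must be inferred from complex concept definitions rather than read off declared axioms.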
Among languages termed resource description languages, perhaps the most often mentioned are RDF, DAML, OIL, OWL, and Topic Maps. The first four of these are standards, while the last is a system; all are intended to be used over the web, but Topic Maps is oriented towards individual users. RDF (for "Resource Description Framework") is a w3c standard (like XML), whereas DAML ("DARPA Agent Markup Language") and OIL (for "Ontology Inference Layer") have been defined by the DARPA community; these have been combined, as DAML+OIL, and given precise semantics, both model theoretic and logical. An Axiomatic Semantics for RDF, RDF-S, and DAML+OIL, by Fikes and McGuinness, gives 137 axioms in KIF for an older version of DAML+OIL. DAML builds on RDF-S, a small extension of RDF, and OIL builds on DAML. The DAML site includes a model theoretic semantics for DAML+OIL. OWL is a newer w3c standard, derived from DAML+OIL, and built on RDF; it seems to be the most recent and promising member of this family. A comparison is given by Lars Marius Garshol of Ontopia, in his paper Topic Maps, RDF, DAML, OIL; they also have a short summary Ten Theses of Topic Maps and RDF. However, one should be alert for commercial bias here, since Ontopia is the corporation that sells Topic Maps; moreover, Topic Maps does not seem to have a formal semantics, though it is said to have a good user interface.
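Underneath all the layers, an RDF dataset is just a set of (subject, predicate, object) triples. The sketch below (with invented example URIs, not real vocabulary) shows triple storage and pattern matching, the basic operation every RDF store provides:

```python
# RDF at bottom: a set of (subject, predicate, object) triples.
# The "ex:" URIs are invented examples. None acts as a wildcard
# in the pattern, as in real triple-store query interfaces.

triples = {
    ("ex:KIF", "ex:developedAt", "ex:Stanford"),
    ("ex:OWL", "rdf:type", "ex:OntologyLanguage"),
    ("ex:OWL", "ex:derivedFrom", "ex:DAML+OIL"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

print(match(s="ex:OWL"))  # the two triples about ex:OWL
```

Languages like DAML+OIL and OWL then assign logical meaning to distinguished predicates (e.g., rdf:type) over exactly this kind of triple structure.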
The third family we consider commonly calls its members conceptual modelling languages. This includes ER, UML, CommonKADS, KADS22, CML2, UXF, ORM, and many others. OCML is a system intended to implement Ontolingua; WebOnto is a Java applet and custom web server which has been used to support OCML.
Michael Kifer has some very nice work on F-logic, which is implemented in the Flora-2 language. .....
Some mainly theoretical work on ontology mapping is being done in the Centre for Intelligent Systems and their Applications at Edinburgh, especially by Marco Schorlemmer. These and other papers can be downloaded from Schorlemmer's homepage and his publications page, at the Edinburgh School of Informatics (though these are not working at present, due to the fire there).
The first three papers are closely related to my work on "institutions," an abstract axiomatization of the notion of "logical system," much of it with Rod Burstall of Edinburgh (see section 1.4). Local logics and Chu spaces are both special cases of institutions, and the duality is also a special case of the syntax-semantics duality that is built into institutions. Moreover, the ontology morphisms of the first paper are a special case of theory morphisms over an institution. (For the cognoscenti, V-institutions generalize Chu spaces, and were proposed for similar applications long before Chu spaces.) Local logics do not appear to allow a sufficiently strong distinction between the object level of ontologies and the meta level of ontology languages; this distinction is much clearer with institution theory, and also, it is known how to obtain much more powerful composition operations in that framework, because composition of parameterized software modules is one of its major applications.
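For readers who have not met institutions before, the standard definition (from the papers with Burstall cited in section 1.4) can be recalled briefly; the key point is the satisfaction condition, which says that truth is invariant under change of notation:

```latex
An institution consists of:
\begin{itemize}
\item a category $\mathbf{Sign}$ of signatures and signature morphisms;
\item a functor $Sen\colon \mathbf{Sign} \to \mathbf{Set}$,
      giving the sentences over each signature;
\item a functor $Mod\colon \mathbf{Sign}^{op} \to \mathbf{Cat}$,
      giving the models of each signature;
\item for each signature $\Sigma$, a satisfaction relation
      ${\models_\Sigma} \subseteq |Mod(\Sigma)| \times Sen(\Sigma)$,
\end{itemize}
such that for every signature morphism $\varphi\colon \Sigma \to \Sigma'$,
every $\Sigma'$-model $M'$, and every $\Sigma$-sentence $e$,
\[
  Mod(\varphi)(M') \models_\Sigma e
  \;\Longleftrightarrow\;
  M' \models_{\Sigma'} Sen(\varphi)(e).
\]
```

An ontology then lives at the object level as a theory (a signature plus a set of sentences) in some institution, while ontology languages and their translations live at the meta level, as institutions and institution morphisms.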
Some interesting related theoretical work has been done in the University of Liverpool Computer Science Department:
Phil Bernstein has been applying category theory, and even institutions, to schema management (which he unfortunately calls "model management," thus conflicting with the usual meanings of "model" in both logic and in conceptual modelling). There is a very suggestive webpage on model management at Microsoft Research, from which some papers can also be downloaded, though none of them mention the category theory advertised on the homepage. This group wants to manage (i.e., translate, integrate, etc.) database schemas.
A paper by Suad Alagic and Philip Bernstein, entitled A model theory for generic schema management, which appeared in DBPL 02, uses not only categories, but also institutions, order sorted algebra, and other abstract goodies. It is mentioned on the model management page, but without a link; you should instead go to Bernstein's homepage for the link, or you may be able to get it from the Springer Verlag site. However, my recent UCSD DB research seminar talk goes further than this work, in using some more recently developed ideas, including institution morphisms to translate among the languages for describing schemas and ontologies, and Grothendieck institutions to include multiple institutions in the same setting. This is necessary because many different languages are used for describing schemas and ontologies.
Below is the beginning of a list of citations for my own papers that seem most relevant, with URLs for those that are on the web (I will try to get the others online eventually). The last four papers discuss various aspects of institution theory.
One very interesting project is Amphion, which provides a graphical interface, allowing NASA scientists to compose numerical subroutines into programs for planning and analyzing interplanetary missions. The basic underlying technology is automated deduction, as described in the following:
.... More to come here .....
This section has two subsections, the first for social science research on classification systems, and the second for tentative design principles for tools to support data integration where meta-data is involved.
Since ontologies are (at least) classification schemes, research on how classification schemes are actually used, in real world work settings, is highly relevant to people working on ontologies. One paper where it is relatively easy to see the relevance is Building bridges: Customisation and mutual intelligibility in shared category management, by Paul Dourish; also available from PARC. This paper points out some of the practical difficulties that arise in managing large classification schemes used by work groups, and also describes a tool that can help to solve some of the difficulties. The book Sorting Things Out by Geoff Bowker and Leigh Star (MIT 1999) also discusses real world classification schemes and how they work in practice, using several case studies and developing much interesting theoretical material (Dourish got some of his key insights from earlier work by Bowker and Star). Substituting "ontology" for "classification" in these works, some of their major observations are that:
The following are some proposed principles for tool development, based on my experience developing research tools for software engineering, and on my consulting experience and teaching in user interface design, social aspects of information technology, and software engineering. Of course principles like these are always subject to revision, qualification, and interpretation.