Notes for the SEEK Project


This webpage is for recording ideas, suggestions, comments, etc. related to my participation in the NSF-sponsored Science Environment for Ecological Knowledge (SEEK) project. The goals of this project are: (1) to significantly improve how researchers access global ecological data; (2) to improve how researchers locate and use distributed computational services; and (3) to capture, reproduce, and extend the data analysis process itself. Products of the project are to include: a grid for ecologists; a semantic mediation system; an analysis and modeling system; one or more ecological ontologies; and some concept-based taxonomy tools. The project began full scale operation on 1 January 2003, and is connected with the LTER network, consisting of 24 individual ecology research sites in the US. SEEK is a large cooperative project, with the San Diego team consisting at this time of Jenny Wang, Victor Vianu and myself from the UCSD Computer Science Dept, plus Bertram Ludaescher and Tony Fountain from the San Diego Supercomputer Center; there are many additional personnel, at the Universities of New Mexico, Kansas, and California at Santa Barbara, among other places.

Warning: This page has not been updated since mid 2003; though it may still be somewhat useful, the interested reader is referred to my Database Project Homepage for all the latest work on the SEEK project.

I began this webpage just to avoid losing track of URLs, and to help keep straight in my own mind all these websites, papers, projects, languages, and systems, their relations with each other, and with my own work. The first section below lists and briefly describes some literature I found, mostly on the web, while the second section discusses workflow languages and related software, systems, and theory. The third section describes some relevant results from the social sciences, and gives some tentative principles to consider when building tools for this area. I plan to gradually elaborate all these sections, and might even add some others. Suggestions for additions, corrections, or improvements, as well as any other comments, are very welcome.


1. Some Literature Related to Ontologies

The literature related to this project one way or another is so diverse and confused that it is difficult even to roughly classify it into areas. However, the subsections below represent an attempt to do this with ontologies and related topics, starting with two projects that applied ontology technology to ecology. Subsequent subsections discuss various classes of languages and tools that have been proposed for knowledge management, while the final subsection describes some of my own work that is somehow related.


1.1. The Ecolingua Project and Related Work

Some interesting work on ecological informatics has been done in collaborations between Brazil and the Software Systems and Processes Group of the AI Department at the University of Edinburgh. The first two items below describe two such projects, the third gives some positive results, while the fourth and fifth focus more on negative results, and the sixth describes some infrastructural support that they developed. Some later, more theoretical research from the Edinburgh group is described in section 1.3 below.

  1. Towards a Unified Paradigm for Amazonian Knowledge, University of the Amazon, Brazil, 1996. This web document includes parts of the original proposal for a project to create a uniform interchange language for ecological knowledge, called Ecolingua.
  2. The DECaFf-KB (Distributed Environment for Cooperation among Formalisms for Knowledge Bases) project, 1998.
  3. Metadata-Supported Automated Ecological Modelling, by Virgínia Brilhante and David Stuart Robertson. In Environmental Information Systems in Industry and Public Administration, ed. Claus Rautenstrauch and Susanne Patig, Idea Group Publishing (Hershey PA, 2001). A shorter version by Brilhante appears in EDSSAI-99, Papers from the AAAI-99 Workshop on Environmental Decision Support Systems and Artificial Intelligence, 1999, Orlando, Florida, pages 90-95. Technical Report WS-99-07. AAAI Press, Menlo Park, California. See also citeseer.nj.nec.com/305083.html and citeseer.nj.nec.com/brilhante99using.html.
  4. On the Insufficiency of Ontologies: Problems in Knowledge Sharing and Alternative Solutions, by Flavio Correa da Silva, Wamberto Weber Vasconcelos, David Stuart Robertson, Virginia Brilhante, Ana de Melo, Marcelo Finger, and Jaume Agustí. In Knowledge-Based Systems Journal, 15, no. 3, pages 147-167, 2002. The URL does not seem to work, so instead see citeseer.nj.nec.com/382117.html.
  5. Why Ontologies Are Not Enough for Knowledge Sharing, by Flavio Correa da Silva, Jaume Agusti, Ana Cristina Vieira de Melo, Wamberto Weber Vasconcelos and David Stuart Robertson, in Proceedings, International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pages 520-529, Springer, LNAI vol. 1611, 1999. See also citeseer.nj.nec.com/correadasilva99why.html.
  6. A Lightweight Capability Communication Mechanism, by David Robertson, Flavio Correa da Silva, Jaume Agusti, and Wamberto Vasconcelos. In Proceedings, 13th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, ed. R. Loganantharaj, G. Palm and M. Ali, Springer LNAI, vol. 1821, pages 660-670, 2000. See also citeseer.nj.nec.com/robertson00lightweight.htm.
The Ecolingua project seems to have failed to meet its very ambitious goals, for reasons including those indicated in the fourth and fifth papers. A complete list of papers from the Edinburgh group is available at www.dai.ed.ac.uk/groups/ssp/publications.html, although links to papers themselves are not currently available there (because their building recently burned down!).


1.2. Ontology Languages, Conceptual Modelling Languages, Description Logics, Resource Description Languages, F-Logic, ...

An enormous number of languages, systems, and logics have been proposed for use with information resources (database systems, the web, etc.). Some have a syntax but no semantics, while others have both in a quite rigorous way. An important trade-off that needs to be negotiated is between expressiveness and effectiveness; in general, the more powerful a language is, the more difficult it is to implement effectively. For example, in full first order logic, many important problems are not decidable. Moreover, even when problems are decidable, they can have a very high complexity. Another very important problem is how all these systems relate to one another, and even how to talk about how they relate. A main point will be that all of them are (or should be) institutions (see section 1.4), and that relations among them should be given by institution morphisms.
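The expressiveness/effectiveness trade-off can be made concrete with the Horn-clause fragment of propositional logic: entailment there is decidable in polynomial time by forward chaining, whereas satisfiability for full propositional logic is already NP-complete, and full first order logic is undecidable. The following sketch is illustrative only; the rule encoding and the ecological atom names are my own invention:

```python
# Forward chaining over propositional Horn clauses. Each rule is a pair
# (frozenset of body atoms, head atom); entailment is decided by
# repeatedly firing rules until no new atoms can be derived.

def horn_entails(rules, facts, goal):
    """Return True iff the Horn theory (rules plus facts) entails goal."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return goal in known

# Toy ecological rules: wetlands are aquatic; vegetated aquatic sites
# count as marsh habitat.
rules = [
    (frozenset({"wetland"}), "aquatic"),
    (frozenset({"aquatic", "vegetated"}), "marsh_habitat"),
]
print(horn_entails(rules, ["wetland", "vegetated"], "marsh_habitat"))  # True
print(horn_entails(rules, ["wetland"], "marsh_habitat"))               # False
```

Each pass through the loop can only add atoms, and there are finitely many, so the procedure always terminates quickly; it is this kind of guarantee that is lost as languages become more expressive.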

It is useful to bear in mind that no logic can possibly capture "meanings" in anything remotely like the human sense of that word. For example, a given database might contain a social security number, the number for a secret Swiss bank account, and its balance; their meaning becomes quite different if this data belongs to an FBI agent, a criminal, a terrorist, or an ordinary citizen. No logic can even capture what it means to be a social security number, let alone all the associated nuances, such as how one can be obtained, how it can be used, its historical development, and its cloudy future. Although this seems relatively obvious (at least, to me), it runs contrary to much hype that one sees about the "semantic web" and similar ambitious schemes.

It appears that (at least) three different communities are developing "ontology languages," though they often use different terminologies and tend not to refer to each other's work. One community is that of the semantic web, centered at w3c. A different community is more concerned with databases, and tends to speak of "conceptual modelling languages" rather than "ontology languages," while a third community carries on traditions from AI in knowledge representation, and so tends to speak of "knowledge management" etc.

KIF is from this third community, and was developed at Stanford by Michael Genesereth. KIF is an acronym for "Knowledge Interchange Format," and it contains all of first order predicate logic, plus some "meta-knowledge" features, and some standard data structures. It was originally intended as a standard interchange format for knowledge bases between computers, rather than as a language to be implemented and/or to be used by humans, and its syntax is based on that of Lisp; however, it has a nice formal semantics, and has been used to give formal semantics to other languages. Its great expressive power is also what makes it difficult to implement and inefficient to use. Another language in this family is Stanford's Ontolingua, which again is highly expressive and correspondingly difficult to process.

An interesting family is that of description languages, or description logics: in general these are proper subsets of first order logic, often the Horn clause subset or something closely related, which is easily implemented, e.g., with Prolog (though negation can be a problem, since the semantics of negation by failure is not classical; also, you must ignore the blatantly non-logical features of Prolog). There is a Description Logic Community website, which lists 16 different current systems (each with a different language), several hundred research papers, and other resources. Languages in this family subsume many classic (but now extinct) AI knowledge representation languages, e.g., based on frames, semantic networks, and KL-ONE-like features. The downside is that you can't do very much with such systems; mainly they can keep track of classification hierarchies. The best implementations for languages in this family seem to be DLP, FaCT, and RACER (for "Renamed ABox and Concept Expression Reasoner"); all these are recent, sound, complete, efficiently implemented, and fully formalized (unlike many other systems). Cambridge University Press has just published the Description Logic Handbook, which seems quite comprehensive.
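The "classification hierarchies" point can be made concrete with a tiny fragment in which every concept is defined as a conjunction of atomic concepts; there, subsumption reduces to a subset test on the defining atoms, and a classifier just sorts all definitions into a hierarchy. This is a sketch of the simplest possible structural subsumption algorithm, not of any particular system, and the concept names are invented for illustration:

```python
# Each concept is defined as a conjunction (here: a set) of atomic
# concepts; in this fragment, D subsumes C exactly when D's defining
# atoms are a subset of C's.

defs = {
    "Thing":        set(),
    "Aquatic":      {"aquatic"},
    "Wetland":      {"aquatic", "low_lying"},
    "CoastalMarsh": {"aquatic", "low_lying", "coastal", "vegetated"},
}

def subsumes(d, c):
    """True iff concept d subsumes (is at least as general as) concept c."""
    return defs[d] <= defs[c]

def classify():
    """Compute all strict subsumption pairs among the defined concepts."""
    return {(d, c) for d in defs for c in defs if d != c and subsumes(d, c)}

print(("Wetland", "CoastalMarsh") in classify())  # True
print(subsumes("CoastalMarsh", "Wetland"))        # False
```

Real description logics add constructors (roles, number restrictions, etc.) that make subsumption much harder than a subset test, which is exactly where the expressiveness/effectiveness trade-off discussed above begins to bite.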

Among languages termed resource description languages, perhaps the most often mentioned are RDF, DAML, OIL, OWL, and Topic Maps. The first four of these are standards, while the last is a system; all are intended to be used over the web, but Topic Maps is oriented towards individual users. RDF (for "Resource Description Framework") is a w3c standard (like XML), whereas DAML ("DARPA Agent Markup Language") and OIL (for "Ontology Inference Layer") have been defined by the DARPA community; these have been combined, as DAML+OIL, and given precise semantics, both model theoretic and logical. An Axiomatic Semantics for RDF, RDF-S, and DAML+OIL, by Fikes and McGuinness, gives 137 axioms in KIF for an older version of DAML+OIL. DAML builds on RDF-S, a small extension of RDF, and OIL builds on DAML. The DAML site includes a model theoretic semantics for DAML+OIL. OWL is a newer w3c standard, derived from DAML+OIL, and built on RDF; it seems to be the most recent and promising member of this family. A comparison is given by Lars Marius Garshol of Ontopia, in his paper Topic Maps, RDF, DAML, OIL; they also have a short summary Ten Theses of Topic Maps and RDF. However, one should be alert for commercial bias here, since Ontopia is the corporation that sells Topic Maps; moreover, Topic Maps does not seem to have a formal semantics, though it is said to have a good user interface.
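In the spirit of the axiomatic semantics just mentioned, the core of RDF is simply subject-predicate-object triples, and part of the RDF-S semantics can be stated as inference rules over triples: rdfs:subClassOf is transitive, and rdf:type propagates up the subClassOf hierarchy. A hand-rolled sketch (no RDF toolkit is assumed, the vocabulary strings abbreviate the real URIs, and the example triples are invented):

```python
# An RDF graph as a set of (subject, predicate, object) triples, with
# two RDF-S rules applied to a fixed point: transitivity of
# rdfs:subClassOf, and propagation of rdf:type along rdfs:subClassOf.

triples = {
    ("Marsh", "rdfs:subClassOf", "Wetland"),
    ("Wetland", "rdfs:subClassOf", "Ecosystem"),
    ("site42", "rdf:type", "Marsh"),
}

def rdfs_closure(kb):
    """Return the closure of kb under the two RDF-S rules above."""
    kb = set(kb)
    while True:
        new = set()
        for (a, p1, b) in kb:
            for (c, p2, d) in kb:
                if b != c or p2 != "rdfs:subClassOf":
                    continue
                if p1 == "rdfs:subClassOf":
                    new.add((a, "rdfs:subClassOf", d))  # transitivity
                elif p1 == "rdf:type":
                    new.add((a, "rdf:type", d))         # type propagation
        if new <= kb:
            return kb
        kb |= new

closed = rdfs_closure(triples)
print(("site42", "rdf:type", "Ecosystem") in closed)  # True
```

The actual RDF-S and DAML+OIL semantics contain many more rules and axioms than these two, but the style of inference is the same.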

The third family we consider commonly calls its members conceptual modelling languages. This includes ER, UML, CommonKADS, KADS22, CML2, UXF, ORM, and many others. OCML is a system intended to implement Ontolingua; WebOnto is a Java applet and custom web server which has been used to support OCML.

Michael Kifer has some very nice work on F-logic, which is implemented in the Flora-2 language. .....


1.3. Ontology Mappings

Some mainly theoretical work on ontology mapping is being done in the Centre for Intelligent Systems and their Applications at Edinburgh, especially by Marco Schorlemmer. These and other papers can be downloaded from Schorlemmer's homepage and his publications page, at the Edinburgh School of Informatics (but not now working, due to their fire).

  1. Information-Flow-based Ontology Mapping, by Yannis Kalfoglou and Marco Schorlemmer. In On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE, Lecture Notes in Computer Science 2519, pages 1132-1151. Springer, 2002.
  2. Duality in Knowledge Sharing, by Marco Schorlemmer. In Seventh International Symposium on Artificial Intelligence and Mathematics. Fort Lauderdale, January 2002.
  3. Automated Support for Composition of Transformational Components in Knowledge Engineering, by Marco Schorlemmer, S. Potter, and David Robertson. Informatics Research Report EDI-INF-RR-0137.
  4. Formal Knowledge Management in Distributed Environments, by Marco Schorlemmer, S. Potter, David Robertson, and Derek Sleeman. In ECAI 2002 Workshop on Knowledge Transformation for the Semantic Web, p. 111. Lyon, July 2002.
  5. Enabling Services for Distributed Environments: Ontology Extraction and Knowledge-Base Characterisation, by Derek Sleeman, David Robertson, S. Potter, and Marco Schorlemmer. In ECAI 2002 Workshop on Knowledge Transformation for the Semantic Web, pages 85-92. Lyon, July 2002.
The first paper uses the channel logic of Barwise and Seligman to define ontology morphisms, which can be used to integrate ontologies. The second paper uses the local logics of channel theory and Chu spaces to formalize a duality between ontologies and their instantiations; this paper also cites my early work on using colimits (from category theory) for knowledge integration, and also includes some interesting examples of integrating knowledge in ontologies over different logics. The third paper discusses composition operations on ontology languages, again using the local logics, and also discusses the AKT editor for applying such operations. The fourth paper discusses a tool for evolving large distributed knowledge bases organized by ontologies, making use of histories of transformations.

The first three papers are closely related to my work on "institutions," an abstract axiomatization of the notion of "logical system," much of it with Rod Burstall of Edinburgh (see section 1.4). Local logics and Chu spaces are both special cases of institutions, and the duality is also a special case of the syntax-semantics duality that is built into institutions. Moreover, the ontology morphisms of the first paper are a special case of theory morphisms over an institution. (For the cognoscenti, V-institutions generalize Chu spaces, and were proposed for similar applications long before Chu spaces.) Local logics do not appear to allow a sufficiently strong distinction between the object level of ontologies and the meta level of ontology languages; this distinction is much clearer with institution theory, and also, it is known how to obtain much more powerful composition operations in that framework, because composition of parameterized software modules is one of its major applications.

Some interesting related theoretical work has been done in the University of Liverpool Computer Science Department:

  1. Formalising ontologies and their relations, by Trevor Bench-Capon and Grant Malcolm. In Trevor Bench-Capon, Giovanni Soda and A. Min Toa (eds.), Proceedings of the 16th International Conference on Database and Expert Systems Applications (DEXA '99), Springer Lecture Notes in Computer Science, volume 1677, pages 250-259. Springer, Berlin, 1999.
  2. Semantics for Interoperability: relating ontologies and schemata, Trevor Bench-Capon, Grant Malcolm and Michael Shave. 2000.
The first paper suggests a very natural formalization of ontologies as theories over a certain logic, using colimits for integration. The second paper extends this work by formalizing the relations that should hold between database schemas and their ontologies.
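The colimit idea in these papers can be illustrated at the level of bare vocabularies: given two ontologies related to a shared ontology by two mappings, their integration is the pushout, i.e., the disjoint union of the two vocabularies with the two images of each shared concept identified. This toy sketch works only with concept names (no axioms), and all names and mappings are invented:

```python
# Pushout of two ontology vocabularies over a shared vocabulary:
# form the disjoint union of the concept names of o1 and o2, then
# identify f1(s) with f2(s) for each shared concept s, via union-find.

def pushout(shared, o1, o2, f1, f2):
    """f1: shared -> o1 and f2: shared -> o2 are the two mappings."""
    names = [("o1", c) for c in o1] + [("o2", c) for c in o2]
    parent = {n: n for n in names}

    def find(n):                      # follow parent links to the root
        while parent[n] != n:
            n = parent[n]
        return n

    for s in shared:                  # glue the two images of s together
        parent[find(("o1", f1[s]))] = find(("o2", f2[s]))

    classes = {}                      # group names by their root
    for n in names:
        classes.setdefault(find(n), set()).add(n)
    return list(classes.values())

shared = ["WaterBody"]
o1 = ["Lake", "WaterBody"]            # an ecology vocabulary
o2 = ["Hydrosphere", "Catchment"]     # a hydrology vocabulary
merged = pushout(shared, o1, o2,
                 {"WaterBody": "WaterBody"},
                 {"WaterBody": "Hydrosphere"})
print(len(merged))  # 3 concepts: Lake, Catchment, WaterBody~Hydrosphere
```

The real constructions in the papers above take colimits of theories, not just of name sets, so axioms are carried along and translated by the morphisms; this sketch only shows the identification step.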

Phil Bernstein has been applying category theory, and even institutions, to schema management (which he unfortunately calls "model management," thus conflicting with the usual meanings of "model" in both logic and in conceptual modelling). There is a very suggestive webpage on model management at Microsoft Research, from which some papers can also be downloaded, though none of them mention the category theory advertised on the homepage. This group wants to manage (i.e., translate, integrate, etc.) database schemas.

A paper by Suad Alagic and Philip Bernstein, entitled A model theory for generic schema management, which appeared in DBPL 02, uses not only categories, but also institutions, order sorted algebra, and other abstract goodies. It is mentioned on the model management page, but without a link; you should instead go to Bernstein's homepage for the link, or you may be able to get it from the Springer Verlag site. However, my recent UCSD DB research seminar talk goes further than this work, in using some more recently developed ideas, including institution morphisms to translate among the languages for describing schemas and ontologies, and Grothendieck institutions to include multiple institutions in the same setting. This is necessary because many different languages are used for describing schemas and ontologies.


1.4. Some Related Local Work

Below is the beginning of a list of citations for my own papers that seem most relevant, with URLs for those that are on the web (I will try to get the others online eventually). The last four papers discuss various aspects of institution theory.

  1. Interactive Schema Matching with Semantic Functions, by Guilian Wang, Joseph Goguen, Young-Kwang Nam, Kai Lin. Submitted to Semantic Integration Workshop, Sanibel Island FL, October 2003.
     
  2. Integrating Data by Integrating Meta-Data and Meta-Data Languages, the abstract for a talk given at the UCSD CSE database research seminar, 7 February 2003.
  3. Ontologies, Ontology Languages, and Data Integration by Metadata Integration. These are webpages used as slides for a lecture; see especially the page Federating the Kingdoms of Ontology.
  4. My Data Integration page.
  5. A Categorical Manifesto, in Mathematical Structures in Computer Science, Volume 1, Number 1, March 1991, pages 49-67.
  6. Putting Theories Together to Make Specifications, by Joseph Goguen and Rod Burstall, in Proceedings, 5th International Joint Conference on Artificial Intelligence (MIT, Cambridge, Massachusetts), 1977, pages 1045-1058. This was the first paper on Clear, which we saw at that time as mainly for structured knowledge representation.
  7. A Metadata Integration Assistant Generator for Heterogeneous Distributed Databases, by Young-Kwang Nam, Joseph Goguen, and Guilian Wang, in Proceedings, International Conference on Ontologies, DataBases, and Applications of Semantics for Large Scale Information Systems, Springer, Lecture Notes in Computer Science, Volume 2519, 2002, pages 1332-1344; from a conference held in Irvine CA, 29-31 October 2002.
  8. A Metadata Tool for Retrieval from Heterogeneous Distributed XML Documents, by Young-Kwang Nam, Joseph Goguen, and Guilian Wang. To appear in Proceedings, International Conference on Computational Science, Springer, Lecture Notes in Computer Science, 2003. Describes a later version of the same tool, with application to distributed collections of XML documents.
     
  9. The institutions homepage gives some intuitions and references (this will gradually be improved).
  10. Institutions: Abstract Model Theory for Specification and Programming, by Joseph Goguen and Rod Burstall, Journal of the Association for Computing Machinery, Volume 39, Number 1, January 1992, pages 95-146. (Drafts of this paper go as far back as 1985.)
  11. A Study in the Foundations of Programming Methodology: Specifications, Institutions, Charters and Parchments, by Joseph Goguen and Rod Burstall, Proceedings, Conference on Category Theory and Computer Programming (Guildford, Surrey, U.K.), edited by David Pitt, Samson Abramsky, Axel Poigne, and David Rydeheard, Springer, Lecture Notes in Computer Science, Volume 240, 1986, pages 313-333.
  12. Institution Morphisms, by Joseph Goguen and Grigore Rosu, in Formal Aspects of Computing 13, 2002, pages 274-307; special issue edited by Don Sannella, in honor of the retirement of Prof. Rod Burstall.
Item 5 gives a number of principles for applying category theory in a principled way, including the principle that knowledge integration should be done by colimits in suitable categories, an idea that was introduced in item 6, in the context of the Clear knowledge representation language. Items 7 and 8 describe a tool for integrating heterogeneous databases. Item 10 is the most complete introduction to the basic ideas of institution theory; item 11 contains the notion of V-institutions, where V is a set, or some other structure, of truth values (as needed for Chu spaces etc.). Item 12 is our latest paper on institutions, a systematic study of translations among them; this should be useful for translations among different ontology languages. The remaining items are webpages that describe on-going related research, though not yet in very great detail.


2. Workflow, Process Composition, etc.

One very interesting project is Amphion, which provides a graphical interface, allowing NASA scientists to compose numerical subroutines into programs for planning and analyzing interplanetary missions. The basic underlying technology is automated deduction, as described in the following:

The Amphion website has links to a number of other papers, including both some scientific and some education applications.

.... More to come here .....


3. Some Related Social Science Research and Tentative Design Principles

This section has two subsections, the first for social science research on classification systems, and the second for tentative design principles for tools to support data integration where meta-data is involved.


3.1. Social Science Research on Classification

Since ontologies are (at least) classification schemes, research on how classification schemes are actually used, in real world work settings, is highly relevant to people working on ontologies. One paper where it is relatively easy to see the relevance is Building bridges: Customisation and mutual intelligibility in shared category management, by Paul Dourish; also available from PARC. This paper points out some of the practical difficulties that arise in managing large classification schemes used by work groups, and also describes a tool that can help to solve some of the difficulties. The book Sorting Things Out by Geoff Bowker and Leigh Star (MIT 1999) also discusses real world classification schemes and how they work in practice, using several case studies and developing much interesting theoretical material (Dourish got some of his key insights from earlier work by Bowker and Star). Substituting "ontology" for "classification" in these works, some of their major observations are that:

One particular conclusion is that ontologies must evolve if they are to be useful, and inevitably do so if they are used; therefore support for incremental change is essential. It is also worth noticing that there has been considerable recent evolution in the languages and tools used for defining and processing ontologies.


3.2. Tentative Design Principles

The following are some proposed principles for tool development, based on my experience developing research tools for software engineering, and on my consulting experience and teaching in user interface design, social aspects of information technology, and software engineering. Of course principles like these are always subject to revision, qualification, and interpretation.

  1. Tool design should be preceded by a careful ethnographic study of what real users really do.
  2. Lightweight tools are better than heavyweight tools, since they are easier to adapt to new applications and circumstances.
  3. Tools should be evaluated with real users.
  4. Semantic tools should have rigorous semantic foundations.
  5. Insofar as possible, tools should run over the web, using state of the art technology.


This material is based on work supported by the National Science Foundation under Grant No. 9901002. Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.


To the UCSD Meaning and Computation Lab homepage
To my research projects homepage
Maintained by Joseph Goguen
Last modified: Thu Aug 14 09:24:41 PDT 2003