CSE151 S00 Project#2
Biological Knowledge Extraction

Your assignment is to use natural language parsing techniques to "extract" knowledge from journal abstracts like the following:

UI - 10753721

AB - Transforming growth factor beta-1 (TGF-beta1), which is present in lung tissue, has been suggested to play a role in modulating vascular cell function in vivo. The action of TGF-beta1 in vivo, especially at the local site of application to connective tissue, is anabolic and leads to pulmonary fibrosis and angiogenesis, strongly indicating that TGF-beta may have practical applications in repair of tissue injury caused by burns, trauma, or surgery. In the present study, we have used cultured bovine pulmonary artery endothelial (BPAE) cells as a model system. Expression of various proteins, including SPARC (secreted protein acidic and rich in cysteines), type IV procollagen and fibronectin (FN) was examined by radiolabeling the cells with [3H]proline, immunoprecipitation with specific antibodies, and Northern blot analyses by using specific cDNA probes. Cultured cells were labeled with [3H]proline for 24 h in either the absence or in the presence of TGF-beta1 (0-20 ng/ml). Incorporation of radioactivity was observed in a concentration-dependent manner, maximal at 5 ng/ml. Northern blot hybridization demonstrated that TGF-beta1 (5 ng/ml) treatment of BPAE cells caused an increase in steady-state levels

You should attempt to convert this knowledge into FOPL-like sentences such as the following.

Pay special attention to characterizations of ambient conditions (that the author may or may not make explicit!) which qualify the general statement of the fact.
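As a rough illustration of a fact carried together with its qualifying conditions, here is a minimal Python sketch. The `Fact` class, the predicate names (`leads_to`, `in_vivo`, `applied_locally_to`) and the `->` rendering are all assumptions made for this example, not notation prescribed by the assignment. It encodes one possible reading of the abstract's sentence about the in vivo action of TGF-beta1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A FOPL-like atom plus the ambient conditions that qualify it.

    Hypothetical structure: each condition is a (predicate, args) pair.
    """
    predicate: str
    args: tuple
    conditions: tuple = ()

    def to_fopl(self) -> str:
        # Render the head atom, e.g. leads_to(a, b).
        head = f"{self.predicate}({', '.join(self.args)})"
        if not self.conditions:
            return head
        # Conditions become the antecedent of an implication.
        body = " & ".join(f"{p}({', '.join(a)})" for p, a in self.conditions)
        return f"{body} -> {head}"


# One possible (assumed) reading of a sentence from the abstract above.
fact = Fact(
    predicate="leads_to",
    args=("tgf_beta1", "pulmonary_fibrosis"),
    conditions=(
        ("in_vivo", ("tgf_beta1",)),
        ("applied_locally_to", ("tgf_beta1", "connective_tissue")),
    ),
)
print(fact.to_fopl())
```

Keeping the conditions separate from the head atom makes it easy to see which qualifications the author stated and which ones you had to infer.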

We are collecting a list of relevant biomedical dictionaries and thesauri.

D. Swanson's search for "implicit knowledge" is one inspiration. For a quick gloss, see: Finding Out About, Sect. 6.5.3

Ontologies designed especially for biological concepts include EcoCyc, RiboWeb (Russ Altman) and, as a more recent effort, TAMBIS.

Obviously, any attempt to bring structure to these unstructured data sources is likely to be mediated by XML resources. A couple of emerging standards designed especially for biomedical content are BSML and BioML.
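To make the XML idea concrete, the sketch below serializes one extracted fact with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`fact`, `arg`, `condition`, `predicate`) are invented for this example and are not taken from BSML or BioML; a real submission would need to follow whichever schema it targets.

```python
import xml.etree.ElementTree as ET


def fact_to_xml(predicate, args, conditions=()):
    """Serialize one extracted fact as an XML element.

    The tag names used here are hypothetical, not BSML or BioML.
    """
    fact = ET.Element("fact", {"predicate": predicate})
    for a in args:
        ET.SubElement(fact, "arg").text = a
    # Each qualifying condition becomes a nested <condition> element.
    for cond_pred, cond_args in conditions:
        cond = ET.SubElement(fact, "condition", {"predicate": cond_pred})
        for a in cond_args:
            ET.SubElement(cond, "arg").text = a
    return fact


elem = fact_to_xml("present_in", ("tgf_beta1", "lung_tissue"))
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
# <fact predicate="present_in"><arg>tgf_beta1</arg><arg>lung_tissue</arg></fact>
```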