NLM Contract extension for adding abdominal terms to UMLS.

Preparation of deliverables for September 1998

I. Overview

Subject to approval by our group and NLM staff, I suggest we initially provide deliverables in the same format as last year. This format will certainly evolve, but it provides a good starting point.

The previous output format is described in the final report, which I have converted to Word and HTML for ease of communication. The last deliverables were sent to NLM as a tar file. The untarred directory can be found in neuron:~brinkley/synapse/brinkley/Projects/UMLSThorax/deliverables/LM43546-v1.2-7.21.97.

The format, however, does not contain the criteria I used to generate the output files. These are embodied in a set of Lisp programs I wrote that connect to the Sybase database that holds the Knowledge Base, extract the relevant terms and links, and generate the output files contained in the deliverables. The program is called kbd, and it can be found in neuron:~brinkley/synapse/brinkley/Projects/UMLSThorax/kbdreport. The criteria for selection (which I repeat below) are contained in the function id-relevant, which is in kbexport.lsp. The main task for Darren is to write a Perl program that generates output in the same format as kbd and uses the same filtering criteria for deciding which terms and links to export.

I have now generated a final report for September 1998 as a modification of last year's report. Here is the HTML version. The original is in Word 97.

 

II. Criteria for exporting terms and links

The criteria are designed to remove inconsistencies (making sure all terms referred to by links are in the terms table), and to remove all neuronames that are not being used in the DA knowledge base. The exported terms and links will be a superset of those exported last year.

A row (link) in the links table should only be exported if:

  1. The link type is one of isa, branch of, tributary of, or part of.
  2. Both the parentID and the childID refer to valid rows in the terms table.

A row (term) in the terms table should only be exported if it is a parent or child in at least one row in the exported links table, where the link type is isa.
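The two rules above can be sketched as a pair of filters. This is only an illustration in Python, not the kbd code and not the planned Perl program. The in-memory layout (a dict of terms keyed by ID, and links as (link type, parentID, childID) tuples), the function names, and the sample rows are all assumptions made for the example. The sketch applies each rule once, exactly as written, and then lists any exported link whose parent or child was dropped by the isa rule, since the criteria do not say how such a case should be handled.

```python
# Link types eligible for export, as listed in the criteria above.
EXPORTABLE_LINK_TYPES = {"isa", "branch of", "tributary of", "part of"}


def select_links(terms, links):
    """Keep links of an exportable type whose parent and child both exist in terms.

    terms: dict mapping term ID -> term name (hypothetical layout)
    links: list of (link_type, parent_id, child_id) tuples (hypothetical layout)
    """
    return [
        (link_type, parent_id, child_id)
        for link_type, parent_id, child_id in links
        if link_type in EXPORTABLE_LINK_TYPES
        and parent_id in terms
        and child_id in terms
    ]


def select_terms(terms, exported_links):
    """Keep terms that are a parent or child in at least one exported isa link."""
    isa_ids = set()
    for link_type, parent_id, child_id in exported_links:
        if link_type == "isa":
            isa_ids.update((parent_id, child_id))
    return {term_id: name for term_id, name in terms.items() if term_id in isa_ids}


def dangling_links(exported_terms, exported_links):
    """List exported links that refer to a term left out by the isa rule."""
    return [
        link
        for link in exported_links
        if link[1] not in exported_terms or link[2] not in exported_terms
    ]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    terms = {1: "Organ", 2: "Liver", 3: "Right lobe of liver", 4: "Unused term"}
    links = [
        ("isa", 1, 2),          # kept: valid type, both IDs present
        ("part of", 2, 3),      # kept as a link, but term 3 has no isa link
        ("adjacent to", 2, 4),  # dropped: link type not in the list
        ("isa", 1, 99),         # dropped: childID 99 is not in terms
    ]
    kept_links = select_links(terms, links)
    kept_terms = select_terms(terms, kept_links)
    print(kept_links)
    print(sorted(kept_terms))
    print(dangling_links(kept_terms, kept_links))
```

In the sample data the part of link survives the link rule while its child term fails the term rule, so it shows up in the dangling list. Any such rows found in the real knowledge base would be worth reviewing before export.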

 

 

III. Suggested approach

I suggest we first remove all terms and links from our existing knowledge base that don't satisfy the above criteria. This way our working copy will be consistent and free from neuronames. We can then directly export all rows in the terms and links tables. We can also create a new knowledge base and server that holds just the latest neuronames from Doug Bowden, and use these two servers as a basis for experimenting with methods to make them appear as one to the client.

Here are the suggested steps for cleaning up the knowledge base and for exporting the four text files described in last year's report.

  1. Save the current knowledge base.
  2. Delete all terms and links that don't satisfy the above criteria.
  3. Obtain feedback from CR and Onard to be sure we haven't lost anything.
  4. Export all remaining rows in the terms and links tables according to the format specified in the final report.
  5. Export the Stypes.txt rows using a method similar to that found in kbdreport/lsp/classkbnode.lsp:getstypes.
  6. Hand edit Defs.txt from last year's deliverables to reflect any changes made by CR and Onard.
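Steps 1 and 2 could look roughly like the following. This is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for the Sybase server, and the column names termID, name, and linktype, the function names, and the sample rows are assumptions (only the terms and links tables and the parentID and childID columns are named in this memo). It makes a single pass over each table and returns the deletion counts, which could be passed along when asking for feedback in step 3.

```python
import sqlite3

# Link types allowed by the export criteria.
LINK_TYPES = ("isa", "branch of", "tributary of", "part of")


def clean_knowledge_base(conn):
    """Delete rows failing the export criteria; return (links_deleted, terms_deleted)."""
    cur = conn.cursor()
    placeholders = ",".join("?" for _ in LINK_TYPES)
    # Remove links of other types, or whose parent or child is not in terms.
    cur.execute(
        f"""DELETE FROM links
            WHERE linktype NOT IN ({placeholders})
               OR parentID NOT IN (SELECT termID FROM terms)
               OR childID NOT IN (SELECT termID FROM terms)""",
        LINK_TYPES,
    )
    links_deleted = cur.rowcount
    # Remove terms that take part in no remaining isa link (single pass).
    cur.execute(
        """DELETE FROM terms
           WHERE termID NOT IN (SELECT parentID FROM links WHERE linktype = 'isa')
             AND termID NOT IN (SELECT childID FROM links WHERE linktype = 'isa')"""
    )
    terms_deleted = cur.rowcount
    conn.commit()
    return links_deleted, terms_deleted


def demo():
    """Build a tiny made-up knowledge base, save a copy, then clean the original."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE terms (termID INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE links (linktype TEXT NOT NULL,
                            parentID INTEGER NOT NULL,
                            childID INTEGER NOT NULL);
        """
    )
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?)",
        [(1, "Organ"), (2, "Liver"), (3, "Right lobe of liver"), (4, "Unused term")],
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?)",
        [("isa", 1, 2), ("part of", 2, 3), ("adjacent to", 2, 4), ("isa", 1, 99)],
    )
    conn.commit()
    # Step 1: save the current knowledge base before deleting anything.
    backup = sqlite3.connect(":memory:")
    conn.backup(backup)
    # Step 2: delete rows that do not satisfy the criteria.
    deleted = clean_knowledge_base(conn)
    return conn, backup, deleted


if __name__ == "__main__":
    conn, backup, deleted = demo()
    print("links deleted, terms deleted:", deleted)
```

Keeping the saved copy alongside the cleaned one makes it straightforward to restore anything CR and Onard report as lost.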

 

 

Previous final report