Subject to approval by our group and NLM staff, I suggest we initially provide deliverables in the same format as last year. This format will certainly evolve, but it provides a good starting point.
The previous output format is described in the final report, which I have converted to Word and HTML for ease of communication. The last deliverables were sent to NLM as a tar file. The untarred directory can be found in neuron:~brinkley/synapse/brinkley/Projects/UMLSThorax/deliverables/LM43546-v1.2-7.21.97.
The format, however, does not contain the criteria I used to generate the output files. These are embodied in a set of Lisp programs I wrote that connect to the Sybase database that holds the Knowledge Base, extract the relevant terms and links, and generate the output files contained in the deliverables. The program is called kbd, and it can be found in neuron:~brinkley/synapse/brinkley/Projects/UMLSThorax/kbdreport. The criteria for selection (which I repeat below) are contained in the function id-relevant, which is in kbexport.lsp. The main task for Darren is to write a Perl program that generates output in the same format as kbd and uses the same filtering criteria for deciding which terms and links to export.
I have now generated a final report for Sept 1998, as a modification of last year's report. Here is the HTML version. The original is in Word 97.
The criteria are designed to remove inconsistencies (making sure all terms referred to by links are in the terms table), and to remove all neuronames that are not being used in the DA knowledge base. The exported terms and links will be a superset of those exported last year.
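As a rough illustration of the consistency requirement (every term referred to by a link must be in the terms table), here is a minimal Python sketch that finds links pointing at missing terms. The column names (term_id, parent, child, type) and the in-memory row format are assumptions made for this example; they are not taken from the actual knowledge base schema or from kbd.

```python
def find_dangling_links(terms, links):
    """Return the links whose parent or child is not in the terms table."""
    known_ids = {term["term_id"] for term in terms}
    return [
        link
        for link in links
        if link["parent"] not in known_ids or link["child"] not in known_ids
    ]


# Hypothetical sample rows, for illustration only.
sample_terms = [
    {"term_id": 1, "name": "Organ"},
    {"term_id": 2, "name": "Heart"},
]
sample_links = [
    {"parent": 1, "child": 2, "type": "isa"},
    {"parent": 1, "child": 99, "type": "isa"},  # 99 is not in the terms table
]

dangling = find_dangling_links(sample_terms, sample_links)
```

Any link returned by such a check would have to be removed, or its missing term restored, before export.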
A row (link) in the links table should only be exported if:
A row (term) in the terms table should only be exported if it is a parent or child in at least one row in the exported links table, where the link type is isa.
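The term criterion above can be sketched in Python as follows. This is only an illustration, not the id-relevant function from kbexport.lsp: the column names (term_id, parent, child, type) and the list-of-dictionaries row format are assumptions, and the set of exported links is taken as given.

```python
def select_exported_terms(terms, exported_links):
    """Keep terms that are a parent or child in at least one exported isa link."""
    referenced = set()
    for link in exported_links:
        if link["type"] == "isa":
            referenced.add(link["parent"])
            referenced.add(link["child"])
    return [term for term in terms if term["term_id"] in referenced]


# Hypothetical sample rows, for illustration only.
all_terms = [
    {"term_id": 1, "name": "Organ"},
    {"term_id": 2, "name": "Heart"},
    {"term_id": 3, "name": "Left ventricle"},
    {"term_id": 4, "name": "Unused term"},
]
exported_links = [
    {"parent": 1, "child": 2, "type": "isa"},
    {"parent": 2, "child": 3, "type": "part-of"},  # not an isa link
]

exported_terms = select_exported_terms(all_terms, exported_links)
exported_ids = [term["term_id"] for term in exported_terms]
```

In this sample, term 3 is dropped because its only link is not of type isa, and term 4 is dropped because no link refers to it.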
I suggest we first remove all terms and links from our existing knowledge base that don't satisfy the above criteria. This way our working copy will be consistent and free from neuronames. We can then directly export all rows in the terms and links tables. We can also create a new knowledge base and server that has just the latest neuronames from Doug Bowden, and use these two servers as a basis for experimenting with methods to make them appear to the client as one.
Here are the suggested methods for cleaning up the knowledge base, and for exporting the four text files described in last year's report.