Integrating Meta-1 Semantic Types into the UW BioStructure SI Knowledge Base

Technical Note

Structural Informatics Systems Development Group
Department of Biological Structure, University of Washington,
Mailstop SM-20

by Kraig Eno, kraig@u.washington.edu
January 25, 1993

This document describes the process of loading the relevant portions of
the MRSTY relation of the Meta-1 sources into Sybase for use with our
local knowledge base. We had previously loaded the anatomy-related
portions of MRMC, the master term list, into a table called "kbterms".
We had also loaded into our term list the names of the semantic types
themselves, such as "body location or region" and "embryonic structure".

Current state of our knowledge base
-----------------------------------

Our knowledge base consists of two tables, defined as follows:

   kbterms = id (int), name (char), usage (char), source (char)
   kblinks = node (int), target (int), link (char)

All anatomic terms, semantic type names, and miscellaneous terms are
loaded into the KBTERMS table; terms which are synonyms of one another
all have the same ID, as in the Meta-1 sources. Our network consists of
directed links between term identifiers, as stored in the KBLINKS table.
The LINK field identifies the particular relation, such as "ISA", and
the NODE and TARGET fields hold IDs from the KBTERMS table. In this way,
specific medical terms and semantic type names can be related to each
other with a minimal database structure; different queries produce
different kinds of semantic information from the same two tables.

Sequence of Operations
----------------------

Adding the new links was done with standard Unix and Sybase utilities,
as follows:

1. Collect the unique IDs of terms from our database into a work table
      Sybase> select distinct id into table1 from kbterms
              (produces 3916 rows)

2. Make an ASCII text file using newline as the record delimiter
      Unix> bcp table1 out table1.out
              (Sybase bulk copy utility)

3. Sort the file (prepare to use the Unix "join" command)
      Unix> sort table1.out >table1

4. Join the sorted IDs against MRSTY
      Unix> join -t\| table1 MRSTY >links.dat
              (generates 2810 lines of data)

5. Load new ISA links into Sybase
      Sybase> create table table2 (id int, stype varchar(50))
      Unix> bcp table2 in links.dat
      Sybase> alter table table2 add node int null
      Sybase> update table2 set node=
              (select id from kbterms where name=lower(table2.stype))
      Sybase> select node,id,link="ISA" into table3 from table2
      Unix> bcp table3 out table3.dat
      Unix> bcp kblinks in table3.dat

This resulted in 2810 new records in the KBLINKS table, all labelled
ISA. We have a set of ISA links that had been previously defined for use
locally; most of these relationships were more specific than those
contained in MRSTY, and not in actual conflict. However, a few questions
arise when comparing the two lists.

Conflict resolution between the UMLS and our local knowledge base
-----------------------------------------------------------------

We make the assumption that semantic type assignment in MRSTY is
equivalent to an ISA link. That is, when the Meta-1 semantic network
SRSHL relation indicates that "Embryonic Structure" is an "Anatomical
Structure", and the SRSTY relation defines "yolk sac" as an "Embryonic
Structure", we treat these as different levels of the same hierarchical
relation, as in

   Anatomical Structure --> Embryonic Structure --> yolk sac

As noted, we had previously defined 1029 ISA links, some of which used
terms from sources other than the UMLS. These included links like

   white matter --> nerve --> optic nerve

There are several incompatibilities between our ISA relationships and
the new ones from Meta-1; we will attempt to outline some possible
approaches for handling them. We will position our relationships as
enhancements to those defined in Meta-1, since the UMLS source is the
larger of the two and is in common use elsewhere.

The types of differences fall into two categories:

   1. terms that do not exist in Meta-1
   2. terms that exist in different places in both networks
      2a. more specific vs. more general information
      2b. contextual differences
      2c. synonyms

(1) Locally-defined Terms and Links

examples:
   [local] tissue --> white matter
   [local] embryonic structure --> quadrigeminal plate

The UMLS lacks the detailed terminology that is present in some other
sources, and which is necessary for a more complete representation of
neuroanatomical knowledge. This is no real incompatibility; new links
were simply added to existing nodes of the network, to further the depth
of knowledge.

Q: Should the links be merely added, or is there a formal process of
   categorization by which each term should be checked?

Q: How should added information be viewed, since it is unreviewed by
   the authors of Meta-1?

(2a) Specificity Differences

example:
   [UMLS]  Body Part, Organ, or Organ Component --> optic nerve
   [local] nerve --> optic nerve

These are similar to (1), except that the terms ARE already present in
the UMLS semantic type hierarchy. One or the other of the two should
remain, but in most cases we again are extending the depth of the
existing UMLS source rather than actually changing the categorization.

Q: Should both instances be kept? It is possible to refer to the SOURCE
   field when doing queries, and some types of queries may be able to
   make use of the information, while others might give erroneous
   results if the same term appears in more than one place.

(2b) Contextual Differences

example:
   [UMLS]  Body Part, Organ, or Organ Component --> gray matter
   [local] tissue --> gray matter

Locally-defined links were entered with neuroanatomy specifically in
mind. Therefore, "gray matter" is a tissue type, rather than a
collection of cells with a particular function; the UMLS definition of
Body Part or Organ includes the phrase "relatively localized in
comparison to tissues."
If the specific knowledge base specifies the particular types of tissue
which can be classified as "gray matter", as ours does, then the gray
matter is seen not as a large monolithic organ but as a collection of
several specialized tissue types. (IS THIS TRUE, JIM?)

Q: In the case of incompatibilities, is it possible to retrieve more
   information about WHY a particular term was given a particular
   classification? Again, knowledge of the review process may be
   helpful.

(2c) Synonyms

example:
   [UMLS] (What is a good example of this?
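
Candidates for categories (2a) and (2b) can be listed mechanically,
since in both cases one term ends up with more than one ISA parent. The
following is a minimal sketch using the same kind of Unix tools as the
Sequence of Operations. It assumes that KBLINKS has been copied out with
bcp into a file with "|" as the field delimiter, and that NODE holds the
more general term and TARGET the more specific one, following the column
order used when the MRSTY links were loaded. The file names and the
sample IDs are hypothetical.

```shell
# Hypothetical sample of a KBLINKS dump, one row per line: node|target|link
# (assumed: node = more general term ID, target = more specific term ID).
cat > kblinks.dat <<'EOF'
100|200|ISA
101|200|ISA
100|201|ISA
102|202|PARTOF
103|202|ISA
EOF

# 1. Keep only ISA rows and rewrite them as target|node.
# 2. Sort numerically on both fields and drop duplicate pairs.
# 3. Count the distinct parents of each target.
# 4. Print the targets that have more than one ISA parent.
awk -F'|' '$3 == "ISA" { print $2 "|" $1 }' kblinks.dat |
    sort -t'|' -k1,1n -k2,2n -u |
    cut -d'|' -f1 |
    uniq -c |
    awk '$1 > 1 { print $2 }' > multi_parent.out

cat multi_parent.out
```

With the sample rows above, only term 200 is listed, because it is the
only target with two distinct ISA parents. Whether a listed term belongs
under (2a), (2b), or (2c) still has to be decided by inspection.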