Database Table Design

Here are the definitions of our Sybase terms and links tables, in the form of Sybase isql commands.

These tables contain much the same sort of data as the basic UMLS relational files, but in a slightly different layout and without most of the ancillary tables. Our TERMS table defines a list of lexically distinct term names, along with information about who added it to the database and when, and fields for storing cross-references to other taxonomies like the UMLS and SNOMED. The idea of a "concept" is the same as that used in the UMLS, where a particular numeric ID can be assigned to several different terms to indicate that they are all synonymous. (One of these terms must be designated as a "preferred" term).

The biggest differences from the UMLS sources are in the way semantic links are stored. In the Digital Anatomist terminology database, there is no separate "semantic type" field for a term or concept; this and all other relationships between terms are represented as rows in the LINKS table. By retrieving subsets of the data a query program can find whole hierarchies of terms that are all related in one way, or else find all the terms related to a specific term with each of the possible link types. Since this information is all rolled into the same table, the same tools we use for viewing and editing the semantic type hierarchy can be used for all other defined relationship hierarchies as well.


Now for a more detailed description of the fields and how they're used. In the TERMS table, we have the following:

        Field Name    Type       Max. Length  Value Required
        ------------  ---------  -----------  --------------
        Name          character  250          yes
        UWDAID        integer                 yes
        UMLSID        integer
        SNOMEDID      integer
        Role          character  30
        Authority     character  50
        Author        character  50
        Modification  datetime
        mark          integer
        isNLMstype    integer

Name is the actual term name, and can be quite long. It can contain any character data, but the Knowledge Manager application prevents the use of control characters, double-quotes, wildcards, and special characters like "|" which are used as delimiters in exported files.

UWDAID is an integer that is unique to a Concept. Several rows in the terms table will have the same UWDAID value if they are synonyms of each other. All values for this field are assigned locally, with the value being incremented by one for each new term added to our database from any source.

UMLSID is the integer portion of the UMLS Concept Identifier for a term, where it is known. Since we are concerned largely with extending the UMLS data sources, this field gives us an easy way to track new vs. existing terms.

SNOMEDID contains the integer ID of a term as found in the SNOMED nomenclature, similar to the UMLSID field.

Role could be called an "enumerated" text value; that is, it can take on a value from among a limited number of possibilities. Right now, a term in our database can be a "Preferred Name" (the preferred form of the name of a concept), a "Synonym", or a "Link" (used for those few artificial terms which are used to identify hierarchies). A Sybase trigger is used to enforce the use of only the allowed possible values.

Authority is a generic character field that can be used to track the original source of a term; for instance, if an author includes a term because it was found in a textbook, a brief bibliographic entry can be made here to record the attribution.

Author contains the name (usually the login name) of the person who created a term. This will usually be assigned automatically but can be overridden by an operator who is performing data-entry tasks for some other author.

Modification is a timestamp used to keep track of when each row was entered or last modified.

mark is an integer flag value that is provided for use in executing set-inclusion operations on the term list.

isNLMstype is an integer flag taking the values NULL, 0, or 1. If the field contains a 1, then this term is a part of the Meta-1 semantic type hierarchy or else a locally-created extension of it. Any term marked this way can be used during data entry to assign a semantic type to a concept, which causes that concept to be linked into the "isa" hierarchy in the appropriate place.