Caveats for Using Sybase and Ingres in a Client/Server Environment Technical Note Structural Informatics Systems Development Group Department of Biological Structure, University of Washington, Mailstop SM-20 by Kraig Eno, kraig@u.washington.edu November 16, 1992 This document describes the idiosyncrasies discovered during the implementation of a client/server-model using two database platforms: Sybase on the NeXT computer and Ingres on the IBM RS/6000. It will detail some features specific to each package, and then list a few differences between the two which we found significant. Please understand that the Sybase/NeXT platform has been our primary server development platform for the past two years, and our experience with Ingres has been quite limited; a small bias may be apparent in the evaluation of the two packages, and there are probably features in Ingres that we do not know about yet. Port Listener daemons All database access for our project is done via a server process written in C, which communicates with the database software using a vendor-supplied library. Our server receives plain-text commands from a client, translates them to one or more SQL commands, and sends the results back after post- processing and reformatting. Clients request a connection by opening a TCP socket to port number 5002 (ports 5003 and 5004 are used for experimental versions of the same system). On the NeXT, we entered service names into the /etc/services file and made appropriate entries in /etc/inetd.conf to cause the INETD process to start our server code for each incoming connection; in the case of the RS/6000, since we had no access to these system files, daemon programs are kept running to handle connections to the TCP ports and fork off the server processes. SYBASE on the NeXT Machine version identifier "SQL Server/4.0/P/NeXT/0.981/58/ of Thu Aug-31-89" Multi-user access The Sybase database server bundled with the NeXT computer is limited to 5 simultaneous user connections. INETD starts one server process for each client session; these server processes stay connected to the Sybase database for the duration of the session. A bug in the server (as reported in comp.sys.next.misc) may cause the entire server to refuse any connection after a particular number of sessions has been reached; we may have seen this behavior 2 or 3 times in the past year. Restarting the server process fixes the problem. Ingres on the IBM RS/6000 model 530 version identifier "6.3/02p (ris.us5/01)" Multi-user access It is possible for a server process on the RS/6000 to become stuck in a compute loop and stay around after a client disconnects; these may lock database resources and prevent other clients from connecting to the database server. The cause of this behavior is unknown, but has only been observed n a few cases. The condition usually produces "GCA protocol" errors in response to connection attempts; it can be rectified by killing all the server processes related to the "bio" database and restarting the listener(s). Also, the Ingres server we used was ostensibly an unlimited license, but did not support more than 4 simultaneous sessions. We found that a COMMIT command had to be issued after each SQL command in order to free up locks placed on database resources so that other users could perform queries in simultaneous sessions; this may not be necessary if the appropriate parameters could be found and set (such as the SET AUTOCOMMIT command). User environment The environment variables of a process which communicates with Ingres must be set correctly, or Ingres may not run correctly (or at all). The specific variables are as follows: PATH=/ing6.3/ingres/utility:/ing6.3/ingres/bin:/bin:/etc:/usr/bin:/usr/ucb:/usr/lib:/ usr/local/bin:/ing6.3/home/bin:. TERM_INGRES=hft II_SYSTEM=/ing6.3 Differences between Sybase and Ingres The SQL language support for the two database packages is similar, but not identical. Our code used a few features that had to be changed when porting between systems. C-language support: The Sybase application environment consists of a set of routines called the DB-Library, which supplies C-callable routines such as dbopen(), dblogin(), and dbsqlexec(); most accept null-terminated strings as arguments. There is a very general mechanism for passing complex queries and extracting the results without worrying about data types. Ingres is based on embedded SQL statements which are converted to C function calls by a preprocessor. This makes source code easier to understand for the knowledgeable SQL user, but makes it harder to find syntax errors since line numbers reported by the C compiler do not match the input to the Ingres preprocessor. Data-type conversion facilities exist but are somewhat cumbersome, requiring CASE statements in the application code to handle all the possible field types. Both Sybase and Ingres supply good error-handling facilities which allow an application routine to report errors back to the client process. String delimiters: Sybase allows either the single-quote (') or the double-quote (") as a string delimiter; Ingres does not accept the use of the double quote. This is important for our application, since many anatomical terms contain the single quote (as in "Broca's area") and using double-quotes in automated query processing reduces the amount of handling necessary. When using single- quotes as delimiters for term names in the server code, the terms must be checked for embedded single-quotes and replaced with two single-quotes. Function calls: Sybase Ingres purpose lower() lowercase() convert string to lowercase upper() uppercase() convert string to uppercase rtrim() trim() trim trailing blanks from a string Data import/export: Using BCP to move data in and out of Sybase is very fast; on the order of several hundred rows per second. This is a generalized import/export facility that apparently has no equivalent in Ingres; the database administrator must write a C program to import data into the database from text files. I have made a program which can read the output generated from BCP and move the data into an Ingres table, but it is slow (perhaps 50 rows per second) and only allows two data types: INTEGER and CHAR. Other fields can be automatically skipped, if necessary. Performance: Ingres is markedly slower when querying a large table with an unindexed column in the WHERE clause. For example, looking for a list of terms with a given "stype" field, or applying the lowercase() function to an indexed field, gives very slow lookups on tables containing more than a few hundred rows. Ingres takes about 15 seconds in this situation, compared to about 1-2 seconds in Sybase. Gaining acceptable performance required the creation of a nonunique index on every field that can appear in a WHERE clause, and not using conversion functions there. Reliability: Both database systems seem to be bulletproof from a data integrity standpoint; that is, no data has ever been lost or corrupted. In one instance, the Ingres-based server returned empty strings occasionally instead of real data, but whether this was the fault of Ingres or of our server code could not be determined. Conclusions Both Ingres and Sybase appear to be well supported, and both perform solidly; neither shows any evidence of the kinds of bugs and performance problems associated with PC- and Macintosh-based database software. Problems, when observed, tend to be confined to a single session and can be resolved quickly by restarting the software. Of the two, Sybase appears to have the edge in utility support (text file import/export using BCP), language flexibility (string delimiters), and performance (more graceful degradation when using the LOWER() function). However, since it limits the number of simultaneous connections allowed, we have had to use both packages for our work on the NLM contract.