Database Portability (Part 3)

(This is part 3 of a 3-part essay; you can view parts 1 and 2 here.)

Commands and Services

The last major concept related to database portability in PHP is itself composed of two related pieces: the SQL command set presented by the RDBMS backend, and the additional services proved by the backend (sequences, stored procedures, views, triggers, and so on). They’re pretty closely related in my opinion, because the services are addressed through the SQL command set for that RDBMS.

The LIMIT Clause

One very important command clause for SELECT statements, the LIMIT (or TOP) clause, can be radically different between databases. For example, Microsoft SQL has no support at all for the LIMIT clause, and its native TOP clause does not support offsets. Thus, a “SELECT * FROM table_name LIMIT 10,50” clause in MySQL (i.e., return 10 rows starting at the 50th row of the results) has no direct correlate in Microsoft SQL (the best you can do with its native SQL command set is “SELECT TOP 60 FROM table_name” and then, when looping through the results, skip the first 50, which sucks for large result sets).

The only way I have found to emulate LIMIT clauses in a reasonable way is the method employed by PEAR DB and MDB; I think AdoDB uses this method as well. Essentially, you pass a SELECT statement into a function and specify the count and offset as function parameters; the function then adds the proper LIMIT or TOP clause to the statement for you (with respect to the specified RDBMS backend). For example, instead of this …

// $db is PEAR DB object
$result = $db->query(“SELECT * FROM table_name LIMIT 10,50”);

… you would do this:

// $db is PEAR DB object
$result = $db->limitQuery(“SELECT * FROM table_name”, 10, 50);

In Solar_Sql, I have streamlined this somewhat so that you can execute the command with bound parameters (in preparation for PDO) and LIMIT count/offset values; the same exec() method is used regardless of whether or not a LIMIT is to be added:

// $sql is a Solar_Sql object
$result = $sql->exec(“SELECT * FROM table_name”, $bind_params, 10, 50);

SQL Functions

Some seemingly standard SQL functions, such as SUBSTRING and CONCAT, are different from backend to backend. This means that you must either code your SQL statments *not* to use such functions, or come up with a way to abstract them out (either with a search-and-replace on your SQL command strings, or with PHP functions).

AdbDB uses the latter method; that is, it abstracts the SQL CONCAT() function, along with other functions, using a related PHP function call. For example, instead of this …

$stmt = “SELECT CONCAT(‘field1’, ‘field2’, ‘field5’)” .
” AS combo FROM table_name”;

… you would use the concat() method to make sure the proper SQL for your backend RDBMS is generated:

$stmt = “SELECT ” . $adodb->concat(‘field1’, ‘field2’, ‘field5’) .
” AS combo FROM table_name”;

While I am unhappy with this way of doing things, I don’t see a less error-prone replacement. The CONCAT() function (and other SQL functions) can be used in almost any SQL command, whether SELECT, INSERT, UPDATE, etc. Attempting to do a search-and-replace for a function placeholder in the plain text of the statement means you need to test and see if the placeholder is literal text to be inserted or updated (as in the text of an example for using the command), or if the placeholder is in fact standing outside quotes and is intended as a function call. As such, using a method to insert the proper SQL seems the most reasonably foolproof way to abstract such SQL-level function calls.

Services

Besides the SQL command set and common SQL functions, we have additional “services” provided by the particular RDBMS that may not be available in other systems. For example, MySQL uses autoincrementing fields to generate unique sequential values for a column, while most other databases use “sequences” for such values (a “sequence” is a special service that increments values independently of the tables in the database).

Similarly, some RDBMSes provide access to stored procedures (these are essentially compiled sets of SQL logic that are invoked by name, much like user-defined functions in PHP) that can be called automatically by certain triggers (ON UPDATE, ON DELETE, and so on).

Indexes are handled differently in some backends. PostgreSQL in particular is touchy about index names; index names on different tables may not share the same name, so if you have table1.id column and a table2.id column, the table1 index named “id” will conflict with the table2 index named “id”. As such, to make your application portable to PostgreSQL, you have to make sure that index names across all tables in the database are unique; Solar_Sql_Entity handles this by prefixing every index name with the table name and suffixing with “_idx”.

Finally, not all database backends are transaction-aware. Transactions allow you to attempto make changes to the database, and if the changes only partly succeed, you can roll back to a known good state, thus preserving data integrity. Some MySQL table types do not support transactions at all. Some PHP transaction commitment functions will commit for all pending statements on the connection, which means that if two separate sets of SQL statments are being handled on the same connection, both will be committed to the database; this makes resource sharing problematic in some cases.

These services, and their related SQL commands, cannot be depended on to behave similarly (or even exist) across all database backends. As such, you need to reduce your expectations to the minimum that can possibly be supported by all RDBMSes. Stored procedures, when supported, are signigicantly different in their implementation between RDBMSes. Triggers, while often similar between backends, are different enough to warrant special treatment in a portability layer; in addition, triggers often depend on stored procedures, which are themselves not very portable.

As such, it appears the only service you can depend on in regular practice is sequences, whether native or emulated. Again, PEAR DB, MDB, and AdoDB have similar solutions. Instead of depending on MySQL’s autoincrement as a native column option, a sequence (or separate sequence table) is generated; you then must acquire the sequence number separately from the SQL command. For example, the following “standard” MySQL command to insert an autoincremented value …

// the ‘id’ column is autoincremented
$stmt = “INSERT INTO table_name (id) VALUES (NULL)”;

… you would get a sequence value, then insert that value explicitly.

// $db is a PEAR DB object
$id_value = $db->nextSequence(‘sequence_name’);
$stmt = “INSERT INTO table_name (id) VALUES (‘$id_value’)”;

Regarding transactions, it may be best to turn them off entirely (i.e., set “autocommit” for each individual statement) so that shared connections do not commit statements being handled by other pieces of PHP code. (This is the mode I use in Solar_Sql.)

Conclusion

While it is relatively easy to create an API portability layer for PHP to access different databases, no RDBMS supports the exact same set of column types, SQL commands, and services. As such, database portability generally means that you have to limit your coding to a lowest common denominator of RDBMS power.

So why attempt database portability at all? For most in-house applications, it may not make sense, especially when speed and specific backend functions are of paramount importance. However, for many applications, especially applications intended for worldwide distribution into heterogenous environmments, coding for portability will reduce debugging problems and make it easy for end-users to install and maintain your application regardless of their particular database backend. While you will be sacrificing some level of power at the SQL level, you will be able to depend on the database results to be exactly the same, regardless of which RDBMS is actually in use, at the PHP level.

Are you stuck with a legacy PHP application? You should buy my book because it gives you a step-by-step guide to improving your codebase, all while keeping it running the whole time.
Share This!Share on Google+Share on FacebookTweet about this on TwitterShare on RedditShare on LinkedIn

2 thoughts on “Database Portability (Part 3)

  1. As always, great stuff Paul.

    One option you left out for converting between different commands (limit/top, for example) was the use of another option. For that matter, an entire query could be constructed of PHP objects:

    $stmt = new SelectSQL();
    $table = new Table(‘tableOne’);
    $stmt->addTable($table);
    $field1 = new Field(‘thisField’, $table);
    $field2 = new Field(‘thatField’, $table);
    $stmt->addField($field1[, $ordering]);
    $stmt->addField($field2[, $ordering]);
    … etc …

    I haven’t personally abstracted it out that much, but I do use a Filter object to specify where statements. If an ObjectSeach::$_filter is not null, the query asks for its string via toString() and the object takes care of generating itself. The only thing I haven’t done is made them environment aware, but it wouldn’t be that hard to put a constant in and have it calculate what needs to be done where based on that value.

    Plus you get the added benefit of being able to validate the data as its entered. Now Field::__construct() can check to make sure that a valid field name is there (no spaces, quotes, etc.).

  2. Hi, Travis —

    You are absolutely right. One can programmatically build SQL statements that way, and it’s a perfectly valid option, especially for “pure” object-oriented devotees. I believe PEAR DB_DataObject, and perhaps Creole, use that approach to ensure portability. However, that kind of thing is so far away from my own methods that I didn’t even think of talking about it.

    My feelings on that topic: While a lot of people prefer the approach you describe, I find it to be effectively non-intuitive and relatively unclear; perhaps this says more about my own habits than any real limitations to the technique. The furthest I go down that path is to separate the various SQL clauses into their own variables (e.g., $select, $from, $where, and so on) so that I can manipulate the separate pieces at what I think is a reasonable level while still maintaining human readability. (I think it makes for less typing when generating new queries, too.)

    Thanks for the comment, Travis, good catch.

Leave a Reply

Your email address will not be published. Required fields are marked *