Database Portability (Part 4)

This is part 4 in a 3-part series (take that, Douglas Adams ;-). You can see parts 1 and 2 here and part 3 here.

A little background: the reason I am so concerned portability is the Solar project. One of the stated goals for Solar_Sql (the RDBMS abstraction layer) is that it should run exactly the same way on every supported database. Right now, that list includes Frontbase, Microsoft SQL, MySQL, PostgreSQL, and SQLite.

Case-Sensitivity in WHERE Clauses

Let’s say we have a table with a column named “the_field” defined as CHAR(2). In the table, we have every possible combination of two letters, picked from the set of all upper-case and lower case letters (i.e., “A-Z” and “a-z”, for a total of 26^2 rows in the table). For example, the “A” combinations are represented by four rows: “aa”, “aA”, “Aa”, and “AA”.

In theory, if you wanted to select the row with “aa” (where both are lower case), you would issue this SELECT statement:

SELECT * FROM the_table WHERE the_field = ‘aa’

How many rows will you get back? You should only get one: the “aa” row. That’s true if the database is PostgreSQL, Frontbase, or SQLite; however, in Microsoft SQL and MySQL, you get four (“aa”, “aA”, “Aa”, and “AA”). That’s because searches in those two RDBMSes are *not* case-sensitive on CHAR and VARCHAR fields. This fails to meet my requirement that Solar_Sql return the same results regardless of the backend.

I think the solution will be to re-define the native type used for “char” and “varchar” fields in those two databases. It turns out that in MySQL, a “CHAR(n) BINARY” and “VARCHAR(n) BINARY” both retain case-sensitivity in searches. In Microsoft SQL, the same appears to be true for “BINARY” (vice “CHAR”) and “VARBINARY” (vice “VARCHAR”).

Does anyone see a problem with using the binary types instead of the plain character types in those two databases, specifically for purposes of case-sensitivity? My appreciation will extend to buying you a beer next time we’re in the same town. 🙂

Solar 0.0.5 Released

Solar is a simple object library and application repository for PHP5. Solar stresses comprehensibility and simplicity; reusable code should be easy to understand, and that’s what I strive for in Solar. It is written with worldwide application distribution in mind, so things like localization are built in from the start.

This release includes a new and very important component: the “Solar_App” class. It takes some superficial hints from Ruby on Rails (but is definitely not comparable to RoR in any meaningful sense). Essentially, Solar_App is an application controller class; when you extend it (e.g. to Solar_App_Bugs, the proof-of-concept bug tracker for Solar) it looks for subdirectories called “controllers”, “models”, “views”, and “helpers”. (So it looks like I have started to adopt “real” MVC methodology after years of shunning it.)

It’s easy enough to define new controller actions: you put your controller scripts in “controllers”, and for each script you can call an action. For example, if you have “edit.php” in controllers/, Solar_App will call it if the “action” URL variable is “edit” (“http://example.com/index.php?action=edit”). There’s a little more to it than that, of course, but the source code is thoroughly commented and should provide a good full-up example for now (it is a proof-of-concept, after all).

One nice thing about distributing apps this way is that they never have to leave the Solar directory itself; you can just create a simple PHP file (5-6 lines) to start Solar, get the application object, and call its output() method; the entire application sits outside the web root. Pretty nifty; you can see an example in the Solar_App_Bugs package (included in the full Solar package).

The full change notes are:

* This is a public development release, and is not yet stable.

* Added Solar_Super class for retrieving superglobals, modified Solar.php to use it

* Added Solar::cookie() method for retrieving cookie values.

* Added Solar::session() method for retrieving session data.

* Added new Solar_App application controller class (with standard directory structure using an MVC architecture inspired by Ruby-on-Rails)

* Solar_User_Auth ‘action*’ keys and values renamed to ‘op*’ (Solar_App now owns the ‘action’ keyword)

* Added validation for ‘new’ and ‘edit’ forms in Solar_Cell_Bugs

* Added Solar_App_Bugs proof-of-concept application based on Solar_App

You can download this release, read the API docs, or join the mailing list — just visit http://solarphp.com.

Database Portability (Part 3)

(This is part 3 of a 3-part essay; you can view parts 1 and 2 here.)

Commands and Services

The last major concept related to database portability in PHP is itself composed of two related pieces: the SQL command set presented by the RDBMS backend, and the additional services proved by the backend (sequences, stored procedures, views, triggers, and so on). They’re pretty closely related in my opinion, because the services are addressed through the SQL command set for that RDBMS.

The LIMIT Clause

One very important command clause for SELECT statements, the LIMIT (or TOP) clause, can be radically different between databases. For example, Microsoft SQL has no support at all for the LIMIT clause, and its native TOP clause does not support offsets. Thus, a “SELECT * FROM table_name LIMIT 10,50” clause in MySQL (i.e., return 10 rows starting at the 50th row of the results) has no direct correlate in Microsoft SQL (the best you can do with its native SQL command set is “SELECT TOP 60 FROM table_name” and then, when looping through the results, skip the first 50, which sucks for large result sets).

The only way I have found to emulate LIMIT clauses in a reasonable way is the method employed by PEAR DB and MDB; I think AdoDB uses this method as well. Essentially, you pass a SELECT statement into a function and specify the count and offset as function parameters; the function then adds the proper LIMIT or TOP clause to the statement for you (with respect to the specified RDBMS backend). For example, instead of this …

// $db is PEAR DB object
$result = $db->query(“SELECT * FROM table_name LIMIT 10,50”);

… you would do this:

// $db is PEAR DB object
$result = $db->limitQuery(“SELECT * FROM table_name”, 10, 50);

In Solar_Sql, I have streamlined this somewhat so that you can execute the command with bound parameters (in preparation for PDO) and LIMIT count/offset values; the same exec() method is used regardless of whether or not a LIMIT is to be added:

// $sql is a Solar_Sql object
$result = $sql->exec(“SELECT * FROM table_name”, $bind_params, 10, 50);

SQL Functions

Some seemingly standard SQL functions, such as SUBSTRING and CONCAT, are different from backend to backend. This means that you must either code your SQL statments *not* to use such functions, or come up with a way to abstract them out (either with a search-and-replace on your SQL command strings, or with PHP functions).

AdbDB uses the latter method; that is, it abstracts the SQL CONCAT() function, along with other functions, using a related PHP function call. For example, instead of this …

$stmt = “SELECT CONCAT(‘field1’, ‘field2’, ‘field5’)” .
” AS combo FROM table_name”;

… you would use the concat() method to make sure the proper SQL for your backend RDBMS is generated:

$stmt = “SELECT ” . $adodb->concat(‘field1’, ‘field2’, ‘field5’) .
” AS combo FROM table_name”;

While I am unhappy with this way of doing things, I don’t see a less error-prone replacement. The CONCAT() function (and other SQL functions) can be used in almost any SQL command, whether SELECT, INSERT, UPDATE, etc. Attempting to do a search-and-replace for a function placeholder in the plain text of the statement means you need to test and see if the placeholder is literal text to be inserted or updated (as in the text of an example for using the command), or if the placeholder is in fact standing outside quotes and is intended as a function call. As such, using a method to insert the proper SQL seems the most reasonably foolproof way to abstract such SQL-level function calls.

Services

Besides the SQL command set and common SQL functions, we have additional “services” provided by the particular RDBMS that may not be available in other systems. For example, MySQL uses autoincrementing fields to generate unique sequential values for a column, while most other databases use “sequences” for such values (a “sequence” is a special service that increments values independently of the tables in the database).

Similarly, some RDBMSes provide access to stored procedures (these are essentially compiled sets of SQL logic that are invoked by name, much like user-defined functions in PHP) that can be called automatically by certain triggers (ON UPDATE, ON DELETE, and so on).

Indexes are handled differently in some backends. PostgreSQL in particular is touchy about index names; index names on different tables may not share the same name, so if you have table1.id column and a table2.id column, the table1 index named “id” will conflict with the table2 index named “id”. As such, to make your application portable to PostgreSQL, you have to make sure that index names across all tables in the database are unique; Solar_Sql_Entity handles this by prefixing every index name with the table name and suffixing with “_idx”.

Finally, not all database backends are transaction-aware. Transactions allow you to attempto make changes to the database, and if the changes only partly succeed, you can roll back to a known good state, thus preserving data integrity. Some MySQL table types do not support transactions at all. Some PHP transaction commitment functions will commit for all pending statements on the connection, which means that if two separate sets of SQL statments are being handled on the same connection, both will be committed to the database; this makes resource sharing problematic in some cases.

These services, and their related SQL commands, cannot be depended on to behave similarly (or even exist) across all database backends. As such, you need to reduce your expectations to the minimum that can possibly be supported by all RDBMSes. Stored procedures, when supported, are signigicantly different in their implementation between RDBMSes. Triggers, while often similar between backends, are different enough to warrant special treatment in a portability layer; in addition, triggers often depend on stored procedures, which are themselves not very portable.

As such, it appears the only service you can depend on in regular practice is sequences, whether native or emulated. Again, PEAR DB, MDB, and AdoDB have similar solutions. Instead of depending on MySQL’s autoincrement as a native column option, a sequence (or separate sequence table) is generated; you then must acquire the sequence number separately from the SQL command. For example, the following “standard” MySQL command to insert an autoincremented value …

// the ‘id’ column is autoincremented
$stmt = “INSERT INTO table_name (id) VALUES (NULL)”;

… you would get a sequence value, then insert that value explicitly.

// $db is a PEAR DB object
$id_value = $db->nextSequence(‘sequence_name’);
$stmt = “INSERT INTO table_name (id) VALUES (‘$id_value’)”;

Regarding transactions, it may be best to turn them off entirely (i.e., set “autocommit” for each individual statement) so that shared connections do not commit statements being handled by other pieces of PHP code. (This is the mode I use in Solar_Sql.)

Conclusion

While it is relatively easy to create an API portability layer for PHP to access different databases, no RDBMS supports the exact same set of column types, SQL commands, and services. As such, database portability generally means that you have to limit your coding to a lowest common denominator of RDBMS power.

So why attempt database portability at all? For most in-house applications, it may not make sense, especially when speed and specific backend functions are of paramount importance. However, for many applications, especially applications intended for worldwide distribution into heterogenous environmments, coding for portability will reduce debugging problems and make it easy for end-users to install and maintain your application regardless of their particular database backend. While you will be sacrificing some level of power at the SQL level, you will be able to depend on the database results to be exactly the same, regardless of which RDBMS is actually in use, at the PHP level.

Database Portability (Parts 1 and 2)

Overview

Via John Lim of AdoDB fame, we find that Daniel Convissor, current steward of the PEAR DB package (his work has been both solid and spectactular there), has put up a great presentation about database portability. I’d like to follow up on his posting with my own opinions on this topic.

When we talk about database “portability” in PHP, we need to discuss three separate concepts.

  1. API portability
  2. Data type portability
  3. Command and service portability

I’ll address points 1 and 2 in this posting, and point 3 in a later one.

A little teaser: what’s going to really bake your noodle is what I think about data type portability. Skip ahead if you like.

API portability

PHP has a different application programming interface (API; i.e., the function calls with their parameter and return signatures) for each of its supported databases. The calls for MySQL are different from those for PostgreSQL, which are different from Microsoft SQL, and so on. This makes is difficult to switch a codebase from one RDBMS backend to another. For example, if you hard-code for MySQL and want to convert to PostgreSQL, you would need to convert all the mysql_*() calls to pgsql_*() calls, and then convert the parameters and check the returns differently.

The benefit of an API portability (or abstraction) layer is that you no longer use the native PHP calls for database interface. Instead, you use an abstracted set of methods or functions that work exactly the same way for every database backend. Then, if you need to change from MySQL to PostgreSQL, essentially all you need to do is change a config file value from “mysql” to “pgsql”.

Until recently, API portability was handled almost exclusively by userland code such as AdoDB, PEAR DB, Metabase/MDB, ezSQL, and so on. But with the advent of PDO, database APIs are being unified centrally at the “internal” PHP level, so that aspect of the userland code will quickly become irrelevant.

Data Type Portability

However, the API only deals with the PHP code itself. Different database backends have their own, frequently incompatible, data types. You know: CHAR, VARCHAR, INT, DATE, TIME, TIMESTAMP, and so on. The problem is that a VARCHAR in one system may have a max of 255 characters, and in another have a max of 8000 chararacters. DECIMAL in one may mean the same thing as NUMBER in another. Especially pernicious are the date and time types; each backend has different storage requirements and min/max limits.

The vagaries and pitfalls of disparate data types show up in number of common operations. Some of them are …

  • When using CREATE: The data types and their parameters are not the same from system to system, so you can’t just issue “CREATE TABLE table_name (id INT4)” and expect it to work everywhere.
  • When using SELECT: Writing “SELECT * FROM table_name WHERE date_field = ‘20041224’” would make no sense to a system that stores dates with the dashes or in a non-ISO-standard format.
  • When processing results: Again, the backend may not return dates and times in an expected format, which makes it hard to use or update that information.

The common solution for this is to build your SQL queries so that the vagaries of the particular backend are handled by a PHP method that adds the proper SQL for you. For example, using something like MDB, you would not write a SELECT in plain text; instead, you would do something like this:


// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
	" WHERE date_field = " .
    $mdb->getDateValue('2004-12-24');

… and MDB would convert the date into a format expected by the database backend. (Not sure about the specifics; I am sure Lukas will correct me post haste. 😉

Frankly, that kind of thing drives me nuts.

And in turn, I am about to drive a lot of formal database gurus nuts themselves.

If you are a database professional you might not want to read any further, as my solution to the data type abstraction problem (from a database optimization perspective) is inefficient and non-standard and disrespectful and will make you want to claw your eyes out. But from a programmer’s perspective, someone who just wants his app to work the same way everywhere, this is pretty simple.

Still with me? OK.

Instead of asking the database how it expects to store dates and times, we will force it to store them the way we want. My projects force the database store dates and times as character fields in ISO format. Thus, a date is always created as CHAR(10) and stored as “yyyy-mm-dd”. A time is always created as CHAR(8) and stored as “hh:ii:ss”. I do this in DB_Table and in Solar_Sql_Entity from the Solar project, with great success.

The beauty of this is that you don’t need a method to convert to or from anything, ever. Writing a query on a date? Use ISO format in plain text; no need for a method to change from one format to another.


// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
    " WHERE date_field = '2004-12-24'";

Processing results? The date always comes back the same way without any special considerations or intercessor methods; it’s just a character string, same as any other, in standard ISO format for dates.

(I hear the database gurus screaming at me even now. Take a breath, then look at SQLite and how it does not even *have* strict data types; the typing is a hint, not much more, to SQLite. This is not a bad thing, in my eyes.)

Another advantage of this forcing of data types is that, if you convert from MySQL to PostgreSQL, your export and import SQL commands will be transparent: no need to convert from one database format to another for dates and times; they’re just strings.

“But what if I want to use my database’s date and time SQL functions in my queries?” Well, if you want to use SQL that is specific to your database, then you just lost portability, and this essay is about portability. So it’s not a problem: just don’t use those commands, because they are unlikely to work in any other backend.

This brings us to the last bit of portability: the SQL command set and the services provided by different RDBMS backends. Look for that in an upcoming entry.

Update: The conclusion of this essay is here.

Update (2005-06-13): Updated the link to Daniel Convissor’s presentation.

PDO at wiki.cc

You can see the PDO Basics entry at wiki.cc here. PDO unifies the various PHP RDBMS function calls; database API abstraction in PHP is a Good Thing.

But PDO brings something I did not expect: table instrospection. You can get meta-information about the statement results that tells you what the native type, size, precision, etc. are for the returned columns. This can make automated type-checking and conversion super-easy. Just one more reason I was lucky to think of making Solar PDO-ready.

Hey, PDO guys: neat stuff, and thanks!

Quick Hits: DB_Table, Text_Wiki

I just released the first 1.0.0 stable version of DB_Table, and released RC2 of Text_Wiki earlier today. That’s another two things I don’t need to think about any more. 🙂

Now that I think about it, it’s been a good couple of days for getting software out the door. I guess this is one way to spend my spring break. 😉

Savant3: Second Development Release

The second development revision of the Savant template system version 3 is now available; you can get it from the usual location here. The change notes are:

* second development release

* error codes are now strings, not integer constants (e.g., ‘ERR_UNKNOWN’ instead of -1).

* added ‘freeze’ property to Savant3_Plugin_form

* added auto() and fullauto() methods to Savant3_Plugin_form

* changed element description arrays in Savant3_Plugin_form

* all plugins using htmlspecialchars() to filter output now use Savant3::scrub plugin instead

* added error resource for Solar errors

* updated comments throughout

As always, if you find any problems, please let me know via the Savant mailing list.

Please note that Savant2 is still being actively maintained for PHP4 and non-strict PHP5. In fact, I want to backport the Savant3 version of the ‘form’ plugin to Savant2; that plugin is just too good to be limited to PHP5.

Solar 0.0.4 released

I released a new revision of Solar today; we’re at 0.0.4 now. (UPDATE (2004-03-07): I failed to put the tarball on the Solar website earlier; it is now in place. Thanks to Jens for pointing it out.)

Solar is still in development and as such is unstable. However, I am continuing work on the first proof-of-concept application, a simple bug tracker, which combines the generic commenting application component (called a “Cell” in Solar parlance, in this case “Solar_Cell_Talk”) with the specific data maintained by the Solar_Cell_Bugs component. All it lacks is edit restrictions based on user name and user role.

The change notes for this revision are:

* This is a public development release, and is not yet stable.
* Added Solar_Form class (Solar/Form.php) for form element collation and
  input validation (not for output or rendering)
* Added Solar_Locale class (Solar/Locale.php)
* Solar.php:
  * Modified locale() method signature
  * Added public static $locale object
  * Removed Solar::setLocale()
  * Properties $config and $shared are now public
* Solar_Base: modified to use new locale structure
* Solar_Sql:
  * dropped support for Oracle (30-char limit on names is just too strict)
  * max table name is now 30 chars, max col/idx name is now 27 chars
  * double underscores not allowed in table, index, or column names
    (to prevent name collisions)
  * indexes now suffixed with '__idx', sequences with '__seq'
  * in index names, the table/index separator is now '__'
  * Converted fetch(), fetchNum(), and free() methods to static (now
    there's no need to pass a reference to the driver into the result class)
* Solar_Sql_Result: modified to make static calls to Solar_Sql for free(),
  fetch(), and fetchNum()
* Various modifications to the Solar_Cell classes for Bugs and Talk to go
  with Solar_Sql changes
* Various schema changes to the Solar_Cell classes for Bugs and Talk

Solar is a “simple object library and application repository” for PHP5. It is designed for developers who wnat to publish their code worldwide in heterogenous environments.

For example, the database drivers are highly abstracted and are intended to work identically on all supported RDBMS backends (including date and time representations). In addition, localization is built in from the start, so it is a simple matter to create translated strings and formats for your applications.

Above all, I am striving to make the Solar codebase comprehensible and intelligble for all adoptive developers, whether new or experienced, with as little “tricky stuff” as possible and verbose commenting throughout.

Updates to Savant Site

While there is no new documentation, I have converted the backend to its own database so I can use YaWiki to manage the existing Savant2 documentation in parallel with the new Savant3 documentation. (The Savant3 pages are lamentably lacking, with only a home page, but that should change as I move it closer to a stable release.)

Savant is a lightweight, object-oriented template system for PHP. I call it the simple, elegant, and powerful alternative to Smarty. Savant2 was written for PHP4; while it works with PHP5 out of the box, it is not E_STRICT compliant. Savant3 *is* E_STRICT compliant, sports a nicer plugin system using __call() magic, and comes with a modified set of default plugins.