Database Portability (Parts 1 and 2)

Overview

Via John Lim of AdoDB fame, we find that Daniel Convissor, current steward of the PEAR DB package (his work has been both solid and spectactular there), has put up a great presentation about database portability. I’d like to follow up on his posting with my own opinions on this topic.

When we talk about database “portability” in PHP, we need to discuss three separate concepts.

  1. API portability
  2. Data type portability
  3. Command and service portability

I’ll address points 1 and 2 in this posting, and point 3 in a later one.

A little teaser: what’s going to really bake your noodle is what I think about data type portability. Skip ahead if you like.

API portability

PHP has a different application programming interface (API; i.e., the function calls with their parameter and return signatures) for each of its supported databases. The calls for MySQL are different from those for PostgreSQL, which are different from Microsoft SQL, and so on. This makes is difficult to switch a codebase from one RDBMS backend to another. For example, if you hard-code for MySQL and want to convert to PostgreSQL, you would need to convert all the mysql_*() calls to pgsql_*() calls, and then convert the parameters and check the returns differently.

The benefit of an API portability (or abstraction) layer is that you no longer use the native PHP calls for database interface. Instead, you use an abstracted set of methods or functions that work exactly the same way for every database backend. Then, if you need to change from MySQL to PostgreSQL, essentially all you need to do is change a config file value from “mysql” to “pgsql”.

Until recently, API portability was handled almost exclusively by userland code such as AdoDB, PEAR DB, Metabase/MDB, ezSQL, and so on. But with the advent of PDO, database APIs are being unified centrally at the “internal” PHP level, so that aspect of the userland code will quickly become irrelevant.

Data Type Portability

However, the API only deals with the PHP code itself. Different database backends have their own, frequently incompatible, data types. You know: CHAR, VARCHAR, INT, DATE, TIME, TIMESTAMP, and so on. The problem is that a VARCHAR in one system may have a max of 255 characters, and in another have a max of 8000 chararacters. DECIMAL in one may mean the same thing as NUMBER in another. Especially pernicious are the date and time types; each backend has different storage requirements and min/max limits.

The vagaries and pitfalls of disparate data types show up in number of common operations. Some of them are …

  • When using CREATE: The data types and their parameters are not the same from system to system, so you can’t just issue “CREATE TABLE table_name (id INT4)” and expect it to work everywhere.
  • When using SELECT: Writing “SELECT * FROM table_name WHERE date_field = ‘20041224’” would make no sense to a system that stores dates with the dashes or in a non-ISO-standard format.
  • When processing results: Again, the backend may not return dates and times in an expected format, which makes it hard to use or update that information.

The common solution for this is to build your SQL queries so that the vagaries of the particular backend are handled by a PHP method that adds the proper SQL for you. For example, using something like MDB, you would not write a SELECT in plain text; instead, you would do something like this:


// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
	" WHERE date_field = " .
    $mdb->getDateValue('2004-12-24');

… and MDB would convert the date into a format expected by the database backend. (Not sure about the specifics; I am sure Lukas will correct me post haste. 😉

Frankly, that kind of thing drives me nuts.

And in turn, I am about to drive a lot of formal database gurus nuts themselves.

If you are a database professional you might not want to read any further, as my solution to the data type abstraction problem (from a database optimization perspective) is inefficient and non-standard and disrespectful and will make you want to claw your eyes out. But from a programmer’s perspective, someone who just wants his app to work the same way everywhere, this is pretty simple.

Still with me? OK.

Instead of asking the database how it expects to store dates and times, we will force it to store them the way we want. My projects force the database store dates and times as character fields in ISO format. Thus, a date is always created as CHAR(10) and stored as “yyyy-mm-dd”. A time is always created as CHAR(8) and stored as “hh:ii:ss”. I do this in DB_Table and in Solar_Sql_Entity from the Solar project, with great success.

The beauty of this is that you don’t need a method to convert to or from anything, ever. Writing a query on a date? Use ISO format in plain text; no need for a method to change from one format to another.


// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
    " WHERE date_field = '2004-12-24'";

Processing results? The date always comes back the same way without any special considerations or intercessor methods; it’s just a character string, same as any other, in standard ISO format for dates.

(I hear the database gurus screaming at me even now. Take a breath, then look at SQLite and how it does not even *have* strict data types; the typing is a hint, not much more, to SQLite. This is not a bad thing, in my eyes.)

Another advantage of this forcing of data types is that, if you convert from MySQL to PostgreSQL, your export and import SQL commands will be transparent: no need to convert from one database format to another for dates and times; they’re just strings.

“But what if I want to use my database’s date and time SQL functions in my queries?” Well, if you want to use SQL that is specific to your database, then you just lost portability, and this essay is about portability. So it’s not a problem: just don’t use those commands, because they are unlikely to work in any other backend.

This brings us to the last bit of portability: the SQL command set and the services provided by different RDBMS backends. Look for that in an upcoming entry.

Update: The conclusion of this essay is here.

Update (2005-06-13): Updated the link to Daniel Convissor’s presentation.

Are you stuck with a legacy PHP application? You should buy my book because it gives you a step-by-step guide to improving your codebase, all while keeping it running the whole time.

5 thoughts on “Database Portability (Parts 1 and 2)

  1. Hey Paul. I’m writing a preference system where I’m storing multiple types of values in a table with the following structure:

    CREATE TABLE prefs (
    key VARCHAR(50) NOT NULL,
    value VARCHAR(255) NOT NULL,
    owner_id INT);

    Its similar to your straight text approach, except I’m taking it one step further. Prior to any data being stored in the value field, I’m running it through serialize(). You could take the same approach you’re using and run the data through serialize(), then when you application unserialize()’s it on the other end, it would be ready to use. This becomes extremely useful when you’re using an OO-approach. Your datetime example above could come straight out of the database as a DateTime object of your creation.

    Of course, that ties you down to PHP, but if that’s where you’re intending to use the data. And I imagine it wouldn’t be that difficult to write an unserialize() method to port it to other languages.

  2. Hi, Travis — neat idea for storing objects, but not so hot if you want to do a select-by-date (or even an order-by-date) … the serialized object, while *very* useful once unserialized by PHP, is not that useful when constructing SQL statements. But storing prefs that way, perhaps with some query-able fields as hinting, is pretty cool. 🙂

  3. That would be more prohibitive. However, you could always construct your statements so the data in the WHERE clause is serialize()’d prior to send it to the database server.

    $stmt = sprintf(‘SELECT name FROM table WHERE date = “%s”‘,
    $dao->escape($dateObj->serialize()));

    Of course, if you’re using a builder of some sort to build the query for you, the code would be:

    $stmt->addWhere($dateObj);

    And $stmt->execute() would know that any thing added via addWhere() has a serialize() method – class hinting would be the thing here – require a Serializable interface in addWhere. I may play around with this a bit more – sounds like it would be an interesting way to store data. I’m curious how much (if any) of a performance hit this would cause.

  4. As I refractor this in my head I’m seeing the possible issue – searching for ranges. Unless you create some method for storing the object inside a table unto itself (i.e. DateObj in DateObj-Table) – it wouldn’t be feasible. Even thing, the complexity could easily get out of hand.

    Now if you had a table that could take a class construct via ReflectionClass and automatically create a table to break the object apart and store it. You would almost have to build a query language to handle searching though. Something like new SerializedSearch(‘DateObj’, ‘year = “2005”‘);

    It could all work, just one of those “would it really be useful” things… Still – it’s interesting to think about 🙂

  5. Travis I believe the correct thing to do is to build the tables like normal and then add an extra field for storing the object.

    However if you really want to do it your way a possible workaround is to add a meta field you can use for ordering, if one field doesn’t solve the complexity you could build a seperate table that contains structure for each different type and have your table just be a masterrecord of all info…

    Like for a tree struc…
    http://fungus.teststation.com/~jon/treehandling/TreeHandling.htm

    Doing this with a OR/M might be a good idea depending on preformance. So now instead of searching the masterrecord your searching a path and you hook up a callback in the OR/M record to pull objects as their needed? If any record is needed right away you could also do a JOIN or something…

    just a thought, nobody will probably read this… the internet gobbles up more of my words. 😀

Leave a Reply

Your email address will not be published. Required fields are marked *