Database Portability (Parts 1 and 2)
Overview
Via John Lim of AdoDB fame, we find that Daniel Convissor, current steward of the PEAR DB package (his work has been both solid and spectactular there), has put up a great presentation about database portability. I'd like to follow up on his posting with my own opinions on this topic.
When we talk about database "portability" in PHP, we need to discuss three separate concepts.
- API portability
- Data type portability
- Command and service portability
I'll address points 1 and 2 in this posting, and point 3 in a later one.
A little teaser: what's going to really bake your noodle is what I think about data type portability. Skip ahead if you like.
API portability
PHP has a different application programming interface (API; i.e., the function calls with their parameter and return signatures) for each of its supported databases. The calls for MySQL are different from those for PostgreSQL, which are different from Microsoft SQL, and so on. This makes is difficult to switch a codebase from one RDBMS backend to another. For example, if you hard-code for MySQL and want to convert to PostgreSQL, you would need to convert all the mysql_*() calls to pgsql_*() calls, and then convert the parameters and check the returns differently.
The benefit of an API portability (or abstraction) layer is that you no longer use the native PHP calls for database interface. Instead, you use an abstracted set of methods or functions that work exactly the same way for every database backend. Then, if you need to change from MySQL to PostgreSQL, essentially all you need to do is change a config file value from "mysql" to "pgsql".
Until recently, API portability was handled almost exclusively by userland code such as AdoDB, PEAR DB, Metabase/MDB, ezSQL, and so on. But with the advent of PDO, database APIs are being unified centrally at the "internal" PHP level, so that aspect of the userland code will quickly become irrelevant.
Data Type Portability
However, the API only deals with the PHP code itself. Different database backends have their own, frequently incompatible, data types. You know: CHAR, VARCHAR, INT, DATE, TIME, TIMESTAMP, and so on. The problem is that a VARCHAR in one system may have a max of 255 characters, and in another have a max of 8000 chararacters. DECIMAL in one may mean the same thing as NUMBER in another. Especially pernicious are the date and time types; each backend has different storage requirements and min/max limits.
The vagaries and pitfalls of disparate data types show up in number of common operations. Some of them are ...
- When using CREATE: The data types and their parameters are not the same from system to system, so you can't just issue "CREATE TABLE table_name (id INT4)" and expect it to work everywhere.
- When using SELECT: Writing "SELECT * FROM table_name WHERE date_field = '20041224'" would make no sense to a system that stores dates with the dashes or in a non-ISO-standard format.
- When processing results: Again, the backend may not return dates and times in an expected format, which makes it hard to use or update that information.
The common solution for this is to build your SQL queries so that the vagaries of the particular backend are handled by a PHP method that adds the proper SQL for you. For example, using something like MDB, you would not write a SELECT in plain text; instead, you would do something like this:
// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
" WHERE date_field = " .
$mdb->getDateValue('2004-12-24');
... and MDB would convert the date into a format expected by the database backend. (Not sure about the specifics; I am sure Lukas will correct me post haste. ;-)
Frankly, that kind of thing drives me nuts.
And in turn, I am about to drive a lot of formal database gurus nuts themselves.
If you are a database professional you might not want to read any further, as my solution to the data type abstraction problem (from a database optimization perspective) is inefficient and non-standard and disrespectful and will make you want to claw your eyes out. But from a programmer's perspective, someone who just wants his app to work the same way everywhere, this is pretty simple.
Still with me? OK.
Instead of asking the database how it expects to store dates and times, we will force it to store them the way we want. My projects force the database store dates and times as character fields in ISO format. Thus, a date is always created as CHAR(10) and stored as "yyyy-mm-dd". A time is always created as CHAR(8) and stored as "hh:ii:ss". I do this in DB_Table and in Solar_Sql_Entity from the Solar project, with great success.
The beauty of this is that you don't need a method to convert to or from anything, ever. Writing a query on a date? Use ISO format in plain text; no need for a method to change from one format to another.
// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
" WHERE date_field = '2004-12-24'";
Processing results? The date always comes back the same way without any special considerations or intercessor methods; it's just a character string, same as any other, in standard ISO format for dates.
(I hear the database gurus screaming at me even now. Take a breath, then look at SQLite and how it does not even *have* strict data types; the typing is a hint, not much more, to SQLite. This is not a bad thing, in my eyes.)
Another advantage of this forcing of data types is that, if you convert from MySQL to PostgreSQL, your export and import SQL commands will be transparent: no need to convert from one database format to another for dates and times; they're just strings.
"But what if I want to use my database's date and time SQL functions in my queries?" Well, if you want to use SQL that is specific to your database, then you just lost portability, and this essay is about portability. So it's not a problem: just don't use those commands, because they are unlikely to work in any other backend.
This brings us to the last bit of portability: the SQL command set and the services provided by different RDBMS backends. Look for that in an upcoming entry.
Update: The conclusion of this essay is here.
Update (2005-06-13): Updated the link to Daniel Convissor's presentation.