Paul M. Jones

Database Portability (Parts 1 and 2)

Overview

Via John Lim of AdoDB fame, we find that Daniel Convissor, current steward of the PEAR DB package (his work has been both solid and spectactular there), has put up a great presentation about database portability. I'd like to follow up on his posting with my own opinions on this topic.

When we talk about database "portability" in PHP, we need to discuss three separate concepts.

API portability
Data type portability
Command and service portability

I'll address points 1 and 2 in this posting, and point 3 in a later one.

A little teaser: what's going to really bake your noodle is what I think about data type portability. Skip ahead if you like.

API portability

PHP has a different application programming interface (API; i.e., the function calls with their parameter and return signatures) for each of its supported databases. The calls for MySQL are different from those for PostgreSQL, which are different from Microsoft SQL, and so on. This makes is difficult to switch a codebase from one RDBMS backend to another. For example, if you hard-code for MySQL and want to convert to PostgreSQL, you would need to convert all the mysql_*() calls to pgsql_*() calls, and then convert the parameters and check the returns differently.

The benefit of an API portability (or abstraction) layer is that you no longer use the native PHP calls for database interface. Instead, you use an abstracted set of methods or functions that work exactly the same way for every database backend. Then, if you need to change from MySQL to PostgreSQL, essentially all you need to do is change a config file value from "mysql" to "pgsql".

Until recently, API portability was handled almost exclusively by userland code such as AdoDB, PEAR DB, Metabase/MDB, ezSQL, and so on. But with the advent of PDO, database APIs are being unified centrally at the "internal" PHP level, so that aspect of the userland code will quickly become irrelevant.

Data Type Portability

However, the API only deals with the PHP code itself. Different database backends have their own, frequently incompatible, data types. You know: CHAR, VARCHAR, INT, DATE, TIME, TIMESTAMP, and so on. The problem is that a VARCHAR in one system may have a max of 255 characters, and in another have a max of 8000 chararacters. DECIMAL in one may mean the same thing as NUMBER in another. Especially pernicious are the date and time types; each backend has different storage requirements and min/max limits.

The vagaries and pitfalls of disparate data types show up in number of common operations. Some of them are ...

When using CREATE: The data types and their parameters are not the same from system to system, so you can't just issue "CREATE TABLE table_name (id INT4)" and expect it to work everywhere.
When using SELECT: Writing "SELECT * FROM table_name WHERE date_field = '20041224'" would make no sense to a system that stores dates with the dashes or in a non-ISO-standard format.
When processing results: Again, the backend may not return dates and times in an expected format, which makes it hard to use or update that information.

The common solution for this is to build your SQL queries so that the vagaries of the particular backend are handled by a PHP method that adds the proper SQL for you. For example, using something like MDB, you would not write a SELECT in plain text; instead, you would do something like this:


// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
	" WHERE date_field = " .
    $mdb->getDateValue('2004-12-24');

... and MDB would convert the date into a format expected by the database backend. (Not sure about the specifics; I am sure Lukas will correct me post haste. ;-)

Frankly, that kind of thing drives me nuts.

And in turn, I am about to drive a lot of formal database gurus nuts themselves.

If you are a database professional you might not want to read any further, as my solution to the data type abstraction problem (from a database optimization perspective) is inefficient and non-standard and disrespectful and will make you want to claw your eyes out. But from a programmer's perspective, someone who just wants his app to work the same way everywhere, this is pretty simple.

Still with me? OK.

Instead of asking the database how it expects to store dates and times, we will force it to store them the way we want. My projects force the database store dates and times as character fields in ISO format. Thus, a date is always created as CHAR(10) and stored as "yyyy-mm-dd". A time is always created as CHAR(8) and stored as "hh:ii:ss". I do this in DB_Table and in Solar_Sql_Entity from the Solar project, with great success.

The beauty of this is that you don't need a method to convert to or from anything, ever. Writing a query on a date? Use ISO format in plain text; no need for a method to change from one format to another.


// sorry about the backslashed quotes, that's WordPress
$statement = "SELECT * FROM table_name" .
    " WHERE date_field = '2004-12-24'";

Processing results? The date always comes back the same way without any special considerations or intercessor methods; it's just a character string, same as any other, in standard ISO format for dates.

(I hear the database gurus screaming at me even now. Take a breath, then look at SQLite and how it does not even *have* strict data types; the typing is a hint, not much more, to SQLite. This is not a bad thing, in my eyes.)

Another advantage of this forcing of data types is that, if you convert from MySQL to PostgreSQL, your export and import SQL commands will be transparent: no need to convert from one database format to another for dates and times; they're just strings.

"But what if I want to use my database's date and time SQL functions in my queries?" Well, if you want to use SQL that is specific to your database, then you just lost portability, and this essay is about portability. So it's not a problem: just don't use those commands, because they are unlikely to work in any other backend.

This brings us to the last bit of portability: the SQL command set and the services provided by different RDBMS backends. Look for that in an upcoming entry.

Update: The conclusion of this essay is here.

Update (2005-06-13): Updated the link to Daniel Convissor's presentation.

Buy Stuff and Save the World

Via Ryan King, we have a link to David Brin and how global capitalism is a force for world peace. To which I say:

Yay captalism! Yay globalism! Yay for the bourgeious elite! Without the rich (for whom sellers compete by lowering prices and increasing quality), goods and services cannot become cheap enough for everyone else; only the wealthy (in a relative sense) can subsidize that development. And with the increase in availability of goods and services and common expectations, what follows is "peace through shared culture."

Incidentally, the best way to help the survivors of natural disasters such as the Christmas Tsunami in the long run is to buy their goods and services, not provide simple aid. In the short term, aid is necessary, but it always dries up. Profitable business is self-supporting.

UPDATE:

Here's a quote from the Brin essay:

We are told that the world is a devastatingly sad and oppressive place, filled with poverty and hopelessness. And, indeed, the raw numbers of people who suffer malnourishment and/or oppression, can only tear at the heart. Ideally, this awareness will spread and make us feel determined to do better.

But the irony is that only people who are relatively satiated can indulge in empathy and feel the pain of others, for whom satiation is but a dream.

In fact, the percentage of human beings who live in some degree of comfort and safety, with secure hope that their increasingly educated children will do better, has been rising spectacularly for two generations. And the principal driver of this change has been the U.S. consumer, purchasing the output of tens of thousands of foreign factories, wherein the same pattern gets repeated from one country to the next.

By historical (or even current third-world) standards, even the poorest of Americans is screamingly wealthy. This is what I mean by "relative" wealth earlier in this post; that you have a roof over your head, comfortable and protective clothing, indoor plumbing with hot and cold running water, easy access to food, and so on -- this places you in the top one percent of all humanity that has ever lived, up with the emperors of Rome, as far as luxury living is concerned. The natural state of Man is naked, cold, hungry, destitute, living in caves, being hunted by tigers and other men -- anything better than that is a good thing.

UPDATE 2:

Brin is responding to Buffett. Cafe Hayek responds as well:

Buffett no doubt believes that dollars held by foreigners represent debt owed by Americans to foreigners. Only in the most irrelevant and formal sense is this belief valid. Each U.S. dollar is a Federal Reserve Note, issued by a U.S. government institution (the Fed) and formally redeemable by that U.S. institution. But redeemable for what? Answer: another Federal Reserve Note of the same value.

In fact, dollars held by foreigners are not debt in any economically relevant sense.

"But can't foreigners spend these dollars on U.S. goods, services, or assets?!" I hear someone (Buffet, perhaps) asking. Of course dollars can be spent in this way; that's the only reason foreigners accept dollars in exchange for the things they sell. But this fact doesn't make dollars debt.

PDO at wiki.cc

You can see the PDO Basics entry at wiki.cc here. PDO unifies the various PHP RDBMS function calls; database API abstraction in PHP is a Good Thing.

But PDO brings something I did not expect: table instrospection. You can get meta-information about the statement results that tells you what the native type, size, precision, etc. are for the returned columns. This can make automated type-checking and conversion super-easy. Just one more reason I was lucky to think of making Solar PDO-ready.

Hey, PDO guys: neat stuff, and thanks!

Quick Hits: DB_Table, Text_Wiki

I just released the first 1.0.0 stable version of DB_Table, and released RC2 of Text_Wiki earlier today. That's another two things I don't need to think about any more. :-)

Now that I think about it, it's been a good couple of days for getting software out the door. I guess this is one way to spend my spring break. ;-)

Savant3: Second Development Release

The second development revision of the Savant template system version 3 is now available; you can get it from the usual location here. The change notes are:

* second development release

* error codes are now strings, not integer constants (e.g., 'ERR_UNKNOWN' instead of -1).

* added 'freeze' property to Savant3_Plugin_form

* added auto() and fullauto() methods to Savant3_Plugin_form

* changed element description arrays in Savant3_Plugin_form

* all plugins using htmlspecialchars() to filter output now use Savant3::scrub plugin instead

* added error resource for Solar errors

* updated comments throughout

As always, if you find any problems, please let me know via the Savant mailing list.

Please note that Savant2 is still being actively maintained for PHP4 and non-strict PHP5. In fact, I want to backport the Savant3 version of the 'form' plugin to Savant2; that plugin is just too good to be limited to PHP5.

Solar 0.0.4 released

I released a new revision of Solar today; we're at 0.0.4 now. (UPDATE (2004-03-07): I failed to put the tarball on the Solar website earlier; it is now in place. Thanks to Jens for pointing it out.)

Solar is still in development and as such is unstable. However, I am continuing work on the first proof-of-concept application, a simple bug tracker, which combines the generic commenting application component (called a "Cell" in Solar parlance, in this case "Solar_Cell_Talk") with the specific data maintained by the Solar_Cell_Bugs component. All it lacks is edit restrictions based on user name and user role.

The change notes for this revision are:

* This is a public development release, and is not yet stable.
* Added Solar_Form class (Solar/Form.php) for form element collation and
  input validation (not for output or rendering)
* Added Solar_Locale class (Solar/Locale.php)
* Solar.php:
  * Modified locale() method signature
  * Added public static $locale object
  * Removed Solar::setLocale()
  * Properties $config and $shared are now public
* Solar_Base: modified to use new locale structure
* Solar_Sql:
  * dropped support for Oracle (30-char limit on names is just too strict)
  * max table name is now 30 chars, max col/idx name is now 27 chars
  * double underscores not allowed in table, index, or column names
    (to prevent name collisions)
  * indexes now suffixed with '__idx', sequences with '__seq'
  * in index names, the table/index separator is now '__'
  * Converted fetch(), fetchNum(), and free() methods to static (now
    there's no need to pass a reference to the driver into the result class)
* Solar_Sql_Result: modified to make static calls to Solar_Sql for free(),
  fetch(), and fetchNum()
* Various modifications to the Solar_Cell classes for Bugs and Talk to go
  with Solar_Sql changes
* Various schema changes to the Solar_Cell classes for Bugs and Talk

Solar is a "simple object library and application repository" for PHP5. It is designed for developers who wnat to publish their code worldwide in heterogenous environments.

For example, the database drivers are highly abstracted and are intended to work identically on all supported RDBMS backends (including date and time representations). In addition, localization is built in from the start, so it is a simple matter to create translated strings and formats for your applications.

Above all, I am striving to make the Solar codebase comprehensible and intelligble for all adoptive developers, whether new or experienced, with as little "tricky stuff" as possible and verbose commenting throughout.

Updates to Savant Site

While there is no new documentation, I have converted the backend to its own database so I can use YaWiki to manage the existing Savant2 documentation in parallel with the new Savant3 documentation. (The Savant3 pages are lamentably lacking, with only a home page, but that should change as I move it closer to a stable release.)

Savant is a lightweight, object-oriented template system for PHP. I call it the simple, elegant, and powerful alternative to Smarty. Savant2 was written for PHP4; while it works with PHP5 out of the box, it is not E_STRICT compliant. Savant3 *is* E_STRICT compliant, sports a nicer plugin system using __call() magic, and comes with a modified set of default plugins.

Should Solar_Sql_Entity Support Oracle?

Solar is a simple object library and application repository for PHP5. It is currently in development. This post refers to the object-relation management tool, Solar_Sql_Entity, which follows closely the table data gateway pattern described by Martin Fowler.

Oracle only allows 30 characters for an object name (table, field, index, sequence). While that by itself may be OK, it seriously restricts the operation of Solar_Sql_Entity. (This class is the natural successor to DB_Table.)

One of the points of Solar_Sql_Entity is to make it possible to create a table that will work on *all* of the supported RDBMS backends. This means that it has to submit to a lowest common denominator in some ways.

For example, in PostgreSQL, your index names must be unique across all tables in the database. For example, if you have a "table1" and "table2", each with an "id" field, it's not enough to specify an index called "id" on table1 and annother index also named "id" on table2; PgSQL sees that as a conflict. You have to make the index names unique even though they refer to different tables.

Thus, to support PgSQL, the index names would have to be "table1_id" and "table2_id". This is why DB_Table and Solar_Sql_Entity prefix the table name automatically to your index name (i.e., to make sure there are no index name conflicts if the schema is created in PgSQL). As an additional identifier, the index name gets "_idx" suffixed to it.

You can see where this is going: because of the 30-char limit in Oracle, this means you're pretty limited in how long your table and field names can be when using Solar_Sql_Entity. Right now, Solar_Sql_Entity limits you to 15 characters for table names and 10 characters for field names, all to make sure we stay under 30 chars for Oracle. This is pretty short, almost too short to be universally useful.

If Solar_Sql_Entity were not to support Oracle, the next highest character limit is 64 (MySQL and MS-SQL). That effectively doubles the available length (~30 chars for table, ~25 chars for field), but at the cost of dropping Oracle support within Solar_Sql_Entity.

Frankly, I'm not sure Oracle is in the "target market" for Solar anyway; the point of Solar is to be provide a toolset to help build apps-for-distribution (although it's fine for in-place apps, too).

Anybody have any feelings about (or experiences with) this kind of issue, one way or another?

28 Feb 1991: Desert Storm Ceasefire

On this day in 1991, after 100 hours of land battle following roughly 5 weeks of air operations, the United States declared a ceasefire in its military campaign to drive Saddam Hussein's invading forces out of Kuwait. That was a good day, but looking back now it was the first major battle of World War IV, and the conclusion to that battle has been accomplished only recently with the sovreignty of a free Iraq. Let's hope it sticks.

Simplifying Automated Form Generation, Validation, and Output

First Praise, Then Critique

HTML_QuickForm is the standard class within PEAR for building forms programmatically. Here's an example:


$form =& new HTML_QuickForm();
$type = 'text';
$name = 'city';
$label = 'Your City:';
$form->addElement($type, $name, $label);

(All XHTML element types are supported, including some pseudo-types such as 'date' and 'time').

When you call $form->display(), the class builds all the HTML for the form and the elements you have added. That's just the start, though. If you have a POST variable called 'city', it will populate the form with that value for you. You can add validation methods to each element, and the class will check the values and print error messages inside the form if the values are not valid. You can apply special processing to the values before retrieving them as an array with $form->export().

It's quite a neat package, but I have some quibbles with its operation. These are not design flaws; indeed, its design seems quite internally consistent. For example, each element is represented internally as an object itself, with its own methods and properties (usually descended from a base element class). For example, to set the value of an individual element, you have to call its setValue() method -- but any submitted values (e.g., in POST) will override that value. There are ways around this, but none of them seem simple or straightforward.

More to the point, while I am not a strict model-view-controller devotee, I find that HTML_QuickForm not only pierces the veil between model and view (or controller and view, depending on your definition), it positively shreds that veil to pieces. Yes, it has renderers that allow you to parse through the internal objectified representatiion of the form elements, but they system seems overly complex.

While I like HTML_QuickForm, and have written code (in DB_Table) that automatically generates forms from tables using it, I don't fully comprehend the methodologies underlying its operation. It seems to me there should be a simpler and more straightforward way of automating form generation, one that is both more comprehensible and more open to MVC separate while retaining the obvious strengths of validation and automated population of values.

Savant and Solar

Note: Savant3 and Solar are publicly available but are still in development. I present the remainder of this article as advance information on how they will interoperate, and while I am using the described methodologies on a continuing basis, the specifics may change in the future.

With the above points in mind, I developed a plugin for Savant2 that accepts an array of element definitions and generates forms from it (called Savant2_Plugin_form for obvious reasons). In Savant3, to be released, I have refined and extended that original idea to be a bit more full-featured (and I intend to backport it to Savant2, so don't worry about missing out). The Savant plugins cover the 'view' half of form generation. For the 'model' (or the 'controller' depending on how you think about these things) portion, I have implemented a complementary system in Solar_Form to collect and manage these element arrays.

The key here is that the form generation happens not in one location, as in HTML_QuickForm, but in two: Solar_Form defines the form elements, and Savant renders it. One benefit of this separation of duties is that neither class cares what the other class is. If you like, you can send a hand-built array of elements to Savant, and it will build the form just the same. Similarly, you can pass a Solar_Form element array to any template system, and if your template system knows how to deal with that array, it can render a form too. The point is that the information about the form is transmitted in a standard format as an array.

Therefore, the format of the element array is of paramount importance. Lucky for us, it is quite simple. For each element in the form, you have an array that looks like this:


$element = array(

    // the 'name' attribute for the XHTML form element
    'name'     => null,

    // the 'type' attribute
    'type'     => null,

    // a text label for the element
    'label'    => null,

    // the 'value' attribute of the element
    'value'    => null,

    // whether or not the element is required
    'require'  => false,

    // whether or not the element is disabled
    // (i.e., 'read-only')
    'disable'  => false,

    // any options related to the element values,
    // typically for select options, radios, or
    // checkbox values
    'options'  => array(),

    // additional attributes for the XHTML form element
    'attribs'  => array()
);

From our earlier example above, you would create an element under this system simply by specifying the array keys for it:


$element = array(
    'name' => 'city',
    'type' => 'text',
    'label' => 'Your City:',
    'value' => 'Memphis'
);

The Savant3 template system can take this array and render a form within a template script like this.


// assume that $element has been assigned as 'elem'
$this->form('start');
$this->form('auto', $this->elem);
$this->form('end');

Note the use of the 'auto' call, which automatically builds a single form element from an array; if you use the 'fullauto' call, it will build multiple elements from a sequential array of elements, in order.

The above discussion only describes the manual generation of form elements (the display is automated via Savant3). To get more of the functionality of HTML_QuickForm, we would need to use Solar_Form.

Solar_Form collects an array of elements together in a class property; you can add them one-by-one like this:


$form = Solar::object('Solar_Form');
$element = array(
    'type' => 'text',
    'label' => 'Your City:',
    'value' => 'Memphis'
);

// note that the 'name' key is optional when
// using setElement(); Solar_Form will use the
// first parameter as the 'name' value, overriding
// the 'name' key.  this has benefits when
// building arrays of elements.
$form->setElement('city', $element);

To populate the form elements with their related POST values, simply call the populate() method.


$form->populate();

You can then pass that form to Savant3 ...


$Savant3->form_elements = $form->elements;

... and render the form in a template script like so:


$this->form('start');
$this->form('fullauto', $this->form_elements);
$this->form('end');

Finally, you can add automated validation to the Solar_Form elements with the addValidate() method. Below, the $method relates to a standard Solar_Valid method; in addition to 'alpha', there are 'alphanumeric', 'numeric', 'isoDate', 'email', and other validation methods, as well as 'regex' (for user-defined regular expressions), 'custom' (for call_user_func callbacks), and 'multiple' (for arrays of validation routines).


$form = Solar::object('Solar_Form');

$element = array(
    'type' => 'text',
    'label' => 'Your City:',
    'value' => 'Memphis'
);

$form->setElement('city', $element);

$name = 'city';
$method = 'alpha';
$feedback = 'Please use only letters in the city name';
$form->addValidate($name, $method, $feedback);

Then you can call the validate() method, which will return true or false, and will automatically set the feedback messages for elements that failed validation.


if ($form->validate()) {
    $values = $form->values();
    // insert values into a table
}

$Savant3->form_elements = $form->elements;

Conclusion

While I shamelessly promote Solar and Savant throughout this entry, my main goal is to point out that MVC separation of form building, validation, automated population, and output display is possible with a unified format describing the form elements. It is not necessary to wrap all those functions into one class; indeed, separating the concerns seems vital to extended development cycles when differing framework systems must interoperate. Having it all in one class is great to start with, but after a while you find yourself coding to the class and trying to learn its intricacies, instead of using it for your own purposes.