Paul M. Jones

Don't listen to the crowd, they say "jump."

Solar 0.1.0 Released

Solar is a simple object library and application repository for PHP5. This is a development release made in the middle of the night due to insomnia. ;-)

Among other things, there is a second proof-of-concept application included; Solar_App_Bookmarks mimics and extends a lot of del.icio.us functionality. I'm already using it instead of del.icio.us, as I can assign arbitrary ranks to links and then get an RSS feed in that order, instead of descending by timestamp.

Finally, I've started an open wiki for documentation (now that I have two good working apps I think it's time to describe how to install and use them ;-).

The change notes are:

* WARNING: This is a public development release, and is not
  yet stable.

* WARNING: There are database schema changes in this release.

* Fixed bug in Solar::pathinfo(); elements were all off by 1
  (needed to shift off the first returned element, as it
  is always blank).

* In Solar_Sql_Entity::buildSchema(), the 'join' value must
  now be a $schema['rel'] keyword.

* Solar_Sql_Entity::update() now retains primary keys in a
  separate array (instead of unsetting them entirely) and
  restores them after the update process; this will help
  with post-update tasks that need the primary key.

* Solar_Sql_Entity::selectCount() and selectPages() have
  been combined into countPages(), which returns an array
  with both the row count and the page count.

* Solar_User_Auth_Htpasswd now rejects DES-encrypted
  passwords longer than 8 characters due to a limitation
  in crypt().  Have added support for SHA1 (thanks Tomas
  Cox) and APR1-MD5 (thanks Mike Wallner) encrypted
  passwords to offset this.

* Added Solar_User_Auth_Multi to perform authentication
  against multiple fallback sources.

* Solar_User_Role now uses a single role driver class by
  default.

* Added Solar_User_Role_Multi to pull from multiple
  group/role sources.

* Renamed Solar_Cell_Talk to Solar_Cell_Comments for clarity
  (this includes a set of schema changes as well as
  in-place support for trackback and pingback storage).

* Schema changes to Solar_Cell_Bugs.

* Added Solar_Uri as a URI import/manipulate/export tool,
  generally useful for building navigation links.  Works
  with query elements as well as path_info elements.

* Added Solar_Cell_Tags as a shared tag-search resource to
  support ubiquitous tagging across all Solar applications.

* Added Solar_Cell_Bookmarks and Solar_App_Bookmarks for
  bookmark management similar to the del.icio.us service.

* Various locale string additions and modifications.

Htpasswd and crypt() in Solar -- fixed!

(Well, sort of fixed.) This post originates from an issue I had with htpasswd files and crypt(); effectively, crypt() only looks at the first 8 characters in a password and validates if they match, regardless of the rest of the password. It turns out this is a known limitation of crypt(); it generated a fair amount of discussion on the pear-dev mailing list.

So while Solar_User_Auth_Htpasswd will still reject passwords longer than 8 characters as a security measure against the default DES crypt() limitation, I have been able to add support for SHA1 and APR1-MD5 encrypted passwords in htpasswd files. This will allow you to use much longer passwords. The new code comes courtesy of two PEAR developers: from a tip by Tomas V. V. Cox for SHA1, and from Mike Wallner's excellent crypt_apr_md5() method in File_Passwd. Thanks, guys!

(A side note: Apache htpasswd does not use a standard MD5 encryption routine, which is why just calling md5() from PHP was not a viable option.)


Password problems with crypt() and htpasswd files

In working with Solar today, I discovered an issue related to the crypt() function and password files generated by Apache htpasswd. Technically, it's not a security issue with either of those fine programs, because they do work as documented and intended. However, due to my own ignorance of the limitations of crypt(), I created a security issue of my own; perhaps this post will help others avoid it.

The Solar_User_Auth class is very much like the PEAR Auth class, in that it lets you pick different container or storage types for your username/password authentication. You can use a database table, LDAP, POP or IMAP email account, a .ini file, and more. Similar to LiveUser, Solar_User_Auth also comes with a "Multi" container that lets you specify multiple authentication sources, so you can fall back from one to another automatically.

My problem today was with the "Htpasswd" container for Solar_User_Auth. Apache comes with a utility called "htpasswd" which lets you create a file of usernames and encrypted passwords. The file format is pretty straightforward; each line consists of "username:cryptedpass". To create a htpasswd file and insert the first username/password combination, you would issue "htpasswd -c /where/you/want/htpasswd.data -a someuser"; it will create the htpasswd.data file and prompt you for the password for "someuser", encrypting the password with the system crypt() function.

Now here's the thing about crypt() ... effectively, it only looks at the first 8 characters of the password to generate the encrypted hash. (Yes, there are ways to make crypt() use a longer salt, but that's not pertinent to this particular discussion, as we are only concerned with the way Apache htpasswd uses the crypt() function.)

Thus, if you have a password *longer* than 8 characters, as long as the first 8 characters match, crypt() will call it a valid match. For example, if your password is "password" and is stored as a crypted hash in a htpasswd file, checking against "passwordX" will be exactly the same as checking against "password", "password123", and so on; that is, it will be returned as a valid check because the first 8 characters match properly. Similarly, if your password is "longpassword" and you check against only "longpass", that will be returned as valid, too.

Obviously this is a problem. The solution, at least for Solar_User_Auth_Htpasswd, is that from now on, password checks longer than 8 characters will be rejected automatically as invalid. This sidesteps the problem entirely, even though it does limit users who want to use the Htpasswd driver to passwords of 8 characters or less. The next release of Solar will have this patch in place, and it has already been committed to the Subversion repository for Solar.

Have I missed anything important here? Has anyone else out there run into anything like this? If so, what was your solution?

Update: (2005-04-14) Although I'm retaining the 8-character limit under default DES encryption, I've added SHA1 and APR1-MD5 support, which should help a great deal.


Ryan King on the small-s semantic web

Ryan King's Evolutionary Revolution has a lot of good info about making the web as we know it now a semantic datasource. He mentions, among other things, "microformats" (which are themselves a really useful tool that I'd seen but not understood).

Ryan's entry, along with the articles cited in his blogliography ;-), should give PHP developers good ideas on the kinds of things to support in their apps, possibly by making data from those apps automatically available in one of the various microformats. For example, there is the hCard format, a point-for-point XHTML representation of a text-based vCard. You can already parse vCards with Contact_Vcard_Parse, so rendering one from the generated array should be a relatively straightforward task. You could then take all your vCards, parse them from source text, and make a web-available resource from them.

Good post, Ryan; lots of food for thought.


Quick Hit: Savant3-3.0.0dev3 released

This is a bugfix release; I failed to close parentheses properly in the block for extracting references (not the default behavior). You can get it from here (Savant3 is a PHP5 E_STRICT-compliant version of the Savant template system; see the website at http://phpsavant.com.)



Database Portability (Part 4)

This is part 4 in a 3-part series (take that, Douglas Adams ;-). You can see parts 1 and 2 here and part 3 here.

A little background: the reason I am so concerned portability is the Solar project. One of the stated goals for Solar_Sql (the RDBMS abstraction layer) is that it should run exactly the same way on every supported database. Right now, that list includes Frontbase, Microsoft SQL, MySQL, PostgreSQL, and SQLite.

Case-Sensitivity in WHERE Clauses

Let's say we have a table with a column named "the_field" defined as CHAR(2). In the table, we have every possible combination of two letters, picked from the set of all upper-case and lower case letters (i.e., "A-Z" and "a-z", for a total of 26^2 rows in the table). For example, the "A" combinations are represented by four rows: "aa", "aA", "Aa", and "AA".

In theory, if you wanted to select the row with "aa" (where both are lower case), you would issue this SELECT statement:

SELECT * FROM the_table WHERE the_field = 'aa'

How many rows will you get back? You should only get one: the "aa" row. That's true if the database is PostgreSQL, Frontbase, or SQLite; however, in Microsoft SQL and MySQL, you get four ("aa", "aA", "Aa", and "AA"). That's because searches in those two RDBMSes are *not* case-sensitive on CHAR and VARCHAR fields. This fails to meet my requirement that Solar_Sql return the same results regardless of the backend.

I think the solution will be to re-define the native type used for "char" and "varchar" fields in those two databases. It turns out that in MySQL, a "CHAR(n) BINARY" and "VARCHAR(n) BINARY" both retain case-sensitivity in searches. In Microsoft SQL, the same appears to be true for "BINARY" (vice "CHAR") and "VARBINARY" (vice "VARCHAR").

Does anyone see a problem with using the binary types instead of the plain character types in those two databases, specifically for purposes of case-sensitivity? My appreciation will extend to buying you a beer next time we're in the same town. :-)


Solar 0.0.5 Released

Solar is a simple object library and application repository for PHP5. Solar stresses comprehensibility and simplicity; reusable code should be easy to understand, and that's what I strive for in Solar. It is written with worldwide application distribution in mind, so things like localization are built in from the start.

This release includes a new and very important component: the "Solar_App" class. It takes some superficial hints from Ruby on Rails (but is definitely not comparable to RoR in any meaningful sense). Essentially, Solar_App is an application controller class; when you extend it (e.g. to Solar_App_Bugs, the proof-of-concept bug tracker for Solar) it looks for subdirectories called "controllers", "models", "views", and "helpers". (So it looks like I have started to adopt "real" MVC methodology after years of shunning it.)

It's easy enough to define new controller actions: you put your controller scripts in "controllers", and for each script you can call an action. For example, if you have "edit.php" in controllers/, Solar_App will call it if the "action" URL variable is "edit" ("http://example.com/index.php?action=edit"). There's a little more to it than that, of course, but the source code is thoroughly commented and should provide a good full-up example for now (it is a proof-of-concept, after all).

One nice thing about distributing apps this way is that they never have to leave the Solar directory itself; you can just create a simple PHP file (5-6 lines) to start Solar, get the application object, and call its output() method; the entire application sits outside the web root. Pretty nifty; you can see an example in the Solar_App_Bugs package (included in the full Solar package).

The full change notes are:

* This is a public development release, and is not yet stable.

* Added Solar_Super class for retrieving superglobals, modified Solar.php to use it

* Added Solar::cookie() method for retrieving cookie values.

* Added Solar::session() method for retrieving session data.

* Added new Solar_App application controller class (with standard directory structure using an MVC architecture inspired by Ruby-on-Rails)

* Solar_User_Auth 'action*' keys and values renamed to 'op*' (Solar_App now owns the 'action' keyword)

* Added validation for 'new' and 'edit' forms in Solar_Cell_Bugs

* Added Solar_App_Bugs proof-of-concept application based on Solar_App

You can download this release, read the API docs, or join the mailing list -- just visit http://solarphp.com.



Database Portability (Part 3)

(This is part 3 of a 3-part essay; you can view parts 1 and 2 here.)

Commands and Services

The last major concept related to database portability in PHP is itself composed of two related pieces: the SQL command set presented by the RDBMS backend, and the additional services proved by the backend (sequences, stored procedures, views, triggers, and so on). They're pretty closely related in my opinion, because the services are addressed through the SQL command set for that RDBMS.

The LIMIT Clause

One very important command clause for SELECT statements, the LIMIT (or TOP) clause, can be radically different between databases. For example, Microsoft SQL has no support at all for the LIMIT clause, and its native TOP clause does not support offsets. Thus, a "SELECT * FROM table_name LIMIT 10,50" clause in MySQL (i.e., return 10 rows starting at the 50th row of the results) has no direct correlate in Microsoft SQL (the best you can do with its native SQL command set is "SELECT TOP 60 FROM table_name" and then, when looping through the results, skip the first 50, which sucks for large result sets).

The only way I have found to emulate LIMIT clauses in a reasonable way is the method employed by PEAR DB and MDB; I think AdoDB uses this method as well. Essentially, you pass a SELECT statement into a function and specify the count and offset as function parameters; the function then adds the proper LIMIT or TOP clause to the statement for you (with respect to the specified RDBMS backend). For example, instead of this ...

// $db is PEAR DB object
$result = $db->query("SELECT * FROM table_name LIMIT 10,50");

... you would do this:

// $db is PEAR DB object
$result = $db->limitQuery("SELECT * FROM table_name", 10, 50);

In Solar_Sql, I have streamlined this somewhat so that you can execute the command with bound parameters (in preparation for PDO) and LIMIT count/offset values; the same exec() method is used regardless of whether or not a LIMIT is to be added:

// $sql is a Solar_Sql object
$result = $sql->exec("SELECT * FROM table_name", $bind_params, 10, 50);

SQL Functions

Some seemingly standard SQL functions, such as SUBSTRING and CONCAT, are different from backend to backend. This means that you must either code your SQL statments *not* to use such functions, or come up with a way to abstract them out (either with a search-and-replace on your SQL command strings, or with PHP functions).

AdbDB uses the latter method; that is, it abstracts the SQL CONCAT() function, along with other functions, using a related PHP function call. For example, instead of this ...

$stmt = "SELECT CONCAT('field1', 'field2', 'field5')" .
" AS combo FROM table_name";

... you would use the concat() method to make sure the proper SQL for your backend RDBMS is generated:

$stmt = "SELECT " . $adodb->concat('field1', 'field2', 'field5') .
" AS combo FROM table_name";

While I am unhappy with this way of doing things, I don't see a less error-prone replacement. The CONCAT() function (and other SQL functions) can be used in almost any SQL command, whether SELECT, INSERT, UPDATE, etc. Attempting to do a search-and-replace for a function placeholder in the plain text of the statement means you need to test and see if the placeholder is literal text to be inserted or updated (as in the text of an example for using the command), or if the placeholder is in fact standing outside quotes and is intended as a function call. As such, using a method to insert the proper SQL seems the most reasonably foolproof way to abstract such SQL-level function calls.

Services

Besides the SQL command set and common SQL functions, we have additional "services" provided by the particular RDBMS that may not be available in other systems. For example, MySQL uses autoincrementing fields to generate unique sequential values for a column, while most other databases use "sequences" for such values (a "sequence" is a special service that increments values independently of the tables in the database).

Similarly, some RDBMSes provide access to stored procedures (these are essentially compiled sets of SQL logic that are invoked by name, much like user-defined functions in PHP) that can be called automatically by certain triggers (ON UPDATE, ON DELETE, and so on).

Indexes are handled differently in some backends. PostgreSQL in particular is touchy about index names; index names on different tables may not share the same name, so if you have table1.id column and a table2.id column, the table1 index named "id" will conflict with the table2 index named "id". As such, to make your application portable to PostgreSQL, you have to make sure that index names across all tables in the database are unique; Solar_Sql_Entity handles this by prefixing every index name with the table name and suffixing with "_idx".

Finally, not all database backends are transaction-aware. Transactions allow you to attempto make changes to the database, and if the changes only partly succeed, you can roll back to a known good state, thus preserving data integrity. Some MySQL table types do not support transactions at all. Some PHP transaction commitment functions will commit for all pending statements on the connection, which means that if two separate sets of SQL statments are being handled on the same connection, both will be committed to the database; this makes resource sharing problematic in some cases.

These services, and their related SQL commands, cannot be depended on to behave similarly (or even exist) across all database backends. As such, you need to reduce your expectations to the minimum that can possibly be supported by all RDBMSes. Stored procedures, when supported, are signigicantly different in their implementation between RDBMSes. Triggers, while often similar between backends, are different enough to warrant special treatment in a portability layer; in addition, triggers often depend on stored procedures, which are themselves not very portable.

As such, it appears the only service you can depend on in regular practice is sequences, whether native or emulated. Again, PEAR DB, MDB, and AdoDB have similar solutions. Instead of depending on MySQL's autoincrement as a native column option, a sequence (or separate sequence table) is generated; you then must acquire the sequence number separately from the SQL command. For example, the following "standard" MySQL command to insert an autoincremented value ...

// the 'id' column is autoincremented
$stmt = "INSERT INTO table_name (id) VALUES (NULL)";

... you would get a sequence value, then insert that value explicitly.

// $db is a PEAR DB object
$id_value = $db->nextSequence('sequence_name');
$stmt = "INSERT INTO table_name (id) VALUES ('$id_value')";

Regarding transactions, it may be best to turn them off entirely (i.e., set "autocommit" for each individual statement) so that shared connections do not commit statements being handled by other pieces of PHP code. (This is the mode I use in Solar_Sql.)

Conclusion

While it is relatively easy to create an API portability layer for PHP to access different databases, no RDBMS supports the exact same set of column types, SQL commands, and services. As such, database portability generally means that you have to limit your coding to a lowest common denominator of RDBMS power.

So why attempt database portability at all? For most in-house applications, it may not make sense, especially when speed and specific backend functions are of paramount importance. However, for many applications, especially applications intended for worldwide distribution into heterogenous environmments, coding for portability will reduce debugging problems and make it easy for end-users to install and maintain your application regardless of their particular database backend. While you will be sacrificing some level of power at the SQL level, you will be able to depend on the database results to be exactly the same, regardless of which RDBMS is actually in use, at the PHP level.