Database Portability (Part 4)

This is part 4 in a 3-part series (take that, Douglas Adams ;-). You can see parts 1 and 2 here and part 3 here.

A little background: the reason I am so concerned portability is the Solar project. One of the stated goals for Solar_Sql (the RDBMS abstraction layer) is that it should run exactly the same way on every supported database. Right now, that list includes Frontbase, Microsoft SQL, MySQL, PostgreSQL, and SQLite.

Case-Sensitivity in WHERE Clauses

Let’s say we have a table with a column named “the_field” defined as CHAR(2). In the table, we have every possible combination of two letters, picked from the set of all upper-case and lower case letters (i.e., “A-Z” and “a-z”, for a total of 26^2 rows in the table). For example, the “A” combinations are represented by four rows: “aa”, “aA”, “Aa”, and “AA”.

In theory, if you wanted to select the row with “aa” (where both are lower case), you would issue this SELECT statement:

SELECT * FROM the_table WHERE the_field = ‘aa’

How many rows will you get back? You should only get one: the “aa” row. That’s true if the database is PostgreSQL, Frontbase, or SQLite; however, in Microsoft SQL and MySQL, you get four (“aa”, “aA”, “Aa”, and “AA”). That’s because searches in those two RDBMSes are *not* case-sensitive on CHAR and VARCHAR fields. This fails to meet my requirement that Solar_Sql return the same results regardless of the backend.

I think the solution will be to re-define the native type used for “char” and “varchar” fields in those two databases. It turns out that in MySQL, a “CHAR(n) BINARY” and “VARCHAR(n) BINARY” both retain case-sensitivity in searches. In Microsoft SQL, the same appears to be true for “BINARY” (vice “CHAR”) and “VARBINARY” (vice “VARCHAR”).

Does anyone see a problem with using the binary types instead of the plain character types in those two databases, specifically for purposes of case-sensitivity? My appreciation will extend to buying you a beer next time we’re in the same town. 🙂

Are you stuck with a legacy PHP application? You should buy my book because it gives you a step-by-step guide to improving your codebase, all while keeping it running the whole time.

4 thoughts on “Database Portability (Part 4)

  1. Can you do fulltext indexing on binary varchar & char columns? (I’ve never tried, but I imagine things like this will break.)

    Of course … in general I think there is a flaw with your approach 😉 In my experience, people use databases because they want to take advantage of the features of the databases, and because they want to harness powerful reporting functionality in those databases. When you force the use of text-type columns instead of temporal columns and force the use of binary columns instead of text columns, you are severely limiting the database features that your application can take advantage of. … And it seems to me that you’re severely limiting the reasons to use a full-fledged RDBMS at all. Why not simply use SQLite and not worry about cross-database issues at all?

    I certainly understand that this type of abstraction is going to imply lowest-common-denominator features, but I think your rigid approach is making that lowest-common-denominator so low that that it almost defeats the point of storing data in a database. Furthermore, I think the harder you try to enforce true 100% abstraction, the more you’ll realize that in its best case it’s extremely impractical & at worst it’s probably simply impossible.

    But good luck! 🙂

    Cheers,
    Hans

  2. It is a configuration setting on Microsoft SQL Server.

    Noticed many years ago that Microsoft SQL Server Enterprise manager ran
    SELECT ‘A’ = ‘a’
    when it just connected to a server, to determine how comparisons being performed.

    I would prefer to see a SQL rewriter style application/tool.
    Given a bunch of SQL written for a specific database, say PostgreSQL, it could rewrite it for Sqlite.

    Then it’d be simple for an application to treat SQL using the same methods as supporting multiple natural languages.
    (Plus it’d be far more efficient than having SQL dynamically adjusted to fit every time its ran)

  3. It is a configuration setting on Microsoft SQL Server.

    Noticed many years ago that Microsoft SQL Server Enterprise manager ran
    SELECT ‘A’ = ‘a’
    when it just connected to a server, to determine how comparisons being performed.

    I would prefer to see a SQL rewriter style application/tool.
    Given a bunch of SQL written for a specific database, say PostgreSQL, it could rewrite it for Sqlite.

    Then it’d be simple for an application to treat SQL using the same methods as supporting multiple natural languages.
    (Plus it’d be far more efficient than having SQL dynamically adjusted to fit every time its ran)

  4. Hi, Hans —

    It’s definitely a lowest common denominator, but that low point is (I think) pretty good for most general web apps, especially the kinds I write and try to distribute. Just think how much was done with MySQL 3.x; that’s barely a step above SQLite (and in some cases a step below).

    Yes, you can index binary & varbinary, at least up to 255 chars and perhaps more, but 255 is the common limit for those anyway.

    When taking advantage of the specific features of a specific database, you lose portability. You have to write th SQL a special way for that system, and depend on the functions of that system, and so on, and then it becomes a big hairball if you want to distribute it for systems other than the original database (and often with a very specific configuration). The point of this project is to push portability as far as it can go, and still do useful stuff. Useful to me, anyway. 😉

    At the same time, few people to set up a new database server just for a specific app. I think it’s better if they can use what they have already, with the expertise they already possess. I know that if I see an app written specifically for (say) PostgreSQL, I won’t use it, becuase I don’t know PG and don’t have time to set it up and learn it. And maintain the new system, and patch and troubleshoot it. Better if it’s MySQL, for me, because I already have that.

    In the end these kinds of portability strictures may be impractical, but it seems to have done well with YaWiki and other more minor projects, so I think it will be a useful thing for some folks. As with everything I do, the project is mostly for me, and if others think it’s good then I’m happy to have the company. 🙂

    Hope all is well with Propel/Creole and your other projects, and thanks for taking the time to comment.

Leave a Reply

Your email address will not be published. Required fields are marked *