Paul M. Jones | Atlas.Orm "Cassini" (v3) Early-Access Alpha Release

Atlas.Orm "Cassini" (v3) Early-Access Alpha Release

For those of you who don't know, Atlas is an ORM for your persistence model, not your domain model. Atlas 1 "Albers" (for PHP 5.6) was released in April 2017. Atlas 2 "Boggs" (for PHP 7.1) came out in October 2017.

And now, in April 2018, we have an early-access release of Atlas 3 "Cassini", the product of several lessons from a couple of years of use.

Architecture and Composition

One big architectural change is that Atlas 3 no longer uses the older Aura packages for SQL connections and queries; instead, Atlas has adopted these packages as its own, and upgraded them for PHP 7.1 using strict typing and nullable returns. Another big architectural change is that the table data gateway and data mapper implementations are now available as packages in their own right.

The end result is a mini-framework built from a stack of packages, where any "lower" package can be used indepdendently of the the ones "above" it. The package hierarchy, from bottom to top, looks like this:

Atlas.Pdo: Descended from Aura.Sql, this provides a database Connection object and a ConnectionLocator. If all you need is convenience wrapper around PDO with fetch and yield methods, this is the package for you.
Atlas.Query: Descended from Aura.SqlQuery, this is a query builder that wraps an Atlas.Pdo Connection. If you just want to build and perform SQL queries using an object-oriented approach, the Atlas query objects can handle that.
Atlas.Table: Extracted from Atlas 2, this is a table data gateway implementation that uses Atlas.Query under the hood. If you don't need a full data mapper system and only want to interact with individual tables and their row objects, this will do the trick.
Atlas.Mapper: Also extracted from Atlas 2, this is a data mapper implementation that models the relationships between tables. It allows you to build Record and RecordSet objects whose Row objects map naturally back to Table objects; you can write them back to the database one by one, or persist an entire Record graph back to the database in one go.
Atlas.Orm: Finally, at the top of the hierarchy, the overarching ORM package provides a convenience wrapper around the Mapper system, and provides several strategies for transaction management.

There are also two "side" packages:

Atlas.Info: Descended from Aura.SqlSchema, this inspects the database to get information about tables, columns, and sequences.
Atlas.Cli: This command-line interface package uses Atlas.Info to examine the database, then create skeleton classes for your persistence model from the database tables.

Separation, Completion, and Hierarchy

One goal for Atlas 3 was to split up the Table and Mapper subsystems into their own packages, so they could be used in place of the full transactional ORM if needed. Along with that, I wanted better IDE autocompletion, especially at the query-building level, particularly in support of different SQL dialects (e.g., PostgreSQL has a RETURNING clause, but nothing else does).

The first idea long these lines was to have parallel packages, one for each SQL driver: Atlas.Query-Mysql, Atlas.Query-Pgsql, Atlas.Query-Sqlite, etc. However, I shortly realized that typehinting at higher levels would have been a problem. If a generic Table class returns a TableSelect that extends a generic Select object, then providing a driver-speific MysqlSelect had to return a MysqlTableSelect for a MysqlTable. That in turn would have meant parallel packages for each driver all the way up the hierarchy: Table, Mapper, and Orm. Generics at the language level might have solved this, but PHP doesn't have them, so that idea was out.

Then the idea was to have a single "master" ORM package with all the subsytems included in it as virtual packages, and template methods where driver-specific implementations could be filled in. However, that ended up being an all-or-nothing approach, where the "lower" level packages could not be used independently of the "higher" level ones.

I could think of only one other alternative that would enable IDE autocompletion for driver-specific functionality, and keep the packages well-separated. That was to make Atlas.Query itself more general-purpose. As a result, the returning() method is available, even if the particular backend database will not recognize it. I'm not especially happy about this, since I'd rather the classes expose only driver-specific functionality, but the tradeoff is that you get full IDE completion. (I rationalize this by saying that the query objects are not there to prevent you from writing incorrect SQL, just to let you write SQL with object methods; you could just as well send a RETURNING clause to MySQL by putting it in a query string. Again, generics at the PHP level would help here.)

Further, on the query objects, I found myself wanting to perform the query represented by the object directly from that object, rather than manually passing it through PDO each time. As such, the query objects now take a PDO instance (which gets decorated by an Atlas.Pdo Connection automatically) so you can do things like this:

$result = Select::new($pdo)
    ->columns('foo', 'bar')
    ->from('table_name')
    ->where('baz = ', $baz)
    ->fetchAll();

Gateways, Mappers, and Relationships

With that in place, the next step was to extract the table data gateway subsystem to its own separate package. The new Atlas.Table library is not too different from the Atlas 2 version. The biggest single change is that the identity map functionality has been moved "up" one layer to the Mapper system. This keeps the package more in line with expectations for table data gateway implementations.

That, in turn, led to extracting the data mapper subsystem to its own package as well. Atlas.Mapper is now in charge of identity mapping the Rows that serve as the core for each Record, but the transaction management work has been moved up one layer to the overarching ORM package.

Of particular note, the new Atlas.Mapper package does away with the idea of a
"many-to-many" relationship as a first-class element. It turned out that managing many-to-many relateds was both counterintuitive and counterproductive in subtle but significant ways, not least of which was having to keep track of new or deleted records in two places (the "through" association and the "foreign" far side of the association). Under the hood, Atlas 2 had to load up the assocation mapping records anyway, so forcing an explicit with() call when fetching a many-to-many relationship through the association mapping seems like a reasonable approach. With that, you only need to manage the association mapping to add or remove the "foreign" records on the far side of the relationship.

Also in the relationship department, the "ManyToOneByReference" relationship has been renamed to "ManyToOneVariant". (I think that name flows better.)

Finally, 1:1 and 1:M relationships now support different kinds of cascading delete functionality. These methods on the relationship definition will have these effects on the foreign record when the native record is deleted:

onDeleteSetNull(): Sets $foreignRecord keys to null.
onDeleteSetDelete(): Calls $foreignRecord::setDelete() to mark it for deletion.
onDeleteCascade(): Calls $foreignMapper::delete($foreignRecord) to delete it right away.
onDeleteInitDeleted(): Presumes that the database has deleted the foreign record via trigger, and re-initializes the foreign record row to DELETED.

Further, the 1:M relationship will detach deleted records from foreign RecordSets, and the 1:1 relationship will set the foreign record value to false when deleted. This helps with managing the in-memory objects, so you don't have to detach or unset deleted records yourself.

ORM Transaction Management

At the very top of the framework hierarchy, then, we have the Atlas.Orm package. Almost all functionality has been extracted to the underlying packages; the only parts remaining are the overarching Atlas object with its convenience methods, and the transaction management system.

Whereas Atlas 2 provided something similar to a Unit of Work implementation, I have come around to the idea that Unit of Work is a domain-layer pattern, not a persistence-layer pattern. It's about tracking changes on domain objects and writing them back to the database approriately: "A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work." (cf. POEAA: UnitOf Work).

With that in mind, Atlas 3 provides more typical transaction management strategies, though with some automation if you feel you need it:

The default AutoCommit strategy starts in "autocommit" mode, which means that each interaction with the database is its own micro-transaction (cf. https://secure.php.net/manual/en/pdo.transactions.php). You can, of course, manually begin/commit/rollback transactions as you wish.
The AutoTransact strategy will automatically begin a transaction when you perform a write operation, then automatically commit that operation, or roll it back on exception. (This was the default for Atlas 2.)
The BeginOnWrite strategy will automatically begin a transaction when you perform a write operation. It will not commit or roll back; you will need to do so yourself. Once you do, the next time you perform a write operation, Atlas will begin another transaction.
Finally, the BeginOnRead strategy will automatically begin a transaction when you perform a write operation or a read operation. It will not commit or roll back; you will need to do so yourself. Once you do, the next time you perform a write or read operation, Atlas will begin another transaction.

Style and Approach

Even though I was the lead on PSR-1 and PSR-2, I find myself beginning to drift from them somewhat as I use PHP 7 more. In particular, I find things line up more nicely when you put the opening brace on the next line, for a method with a typed return and an argument list that spreads across multiple lines. That is, instead of this ...

    public function foo(
        LongClassName $foo,
        LongerClassName $bar
    ) : ?VeryLongClassName {
        // ...
    }

... I'm leaning toward this:

    public function foo(
        LongClassName $foo,
        LongerClassName $bar
    ) : ?VeryLongClassName
    {
        // ...
    }

The extra bit of whitespace in this situation gives a nice visual separation.

I'm also beginning to drift a little from some of my devotion to interfaces. One of the problems with exposing interfaces is that strict backwards compatibility becomes harder to maintain without also bumping major versions; if you find you need even a single new feature, you can't add it to the interface without breaking BC. As such, to maintain both Semantic Versioning and to avoid adding off-interface methods in sub-major releases, Atlas 3 avoids interfaces entirely in favor of abstract class implementations. Further, these abstract classes do not use the "Abstract" prefix, as I have done in the past. This means Atlas can typehint to Abstract classes without it looking ugly, and can avoid technical BC breaks when interfaces are added to. This has its own problems too; it is not a solution so much as a tradeoff.

I'm also trying to avoid docblocks as much as possible in favor of typehinting everthing possible.

Upgrading from Atlas 2 to Atlas 3

Whereas the upgrade path from Atlas 1 to Atlas 2 was relatively straightforward, the jump from Atlas 2 to Atlas 3 is a much bigger one. The most significant changes are in the method names and signatures at the Connection and Query levels; the Table classes now each have their own Row instead of using an all-purpose generic row; the Mapper classes now separate the relationship-definition logic to its own class as well. I will be preparing an upgrade document to help ease the transition.

But if you're just getting started with Atlas, you'll want to try v3 "Cassini" first.

Are you stuck with a legacy PHP application? You should buy my book because it gives you a step-by-step guide to improving you codebase, all while keeping it running the whole time.