Norbert Mocsnik has raised some important points about automated form generation and processing in this post. He outlines what he sees as the form-processing steps (a rough sketch in code follows the list):

1. build a form (e.g. call functions to add inputs, set form ACTION)

2. fill in the defaults (with one call, or by walking through the form inputs one by one)

3. assign the form template (either a static template created for a specific form, which gives the greatest flexibility, or a dynamic template that describes how to display each field type and thus can be applied to any form)

4. parse (and display) the form template
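To make those four steps concrete, here is a minimal sketch, assuming a purely hypothetical Form class; none of these names come from a real library:

```php
<?php
// Hypothetical Form class -- every name here is made up for illustration.

$form = new Form('/contact.php', 'post');   // step 1: build the form, set its ACTION
$form->addInput('text', 'email');
$form->addInput('textarea', 'message');

// step 2: fill in the defaults (one call here; walking the inputs
// one by one would work too)
$form->setDefaults(array(
    'email'   => '',
    'message' => '',
));

// step 3: assign the template (static or dynamic)
$form->setTemplate('templates/contact.tpl.php');

// step 4: parse and display
echo $form->render();
```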

The user fills in the form and posts it to the server. Any time the user posts the form, the server should start with step 5 (so each post runs 5-6, or 5-6-7, or 5-6-7-8, never starting at 6, 7, or 8 directly). This way it is guaranteed that the user gets the form back any time he or she posts invalid values, reaches the confirmation only afterwards (that is, only with valid values that are ready to be processed), and the form is processed only if it was both valid and, where needed, confirmed. A sketch of this dispatch follows the list.

5. form validation

6. if the form was not valid, pass it back to the user for correction (go back to step 1, but in step 2, instead of filling in the defaults, fill the form with the values the user just entered)

7. if this form should be confirmed before processing and it wasn't confirmed after the user last edited it, pass it back to the user "frozen" (i.e., read-only) for confirmation

8. process the form (this means storing it in a database in most cases)
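Here is a minimal sketch of that dispatch, assuming hypothetical validateForm(), renderForm(), renderFrozenForm(), needsConfirmation(), and processForm() helpers:

```php
<?php
// Every POST starts at step 5, so the user can never reach processing
// with invalid or unconfirmed data. All helper names are hypothetical.

$values = $_POST;
$errors = validateForm($values);                 // step 5: validate

if ($errors) {
    // step 6: back to the user with *their* values, not the defaults
    renderForm($values, $errors);
} elseif (needsConfirmation($values) && empty($values['confirmed'])) {
    // step 7: show a frozen (read-only) copy and ask for confirmation
    renderFrozenForm($values);
} else {
    // step 8: process the form (usually: store it in the database)
    processForm($values);
}
```

The branches are mutually exclusive, which is what guarantees the 5-6, 5-6-7, 5-6-7-8 progression described above.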

I am not 100% certain that I agree with these points as presented; allow me to revisit and restate them. The following is a thought experiment; it's an outline of the client and server processes for a form submission.


0. (Optional, not the normal case.) The client decides to attack your script by submitting form data directly. This would cause us to skip steps 1, 2, and 3, going directly to step 4. This is why we cannot rely on client-side validation of data for any serious purpose.

1. Model logic is triggered for the first time by the client browsing to the page. We need to send the client a blank form with some sane default values. The model logic talks to the database through an interface class to see what a default data set should look like, modifies it as needed, and presents the default data set to the View (template) logic.

2. View logic takes the default data set and renders it into a form; the form in the template script may be dynamically generated, say with the Savant2 "form" plugin or the Smarty form element plugins; alternatively, it may be mostly static XHTML with placeholders for the data elements. When done, off we go back to the client.

3. Client gets the generated form, fills in some elements, submits the form.

4. Model logic gets the submitted form data. Now we have to sanitize and validate the data; either the model logic or the data interface class sanitizes the data, then the data interface class validates it. Obviously there are two possible outcomes: some or all of the sanitized data is not valid (see 4a), or all of the sanitized data is valid (see 4b). (A sketch of this whole flow follows the outline.)

4a. If some part of the data is not valid, we should re-present the submitted data (perhaps modified to make it more sane) to the client. We cycle back to the equivalent of steps 1 and 2 again, with the submitted or modified data set (not the default data set).

4b. If all of the data is valid, then we can continue to perform the model logic; this may mean changing database values, handing control off to another script, "freezing" the form for confirmation (in which case we may cycle back to step 3 again), or any other type of processing.
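Pulling steps 1 through 4b together, a rough controller sketch might look like this; TableInterface is a hypothetical stand-in for a DB_Table- or DB_DataObject-style data-interface class, and render() is a hypothetical View helper:

```php
<?php
// Hypothetical controller: GET shows the defaults, POST sanitizes and
// validates through the data-interface class, then branches.

$table = new TableInterface('contacts');

if ($_SERVER['REQUEST_METHOD'] != 'POST') {
    // steps 1 and 2: first visit, so present the default data set
    $data = $table->getDefaultRow();
    render('contact_form.php', array('data' => $data, 'errors' => array()));
} else {
    // step 4: sanitize first, then validate
    $data   = $table->sanitize($_POST);
    $errors = $table->validate($data);

    if ($errors) {
        // step 4a: re-present the submitted (possibly modified) data
        render('contact_form.php', array('data' => $data, 'errors' => $errors));
    } else {
        // step 4b: carry on with the model logic, e.g. save the row
        $table->insert($data);
    }
}
```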


As you can see, the above outline isn't strictly based on Model/View separation. Instead, it is more like Data/Model/View separation, where the Data logic is encapsulated in a class like DB_Table or DB_DataObject.

It seems to me that the Model logic should not be validating the data; because it is data, the Data logic should handle that behavior. The Model should only ask the Data logic whether the information is valid, and the reasons for any invalidation; that way, any Model that talks to the Data logic will always get the same responses for the same data sets.
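In code, that division of labor might look something like this minimal sketch (all names hypothetical):

```php
<?php
// The Model never judges validity itself; it delegates the question to
// the Data logic, so every Model gets the same answer for the same data.

$errors = $table->validate($data);      // ask the Data logic
if ($errors) {
    $view->assign('data', $data);       // hand values back to the View
    $view->assign('errors', $errors);   // codes + messages for display
} else {
    $table->insert($data);              // safe to proceed
}
```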

This is where I find HTML_QuickForm and FormBuilder and the like to be not-the-best long-term solution. They pierce the veil between Data, Model, and View, trying to roll them all into one piece. This is fine for prototyping, but for extended use I'm beginning to think we need less of a monolithic approach. We need more "small pieces" to be "loosely joined".

What would this entail?

The Data logic would entail a data-interface class that knows what its columns and their validity constraints are, effectively an increased-functionality version of DB_Table or DB_DataObject. The class would need to be able to validate data at the PHP level for reasons of data portability (we can't depend on the database backend for that, because every DB backend differs in its own ways). The class would also need to be able to report validity failures in an extensible and programmatic way (validation codes in addition to messages). Finally, the class would need to be able to provide hints to the View logic as to what kinds of values are valid, so that the View logic can present a reasonable form to the client.
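A sketch of what such a class might look like, under the assumptions above; the class name, the $cols array format, and the error codes are all made up for illustration:

```php
<?php
// Hypothetical data-interface class: one column declaration drives
// PHP-level validation, coded failure reports, and hints for the View.

class ContactTable
{
    // columns and their validity constraints, declared once
    public $cols = array(
        'email' => array(
            'type'    => 'varchar',
            'size'    => 255,
            'require' => true,
        ),
        'message' => array(
            'type'    => 'clob',
            'require' => true,
        ),
    );

    // validate at the PHP level, not in the database, for portability
    public function validate($data)
    {
        $errors = array();
        foreach ($this->cols as $name => $col) {
            $value = isset($data[$name]) ? $data[$name] : null;
            if (! empty($col['require']) && ($value === null || $value === '')) {
                // report a code as well as a message, so callers can
                // react programmatically instead of parsing text
                $errors[] = array(
                    'col'  => $name,
                    'code' => 'ERR_REQUIRED',
                    'msg'  => "$name is required",
                );
            }
            if (isset($col['size']) && strlen((string) $value) > $col['size']) {
                $errors[] = array(
                    'col'  => $name,
                    'code' => 'ERR_TOO_LONG',
                    'msg'  => "$name exceeds {$col['size']} characters",
                );
            }
        }
        return $errors;
    }

    // hints for the View logic: what kind of control suits each column?
    public function getFormHints()
    {
        $hints = array();
        foreach ($this->cols as $name => $col) {
            $hints[$name] = ($col['type'] == 'clob') ? 'textarea' : 'text';
        }
        return $hints;
    }
}
```

The point of the single $cols declaration is that the validation rules and the View hints can never drift out of sync, because they derive from the same source.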

The Model logic is up in the air; its functions are the heart of the custom application, and as such cannot be strictly outlined here. At worst, the Model would serve as a mediator between the Data logic and the View logic.

The View logic would be able to take the data presented to it by the Model and construct a useful form, along with any validation messages.
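For example, a mostly static template might look like this sketch, where $data holds the current values and $errors maps column names to messages (both assigned by the Model; all names hypothetical):

```php
<?php
// Hypothetical View template: placeholders for the data elements plus
// a validation message next to the offending field.
?>
<form action="/contact.php" method="post">
    <?php if (! empty($errors['email'])): ?>
        <p class="error"><?php echo htmlspecialchars($errors['email']); ?></p>
    <?php endif; ?>
    <label>Email:
        <input type="text" name="email"
            value="<?php echo htmlspecialchars($data['email']); ?>" />
    </label>
    <input type="submit" value="Send" />
</form>
```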

Man, that's a lot; let me think on this a bit more and get back to it later. In the meantime, please comment freely and tell me why this outline is the worst thing you've ever seen. ;-)
