On Preferring Spaces Over Tabs in PHP

The best lack all conviction, while the worst are full of passionate intensity. — “The Second Coming”, William Butler Yeats

Keep the above in mind when considering either side of the debate. ;-)

tl;dr

Herein I assert and discuss the following:

  • Using spaces has subtle advantages over using tabs in collaborative environments.

  • The “tabs reduce file size” argument is factually true, but is a case of optimizing on the wrong resource.

  • The “tabs allow each developer to set his own indent widths” argument sounds good in theory but leads to problems in practice regarding line length recognition and inter-line alignment.

Introduction

In the PHP world, there are effectively two competing indenting practices: “4-spaces” and “tab.” (There are some in the 2-space camp as well but they are very few.)

I want to point out a couple of things about why spaces might be considered preferable in a collaborative environment when style is important on a PHP project. And yes, it turns out this discussion does matter.

I used to use tabs, and slowly migrated over to spaces. Over the course of several years, I have found there is a slight but useful advantage to using spaces for indentation when working with other developers, and I want to discuss that one advantage in this essay.

Note that I am not asserting an overwhelming, absolutely obvious, infallible moral rule that clearly favors spaces over tabs as the One True Path. It is merely a noticeable improvement regarding more sophisticated rules of style.

Do I expect this essay to change anybody’s mind on using tabs? No, but I do hope it will give some food for thought.

Regarding Tabs

When making an argument, it is important to state the opposing viewpoint in terms that the people who hold it would actually agree with.

What are the reasons for preferring a tab indent? As far as I can tell, they are:

  • The tab is a single character, so files are smaller.

  • Using a tab character allows each developer to change the level of indent
    that he sees, without actually modifying the on-disk file.

If there are other reasons I have missed, please let me know.

File Size

In general, I assert that the “file size” argument is a case of “optimizing on the wrong resource.”

By way of example, let’s take one file from a real project that uses 4-space indenting, Zend_Db_Abstract, and use wc -c to count the number of bytes in the file.

$ wc -c Abstract.php
40953 Abstract.php

Now, let’s convert each 4-space indent to a tab.

$ unexpand -t 4 Abstract.php > Abstract-tabs.php
$ wc -c Abstract-tabs.php
34632 Abstract-tabs.php

We save 6,321 bytes (about 6K) on a 40K file, or roughly 15%, by using a tab character for indents instead of a 4-space indent.
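
For anyone without unexpand at hand, the same kind of measurement can be sketched in PHP. This is only an illustration: indentToTabs() is a hypothetical helper, the sample source is made up, and the helper converts only leading groups of four spaces, which may be narrower than what unexpand does with its options.

```php
<?php
// Hypothetical helper (not part of any library): replace each leading
// group of $width spaces on every line with a single tab character.
function indentToTabs(string $source, int $width = 4): string
{
    $converted = preg_replace_callback(
        '/^(?: {' . $width . '})+/m',
        fn (array $m): string => str_repeat("\t", intdiv(strlen($m[0]), $width)),
        $source
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $converted ?? $source;
}

// Made-up sample with 4-space indents, standing in for a real file.
$spaces = "function f()\n{\n    if (true) {\n        return 1;\n    }\n}\n";
$tabs   = indentToTabs($spaces);

printf("%d bytes with spaces, %d bytes with tabs\n", strlen($spaces), strlen($tabs));
```

Each indent level saves three bytes per line, so the total saving depends entirely on how deeply the code is nested.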

Now, to get an idea of how that compares to another way to reduce size, let’s remove all the comments from the original 4-space file and see what that does. We’ll use a tool I found after two minutes of Googling (you may need to change the hashbang line of remccoms3.sed to point to your sed):

$ wget http://sed.sourceforge.net/grabbag/scripts/remccoms3.sed
$ chmod +x remccoms3.sed
$ ./remccoms3.sed Abstract.php > Abstract-no-comments.php
$ wc -c Abstract-no-comments.php
21022 Abstract-no-comments.php

That’s about a 50% reduction. If disk storage is really a concern, we’d be much better off to remove comments than to convert spaces to tabs. Of course, we could do both.
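
If a sed script is not to your taste, a rough equivalent can be sketched with PHP's own tokenizer extension. Note that this is a different technique from remccoms3.sed, not a port of it: stripComments() is a hypothetical helper that drops comment tokens and leaves everything else as it was, and the sample source is made up.

```php
<?php
// Hypothetical helper: remove comment tokens, keep all other bytes as-is.
function stripComments(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            [$id, $text] = $token;
            if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
                continue; // skip "//", "#", "/* */" and "/** */" comments
            }
            $out .= $text;
        } else {
            $out .= $token; // single-character tokens such as ';' or '{'
        }
    }

    return $out;
}

// Made-up sample, standing in for a real source file.
$source   = "<?php\n/** Doc block. */\nfunction f() { return 1; } // trailing\n";
$stripped = stripComments($source);

printf("%d bytes before, %d bytes after\n", strlen($source), strlen($stripped));
```

Because it works on tokens rather than text patterns, this sketch will not mistake a "//" inside a string literal for a comment.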

This example makes me believe that the “file size” argument, while factually correct, is a case of “optimizing on the wrong resource.” That is, the argument gives strong consideration to a low-value item. Disk space is pretty cheap, after all.

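A followup argument about this is usually, “Even so, it’s less for the PHP interpreter to deal with. Fewer characters means faster code.” Well, not exactly. PHP's lexer turns each run of whitespace into a single whitespace token (T_WHITESPACE), whether that run holds one tab or four spaces, so the parser sees the same token stream either way.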
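
One way to check this for yourself is with PHP's tokenizer extension. In the sketch below, tokenNames() is a hypothetical helper and the two snippets are made up for illustration; it compares the token names produced for a space-indented and a tab-indented version of the same code.

```php
<?php
// Hypothetical helper: list the name of every token in $source.
// Single-character tokens (';', '{', ...) are returned as themselves.
function tokenNames(string $source): array
{
    $names = [];
    foreach (token_get_all($source) as $token) {
        $names[] = is_array($token) ? token_name($token[0]) : $token;
    }

    return $names;
}

// The same code, indented once with four spaces and once with a tab.
$withSpaces = "<?php\nif (\$a) {\n    \$b = 1;\n}\n";
$withTabs   = "<?php\nif (\$a) {\n\t\$b = 1;\n}\n";

// bool(true) here means both versions produce the same sequence of token names.
var_dump(tokenNames($withSpaces) === tokenNames($withTabs));
```

This compares token names only. The text inside each whitespace token still differs, but that text is not something the parser acts on.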

Developer Tab Stop Preferences

This, to me, seems to be the primary argument for preferring tabs over spaces for indenting. Essentially, the idea is to allow each individual developer on a project to make the code look the way that individual developer prefers.

This is a non-trivial argument. It’s very appealing for the individual developers to be able to work on a project where Developer A sees a tab stop every 4 characters, and Developer B sees a tab stop every 2 or 8 or whatever characters, without changing the actual bytes on disk.

I have two arguments against this; they seem to be minor, until we examine them in practice:

  • It becomes difficult to recognize line-length violations with over-wide tab stop settings.

  • Under sophisticated style guides, inter-line alignment for readability becomes inconsistent between developers using different tab stops.

These arguments require a little exposition.

Line Length Recognition

Because of limitations of this blog, let’s say that our coding style guide has a line length limit of 40 characters. (I know, that’s half or less of what it should be, but it serves as an easy illustration.)

The following code, with 4-character tab stops, shows what that line length limit looks like:

         1         2         3         4
1234567890123456789012345678901234567890
function funcFoo()
{
    $varname = '12' . funcBar() . '34';
}

It’s clearly within the line length limit. But it looks like this under an 8-character tab stop:

         1         2         3         4
1234567890123456789012345678901234567890123
function funcFoo()
{
        $varname = '12' . funcBar() . '34';
}

A developer who sees this code under 8-character stops will think the line is past the limit, and attempt to reformat it in some way. After that reformatting, the developer working with 4-character tab stops will think the line is too short, and reformat it back to being longer. This is not particularly productive.
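
The disagreement is easy to reproduce. Here is a minimal sketch, assuming single-byte characters and a tab that advances to the next multiple of the tab stop; displayWidth() is a hypothetical helper, not a library function.

```php
<?php
// Hypothetical helper: the column at which $line ends on screen when each
// tab advances to the next multiple of $tabStop. Assumes one byte per column.
function displayWidth(string $line, int $tabStop): int
{
    $col = 0;
    for ($i = 0, $n = strlen($line); $i < $n; $i++) {
        if ($line[$i] === "\t") {
            $col += $tabStop - ($col % $tabStop);
        } else {
            $col++;
        }
    }

    return $col;
}

// The example line from above, indented with one tab character.
$line = "\t\$varname = '12' . funcBar() . '34';";

foreach ([4, 8] as $tabStop) {
    printf("tab stop %d: %d columns\n", $tabStop, displayWidth($line, $tabStop));
}
```

The bytes on disk are identical in both cases, yet the line measures 39 columns for one developer and 43 for the other, so the two cannot agree on whether it breaks a 40-column limit.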

Some will say this just shows that line length limits are dumb. I disagree.

Inter-Line Alignment

By “inter-line alignment” I mean the practice where, if we have several lines of code that are similar, we align the corresponding parts of each line in columns. To be clear, it’s not that unaligned code is impossible to read; it’s just noticeably easier to read when it’s aligned.

Typically, inter-line alignment is applied to variable assignment. For example, the following unaligned code …

$foo = 'bar';
$bazdib = 'gir';
$zim = 'irk';

… is easier to scan in columns aligned on the = sign:

$foo    = 'bar';
$bazdib = 'gir';
$zim    = 'irk';

We can see clearly what the variables are in the one column, and what the assigned values are in the next column.

Alternatively, we may need to break an over-long line across several lines, and make it glaringly obvious during even a cursory scan that it’s all one statement.

Now, let’s say we have a bit of code that should be aligned across two or more lines, whether for readability or to adhere to a line length limit. We begin with this contrived example using 4-space indents (the spaces are indicated by • characters):

function funcName()
{
••••$varname = '1234' . aVeryLongFunctionName() . 'foo' . otherFunction();
}

Under a style guide where we align on = to keep within a line length limit, we can do so regardless of tab stops:

function funcName()
{
••••$varname = '1234' . aVeryLongFunctionName()
•••••••••••• . 'foo' . otherFunction();
}

Under a guide where we use tabs, and Developer A uses 4-character tab stops, we need to push the alignment out to the tab stops to line things up (tabs are indicated by → characters):

function funcName()
{
→   $varname→   = '1234' . aVeryLongFunctionName()
→   →   →   →   . 'foo' . otherFunction();
}

However, if a Developer B uses an 8-character tab stop, the same code looks like this on Developer B’s terminal:

function funcName()
{
→       $varname→       = '1234' . aVeryLongFunctionName()
→       →       →       →       . 'foo' . otherFunction();
}

The second example has the same tabbing as in the first example, but the alignment looks broken under 8-character tab stops. Developers who prefer the 8-character stop are likely to try to reformat that code to make it look right on their terminal. That, in turn, will make it look broken for those developers who prefer a 4-character stop.

Thus, the argument that “each developer can set tab stops wherever he likes” is fine in theory, but is flawed in practice.
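
The drift can be checked mechanically. The sketch below uses a hypothetical expandTabs() helper (assuming single-byte characters) to render the two tab-aligned lines from the example at both tab stops and report the column where each operator lands.

```php
<?php
// Hypothetical helper: render $line as it appears on screen by expanding
// each tab to the next multiple of $tabStop. Assumes one byte per column.
function expandTabs(string $line, int $tabStop): string
{
    $out = '';
    for ($i = 0, $n = strlen($line); $i < $n; $i++) {
        if ($line[$i] === "\t") {
            $out .= str_repeat(' ', $tabStop - (strlen($out) % $tabStop));
        } else {
            $out .= $line[$i];
        }
    }

    return $out;
}

// The two tab-aligned lines from the example above.
$first  = "\t\$varname\t= '1234' . aVeryLongFunctionName()";
$second = "\t\t\t\t. 'foo' . otherFunction();";

foreach ([4, 8] as $tabStop) {
    printf(
        "tab stop %d: '=' at column %d, '.' at column %d\n",
        $tabStop,
        strpos(expandTabs($first, $tabStop), '='),
        strpos(expandTabs($second, $tabStop), '.')
    );
}
```

At a tab stop of 4 both operators land on the same column. At 8 the second line ends up eight columns further right than the first.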

The first response to alignment arguments is generally: “Use tabs for indenting and spaces for alignment.” Let’s try that.

First, a 4-character tab stop indent, followed by spaces for alignment:

function funcName()
{
→   $varname = '1234' . aVeryLongFunctionName()
→   •••••••• . 'foo' . otherFunction();
}

Now, an 8-character tab stop indent, followed by spaces for alignment:

function funcName()
{
→       $varname = '1234' . aVeryLongFunctionName()
→       •••••••• . 'foo' . otherFunction();
}

That looks OK, right? Sure … until a developer, through habit (and we are creatures of habit) hits the tab key for alignment when he should have used spaces. They are both invisible, so the developer won’t notice on his own terminal — it will only be noticed by developers with other tab stop preferences. It is the same problem as before: misalignment under the different tab stop preferences of different developers.
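
A team committed to this convention could try to catch the slip automatically. What follows is only a heuristic sketch, not a real linter: suspectAlignmentTabs() is a hypothetical helper that flags a line when a tab appears after the leading indent, or when the indent jumps by more than one tab compared with the last unflagged line. It will also flag tabs inside string literals, and it assumes nesting never deepens by more than one level per line.

```php
<?php
// Hypothetical heuristic: return 1-based numbers of lines that look like
// they use tabs for alignment rather than for indentation.
function suspectAlignmentTabs(string $source): array
{
    $suspects = [];
    $previousDepth = 0;

    foreach (explode("\n", $source) as $index => $line) {
        if (trim($line) === '') {
            continue; // blank lines carry no indentation information
        }

        $depth = strspn($line, "\t"); // number of leading tabs

        if ($depth > $previousDepth + 1) {
            // Indent jumped by more than one level: probably alignment tabs.
            $suspects[] = $index + 1;
            continue; // keep comparing against the last plausible depth
        }

        if (strpos($line, "\t", $depth) !== false) {
            $suspects[] = $index + 1; // a tab after the leading indent
        }

        $previousDepth = $depth;
    }

    return $suspects;
}

// Continuation line aligned with tabs (the habit described above).
$mixed = "function funcName()\n{\n\t\$varname = '1234' . aVeryLongFunctionName()\n"
       . "\t\t\t. 'foo' . otherFunction();\n}\n";

// Continuation line aligned with spaces after a single indent tab.
$clean = "function funcName()\n{\n\t\$varname = '1234' . aVeryLongFunctionName()\n"
       . "\t         . 'foo' . otherFunction();\n}\n";

echo 'mixed: ', implode(', ', suspectAlignmentTabs($mixed)), "\n";
echo 'clean: ', implode(', ', suspectAlignmentTabs($clean)), "\n";
```

Even with a check like this in place, the rule has to be enforced by tooling, because nobody can see the difference on their own screen.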

The general response at this point is to modify the tab-oriented style guide to disallow that kind of inter-line alignment. I suppose that is reasonable if we are committed to using tabs, but I find code of that sort to be less readable overall.

Solution: Use Spaces Instead

The solution to these subtle and sophisticated issues, for me and for lots of other PHP developers, is to use spaces for indentation and alignment. Virtually every programmer's text editor supports what are called “soft tabs,” where pressing the tab key inserts a user-defined number of spaces. When using spaces for indentation and alignment, all code looks the same everywhere, and alignment does not break under the different tab stop preferences of different developers.

Conclusion

I realize this is a point of religious fervor among developers. Even though I have a preference for spaces, I am not a spaces zealot. This post is not evangelism; it is a dissection of the subtle and long-term issues related to tabs-vs-spaces discovered only after years of professional collaboration.

Please feel free to leave comments, criticism, etc. Because this is such a touchy subject, please be especially careful to be civil and maintain a respectful tone in the comments. If you have a very long comment, please consider pinging/tracking this post with a blog entry of your own, instead of commenting directly. I reserve the right to do as I wish with uncivil commentary.

Thanks for reading, all!
