Regarding Underscores

By pmjones | October 19, 2010

Today, PHPDeveloper.org referred to a post by Leszek Stachowski about underscore prefixes on non-public class elements. He writes:

The question which comes instantly to my mind is: why? Is there any reason why this convention should be kept when PHP object oriented programming has gone a long way since PHP 4 (when there was no access modifiers and such underscore was the only fast way to distinguish public from, hmm, not public methods and properties) ? Are, for instance (as one of major OOP languages), Java coding standards pushing towards such naming convention? No!

I think that we, as developers, should not stick to this silly convention. For the sake of progress, stop looking back (because that what in fact this convention is) and stop supporting this one, particular naming convention.

I think the underscore-prefix for protected and private is a good convention to keep. As with many things about programming, this convention is not for the program, it is for the programmer. For example, limiting your line length to about 80 characters is a good idea, not for reasons of “tradition”, but because of cognitive limitations of human beings who have to read code.

Likewise, using the underscore prefix is an immediate and continuous reminder that some portions of the code are not public. Its purpose is as an aid to the programmer. The underscores make it obvious which parts of the program are internal, and which parts are externally available. (Note that I do not extend this argument to support the use of Hungarian Notation in PHP; if something like the underscore prefix is overused, it loses its obvious-ness and thus becomes less powerful.)

As an example, look at the following code:

<?php
class NoUnderscores
{
    protected $data = array(
        'item' => 'magic-data',
    );

    protected $item = 'property-value';

    public function __get($key)
    {
        return $this->data[$key];
    }

    protected function doSomething()
    {
        // do we want the magic public item,
        // or the internal protected item?
        return $this->item;
    }
}

Here we have a magic __get() method that reads from the protected $data property. Any time you try to read a property that doesn’t exist, or that isn’t accessible from the calling code, PHP will go to the __get() method and read from protected $data. Now look at the doSomething() method. Because the code executes inside the class, it has access to the protected $item, so it’s not obvious whether the programmer wanted the value of protected $item or the magic $data['item'].
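
To make the two lookups concrete, here is a minimal sketch of what PHP does in each case. NoUnderscoresDemo is a hypothetical copy of the class above, and readFromInside() is an assumption added only so the protected method can be called from a script.

```php
<?php
// Hypothetical demo class: same shape as NoUnderscores, plus a public
// wrapper (readFromInside) that exists only to exercise doSomething().
class NoUnderscoresDemo
{
    protected $data = array(
        'item' => 'magic-data',
    );

    protected $item = 'property-value';

    public function __get($key)
    {
        return $this->data[$key];
    }

    protected function doSomething()
    {
        return $this->item;
    }

    public function readFromInside()
    {
        return $this->doSomething();
    }
}

$demo = new NoUnderscoresDemo();

// Outside the class, protected $item is inaccessible, so PHP calls __get('item').
$fromOutside = $demo->item;

// Inside the class, $this->item resolves to the declared protected property.
$fromInside = $demo->readFromInside();

echo $fromOutside, "\n"; // magic-data
echo $fromInside, "\n";  // property-value
```

PHP itself is never confused here; the same expression simply means two different things depending on where it is written, which is the readability problem being described.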

By way of comparison, take a look at the following modification to use the underscore prefix on private and protected elements:

<?php
class Underscores
{
    protected $_data = array(
        'item' => 'magic-data',
    );

    protected $_item = 'property-value';

    public function __get($key)
    {
        return $this->_data[$key];
    }

    protected function _doSomething()
    {
        // it is clear we want the internal protected item
        return $this->_item;
    }
}

Now the _doSomething() method is perfectly clear: the programmer wants the value of the internal protected property.
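
PHP does not check that the prefix and the declared visibility agree, so the convention depends on discipline. As a rough sketch (not something from the original post), a reflection-based check might look like this. The names Sample and findPrefixMismatches are hypothetical.

```php
<?php
// Hypothetical sample class with two deliberate mismatches:
// $_stale has the prefix but is public; leak() is private without it.
class Sample
{
    public $name = 'n';
    public $_stale = 's';
    protected $_data = array();

    public function run() {}
    protected function _helper() {}
    private function leak() {}
}

// Hypothetical helper: returns the names of properties and methods whose
// underscore prefix disagrees with their declared visibility.
function findPrefixMismatches($className)
{
    $class = new ReflectionClass($className);
    $members = array_merge($class->getProperties(), $class->getMethods());
    $mismatches = array();

    foreach ($members as $member) {
        $name = $member->getName();
        // Skip PHP's own magic names such as __get and __construct.
        if (strpos($name, '__') === 0) {
            continue;
        }
        $hasPrefix = ($name[0] === '_');
        // A prefixed public member, or an unprefixed non-public one, is a mismatch.
        if ($hasPrefix === $member->isPublic()) {
            $mismatches[] = $name;
        }
    }

    return $mismatches;
}

print_r(findPrefixMismatches('Sample'));
```

For the sample class this reports _stale and leak. A check like this could run in a test suite, though whether that is worth the effort is a separate question.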

17 thoughts on “Regarding Underscores”

  1. Matthew Weier O'Phinney

    I agree with what you’re saying in many regards. However, the example is flawed, to my thinking: whenever you’re using overloading to get at properties, you’re going to have similar questions, regardless of the underscore prefix. And if the property is defined, there’s no doubt in my mind which property I’m referring to – it’s the defined one.

    While I like the easy visual semantics of the underscore prefix, I will admit to also liking (a) how clean dropping it looks, and (b) how easy it is to refactor from protected/private visibility to public, particularly when mocking objects for testing. Finally, if I’m encapsulating variables properly, it’s rare that I have public access to properties anyways, other than through accessors and mutators – making the leading underscore fairly superfluous.

  2. Pingback: Tweets that mention Paul M. Jones » Blog Archive » Regarding Underscores -- Topsy.com

  3. Herman Radtke

    To me, using underscores is similar to using Hungarian prefix notation. The issue with this kind of metadata is that there is nothing guaranteeing metadata is correct. The only thing that really matters is whether _item is declared private, protected or public.

    I think in most cases, member variables are non-public anyways so the underscore issue is really moot. As for public methods, an interface clearly defines what public methods should be used. Anything else is not guaranteed by the contract.

  4. Anon

    The 80 character limitation is because terminals were 80 characters wide; any more would cause line wrapping.

    The typographic guidelines relate specifically to continuous blocks of text, not sparse code, and the problem they solve is tracking to the beginning of the next line, not a difficulty understanding longer lines.

  5. eleg

    Shouldn’t the editor you use dynamically provide hints about the status of that function/method, instead of hardcoding that status in the name of the function, as the status could change (and then you have to change the name everywhere it is used!)?

    à la “:before” and “:after” css rules.

    pure text editor (B&W screen): underscore prefix dynamically inserted for private

    graphical editor: red/bold for private, orange/italic for protected, green/normal text for public, or a little trafficlight, or …

  6. Lukas

    I am also now in the do not use underscore camp for the reasons Matthew mentioned. I should also note that since I use an IDE, there is less benefit from this “hint”, because if I screw up the IDE complains immediately.

  7. Martin de Keijzer

    I also do like the underscore character. And not only because of the example given above, but for a more numb reason. In my IDE’s inspector window all methods get sorted alphabetically. If I want to use autocompletion (inside the same class) or scan over my object’s methods it’s a gift that methods starting with an underscore are at the top of the list while programming.

    I’m also not agreeing with the original poster’s argument that the underscore should be removed because it’s an old habit. It doesn’t block any new functionality from being implemented and is therefore not annoying enough to stop using it for the sake of backporting ease.

  8. Jory Geerts

    To expand on what Lukas said, I also use an IDE, and use autocomplete a lot. If I type $foo->, my IDE will give me a list of everything I can access from there.
    That means I always know what I can do /there/, and I don’t really care about what I can do at another place in the code.

    Also, the underscore makes autocomplete a little less usable, because it means I sometimes have to type $this->_ when I want to use a property or method.

  9. fritz from london

    Completely agree with Matthew. I tried dropping it on one project as a test, and realized what a waste of energy that convention is. You say it’s there to help the programmer, but since editors don’t give underscores any special treatment, it’s just another thing to keep track of in your head, for no extra benefit.

  10. devosc

    I never jumped into the underscore prefixing of variables until I started examining or working with the Zend Framework.

    Even then, it became quickly apparent that prefixing protected or private methods is a potential hindrance when later needing to make that method publicly accessible.

    However, for protected variables it did (and may?) seem appropriate to prefix them in particular situations (classes), such as in the controller. Unfortunately, PHP does not force a coder to declare a variable before assigning a value to it. And in a controller there might well be a mix of variables, some that are to be assigned to the view and some that should not. When it is known that a variable is also needed in the view, coders tend to just create it on the view and use it from there, so the controller ends up littered with

    $this->view->{var_name}

    throughout the controller class code, which can get ugly; the alternative is another section of code just to pass on view variable assignments.

    However, with the convention that variables confined to the controller are prefixed with the underscore, and those that are not prefixed will also be assigned to the view, this did/does provide some form of workable convention.

    In the case of coding within the controller, I think the debate about prefixing or not is six of one and half a dozen of the other. Since there will always be coders who either don’t understand or don’t care, a check could be made in the magic methods to see if the variable was prefixed, and an error thrown notifying that the protected variable should first be defined.

    That said, it would be nice to not have to worry about prefixing at all and have clean code :).

  11. Wil Moore III

    I am also in the camp that didn’t like/use underscores for scoping until starting to work with Zend Framework. I got used to it and after a while actually started to see benefits.

    That said, I’ve recently dropped using them and agree with Matthew’s statement:

    “it’s rare that I have public access to properties anyways, other than through accessors and mutators – making the leading underscore fairly superfluous.”

  12. pmjones Post author

    Matthew, Wil: I see where you guys are coming from. Well, I’m trying out new things this week (cf. Git/Github for the benchmarking project). I’ve converted that project away from underscores; we’ll see how I like it after using it for a while.

  13. Seva Lapsha

    I’m fully with Matthew on this topic. In addition, modern IDEs provide easy ways to configure highlighting of properties with different access levels.

  14. hakre

    Line length up to 120 chars with about 100 chars in common is well suited for the programmer. 80 chars (78 to be precise) was born out of a limitation of smaller screens. At least that’s what the science says, right?

    So do you have any sources that prove your assumptions?

  15. Andrew

    I understand that kind of coding practice can be useful, whether it’s underscore prefixes, type suffixes, or any other convention of baking special information into the symbols. But I have seen it break down so often that from a practical standpoint, I never recommend it. Basically, my stance is that if the underlying tool (programming language) doesn’t allow you to specify it at compile time, there’s no point in trying to express it yourself.

    I find that developers who follow these conventions drift easily towards one of two extremes. On the one hand, they go too far with it and start using double or triple prefixes or suffixes to convey additional information. I saw one developer use triple underscores to convey that something was private and being used as an accessor for another private variable that he wanted to name the same as another related private variable. The same abuses can occur with type suffixes. Also, it makes the whole codebase harder to maintain. You have a high chance of having to do a lot of refactoring to make some simple change (like toggling between public and private).

    On the other hand, developers hate following conventions consistently, even their own (sure, I’m guilty of it too). When the discipline breaks down, those cute little conventions become a huge rusty bear trap on readability and maintainability.

  16. Leszek Stachowski

    I think that your example is wrong. It is obvious (at least it should be) that from outside an object you are referring to its public interface and, on the contrary, from inside you have access to internal properties and they have, let’s say, priority over __get() in this case. As a matter of fact, rename the protected property “item” to, e.g., “anotherItem” and your problem disappears. Looking at it that way, the second example is exactly the same ;-) Just a matter of naming.

  17. Leszek Stachowski

    ^ Almost the same of course, but still, an underscore doesn’t magically tell the interpreter: “Hey, that’s my private property, don’t touch it”; it’s up to the access modifiers. That is their role in OOP…
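
The point raised at the end of the thread, that access modifiers rather than underscores do the enforcing, can be sketched as follows. The class Account and its property names are hypothetical, and the try/catch assumes PHP 7 or later, where the violation surfaces as a catchable Error (older versions stop with a fatal error instead).

```php
<?php
// Hypothetical class: the underscore is only a name, while "private"
// is what PHP actually enforces.
class Account
{
    public $_looksPrivate = 'readable anyway';
    private $balance = 100;
}

$account = new Account();

// The prefix does not stop outside code from reading a public property.
echo $account->_looksPrivate, "\n";

// Reading a private property from outside (with no __get defined)
// raises an Error in PHP 7 and later.
try {
    echo $account->balance, "\n";
    $blocked = false;
} catch (Error $e) {
    $blocked = true;
    echo get_class($e), ': ', $e->getMessage(), "\n";
}
```

Both sides of the debate are consistent with this behavior: the prefix is a hint for the reader, and the keyword is the guarantee.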
