2009-07-31 20:58

Whitespace standards

When writing code, it is good to be consistent about how you use whitespace. When collaborating with others, it can actually be detrimental to your group’s productivity if there is a mix of systems, so the sensible thing is to pick a standard early on and stick with it (even if that standard only defines the on-disk format, and individual programmers use editors which present the code to them in the way that works best for them).

The policies that people choose may be partly a matter of taste, but there can still be logical reasons for preferring one system over another. Even though everyone weighs different pieces of evidence differently, I am going to state the whitespace system I use for my personal projects and list its good points. Maybe it will inspire someone into agreeing with me, or at least help them to be clearer about which system they prefer and why.

Newlines

I will follow the example of a well-written blog post I found online called Inconsistent Whitespace (even though I don’t agree with all the recommendations there) and start with a section about newline characters. This is one of the less contentious issues today, presumably at least partly due to Mac users having largely switched to Mac OS X, which uses the UNIX standard for newlines rather than the older Mac-specific standard. That effectively leaves two standards, the Windows standard and the UNIX standard, and although these two systems do cause interoperability problems, they do not seem to cause arguments, possibly because they are a relatively minor front in the wider UNIX versus Windows flamewar.

Nowadays most editors can render either system correctly, and probably save under either system too. The only possible exceptions to this are Notepad from Windows, which I believe refuses to understand anything other than the Windows-specific format, and vi which renders ^M at the end of lines which use the Windows format.
It could however be argued that this is a feature, as it makes it easy to detect inconsistent newline use. Regardless of the behaviour of these programs, though, there is an independent de jure standard which should settle the matter. The XML standard mandates that line endings be normalised to LF when a document is parsed, which is fortunately and coincidentally the same as the UNIX standard.

Brackets

As perhaps a subset of the newline problem there is the question of where one should put newlines in the document. Curly brackets (which I call “squigglies”) are often accompanied by newline characters in curly bracket programming languages, but different coding styles suggest different arrangements. I won’t list all the possible systems here, and will merely indicate that there is a detailed Wikipedia article on the matter for people who wish to see lots of “wrong” ways to write code. The “right” way is of course “ANSI style”, named after its use in the documents describing the ANSI C standard (also standardised as ISO/IEC 9899). Obviously it is appealing to choose a de jure standard again, and the article gives other reasons in favour of it too. The article does list some potential problems with it, though, which I shall address here.

Firstly it says that having a line of code which just contains a single bracket is wasteful, but I think most programmers use even completely blank lines in code to split up logical chunks and aid readability, so there is no lower bound on the amount of information required on a line. Also, as the article suggests, nowadays monitors should be capable of holding more lines than any programmer needs to see at any given time.

The other disadvantage mentioned in the article is that if lots of lines only contain a single open bracket, then that makes it harder for a diff-based version control system to match up blocks of code.
There is no suggestion that people should avoid using blank lines, or lines containing a single close bracket, though, so without any concrete data on how much more likely ANSI style code is to suffer from this mismatching problem, I’m going to write this off as FUD. Perhaps someone could find the top 10 most popular open source packages, download the most recent 100 revisions from their source repository, format each of these revisions under both ANSI and a competing style, then run the version control commands which are expected to cause the problems. Reformatting and comparing the resulting code should give a fairly good idea of whether ANSI style code really does cause errors like this, or rather, whether diffing algorithms really do need to be rewritten to work better with ANSI style code(!)

Indentation

Although the Wikipedia article about coding styles is useful for comparing whether to put squigglies on their own line, the title of the article is actually “Indent style”. I was somewhat surprised to see that even among people who agree that open curly brackets should get their own line, there are disagreements about how much to indent those brackets relative to the start of the previous line. Fortunately the ANSI standard is not ambiguous on this point, as it says that an open squiggly and its previous line should have the same indentation, so I do not need to argue for this choice separately.

Of course, ANSI style does require that some lines be indented, and it is here that the standard starts to get ambiguous. It might be nice if it could specify every detail about indentation, but to do so would have meant taking a view in the so-called Tabs versus Spaces debate, making it very unpopular with one or other of the two sides.

The disagreements and rants which are caused by this debate are even more heated than those about bracketing style, which is, I think, for two main reasons.
Firstly, there is the fact, as the influential Jamie Zawinski points out, that there are actually three separate issues related to tabs versus spaces that people care about: how wide the indentation of a block of code should be, how that indentation should be stored on disk, and what a text editor does when the tab key is pressed. This makes it very difficult to have reasoned discussions and for the different sides to understand each other. Secondly, there is the fact that these characters, being whitespace, are often invisible in text editors, whereas squiggly brackets are not, so it is possible to inadvertently mix the two systems within a file, which is sometimes only noticed when someone else then tries to read the mangled document that results.

Missing the point

Such a contentious issue among programmers has unsurprisingly led some of them to blog about their opinions, with some presenting the reasons in favour of spaces and some refuting those reasons and explaining why tabs are better. I must admit that I am on the pro-tab side, but I don’t want to rehash all the old arguments here, especially as any readers of this post have probably already made up their mind.

However, I would like to point out a few lines of thinking that keep cropping up in “pro-space” arguments which either I am just not understanding, or the pro-spacers don’t realise that there is an obvious counterargument against. For example, I have come across pro-space sites which say things like “Nobody can agree on how many spaces a tab character is” or “not every editor uses the same number of spaces per tab character” or “to solve [editors displaying tab characters differently and acting differently when the tab key is pressed] … program your editor to expand TABs to an appropriate number of spaces” (emphasis added), but these arguments don’t seem to mention why a disagreement about the size of tabs should cause practical problems.
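
As a minimal sketch of why the stored format and the displayed width are separate questions, the following Python snippet renders one tab-indented piece of text at three different tab widths. The sample text and the helper names `render` and `indent_widths` are made up for illustration, and `str.expandtabs` is only standing in for whatever a real editor does when it draws a tab character.

```python
# Hypothetical sample: the "on-disk" text, with one tab character per indent level.
SOURCE = "if (ready)\n{\n\tfor (;;)\n\t{\n\t\tstep();\n\t}\n}\n"


def render(text: str, tab_width: int) -> str:
    """Return the text roughly as an editor with this tab width would show it."""
    # expandtabs pads each tab with spaces up to the next multiple of tab_width,
    # restarting the column count after every newline.
    return text.expandtabs(tab_width)


def indent_widths(text: str, tab_width: int) -> list:
    """Displayed width of each line's leading whitespace at this tab width."""
    widths = []
    for line in render(text, tab_width).splitlines():
        widths.append(len(line) - len(line.lstrip(" ")))
    return widths


if __name__ == "__main__":
    for width in (2, 4, 8):
        print(f"--- tab width {width} ---")
        print(render(SOURCE, width), end="")
```

The contents of `SOURCE` never change between the three renderings; only the display does, which is the sense in which each reader can pick a width without touching the file.
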
It is true that no one can agree on how many spaces a tab character should appear as, but as Wichert Akkerman explains, a worse problem would be people using spaces for indentation and not agreeing on how many. If the file contains tabs, then each user can choose for themselves how many spaces this represents, but if someone uses spaces instead of tabs then this choice has been made for you, with seemingly no benefit other than a spurious “Look! Now you agree with me!” Similarly, just as it is true to say that not every editor displays the same number of spaces per tab character, it is equally true to say that every editor displays every tab character in a file the way that the user of that text editor wants. Finally, it is not a problem if the tab key does different things in different text editors, as it always does what the user wants it to, and since there is no consensus on what an “appropriate” number of spaces is, the only thing that is achieved by mandating spaces is to hard-code one person’s set of assumptions into a file.

I don’t want to make exactly the mistake that I am accusing the pro-spacers of by failing to imagine counterarguments to what I am saying, so I have thought hard about what sort of problems could be introduced by different editors displaying tabs differently, but I have only come up with one potential problem. If your code contains columns which you want aligned then you should not do this with tabs as columns implicitly require a fixed number of characters, which cannot be reliably made using tab characters. There is no such requirement, as far as I can see, with normal indenting, though, so as the talented Salvatore Iovene writes, “use what ever you want for indenting, but use spaces for aligning”.

The other line of thinking which is inexplicably prevalent in pro-space writings is best exemplified by a blog post by a Perl fan and a comment on it that I found.
They both seem to blame pro-tabbers for the existence of files which have a mixture of tabs and spaces, without considering the possibility that a pro-spacer could be adding space-indented lines into a file that was written with just tabs. For example, the blog post says “Tab users like to tout that it allows you to use whatever indentation level you like when in reality you just wind up with mixed up tabs and spaces.” but I could write the same sentence starting with “Space users like to tout that it lets you force on people whatever indentation you like…” and this wouldn’t help people choose between the two systems either. More subtly, the comment says “starting line 334 you have a group of lines that are indented with space rather than tabs. You can’t see that because as [the Perl fan] says, tabs are invisible.” but all this proves is that mixed files are difficult to detect. When I read something like “tabs are so hateful… THEY’RE INVISIBLE!” I can’t help wondering what spaces look like in that person’s editor.

Tab width

Given that I think indentation should be stored on disk as tab characters, allowing the user’s editor to decide how to render this information, I do not need to specify a standardised tab width (which is one less thing to argue about). For the sake of completeness, though, I will specify that I personally configure my editor to make tabs take up as much width as 4 spaces. At least part of the reason for doing this is that 4 is a power of 2 (and all significant numbers in computing should be a power of 2) and it makes code look less sparse than 8 space tabs and less cramped than 2 space tabs. This is, however, a matter of preference and I could even put up with reading nervous twitch 3 space indents if I knew that those indents weren’t being saved as spaces.
As for what this actually looks like in my editor, I do actually see little grey » signs that show where I have tabs, which makes it much easier to spot files which have a mixture of tabs and spaces.

Conclusion

Using ANSI style brackets avoids any disagreement about how much to indent those brackets and whether to hug your […]

Given that consistency is the most important thing, then, perhaps a partial solution is possible. It has long been possible in certain editors to store file-specific formatting rules within a given file as a comment, which can be parsed by the editor, but unfortunately these systems have been implemented without compatibility and standardisation in mind. There is no reason why a cross-editor standard couldn’t be written though, and in fact there is one: the Plain Text/Source Code (PT/SC) file header. This system has even been proposed as an Internet-draft, but is sadly lacking implementations. I’m sure someone could file bug reports with the top 10 open source text editors asking them to support this, or even write plugins for them (probably requiring skill in 10 different languages), but that someone would have to have about a year’s worth of free time.

Once all the other editors followed suit and this became a de facto then de jure standard, we could then push for editors to support more exciting standards like elastic tabstops, which would save pro-spacers and pro-tabbers alike from having to bash the space bar just to get their second row of arguments to line up with the end of the function name.

Or instead of getting all the editors in the world to support these improvements, we could just get everyone using the same editor. So, which editor is the best?