This is an opinion piece by one of our employees, and not an official statement of Originate. At Originate we celebrate diversity of opinions around technology as one of our greatest strengths, and we encourage our employees to share their ideas on our blog.
These are amazing times to be a software developer. We have access to a vast and multi-faceted ecosystem of very well thought out programming languages crafted by masters. More are added regularly, and they are typically free to use and adapt.
As somebody who looks at a lot of source code every day, needing to evaluate its essence, architecture, and structure quickly, I have learned that less is often more when it comes to syntax. In my opinion, this is especially true for infrastructure elements like parentheses, curly braces, or semicolons. This blog post is an attempt at comparing their costs and benefits somewhat objectively.
Before we dive into this, let’s remind ourselves that the #1 reason for using a programming language is enjoying using it and getting stuff done. Shipping things. Most languages achieve that: truly awesome software has been written in many different stacks, and that’s a very good thing. With that said, if you are into a bit of recreational bikeshedding about programming language syntax, and you don’t just want to stick to what you and others are used to, read on!
Up front, here are some simple rules that I think we can all agree on, and that we will use to judge the different approaches:
- simpler is better: if there are two alternatives that provide more or less the same functionality, and one is simpler, easier, or smaller, then that is the preferred alternative
- robustness: the alternative that provides less rope for people to hang themselves with, i.e. fewer possibilities to mess up, wins
- the signal-to-noise ratio of code should be high: most of the source code should describe business functionality. Boilerplate (code that is just there to make the machine work) should be minimal. Said another way, code should express developer intent as directly as possible, without too much infrastructure around it.
Curly braces replaced the even older begin and end statements in Algol, Pascal, and friends. That was a big step forward, since it saves a good amount of typing without losing expressiveness. The question is: are curly braces the final step, or can we optimize things even more here?
Let’s look at some code.
What does this snippet do:
Did you spot the typo? Hard to read, right? That’s because this code relies solely on delimiters like curly braces, parentheses, and semicolons to describe code structure. Let’s format this in a more human-friendly way by writing each statement on its own line and adding indentation:
So much more readable! And it’s much easier to see that a closing curly brace is missing at the end. This experiment demonstrates that people use indentation as the primary mechanism to infer code structure, and use curly braces only as the backup, in edge cases like when the indentation is messed up and not obvious.
Braces also introduce extra categories of errors. Forgetting to close a curly brace will cause a program to fail to parse or to behave in unexpected ways, even though the actual statements in it (the code that computes arguments into results) are sound. Indenting code correctly but misplacing curly braces results in code that does something other than what most programmers would expect when only briefly looking at it.
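To make the missing-brace error concrete, here is a minimal sketch in Python. The one-line snippet is a hypothetical stand-in (an assumption, not the original example from this post), and `unclosed_braces` is a made-up helper that simply counts braces and ignores strings and comments:

```python
# Hypothetical brace-style snippet (an assumption, not the post's original code),
# written on one line with the final closing brace left out.
ONE_LINE = "function max(a, b) { console.log(a); if (a >= b) { return a; } else { return b; }"


def unclosed_braces(source: str) -> int:
    """Return the net number of opening curly braces that are never closed.

    This is a naive character count: braces inside strings or comments
    would be counted too.
    """
    depth = 0
    for char in source:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
    return depth


print(unclosed_braces(ONE_LINE))         # 1: the function body is never closed
print(unclosed_braces(ONE_LINE + " }"))  # 0: balanced again
```

A parser has to do this bookkeeping to find the mistake. A human reading the one-liner has to do the same count in their head.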
Code with curly braces still has to be indented properly. When we do that, this code uses two different ways of expressing code structure: indentation and curly braces. Humans mostly look at the indentation, and “filter out” the curly braces. Parsers look only at the braces and ignore indentation. Because both of these stakeholders should always agree 100% on what the code is supposed to do, these two ways of expressing code structure must always be in perfect sync with each other. As an example, this code here compiles and runs fine, but is misleading and hard to maintain since it is improperly formatted, i.e. the indentation is wrong:
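A rough sketch of this kind of mismatch, again in Python with a hypothetical brace-style snippet held in a string (an assumption, not the original example). The made-up helper `misindented_lines` compares what the braces say with what the indentation says, assuming 2 spaces per level:

```python
# Hypothetical snippet (an assumption, not the post's original): the braces are
# balanced, but line 5 is indented as if it belonged to the "if" block.
MISLEADING = """\
function max(a, b) {
  if (a >= b) {
    return a;
  }
    console.log(b);
  return b;
}
"""


def misindented_lines(source: str, indent: int = 2) -> list[int]:
    """Return 1-based numbers of lines whose indentation disagrees with brace depth."""
    depth = 0
    wrong = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        # A line that starts with a closing brace belongs to the outer level.
        expected = depth - 1 if stripped.startswith("}") else depth
        actual = len(line) - len(line.lstrip(" "))
        if actual != expected * indent:
            wrong.append(number)
        depth += stripped.count("{") - stripped.count("}")
    return wrong


print(misindented_lines(MISLEADING))  # [5]
```

This is essentially the check a formatter or linter performs: it reconciles two descriptions of the same structure.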
So if indentation is the most important aspect for humans, humans filter out curly braces most of the time, and any time indentation differs from code structure we end up with misleading code, what would happen if we got rid of curly braces altogether and only used indentation?
This is still clear and unambiguously structured code, understandable by both humans and parsers. It is also more compact, both horizontally and vertically, i.e. it uses fewer lines. It is clearer, and it avoids a number of issues like forgotten opening or closing curly braces, or the question of whether the opening curly brace should be on a separate line. Because inconsistent indentation is a parse error in whitespace-sensitive languages, we can rest assured that the indentation always matches the structure the parser sees, and we don’t need formatters or linters to correct it for us, which people might or might not run at all times.
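Python is one such whitespace-sensitive language, so the point can be sketched directly. The function `larger` and the helper `parses` are hypothetical names chosen for this illustration:

```python
# Hypothetical example: structure is expressed by indentation alone.
GOOD = """\
def larger(a, b):
  print(a)
  if a >= b:
    return a
  return b
"""

# The same code with one inconsistently indented line.
BAD = """\
def larger(a, b):
  print(a)
    if a >= b:
    return a
  return b
"""


def parses(source: str) -> bool:
    """Return True if Python accepts the source, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(parses(GOOD))  # True
print(parses(BAD))   # False
```

Note that this only catches indentation that is inconsistent. Indentation that is consistent but not what the author intended still parses, just as a misplaced brace would.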
At best, when used correctly, curly braces are just there, don’t add much readability, and get filtered out by humans, since readability is mostly provided by indentation. At worst, when curly braces don’t match indentation, human developers can be seriously misled. While proper indentation is a necessity, as we have seen above, curly braces are redundant and unnecessary. That’s why we call them line noise. They are just more stuff to type, more stuff to verify, and exist mostly to satisfy our habits at this point. They are a legacy, and according to our rules above we are better off simply leaving them out moving forward.
Semicolon to terminate lines
What is the difference between these two code blocks?
Nothing. Nobody needs semicolons at the end of lines. Nobody misses them if they aren’t there. Their only use is to separate multiple expressions on the same line. So they should be optional.
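Python happens to treat semicolons this way, which allows a small sketch of the claim. The three source strings are made up for this illustration, and `tree` is a hypothetical helper around the standard `ast` module:

```python
import ast

WITH_SEMICOLONS = "a = 1;\nb = 2;\nprint(a + b);\n"
WITHOUT_SEMICOLONS = "a = 1\nb = 2\nprint(a + b)\n"
SAME_LINE = "a = 1; b = 2; print(a + b)\n"


def tree(source: str) -> str:
    """Dump the parsed syntax tree without line or column positions."""
    return ast.dump(ast.parse(source))


# Trailing semicolons leave no trace in the parsed program.
print(tree(WITH_SEMICOLONS) == tree(WITHOUT_SEMICOLONS))  # True

# Their remaining job: separating several statements on one line.
print(tree(SAME_LINE) == tree(WITHOUT_SEMICOLONS))  # True
```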
Parentheses around expressions
Next, what about parentheses around expressions? Let’s remove them from our example code block and see what happens:
It’s still clear that a and b are parameters to the function, that we log a on the next line, and then check if a is larger than or equal to b.
Similar to curly braces, parentheses are a redundant way of expressing code structure and developer intent. The real world isn’t always as simple as this example, so more complex code can definitely benefit from parentheses to group sub-expressions. But there seems to be no need to require them in simple cases. Let’s make them optional, so that we can simplify our code where possible, without giving up the ability to structure more complex expressions unambiguously.
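A small Python sketch of the difference between redundant and meaningful parentheses. Python only makes them optional around conditions (function calls still require them), so this illustrates just part of the argument. `same_program` is a hypothetical helper:

```python
import ast


def same_program(first: str, second: str) -> bool:
    """Return True if both sources parse to the same syntax tree."""
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))


# Parentheses around a simple condition do not change the parsed program.
print(same_program("if (a >= b):\n  x = a\n", "if a >= b:\n  x = a\n"))  # True

# Parentheses that group sub-expressions do change it, so they must stay available.
print(same_program("x = a - (b - c)\n", "x = a - b - c\n"))  # False
```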
The most widely used ways to add comments to source code are via C-style delimiters such as /* */, and via a single comment character. Let’s look at C-style comments first:
Now let’s look at comments via a single character:
Both code snippets do the same thing and are equally readable. The first version uses 19 comment characters and requires indenting subsequent lines in multi-line comments with a space (which I count as a character here, since it needs to be typed and verified as well). The second version uses only 7 comment characters, needs no indentation, and results in fewer lines of code.
According to our rules the second version wins.
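The same kind of counting can be sketched in Python on a made-up two-line comment. The comment text, the use of # as the single comment character, and the helpers `block_comment`, `hash_comment`, and `overhead` are all assumptions for this illustration. This sketch also counts the space after each marker, so its numbers are not comparable to the counts above:

```python
def block_comment(text_lines: list[str]) -> str:
    """Render the text as a C-style block comment, one ' * ' prefix per line."""
    body = [f" * {line}" for line in text_lines]
    return "\n".join(["/*", *body, " */"])


def hash_comment(text_lines: list[str]) -> str:
    """Render the text with a single-character marker on every line."""
    return "\n".join(f"# {line}" for line in text_lines)


def overhead(rendered: str, text_lines: list[str]) -> int:
    """Count rendered characters (newlines excluded) that are not comment text."""
    rendered_chars = sum(len(line) for line in rendered.splitlines())
    text_chars = sum(len(line) for line in text_lines)
    return rendered_chars - text_chars


TEXT = ["Returns the larger", "of two numbers."]  # hypothetical comment text

print(overhead(block_comment(TEXT), TEXT))  # 11
print(overhead(hash_comment(TEXT), TEXT))   # 4
print(len(block_comment(TEXT).splitlines()), len(hash_comment(TEXT).splitlines()))  # 4 2
```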
Spaces vs Tabs
The arguments for using tabs to indent code are:
- it saves characters (1 tab vs 2 or 4 spaces), making the source code file smaller
- it allows different people to indent their code by different amounts, by configuring the tab size of their text editors
- it avoids bikeshed debates about how deep code should be indented
The first argument comes from limitations of computer systems 60 years ago and is no longer valid. The arguments against tabs are:
- the default tab size (8 spaces) is clearly too much. This means EVERY PERSON in the world who looks at code now has to configure EVERY TOOL and EVERY WEBSITE they use to write or view code to their preferred tab size. On EVERY MACHINE they use.
- there are tools that don’t allow configuring the tab size, for example many websites with embedded code snippets or many low-level terminal commands. These are often used to display source code.
- the tab character is hard to insert into input fields, for example in search-and-replace forms, since it is also used to switch to the next input field. People often have to copy-and-paste a tab character from their source code in order to search for it.
- not all keyboards have a TAB key. For example most keyboards on mobile devices are lacking it. Mobile devices play a larger and larger role in our lives, including in software development. I review a good amount of code on my mobile device, and code reviewers sometimes need to provide code snippets as examples.
- the tab character was not invented to indent source code. It exists to make it easier to format numerical content into columns and tables using tab stops. There are no columns, tables, or tab stops in source code.
- Formatting using tabs doesn’t support many ways of formatting code in readable ways.
Let’s look at a few examples around the last point. One is formatting function calls with larger lists of complex parameters:
This code draws a circle and calculates the parameters for it inline. Because the arguments are so complex, we want to put each one on a separate line to separate them visually. Putting them behind the function name makes it visually clear that they are parameters for this function. The equivalent code using tabs needs to move all arguments to the next line:
The problem with this way of formatting is that the arguments to the function call now look too much like an indented code block. This is confusing, especially if there is also an indented code block right below it.
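The two layouts can be sketched in Python. `draw_circle` and the variables are hypothetical stand-ins (not the code originally shown here), and the second call uses a fixed 4-space step to play the role of a tab:

```python
def draw_circle(x, y, radius):
    """Hypothetical stand-in for a drawing call; it just returns its arguments."""
    return (x, y, radius)


width, height, margin = 200, 100, 10

# Space-based alignment: the arguments line up behind the function name.
aligned = draw_circle(width / 2 + margin,
                      height / 2 + margin,
                      min(width, height) / 2 - margin)

# Fixed-step indentation: the arguments move to the next line and look
# like the body of a block.
stepped = draw_circle(
    width / 2 + margin,
    height / 2 + margin,
    min(width, height) / 2 - margin,
)

print(aligned)             # (110.0, 60.0, 40.0)
print(aligned == stepped)  # True
```

Both calls are identical to the interpreter. The difference is purely in how quickly a reader recognizes the lines as arguments.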
Another – quite common – use case where tab-based formatting falls short is method chaining. With spaces, the code can be formatted nicely and clearly:
This makes it clear that we do a bunch of things with a single object. With tabs, we again have to move the call chain to the next line, making it look too much like a nested code block:
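A Python sketch of both chaining layouts, using a made-up string and standard `str` methods as a stand-in for the original example. Python needs the surrounding parentheses to continue a chain across lines, which is an artifact of the language chosen for this sketch:

```python
title = "  Minimal Syntax  "  # hypothetical input

# Space-based alignment: each call lines up under the first one.
aligned = (title.strip()
                .lower()
                .replace(" ", "-"))

# Fixed-step indentation: the chain moves to the following lines and
# resembles a nested block.
stepped = (
    title
    .strip()
    .lower()
    .replace(" ", "-")
)

print(aligned)             # minimal-syntax
print(aligned == stepped)  # True
```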
Based on these examples, we can conclude that using tabs to indent source code may sound good in theory, but only works for simple code in environments where only a few tools are used.
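The tab-size concern from the list above can also be sketched with Python’s built-in `str.expandtabs`, which renders a tab the way a viewer with a given tab size would. The sample line and the helper `rendered_indent` are made up for this illustration:

```python
# One source line indented with a single tab character.
LINE = "\treturn a"


def rendered_indent(line: str, tab_size: int) -> int:
    """Return how many columns of indentation a viewer with this tab size shows."""
    expanded = line.expandtabs(tab_size)
    return len(expanded) - len(expanded.lstrip(" "))


# The same file looks different depending on each tool's configuration.
for tab_size in (2, 4, 8):
    print(tab_size, rendered_indent(LINE, tab_size))
# 2 2
# 4 4
# 8 8

# The same line indented with spaces has one fixed width everywhere.
print(rendered_indent("  return a", 8))  # 2
```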
Let’s also evaluate the pros and cons of using spaces. Pros:
- code looks exactly the same in all tools and environments
- more flexibility around expressing complex code structures elegantly
- works with all devices and keyboards
Cons:
- opens up bikeshed debates about how many spaces to use to indent code
- opens up bikeshed debates around what “elegant” code formatting looks like
- formatting code with tabs feels a bit more semantic in nature, while using spaces feels more presentational
To finish this, let’s talk about how many spaces should be used to indent code. The most common candidates are 2 and 4, since we can (hopefully) agree that 8 is too large and 1 is not enough. Given that many communities like to limit the width of code (to something like 80 or 100 characters), 2 seems preferable since it leaves more horizontal space for code. So the question is, are 2 spaces enough indentation? Let’s look at our initial example to find out:
Both styles work. 2 spaces uses less horizontal space, so it wins.
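As a quick sketch of that comparison in Python: the two source strings are hypothetical renderings of the same function, and `deepest_indent` is a made-up helper that measures how much horizontal space the indentation consumes:

```python
import ast

TWO_SPACES = """\
def larger(a, b):
  print(a)
  if a >= b:
    return a
  return b
"""

FOUR_SPACES = """\
def larger(a, b):
    print(a)
    if a >= b:
        return a
    return b
"""


def deepest_indent(source: str) -> int:
    """Return the largest number of leading spaces on any line."""
    return max(len(line) - len(line.lstrip(" ")) for line in source.splitlines())


# Both versions are the same program to the parser.
print(ast.dump(ast.parse(TWO_SPACES)) == ast.dump(ast.parse(FOUR_SPACES)))  # True

# The indentation cost doubles with every nesting level.
print(deepest_indent(TWO_SPACES), deepest_indent(FOUR_SPACES))  # 4 8
```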
Following our rules has led us on a journey into minimal syntax, i.e. syntax that minimizes line noise and tries to get out of the way to let the business logic and developer intent shine through better. This is something that most of us deal with every day, but often don’t spend much time thinking about. Many modern languages (i.e. those designed in the last two decades) are adopting many of these elements in some form, with certainly more to come in the future.
Hopefully this was an enjoyable read, and some food for thought when you decide which language to learn next, or design your own language or data format. No matter what language you end up using, happy coding! :)