Ever thought about writing your own programming language?
One of the most serious decisions you can make as a programmer is to write a new programming language, or to write your own compiler or interpreter for an already defined language. There is a whole host of questions to answer before you even start contemplating such a move. I have made the decision a few times and worked on several other projects that also made it; of those projects, some were fully justified, while others could have been done better with already available languages.
Virtual Machines
For every programming language you have a compiler and something to execute its output (ignoring interpreters for the moment). The compiler outputs either machine code, which is executed directly by a CPU (discussed below), or some form of byte code, which is executed by a virtual machine. The virtual machine is a software-level CPU that reads instructions from a data stream and performs actions as it goes along. The advantage of a VM is that it is cross-platform, which allows languages like Java to be executed across a range of platforms. The VM also simplifies the task of compilation, as it can contain as much functionality as the compiler designer likes. An example of this is memory management: the Java virtual machine uses garbage collection, which simplifies what the compiler has to generate because memory management is handled flexibly at runtime.
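To make the "software-level CPU" idea concrete, here is a minimal sketch of a stack-based VM loop in Python. The opcode names and the byte code layout are invented for illustration and are not taken from the JVM or any other real virtual machine.

```python
# Hypothetical opcodes, invented for this sketch.
PUSH, ADD, MUL, PRINT, HALT = range(5)


def run(bytecode):
    """Execute a flat list of opcodes and operands; return the printed values."""
    stack = []
    output = []
    pc = 0  # program counter: index of the next instruction to read
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            output.append(stack.pop())
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op} at position {pc - 1}")
    return output


# (2 + 3) * 4, laid out the way a simple compiler might emit it.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, PRINT, HALT]
result = run(program)
```

The same `program` list would run unchanged on any platform that has an implementation of `run`, which is the cross-platform property described above.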
Compiling for a CPU
The alternative to running within a virtual machine is to generate op-codes for a specific CPU. The obvious advantage is execution speed. The downside is that at the bare-bones level you have to implement everything yourself (a massive task). You also need to take advantage of whatever CPU you are writing for by playing to its strengths, which normally means having an intimate understanding of the CPU architecture. Today's 80x86 processors (and beyond) are amazingly clever at executing truly terrible code. For RISC-style CPUs the compiler needs to generate output that takes advantage of the available registers and works around any bottlenecks.
One area where working at the bare-bones level is commonplace is the games industry. Ever-increasing demand for better-looking games means that however fast the games machine, there is always a requirement to write at the very lowest level to get that little extra speed. A common practice is to use a standard C/C++ (or other) compiler and then hand-tweak the assembler output afterwards. The latest Sony PS3 has a number of RISC cores which are blindingly fast but have no branch prediction hardware, so they do not execute bad code very well. This is partly the reason that the first batch of PS3 games are only just comparable with the Xbox 360 games. The 360 has three general-purpose CPU cores which run bad code without slowing down unduly, but long term the performance gains Microsoft will be able to achieve will not match those of Sony.
Macro Languages
A macro language (to me at least) is a set of rules that does not make the same demands as a complex programming language. Macro languages tend to take the form of a single instruction followed by some parameters; they may include some rudimentary rules and even some form of subroutines. The advantage of a macro language is that it is quick to implement.
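As a rough illustration of how little is needed, the sketch below runs a line-oriented macro script where each line is one instruction followed by its parameters. The command names (`move`, `wait`, `say`) and the handler table are hypothetical, chosen only for the example.

```python
def run_macro(script, handlers):
    """Run a macro script: one command name per line, followed by parameters."""
    log = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *args = line.split()
        handler = handlers.get(name.lower())
        if handler is None:
            raise ValueError(f"line {line_no}: unknown command {name!r}")
        log.append(handler(*args))
    return log


# Hypothetical commands; a real host would perform the action instead of
# returning a tuple describing it.
handlers = {
    "move": lambda x, y: ("move", int(x), int(y)),
    "wait": lambda seconds: ("wait", float(seconds)),
    "say": lambda *words: ("say", " ".join(words)),
}

script = """
# hypothetical macro script
move 10 20
wait 1.5
say hello world
"""
actions = run_macro(script, handlers)
```

There is no parser to speak of here, only a split on whitespace and a table lookup, which is why this kind of language is so quick to implement.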
The Syntax and first example
Writing a whole new syntax for a programming language is not something I will go into in detail, as it's a subject that could take up several books. In my experience a lot of custom language solutions have based their syntax upon C. The reason is that C is a ‘relatively’ easy (and well known) language to parse. It also allows for quite a range of extension without fundamentally changing the nature of the language.
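The first step in handling a C-like syntax is splitting the source into tokens. The following is a minimal sketch of a tokenizer for a small C-style subset, assuming only integers, identifiers, a handful of operators and four keywords. It covers the lexical surface only and makes no attempt at the full C grammar.

```python
import re

# Token classes for a small, assumed C-like subset (not full C).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|!=|<=|>=|[-+*/=<>]"),
    ("PUNCT", r"[{}();,]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"if", "else", "while", "return"}


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace carries no meaning in this subset
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


tokens = tokenize("if (x >= 10) { y = y + 1; }")
```

A parser for statements and expressions would then consume this token list; that is where most of the real work, and most of the books, begin.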
I worked on one project where we wrote a complete C interpreter that, alongside its own virtual machine, allowed us to change the scripts at runtime. The justification was that design and implementation time for gameplay routines was cut, because the whole game did not need to restart for changes to take effect. Although these goals were met, it was not without its problems. Because the code was being executed by an internal virtual machine, performance started to become a problem. Most games spend a large proportion of their time rendering graphics, and the burden of running an internal VM meant that the graphics had to suffer. It was also found that although changes could be made at runtime, restarts were sometimes still required, because a crash in the VM could take down the rest of the game.
A large email problem
Several years back I was working on a project that required us to deal with 400,000 customers moving from one mail system to another. The move required their email addresses to change, and the company was well aware of the problems that this was going to cause. The biggest problems were the volume of customers and the timescale within which it was meant to happen. I understood from the beginning that the requirements of the system were likely to change and that downtime was not an option.
My solution was to write a simplistic C compiler and virtual machine that was very specific to the kind of actions involved in the project. The VM had access to the email queue: it could view, edit, delete and send out new emails. We had a set of scripts that ran upon the email queue, and these could be stopped and started from a control panel. Although it took two months to write the framework, compiler and VM, the result was that when it came to altering the scripts we could change them without fear of breaking the rest of the system.
As I said, the C was quite simplistic: it did not allow for functions or classes, but it did allow one script to start (or stop) another script. We could have chosen to go with a simple macro language, but the C subset proved its worth during the course of the project, when the complexity of the scripts pushed the limitations of the compiler. If we had gone with a macro language, we would have ended up writing ‘real’ code, which would then have required much more testing.
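The arrangement described above, a host that exposes a few queue operations to scripts which can be stopped and started independently, can be sketched roughly as follows. This is not the original system: the class names, the `view`/`edit`/`delete`/`send` methods, the example domains and the per-email script functions are all assumptions made for illustration, and plain Python functions stand in for compiled scripts.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    to: str
    subject: str


@dataclass
class EmailQueue:
    """Hypothetical stand-in for the mail queue that scripts can reach."""
    pending: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def view(self):
        return list(self.pending)  # a copy, so scripts may delete while iterating

    def edit(self, email, **changes):
        for name, value in changes.items():
            setattr(email, name, value)

    def delete(self, email):
        self.pending.remove(email)

    def send(self, email):
        self.sent.append(email)


class Script:
    """A named per-email action that a control panel can stop and start."""
    def __init__(self, name, action):
        self.name, self.action, self.running = name, action, True


def run_scripts(queue, scripts):
    for email in queue.view():
        for script in scripts:
            if script.running:
                script.action(queue, email)


def migrate_address(queue, email):
    # Rewrite addresses from an assumed old domain to an assumed new one.
    if email.to.endswith("@old.example"):
        user = email.to.split("@")[0]
        queue.edit(email, to=f"{user}@new.example")


def drop_empty_subject(queue, email):
    if not email.subject and email in queue.pending:
        queue.delete(email)


queue = EmailQueue([Email("ann@old.example", "Hi"), Email("bob@old.example", "")])
scripts = [Script("migrate", migrate_address), Script("drop-empty", drop_empty_subject)]
scripts[1].running = False  # stopped from the "control panel"
run_scripts(queue, scripts)
```

Because scripts can only touch the queue through the handful of operations the host exposes, a faulty script is limited in how much of the surrounding system it can damage.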
Reasons why you should or should not write your own language
Good reasons
- You need to embed your language into your project.
- You need a subset of a known language.
- You are writing for a new hardware platform (rare).
- You need an embedded, compile-at-runtime language that is integrated tightly with your product.
- You have a totally unique requirement which would be best met with a new syntax or language.
- You can resell it later.
Bad reasons
- It sounds like fun.
- It is the ultimate programming challenge.
- It will waste a few months.
Comment by Eric B on 28 April 2007:
That it’s fun is a perfectly good reason to write a programming language, or reimplement another. However, don’t charge someone else (your employer or client) for such a privilege.
Comment by Mark on 29 April 2007:
This does not have the ring of truth. There’s a lot of context sensitive crap in the C language. HOWEVER if by C you only mean scopes delimited by curly braces and statements ending with ‘;’ everything else being dynamic and no pointers then I will grant that C is easy to parse…. No that still doesn’t sound right.
Comment by Paul on 7 June 2007:
Is there any point these days? You can embed so many high-level languages now that it seems a waste of time, unless you're just doing it for fun!?