Saturday, November 29, 2014

Why Static Code Analysis is important... and why we don't care

Static code analysis is crucial today because software is much bigger than it has ever been. It matters even more because many people judge what "better code" means based on their previous experience, and they don't agree on what "better" means.

Given this, it is very likely that a static code analysis tool will help you a lot to see these errors, and to see them early, sometimes even as you type (like ReSharper, also written as R#, an excellent C# extension from JetBrains).

But what is the main reason companies don't use them (or not at a large scale)? In my usage of R# (and I think it is similar for many other companies and tools), I very often found more systemic causes:
- (some) developers don't like to be nagged with a lot of errors/warnings when they want to finish a task fast. Variables that are not named consistently, or a namespace that doesn't follow the location on disk, are minor concerns to them. Even more, some libraries ship with deprecated fields, and tools like R# will warn you that the code is broken (or at least uses obsolete fields), which is a distraction given that users will not change libraries which are "legacy"
- these tools offer solutions which are sometimes strange at best, or even harder to understand for the user who doesn't understand/accept the technology. For example, Linq syntax in C# is weird for users who don't know it; it applies best coding practices and makes your code less prone to side-effect errors, but the developers need to know Linq (see the example after this list)
- there is an upfront cost to pay before these tools stop complaining (much), but some companies focus on: let's implement this feature, then let's implement the next feature. When you say "I want to clean the code based on a tool", it is a very hard sell. The question is more likely: "will this code work better/faster after this?", and answering honestly (like: "no, it will just make the code more readable and less likely to have bugs in the future") will not change the decision.
- they are fairly costly. Price is always relative, but if you buy a license for every developer, things get expensive really fast, and there is no visible advantage. Sometimes the tool is an easier sell if you say "we need a better refactoring tool", and management may understand that (as R# is a very good refactoring tool), but otherwise it is a hard sell to management.
- small codebases very likely don't need these tools; the bigger the codebase, the more likely you will need these kinds of checks. The reason is that in a small codebase you can keep everything the tool checks in your head, and even if you forget a null pointer check, it will be found early. If you write a tool that, for example, processes a batch of pictures and produces a document, you may need very few refactorings; most likely the default Visual Studio facilities are enough.
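
To make the Linq point concrete, here is a hypothetical refactoring of the kind such a tool may suggest (the Person class and the people collection are made up for illustration):

    // Assumes: using System.Collections.Generic; using System.Linq;
    // Person and people are hypothetical, for illustration only.

    // Imperative version: a mutable list filled in an explicit loop.
    var adults = new List<Person>();
    foreach (var person in people)
    {
        if (person.Age >= 18)
            adults.Add(person);
    }

    // Equivalent Linq version a tool may suggest: no mutable
    // intermediate state, but opaque to developers who don't know Linq.
    var adultsLinq = people.Where(p => p.Age >= 18).ToList();

Both produce the same list; the second is shorter and avoids the mutable intermediate state, but it only reads naturally once you know Linq.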

As for me, when I was a single freelance developer paid (mostly) per finished feature, a license for R# or Visual Studio was very easy to justify: the full navigation of ReSharper, and not having to hunt for all the weird bugs in my code myself, was a price I could justify (to myself).

But later, when I worked in bigger companies, I did not see people using R#. The closest thing to R# is NRefactory, which brings a similar (but much buggier) experience with many, many limitations. So if you don't mind working with the much buggier experience of Xamarin Studio or SharpDevelop, please go for it, at least to learn what this static code analysis is about!

What I suggest to you anyway is to use static code analysis, not least because it is easier to let a tool check whether you forgot a null pointer check than to do it yourself, or, even worse, to have the user notice it on their machine.
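
As a minimal illustration of the kind of bug such a tool flags (a hypothetical snippet; Customer and Address are made-up types):

    // Assumes: using System.Collections.Generic;
    // A static analysis tool can warn that 'customer' may still be
    // null here, long before a user hits a NullReferenceException.
    public string GetCity(Dictionary<string, Customer> customers, string id)
    {
        Customer customer;
        customers.TryGetValue(id, out customer); // may leave customer null
        return customer.Address.City; // possible null dereference
    }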

Monday, November 3, 2014

Poor Man's Parser in C#

I am sure that anyone dreaming about writing a compiler will look at least once into parser generators or similar tools. But some may find them daunting, and worst of all, some of them are very hard to set up and debug.

So how do you write your own parser, one which is easy to write and maintain?

Ok, so first of all, please don't blame me if you care about a performant parser engine or you want to do it the "correct way". Also, I found this strategy to work with somewhat simple languages (let's say Pascal, MQL (MetaTrader's trading language), C, and things like them).

The parser's duty is to build a tree based on your tokens.

Steps:
- write a lexer that gives a list of tokens. You can start from this implementation. It should be very easy to change it, for example so that some IMatcher-s accept specific words (a minimal sketch of a token class is shown after this list)
- create a tree Node with an arbitrary number of children, as follows:
    // Assumes: using System.Collections.Generic;
    // TokenDef is your token class, produced by the lexer.
    public class BlockAstNode {
        // Child nodes, in source order.
        public List<BlockAstNode> Items = new List<BlockAstNode>();

        public TokenDef InnerToken { get; set; }
        public bool IsTerminal { get; set; }

        // Non-terminal node: a block that groups other nodes.
        public BlockAstNode()
        {
        }

        // Terminal node: wraps a single token from the lexer.
        public BlockAstNode(TokenDef token)
        {
            InnerToken = token;
            IsTerminal = true;
        }

        // Folds the children in [startRange .. endRange] into a new
        // non-terminal node and puts that node in their place.
        public BlockAstNode BuildNonTerminal(int startRange, int endRange)
        {
            var result = new BlockAstNode();
            for (var i = startRange; i <= endRange; i++)
            {
                result.Items.Add(Items[i]);
            }
            Items.RemoveRange(startRange, endRange - startRange + 1);
            Items.Insert(startRange, result);
            return result;
        }
    }
- fill a "tree" with one root and as children create AstNode children for every token (using the constructor with TokenDef, where TokenDef is your token class)
- make a brace folding pass. Most major languages support a folding strategy: in Pascal it is "begin .. end", in C it is { to }, and for Ruby/VB it is <class> to <end> or <def>..<end>. Every time you find a block that starts with your begin-brace token, you fold it using the BuildNonTerminal method, and the resulting node is placed instead of the full range. For simplicity, find the smallest block that can be folded, and repeat until you cannot fold anymore (see the folding sketch after this list)
- add more folding strategies for common cases: "fold all comment tokens"
- add folds for reserved-word tokens: class, while, if, (interface/implementation); these are separate iterations that visit the tree recursively and simplify the major instructions
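
For the sketches below, assume a minimal, hypothetical shape for the TokenDef token class; a real lexer would carry more information (position, matched rule, and so on):

    // A made-up, minimal token class, just enough for the sketches
    // below; your lexer's real TokenDef will be richer.
    public class TokenDef
    {
        public string Text { get; set; }   // the matched source text
        public string Kind { get; set; }   // e.g. "word", "brace", "comment"
    }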
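
Filling the tree with one root and a terminal child per token is then just a few lines (tokens here is whatever list your lexer returned):

    // One root node; every token becomes a terminal child of it.
    var root = new BlockAstNode();
    foreach (var token in tokens)
    {
        root.Items.Add(new BlockAstNode(token));
    }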
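
And here is a minimal sketch of the brace-folding pass, assuming the BlockAstNode and TokenDef above; isBegin and isEnd are hypothetical predicates that recognize your begin/end tokens:

    // Assumes: using System;
    // Repeatedly folds the smallest "begin .. end" range among the
    // children of 'root' until no begin/end pair is left.
    public static void FoldBraces(BlockAstNode root,
        Func<TokenDef, bool> isBegin, Func<TokenDef, bool> isEnd)
    {
        var folded = true;
        while (folded)
        {
            folded = false;
            var lastBegin = -1;
            for (var i = 0; i < root.Items.Count; i++)
            {
                var node = root.Items[i];
                if (node.IsTerminal && isBegin(node.InnerToken))
                {
                    lastBegin = i; // remember the innermost begin so far
                }
                else if (node.IsTerminal && isEnd(node.InnerToken) && lastBegin >= 0)
                {
                    // smallest block: innermost begin .. first end after it
                    root.BuildNonTerminal(lastBegin, i);
                    folded = true;
                    break; // restart the scan on the modified list
                }
            }
        }
    }

For C-like braces you would call it as FoldBraces(root, t => t.Text == "{", t => t.Text == "}"); the comment and reserved-word folds follow the same pattern, each as its own iteration over the tree.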

Why does this work better than regular parser generators?
- every strategy is very easy to debug: you add multiple iterations, and if you have a crash in any of them, it is most likely a mismatched index (off by one, plus or minus), with no ugly generated code to wade through

Enjoy writing your new language!