I Need to Make Some Space in my Head – What can I Throw Out?
This is the second article in an informal series of shock exposés in which I lift the lid on the white-knuckle ride which constitutes the daily rough and tumble of the E-Quip gerontocracy. Over the years I have written a lot of code, a fair proportion of which seems to work, and somehow or other I seem to remain in control of all the 0’s and 1’s floating around in my head. There are, however, two exceptions – bits of code which, while not exceptionally complex, I seem to have to re-learn every time I look at them.
The first example is Transformation Matrices, which are central to our 3D graphics and CAD applications. In case you don’t know, if you represent a point in space as a 4-element column vector (called homogeneous coordinates), there are several matrices that you can multiply this vector by to do things like rotation, scaling, transformation etc. This is how CAD systems manipulate graphical objects. It’s not rocket science and doesn’t really involve anything more complex than matrix multiplication. Twenty or more years ago I wrote a graphics library to handle all this good stuff and, for the most part, it just works. It doesn’t need changing very often: once to add support for Windows (that will give you an idea how long this library has been around), again to add support for 32-bit processors (did I mention that this code has been around for a while?), once to convert it from C to C++, once to add support for 3D, and again to convert it to C#. The trouble is, every time I look at this code, it’s like it’s the first time I’ve ever seen it! I don’t just have to “brush up” on my theory, I have to dive back into Foley & Van Dam or reach for my old university lecture notes (yes, I do have all of my old lecture notes). These algorithms (or, perhaps more accurately, how I turned them into code over 20 years ago) just won’t stay in my head.
The second example is expression evaluation, which lies at the heart of filtering in all of our products. There is absolutely nothing difficult about parsing an expression from a given grammar; it’s a pretty common 1st-year undergraduate assignment which comes somewhere between the matriculation dinner and finding your coat-hook and locker.
A grammar defines a set of rules which are used to build an expression tree. Grammars are often described using BNF ( Backus – Naur Form), and they specify things like how factors are combined to form terms, and how terms combine to form expressions. A very simple example is shown below. Of course, being software developers we have to be different, and in the interests of bio-diversity, our trees have their roots at the top and their leaves at the bottom. I have yet to work out where worms and/or birds appear in these trees, yet alone which way up the nests need to go.
Each node of this tree has three key features:
1. An operator (+, – etc)
2. Two operands, Op1 & Op2
A really simple expression, like “1 + 2” would be represented by a single node tree where the operator is +, Op1 is 1 and Op2 is 2. In more complicated trees like the example above, both Op1 & Op2 can be other tree nodes.
Filter Expressions
A filter is also an expression, albeit a Boolean rather than an arithmetic one, and so they are handled in a very similar way. The tree is slightly different to a simple arithmetic expression because the operands are predicates, i.e. statements which can evaluate to true or false. The tree below is taken from the ICL Reference Documentation (ICL is a set of class libraries used for all Integra products): it is the parse tree for the filter: A = B OR ( A = 5 AND B = 3)
Note that brackets can be used to change the way that the tree is built, but do not actually appear in the tree itself. Turning this parse tree into SQL (that’s the language used to get data out of the database) is simply a case of doing a recursive, left-right tree walk of our upside-down tree, starting from the root (i.e. the bit nearest the sky) translating each node into SQL syntax as we go.
On the off-chance that you’re interested, compilers translate entire software programs into parse trees in exactly the same way as this, the only difference is that the grammar is more complex, with each programming language having its own grammar. Compilers often have a front end which builds the parse tree, and then a back end which does the tree-walk to turn the tree into assembler instructions.
Now, every time I have to make any changes to the code which implements this, I run straight into the old mental block and have to go right back to basics. Fortunately this isn’t code that changes very often, but, like transformation matrices, there just doesn’t seem to be any room for this in my head on a long-term basis.
What’s the Problem?
The thing that makes this complicated isn’t creating the parse tree itself; that really is pretty simple stuff. There are three things about filters that make them complicated-ish:
1. They use public names for data, rather than the names that are actually in the database. The predicate [Status = ‘Active’] must be translated into “EquipmentStatusShortName = ‘Active’“. This means that when we do the left-right tree walk (keep up!) to translate the filter into SQL, each public name has to be looked up in the database to find its equivalent database name.
2. They are defined in client-side code and have to be sent to the server. E-Quip uses a type of formatted text called XML to communicate between the client and the server, and unfortunately there are characters (<, > and ‘ are the most common) that have special meaning in XML but which are commonly used in filters. An expression like [Unit Price > 5000] can’t be sent directly to the server because XML is using the > character for its own purposes.
3. The filter processing code itself has many reserved words and characters which users also might want to include in filters. For example, the filter [Status = ‘Scrapped – Not in use’] actually includes the words “Not In” which have special meaning to the filter mechanism. We also use the [ ] characters to delimit predicates which makes building the parse tree much easier, but which cause problems if users actually want to include those characters in filter text. To make things even trickier, [ & ] are actually SQL wildcards: “Model LIKE 3[1-5]00” means all models where the name has a 3 followed by any number between 1 and 5, followed by 00, i.e 3100, 3200, 3300 etc.
Problems with BETWEEN
In fact, it is exactly this last point that has prompted me to write this article. A filter expression is a series of terms separated by OR’s, each of which can be a series of factors separated by AND’s, just like an arithmetic expression is a series of terms separated by + and -. You might not have noticed, but in version 2.0 support for the BETWEEN operation was added to E-Quip filters. Now, most predicates are made up of three bits: a field name to filter on (e.g. Status), an operator (e.g. =) and a comparison value (e.g. Active). Put these three together and you get the filter [Status = ‘Active’].
BETWEEN is unusual in that it doesn’t have three components, it has four, and the last two are separated by the word AND, e.g. [Purchase Date BETWEEN 1-JAN-2010 AND 31-DEC-2010]. This is particularly inconvenient, not only because of the extra operand, but we have already seen how the word AND is used to separate the factors in a filter expression. Although this doesn’t cause any problems when evaluating the filter, it does confuse the Filter Wizard when you open a saved Query Builder filter which contains BETWEEN. This relatively simple type of filter is always made up of a number predicates ANDed together, and the Filter Wizard uses the word “AND” to split a filter expression up into its constituent elements. It then writes each factor to a separate line in the Query Builder. This extra AND convinces the Query Builder that the expression has two factors:
1. Purchase Date BETWEEN 1-JAN-2010
2. 31-DEC-2010
Neither of these terms is valid according to our grammar, and if you applied this filter you would get an error. Note that the problem only appeared in the Query Builder and not in any of the other Filter Wizard options, because only the Query Builder attempts to split an expression into individual terms.
The solution to the problem is pretty straightforward: because we are in charge of the translation we can use whatever words we like. So, instead of using BETWEEN … AND, I just changed the code to use FROM … TO. Hey presto – the extra AND has disappeared and sanity has been restored. We do a very similar thing for the LIKE predicate. Strictly speaking the opposite of LIKE is NOT LIKE but that extra word is problematic and makes the creation of the parse tree a tad more complex, so we use the word UNLIKE instead (try it – create a filter using the DOES NOT CONTAIN comparison and look at the filter that is produced).
So, it took me 5 minutes to come up with a solution to the problem, and then the rest of the day to dig through the documentation (yes, we do have documentation) to remind myself how this stuff all hangs together and to check to see if my simple fix would introduce other problems.
So that’s it for filters – until the next time. By the time I have clicked on Publish to put this article on the blog, everything that I have just re-taught myself about about filter expressions will have been flushed from my brain yet again. Perhaps I should be glad that there are only two things that seem not to fit in there!