Liblognorm internals

Parse-tree

A parse-tree is generated each time the normalization process is set up.

You could also call it an optimized rulebase. Each message is run through this tree, which consists of parsers and fields, and is compared against it. The message either fits a branch or it does not. If it fits, it can be normalized. If it does not fit any branch in the tree, a fitting sample has to be created for this message.

The tree is built from branches. These branches consist of three things: nodes, paths and parsers.

A node is typically a literal part of a message that is followed either by a parser or by several different subsequent literals, in which case one of the paths must be selected. A parser is always followed by a node. Parsers work like variables and are thus the core structure of a sample. They fill the property fields that are ultimately needed to normalize the message.
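The relationship between nodes, paths and parsers can be illustrated with a small toy model. This is only a sketch and not liblognorm's actual data structure or API: the names Node, add_rule, normalize and PARSERS are hypothetical, the two regex-based parsers are stand-ins, and trying literal paths before parsers is an assumption of this sketch.

```python
import re

# Hypothetical stand-in parsers; the real library ships its own parser set.
PARSERS = {
    "number": re.compile(r"\d+"),
    "word": re.compile(r"\S+"),
}


class Node:
    def __init__(self):
        self.literals = {}     # one character -> child Node (literal paths)
        self.parsers = []      # (field name, parser type, child Node)
        self.terminal = False  # True if a sample ends at this node


def add_rule(root, rule):
    """Add a sample given as a list of literal strings and (name, type) tuples."""
    node = root
    for part in rule:
        if isinstance(part, str):
            for ch in part:
                node = node.literals.setdefault(ch, Node())
        else:
            name, ptype = part
            for n, t, child in node.parsers:
                if (n, t) == (name, ptype):
                    node = child  # reuse an existing parser edge
                    break
            else:
                child = Node()
                node.parsers.append((name, ptype, child))
                node = child
    node.terminal = True


def normalize(node, msg, pos=0):
    """Return the extracted fields, or None if no branch fits the message."""
    if pos == len(msg):
        return {} if node.terminal else None
    # Assumption of this sketch: literal paths are tried before parsers.
    child = node.literals.get(msg[pos])
    if child is not None:
        result = normalize(child, msg, pos + 1)
        if result is not None:
            return result
    for name, ptype, child in node.parsers:
        m = PARSERS[ptype].match(msg, pos)
        if m:
            result = normalize(child, msg, m.end())
            if result is not None:
                return {name: m.group(), **result}
    return None


root = Node()
add_rule(root, ["user ", ("name", "word"), " logged in from port ", ("port", "number")])
fields = normalize(root, "user alice logged in from port 22")
no_fit = normalize(root, "something else entirely")
```

Here fields holds the values for name and port, while no_fit is None because the second message fits no branch, which is the situation in which a new sample would have to be written.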

A few notes on optimization of a parse-tree.

A parse-tree is always optimized, whether or not samples of a similar kind are next to each other. Even if you make the order totally random, it should always result in the same parse-tree. Therefore, no optimization effort has to be spent on the tree itself. It reuses equivalent prefixes of messages that are already in the tree. Only where a difference occurs must a new node follow.
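The effect of prefix reuse can be sketched with a plain character trie. This is a simplified illustration that handles literals only, not the library's implementation; build_tree and the sample strings are made up for the example.

```python
import random


def build_tree(samples):
    """Build a nested-dict trie; shared prefixes end up in shared nodes."""
    root = {}
    for sample in samples:
        node = root
        for ch in sample:
            node = node.setdefault(ch, {})
        node["<end>"] = {}  # marks the end of a sample
    return root


samples = [
    "connection from host closed",
    "connection from host opened",
    "disk full",
]
shuffled = samples[:]
random.shuffle(shuffled)

tree_a = build_tree(samples)
tree_b = build_tree(shuffled)
same_tree = tree_a == tree_b  # insertion order does not change the structure
```

The two "connection from host" samples share one path and only branch at the character where they differ, so same_tree is True for any ordering of the input.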

One case where rule order can be significant is when a message can match two or more different rules. This can occur when the rules differ in parsers. If in doubt, use the lognormalizer tool to debug.
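To see why order can matter in that case, consider a toy matcher that tries candidate rules in the order they were given. This is a hypothetical illustration only: first_match, the rule tuples and the regex parsers are assumptions of the sketch, and it does not claim to reproduce how liblognorm itself resolves such ties.

```python
import re

# Hypothetical stand-in parsers, expressed as regular expressions.
PARSERS = {"number": r"\d+", "word": r"\S+"}


def first_match(rules, msg):
    """rules is a list of (literal prefix, field name, parser type) tuples."""
    for prefix, field, ptype in rules:
        if msg.startswith(prefix):
            m = re.fullmatch(PARSERS[ptype], msg[len(prefix):])
            if m:
                # The first rule that fits wins; later rules are never tried.
                return {field: m.group(), "parser": ptype}
    return None


number_first = [("id=", "id", "number"), ("id=", "id", "word")]
word_first = list(reversed(number_first))

# "id=42" fits both rules, so the winning parser depends on rule order.
result_a = first_match(number_first, "id=42")
result_b = first_match(word_first, "id=42")
```

Both calls extract the same value, but result_a reports the number parser and result_b the word parser, which is the kind of ambiguity worth checking with the debugging tool.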