Blog

  • liblognorm 0.1.0 has been released

    Liblognorm is an event and log normalization library that is capable of real-time processing. It provides the capability to normalize events into a set of standard formats. It works best with largely unstructured data, as for example found in typical syslog messages. While liblognorm provides a service similar to other projects, it is unique in the way it works:

    a) As a library, its services can be used by any third-party application with ease. As such, other projects can benefit from liblognorm’s functionality without the user even knowing.

    b) Liblognorm is very fast, in particular much faster than regular-expression based solutions. This is possible because it uses a special data structure which (in a sense) combines many regular expressions into a single, and thus faster, big one.

    This is the initial public release, targeted at early adopters. We will continue to enhance it considerably, but the 0.1.0 version already offers decent stability and features, so it can actually be used.

    You can download it here.

  • liblognorm 0.1.0

    Download file name: liblognorm 0.1.0
    md5sum: 72d90438b21e23805a0ab9312916782d

    Author: Rainer Gerhards (rgerhards@adiscon.com)
    Version: 0.1.0
    File size: 0.307 MB

  • Rulebase

    A rulebase is a collection of rules. The rulebase is the core of the normalization process, as it holds the information that is needed to transform logs into a common format.

    Upon execution of the normalizer, the rulebase is transformed into a parse tree.
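
    As a purely illustrative sketch, a minimal rulebase file could consist of just a few rule lines (the message formats below are invented; the actual syntax is described in the “Creating a rulebase” post on this page):

    rule=:%date:date-rfc3164% %host:word% sshd[%pid:number%]: Accepted password for %user:word%
    rule=:%date:date-rfc3164% %host:word% sshd[%pid:number%]: Failed password for %user:word%

    Each line describes one message format; together they form the rulebase that the normalizer turns into its parse tree.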

  • Rule

    A rule is a scheme that matches one specific type of log message from a specific device. It consists of multiple fields, which reflect the type of information contained in the message. Many of these rules together build the rulebase.

  • log normalization with rsyslog

    We just wanted to give you a quick heads-up on our current development efforts: We have begun to work heavily on a message modification module for rsyslog which will support liblognorm-style normalization inside rsyslog. In git there already is a branch “lognorm”, which we will hopefully complete and merge into master soon. It provides some very interesting shortcuts for pulling specific information out of syslog messages. We will probably promote it some more when it is available. IMHO it’s the coolest and potentially most valuable feature we have added in the past three years. Once we have enabled tags in liblognorm/libee, you can even very easily classify log messages based on their content.

  • Creating a graph of the rulebase

    To get a better overview of a rulebase you can create a graph that shows you the chain of normalization.

    First you have to install an additional package called graphviz. Graphviz is a tool that creates such a graph from a control file (which is generated from the rulebase). Here you will find more information about graphviz.

    To install it you can use your package manager, for example via the yum command:

    $ sudo yum install graphviz
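
    To verify that graphviz is installed correctly, you can ask the dot tool (which is part of the graphviz package) for its version:

    $ dot -V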

    The next step is creating the control file for graphviz. To do so, we use the normalizer command with the option -d “preferred file name for the control file” and the option -r “path to the rulebase”.

    $ ./normalize -d control.dot -r /home/Test/messages.rb

    Please note that there is no need for an input or output file.
    If you have a look at the control file now, you will see that its content is a bit confusing, but it includes all the information graphviz needs to create the graph, such as the nodes, fields and parsers. Of course you can edit that file, but please note that doing so is a lot of work.
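
    For orientation only: a graphviz control file is a plain-text description of nodes and edges in the DOT language. A hand-written fragment could look roughly like the one below; the file the normalizer actually generates will use its own node names and labels.

    digraph ParseTree {
        root -> "date-rfc3164";
        "date-rfc3164" -> "host:word";
    }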

    Now we can create the graph by typing

    $ dot control.dot -Tpng >graph.png

    That is: dot, followed by the name of the control file, the -T option selecting the output file format, and a redirection to the output file.

    That is just one example of using graphviz; of course you can do many other great things with it. But I think this “simple” graph can be very helpful when working with the normalizer.
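
    The -T option accepts other output formats as well. For example, to render a scalable SVG instead of a PNG, you could use:

    $ dot control.dot -Tsvg >graph.svg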

    Please find below a sample of such a graph, but please note that it is not a particularly pretty one. We will update the graph as soon as we have an adequate one. Such a graph can grow very quickly as you edit your rulebase.

    graph sample

  • Creating a rulebase

    A first example of a rulebase can be downloaded at
    http://blog.gerhards.net/2010/11/log-normalization-first-results.html

    I will use an excerpt of that rulebase to show you the most common expressions.

    rule=:%date:date-rfc3164% %host:word% %tag:char-to:\x3a%: no longer listening on %ip:ipv4%#%port:number%'

    That excerpt is a typical rule. A rule contains different “parts”, or properties, which describe the message you want to normalize (e.g. host, IP, source, syslogtag, …).

    All rules have to start with “rule=:”.

    The structure of a property is as follows:

    %field name:field type:additional information%

    field name -> the name can be chosen freely. It should reflect the content of the field, e.g. src-ip for the source IP address. As a general rule, the field names should be the same in all samples if the content of the field means the same thing.

    field type -> selects the corresponding parser

    date-rfc3164: a date in RFC 3164 format

    ipv4: an IPv4 address

    number: a sequence of digits (example: %port:number%)

    word: everything up to the next blank (example: %host:word%)

    char-to: the field extends up to the character given as additional information (example: %tag:char-to:\x3a%, where \x3a stands for ":")

    additional information -> depends on the field type; some field types require additional information

    In our example there is some more information that is used as “simple text”. Those parts appear exactly as in the messages and are not captured by a property.

    Very important:

    With the field type “char-to” you can use any character available on your keyboard. In the case shown above, the character ":" has to be escaped with its hexadecimal ASCII code (\x3a), because ":" also serves as the separator within the property definition. Other characters do not have to be escaped.
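
    To illustrate how the rule shown above would be applied, here is a hypothetical log line it could match (host name, IP and port are invented) together with the fields that would result; the exact output layout depends on the chosen output format:

    Nov 28 10:23:42 dns1 named[1234]: no longer listening on 127.0.0.1#53'

    date="Nov 28 10:23:42", host="dns1", tag="named[1234]", ip="127.0.0.1", port="53"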

  • First steps using liblognorm

    Here you can find the first steps to use the pre-release of liblognorm.

    (Please note that the used operating system was Fedora 13.)

    At the moment there are two ways to install liblognorm.
    You can install everything you need from git (below you will find all the commands you need) or you can download the following as tarballs:

    libestr
    libee
    liblognorm

    Please note that if you install from tarballs, you have to perform the same steps as mentioned below, apart from

    $ git clone git://git.adiscon.com/git/libestr.git
    $ autoreconf -vfi

    Installation
    Open a terminal and switch to the folder where you want to download and build liblognorm. Below you will find the necessary commands.

    $ git clone git://git.adiscon.com/git/libestr.git

    switch to the new folder libestr

    $ autoreconf -vfi
    $ ./configure --libdir=/usr/lib --includedir=/usr/include
    $ make
    $ make install

    leave that folder and repeat this step for libee

    $ git clone git://git.adiscon.com/git/libee.git

    switch to the new folder libee

    $ autoreconf -vfi
    $ ./configure --libdir=/usr/lib --includedir=/usr/include
    $ make
    $ make install

    leave that folder and repeat this step again for liblognorm

    $ git clone git://git.adiscon.com/git/liblognorm.git

    switch to the new folder liblognorm

    $ autoreconf -vfi
    $ ./configure --libdir=/usr/lib --includedir=/usr/include
    $ make
    $ make install

    That’s all you have to do.

    For a first test we need two further things: a test log and a rulebase. Both can be downloaded at

    http://blog.gerhards.net/2010/11/log-normalization-first-results.html

    After downloading these examples you can use liblognorm. Go to /liblognorm/src and use the command below:

    $ ./normalize -r /home/Test/messages.sampdb -ojson </home/Test/messages.log >/home/Test/temp

    -r = path to the rulebase

    -o = output format

    Please have a look at http://www.liblognorm.com/help/available-options-for-normalizer/ for all available options.
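
    If everything works, the output file (/home/Test/temp in the command above) will contain one normalized event per input line. With the JSON output format, an entry might look roughly like the following; the field names and values here are purely invented and depend entirely on your rulebase and input:

    { "date": "Nov 28 10:23:42", "host": "dns1", "tag": "named[1234]", "ip": "127.0.0.1", "port": "53" }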

  • log normalization – first results

    At the beginning of this week we were pretty confident that we would not make our self-set deadline of one month to implement a first rough proof of concept of liblognorm, a log normalizing library. Fortunately, we made extremely good progress over the past two days and we are now happy to say that we have such a proof of concept available. All of this can be seen by pulling from Adiscon’s public git server: you need libestr, libee and liblognorm to make it work.

    Right now, we’d like to provide a glimpse at how things work. Thanks to Anton Chuvakin and his Public Security Log Sharing Site we got a couple of examples to play with (but we are still interested in more log samples, especially from Cisco devices). Out of the many, we took a random messages.log file written by sysklogd. This is our input file and can be seen here.

    To normalize events, liblognorm needs to know which fields are present at which positions of the input file. It learns this via so-called “samples”. Samples are very similar to the patterns used by virus scanners: just as virus patterns describe how a specific virus looks, log samples describe how a specific log line looks. Unlike virus patterns, we have crafted a format that is hopefully easy (enough) for sysadmins to understand, so that everyone can add relevant samples himself. To support this, samples look relatively similar to actual log lines, and this is the reason we have termed them “log samples”. Like log files, samples are stored in simple text files. For the initial test, we used a very small set of samples, available here. A production system will have many more samples, and we envision systems that have many (tens of?) thousands of samples loaded at the same time. If you look at the samples, pay special attention to the entities enclosed in ‘%’ – these are field definitions, the rest is literal text.
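
    As a rough illustration of that format, a single sample combines literal text with %-enclosed field definitions on one line, for example (an invented sample, not taken from the linked sample file):

    rule=:%date:date-rfc3164% %host:word% sshd[%pid:number%]: Invalid user %user:word% from %src-ip:ipv4%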

    The actual normalization is performed by the library’s engine, which parses log lines, based on the samples, into fields. This creates an in-memory representation of the event, which can then be processed by the driving application or be written to some other medium or the network.

    Liblognorm will come with a small tool called “the normalizer”. It is a minimal library user: it loads a sample database, reads log lines from standard input, creates the in-memory event representation and then writes this representation to standard output in a standardized format. So far, it supports formats as they are expected for the upcoming CEE standard.

    The result of a normalizer run on our test input file, based on the provided sample base, can be seen here. The output is actually a bit more verbose than described above, because it also lists the to-be-normalized line. If you look at the properties we extracted, you’ll probably notice that some do not make too much sense (maybe…). Also, a classification of the message is missing. Don’t worry about these aspects right now: it’s a proof of concept and these things will be addressed by future development (the classification, for example, will be based on the CEE taxonomy via tags).

    We hope we were able to convey some of the power that is available with liblognorm. Of course, a “little bit” more work and time will be required to get it production-ready. Unfortunately, we will be unavailable for larger parts of the next two weeks (other work is now pressing, plus a long-awaited seminar ;)), but we will try to get liblognorm as quickly as possible into the best shape possible. In the meantime, if you like, feel free to have a look at its code or play with it. All of what I wrote can actually be done with the versions available in git.

  • Introducing liblognorm

    Liblognorm shall help to make sense out of syslog data, or, actually, any event data that is present in text form.

    In short, you will be able to throw arbitrary log messages at liblognorm, one at a time, and for each message it will output well-defined name-value pairs and a set of tags describing the message.

    So, for example, if you have traffic logs from three different firewalls, liblognorm will be able to “normalize” the events into generic ones. Among other things, it will extract source and destination IP addresses and ports and make them available via well-defined fields. As the end result, a common log analysis application will be able to work on that common set, and so this backend will be independent from the actual firewalls feeding it. Even better, once we have a well-understood interim format, it is also easy to convert it into any other vendor-specific format, so that you can use that vendor’s analysis tools.
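
    As a purely hypothetical sketch of what that could look like in practice, two rules for two different (invented) firewall log formats could map their data onto the same field names, so that an analysis application only ever needs to know about src-ip, dst-ip and dst-port:

    rule=:%date:date-rfc3164% %host:word% fw-a: DROP src=%src-ip:ipv4% dst=%dst-ip:ipv4% dport=%dst-port:number%
    rule=:%date:date-rfc3164% %host:word% fw-b: denied from %src-ip:ipv4% to %dst-ip:ipv4% port %dst-port:number%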