liblognorm 0.3.1 released

Monday, April 18th, 2011

We have just released liblognorm 0.3.1.

This release includes a major new feature and several bug fixes.

Changes:
Version 0.3.1 (rgerhards), 2011-04-18

  • added -t option to normalizer so that only messages with a specified tag will be output
  • bugfix: abort if a tag was assigned to a message without any fields parsed out (uncommon scenario)
  • bugfix: mem leak on parse tree destruct — associated tags were not deleted
  • bugfix: potential abort in normalizer due to misaddressing in debug message generation

Download:
http://www.liblognorm.com/files/download/liblognorm-0.3.1.tar.gz

As always, feedback is appreciated.

Best regards,
Florian Riedl

log classification with liblognorm

Wednesday, April 6th, 2011

Today, we have added support for so-called “tags” to liblognorm (and its base library libee). This new capability permits very easy classification of syslog messages and log records in general. So you can not only extract data from your various log sources, you can also classify events, for example, as being a “login”, a “logout” or a firewall “denied access”. This makes it very easy to look at specific subsets of messages and process them in ways specific to the information being conveyed.

To see how it works, let’s first define what a tag is: A tag is a simple alphanumeric string that identifies a specific type of object, action, status, etc. For example, we can have object tags for firewalls and servers. For simplicity, let’s call them “firewall” and “server”. Then, we can have action tags like “login”, “logout” and “connectionOpen”. Status tags could include “success” or “fail”, among others. The idea of tags is based on early CEE concepts. We will try to stay consistent with whatever direction CEE takes. Tags form a flat space; there is no inherent relationship between them (but this may be added later on top of the current implementation). Think of tags like the tag cloud in a blogging system. Tags can be defined for any reason and need (though obviously we must strive to get to a standard set, something we hope CEE will provide in the not too distant future). A single event can be associated with as many tags as required. (more…)
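To make the tag idea concrete, here is a minimal Python sketch. It is not liblognorm or libee code: the event layout (a dict of fields plus a set of tags) and the helper name `with_tags` are assumptions made purely for illustration. The tag names are the ones used as examples above.

```python
# Hypothetical in-memory model: each event has some fields and a flat set of tags.
events = [
    {"fields": {"user": "alice"}, "tags": {"server", "login", "success"}},
    {"fields": {"user": "bob"}, "tags": {"server", "login", "fail"}},
    {"fields": {"src": "10.0.0.5"}, "tags": {"firewall", "connectionOpen"}},
]

def with_tags(events, *required):
    """Return the events that carry every tag in `required`."""
    wanted = set(required)
    # Tags are flat, so classification is a plain subset test.
    return [e for e in events if wanted <= e["tags"]]

failed_logins = with_tags(events, "login", "fail")
print([e["fields"] for e in failed_logins])
```

Because the tag space is flat, selecting a subset of messages needs nothing more than set membership; any hierarchy would have to be layered on top.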

liblognorm 0.3.0 released

Wednesday, April 6th, 2011

We have just released liblognorm 0.3.0.

This release includes a new major feature and a bugfix. (more…)

liblognorm 0.2.0 released

Friday, April 1st, 2011

We have just released liblognorm 0.2.0.

This release includes some bug fixes and feature enhancements. (more…)

New Mailing List for Log Normalization

Thursday, January 13th, 2011

Thankfully, the interest in log normalization and the related libraries liblognorm and libee has increased. Up until now, we have handled discussions on these topics via the rsyslog mailing list. As conversations increase, this may become an unnecessary burden for those only interested in rsyslog. So we have created a new mailing list named lognorm. We used this somewhat generic name, as we intend to use it for both libraries. This saves me some overhead, and we strongly assume that anyone interested in liblognorm will also be interested in libee (but to a lesser extent in the reverse direction).

Please subscribe to the new list. Log normalization development is currently in a very exciting phase, so getting involved is a great way to shape things the way you need them!

liblognorm 0.1.0 has been released

Thursday, December 9th, 2010

Liblognorm is an event and log normalization library that is capable of real-time processing. It provides the capability to normalize events to a set of standard formats. It is most efficient when used together with almost unstructured data, as found, for example, in typical syslog messages. While liblognorm provides a service similar to other projects, it is unique in the way it works:

a) As a library, its services can be used by any third-party application with ease. As such, other projects can benefit from liblognorm functionality without the user even knowing.

b) Liblognorm is very fast, and especially much faster than regular-expression based solutions. This is possible because it uses a special data structure which (kind of) combines many regular expressions into a single, and thus faster, big one.
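The post does not spell out the data structure, so the following is only an illustration of the general idea, not liblognorm's implementation. It assumes a plain prefix tree (trie) over literal message prefixes: all patterns are merged into one tree, and a log line is walked once, regardless of how many patterns are loaded. The class and function names, the labels, and the sample patterns are all made up for this sketch.

```python
class Node:
    def __init__(self):
        self.children = {}   # next character -> Node
        self.label = None    # set on the node where a pattern ends

def build(patterns):
    """Merge all literal patterns into one prefix tree."""
    root = Node()
    for label, text in patterns.items():
        node = root
        for ch in text:
            node = node.children.setdefault(ch, Node())
        node.label = label
    return root

def match(root, line):
    """Walk the line once; return the label of the longest pattern that prefixes it."""
    node, best = root, None
    for ch in line:
        node = node.children.get(ch)
        if node is None:
            break
        if node.label is not None:
            best = node.label
    return best

tree = build({
    "ssh_accept": "Accepted password for ",
    "ssh_fail": "Failed password for ",
    "session_open": "session opened for user ",
})
print(match(tree, "Failed password for root from 10.0.0.1"))
```

With a list of separate regular expressions, each line may have to be tried against every expression in turn. In the tree, shared prefixes are stored once and the work per line is bounded by the line length.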

This is the initial public release, targeted at early adopters. We will continue to enhance it considerably, but the 0.1.0 version offers decent stability and features and so can actually be used.

You can download it here.

log normalization with rsyslog

Thursday, December 2nd, 2010

We just wanted to give you a quick heads-up on our current development efforts: We have begun to work heavily on a message modification module for rsyslog which will support liblognorm-style normalization inside rsyslog. In git there already is a branch “lognorm”, which we will hopefully complete and merge into master soon. It provides some very interesting shortcuts for pulling specific information out of syslog messages. We will probably promote it some more when it is available. IMHO it’s the coolest and potentially most valuable feature we have added in the past three years. Once we have enabled tags in liblognorm/libee, you can even very easily classify log messages based on their content.

log normalization – first results

Monday, November 15th, 2010

At the beginning of this week we were pretty confident that we would not make our self-set deadline of one month to implement a first rough proof of concept of liblognorm, a log normalizing library. Fortunately, we made extremely good progress over the past two days and we are now happy to say that we have such a proof of concept available. All of this can be seen by pulling from Adiscon’s public git server: you need libestr, libee and liblognorm to make it work.

Right now, we’d like to provide a glimpse at how things work. Thanks to Anton Chuvakin and his Public Security Log Sharing Site we got a couple of examples to play with (but we are still interested in more log samples, especially from Cisco devices). Out of the many, we took a random messages.log file written by sysklogd. This is our input file and can be seen here.

To normalize events, liblognorm needs to know which fields are present at which positions of the input file. It learns this via so-called “samples”. Samples are very similar to the patterns used by virus scanners: just as virus patterns describe how a specific virus looks, log samples describe how a specific log line looks. Unlike virus patterns, samples use a format that we have crafted to be (hopefully) easy enough for sysadmins to understand, so that everyone can add relevant samples themselves. To support this, samples look relatively similar to actual log lines, and this is the reason we have termed them “log samples”. Like log files, samples are stored in simple text files. For the initial test, we used a very small set of samples, available here. A production system will have many more samples, and we envision systems that have many thousands (tens of thousands?) of samples loaded at the same time. If you look at the samples, pay special attention to entities enclosed in ‘%’: these are field definitions, the rest is literal text.
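As a rough illustration of the split between field definitions and literal text, here is a small Python sketch. It deliberately uses a simplified, assumed syntax in which a field is just `%name%` and captures everything up to the next piece of literal text; the real sample format linked above is not reproduced here and may well carry more information per field. The function names and the example sample are hypothetical, and the sketch assumes that fields are always separated by literal text and that samples contain no literal ‘%’.

```python
def compile_sample(sample):
    """Split a sample into ('lit', text) and ('field', name) parts."""
    parts = []
    # After splitting on '%', odd-numbered pieces sit between two '%' signs.
    for i, piece in enumerate(sample.split("%")):
        if piece:
            parts.append(("field" if i % 2 else "lit", piece))
    return parts

def parse(parts, line):
    """Return a dict of extracted fields, or None if the line does not fit."""
    pos, fields = 0, {}
    for idx, (kind, value) in enumerate(parts):
        if kind == "lit":
            if not line.startswith(value, pos):
                return None
            pos += len(value)
        else:
            # A field runs up to the next literal, or to end of line if it is last.
            has_next_lit = idx + 1 < len(parts) and parts[idx + 1][0] == "lit"
            end = line.find(parts[idx + 1][1], pos) if has_next_lit else len(line)
            if end <= pos:
                return None  # next literal missing, or field would be empty
            fields[value] = line[pos:end]
            pos = end
    return fields if pos == len(line) else None

sample = compile_sample("Accepted password for %user% from %ip%")
print(parse(sample, "Accepted password for alice from 10.0.0.5"))
print(parse(sample, "session opened for user alice"))
```

The first call yields the two named fields; the second returns `None` because the literal text does not match.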

The actual normalization is performed by the library’s engine, which parses log lines, based on the samples, into fields. This creates an in-memory representation of the event, which can then be processed by the driving application or be written to some other media or the network.

Liblognorm will come with a small tool called “the normalizer”. It is a minimal library user: it loads a sample database and reads log lines from standard input, creates the in-memory representation of each event and then writes this representation to standard output in a standardized format. So far, it supports formats as they are expected for the upcoming CEE standard.
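The overall shape of such a tool can be sketched in a few lines of Python. This is not the normalizer’s source: `toy_parse` is a hard-coded stand-in for the library call, JSON is used as an assumed stand-in for the CEE output formats, and skipping unmatched lines is simply a choice made for this sketch.

```python
import io
import json

def toy_parse(line):
    """Stand-in for the library call: recognise one hard-coded message shape."""
    prefix = "session opened for user "
    if line.startswith(prefix):
        return {"user": line[len(prefix):]}
    return None

def normalize_stream(lines, parse, out):
    """Read log lines, build an event for each match, write one JSON object per line."""
    count = 0
    for raw in lines:
        event = parse(raw.rstrip("\n"))
        if event is None:
            continue  # this sketch silently skips lines no sample matches
        out.write(json.dumps(event, sort_keys=True) + "\n")
        count += 1
    return count

# Demo on in-memory streams; a command-line tool would pass sys.stdin and sys.stdout.
demo_in = io.StringIO("session opened for user alice\nsomething else\n")
demo_out = io.StringIO()
normalize_stream(demo_in, toy_parse, demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the parser as a parameter mirrors the description above: the tool itself is only a thin driver, and all real work happens in the library.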

The result of a normalizer run on our test input file based on the provided sample base can be seen here. The output is actually a bit more verbose than described above, because it lists the to-be-normalized line as well. If you look at the properties we extracted, you’ll probably notice that some do not make too much sense (maybe…). Also, a classification of the message is missing. Don’t worry about these aspects right now: it’s a proof of concept and these things will be addressed by future development (the classification, for example, will be based on CEE taxonomy via tags).

We hope we were able to convey some of the power that is available with liblognorm. Of course, a “little bit” more work and time will be required to get it production-ready. Unfortunately, we will be unavailable for larger parts of the next two weeks (other work now pressing plus a long-awaited seminar ;)), but we will try to get liblognorm into the best shape possible as quickly as possible. In the meantime, if you like, feel free to have a look at its code or play with it. All of what I wrote can actually be done with the versions available in git.

Introducing liblognorm

Wednesday, October 13th, 2010

Liblognorm shall help to make sense out of syslog data, or, actually, any event data that is present in text form.

In short, one will be able to throw arbitrary log messages at liblognorm, one at a time, and for each message it will output well-defined name-value pairs and a set of tags describing the message.

So, for example, if you have traffic logs from three different firewalls, liblognorm will be able to “normalize” the events into generic ones. Among others, it will extract source and destination IP addresses and ports and make them available via well-defined fields. As the end result, a common log analysis application will be able to work on that common set and so this backend will be independent from the actual firewalls feeding it. Even better, once we have a well-understood interim format, it is also easy to convert that into any other vendor-specific format, so that you can use that vendor’s analysis tool. (more…)