Predictive Analytics + Drools:

I’ve been looking into Decision Management over the last few days. It is an excellent topic that I’ve been missing out on for quite some time. As far as I understand it so far, it is a set of methodologies that provide a closed cycle for knowledge management systems: from knowledge discovery and formalization, to exposing that knowledge through a knowledge runtime, to feeding results back into knowledge discovery.

This last part, the feedback, is a component I’ve always wondered whether it would be possible to build as part of the tooling that Drools provides. The idea is to have an environment where you can play with data from your production systems and find all sorts of combinations that you weren’t paying attention to before, either because you planned to handle them later or because you didn’t think that condition would appear in your environment. This is not something to do lightly, especially when you’re handling private or sensitive information in your working memory, but after a certain amount of filtering, it is a good place to start looking at how your environment is working in production.

You might want to do this in a separate environment, however, because of two things:

  • Performance issues: If you’re going to perform queries, groupings, or any heavy search-related tasks, you don’t want them to slow down your production environment. For Drools this is a very important topic, because many people use the rule engine precisely for its high performance, and you don’t want that performance to drop for any reason.
  • Simulation generation: Another heavy task. Once you find the patterns that would identify your new cases, or improve your existing cases, you might want to make changes to your knowledge definitions to see how they behave with the existing data. If they work well, you will want to apply the new knowledge definitions to the real production system, but not before.

The branch that targets this analysis is called predictive analytics, because it goes beyond realtime analysis of data: it also focuses on discovering trends of change in your production environments, in order to suggest new ways in which production data might enter your environment in the next few minutes, hours, months, and so on, depending on how far back the production data involved in the discovery process goes.

All these analyses are usually possible because they are designed to be conducted on Big Data. This is where Drools differs from Decision Management software. Drools production data (the working memory) lives in memory. It can be persisted, but even then it is just a serialized blob of information used to restore the session elsewhere. So even if we built these analytical tools for Drools, they would have to work in memory.

This got me thinking about whether this sort of analysis could be done on top of a Drools rule engine. We certainly do have some tools in DRL that provide analysis capabilities.

Analytics with rules and Queries

DRL queries can give us insight into any specific working memory using the same search patterns that rules use. The one thing they would not be able to do is provide us with easy grouping, but rules and specific facts for carrying grouping information could do this quite easily.

rule "Group by init"
when
then
    insert(new GroupBy());
end

rule "Group by example"
when
    p: Person(age > 16)
    gb: GroupBy()
then
    gb.sum("personAge", p.getAge());
end

Also, once a query is constructed, group-by functionality could be introduced on the Java side (after all, everything is running in memory). But let’s postpone the detailed analysis of that and assume, for the moment, that if we wanted to analyze a working memory to find uncovered cases, we would most likely be able to. With that assumption in place, let’s see how we could implement an in-memory analytical tool for Drools.

Given that, let’s look at three different scenarios to start building some tooling for this. We will use a few diagrams with the same color coding: white is for already existing systems, yellow for easily built components, and red for components that would be hard to implement:
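To make the Java-side grouping idea a bit more concrete, here is a minimal sketch in plain Java. It assumes the facts matched by a query have already been collected into a list (with Drools that would typically mean iterating over the results of KieSession.getQueryResults, which is left out here so the example runs without any dependency). The Person class, its city property and the sumAgeByCity helper are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical fact type standing in for the Person fact used in the rules above.
// The "city" property is an assumption added only to have something to group by.
final class Person {
    private final String city;
    private final int age;

    Person(String city, int age) {
        this.city = city;
        this.age = age;
    }

    String getCity() { return city; }
    int getAge() { return age; }
}

final class QueryGrouping {

    // Sums the ages of all persons older than minAge, grouped by city.
    // A TreeMap is used so the groups come back in a stable, sorted order.
    static Map<String, Integer> sumAgeByCity(List<Person> facts, int minAge) {
        return facts.stream()
                .filter(p -> p.getAge() > minAge)
                .collect(Collectors.groupingBy(
                        Person::getCity,
                        TreeMap::new,
                        Collectors.summingInt(Person::getAge)));
    }

    public static void main(String[] args) {
        // In a real setup these facts would come from a query over the working memory.
        List<Person> facts = Arrays.asList(
                new Person("Austin", 30),
                new Person("Austin", 12),
                new Person("Boston", 45),
                new Person("Boston", 20));
        System.out.println(sumAgeByCity(facts, 16)); // prints {Austin=30, Boston=65}
    }
}
```

The filter mirrors the age > 16 constraint from the rule above; everything else is ordinary collection processing, which is the point: once the facts are out of the session, grouping needs nothing from the engine.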

Case 1: Your own 100% developed environment

Let’s say that, for the simulation environment where we will run our analytics, we were going to build an entirely new system from scratch. Some things might be easy to create, some not so much. Usually when we do this, it is because we have an already existing application dedicated to this, and want to use it for analyzing our Drools environment, so we will assume the environment already exists. But what do we have to build for it?

Without a KMS

As you see, the first thing we will need is a way to send new information to Environment B. Session persistence could take care of this, but when we run complex working memories we usually don’t persist them, in order to get maximum performance. In this scenario, we would use a single component of persistence: session serialization. Using any pluggable communication method, we could set up a channel between the two environments to share the same session, even if that session is not persistent.

The one component that would be hard to implement, however, would be a runtime and UI where we can construct queries or rules to run simulations with the production data. This would involve creating query editors, rule editors, runtime components to perform those searches in the copied working memory, and UIs to show the results to the user. These components would be quite hard to build, not because of any intrinsic complexity, but mostly because they would have to be maintained by 3rd parties (A.K.A. YOU).
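The serialize-and-ship idea can be sketched as follows. This is only an illustration under stated assumptions: it uses standard Java serialization over a plain list of facts as a stand-in for the session, whereas a real implementation would rely on the Drools marshalling API to serialize the session itself. The SessionSnapshot class and its export and restore methods are hypothetical names, and the transport between environments is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a stand-in for session serialization between environments.
final class SessionSnapshot {

    // Environment A: turn the current facts into a byte array
    // that can be sent over any transport (queue, socket, file...).
    static byte[] export(List<? extends Serializable> facts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<Serializable>(facts));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Environment B: rebuild an independent copy of the facts,
    // so analytics never touch the production objects.
    @SuppressWarnings("unchecked")
    static List<Serializable> restore(byte[] snapshot) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (List<Serializable>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> productionFacts = Arrays.asList("order-1", "order-2");
        byte[] snapshot = export(productionFacts);
        List<Serializable> copy = restore(snapshot);
        System.out.println(copy); // prints [order-1, order-2]
    }
}
```

Because the snapshot is just bytes, the production side only pays for one serialization pass; all querying and simulation then happen on the copy in Environment B.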

Case 2: Using the KIE Workbench functionality embedded in our application

Fortunately, there is an alternative to writing your own editors that would make development a lot easier: using the Guvnor rule editors to construct a query editor that you could use from your application. That leaves Environment B with only one concern, providing a runtime for executing simulation scenarios.

With a KMS

It also makes deployment easier, because if you use Guvnor’s internal build and deploy functionality, all new knowledge definitions could be rolled out to your production environment using nothing more than the KieScanner, provided by the kie-ci dependency in Drools 6.

The one problem is that you would have to embed the existing editors inside your own application. That is a very reasonable thing for most users of the Drools tooling to want, but for the moment it is rather complex to integrate. Perhaps there is a way to have everything Environment B needs in a single place?

Case 3: Extending the KIE Workbench functionality for analytics

In this scenario, we would just extend the functionality of the existing workbenches. A query editor could easily be built using the guided rule editor. A query executor and analyzer could be created from injectable components and UIs. Event sending could be used to trigger a simple yet powerful session replicator from a different environment. This, I think, would be the best way to start building analytical tools on top of Drools right now:

Everything inside Kie WB

Of course, these are just theoretical components for the moment, but they would be very possible to implement. They would add a lot of value to the discovery stages of knowledge development with the Drools tooling. Development over the next few months could prove me wrong, but I hope it will not, just as I hope you found this analysis of the analytics possibilities in Drools informative.