The MP13 Approach to the KDD'99 Classifier Learning Contest

October 1999

Vladimir Miheev
Alexei Vopilov
Ivan Shabalin

The MP13 method is best summarized as recognition based on voting decision trees using "pipes" in potential space.

1. General Classification Approach

The final decision on whether a connection belongs to one of the five classes is made in a sequence of steps. The steps are applied in order, and a branch to the next step occurs whenever the current one fails to recognize a particular connection. This sequencing was not assumed in advance; it became clearly desirable during work on the contest.
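The sequential scheme above can be sketched as a chain of recognizers, each of which either assigns a class or defers to the next step. This is a minimal illustrative sketch; the step functions and field names below are hypothetical, not taken from the contest entry.

```python
def classify(connection, steps):
    """Apply recognition steps in order; fall through whenever a step fails."""
    for step in steps:
        label = step(connection)
        if label is not None:  # this step recognized the connection
            return label
    return "unrecognized"      # no step succeeded

# Toy steps with invented field names, purely for illustration
step_pipes = lambda conn: "dos" if conn.get("syn_flood") else None
step_rules = lambda conn: "normal" if conn.get("duration", 1) < 1 else None

label = classify({"syn_flood": True}, [step_pipes, step_rules])  # "dos"
```

The point of the chaining is that later, more specialized steps only ever see the connections the earlier steps could not handle.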

2. Work for the KDD'99 Contest

In a preliminary stage, thirteen decision trees were generated based on a subset of the training data. The target was to distinguish "normal" connections from attacks of any kind. The training dataset was randomly split into three subsamples: 25% for tree generation, 25% for tree tuning, and 50% for estimating model quality. The results of this experiment were quite encouraging.

We did not have enough hardware capacity to work with all the learning samples. Hence, we prepared our own learning dataset as 10% of the given training database (about 400,000 entries). Since the task description noted that the training and testing datasets have different distributions, we randomly removed some of the DOS and "normal" connections from the full training database. The resulting reduced dataset was used for all further manipulations.

Next, we proceeded with learning based on the "one against the rest" principle. Classes were separated from each other using five sets of decision trees. Each connection was assigned a vector of proximity to the five classes. Each vector component was calculated as a sum of weighted votes from one set of decision trees. This representation forms a so-called "potential space," while a multidimensional interval on proximity vectors is called a "pipe."
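The proximity vector and the "pipe" test can be sketched as follows. This is an assumed reconstruction from the description above: the toy trees, thresholds, weights, and pipe bounds are invented for illustration, with only two classes instead of five.

```python
def proximity_vector(conn, tree_sets, weight_sets):
    """One component per class: the weighted vote of that class's trees."""
    return [sum(w for tree, w in zip(trees, weights) if tree(conn))
            for trees, weights in zip(tree_sets, weight_sets)]

def in_pipe(vec, pipe):
    """A 'pipe' is a multidimensional interval: one (lo, hi) bound per class."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(vec, pipe))

# Toy one-node "trees" for two classes (hypothetical features and weights)
normal_trees = [lambda c: c["duration"] < 1, lambda c: c["src_bytes"] < 100]
dos_trees    = [lambda c: c["src_bytes"] >= 100]

vec = proximity_vector({"duration": 0, "src_bytes": 50},
                       [normal_trees, dos_trees],
                       [[1.0, 0.5], [2.0]])
# vec == [1.5, 0.0]; it falls inside a "normal" pipe such as [(1.0, 2.0), (0.0, 0.5)]
```

A connection whose proximity vector lands inside exactly one class's pipe is well recognized; vectors outside every pipe are the unrecognized cases handled by later steps.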

Next, we converted the testing dataset into the potential space representation.  We obtained two types of connections in this space: well-recognized and unrecognized.  Some more work was done to improve recognition. First, expert knowledge made it possible to construct some simple but quite stable verbal rules.  Also, a second echelon (see Section 1) of decision trees was constructed to separate unrecognized pairs of classes.

3. Training Algorithm

To construct our recognition model, we used a version of the "Fragment" algorithm originally invented at the IITP (Russian Academy of Sciences), in the division "Partner Systems". The authors of "Fragment" are V. Miheev and V. Pereversev-Orlov.

For constructing a decision tree, the training data is split into learning and testing samples. The learning sample is used to determine the structure of a tree of sufficient complexity and to generate a hierarchy of models on that tree. The testing sample is used to select a subtree of optimal complexity.

Repeated application of the algorithm to various splits of the training data into different subsamples allows us to generate a set of voting decision trees. Some heuristic approaches are used to make the trees independent.
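The grow-on-learning, select-on-testing pattern and the repeated random splits can be sketched with a deliberately simplified model. The sketch below substitutes one-threshold decision stumps for real "Fragment" trees (a loose analogy, not the actual algorithm): candidate thresholds come from the learning sample, the held-out testing sample selects among them, and different random splits yield a set of voting models.

```python
import random

def fit_stump(learn, test):
    """Candidates from the learning sample; the testing sample picks the
    one that generalizes best (analogous to selecting the optimal subtree)."""
    def acc(t, pts):
        return sum((x > t) == y for x, y in pts) / len(pts)
    candidates = sorted({x for x, _ in learn})
    return max(candidates, key=lambda t: acc(t, test))

def train_voting_stumps(data, n_models=13, seed=1999):
    """Each model is trained on a different random learn/test split."""
    rng, stumps = random.Random(seed), []
    for _ in range(n_models):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        stumps.append(fit_stump(shuffled[:half], shuffled[half:]))
    return stumps

def vote(stumps, x):
    """Majority vote of the ensemble."""
    return int(2 * sum(x > t for t in stumps) > len(stumps))

toy_data = [(x / 10, int(x / 10 > 0.5)) for x in range(11)]  # separable toy set
stumps = train_voting_stumps(toy_data)
```

Because each split sees a different learning sample, the resulting models disagree on borderline cases, which is what makes the subsequent voting informative.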

4. Hardware and Software

We used two computers: a Pentium notebook (150 MHz, 32 MB RAM, about 200 MB free disk space) and a Pentium desktop (133 MHz, 40 MB RAM, about 200 MB free disk space). It took about 30 minutes to generate one set of trees. The total time for all calculations using the PERGAMENT software was about 6 hours. Additionally, we used the
DOS FoxPro(TM) package to perform various data transformations and interactive data analyses.

5. Team Members

  Miheev, Vladimir - Data Mining Expert
  Vopilov, Alexei  - Network Security Expert
  Shabalin, Ivan   - Sr. Software Developer

Special thanks to V. Pereversev-Orlov, our enduring guru in KDD science and the original inventor of the approach used.