Software Tutorials:
This page walks through some basic tutorials on using RPTrees for learning.
Preliminaries
RPTrees for Manifold Learning
- Obtain i.i.d. samples from a particular manifold and save them
in a file in the proper format.
For practice, we provide i.i.d. samples from a 1-dimensional sinusoid
(with mild noise) embedded in 3-dimensional space. This sample
data is kept in the examples/ directory.
- Set up the parameters.
The most important parameters are those that describe the data format.
In the example data file (examples/sin3D.data), each data vector is a
3-dimensional vector of reals. Hence, make sure src/globals.h contains
the following lines:
#define DATA_TYPE_DOUBLE
#define VECT_LEN 3
Another important parameter is the maximum tree depth of the learnt
RPTree. For the example data file (examples/sin3D.data), a tree depth
of 4 should be enough (this partitions the space into 2^4 = 16 regions).
The maximum tree depth is set through the MAX_TREE_DEPTH parameter in
src/learnRPTree.h.
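The relationship between tree depth and number of regions can be checked with a few lines of C. The macro value below mirrors the depth suggested for the example data and assumes the parameter is written as a plain #define, like the ones in src/globals.h; the helper name num_regions is hypothetical and not part of the package.

```c
#include <assert.h>

#define MAX_TREE_DEPTH 4  /* depth suggested for examples/sin3D.data (assumed form) */

/* Number of leaf regions of a full binary tree of the given depth: 2^depth. */
unsigned long num_regions(unsigned depth) {
    assert(depth < 32);  /* keep the shift well defined */
    return 1UL << depth;
}
```

Each extra level doubles the number of regions, so the depth should be chosen with the amount of available data in mind.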
Some useful parameters in src/learnRPTree.h:
- MAX_TREE_DEPTH: maximum depth of the RPTree
- NUM_PROJ: maximum number of random vectors used by the RPTree
- DECAY_COUNT: 1/DECAY_COUNT is the weight given to the older examples
(in the streaming context)
- NUM_BINS_SMALL: number of small bins (refer to the streaming version
of the RPTree algorithm)
- NUM_BINS_LARGE: number of large bins (refer to the streaming version
of the RPTree algorithm)
- N1 / N2 / N3: counts for phases I, II, III (refer to the streaming
version of the RPTree algorithm)
Some useful parameters in src/globals.h:
- VECT_LEN: size/dimensionality of the data vector
- DATA_TYPE_XXXX: data type of the vector (replace XXXX with DOUBLE
for real-valued numbers, or with INT for integer-valued numbers)
Notes:
- Parameters are specified as pre-processor directives instead of
command line arguments so that RPTree elements can be laid out
contiguously in memory. This helps improve execution speed in
large-scale settings.
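To see why compile-time constants help with memory layout, consider the sketch below. Because VECT_LEN and MAX_TREE_DEPTH are known to the compiler, each node can hold its vector inline and the whole tree can live in one fixed-size array, with no per-node heap allocation. The struct layout and the names rp_node, rp_tree, and MAX_NODES are hypothetical and are not taken from the package source.

```c
#include <assert.h>
#include <stddef.h>

#define VECT_LEN 3
#define MAX_TREE_DEPTH 4
/* A full binary tree of depth d has 2^(d+1) - 1 nodes. */
#define MAX_NODES ((1 << (MAX_TREE_DEPTH + 1)) - 1)

typedef struct {
    double direction[VECT_LEN];  /* stored inline: size fixed at compile time */
    double threshold;
} rp_node;

typedef struct {
    rp_node nodes[MAX_NODES];    /* one contiguous block for the whole tree */
} rp_tree;

/* Children of node i sit at fixed offsets in the same array (heap-style layout). */
size_t left_child(size_t i) {
    assert(2 * i + 1 < (size_t)MAX_NODES);
    return 2 * i + 1;
}

size_t right_child(size_t i) {
    assert(2 * i + 2 < (size_t)MAX_NODES);
    return 2 * i + 2;
}
```

If the vector length were only known at run time, direction would have to be a pointer to separately allocated memory, and walking the tree would touch scattered locations instead of one block.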
- Compile the code by issuing make.
Once all the essential parameters are set, compile the code with the
make command. A successful run of make should produce an executable
named rptree in the top-level directory. A successful compile may look
like this:
$ make
gcc -Wall -c matrixlib.c -o matrixlib.o
gcc -Wall -c globals.c -o globals.o
gcc -Wall -c learnRPTree.c -o learnRPTree.o
gcc -Wall -lm learn.c learnRPTree.o matrixlib.o globals.o -o rptree
- Learning the RPTree.
Now we are ready to learn the RPTree from the data. Run the compiled
program rptree with the appropriate command line arguments. For the
example data file (examples/sin3D.data), you can issue the following
command:
$ ./rptree -l -d examples/sin3D.data -o sin3D.tree
Here are the full details of the command line options
that can be specified to learn an RPTree.
./rptree -l [-t old_tree] -d datafile [-b] -o learnt_tree
-l : to indicate that we want to learn a tree from the data.
-t : to continue learning from an old tree.
-d : to specify the data file.
-b : if the data file is in binary format.
-o : to specify the output file. The output file is the learnt RPTree
from the data.
- Classifying using RPTree.
We can use the learnt RPTree to classify new data, that is, to see
which region of space each data vector falls in. Run the compiled
program rptree with the appropriate command line arguments. For the
example data file (examples/sin3D.data), you can issue the following
command:
$ ./rptree -c -t sin3D.tree -d examples/sin3D.data -o sin3D.cls
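Conceptually, classifying a vector means walking it from the root of the tree to a leaf and reporting which leaf it reaches. The sketch below assumes each internal node stores a projection direction and a split threshold, and that nodes are kept in a heap-style array. These details, the names, and the leaf numbering are illustrative assumptions; they do not describe the package's actual data structures or the contents of the .cls output file.

```c
#include <assert.h>
#include <stddef.h>

#define VECT_LEN 3
#define DEPTH 2                          /* small depth keeps the example short */
#define NUM_INTERNAL ((1 << DEPTH) - 1)  /* internal nodes of a full binary tree */

typedef struct {
    double direction[VECT_LEN];  /* direction the vector is projected onto */
    double threshold;            /* split point along that direction */
} rp_node;

static double dot(const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < VECT_LEN; i++)
        s += a[i] * b[i];
    return s;
}

/* Route x from the root to a leaf; returns a region index in [0, 2^DEPTH). */
int classify(const rp_node nodes[NUM_INTERNAL], const double x[VECT_LEN]) {
    size_t i = 0;
    for (int level = 0; level < DEPTH; level++) {
        /* Go right when the projection exceeds the node's threshold. */
        int go_right = dot(nodes[i].direction, x) > nodes[i].threshold;
        i = 2 * i + 1 + (size_t)go_right;
    }
    assert(i >= (size_t)NUM_INTERNAL);  /* i now indexes a leaf position */
    return (int)(i - NUM_INTERNAL);
}
```

With DEPTH set to 4 the same routine would return one of 16 region indices, matching the 2^4 regions mentioned for the example data.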
Here are the full details of the command line options
that can be specified to classify data with an RPTree.
./rptree -c -t learnt_tree -d datafile [-b] -o classify_file
-c : to indicate that we want to classify the data using an already
learnt tree.
-t : to specify the RPTree.
-d : to specify the data file to be classified.
-b : if the data file is in binary format.
-o : to specify the output file. The output file holds the
classification of each data vector according to the specified RPTree.