In [3]:
# Preamble  here, add paths and import relevant modules
import os, sys, numpy as np, sklearn
from numpy.random import randn
sys.path.append('/'.join(os.getcwd().split('/')[:-1]))
%matplotlib inline
import matplotlib.pyplot as plt

Tutorial 4 -- Usage of GraphAlgorithms Package

Now that we've done some examples, here is a detailed description of how to use the actual package.

The main user interface is supposed to go through the classifier class LaplacianClustering. Essentially all graph construction, prediction, and parameter setting can be done via this class alone. The name LaplacianClustering stems from the fact that all the algorithms are some form of clustering and use the graph Laplacian. Future versions will contain graph methods that are not Laplacian based.

In [10]:
# create a classifier, setting some parameters in the constructor
from graph_cluster import LaplacianClustering
clf = LaplacianClustering(scheme_type = 'MBO_fidelity', eta = 2, dt = .6)

# load the data into the classifier, ground_truth if you have any
raw_data = randn(50,2)
clf.load_data(raw_data = raw_data)

# build the graph
clf.build_graph(affinity = 'rbf', Neig = 3, gamma = 1)

Technically, all parameters could be defined and changed via the constructor and the set_parameters function, but load_data and set_graph_params are specialized functions for setting the data and graph_params fields.

The graph is stored in clf.graph, which is a BuildGraph instance. Changing the raw_data automatically deletes the graph, while changing parameters in clf.graph automatically triggers a recomputation of the graph.
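This delete-on-new-data, recompute-on-new-parameter behavior can be mimicked with a tiny caching sketch. This is hypothetical illustration code, not the package's actual internals; the class and method names here are made up:

```python
import numpy as np

class CachedGraphExample:
    """Hypothetical sketch of the caching behavior described above."""
    def __init__(self):
        self._raw_data = None
        self._gamma = 1.0
        self.graph = None  # cached affinity matrix

    def load_data(self, raw_data):
        self._raw_data = raw_data
        self.graph = None  # new data deletes the stale graph

    def set_gamma(self, gamma):
        self._gamma = gamma
        if self._raw_data is not None:
            self.build_graph()  # parameter change recomputes the graph

    def build_graph(self):
        # rbf affinity: exp(-gamma * ||x_i - x_j||^2)
        d2 = ((self._raw_data[:, None, :] - self._raw_data[None, :, :]) ** 2).sum(-1)
        self.graph = np.exp(-self._gamma * d2)

ex = CachedGraphExample()
ex.load_data(np.random.randn(5, 2))
ex.build_graph()
ex.set_gamma(0.5)                    # graph is rebuilt automatically
ex.load_data(np.random.randn(5, 2))
print(ex.graph)                      # None again: new data cleared the cache
```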

In [14]:
# generate random initial condition
clf.generate_initial_value()
# load a ground truth
clf.load_data(ground_truth = np.ones(50).astype(int))
# generate random fidelity
clf.generate_random_fidelity(percent = .5)

The class also has built-in methods for generating initial values and fidelity points, and it can compute the misclassification rate of your result. Of course, the latter two require the ground_truth to be loaded.
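As an illustration of what the misclassification-rate computation amounts to, here is a minimal standalone version; the function name and exact signature are assumptions, not the package's API:

```python
import numpy as np

def misclassification_rate(labels, ground_truth):
    """Fraction of samples whose predicted label disagrees with the ground truth."""
    labels = np.asarray(labels)
    ground_truth = np.asarray(ground_truth)
    return np.mean(labels != ground_truth)

pred = np.array([0, 0, 1, 1, 1])
truth = np.array([0, 1, 1, 1, 1])
print(misclassification_rate(pred, truth))  # → 0.2
```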

To run a prediction, just call the fit_predict method. The name is chosen to be consistent with the scikit-learn convention.
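For reference, the scikit-learn convention being followed looks like this, shown here with sklearn's own SpectralClustering (itself a graph-Laplacian method): fit on the data and return one label per sample in a single call. This uses the real sklearn API, not the package described in this tutorial:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Two well-separated Gaussian blobs
X = np.vstack([np.random.randn(25, 2), np.random.randn(25, 2) + 5])

sc = SpectralClustering(n_clusters=2, affinity='rbf', gamma=1.0, random_state=0)
labels = sc.fit_predict(X)   # fit and predict in one call
print(labels.shape)          # (50,)
```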

The nice thing about modularizing all the code is that when running multiple tests, different scheme parameters can be changed without recomputing everything, unless the changed parameter affects the underlying data.

In [17]:
# For example, after I'm done with MBO fidelity in the example above, to change the scheme to 
# 'GL_fidelity', simply switch the 'scheme_type' attribute:
clf.set_parameters(scheme_type = 'GL_fidelity') #that's all!

# Or if there's any change to the graph parameters
clf.set_graph_params(neighbor_type = 'connectivity', n_neighbors = 10)
# The graph Laplacian is automatically recomputed every time this function is triggered.

The class is designed for running multiple tests with minimal effort. Hence parameters are updated lazily, i.e., if a parameter is not specified in the set_parameters arguments, it (mostly) stays the same. To clear the parameters, use the clear() method supplied by the Parameter class. If any crucial parameter is missing, the code should raise an error or print a warning.
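The lazy-update and clear() behavior can be sketched in a few lines. This is a hypothetical stand-in, not the package's actual Parameter class:

```python
class Parameter:
    """Hypothetical sketch of lazy parameter updates as described above."""
    def __init__(self, **kwargs):
        self._values = dict(kwargs)

    def set_parameters(self, **kwargs):
        self._values.update(kwargs)  # keys not passed in keep their old values

    def clear(self):
        self._values = {}            # reset everything

    def get(self, key):
        return self._values.get(key)

p = Parameter(scheme_type='MBO_fidelity', dt=0.6)
p.set_parameters(scheme_type='GL_fidelity')  # dt is untouched
print(p.get('scheme_type'), p.get('dt'))     # GL_fidelity 0.6
p.clear()
print(p.get('dt'))                           # None
```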

Finally, this is a first version of the package, and many improvements + new functionalities are expected to be added. Below are some key references to the entire package.

References


  • Bertozzi, Andrea L., and Arjuna Flenner. "Diffuse interface models on graphs for classification of high dimensional data." Multiscale Modeling & Simulation 10.3 (2012): 1090-1118.

  • Merkurjev, Ekaterina, Justin Sunu, and Andrea L. Bertozzi. "Graph MBO method for multiclass segmentation of hyperspectral stand-off detection video." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.

  • Hu, Huiyi, et al. "A method based on total variation for network modularity optimization using the MBO scheme." SIAM Journal on Applied Mathematics 73.6 (2013): 2224-2246.
