Problem introduction

We will walk through an example of the local linear solver algorithm. All the code can be downloaded at cseweb.ucsd.edu/~osimpson/localsolver. Our example graph will be a network of dolphins from Mark Newman's network datasets library.

In [1]:
import solver
%run ipn_dolphins_example.py

We will compute a local solution over a subset of vertices, indicated by the red-colored nodes below.

In [2]:
subset
Out[2]:
[6, 32, 41, 25, 9, 17, 26, 31, 54, 27, 13, 57, 60, 5, 48, 56, 7, 22, 19, 1]
In [3]:
from IPython.display import Image
Image(filename='ipn_dolphins_subset.png')
Out[3]:

The vertex boundary of this subset are the following nodes, in purple:

In [4]:
vertex_boundary
Out[4]:
[40, 28, 36, 30, 39]
In [5]:
from IPython.display import Image
Image(filename='ipn_dolphins_subset_boundary.png')
Out[5]:

For this example, the boundary vector will have its support entirely in the vertex boundary. The vector is indexed by vertices in lexicographical order. The vector values are listed and plotted below.

In [6]:
boundary_vec
Out[6]:
array([[  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [ 73.20356034399658540224],
       [  0.                    ],
       [ 72.36707861296704891174],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  2.61707892868332514524],
       [  0.                    ],
       [  0.                    ],
       [ 67.56497031535104724753],
       [ 73.21537901342006193772],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ],
       [  0.                    ]])
In [7]:
%matplotlib inline
import matplotlib.pyplot as plt

bv_y = list(np.ravel(boundary_vec))

plt.figure(figsize=(20,5))
plt.bar(range(len(bv_y)), bv_y, align='center', color='purple')
plt.xticks(range(len(bv_y)), dolphins.graph.nodes(), size='large', rotation=290)
plt.xlabel('Vertices', fontsize=15)
plt.ylabel('Vertex Values', fontsize=15)
plt.show()

The local linear solver involves computing Dirichlet heat kernel pagerank vectors over the subset. Since we are interested in a solution that satisfies the boundary condition, $x(v) = b(v)$ for every vertex in the support of $b$ (the vertex boundary, in this case), a solution $x_S$ can be found with the following vector of coefficients, $b_2$, computed from $b$ and the adjacency and degree matrices of the graph. First we compute the vector: $$b_2 = (D_S^{-1/2} A_{S,\delta S} D_{\delta S}^{-1/2}b_{\delta S})^T D_S^{1/2}.$$

In [8]:
b2_y = list(np.ravel(b2))

plt.bar(range(len(b2_y)), b2_y, align='center')
plt.xticks(range(len(b2_y)), subset, size='large')
plt.show()

The local solution (over the subset $S$) to the system $\mathcal{L}x = b$, where $b$ is a boundary vector with support outside the local subset as in the vector boundary_vec, is given exactly by \begin{equation} x_S^T = \int_0^{\infty}\rho_{S,t,b_2} ~\mathrm{d}t ~D_S^{-1/2}. \end{equation}

We can get a good approximation to the local solution $x_S$ by sampling $r=\gamma^{-2}(\log s + \log(\gamma^{-1}))$ values of Dirichlet heat kernel vectors $\rho_{S,t,b_2}$ for varying $t$. Here, $\gamma \in (0,1)$ is the error in our solution approximation and $s$ is the size of the subset.

Below is one example heat kernel pagerank vector over the full graph with $t=50$. We see that the support is concentrated on the subset $S$, colored in red, and the vertex boundary, colored in purple. This suggests that $\epsilon$-approximate Dirichlet heat kernel pagerank vectors will be able to capture the support well.

In [9]:
x = np.loadtxt('ipn_dolphins_x.txt')
In [10]:
plt.figure(figsize=(20,5))
barlist = plt.bar(range(len(x)), x, align='center', color='white')
for s in subset:
    barlist[s].set_color('red')
for b in vertex_boundary:
    barlist[b].set_color('purple')
plt.xticks(range(len(x)), dolphins.graph.nodes(), size='large', rotation=290)
plt.xlabel('Vertices (subset in red, boundary in purple)', fontsize=15)
plt.ylabel('Vertex Values', fontsize=15)
plt.show()

Solving a local linear system with Dirichlet heat kernel pagerank vectors.

Our goal is to compute the local solution to $\mathcal{L}x = b$ restricted to the subset $S$. We denote this local solution $x_S$, the vector indexed by vertices in the subset. We require the solution to satisfy $x_S(v) = b(v)$ for every $v$ in the support of $b$, and this support must not include any vertices in $S$.

The exact solution is given by $x_S = \mathcal{L}_S^{-1}(D_S^{-1/2}A_{S,\delta S}D_{\delta S}^{-1/2}b_{\delta S}),$ where $D_S$ is the $s \times s$ submatrix of the diagonal degree matrix, $D$, indexed by vertices in the subset $S$, and similarly $D_{\delta S}$ is the $|\delta(S)| \times |\delta(S)|$ submatrix indexed by vertices in the vertex boundary, $\delta(S)$. The matrix $A_{S, \delta S}$ is the $s \times |\delta(S)|$ submatrix of the adjacency matrix $A$ with rows indexed by $S$ and columns by $\delta(S)$. Finally, $b_{\delta S}$ is the $|\delta(S)| \times 1$ subvector of the boundary vector $b$ indexed by $\delta(S)$.

Using spectral properties of the heat kernel and the graph Laplacian, we can compute this solution exactly with $$x_S^T = \int_0^{\infty}\rho_{S,t,b_2} ~\mathrm{d}t ~D_S^{-1/2}.$$ However, in this section we focus on approximating this integral with random sampling.

Sampling Dirichlet heat kernel pagerank.

An approximation $\hat{x}_S$ that satisfies $||x_S - \hat{x}_S|| = O(\gamma||b_2^TD_S^{-1/2}|| + ||x_S||)$, $\gamma \in (0,1)$, can be obtained by sampling $\rho_{S,t,b_2}$ for $r = \gamma^{-2}(\log s + \log(\gamma^{-1}))$ values. In each sample, $t$ is drawn uniformly at random from $[\gamma, T]$ where $T = s^3\log(s^3 \gamma^{-1})$ and $s = |S|$. Below is a summary of the algorithm.

GreensSolver(G,b,S,gamma)
    inputs: graph G,
            boundary vector b,
            subset S,
            error parameter gamma
    output: restricted solution xS satisyfing the boundary condition b

    s = size_of(S)
    xS = zero-vector of size s
    b2 = compute_b2(G,b,S)
    T = compute_T(S,gamma)
    N = T/gamma
    r = compute_r(S,gamma)
    for i = 1 to r:
        draw j from [1,N] uniformly at random
        xi = compute_Dirichlet_hkpr(G, jT/N, b2, S)
        xS += xi

    return T/r * xS * D_S^{-1/2}

We call something similar to this when computing the approximation.

An approximate solution.

First we compute the true solution as $x_S = \mathcal{L}_S^{-1}(D_S^{-1/2}A_{S,\delta S}D_{\delta S}^{-1/2}b_{\delta S})$.

In [11]:
xS = solver.restricted_solution(dolphins, boundary_vec, subset)
In [12]:
xS_y = list(np.ravel(xS))

s = len(subset)
plt.figure(figsize=(15,5))
plt.plot(range(s), xS_y, 'ro-')
plt.xticks(range(s), subset, size='large')
plt.show()

Let's permute the nodes so that node values are in descending order to visualize the vector values a bit better.

In [13]:
xS_dict = {}
for i in subset:
    xS_dict[i] = xS[subset.index(i)]
subset_sorted = sorted(xS_dict, key=xS_dict.get, reverse=True)
subset_perm = [subset.index(node) for node in subset_sorted]
In [14]:
xS_sorted = xS[subset_perm]
In [15]:
xS_sorted_y = list(np.ravel(xS_sorted))

plt.figure(figsize=(15,5))
plt.plot(range(s), xS_sorted_y, 'ro-')
plt.xticks(range(s), subset_sorted, size='large')
plt.show()

Then we compare the result of a call to Local Linear Solver, which approximates the local solution by sampling Dirichlet heat kernel pagerank vectors. We choose $\gamma = 0.01$ as the error parameter. The red points correspond to the node values of $x_S$ and the blue triangles correspond to the node values of the approximation with sampling.

In [16]:
xS_sample = solver.greens_solver_exphkpr(dolphins, boundary_vec, subset, 0.01)
In [17]:
#permute the nodes
_xS_sample = xS_sample[:,subset_perm]
In [18]:
xS_sample_y = list(np.ravel(_xS_sample))

plt.figure(figsize=(15,5))
plt.plot(range(s), xS_sample_y, 'b^--',
         range(s), xS_sorted_y, 'ro-')
plt.xticks(range(len(xS_y)), subset_sorted, size='large')
plt.legend(('Output of LOCAL LINEAR SOLVER', 'exact local solution'), loc='best')
plt.show()

The relative error of the approximation:

In [19]:
xS_appr = np.transpose(xS_sample)
print np.linalg.norm(xS-xS_appr)/np.linalg.norm(xS)
0.0203332238553

Let $rie(x_S)$ denote the approximation obtained by computing the full Riemann sum, as above. Then the error in this case should satisfy $||x_S - \hat{x}_S|| < \gamma(||b_1|| + ||x_S|| + ||rie(x_S)||)$, where $b_1 = D_S^{1/2} b_2^T$. We compute the error that exceeds this.

In [20]:
xS_rie = solver.restricted_solution_riemann(dolphins, boundary_vec, subset, 0.01)
In [21]:
gamma = 0.01
b1 = solver.compute_b1(dolphins, boundary_vec, subset)
allowable_err = gamma*( np.linalg.norm(b1) + np.linalg.norm(xS) + np.linalg.norm(xS_rie) )
err = max(0, np.linalg.norm(xS-xS_appr) - allowable_err)
print err
0

It is important to note that for random sampling we achieve this error with a certain probability, so more trials might be required to achieve the values you want. This is in contrast to computing the full Riemann sum, which gives a deterministic answer.

In the next sections we focus just on Dirichlet heat kernel pagerank, and look at how to control running time with an approximation algorithm. The typical way to compute a Dirichlet heat kernel pagerank is with a matrix multiply.

$\epsilon$-approximate Dirichlet heat kernel pagerank vectors.

We want to efficiently approximate many values of $\rho_{S,t,b_2}$ where $t$ lies in the range $[\gamma, T]$. We do this by choosing $j$ uniformly at random from $[1,N]$ and setting $t=jT/N=j\gamma$. An exact computation of $\rho_{S,t,b_2}$ involves matrix exponentiation. Instead, we approximate with SolverApproxHKPR, which promises to compute vectors that $\epsilon$-approximate $\rho_{S,t,b_2}$.

Definition.

For a graph and subset $S$ and inputs $t$, $g$ a vector $\hat{\rho}$ is an $\epsilon$-approximate Dirichlet heat kernel pagerank vector for $\rho_{S,t,f}$ when $\hat{\rho}$, satisfies:

If $f$ is a probability distribution:

  • for every vertex $v \in V$ in the support of $\hat{\rho}$, $(1-\epsilon)\rho_{S,t,f}(v) \leq \hat{\rho}(v) \leq (1+\epsilon)\rho_{S,t,f}(v),$ and
  • for every vertex with $\hat{\rho}(v) = 0$, it must be that $\rho_{S,t,f}(v) \leq \epsilon$.

When $f$ is a general vector, we require additional additive error $\epsilon ||f||_1$.

The algorithm.

Our Dirichlet heat kernel pagerank approximation algorithm works by simulating random walks on the graph. Specifically, to approximate $\rho_{S,t,f}$, we transform $f$ into a probability function and launch random walks from vertices according to this function. A Chernoff bound is used to show that $16/\epsilon^3 \log n$ random walks are enough to approximate Dirichlet heat kernel pagerank values in each vertex satisfying the conditions of an $\epsilon$ approximation.

Recall the definition of Dirichlet heat kernel pagerank: $$\rho_{S,t,f} = e^{-t} \sum\limits_{k=0}^{\infty} \frac{t^k}{k!}f^T P_S^k.$$ This can be deconstructed as the expected value of the following random variable: perform $k$ random walk steps with probability $e^{-t} \frac{t^k}{k!}$. The probability of ever choosing $k \geq 2t$ is smaller than $\epsilon$ by Chernoff bounds for Poisson random variables, so random walks longer than $2t$ are safely forbidden.

To summarize the algorithm, we take the average of $16/\epsilon^3 \log n$ samples of random walk distributions after $\max{k, 2t}$ steps, where $k$ is drawn with probability $e^{-t}\frac{t^k}{k!}$.

Approximate vectors.

For the local linear solver algorithm, we will be approximating vectors $\rho_{S,t,f}$ for values of $t$ as high as $T = s^3\log(s^3 \gamma^{-1})$. For example, if $\gamma = 0.01$,

In [22]:
s = len(subset)
gamma = 0.01
T = s**3*(np.log(s**3*(1./gamma)))
print 'T is', T
T is 108738.936053

Let's first plot the $L_1$ and $L_2$ norms of exact vectors.

In [23]:
ts = np.logspace(0, np.log10(T), base=10, num=20)
In [24]:
l1 = []
l2 = []
for t in ts:
    exact = dolphins.exp_hkpr(subset, t, b2)
    l1.append(np.linalg.norm(exact, ord=1, axis=1)[0])
    l2.append(np.linalg.norm(exact, axis=1)[0])
In [25]:
plt.figure(figsize=(15,10))
plt.semilogx(ts, l1, 'r-', ts, l2, 'k:')
plt.xlabel('t', fontsize=15)
plt.legend(('L1 norm of Dirichlet heat kernel pagerank', 'maximum entry in absolute value'), loc='best')
plt.show()

In fact, the norm of the vectors drops off dramatically, so we limit the range to $T=500$.

In [26]:
T = 500
ts = np.logspace(0, np.log10(T), base=10, num=20)

l1_exact = []
l2_exact = []
l1_apprx = []
l2_apprx = []
epsilon = 0.1
for t in ts:
    exact = dolphins.exp_hkpr(subset, t, b2)
    apprx = solver.approx_dir_hkpr(dolphins, subset, t, b2, epsilon)
    l1_exact.append(np.linalg.norm(exact, ord=1, axis=1)[0])
    l2_exact.append(np.linalg.norm(exact, axis=1)[0])
    l1_apprx.append(np.linalg.norm(apprx, ord=1, axis=1)[0])
    l2_apprx.append(np.linalg.norm(apprx, axis=1)[0])
In [27]:
plt.figure(figsize=(15,10))
plt.semilogx(ts, l1_exact, 'ko-', ts, l1_apprx, 'b^-', ts, l2_exact, 'k--', ts, l2_apprx, 'b--')
plt.legend(('exact L1', 'approximate L1', 'exact L2', 'approximate L2'), loc='best')
plt.xlabel('t', fontsize=15)
plt.show()

We can also look at some example vectors.

In [28]:
ts_vecs = np.logspace(0, np.log10(250), base=10, num=5)

exact_vecs=[]
apprx_vecs=[]
err = []
epsilon = 0.1
for t in ts_vecs:
    exact = dolphins.exp_hkpr(subset, t, b2)
    apprx = solver.approx_dir_hkpr(dolphins, subset, t, b2, epsilon)
    exact_vecs.append(exact)
    apprx_vecs.append(apprx)
    err.append(solver.approx_hkpr_err(exact, apprx, b2, epsilon))
In [29]:
plt.figure(figsize=(20,10))
plt.plot(range(s), apprx_vecs[0][0], 'r^-', range(s), exact_vecs[0][0], 'ro-',
         range(s), apprx_vecs[1][0], 'b^-', range(s), exact_vecs[1][0], 'bo-',
         range(s), apprx_vecs[2][0], 'g^-', range(s), exact_vecs[2][0], 'go-',
         range(s), apprx_vecs[3][0], 'y^-', range(s), exact_vecs[3][0], 'yo-',
         range(s), apprx_vecs[4][0], 'm^-', range(s), exact_vecs[4][0], 'mo-',)
plt.xticks(range(len(exact_vecs[0])), subset)
plt.ylabel('Vertex Values', fontsize=15)
plt.xlabel('Vertices', fontsize=15)
plt.legend(('t='+str(ts_vecs[0]), 't='+str(ts_vecs[0]),
            't='+str(ts_vecs[1]), 't='+str(ts_vecs[1]),
            't='+str(ts_vecs[2]), 't='+str(ts_vecs[2]),
            't='+str(ts_vecs[3]), 't='+str(ts_vecs[3]),
            't='+str(ts_vecs[4]), 't='+str(ts_vecs[4])), loc='best')
plt.show()