# Supervised Learning with Neural Networks

In this tutorial we show how to optimize a neural networks to approximate the state given. In this example, we consider the ground state of the J1-J2 model in one-dimension obtained from ED using NetKet.

The Hamiltonian of the model is given by:

where the sum is over sites of the 1-D chain.

## Outline:

```
1. Obtain data from ED
2. Choosing the machine (variational ansatz) and the optimizer
3. Defining the Supervised Learning object
4. Running the Supervised Learning
5. Data Visualisation
```

```
# Import netket library
import netket as nk
from mpi4py import MPI
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
```

## 1) Obtain data from ED

For a supervised learning problem, we would need to provide the data including input and output label . The neural network is asked to learn the mapping . In our case, the input is the spin basis and the output is the coefficient of the corresponding spin basis.

First, we write a simple function to obtain data, i.e. ground state, from exact diagonalization. For detailed explanation see the tutorial for J1-J2 model for example.

```
def load_ed_data(L, J2=0.4):
# Sigma^z*Sigma^z interactions
sigmaz = np.array([[1, 0], [0, -1]])
mszsz = (np.kron(sigmaz, sigmaz))
# Exchange interactions
exchange = np.asarray(
[[0, 0, 0, 0], [0, 0, 2, 0], [0, 2, 0, 0], [0, 0, 0, 0]])
# Couplings J1 and J2
J = [1., J2]
mats = []
sites = []
for i in range(L):
for d in [0, 1]:
# \sum_i J*sigma^z(i)*sigma^z(i+d)
mats.append((J[d] * mszsz).tolist())
sites.append([i, (i + d + 1) % L])
# \sum_i J*(sigma^x(i)*sigma^x(i+d) + sigma^y(i)*sigma^y(i+d))
mats.append(((-1.)**(d + 1) * J[d] * exchange).tolist())
sites.append([i, (i + d + 1) % L])
# 1D Lattice
g = nk.graph.Hypercube(length=L, n_dim=1, pbc=True)
# Spin based Hilbert Space
hi = nk.hilbert.Spin(s=0.5, graph=g)
# Custom Hamiltonian operator
ha = nk.operator.LocalOperator(hi)
for mat, site in zip(mats, sites):
ha += nk.operator.LocalOperator(hi, mat, site)
# Perform Lanczos Exact Diagonalization to get lowest three eigenvalues
res = nk.exact.lanczos_ed(ha, first_n=3, compute_eigenvectors=True)
# Eigenvector
ttargets = []
tsamples = []
hind = nk.hilbert.HilbertIndex(hi)
for i in range(hind.n_states):
visible = hind.number_to_state(i)
# only pick zero-magnetization states
mag = np.sum(visible)
if(np.abs(mag) < 1.0e-4):
tsamples.append(visible.tolist())
ttargets.append([np.log(res.eigenvectors[0][i])])
return hi, tsamples, ttargets
```

After we obtain the result as `res`

, we return the hilbert space `hi`

, the spin basis `tsamples`

, and the coefficients `ttargets`

.

Notice that we restrict ourselves to symmetry sector to simplify the learning.

We now consider a small system and with , and obtain the data by calling the function `load_ed_data`

.

```
L = 10
J2 = 0.4
# Load the Hilbert space info and data
hi, training_samples, training_targets = load_ed_data(L, J2)
```

## 2) Choosing the Machine and the Optimizer

For this tutorial, we consider the Restricted Bolzmann Machine `nk.machine.RbmSpin`

and the AdaDelta optimizer `nk.optimizer.AdaDelta`

.

```
# Machine
ma = nk.machine.RbmSpin(hilbert=hi, alpha=1)
ma.init_random_parameters(seed=1234, sigma=0.01)
# Optimizer
op = nk.optimizer.AdaDelta()
```

## 3) Defining the Supervised Learning object

We have now have almost everything (machine, optimizer, data) for setting up a supervised learning object. We also need to provide the batch size, `batch_size`

, for stochatic gradient descent. For more details, see here.

```
# Supervised learning object
spvsd = nk.supervised.Supervised(
machine=ma,
optimizer=op,
batch_size=400,
samples=training_samples,
targets=training_targets)
```

## 4) Running the Supervised Learning

The very last piece we need for supervised learning is the loss function.

### Loss function

There are different loss functions one could define for the optimization problem, for example:

Here, we consider the latter one, which is the negative log of the overlap, as the loss function.

### Gradient estimate

Taking the derivative from overlap errror function above, we have

Note that is the size of the Hilbert space. In general, this could not be computed exactly.

We could estimate this gradient by sampling different distributions,

So for the overlap loss function, we have two gradient estimate, one is , `Overlap_uni`

, and , `Overlap_phi`

.

We save the loss function every iteration, and save the optimized parameters only every `save_params_every`

iterations.

```
# Number of iteration
n_iter = 4000
# Run with "Overlap_phi" loss. Also available currently is "MSE, Overlap_uni"
spvsd.run(n_iter=n_iter, loss_function="Overlap_phi",
output_prefix='output', save_params_every=50)
```

## 5) Data Visualisation

We have optimized our machine to approximate the ground state of the J1-J2 model. The results for the loss function are stored in the “.log” file and the optimized parameters in the “.wf” file. The files are all in json format.

```
# Load the data from the .log file
import json
data=json.load(open("output.log"))
# Extract the relevant information
iters=[]
log_overlap=[]
mse=[]
mse_log=[]
data=json.load(open('output.log'))
for iteration in data["Output"]:
iters.append(iteration["Iteration"])
log_overlap.append(iteration["log_overlap"])
mse.append(iteration["mse"])
mse_log.append(iteration["mse_log"])
overlap = np.exp(-np.array(log_overlap))
```

Now, we plot the overlap, i.e. fidelity, with respect to the number of iteration.

```
J2 = 0.4
plt.subplot(2, 1, 1)
plt.title(r'$J_1 J_2$ model, $J_2=' + str(J2) + '$')
plt.ylabel('Overlap = F')
plt.xlabel('Iteration #')
plt.plot(iters, overlap)
plt.axhline(y=1, xmin=0, xmax=iters[-1], linewidth=2, color='k',label='max accuracy = 1')
plt.legend(frameon=False)
plt.subplot(2, 1, 2)
plt.ylabel('Overlap Error = 1-F')
plt.xlabel('Iteration #')
plt.semilogy(iters, 1.-overlap)
plt.show()
```

The result suggests that indeed we could have a good approximate to the state given by supervised learning.