# Learning the ground-state of a spin model with different Neural Networks available in NetKet¶

The goal of this tutorial is to review various neural network architectures available in NetKet, and this in order to learn the ground-state of a paradigmatic spin model, namely the spin \(1/2\) Heisenberg antiferromagnetic spin chain.

The Hamiltonian we will consider for this tutorial is the following

\(L\) is the length of the chain, and we will use both open and periodic boundary conditions. \(\vec{\sigma}=(\sigma^x,\sigma^y,\sigma^z)\) denotes the vector of Pauli matrices. Please note that there is a factor of \(2\) between Pauli-matrices and spin-1/2 operators (thus a factor of \(4\) in \(H\)).

We will consider in this tutorial 5 possible ways of determining the ground-state of this model.

```
0. Defining the Hamiltonian
1. Exact Diagonalization (as a testbed)
2. Starting simple: the Jastrow ansatz
3. Learning with a Restricted Boltzmann Machine (RBM)
4. RBM again, this time with lattice symmetries
5. Learning with Feed Forward Neural Networks
```

Estimated time for this tutorial: 20 minutes.

First things first, let’s import the essentials

```
[1]:
```

```
# Import netket library
import netket as nk
# Import Json, this will be needed to examine log files
import json
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
import time
```

## 0) Defining the Hamiltonian¶

NetKet covers quite a few standard Hamiltonians and lattices, so let’s use this to quickly define the antiferromagnetic Heisenberg chain. For the moment we assume \(L=22\) and simply define a chain lattice in this way (using periodic boundary conditions for now).

```
[2]:
```

```
# Define a 1d chain
L = 22
g = nk.graph.Hypercube(length=L, n_dim=1, pbc=True)
```

Next, we need to define the Hilbert space on this graph. We have here spin-half degrees of freedom, and as we know that the ground-state sits in the zero magnetization sector, we can already impose this as a constraint in the Hilbert space. This is not mandatory, but will nicely speeds things up in the following.

```
[3]:
```

```
# Define the Hilbert space based on this graph
# We impose to have a fixed total magnetization of zero
hi = nk.hilbert.Spin(s=0.5, total_sz=0, N=g.n_nodes)
```

The final element of the triptych is of course the Hamiltonian acting in this Hilbert space, which in our case in already defined in NetKet. Note that the NetKet Hamiltonian uses Pauli Matrices (if you prefer to work with spin-\(1/2\) operators, it’s pretty trivial to define your own custom Hamiltonian, as covered in another tutorial)

```
[4]:
```

```
# calling the Heisenberg Hamiltonian
ha = nk.operator.Heisenberg(hilbert=hi, graph=g)
```

## 1. Exact Diagonalization (as a testbed)¶

Just as a matter of future comparison, let’s compute the exact ground-state energy (since this is still possible for \(L=22\) using brute-force exact diagonalization). NetKet provides wrappers to the Lanczos algorithm which we now use:

```
[5]:
```

```
# compute the ground-state energy (here we only need the lowest energy, and do not need the eigenstate)
evals = nk.exact.lanczos_ed(ha, compute_eigenvectors=False)
exact_gs_energy = evals[0]
print('The exact ground-state energy is E0=',exact_gs_energy)
# Just in case you can't do this calculation, here is the result
# exact_gs_energy = -39.14752260706246
```

```
The exact ground-state energy is E0= -39.147522607062804
```

## 2. Starting simple: the Jastrow ansatz¶

Let’s start with a simple ansatz for the ground-state: the Jastrow Ansatz.

$ \log\psi`(:nbsphinx-math:sigma`) = \sum_i a_i \sigma*i + :nbsphinx-math:`sum`*{i,j} \sigma*i J*{i,j} \sigma_j $

To show how it’s done, we write this simple ansatz as a flax.linen module. Flax.linen is wrapped inside of netket and it’s made to better work with complex numbers, so instead of using flax.linen you should use nk.nn.

In the example below, we write a function `single_evaluate`

that evaluates a single bit-string, and then we use jax.vmap to make sure that it works for batches of inputs

```
[6]:
```

```
import netket.nn as nknn
import flax.linen as nn
import jax.numpy as jnp
import jax
class Jastrow(nknn.Module):
@nknn.compact
def __call__(self, x):
x = jnp.atleast_2d(x)
return jax.vmap(self.single_evaluate, in_axes=(0))(x)
def single_evaluate(self, x):
v_bias = self.param(
"visible_bias", nn.initializers.normal(), (x.shape[-1],), complex
)
J = self.param(
"kernel", nn.initializers.normal(), (x.shape[-1],x.shape[-1]), complex
)
return x.T@J@x + jnp.dot(x, v_bias)
ma = Jastrow()
```

```
[7]:
```

```
# Build the sampler
sa = nk.sampler.MetropolisExchange(hilbert=hi,graph=g)
# Optimizer
op = nk.optimizer.Sgd(learning_rate=0.1)
# Stochastic Reconfiguration
sr = nk.optimizer.SR(diag_shift=0.1)
# The variational state
vs = nk.variational.MCState(sa, ma, n_samples=1000)
# The ground-state optimization loop
gs = nk.VMC(
hamiltonian=ha,
optimizer=op,
preconditioner=sr,
variational_state=vs)
start = time.time()
gs.run(300, out='Jastrow')
end = time.time()
print('### Jastrow calculation')
print('Has',nk.jax.tree_size(vs.parameters),'parameters')
print('The Jastrow calculation took',end-start,'seconds')
```

```
100%|██████████| 300/300 [00:15<00:00, 19.50it/s, Energy=-39.139-0.000j ± 0.018 [σ²=0.315, R̂=0.9989]]
```

```
### Jastrow calculation
Has 506 parameters
The Jastrow calculation took 21.097535133361816 seconds
```

```
```

```
[ ]:
```

```
```

```
[8]:
```

```
# import the data from log file
data_Jastrow=json.load(open("Jastrow.log"))
iters_Jastrow = data_Jastrow["Energy"]["iters"]
energy_Jastrow = data_Jastrow["Energy"]["Mean"]["real"]
fig, ax1 = plt.subplots()
ax1.plot(iters_Jastrow, energy_Jastrow, color='C8', label='Energy (Jastrow)')
ax1.set_ylabel('Energy')
ax1.set_xlabel('Iteration')
plt.axis([0,iters_Jastrow[-1],exact_gs_energy-0.1,exact_gs_energy+0.4])
plt.axhline(y=exact_gs_energy, xmin=0,
xmax=iters_Jastrow[-1], linewidth=2, color='k', label='Exact')
ax1.legend()
plt.show()
```

Well that’s not too bad for a simple ansatz. But we can do better, can’t we?

## 3. Learning with a Restricted Boltzmann Machine (RBM)¶

We will now consider another celebrated ansatz, the Restricted Boltzmann Machine (RBM). It simply consists of two layers: a visible one representing the \(L\) spin 1/2 degrees of freedom, and an hidden one which contains a different number \(M\) of hidden units. There are connections between all visible and hidden nodes. The ansatz is the following

\(\Psi_{\rm RBM} (\sigma_1^z,\sigma_2^z, ..., \sigma_L^z) = \exp ( \sum_{i=1}^L a_i \sigma_i^z ) \prod_{i=1}^M \cosh (b_i + \sum_j W_{ij} \sigma^z_j)\)

\(a_i\) (resp. \(b_i\)) are the visible (resp. hidden) bias. Together with the weights \(W_{ij}\), they are variational parameters that we (or rather NetKet) will optimize to minimize the energy. Netket gives you the control on the important parameters in this ansatz, such as \(M\) and the fact that you want to use or not the biases. The full explanation is here.

More conveniently (especially if you want to try another \(L\) in this tutorial), let’s define the hidden unit density \(\alpha = M / L\), and invoke the RBM ansatz in NetKet with as many hidden as visible units.

```
[9]:
```

```
# RBM ansatz with alpha=1
ma = nk.models.RBM(alpha=1)
```

And let us use the same sampler (Metropolis exchange) with some different random initial parameters, optimizer (stochastic gradient), and variational method (stochastic reconfiguration) as for the Jastrow ansatz, and let’s run things!

```
[10]:
```

```
# Build the sampler
sa = nk.sampler.MetropolisExchange(hilbert=hi,graph=g)
# Optimizer
op = nk.optimizer.Sgd(learning_rate=0.05)
# Stochastic Reconfiguration
sr = nk.optimizer.SR(diag_shift=0.1)
# The variational state
vs = nk.variational.MCState(sa, ma, n_samples=1000)
# The ground-state optimization loop
gs = nk.VMC(
hamiltonian=ha,
optimizer=op,
preconditioner=sr,
variational_state=vs)
start = time.time()
gs.run(out='RBM', n_iter=600)
end = time.time()
print('### RBM calculation')
print('Has',vs.n_parameters,'parameters')
print('The RBM calculation took',end-start,'seconds')
```

```
100%|██████████| 600/600 [00:19<00:00, 30.09it/s, Energy=-39.115 ± 0.015 [σ²=0.225, R̂=1.0043]]
```

```
### RBM calculation
Has 528 parameters
The RBM calculation took 24.721863985061646 seconds
```

```
```

```
[13]:
```

```
# import the data from log file
data=json.load(open("RBM.log"))
# Extract the relevant information
iters_RBM = data["Energy"]["iters"]
energy_RBM = data["Energy"]["Mean"]
fig, ax1 = plt.subplots()
ax1.plot(iters_Jastrow, energy_Jastrow, color='C8', label='Energy (Jastrow)')
ax1.plot(iters_RBM, energy_RBM, color='red', label='Energy (RBM)')
ax1.set_ylabel('Energy')
ax1.set_xlabel('Iteration')
plt.axis([0,iters_RBM[-1],exact_gs_energy-0.03,exact_gs_energy+0.2])
plt.axhline(y=exact_gs_energy, xmin=0,
xmax=iters_RBM[-1], linewidth=2, color='k', label='Exact')
ax1.legend()
plt.show()
```

Note that this plot zooms closer to the exact ground-state energy. With 600 iterations, we start to see convergence and reach the ground-state energy within about one per thousand, this is already nice! But we are not totally there yet, and in particular the simpler (less parameters) Jastrow wave-function seems to perform better for this example. How can we improve things? As an exercice, try to increase the number of hidden units and/or the number of iterations. What is happening? You can also check out the influence of the learning rate.

By playing with these parameters, you have hopefully arrived at an improved result, but likely at an increased CPU time cost. Let’s do things differently, and take to our advantage the symmetries of the Hamiltonian in the neural network construction.

## 4. RBM again, this time with lattice symmetries¶

Let’s define a similar RBM machine, which takes into account that the model has translational symmetries. All sites are equivalent and thus many of the wave-functions coefficients are related by symmetry. We use the same exact hyperparameters as in the previous RBM calculation (\(\alpha=1\), same learning rate, and number of samples and iterations in the Variational Monte Carlo) and run now a symmetric RBM.

```
[ ]:
```

```
## Symmetric RBM Spin Machine
#ma = nk.machine.RbmSpinSymm(alpha=1, hilbert=hi)
#ma.init_random_parameters(seed=1234, sigma=0.01)
#
## Metropolis Exchange Sampling
## Notice that this sampler exchanges two neighboring sites
## thus preservers the total magnetization
#sa = nk.sampler.MetropolisExchange(machine=ma, graph=g)
#
## Optimizer
#op = nk.optimizer.Sgd(learning_rate=0.05)
#
## Stochastic reconfiguration
#gs = nk.variational.Vmc(
# hamiltonian=ha,
# sampler=sa,
# optimizer=op,
# n_samples=1000,
# diag_shift=0.1,
# use_iterative=True,
# method='Sr')
#
#start = time.time()
#gs.run(output_prefix='RBMSymmetric', n_iter=300)
#end = time.time()
#
#print('### Symmetric RBM calculation')
#print('Has',ma.n_par,'parameters')
#print('The Symmetric RBM calculation took',end-start,'seconds')
```

The simulation was much faster, wasn’t it? There were of course much less parameters to optimize. Now let’s extract the results and plot them using a zoomed scale, and together with the previous results with the RBM.

```
[ ]:
```

```
## import the data from log file
#data=json.load(open("RBMSymmetric.log"))
#
## Extract the relevant information
#energy_symRBM=[]
#iters_symRBM=[]
#for iteration in data["Output"]:
# energy_symRBM.append(iteration["Energy"]["Mean"])
# iters_symRBM.append(iteration["Iteration"])
#
#fig, ax1 = plt.subplots()
#ax1.plot(iters_Jastrow, energy_Jastrow, color='C8', label='Energy (Jastrow)')
#ax1.plot(iters, energy_RBM, color='red', label='Energy (RBM)')
#ax1.plot(iters_symRBM, energy_symRBM, color='blue', label='Energy (Symmetric RBM)')
#
#ax1.set_ylabel('Energy')
#ax1.set_xlabel('Iteration')
#plt.axis([0,iters_symRBM[-1],exact_gs_energy-0.06,exact_gs_energy+0.12])
#plt.axhline(y=exact_gs_energy, xmin=0,
# xmax=iters[-1], linewidth=2, color='k', label='Exact')
#ax1.legend()
#plt.show()
```

Not only the simulation was faster in terms of CPU time, but we are now reaching the ground-state in a much lower number of iterations! Imposing symmetries greatly helps, and NetKet allows to do this. Note that there is also a symmetric version of the Jastrow ansatz that we tested earlier in NetKet, which is called `JastrowSymm`

. As an exercice, check it out. What you will find is that while it converges slightly faster in terms of iterations with
respect to the non-symmetric Jastrow, it does not improve the estimate of the ground-state energy. We actually see that the symmetric RBM sets the standard very high.

## 5. Learning with Feed Forward Neural Networks¶

Now let’s try a more complex network, namely a Feed Forward Neural Network (FFNN). There you will have more freedom to construct your own specific architecture. We’ll try two different FFNN in this tutorial.

The first one is a simple structure: the first layer takes as input L-dimensional input, applies a bias and outputs two times more data, just followed by a `Lncosh`

activation layer. The final layer `SumOuput`

is needed to obtain a single number for the wave-function coefficient associated to the input basis state.

```
[14]:
```

```
class Model(nk.nn.Module):
@nk.nn.compact
def __call__(self, x):
x = nk.nn.Dense(features=2*x.shape[-1], dtype=np.complex128, kernel_init=nk.nn.initializers.normal(stddev=0.1), bias_init=nk.nn.initializers.normal(stddev=0.1))(x)
x = nk.nn.activation.log_cosh(x)
return jax.numpy.sum(x, axis=-1)
ffnn = Model()
sa = nk.sampler.MetropolisExchange(hi, graph=g)
# The variational state
vs = nk.variational.MCState(sa, ffnn, n_samples=1000)
opt = nk.optimizer.Sgd(learning_rate=0.05)
# Stochastic Reconfiguration
sr = nk.optimizer.SR(diag_shift=0.1)
# The ground-state optimization loop
gs = nk.VMC(
hamiltonian=ha,
optimizer=op,
preconditioner=sr,
variational_state=vs)
start = time.time()
gs.run(out='FF', n_iter=300)
end = time.time()
print('### Feed Forward calculation')
print('Has',vs.n_parameters,'parameters')
print('The Feed Forward calculation took',end-start,'seconds')
```

```
100%|██████████| 300/300 [00:52<00:00, 5.66it/s, Energy=-39.143-0.021j ± 0.023 [σ²=0.545, R̂=1.0024]]
```

```
### Feed Forward calculation
Has 1012 parameters
The Feed Forward calculation took 58.5942440032959 seconds
```

```
```

```
[15]:
```

```
# import the data from log file
data=json.load(open("FF.log"))
# Extract the relevant information
iters_FF = data["Energy"]["iters"]
energy_FF = data["Energy"]["Mean"]["real"]
fig, ax1 = plt.subplots()
ax1.plot(iters_Jastrow, energy_Jastrow, color='C8',linestyle="None", marker='d',label='Energy (Jastrow)')
ax1.plot(iters_RBM, energy_RBM, color='red', marker='o',linestyle="None",label='Energy (RBM)')
#ax1.plot(iters_symRBM, energy_symRBM, color='blue',linestyle="None",marker='o',label='Energy (Symmetric RBM)')
ax1.plot(iters_FF, energy_FF, color='orange', marker='s',linestyle="None",label='Energy (Feed Forward, take 1)')
ax1.legend(bbox_to_anchor=(1.05, 0.3))
ax1.set_ylabel('Energy')
ax1.set_xlabel('Iteration')
plt.axis([0,iters_FF[-1],exact_gs_energy-0.02,exact_gs_energy+0.1])
plt.axhline(y=exact_gs_energy, xmin=0,
xmax=iters_RBM[-1], linewidth=2, color='k', label='Exact')
plt.show()
```

The results are clearly better than a simple (non-symmetrized RBB), and perform slightly better than the Jastrow ansatz. Let us increase the number of layers by adding a fully-connected layer with bias and `Lncosh`

activation (with \(2L\) inputs and ouputs) before the final `SumOuput`

layer.

```
[16]:
```

```
class Model2(nk.nn.Module):
@nk.nn.compact
def __call__(self, x):
x = nk.nn.Dense(features=2*x.shape[-1], dtype=np.complex128, kernel_init=nk.nn.initializers.normal(stddev=0.1), bias_init=nk.nn.initializers.normal(stddev=0.1))(x)
x = nk.nn.activation.log_cosh(x)
x = nk.nn.Dense(features=x.shape[-1], dtype=np.complex128, kernel_init=nk.nn.initializers.normal(stddev=0.1), bias_init=nk.nn.initializers.normal(stddev=0.1))(x)
x = nk.nn.activation.log_cosh(x)
return jax.numpy.sum(x, axis=-1)
ffnn2 = Model2()
# The variational state
vs = nk.variational.MCState(sa, ffnn, n_samples=1000)
opt = nk.optimizer.Sgd(learning_rate=0.05)
# Stochastic Reconfiguration
sr = nk.optimizer.SR(diag_shift=0.1)
# The ground-state optimization loop
gs = nk.VMC(
hamiltonian=ha,
optimizer=op,
preconditioner=sr,
variational_state=vs)
start = time.time()
gs.run(out='FF2', n_iter=600)
end = time.time()
print('### Feed Forward (more layers) calculation')
print('Has',vs.n_parameters,'parameters')
print('The Feed Forward (more layers) calculation took',end-start,'seconds')
```

```
100%|██████████| 600/600 [01:35<00:00, 6.27it/s, Energy=-39.102+0.011j ± 0.016 [σ²=0.273, R̂=1.0017]]
```

```
### Feed Forward (more layers) calculation
Has 1012 parameters
The Feed Forward (more layers) calculation took 95.85180711746216 seconds
```

```
```

```
[17]:
```

```
# import the data from log file
data=json.load(open("FF2.log"))
# Extract the relevant information
iters_FF_morelayers = data["Energy"]["iters"]
energy_FF_morelayers = data["Energy"]["Mean"]["real"]
fig, ax1 = plt.subplots()
#ax1.plot(iters_Jastrow, energy_Jastrow, color='C8',linestyle="None", marker='d',label='Energy (Jastrow)')
#ax1.plot(iters, energy_RBM, color='red', label='Energy (RBM)')
#ax1.plot(iters_symRBM, energy_symRBM, color='blue',linestyle="None",marker='o',label='Energy (Symmetric RBM)')
ax1.plot(iters_FF, energy_FF, color='orange', marker='s',alpha=0.5,linestyle="None",label='Energy (Feed Forward, take 1)')
ax1.plot(iters_FF_morelayers, energy_FF_morelayers, color='green',marker='s',linestyle="None", alpha=1,label='Energy (Feed Forward, more layers)')
ax1.legend(bbox_to_anchor=(1.05, 0.5))
ax1.set_ylabel('Energy')
ax1.set_xlabel('Iteration')
plt.axis([0,iters_RBM[-1],exact_gs_energy-0.02,exact_gs_energy+0.06])
plt.axhline(y=exact_gs_energy, xmin=0,
xmax=iters_RBM[-1], linewidth=2, color='k', label='Exact')
plt.show()
```

The results are even better, but at the price of an increase in computational time….

Note that more complex structures, such as Convolutional Neural Networks (CNN), can also be used within Netket. However, for such 1d systems, they do not bring too much compared to the symmetric RBM (as a matter of fact, the symmetric RBM is a special type of a simple CNN. CNNs show their full strength for more complex systems, such as 2d quantum systems. Convolutional Neural Networks will be the topic of another tutorial in NetKet (and we’ll make there the connection with the special case of the symmetric RBM).

Finally let us conclude that another type of machine, Matrix Product States (MPS), is also available in NetKet. They do perform extremely well for 1d quantum systems. Since however they are a bit different, they will be presented in another tutorial.