Restricted Boltzmann machine

A restricted Boltzmann machine (RBM) (also called a restricted Sherrington–Kirkpatrick model with external field or restricted stochastic Ising–Lenz–Little model) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.^[1]

RBMs were initially proposed under the name Harmonium by Paul Smolensky in 1986,^[2] and rose to prominence after Geoffrey Hinton and collaborators used fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction,^[3] classification,^[4] collaborative filtering,^[5] feature learning,^[6] topic modelling,^[7] immunology,^[8] and even many-body quantum mechanics.^[9] ^[10] ^[11]

They can be trained in either supervised or unsupervised ways, depending on the task.

As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph:

a pair of nodes from each of the two groups of units (commonly referred to as the "visible" and "hidden" units respectively) may have a symmetric connection between them; and
there are no connections between nodes within a group.

By contrast, "unrestricted" Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.^[12]

Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs and optionally fine-tuning the resulting deep network with gradient descent and backpropagation.^[13]

Structure

The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights $W$ of size $m \times n$. Each weight element $w_{i, j}$ of the matrix is associated with the connection between the visible (input) unit $v_{i}$ and the hidden unit $h_{j}$. In addition, there are bias weights (offsets) $a_{i}$ for $v_{i}$ and $b_{j}$ for $h_{j}$. Given the weights and biases, the energy of a configuration (pair of Boolean vectors) $(v, h)$ is defined as

E (v, h) = - \sum_{i} a_{i} v_{i} - \sum_{j} b_{j} h_{j} - \sum_{i} \sum_{j} v_{i} w_{i, j} h_{j}

or, in matrix notation,

E (v, h) = - a^{T} v - b^{T} h - v^{T} W h .
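The two forms of the energy can be compared numerically. The following is a minimal sketch in plain Python; the parameter values and the function names `energy_sums` and `energy_matrix` are illustrative assumptions, not part of any library.

```python
def energy_sums(v, h, a, b, W):
    """Energy via the explicit sums over visible units i and hidden units j."""
    visible_term = sum(a[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b[j] * h[j] for j in range(len(h)))
    pair_term = sum(v[i] * W[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - pair_term


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))


def energy_matrix(v, h, a, b, W):
    """Same energy in matrix form: -a^T v - b^T h - v^T W h."""
    Wh = [dot(row, h) for row in W]  # W is m x n, so W h has length m
    return -dot(a, v) - dot(b, h) - dot(v, Wh)


# Hypothetical toy model: m = 3 visible units, n = 2 hidden units.
a = [0.1, -0.2, 0.3]
b = [0.5, -0.5]
W = [[1.0, -1.0],
     [0.5, 0.0],
     [-0.5, 2.0]]
v = [1, 0, 1]
h = [0, 1]

print(energy_sums(v, h, a, b, W), energy_matrix(v, h, a, b, W))
```

Both functions should agree up to floating-point rounding; for this toy configuration the energy is approximately -0.9.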

This energy function is analogous to that of a Hopfield network. As with general Boltzmann machines, the joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows,^[14]

P (v, h) = \frac{1}{Z} e^{- E (v, h)}

where $Z$ is a partition function defined as the sum of $e^{- E (v, h)}$ over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1. The marginal probability of a visible vector is the sum of $P (v, h)$ over all possible hidden layer configurations,^[14]

P (v) = \frac{1}{Z} \sum_{h} e^{- E (v, h)} ,

and vice versa. Since the underlying graph structure of the RBM is bipartite (meaning there are no intra-layer connections), the hidden unit activations are mutually independent given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations.^[12] That is, for $m$ visible units and $n$ hidden units, the conditional probability of a configuration of the visible units $v$, given a configuration of the hidden units $h$, is

P (v | h) = \prod_{i = 1}^{m} P (v_{i} | h) .

Conversely, the conditional probability of $h$ given $v$ is

P (h | v) = \prod_{j = 1}^{n} P (h_{j} | v) .

The individual activation probabilities are given by

P (h_{j} = 1 | v) = σ (b_{j} + \sum_{i = 1}^{m} w_{i, j} v_{i})

and

P (v_{i} = 1 | h) = σ (a_{i} + \sum_{j = 1}^{n} w_{i, j} h_{j})

where $σ$ denotes the logistic sigmoid.
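For a very small RBM, the partition function can be computed by enumerating every configuration, which makes it possible to check the sigmoid formula against the joint distribution directly. The sketch below does this in plain Python under assumed toy parameters; all function names are hypothetical, and the enumeration is only feasible for a handful of units because it visits all $2^{m + n}$ configurations.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, a, b, W):
    return -(sum(ai * vi for ai, vi in zip(a, v))
             + sum(bj * hj for bj, hj in zip(b, h))
             + sum(v[i] * W[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h))))


def joint_table(a, b, W):
    """Return {(v, h): P(v, h)} by enumerating all 2^(m+n) configurations."""
    m, n = len(a), len(b)
    configs = [(v, h)
               for v in itertools.product((0, 1), repeat=m)
               for h in itertools.product((0, 1), repeat=n)]
    weights = {c: math.exp(-energy(c[0], c[1], a, b, W)) for c in configs}
    Z = sum(weights.values())  # partition function
    return {c: w / Z for c, w in weights.items()}


def p_hidden_on_bruteforce(j, v, table):
    """P(h_j = 1 | v) computed from the joint table by marginalizing."""
    p_v = sum(p for (vv, hh), p in table.items() if vv == v)
    p_v_and_hj = sum(p for (vv, hh), p in table.items()
                     if vv == v and hh[j] == 1)
    return p_v_and_hj / p_v


def p_hidden_on_sigmoid(j, v, b, W):
    """P(h_j = 1 | v) from the closed form sigma(b_j + sum_i w_ij v_i)."""
    return sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v))))


# Hypothetical toy model: m = 3 visible units, n = 2 hidden units.
a = [0.1, -0.2, 0.3]
b = [0.5, -0.5]
W = [[1.0, -1.0], [0.5, 0.0], [-0.5, 2.0]]
table = joint_table(a, b, W)
v = (1, 0, 1)
for j in range(len(b)):
    print(j, p_hidden_on_bruteforce(j, v, table), p_hidden_on_sigmoid(j, v, b, W))
```

The two columns printed for each hidden unit should match up to rounding, which is the conditional-independence property in action: no sum over the other hidden units is needed in the closed form.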

The visible units of a restricted Boltzmann machine can be multinomial while the hidden units remain Bernoulli. In this case, the logistic function for the visible units is replaced by the softmax function

P (v_{i}^{k} = 1 | h) = \frac{\exp (a_{i}^{k} + \sum_{j} W_{i j}^{k} h_{j})}{\sum_{k' = 1}^{K} \exp (a_{i}^{k'} + \sum_{j} W_{i j}^{k'} h_{j})}

where $K$ is the number of discrete values that a visible unit can take. Such models are applied in topic modeling^[7] and recommender systems.^[5]
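The softmax conditional can be sketched as follows. The data layout is an assumption made for illustration: `a[i][k]` holds the bias $a_{i}^{k}$ and `W[i][j][k]` holds the weight $W_{i j}^{k}$; the function name `softmax_visible` is hypothetical.

```python
import math


def softmax_visible(i, h, a, W):
    """Return [P(v_i^k = 1 | h) for k in range(K)] for visible unit i."""
    K = len(a[i])
    logits = [a[i][k] + sum(W[i][j][k] * h[j] for j in range(len(h)))
              for k in range(K)]
    shift = max(logits)  # subtracting a constant leaves the softmax unchanged
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy model: one visible unit with K = 3 values, n = 2 hidden units.
a = [[0.0, 1.0, -1.0]]
W = [[[0.5, 0.0, -0.5],   # weights from hidden unit 0, one per value k
      [1.0, -1.0, 0.0]]]  # weights from hidden unit 1, one per value k
h = [1, 0]

probs = softmax_visible(0, h, a, W)
print(probs, sum(probs))
```

Subtracting the largest logit before exponentiating is a common numerical-stability choice and does not change the resulting probabilities.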

Relation to other models

Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields.^[15]^[16]

The graphical model of RBMs corresponds to that of factor analysis.^[17]

Training algorithm

Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set $V$ (a matrix, each row of which is treated as a visible vector $v$ ),

\arg \max_{W} \prod_{v \in V} P (v)

or equivalently, to maximize the expected log probability of a training sample $v$ selected randomly from $V$ :^[15]^[16]

\arg \max_{W} 𝔼 [\log P (v)]
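To make the objective concrete, the sketch below evaluates the mean log probability of a small training set by brute-force enumeration. This is only workable for tiny models, since the partition function sums over all $2^{m + n}$ configurations; the function names and toy data are assumptions for illustration.

```python
import itertools
import math


def log_prob_visible(v, a, b, W):
    """log P(v) = log sum_h exp(-E(v, h)) - log Z, by full enumeration."""
    m, n = len(a), len(b)

    def neg_energy(vis, hid):
        return (sum(a[i] * vis[i] for i in range(m))
                + sum(b[j] * hid[j] for j in range(n))
                + sum(vis[i] * W[i][j] * hid[j]
                      for i in range(m) for j in range(n)))

    hiddens = list(itertools.product((0, 1), repeat=n))
    visibles = list(itertools.product((0, 1), repeat=m))
    unnormalized = sum(math.exp(neg_energy(v, h)) for h in hiddens)
    Z = sum(math.exp(neg_energy(vis, h)) for vis in visibles for h in hiddens)
    return math.log(unnormalized) - math.log(Z)


def mean_log_likelihood(V, a, b, W):
    """Average of log P(v) over the rows v of the training set V."""
    return sum(log_prob_visible(v, a, b, W) for v in V) / len(V)


# Hypothetical training set and parameters: m = 3, n = 2.
V = [(1, 0, 1), (1, 1, 0), (0, 0, 1)]
a = [0.1, -0.2, 0.3]
b = [0.5, -0.5]
W = [[1.0, -1.0], [0.5, 0.0], [-0.5, 2.0]]

print(mean_log_likelihood(V, a, b, W))
```

Because the logarithm is monotonic, maximizing the product of probabilities and maximizing this mean log probability select the same parameters; the log form is usually preferred because it turns the product into a sum.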

The algorithm most often used to train RBMs, that is, to optimize the weight matrix $W$, is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train PoE (product of experts) models.^[18]^[19] The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute the weight updates.

The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows:

Take a training sample $v$, compute the probabilities of the hidden units and sample a hidden activation vector $h$ from this probability distribution.
Compute the outer product of $v$ and $h$ and call this the positive gradient.
From $h$, sample a reconstruction $v'$ of the visible units, then resample the hidden activations $h'$ from this. (Gibbs sampling step)
Compute the outer product of $v'$ and $h'$ and call this the negative gradient.
Let the update to the weight matrix $W$ be the positive gradient minus the negative gradient, times some learning rate: $Δ W = ϵ (v h^{T} - v' h'^{T})$.
Update the biases $a$ and $b$ analogously: $Δ a = ϵ (v - v')$, $Δ b = ϵ (h - h')$.
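The procedure above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the helper names, the learning rate, the toy data, and the choice to use sampled binary states (rather than probabilities) in every term are all assumptions made here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, b, W, rng):
    """Sample each h_j from P(h_j = 1 | v) = sigma(b_j + sum_i w_ij v_i)."""
    probs = [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v))))
             for j in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, a, W, rng):
    """Sample each v_i from P(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j)."""
    probs = [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(a))]
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v, a, b, W, lr, rng):
    """Apply one CD-1 update in place for a single training sample v."""
    h = sample_hidden(v, b, W, rng)              # hidden sample from the data
    v_recon = sample_visible(h, a, W, rng)       # reconstruction v'
    h_recon = sample_hidden(v_recon, b, W, rng)  # resampled hidden vector h'
    for i in range(len(a)):
        for j in range(len(b)):
            positive = v[i] * h[j]               # entry of the positive gradient
            negative = v_recon[i] * h_recon[j]   # entry of the negative gradient
            W[i][j] += lr * (positive - negative)
    for i in range(len(a)):
        a[i] += lr * (v[i] - v_recon[i])
    for j in range(len(b)):
        b[j] += lr * (h[j] - h_recon[j])
    return v_recon


# Hypothetical usage: 4 visible units, 3 hidden units, two training patterns.
rng = random.Random(0)
a = [0.0] * 4
b = [0.0] * 3
W = [[0.0] * 3 for _ in range(4)]
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
for _ in range(200):
    for v in data:
        cd1_update(v, a, b, W, lr=0.1, rng=rng)
print(W)
```

The update is stochastic, so the learned weights depend on the random seed. In practice updates are usually averaged over mini-batches rather than applied one sample at a time.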

A Practical Guide to Training RBMs written by Hinton can be found on his homepage.^[14]

Stacked Restricted Boltzmann Machine

In a single RBM, lateral connections within a layer are prohibited to make analysis tractable. A stacked restricted Boltzmann machine, by contrast, combines an unsupervised three-layer network with symmetric weights and a supervised, fine-tuned top layer for recognizing three classes.
Stacked RBMs are used for natural language understanding, document retrieval, image generation, and classification. They are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the undirected, symmetric top layer, the remaining connections are two-way and asymmetric: the stack is connected as three layers with asymmetric weights, and two networks are combined into one.
Stacked RBMs do share similarities with the single RBM: the neuron in both is a stochastic binary Hopfield neuron. The energy of both is given by the Gibbs probability measure: $E = - \frac{1}{2} \sum_{i, j} w_{i j} s_{i} s_{j} + \sum_{i} θ_{i} s_{i}$. The training process is also similar. A stacked RBM is trained one layer at a time, approximating the equilibrium state with a 3-segment pass rather than performing backpropagation. Both supervised and unsupervised training are used on different RBMs in the stack as pre-training for classification and recognition. The training uses contrastive divergence with Gibbs sampling: $Δ w_{i j} = ϵ (p_{i j} - p'_{i j})$.
The strength of the stacked RBM is that it performs a non-linear transformation, so it is easy to expand and can give a hierarchical layer of features. Its weakness is that calculations with integer and real-valued neurons are complicated. It does not follow the gradient of any function, so the approximation of contrastive divergence to maximum likelihood is improvised.^[14]
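Layer-by-layer training of a stack can be sketched as below. This is a simplified illustration, not a description of any particular published system: each RBM is trained with single-sample CD-1, and sampled hidden activations of one layer are passed upward as the training data of the next (an assumption made here; hidden probabilities are another possible choice). All names, layer sizes, and hyperparameters are hypothetical, and no supervised fine-tuning stage is included.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def up(v, b, W, rng):
    """Sample hidden states given visible states."""
    return [int(rng.random() < sigmoid(b[j] + sum(W[i][j] * v[i]
                                                  for i in range(len(v)))))
            for j in range(len(b))]


def down(h, a, W, rng):
    """Sample visible states given hidden states."""
    return [int(rng.random() < sigmoid(a[i] + sum(W[i][j] * h[j]
                                                  for j in range(len(h)))))
            for i in range(len(a))]


def train_rbm(data, n_hidden, epochs, lr, rng):
    """Train one RBM with CD-1 and return its parameters (a, b, W)."""
    m = len(data[0])
    a, b = [0.0] * m, [0.0] * n_hidden
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(m)]
    for _ in range(epochs):
        for v in data:
            h = up(v, b, W, rng)
            v2 = down(h, a, W, rng)   # reconstruction
            h2 = up(v2, b, W, rng)    # resampled hidden states
            for i in range(m):
                a[i] += lr * (v[i] - v2[i])
                for j in range(n_hidden):
                    W[i][j] += lr * (v[i] * h[j] - v2[i] * h2[j])
            for j in range(n_hidden):
                b[j] += lr * (h[j] - h2[j])
    return a, b, W


def train_stack(data, hidden_sizes, epochs=5, lr=0.1, seed=0):
    """Greedily train one RBM per entry of hidden_sizes, bottom layer first."""
    rng = random.Random(seed)
    layers, layer_input = [], data
    for n_hidden in hidden_sizes:
        a, b, W = train_rbm(layer_input, n_hidden, epochs, lr, rng)
        layers.append((a, b, W))
        # The hidden samples of this layer become the next layer's "visible" data.
        layer_input = [up(v, b, W, rng) for v in layer_input]
    return layers


layers = train_stack([[1, 0, 1, 0], [0, 1, 0, 1]], hidden_sizes=[3, 2])
print([(len(W), len(W[0])) for _, _, W in layers])
```

Each layer only ever sees the output of the layer below it, which is what allows the stack to be trained without backpropagating through the whole network.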


References

External links

Python implementation of Bernoulli RBM and tutorial
SimpleRBM is a very small RBM code (24kB) useful for you to learn about how RBMs learn and work.
Julia implementation of Restricted Boltzmann machines: https://github.com/cossio/RestrictedBoltzmannMachines.jl


[1] Template:Citation

[2] Template:Cite book

[3] Template:Cite journal

[4] Template:Cite conference

[5] Template:Cite conference

[6] Template:Cite conference

[7] Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model. Neural Information Processing Systems 23.

[8] Template:Cite journal

[9] Template:Cite journal

[10] Template:Cite journal

[11] Template:Cite journal

[12] Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. Artificial Intelligence and Statistics.

[13] Template:Cite journal

[14] Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto.

[15] Template:Cite journal

[16] Asja Fischer and Christian Igel (2014). Training Restricted Boltzmann Machines: An Introduction. Pattern Recognition 47, pp. 25–39.

[17] Template:Cite journal

[18] Geoffrey Hinton (1999). Products of Experts. ICANN 1999.

[19] Template:Cite journal
