Hyperdimensional Sparse (Random) Vectors
Interesting properties for machine learning
A hyperdimensional sparse random vector (SRV) defined as a sparse vector of dimension, \(d>10^3\) in the binary vector space \(\mathbb{B}^d\) with elements drawn from the set \({\lbrace0,1\rbrace}^d\) with sparsity s, where the probability p, of a bit being set in the vector \(p=\frac s d\) has the following properties:
Combinatoric complexity harnessed
There are \({d \choose s}\) distinct vectors in the space.
Any two such vectors are orthogonal, \(u \cdot v = 0\) and therefore any sparse random vector may be used as a kind of basis vector.
Let the vectorspace described above be denoted by \(\mathbb{V} \subset \mathbb{B}^d\)
-- to compute binomial coefficient
choose :: Integral a => a -> a -> a
choose n 0 = 1
choose 0 k = 0
choose n k = choose (n-1) (k-1) * n `div` k So for a practical application of HDSS where we have defined \(d = 2^{14}\) and \(s = 2^{4}\) the space has a basis of order:
-- convert to double so ghci prints this in scientific notation.
fromIntegral $ choose (2^14) (2^4) :: Double 1.2791384791344545e54
This is a very large number and it obliges us to be sure that the pseudo random number generator (PRNG) has sufficient periodicity so as not to produce a sequence of random bits that is already in use.
Under the practical constraint of a high period and fast PRNG making a new sparse random vector is a simple operation with a computational cost of generating \(s\) random integers and we are statistically assured of its orthogonality to all other random vectors.
It is also true that if we take any such vector and shift/rotate its bits by an arbitrary number, up to \(d\) times then we can create a new vector from it that is related by a single integer \(n < d\) but is also statistically orthogonal to the original. This is very powerful idea such that we can have a mapping from a SRV to another SRV given by:
\(\mathbb{V}\times\mathbb{N} \to \mathbb{V}\)
\(u, n \mapsto v; n < d; v = u ROT n \implies u \perp v\)
This map is invertible, such that we can either find n within d shifts from any two vectors or, more efficiently, given n and v recover the original vector u.
\(v, -n \mapsto u\)
Features for free (well cheap)
This has at least one interesting application, one suggested by Dominic Widdows in The Geometry of Meaning where within a given text frame or word gram the relative positions of a co-incident term could be encoded by superposing its elemental (basis) vector shifted by its relative position. In our engine we have implemented the superposition operation to allow for a signed (directional) rotation of the source basis so we can investigate this further. There are other dimensions in data, for example time, that could be encoded using this property.
Implementation of the vectorspace \(\mathbb{V}\)
Firstly we define the scalar or dot product of two vectors as the bit count of the bitwise AND operation. We then scale this by the length or number of dimensions to a real number known as the overlap of the two vectors. We also define a normalized distance/closeness metric w.r.t. to the Hamming distance. We illustrate this in Haskell using a dense representation. In other work we have implemented this in both sparse and dense representations and this will be the subject of a further article.
import Data.Word
import qualified Data.Bits as Btype Vec = [Word]-- define a function to popcount a vector
popcount :: Vec -> Int
popcount v = sum $ map B.popCount v-- define a function to popcount after applying binary function
popop :: (Vec -> Vec-> Vec) -> Vec -> Vec -> Int
popop f u v = popcount $ f u v-- bitwise OR or addition of two vectors
orv :: Vec -> Vec -> Vec
orv = zipWith (B..|.) -- bitwise AND of two vectors
andv :: Vec -> Vec -> Vec
andv = zipWith (B..&.)-- XOR of two vectors
xorv :: Vec -> Vec -> Vec
xorv = zipWith (B.xor)-- dot product or overlap
dotp :: Vec -> Vec -> Int
dotp = popop andv-- distance
distance :: Vec -> Vec -> Int
distance = popop xorv-- this reflects our intuition for the scalar product
dotp [0,1,0] [1,1,1]1
distance [0,1,0] [1,1,1]2
-- the normalized inverse of distance - could also be used as a similarity metric
closeness u v = 1 - fromIntegral (distance u v) / fromIntegral (length u)-- the normalized overlap
overlap u v = fromIntegral (dotp u v) / fromIntegral (length u)overlap [1,0,0] [1,1,0]0.3333333333333333
closeness [1,0,0][1,1,0]0.6666666666666667
Formal properties and beyond
From the above informal definition of the vector space it is a small step to prove all the required formal properties of the space. What is not clear yet is how to form an algebra of these vectors which includes multiplication as well as addition of vectors, this may involve invoking concepts from geometric algebra which might provide further insights.
This is not a dry exercise, some authors have suggested that nothing short of a quantum algebra will be required to express all the queries that might render such a substrate useful for machine learning.
I will follow up on some of these ideas in subsequent articles. The next steps are to perform some empirical analysis of the distributions of the above metric space and make a journey into the topology of the associative memory suggested by Pentti Kanerva, the originator of this fascinating idea in his book Sparse Distributed Memory. This work formed the basis of Hierarchical Temporal Memory (HTM) around which there is a community centered around the Numenta organization.
Next
I hope to show in an upcoming post some of the data and results and try and establish the suitability of this space as a high performance machine learning substrate.
Elsewhere
This will also be a chance to showcase some of the tooling I am using such as Jupyter, IHaskell and H (inline R in Haskell) which these notes are created in. I’ll also take a look at the Data Haskell community and how it is evolving and some other techniques from the ANN world.