Getting started
HDTorch is a PyTorch-based hyperdimensional (HD) computing library for HD learning. It includes custom CUDA extensions for speeding up hypervector operations, namely bit-(un)packing and bit-array summation in the horizontal and vertical dimensions.
In the paper Simon W., Pale U., Teijeiro T. and Atienza D.: HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration (ICCAD 2022), we demonstrate HDTorch’s utility by analyzing four HDC benchmark datasets in terms of accuracy, runtime, and memory consumption, utilizing both classical and online HD training methodologies.
Basics of Hyperdimensional computing (HDC)
HD computing is a machine learning strategy whose defining feature is its representation of datapoints as long ('hyper') vectors, which enables learning by 'accumulation' of vectors belonging to the same class. HD computing relies on two conditions: first, any two randomly generated HD vectors are orthogonal with high probability, and second, a vector generated by accumulation is more similar to its components than to vectors that are not of its class.
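The two conditions can be checked numerically. The following is a minimal pure-Python sketch that does not use HDTorch; the helper names `random_hv`, `hamming` and `bundle` are made up for this illustration, and accumulation is assumed to be a bitwise majority vote.

```python
import random

D = 10_000  # hypervector dimension
rng = random.Random(0)

def random_hv(d=D):
    """Random binary hypervector as a list of 0/1 values."""
    return [rng.getrandbits(1) for _ in range(d)]

def hamming(a, b):
    """Normalized Hamming distance: fraction of positions that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def bundle(vectors):
    """Bitwise majority vote (an odd number of vectors avoids ties)."""
    half = len(vectors) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*vectors)]

# Condition 1: two independent random vectors differ in about half of their bits.
dist_random = hamming(random_hv(), random_hv())

# Condition 2: a bundle stays closer to its components than to an unrelated vector.
members = [random_hv() for _ in range(5)]
class_hv = bundle(members)
dist_members = [hamming(class_hv, m) for m in members]
dist_outsider = hamming(class_hv, random_hv())

print(f"random pair:        {dist_random:.3f}")
print(f"bundle vs members:  {[round(d, 3) for d in dist_members]}")
print(f"bundle vs outsider: {dist_outsider:.3f}")
```

For binary vectors, a normalized Hamming distance near 0.5 plays the role of orthogonality. The distances between the bundle and its members should come out clearly below 0.5, while the random pair and the outsider stay close to 0.5.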
The most commonly used HD vectors are binary, with elements 0 or 1, or bipolar, with elements -1 or 1. In practice, ternary (-1, 0, 1) or integer/float vectors are also sometimes used, but for the moment this library does not support them.
A typical HD workflow consists of several steps:
- Initializing basis vectors in memory that will be used to encode features to vectors. They represent the basic units we need, such as features (their IDs) and possible feature values. If we have more complex data, such as EEG data, where we also have channels, we can have basis vectors for each of the channels too.
- Data (feature) values have to be discretized into several bins. Each of those values will have its own vector, which was initialized in the previous step.
- Discretized features are encoded to HD vectors, so that each sample of features is now represented by an HD vector.
- Learning is performed using all encoded data samples. Several approaches to learning are possible, but the simplest, classic approach is to accumulate all HD vectors representing samples of the same class. After accumulation (and normalization to get binary vectors again), these vectors are called the 'model' vectors of the classes. A more complex alternative is 'online' training, in which the class vectors are updated after every datapoint: each encoded vector is multiplied by its similarity to the target class before being accumulated into that class.
- Inference is performed by first encoding the sample to an HD vector and then comparing this vector with the learned 'model' vectors. The comparison can use various metrics such as cosine similarity, dot-product similarity, or Hamming distance; for binary vectors, Hamming distance is the most memory- and computation-friendly. The label of the most similar 'model' vector is given as the prediction.
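The classic learning and inference steps can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not HDTorch's implementation: the encoded samples are simulated as noisy copies of one random prototype per class, and the names `noisy_copy`, `train` and `predict` are hypothetical.

```python
import random

D = 2_000
rng = random.Random(1)

def random_hv():
    return [rng.getrandbits(1) for _ in range(D)]

def noisy_copy(hv, flip_prob=0.3):
    """Flip each bit with probability flip_prob to simulate an encoded sample."""
    return [bit ^ (rng.random() < flip_prob) for bit in hv]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def train(samples, labels, num_classes):
    """Classic training: accumulate per class, then threshold back to binary."""
    sums = [[0] * D for _ in range(num_classes)]
    counts = [0] * num_classes
    for hv, label in zip(samples, labels):
        counts[label] += 1
        acc = sums[label]
        for i, bit in enumerate(hv):
            acc[i] += bit
    return [[1 if 2 * s > n else 0 for s in acc] for acc, n in zip(sums, counts)]

def predict(model, hv):
    """Return the label of the class vector with the smallest Hamming distance."""
    return min(range(len(model)), key=lambda c: hamming(model[c], hv))

prototypes = [random_hv() for _ in range(3)]
train_labels = [c for c in range(3) for _ in range(15)]
train_samples = [noisy_copy(prototypes[c]) for c in train_labels]
model = train(train_samples, train_labels, num_classes=3)

test_labels = [c for c in range(3) for _ in range(10)]
test_samples = [noisy_copy(prototypes[c]) for c in test_labels]
predictions = [predict(model, hv) for hv in test_samples]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f"accuracy: {accuracy:.2f}")
```

Even though 30% of the bits of every sample are flipped, the thresholded class vectors remain much closer to samples of their own class than to samples of other classes, so nearest-vector inference separates the toy classes well.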
Generating Hypervectors
Encoding of data (set of features) to HD vectors can be done in several ways, but for most of them, the first step is generating basis hypervectors that are further combined together to form the final HD vector representation of the original data.
Here we provide several ways to initialize basis hypervectors, since the data that they represent can have different structures and relationships. For example, if we want to represent different categories (that don't have any special relationship between them), we might want to generate each HD vector randomly and independently. If instead the basis HD vectors represent values, which do have a relationship, we might want to map the distance between values to the distance between the corresponding vectors.
The options our code currently supports for generating a set of basis vectors are:
- 'random' - where every vector is randomly and independently generated
- 'sandwich' - where every two neighboring vectors have half of the vector in common, while the rest of the vector is random. This way, vectors are similar (50%) only to their neighbors and not to vectors further away.
- 'scale' - alternatively called 'level' initialization, where the distance between the values that the vectors represent is mapped to the similarity between those vectors.
- 'scaleWithRadius' - similar to 'scale' initialization, but only for vectors that are closer than the 'radius' distance. This way, the similarity of vectors closer than 'radius' decreases with their distance (the further apart they are, the less similar they are), while vectors further apart than 'radius' are orthogonal.
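To make the 'scale' idea concrete, here is a minimal pure-Python sketch of a generic level initialization. It is an assumption about how such vectors can be built, not a copy of HDTorch's code: starting from a random vector, each level flips a fresh block of randomly chosen bit positions, so the Hamming distance grows linearly with the level distance. The function name `level_vectors` is hypothetical.

```python
import random

def level_vectors(num_levels, d, seed=0):
    """Start from a random vector and flip a fresh block of bits per level."""
    rng = random.Random(seed)
    current = [rng.getrandbits(1) for _ in range(d)]
    order = list(range(d))
    rng.shuffle(order)             # each position is flipped at most once
    step = d // (num_levels - 1)   # bits flipped between neighboring levels
    vectors = [current[:]]
    for level in range(1, num_levels):
        for pos in order[(level - 1) * step : level * step]:
            current[pos] ^= 1
        vectors.append(current[:])
    return vectors

def hamming(a, b):
    """Normalized Hamming distance: fraction of positions that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

levels = level_vectors(num_levels=11, d=10_000)
for j in (1, 5, 10):
    print(f"distance(level 0, level {j}) = {hamming(levels[0], levels[j]):.2f}")
# distance(level 0, level 1) = 0.10
# distance(level 0, level 5) = 0.50
# distance(level 0, level 10) = 1.00
```

In this sketch the last level is the complement of the first. Flipping only half as many bits per level would leave the first and last levels at distance 0.5 instead.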
Example of basis vector generation:

```python
import hdtorch

# Generate 5 random hypervectors of dimension 10000 (not packed, on 'cuda')
vecs = hdtorch.HDmodel.generateBasisHDVectors('random', 5, 10000, 0, 'cuda')

# Generate 20 hypervectors of dimension 500 (not packed, on 'cuda') whose
# similarity is inversely proportional to the distance between them. The bits
# that differ between neighboring vectors are chosen in increasing order
# (instead of randomly), and the whole vector is eventually flipped. If the
# factor at the end of the name were e.g. 2, only half of the vector would be flipped.
vecs = hdtorch.HDmodel.generateBasisHDVectors('scaleNoRand1', 20, 500, 0, 'cuda')

# Generate 1000 hypervectors of dimension 10000 (not packed, on 'cuda') that are
# similar, proportionally to distance, only to the surrounding 10 vectors;
# vectors further apart than that are almost orthogonal
vecs = hdtorch.HDmodel.generateBasisHDVectors('scaleWithRadius10', 1000, 10000, 0, 'cuda')
```
Custom CUDA functions
In order to significantly lower computation time when operating with hypervectors and lower the memory used by them, we implemented custom CUDA C-functions for packing and unpacking them.
These functions are used as follows:

```python
import hdtorch

# Generate random HD vectors of dimension 10000
vecs = hdtorch.HDmodel.generateBasisHDVectors('random', 5, 10000, 0, 'cuda')

# Compress vectors to arrays with dimension [5,313], dtype = int32
# (CUDA accelerated, 8x memory reduction). Dimension 313 comes from ceil(10000/32)
packed_vecs = hdtorch.pack(vecs)

# Decompress vectors to an array with dimension [5,10000], dtype = int8 (CUDA accelerated)
unpacked_vecs = hdtorch.unpack(packed_vecs, 10000)
```
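What packing does can be illustrated without a GPU. The sketch below is a pure-Python stand-in, not the CUDA kernel: it stores 32 bits per word, and the bit order within a word is an arbitrary choice made for this example (HDTorch's actual layout and its signed int32 representation may differ).

```python
import random

WORD_BITS = 32

def pack(bits):
    """Pack a list of 0/1 values into 32-bit words, zero-padding the last word."""
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for offset, bit in enumerate(bits[start:start + WORD_BITS]):
            word |= bit << offset  # assumed bit order: lowest bit first
        words.append(word)
    return words

def unpack(words, d):
    """Inverse of pack; d is needed to drop the padding bits of the last word."""
    bits = [(word >> offset) & 1 for word in words for offset in range(WORD_BITS)]
    return bits[:d]

rng = random.Random(0)
D = 10_000
vec = [rng.getrandbits(1) for _ in range(D)]
packed = pack(vec)

print(len(packed))                # 313, i.e. ceil(10000 / 32)
assert unpack(packed, D) == vec   # packing is lossless
```

This also shows why the unpacking function needs the original dimension as an argument: 313 words hold 10016 bits, so the last 16 bits are padding.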
Next, as encoding and learning in HDC are based on bitwise summation of vectors in the horizontal and vertical dimensions, these operations are implemented for packed vectors. This further lowers the computation time for encoding and training.
These functions are used as follows:

```python
import hdtorch

# Generate random HD vectors of dimension 10000
vecs = hdtorch.HDmodel.generateBasisHDVectors('random', 5, 10000, 0, 'cuda')

# Compress vectors to arrays with dimension [5,313], dtype = int32
# (CUDA accelerated, 8x memory reduction). Dimension 313 comes from ceil(10000/32)
packed_vecs = hdtorch.pack(vecs)

# Horizontal summation of packed vectors (CUDA accelerated), result is an array with dimension [5]
h_count = hdtorch.hcount(packed_vecs)

# Vertical summation of packed vectors (CUDA accelerated), result is an array with dimension [10000]
v_count = hdtorch.vcount(packed_vecs, 10000)
```
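The meaning of the two summations can be shown on a tiny example. The following pure-Python sketch mirrors the names `hcount` and `vcount` for readability only; it is an assumed reference behavior (count of set bits per vector, and count of set bits per bit position across vectors), not the CUDA implementation, and the `pack` helper uses an arbitrary bit order.

```python
WORD_BITS = 32

def pack(bits):
    """Pack a list of 0/1 values into 32-bit words (lowest bit first)."""
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for offset, bit in enumerate(bits[start:start + WORD_BITS]):
            word |= bit << offset
        words.append(word)
    return words

def hcount(packed_rows):
    """Horizontal sum: number of set bits in each packed vector."""
    return [sum(bin(word).count("1") for word in row) for row in packed_rows]

def vcount(packed_rows, d):
    """Vertical sum: for each bit position, how many vectors have that bit set."""
    counts = [0] * d
    for row in packed_rows:
        for i in range(d):
            counts[i] += (row[i // WORD_BITS] >> (i % WORD_BITS)) & 1
    return counts

rows = [
    [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
]
packed_rows = [pack(r) for r in rows]

print(hcount(packed_rows))      # [6, 4, 5]
print(vcount(packed_rows, 10))  # [2, 1, 3, 1, 0, 1, 2, 0, 2, 3]
```

The horizontal count of the XOR of two packed vectors gives their Hamming distance, and thresholding the vertical count at half the number of vectors gives their majority-vote bundle, which is why these two operations cover both inference and training.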
Data encoding
In order to learn from training data or infer test data labels, the data has to be encoded to HD vectors. This means that instead of having data in the form of a 2D matrix [numSampl, numFeat], where each column is one feature, we represent it in the form of a 2D matrix of corresponding HD vectors [numSampl, D]. For every sample, numFeat features are encoded to one D-dimensional hypervector.
There are several ways to do this, but the most typical is what we call 'FeatXORVal', where each feature (its ID vector) is bound to the value of that feature. Binding of binary vectors is typically done using the XOR function. This binding is performed for each feature and its value, using the basis vector matrices generated at the beginning. In the end, the resulting vectors are bundled over all features. Bundling means bitwise summing and normalizing by the number of summed vectors to obtain binary vectors again.
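The bind-then-bundle idea can be sketched in pure Python as follows. This is an illustration only: the function name `encode_feat_xor_val` is hypothetical, the value vectors are random rather than 'scale' vectors to keep the example short, and bundling is assumed to be a bitwise majority vote.

```python
import random

D = 1_000
NUM_FEAT = 5
NUM_LEVELS = 4
rng = random.Random(0)

def random_hv():
    return [rng.getrandbits(1) for _ in range(D)]

feature_ids = [random_hv() for _ in range(NUM_FEAT)]     # one ID vector per feature
feature_vals = [random_hv() for _ in range(NUM_LEVELS)]  # one vector per discretized value

def encode_feat_xor_val(sample):
    """Bind each feature ID to its value vector with XOR, then bundle by majority."""
    bound = [
        [i ^ v for i, v in zip(feature_ids[f], feature_vals[level])]
        for f, level in enumerate(sample)
    ]
    half = len(bound) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*bound)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

s1 = [0, 3, 1, 2, 2]  # discretized feature values of one sample
s2 = [0, 3, 1, 2, 0]  # differs from s1 in one feature
s3 = [3, 0, 2, 1, 1]  # differs from s1 in every feature
e1, e2, e3 = (encode_feat_xor_val(s) for s in (s1, s2, s3))

print(f"one feature changed:  {hamming(e1, e2):.2f}")
print(f"all features changed: {hamming(e1, e3):.2f}")
```

Samples that share most of their feature values end up with similar hypervectors, while samples that share none end up roughly orthogonal (distance close to 0.5), which is the property that learning by accumulation relies on.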
The corresponding code is shown below:

```python
import torch
import hdtorch

numFeat = 30
D = 10000
numSegmentationLevels = 20

# Initialize data (100 samples with 30 features, integer values from 0 to 255)
features = torch.randint(0, 256, (100, numFeat)).to(device='cuda')

# Initialize basis vectors
# Randomly generated feature ID vectors, one for each of the 30 features, with D=10000, not packed
featureIDs = hdtorch.HDmodel.generateBasisHDVectors('random', numFeat, D, 0, 'cuda')
# Feature value vectors generated using the 'scale' method, one for each of the 20 possible values, with D=10000, not packed
featureVals = hdtorch.HDmodel.generateBasisHDVectors('scaleNoRand1', numSegmentationLevels, D, 0, 'cuda')

# Normalize and discretize data
minFeat = torch.min(features, dim=0)[0]
maxFeat = torch.max(features, dim=0)[0]
featuresNorm = hdtorch.HDutil.normalizeAndDiscretizeData(features, minFeat, maxFeat, numSegmentationLevels)

# Encode features using the 'FeatXORVal' approach
(encodedData, _) = hdtorch.HDencoding.EncodeDataToVectors(featuresNorm, featureIDs, featureVals, 'binary', 0, 'FeatXORVal', D)

# Or using e.g. the 'FeatPermute' approach
(encodedData, _) = hdtorch.HDencoding.EncodeDataToVectors(featuresNorm, featureIDs, featureVals, 'binary', 0, 'FeatPermute', D)
```
HD computing learning and inference
Finally, we show the whole learning and inference process using the MNIST dataset as an example:

```python
import torch
import hdtorch
from torchvision import datasets
import torchvision.transforms as transforms

# Setting various parameters
class HDParams():
    HDFlavor = 'binary'  # 'binary', 'bipol': binary uses 0/1, bipolar uses -1/1
    D = 10000  # dimension of hypervectors
    numFeat = 784
    numClasses = 10
    device = 'cuda'  # device to use ('cpu', 'cuda')
    packed = True
    numSegmentationLevels = 20  # number of discretization levels to which data is discretized
    similarityType = 'hamming'  # 'hamming', 'cosine': similarity measure used for comparing HD vectors
    # 'random', 'sandwich', 'scaleNoRand1', 'scaleNoRand2', 'scaleRand1', 'scaleRand2', ... 'scaleWithRadius3'
    levelVecType = 'random'  # defines how HD vectors are initialized
    IDVecType = 'random'
    encodingStrat = 'FeatXORVal'  # 'FeatXORVal', 'FeatAppend', 'FeatPermute': defines how HD vectors are encoded

hdParams = HDParams()
batchSize = 1000  # learn in batches

# Loading MNIST dataset
print("Loading MNIST dataset")
t = transforms.Compose([transforms.ToTensor(), transforms.ConvertImageDtype(torch.uint8)])
kwargs = {'num_workers': 1, 'pin_memory': True} if HDParams.device == 'cuda' else {}
dataTrain = datasets.MNIST(root='./data', train=True, transform=t, download=True)
dataTest = datasets.MNIST(root='./data', train=False, transform=t, download=True)
trainLoader = torch.utils.data.DataLoader(dataset=dataTrain, batch_size=batchSize, shuffle=True, **kwargs)
testLoader = torch.utils.data.DataLoader(dataset=dataTest, batch_size=batchSize, shuffle=False, **kwargs)

# Calculate min and max values on the train set; they are also used for normalizing the test set
minFeat = trainLoader.dataset.data.view(-1, 784).min(0)[0].to(HDParams.device)
maxFeat = trainLoader.dataset.data.view(-1, 784).max(0)[0].to(HDParams.device)

# Initialize HD classifier
HDModel = hdtorch.HD_classifier(HDParams)

# Training HD model in batches
print("Training Model")
for x, (data, labels) in enumerate(trainLoader):
    print(f'Training batch {x}')
    data = data.to(HDParams.device).view(-1, 784)
    data = hdtorch.HDutil.normalizeAndDiscretizeData(data, minFeat, maxFeat, HDParams.numSegmentationLevels)
    HDModel.trainModelVecOnData(data, labels.to(HDParams.device))

# Testing performance
print("Testing Model")
for x, (data, labels) in enumerate(testLoader):
    data = data.to(HDParams.device).view(-1, 784)
    data = hdtorch.HDutil.normalizeAndDiscretizeData(data, minFeat, maxFeat, HDParams.numSegmentationLevels)
    (testPredictions, testDistances) = HDModel.givePrediction(data)
    acc_test = (testPredictions == labels.to(HDParams.device)).sum().item() / len(labels)
    print(f'Batch {x}: Acc: {acc_test}')
```