int hierarchicalClustering(const Matrix<typename Distance::ElementType>& features,
Matrix<typename Distance::ResultType>& centers,
const KMeansIndexParams& params,
Distance d = Distance())
\end{Verbatim}
\begin{description}
\item[features]{The points to be clustered}
\item[centers]{The centers of the clusters obtained. The number of rows in this matrix represents the number of clusters desired.
However, because of the way the cut in the hierarchical tree is chosen, the number of clusters computed will be
the highest number of the form $(branching-1)*k+1$ that is lower than the number of clusters desired, where $branching$ is the tree's
branching factor (see the description of the KMeansIndexParams). }
\item[params]{Parameters used in the construction of the hierarchical k-means tree}
\end{description}
The function returns the number of clusters computed.
\subsubsection{flann::KdTreeCuda3dIndex}
\label{sec:flann::cuda}
FLANN provides a CUDA implementation of the kd-tree build and search algorithms to improve the build and query speed for large 3d data sets. This section will provide all the necessary information to use the \texttt{KdTreeCuda3dIndex} index type.
\textbf{Building:}
If CMake detects a CUDA install during the build (see section \ref{sec:downloading_and_compiling}), a library \texttt{libflann\_cuda.so} will be built.
\textbf{Basic Usage:}
To be able to use the new index type, you have to include the FLANN header as follows:
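A minimal sketch of the include sequence (assuming the standard \texttt{flann/flann.hpp} header; the role of the \texttt{FLANN\_USE\_CUDA} symbol is explained next):

```cpp
// Define FLANN_USE_CUDA before the include to enable the CUDA index type.
// This requires linking against libflann_cuda (see below).
#define FLANN_USE_CUDA
#include <flann/flann.hpp>
```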
If you define the symbol \texttt{FLANN\_USE\_CUDA} before including the FLANN header, you will have to link \texttt{libflann\_cuda.so} or \texttt{libflann\_cuda\_s.a} with your project.
However, you will not have to compile your source code with \texttt{nvcc}, unless, of course, you use other CUDA code.
You can then create your index using \texttt{KDTreeCuda3dIndexParams}. The index will take care of copying all the data from and to the GPU for you, both
for index creation and search.
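A minimal host-side sketch (the point data here is hypothetical; error handling is omitted):

```cpp
#define FLANN_USE_CUDA
#include <flann/flann.hpp>

#include <vector>

int main()
{
    // Hypothetical host-side data: n 3d points, padded to 4 floats per point.
    const size_t n = 10000;
    std::vector<float> points(n * 4, 0.0f);

    // 3 columns, stride of 4 floats (the padding float is unused).
    flann::Matrix<float> dataset(points.data(), n, 3, 4 * sizeof(float));

    // The index copies the data to the GPU, builds the tree there, and
    // copies search inputs and outputs between host and device for you.
    flann::KDTreeCuda3dIndexParams params;
    flann::Index<flann::L2<float> > index(dataset, params);
    index.buildIndex();
    return 0;
}
```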
A word of caution: it is hard to give a general guideline for deciding between the CUDA kd-tree and a (multi-threaded) CPU implementation, since the answer depends on the combination of CPU and GPU in each system and on the data set.
For example, on a system with a Phenom II 955 CPU and a GeForce GTX 260 GPU, the maximum search speedup on a synthetic (random) data set is a factor of about 8--9 over single-core CPU search, and starts to be reached at about 100k search and query points. (This includes transfer times.)
Build time does not profit as much from GPU acceleration; here the benefit is about 2x at 200k points, but this largely depends on the data set. The only way to know which implementation is best suited is to try both.
\textbf{Advanced Usage:}
In some cases, you might already have your data in a buffer on the GPU. In this case, you can re-use these buffers instead of copying the buffers back to system RAM for index creation and search.
The way to do this is to pass GPU pointers via the \texttt{flann::Matrix} inputs and tell the index via the index and search params to treat the pointers as GPU pointers.
\begin{description}
\item First, a GPU buffer of \texttt{float4} values is created and filled with points.\footnote{For index creation, only \texttt{float4} points are supported; \texttt{float3} or structure-of-arrays (SoA) representations are currently not supported, since
\texttt{float4} proved to be best in terms of access speed for tree creation and search.}
\item Then, a GPU pointer to the buffer is stored in a \texttt{flann::Matrix} with 3 columns and a stride of 4 (since the last element of each \texttt{float4} is unused).
\item Last, the index is created. The \texttt{input\_is\_gpu\_float4} flag tells the index to treat the matrix as a GPU buffer.
\end{description}
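The steps above can be sketched as follows (a sketch, assuming index parameters are set via FLANN's \texttt{IndexParams} map interface; filling the buffer and error checking are omitted):

```cpp
#define FLANN_USE_CUDA
#include <flann/flann.hpp>
#include <cuda_runtime.h>

int main()
{
    const size_t n = 10000;

    // Step 1: allocate a GPU buffer of float4 points (filling it with
    // real data, e.g. from a kernel, is omitted here).
    float4* gpu_points = 0;
    cudaMalloc(&gpu_points, n * sizeof(float4));

    // Step 2: wrap the GPU pointer in a flann::Matrix with 3 columns and
    // a stride of 4 floats (the last float4 component is unused).
    flann::Matrix<float> dataset(reinterpret_cast<float*>(gpu_points),
                                 n, 3, 4 * sizeof(float));

    // Step 3: the input_is_gpu_float4 flag tells the index to treat the
    // matrix as a GPU buffer.
    flann::KDTreeCuda3dIndexParams params;
    params["input_is_gpu_float4"] = true;
    flann::Index<flann::L2<float> > index(dataset, params);
    index.buildIndex();

    cudaFree(gpu_points);
    return 0;
}
```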
Similarly, you can specify GPU buffers for the search routines that return their results via flann matrices (but not for those that return them via \texttt{std::vector}s).
To do this, the pointers in the index and dist matrices have to point to GPU buffers and the \texttt{cols} value has to be set to 3 (as we are dealing with 3d points). Any value can be used for the \texttt{stride}.
\begin{description}
\item Note that you cannot mix matrices in system RAM and GPU RAM here!
\end{description}
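A search sketch under the same assumptions (the \texttt{searchOnGpu} helper name is ours; the index is assumed to have been built from a GPU \texttt{float4} buffer, and the \texttt{matrices\_in\_gpu\_ram} flag is assumed to be a field of \texttt{flann::SearchParams}):

```cpp
#define FLANN_USE_CUDA
#include <flann/flann.hpp>
#include <cuda_runtime.h>

// k-nn search where the query, index and dist matrices all live in GPU RAM.
// `index` is assumed to be a KDTreeCuda3dIndex built beforehand.
void searchOnGpu(flann::Index<flann::L2<float> >& index,
                 float4* gpu_queries, size_t nq)
{
    const size_t k = 3;  // number of neighbors to return

    // Query matrix: 3 columns, stride of 4 floats (float4 layout).
    flann::Matrix<float> queries(reinterpret_cast<float*>(gpu_queries),
                                 nq, 3, 4 * sizeof(float));

    // Result matrices: GPU buffers wrapped in flann matrices.
    int* gpu_indices = 0;
    float* gpu_dists = 0;
    cudaMalloc(&gpu_indices, nq * k * sizeof(int));
    cudaMalloc(&gpu_dists, nq * k * sizeof(float));
    flann::Matrix<int> indices(gpu_indices, nq, k);
    flann::Matrix<float> dists(gpu_dists, nq, k);

    flann::SearchParams params;
    params.matrices_in_gpu_ram = true;  // treat all pointers as device pointers
    index.knnSearch(queries, indices, dists, k, params);

    cudaFree(gpu_indices);
    cudaFree(gpu_dists);
}
```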
\textbf{Search Parameters:}
The search routines support four parameters:
\begin{itemize}
\item \texttt{eps} - used for approximate knn search. The maximum possible error is $e = d_{best} \cdot eps$, i.e.\ the distance of the returned neighbor is at most $eps$ times larger than the distance to the true nearest neighbor.
\item \texttt{use\_heap} - used in knn and radius search. If set to true, a heap structure will be used in the search to keep track of the distance to the farthest neighbor. This is beneficial for roughly $k>64$ elements.
\item \texttt{sorted} - if set to true, the results of the radius and knn searches will be sorted in ascending order by their distance to the query point.
\item \texttt{matrices\_in\_gpu\_ram} - set to true to indicate that all (input and output) matrices are located in GPU RAM.
\end{itemize}
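A sketch of setting these parameters through \texttt{flann::SearchParams} (assuming the field names match the parameter names above; check the headers of your FLANN version):

```cpp
#include <flann/flann.hpp>

int main()
{
    // Constructor arguments: checks, eps, sorted.
    flann::SearchParams params(32, 0.1f, true);

    params.use_heap = FLANN_True;        // tri-state flag: FLANN_True,
                                         // FLANN_False or FLANN_Undefined
    params.matrices_in_gpu_ram = false;  // result matrices live in host RAM
    return 0;
}
```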
\subsection{Using FLANN from C}
FLANN can be used in C programs through the C bindings provided
with the library. Because there is no template support in C, there
are bindings provided for the following data types: \texttt{unsigned char},
\texttt{int}, \texttt{float} and \texttt{double}. For each of the functions
below there is a corresponding version for each of the four data types, for example
\begin{description}
\item[\texttt{testset}] - a $d \times m$ matrix containing $m$ test points
whose k-nearest-neighbors need to be found
\item[\texttt{k}] - the number of nearest neighbors to be returned for each
point from \texttt{testset}
\item[\texttt{parameters}] - structure containing the search parameters.
Currently it has only one member, \texttt{parameters.checks}, denoting the
number of times the tree(s) in the index should be recursively traversed. A
higher value for this parameter gives better search precision, but
also takes more time. If automatic configuration was used when the
index was created, the number of checks required to achieve the specified
precision is also computed. In that case, the parameters structure returned
by the \texttt{flann\_build\_index} function can be passed directly to the
\texttt{flann\_search} function.
\end{description}
The function returns two matrices, each of size $k \times m$. The first one contains, in each column, the indexes (in the dataset matrix) of the $k$ nearest neighbors of the corresponding point from the testset, while the second one contains the corresponding distances. The second matrix can be omitted from the call if the distances to the nearest neighbors are not needed.
For the case where a single search will be performed with each index, the
\texttt{flann\_search} function accepts the dataset instead of the index as
first argument, in which case the index is created, searched and then
deleted in one step. In this case the parameters structure passed to the
\texttt{flann\_search} function must also contain the fields of the
\texttt{build\_params} structure that would normally be passed to the
\texttt{flann\_build\_index} function if the index was built separately.
Let's look at a few examples showing how the functions described above are
used:
\subsubsection{Example 1:}
In this example the index is constructed using automatic parameter estimation, requesting 70\% as the desired precision and using the default values for the build time and memory usage factors. The index is then used to search for the nearest neighbors of the points in the testset matrix, and finally the index is deleted.