🍈 Zettelkasten

❯

Machine Learning

❯

KD Trees

Aug 21, 20254 min read

programming
algorithm

https://www.youtube.com/watch?v=BK5x7IUTIyU

Again, like I said before FLANN uses the KD tree data structure. It is a datastructure used to assist in finding the closest point to a select point. It is one of the quickest methods to finding close distances of one point to another.

The KD tree is like a regular tree but instead of holding 1 binary value, it holds K values.

You see a regular tree would just have True or False. A K-D tree point will be able to hold K ammount of properties. This means that this tree can be in the 3rd dimension if K is 3, 4th dimension if K is 4.

the K-D tree above has 2 for K.

so, the question here is that we have a grid of arbitrary points here. And we want to plop a point down and find where the closest point to our placed point is.

lets say every point has the 2 properties x and y.

if we had all these points in memory, we could perform a binary search and just check every point with the previous highscore and see if it beats the distance. That would take a tremendous ammount of time though.

So, we would put these points into a data structure that allows us to better search through them. The K-D tree is the solution to this.

The K-D tree will split the points into 2 groups, then it will split those groups into 2 more groups, and on and on until there is only 1 node left in its own group.

And so our first split can look like this:

we split at lets say point A.

this is how the tree is constructed. It will find a point it wants to split at, then it makes that the rootnode/currentnode.

So our tree will start with root node A

.

so a general rule of thumb is that since we are using 2 dimensions, a 2 = K tree, its a good idea to alternate the splits on different axis. So split once on the X axis, split another on the Y axis.

so, we split not exactly straight, we split on a point, these points being point B and point C.

then, we repeat once more.

we would technically still split it once more time. Its not truly helpful.

we just have leaf nodes now.

So now we compare our point with a point on the grid.

our root node is A. we compare our point’s X to A’s X. if its less, then we continue on the left branch. We go to child B.

then we compare B.

my point has smaller Y than B, so I go right branch to E.

now compare E’s X

we are on E’s left side. Lets go to left branch.

And at last we are at the endpoint. G.

But we are not quite done yet.

There might be some tricky points like a point on the other side of E.

say like this. We never saw that because we are only checking for if our written point is left or right of E. it is left, but there is a green that is closer

so we need to make our way back up the tree A TINY bit.

We first say ok, the minimum distance is point G. then we ask ourselves, is the distance from our point to E smaller? If not, we are still OK. Now we want to check the values on the other side of E. if there is no values on the other side of the E that is closer than our current min distance, then we are good. As long as theres a change theres a point that could be super close, usually we just go up 1 or 2 nodes.

So il tell you this is how it checks.

we may be tempted to check B. on the other side of B. but, we firstly need to compare the distance of our min distance to the distance of B. as long as our min distance is bigger than the distance of B, then there is no point in checking the other side of B.

check A, no, still farther than our G point. And we are done. thus, we know our min distance point is G.

this data structure works very well for a 2d point because we know how to split things.

Graph View

Created with Quartz v4.4.0 © 2025

GitHub
Discord Community