NeuralNine fingerprint recognition
We will make a program that answers the question: whose fingerprint is this?
Dataset
We will be using a Kaggle dataset, finally!
https://www.kaggle.com/datasets/ruizgara/socofing
We use a fingerprint dataset uploaded by another Kaggle user.
Now, in this dataset we have two folders. One is Altered, containing fingerprints that have been synthetically defected: a program drew strokes and distortions onto the prints. The other is Real, which contains the genuine, unmodified fingerprints.
(Above: examples of real fingerprints → their altered versions.)
I don't know the context as to whether these altered fingerprints are going to be counted as true human fingerprints or not. I believe they should be, because it kind of looks like the finger is pressing down on the camera.
Let's wait for the actual program to see for certain.
These datasets are always ginormous.
Once downloaded, let's put it inside the same folder as our Python program.
SOCOFing
Weird how there is a SOCOFing folder inside another SOCOFing folder.
Understanding SOCOFing
https://arxiv.org/ftp/arxiv/papers/1807/1807.10609.pdf
The Sokoto Coventry Fingerprint (SOCOFing) dataset consists of 6,000 fingerprint images from 600 African subjects. The images vary in gender and come with three types of synthetic alterations: obliteration, central rotation, and z-cut, denoted Obl, CR, and Zcut respectively.
Below is an example of each:
Central rotation: part of the fingerprint is rotated.
Obliteration: part of the fingerprint is covered with noise.
Z-cut: some marking of a cut across the finger, I guess. Very sloppily done.
All the files are in BMP format too.
There are also difficulty levels for the alterations: easy, medium, and hard.
The higher the difficulty, the more intense the alterations; hard has the most striking differences from the original.
Fingerprints are very complex and unique; it's extremely unlikely to find two fingerprints that are exactly the same. Thus, we should be able to identify a specific fingerprint regardless of the defects.
Programming
First, we need two libraries: os and cv2 (OpenCV).
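A minimal sketch of the setup (cv2 comes from the opencv-python package):

```python
import os   # for listing the dataset files
import cv2  # OpenCV, for reading images and running SIFT
```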
I'm going to load in an image and then print it. It should be a NumPy array of values from 0 to 255.
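Something like this; the exact path and filename are my assumption, following the SOCOFing naming pattern of subject__gender_hand_finger_alteration:

```python
# Hypothetical path: subject 19, left thumb, central rotation, easy difficulty
sample = cv2.imread("SOCOFing/Altered/Altered-Easy/19__M_Left_thumb_finger_CR.BMP")
print(sample)  # a NumPy array of pixel values in the 0-255 range
```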
And yes, it is.
The image I chose looks like this:
It's the 19th subject's left thumb with a central rotation applied.
Note that this is a modified image: we give the program a modified image, and it tries to find the original.
We have two variables, image_guess and filename, to track the best match so far.
We also have a few variables holding what the two images have in common: two keypoint variables and one match-point variable, used every time we compare two images. Through those variables we compute a score. We check the similarity score of the current image against the sample image to see if it beats our previous best score.
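A sketch of that bookkeeping (the exact names are my choice):

```python
best_score = 0      # best similarity score seen so far
filename = None     # filename of the best-matching real fingerprint
image_guess = None  # the best-matching image itself
kp1, kp2, mp = None, None, None  # keypoints of both images and their match points
```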
Then we want to go through all the files and check each one for similarity against the sample image.
There are 6,000 images in total. Windows sorts the filenames lexicographically, character by character, so "11" comes before "2" in the listing. I only need person #19, but I have to list around 2,000 files to actually get there; if it followed normal commonsense numeric sorting, it would be around file 190, I believe.
Just know that file is only a string containing the filename. It's not a file location, just the name. In order for us to read the image, we need to prepend the (relative) directory to it.
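A sketch of that loop; the [:2000] slice is an assumption based on the note above that roughly 2,000 files are enough to reach subject 19:

```python
# Adjust the path if the zip extracted a nested SOCOFing/SOCOFing folder
for file in os.listdir("SOCOFing/Real")[:2000]:
    fingerprint_image = cv2.imread("SOCOFing/Real/" + file)  # prepend the relative directory
    print(fingerprint_image)  # the printing section we are about to remove
```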
The above code will print every file's image data.
OK, remove the printing section; all we need is the cv2 image read line.
OK, next we want to create a SIFT object. SIFT is the Scale-Invariant Feature Transform. https://en.wikipedia.org/wiki/Scale-invariant_feature_transform
This will let us find matching features between the two images by detecting keypoints.
SIFT keypoints are matched by comparing the sample image to each of our ~2,000 original images and pairing features based on the Euclidean distance between their feature vectors. Sets of keypoints that agree on location, scale, and orientation tally up and contribute to an overall confidence rating. We pick the correct image based on that confidence rating.
So, we create the SIFT object like this:
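A one-liner, assuming OpenCV ≥ 4.4 where SIFT lives in the main module:

```python
    sift = cv2.SIFT_create()  # inside the per-file loop here; could also be created once outside
```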
Keypoints are interesting and unique structures in the image; descriptors are numerical descriptions of these keypoints. Let's compute them like this:
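Presumably along these lines; the second argument of detectAndCompute is an optional mask, and None means use the whole image:

```python
    # still inside the per-file loop
    keypoints_1, descriptors_1 = sift.detectAndCompute(sample, None)             # altered sample
    keypoints_2, descriptors_2 = sift.detectAndCompute(fingerprint_image, None)  # current real image
```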
So we get our keypoints from SIFT: one set for the sample image, and another for the real image we are comparing it to.
Next, we match the keypoints. We use FLANN, the Fast Library for Approximate Nearest Neighbors.
We pass algorithm 1, which selects the KD-tree data structure, as well as 10 trees for the algorithm to use. Then we do a kNN match with k=2 on the two sets of descriptors.
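A sketch of the matcher call; passing the index and search parameters as plain dicts is one valid way to construct the FLANN matcher:

```python
    # k=2: for every sample descriptor, return its two nearest neighbours in the real image
    matches = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 10}, {}).knnMatch(
        descriptors_1, descriptors_2, k=2
    )
```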
Before we go further it is a good idea to understand SIFT a bit more.
SIFT
SIFT is used for object recognition: finding similarities between two objects through the use of keypoints.
Keypoints are specific points in an image with a unique visual characteristic that would not change much if the image were scaled, rotated, or illuminated differently. In the SIFT algorithm, keypoints are typically found by applying a DoG (difference of Gaussians) pyramid and locating the local extrema.
Descriptors are the numerical representation of the local image AROUND the keypoint: typically a histogram of gradient orientations of the image patch around the keypoint.
To match keypoints between different images, you compare the descriptors of keypoints from one image with the descriptors of keypoints from the other. Commonly, the (Euclidean) distance between descriptors is used.
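As a toy illustration of that comparison (NumPy assumed; real SIFT descriptors are 128-dimensional vectors):

```python
import numpy as np

d1 = np.random.rand(128).astype(np.float32)  # stand-in descriptor from image 1
d2 = np.random.rand(128).astype(np.float32)  # stand-in descriptor from image 2
print(np.linalg.norm(d1 - d2))  # Euclidean distance; smaller means more similar
```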
Ok back to the code.
Here is what we have next: we want to go through our matches and build a list of the close ones. Our threshold is that the best match's distance has to be less than 10% of the second-best match's distance.
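A sketch of that filter; it is effectively a (very strict) Lowe ratio test, where p is the best neighbour and q the runner-up:

```python
    match_points = []
    for p, q in matches:
        if p.distance < 0.1 * q.distance:  # best match must beat the runner-up by a wide margin
            match_points.append(p)
```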
Next we want a keypoint quantity. Note we are SIFTing two images, so the two images may have different numbers of keypoints. We take the smaller of the two counts to make the scoring a lot easier;
we will always use the image with the smaller number of keypoints for our keypoint quantity.
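Which boils down to:

```python
    keypoints = min(len(keypoints_1), len(keypoints_2))  # normalise by the smaller keypoint count
```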
Then, almost there. We want to check the similarity score of this image against the best score so far.
If it wins, update the filename and the image_guess. Actually, you don't even need the image_guess, I believe (though keeping it saves re-reading the best file at the end).
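A sketch of the update, using the variable names from earlier:

```python
    score = len(match_points) / keypoints * 100  # percentage of keypoints that matched
    if score > best_score:
        best_score = score
        filename = file
        image_guess = fingerprint_image  # kept so we can draw the result later
        kp1, kp2, mp = keypoints_1, keypoints_2, match_points
```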
Ok last thing.
Print the best match score, show the SIFT matches, and then show the image.
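Roughly like this, after the loop; the resize factor is just to make the tiny BMPs visible:

```python
print("BEST MATCH: " + str(filename))
print("SCORE: " + str(best_score))

result = cv2.drawMatches(sample, kp1, image_guess, kp2, mp, None)  # lines between matched keypoints
result = cv2.resize(result, None, fx=4, fy=4)  # scale up for viewing
cv2.imshow("Result", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
```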
Cool, I guess.