k-NN is a simple and effective classifier in the toolbox of a data scientist. It is based on the philosophy that any new test data point should be similar to the k nearest train data points that we have already seen. Here too, we expect new data points to come from the same data distribution as the ones we have already seen.
For any test data point, we must compute its distance to every train data point in order to determine the k nearest ones. This can slow down our model at test time if the train dataset is very large. To circumvent this problem, there are several algorithms for querying the nearest points quickly. Two of these are k-d trees and ball trees, both available in sklearn. This approach helps when accuracy and runtime at test time are critical but we have no issue storing such a large train set.
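To make the cost of the naive search concrete, here is a minimal brute-force sketch in plain Python. The function name `knn_predict` and the toy 2-D points are my own assumptions for illustration; this is not the sklearn implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=1):
    """Brute-force k-NN: measure the distance from `query` to every train point."""
    # One distance computation per train point, so the cost of a single
    # query grows linearly with the size of the train set.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and return the most common label among them.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small groups of 2-D points.
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_labels = ["a", "a", "b", "b"]
prediction = knn_predict(train_points, train_labels, (0.8, 0.9), k=3)
```

Every query touches all train points, which is exactly the work that tree-based indexes, or a smaller train set, try to avoid.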
The purpose of this post, however, is to speed up the k-NN search in another way. Instead of employing a faster way to find the k nearest neighbors, we try to sample a smaller subset of the train dataset. Having a smaller train set will speed up the classifier at test time, as we need to check fewer data points to find the k nearest ones. It is important, however, that the accuracy is not impacted much. We proceed by establishing a baseline selection criterion and then refining the technique. We will use the 1-NN classifier for evaluation.
The MNIST Dataset
Sadly, this post also unimaginatively picks the MNIST dataset for its purpose. For those who don’t know, it’s a famous dataset of handwritten digits, provided to us by [Yann LeCun](http://yann.lecun.com). Almost every beginner’s tutorial or course project seems to use it, but that is because of how wonderful, easy to use and just-right in size the dataset is.
The dataset provides 28x28 pixel images, each containing one handwritten digit. We have 60,000 images in the train set and 10,000 images in the test set, along with ground truth labels for each of them.
Baseline model: Random picker
We begin with a naive subset creation technique: randomly sampling data points.
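A minimal sketch of this baseline could look like the following. It uses toy 2-D clusters as a stand-in for MNIST, and the helper names (`random_subset`, `one_nn_accuracy`, `make_blobs`) are assumptions made for illustration rather than code from the actual experiment.

```python
import math
import random


def random_subset(points, labels, size, seed=0):
    """Pick `size` train points uniformly at random, without replacement."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(points)), size)
    return [points[i] for i in chosen], [labels[i] for i in chosen]


def one_nn_accuracy(train_points, train_labels, test_points, test_labels):
    """Fraction of test points whose single nearest train point shares their label."""
    correct = 0
    for query, truth in zip(test_points, test_labels):
        nearest = min(
            range(len(train_points)),
            key=lambda i: math.dist(train_points[i], query),
        )
        correct += train_labels[nearest] == truth
    return correct / len(test_points)


def make_blobs(n_per_class, rng):
    """Toy data (an assumption, not MNIST): two well-separated 2-D clusters."""
    points, labels = [], []
    for label, center in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(center[0], 0.1), rng.gauss(center[1], 0.1)))
            labels.append(label)
    return points, labels


rng = random.Random(42)
train_x, train_y = make_blobs(100, rng)
test_x, test_y = make_blobs(20, rng)

# Compare 1-NN on the full train set against 1-NN on a random 10% subset.
sub_x, sub_y = random_subset(train_x, train_y, size=20)
full_accuracy = one_nn_accuracy(train_x, train_y, test_x, test_y)
subset_accuracy = one_nn_accuracy(sub_x, sub_y, test_x, test_y)
```

On real data the interesting quantity is how quickly `subset_accuracy` drops as `size` shrinks, since each query only scans `size` points instead of the whole train set.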
I recently started grad school at UC San Diego on Sept 27th, pursuing a Master’s degree in Computer Science. Having been a great fan of hackathons during my undergrad, I jumped at the opportunity to participate in the first hackathon I saw being organised on my campus! The hackathon, UC Health Hack, is organised by the Health departments of several UC colleges and Rady Children’s Hospital San Diego.
I arrived at the event’s location with my friend Akshansh, early in the morning. The hackathon was supposed to start at 8:30 on a Saturday morning (6th Oct), which seemed a bit unusual to me, as all the other hackathons I’ve participated in commenced on Friday evening and continued over the weekend. Thankfully, the location was just a 3 minute walk from our residence. We checked in at the front desk and devoured the breakfast, like a bunch of hungry students 😋. After the opening remarks by some incredible people, we began with the team building session. That’s when we met Jacques, an experienced nurse. Akshansh and I, not being from a medical background, knew that collaborating with Jacques would really help us build a truly impactful product.
The Problem 🤔
We got to brainstorming, thinking of all that a new patient goes through, starting from when they feel uneasy, to everything that happens in the hospital. We realised that on getting to a hospital, patients need to fill out several forms, wait in the waiting area, get diagnosed and only then have tests run on them. Also, many patients end up visiting an ED when the case could simply have been handled at an urgent care or retail clinic, usually for much cheaper.
We set out to build a mobile app that would be used by the patient or their friends/family soon after the patient starts feeling unwell, before leaving for, or on the way to, a hospital. The app would present the patient with a series of questions, all tailored to the patient’s current medical condition. The responses to the questions would then be sent to the hospital. This would allow the hospital to assess the patient’s condition and level of emergency and be better equipped to handle them as soon as they arrive.
- Have meaningful questions
- Extract maximum details, from a minimal set of questions
- Tailor questions, depending on previous answers
- Focus on the user experience
- Have an interface with big UI elements and minimal clutter, so that patients of all ages can use it
By the time we were pretty set with our idea, lunch was served!
We chose Android as our mobile app platform. Akshansh and Jacques set out to build the question bank. The questions would be in a tree structure, where the answer to a question would determine the next questions. Using Jacques’ experience and the help from mentors at the Hackathon, we were able to come up with the question flow for the app.
The Android App
While Akshansh was helping Jacques with the questions, I started working on the Android app. The first thing I did was to create an interface for the question bank. It worked as follows:
- Start with the root question.
- Accept the answer provided by the patient.
- Determine the next question.
- Present the next question to the patient in the view.
- Repeat from step 2.
An important requirement was that this interface should support both questions that have multiple choice answers and questions to which the answer could be any sentence.
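The steps above can be sketched roughly as follows. The real app was written for Android, so this Python version is only an illustration, and the class names, fields and sample questions (other than the flow itself) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Question:
    text: str
    # Multiple-choice questions: maps each option to the follow-up question.
    next_by_answer: dict = field(default_factory=dict)
    # Free-text questions: a single follow-up, whatever sentence is given.
    next_for_free_text: Optional["Question"] = None


class QuestionBank:
    def __init__(self, root):
        self.current = root   # step 1: start with the root question
        self.answers = []     # (question text, answer) pairs for the hospital

    def submit(self, answer):
        """Record the answer and move to the next question (None when done)."""
        question = self.current
        self.answers.append((question.text, answer))
        if question.next_by_answer:          # multiple-choice question
            self.current = question.next_by_answer[answer]
        else:                                # free-text question or leaf
            self.current = question.next_for_free_text
        return self.current


# A tiny hypothetical tree, not the question flow we actually built.
picture = Question("Please send a picture of the injury.")
body_part = Question("Which body part is injured?", next_for_free_text=picture)
symptoms = Question("Please describe your symptoms.")
root = Question("Is this an injury?", next_by_answer={"Yes": body_part, "No": symptoms})

bank = QuestionBank(root)
for answer in ["Yes", "my left hand", "photo.jpg"]:
    bank.submit(answer)
```

Keeping the tree in the question bank means the view only ever needs to know the current question, which is what made it easy to work on the views and the questions in parallel.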
I then proceeded to make the views, while Akshansh started feeding the questions into the question bank interface.
Some features present here:
- As you can see, a simple de-cluttered UI.
- We have an option for the user to speak their answers. More on this later.
- A button to dial 911, in case an emergency suddenly arises while on the way to the hospital.
- The questions are all spoken out using the Text-to-Speech (TTS) library in Android.
We use Android’s speech recognition library to convert the user’s speech to text. The text is then used to determine which answer the user was referring to, for the given question.
- For text based questions, we have prepared a set of keywords. Keywords like cut, bruised, laceration etc. help us determine the type of injury. Keywords consisting of body parts help us determine which part the user is referring to. Keywords like fever help us determine that it is probably a non-injury case.
- For MCQ based questions, we try to match the user’s input with each of the available options. The heuristic used for matching is the Levenshtein distance. The option with the smallest distance is what the user probably said.
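Here is a rough Python sketch of both ideas. The app itself did this on Android, so treat the function names, the exact keyword sets and the sample options as assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_option(spoken, options):
    """Return the MCQ option with the smallest edit distance to the spoken text."""
    spoken = spoken.lower()
    return min(options, key=lambda option: levenshtein(spoken, option.lower()))


# Keyword sets are illustrative; the post only names a few example keywords.
INJURY_KEYWORDS = {"cut", "bruised", "laceration"}
NON_INJURY_KEYWORDS = {"fever"}


def classify(text):
    """Label a free-text answer as an injury or non-injury case via keywords."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    if words & INJURY_KEYWORDS:
        return "injury"
    if words & NON_INJURY_KEYWORDS:
        return "non-injury"
    return "unknown"


options = ["Headache", "Chest pain", "Nausea"]  # hypothetical MCQ options
picked = closest_option("hed ache", options)
```

Matching on edit distance makes the MCQ path tolerant of small speech-recognition mistakes, since a slightly garbled transcript is still closest to the option the user meant.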
The question flow
We try to determine as much information as possible from the root question “How can I help you?”. For instance:
If a user says they have a cut, we ask them which body part they have injured and then ask them to send a picture of the said cut.
If however the user says “I have cut my finger”, the user is directly taken to the page asking them to send a picture of it, as we already know the body part.
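The skipping logic in these two examples could be sketched like this. The page names and keyword sets below are hypothetical, and the real app implemented this inside its Android question bank rather than in Python.

```python
import re

# Illustrative keyword sets (assumptions, not the app's actual lists).
INJURY_KEYWORDS = {"cut", "bruised", "laceration"}
BODY_PARTS = {"finger", "hand", "arm", "leg", "knee", "head"}


def next_page(utterance):
    """Decide which page to show after the root question is answered."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    if not words & INJURY_KEYWORDS:
        return "ask_problem_type"   # non-injury flow
    if words & BODY_PARTS:
        return "ask_for_picture"    # body part already known, skip a question
    return "ask_body_part"          # injury known, body part still missing
```

With these assumed keyword sets, `next_page("I have a cut")` leads to the body part question, while `next_page("I have cut my finger")` goes straight to the picture page.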
For non-injury cases, we present the user with different kinds of problems and, depending upon the selection, different questions are asked.
All the above intelligence is built into the Question Bank.
All the information collected is then sent to the hospital portal, where the caregiver can see the incoming patient’s condition, evaluate it and be prepared even before the patient comes in.
The Hospital’s portal
The web portal was developed in Python, using the Django framework. The front end used the Bootstrap framework.
|List of patients|
How I improved from previous Hackathons
- Worked in a team with Jacques, who has a different background from ours: he is a nurse. This allowed us to use his domain specific knowledge and create a better product.
- I’m happy that this time the app has a good UI. Even if it might be minimalistic and simple, it is elegant and neat.
What I learnt from this Hackathon
- It is better to use APIs to reduce the amount of effort required. We built everything, including all chat bot functionality and answer text processing, from scratch. We could have used existing chat APIs and saved ourselves some time.
- I went on to build functionality like login on the hospital portal, profiles in the patient app etc. I spent almost 2-3 hours on these features, which could instead have been spent on perfecting the pitch or presentation.
- All finalists had really amazing and visually appealing presentations. It seems we should have put some more time into making a better presentation.
It was amazing!
We had lots of fun hacking! We stayed up all night, from 8 am on Saturday to 9 am on Sunday, with no sleep! Hacking all this together in less than 24 hours was a great experience. I’d really like to thank the amazing organisers for their support and the loads of food, 🍕, fruits and chips that were provided.
Shout out to my teammates Jacques (and his company Ducfully) and Akshansh! And congrats to the winners!
Our business plan and GitHub repo.
Some more pics 😀
|Team!||The pitch 🙂||Bruno!!!|
|Hacking outdoors 👨🏼💻|
Peace n chow ✌🏼