The SQL Murder Mystery Game

Oh noes! There has been a murder in SQL City! Can you find the murderer?

The SQL Murder Mystery Game
© Image Copyright 2019 Northwestern University

Hello, fellow adventurers of the digital realm!

Today, I have an exhilarating tale to share with you—one that involves mysterious crimes, SQL wizardry, and the joy of stumbling upon a captivating learning experience. Join me on this thrilling journey as I recount my encounter with the SQL Murder Mystery game and my newfound quest to provide you with the ultimate solution!

But seriously, I recently came across this lovely game called SQL Murder Mystery Game published by the knight lab. The setting of the game is that there has been a murder in SQL City and it is your task to find the murderer. The clue of the game is that in order for you to find the perpetrator, you can only use SQL!

The knight lab themselves describe the game in the following way:

The SQL Murder Mystery is designed to be both a self-directed lesson to learn SQL concepts and commands and a fun game for experienced SQL users to solve an intriguing crime.

And indeed, it is. I had a blast trying it out, and I find it a wonderfully creative idea that encourages you to learn and use SQL beyond the usually very dry tutorials, and oftentimes dry reality of using SQL in actual production databases. So, I highly recommend everyone to give this one a go if they can spare some time. I somewhat disagree that you need to be an experienced SQL user because I believe this also accessible to beginner SQL users. However, understanding how JOIN's work is certainly important.

Now, for today's post, I thought it would be fun to walk through the solution with you, so feel free to follow along with my beginner-friendly solution. Obviously, this is a...


💡
🚨🚨🚨 SPOILER! 🚨🚨🚨

That means, if you want to solve it yourself, do NOT read on!

Now let's get started!

Step 1: Getting an overview about the database

Before we can even think about how to best approach finding the perpetrator, it is a good idea to actually figure out the structure of the available data. To that end, we should explore the actual database structure. The necessary query is already given by the creators of the game:

SELECT name 
  FROM sqlite_master
 WHERE type = 'table'

This will give you an overview about all the database tables. You can either explore the column types of each table using:

SELECT sql 
  FROM sqlite_master
 WHERE name = 'crime_scene_report'

and or explore the actual data using simple SELECT all queries, such as:

SELECT * FROM crime_scene_report

You will notice that there is quite a bit of data in most of the tables, so simply using SELECT all queries and going through the data row by row won't get you anywhere any time soon. But that's obviously also not the the fun of the game!

Now the relationships can get relatively complex relatively quickly, so I would recommend using an analog or digital notebook to make notes. Luckily, if you don't feel like doing this to map out the relations between the different tables, the creators of the game got you covered and provide you with a handy overview in the form of a database scheme diagram.

Step 2: Figuring out what the witnesses saw

Now that we have a first overview about the available data, we can start solving the crime!

We have been given a starting point by the creators of the game that highlights the important key points to get us going:

The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a ​murder​ that occurred sometime on ​Jan.15, 2018​ and that it took place in ​SQL City​. Start by retrieving the corresponding crime scene report from the police department’s database.

Aha! So, we know that the murder took place on Jan. 15, 2018 in SQL City and that in order to get the witness reports, we need to query the crime_scene_report database. So let's do just that:

SELECT * FROM crime_scene_report
WHERE date = 20180115
AND type = "murder"
AND city = "SQL City";

... which gives us the following entry:

Security footage shows that there were 2 witnesses. The first witness lives at the last house on "Northwestern Dr". The second witness, named Annabel, lives somewhere on "Franklin Ave".

There we have it: two witnesses and we have some information on them. Now we only need to look whether we have any records of their interviews. Thus, let's query the interview table for just that.

But, wait a second... we only have the person_id along with the transcript's in that table. Unfortunately, we do not know the witnesses' person_id yet. So we actually have to start finding them. For that, we can use the person table.

We know that the first witness lives in the last house on Northwestern Dr. So let's query the table in the following way:

SELECT * FROM person
WHERE address_street_name = "Northwestern Dr"
ORDER BY address_number DESC;

And we get the first witness name, Morty Schapiro, and id, 14887. At least, so we think, because we will still need to verify whether we actually have an interview record on him.

Now let's find, the second witness. We know at least one of her names is Annabel and she lives in Franklin Ave. Let's query the table for that:

SELECT * FROM person
WHERE address_street_name = "Franklin Ave"
AND name LIKE "%Annabel%";

Notice the use of wildcard operator here for the name, since we don't know whether Annabel is her first or middle name. Luckily, we only get one entry returned showing that her name is Annabel Miller and her id is 16371.

Now we are ready to query the interview table to find out what they claimed to have seen:

SELECT * FROM interview
JOIN person
ON interview.person_id = person.id
WHERE person_id = 14887
OR person_id = 16371;
We technically do not need to JOIN here, but it is nice to see the clear names along with their interview statements

Great! We actually have witness reports of both of them. Now, our first witness, Morty Schapiro, claims that he saw a man carrying a Get Fit Now Gym bag with the membership number starting with 48Z running out of the building. He is certain that only gold members have those bags. Furthermore, he saw him getting into a car with the license plate that included "H42W". Our second witness Annabel Miller, on the other hand, even claims to have seen the murder taken place and to have recognized the killer from her gym. Apparently, she had seen him when working out last week on January 9th.

Now there is a lot of information to work with. Looking at the data in the tables available, there is at least two angles we can work. We can dig deeper into the gym membership data to find out the murderer's identity or attempt to find him via the car he got into. As a good detective, we should follow up on both leads either way to have concurring information.

Step 3: Working the gym membership angle

Let us first start by working the gym membership angle. Recapping the information we have on the suspected murderer is that he has been in the gym on January 9th, at the same time as our second witness, Annabel Miller, was there. He also has a gold membership and a membership id starting with "48Z".

We could simply run the following query on the get_fit_now_member table:

SELECT * FROM get_fit_now_member
WHERE id LIKE "48Z%"
AND membership_status = "gold";

However, this will return us two gym members. So we will need to further refine our query. For that, let's join the tables get_fit_now_member and get_fit_now_check_in and restrict our query to the date Annabel reported to have been at the gym:

SELECT * FROM get_fit_now_member
JOIN get_fit_now_check_in
ON get_fit_now_member.id = get_fit_now_check_in.membership_id
WHERE id LIKE "48Z%"
AND membership_status = "gold"
AND check_in_date = 20180109;

Damn it! Still the same two members as both have checked in on the same day Annabel did. However, Annabel said she actually saw the suspect that day. So, if we figure out when Annabel was actually in the gym, and compare it to when either of the two persons were present, we might know more. So we notice from this query the check-in times of the two members: a Joe Germuska was checked in from 16.00 till 17.30 and a Jeremy Bowers from 15.30 till 17.00.

Now let's check the check-in times of our second witness, Annabel Miller:

SELECT * FROM get_fit_now_member
JOIN get_fit_now_check_in
ON get_fit_now_member.id = get_fit_now_check_in.membership_id
WHERE name = "Annabel Miller"
AND check_in_date = 20180109;

Unfortunately, this doesn't help us either. Annabel Miller was checked in from 16.00 till 17.00, so she could have seen either Joe Germuska or Jeremy Bowers during that time.

At this point, the gym membership lead has run dry. But we have learned a great deal and narrowed down our suspect list to two persons. Maybe combining this information with the information we might get from working the car angle will gives us a single suspect. Let's do that next!

Step 4: Working the car angle

Let us recap what we know about the car angle: our first witness has seen the suspect getting into a car with the license plate that included "H42W". To get the name of the owner of a given car with a given license plate, we need to join the person and drivers_license tables along with our query. So let's do that:

SELECT * FROM drivers_license
JOIN person
on drivers_license.id = person.license_id
WHERE plate_number LIKE "%H42W%";

A-ha ! Gotcha! It is only Jeremy Bowers, who owns a car that fits this description. We could double-check and would find that Joe Germuska doesn't even have a car under his name.

Step 5: Putting it all together and report our prime suspect

At this point, we can put all the information together: We know that two witnesses have seen a man at the site of the murder, with one going as far as to claim she has actually seen a man commiting the crime. Following up the leads given by these two witnesses provided strong evidence for only one suspect that simultaneously owns a car with the reported licenses plate, as well as being a member in the gym of one of the witnesses who recognized him a week before the murder. And that man is Jeremy Bowers. So let's report him as our solution:

INSERT INTO solution VALUES (1, 'Jeremy Bowers');
        
        SELECT value FROM solution;

Ladies and gentlemen, we got him!

Congrats, you found the murderer! But wait, there's more... If you think you're up for a challenge, try querying the interview transcript of the murderer to find the real villain behind this crime. If you feel especially confident in your SQL skills, try to complete this final step with no more than 2 queries. Use this same INSERT statement with your new suspect to check your answer.

Step 6: Finding the real villain

Of course... there must have been a mastermind behind this crime! So, let us find that mastermind with the added difficulty that we only have two queries to achieve that.

Now, we already know our first query, which is querying for the interview transcript of the actual suspected murderer, Jeremy Bowers. We can apply the same query logic that we used earlier when querying for the transcripts of the witnesses:

SELECT * FROM interview
JOIN person
ON person_id = id
WHERE name = "Jeremy Bowers";

We get quite a bit of information from the interview transcript as the suspected murderer reports the following:

I was hired by a woman with a lot of money. I don't know her name but I know she's around 5'5" (65") or 5'7" (67"). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.

How can we use all this information in just one query to find the real villain?

Our goal is to find the name of the constituent of the murder and our best bet for finding actual names in the person table. Next, we know some physical features of that person, which we can only really find in the drivers_license table. Coincidentally, we can also find the car brand and car model information in the same table. Finally, we can further narrow down the search by using the facebook_event_checkin table since it is the only table that has information on a person's event attendance on a given date (other than going to the gym). In other words, we will need to join three tables, person, drivers_license and facebook_event_checkin and then restrict our query to the specific parameters mentioned by the suspected murderer. So let's do just that:

SELECT * FROM person
JOIN drivers_license
ON person.license_id = drivers_license.id
JOIN facebook_event_checkin
ON person.id = facebook_event_checkin.person_id
WHERE drivers_license.gender = "female"
AND drivers_license.hair_color = "red"
AND drivers_license.height BETWEEN 65 AND 67
AND drivers_license.car_make = "Tesla"
AND drivers_license.car_model = "Model S"
AND facebook_event_checkin.event_name LIKE "%SQL Symphony Concert%"
AND facebook_event_checkin.date BETWEEN 20171201 AND 20171231;

Note: You might wonder why we did not use the income information from the interview transcript. The reason is that, with our solution approach to this game, we actually haven't used income information (from the income table) before. We therefore have no way of knowing what is "a lot of money", neither in general nor in particular since we have never actually seen any income from the income table. We could have joined the income table as well, and sorted the results by income.annual_income but this would have only been informative if there was more than one result otherwise annual_income would have been redundant. And if that were the case, we still wouldn't know which one of the two or more options, could be considered a woman having "a lot of money".

Luckily, the result of the above query is pretty clear. Three event entries with the same person whose description matches perfectly with what was said in the interview. We can therefore tentatively say that we have found our mastermind. It must be Miranda Priestly!

Let us verify our solution with the game:

INSERT INTO solution VALUES (1, 'Miranda Priestly');
        
        SELECT value FROM solution;

Which returns:

Congrats, you found the brains behind the murder! Everyone in SQL City hails you as the greatest SQL detective of all time. Time to break out the champagne!

Congrats to us, indeed! We have beaten the game !!!

A final note

Now, obviously this was just one way to find the solution. There is multiple solutions as well as multiple ways of reducing the number of queries to the database to find the murderer and mastermind . So, I highly encourage you to try other solutions out for yourself and maybe share them here in the comments!

In any event, I hope you had fun following along the solution to the SQL Murder Mystery Game and I hope to see you in the next one!


Bonus

DevOps Directive's YouTube channel posted a rather hacky solution to this game in which you can find the murderer and mastermind with not a single query. If that's something that interests you, check it out here: