How I over-engineered exam studying
I hate exams. Between the frantic last-minute cramming, late nights and the general end-of-semester lack of motivation, exam season can make for a ton of stress.
Personally, a big portion of this stress comes from organizing the whole effort and not the actual studying itself. Blindly rushing into exam season is a surefire way to burn yourself out or even fail a course. To avoid this, I like to know exactly where I'm at with all my courses: what's at risk and what's left to be done.
To this end, I've taken to using checklists to keep all my pending studying in order. These checklists - several per course - track the work left to be done including lecture revision, problem sets and practice exams. Trello, a free project management tool, has been invaluable in keeping these in order. I've included a screenshot of what my fourth-year exam prep board looks like - the four lists are my courses, each with cards containing subtasks and checklists.
While these checklists are useful to get organized, they don't help with the big picture study questions - am I on track to complete course material? Do I have enough time? Which courses are high priority? With no way to really visualize progress, I was still feeling overwhelmed.
I realized that this was fundamentally a data problem: I had lots of information but no good way to visualize it. I've recently discovered a love for all things data and knew I had the tools to tackle this, so I decided to do something - I coded up an exam burndown chart.
For the uninitiated, a burndown chart is simply a graph of work remaining over time. It's typically used in agile software development methodologies to track project progress. A list of tasks is created, and each task is assigned points based on its perceived difficulty. The points for the unfinished tasks are summed up and plotted at the end of each day to show how much work remains. The end goal is to ensure the chart reaches zero by the end of the work period - in my case, I needed to complete all my exam studying in the two-week study period.
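To make the idea concrete, here is a minimal sketch of how the points on a burndown chart can be computed. The task names, point values and dates are made up for illustration and are not my actual study data.

```python
from datetime import date

# Hypothetical tasks: (name, points, completion date or None if still open).
tasks = [
    ("Lecture review", 5, date(2017, 7, 25)),
    ("Problem set 1", 8, date(2017, 7, 26)),
    ("Problem set 2", 8, date(2017, 7, 26)),
    ("Practice exam", 13, None),
]


def remaining_points(tasks, day):
    """Sum the points of every task not yet completed by the end of `day`."""
    return sum(
        points
        for _, points, done_on in tasks
        if done_on is None or done_on > day
    )


days = [date(2017, 7, 24), date(2017, 7, 25), date(2017, 7, 26), date(2017, 7, 27)]

# One (day, remaining work) pair per day: these are the points of the chart.
burndown = [(day, remaining_points(tasks, day)) for day in days]

for day, points in burndown:
    print(day.isoformat(), points)
```

Each pair is one point on the chart. The line drops whenever a task is completed and stays flat on days with no progress.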
Before I could set up my chart, I had to assign difficulty scores to all my tasks. This was the tricky part as it required quite a bit of guesstimation and there was great variation among course material - some had more involved assignments while some had lengthier slide decks. In general:
I've included two sample checklists below, where the scores for each checklist can be seen in the square brackets.
I fired up my current go-to data science stack, which consists of Jupyter Notebook, Python 3 and pandas. Using the fantastic Requests library and the Trello API, I wrote some code to connect to my Trello board and extract my checklist data. I ran this code every day I sat down to study and stored the output locally in CSV files. Eventually I ended up with daily snapshots of the status of my Trello board.
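A rough sketch of what such an extraction step could look like is below. It is not the code from my notebook. It assumes the Trello REST endpoint for a board's checklists returns each checklist with a `checkItems` list whose entries carry a `state` of "complete" or "incomplete". It also assumes a scoring convention where the checklist name ends in a bracketed number and every item in that checklist is worth that many points. The helper names (`fetch_checklists`, `checklist_score`, `summarize`, `write_snapshot`) are hypothetical.

```python
import csv
import re

# Assumed endpoint: checklists (with their items) for one board.
API_URL = "https://api.trello.com/1/boards/{board_id}/checklists"

# Assumed convention: "Problem sets [8]" means each item is worth 8 points.
SCORE_PATTERN = re.compile(r"\[(\d+)\]\s*$")

FIELDS = ["checklist", "total", "complete", "remaining"]


def fetch_checklists(board_id, key, token):
    """Download every checklist on a board as parsed JSON."""
    import requests  # imported lazily so the helpers below work without it

    response = requests.get(
        API_URL.format(board_id=board_id),
        params={"key": key, "token": token},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


def checklist_score(name):
    """Return the bracketed per-item score in a checklist name, or 0 if absent."""
    match = SCORE_PATTERN.search(name)
    return int(match.group(1)) if match else 0


def summarize(checklists):
    """Turn raw checklist JSON into rows of total/complete/remaining points."""
    rows = []
    for checklist in checklists:
        per_item = checklist_score(checklist["name"])
        items = checklist.get("checkItems", [])
        done = sum(1 for item in items if item["state"] == "complete")
        total = per_item * len(items)
        complete = per_item * done
        rows.append({
            "checklist": checklist["name"],
            "total": total,
            "complete": complete,
            "remaining": total - complete,
        })
    return rows


def write_snapshot(rows, path):
    """Store one day's summary as a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Running `write_snapshot(summarize(fetch_checklists(board_id, key, token)), path)` once per study day, with a date in the file name, would produce the kind of daily snapshots described above.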
All that was left to do was clean the data, load it into a pandas DataFrame and plot it with plotly! This was what I ended up with:
My studying began on July 25th and I wrote my last exam on August 10th. For the most part, I kept up a constant rate of work, which was great to see. It's also quite easy to identify the days I was most productive and the days I, quite literally, did nothing.
I knew I could do better. I wanted to track my progress on a per-course level and I also wanted to visualize my deadlines. The new chart includes a trace for each course and a corresponding vertical line to indicate the exam date.
This graph also revealed my time management strategy - due to the way my exams were grouped, I didn't touch the last two exams until I had completed the first two.
I wanted to be able to predict if I would hit my exam deadlines - i.e. do I need to speed up my studying? To get a rough estimate, I used the last two data points to extend each burndown line into the future (dotted segments). This is what my graph looked like on July 29th:
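The two-point extrapolation described above boils down to a small calculation. The sketch below uses invented numbers and a hypothetical `projected_finish` helper. It also assumes the last two snapshots were taken on different days.

```python
import math
from datetime import date, timedelta


def projected_finish(history):
    """Extend the line through the last two (day, remaining) points to zero.

    Returns the projected date on which the remaining work hits zero, or
    None if no progress was made between the two points.
    """
    (day_a, rem_a), (day_b, rem_b) = history[-2:]
    elapsed_days = (day_b - day_a).days
    burn_rate = (rem_a - rem_b) / elapsed_days  # points completed per day
    if burn_rate <= 0:
        return None  # flat or rising line never reaches zero
    days_needed = math.ceil(rem_b / burn_rate)
    return day_b + timedelta(days=days_needed)


# Hypothetical snapshots for one course: (day, points remaining).
history = [
    (date(2017, 7, 27), 400),
    (date(2017, 7, 28), 360),
    (date(2017, 7, 29), 300),
]
exam_date = date(2017, 8, 2)  # hypothetical deadline

finish = projected_finish(history)
on_track = finish is not None and finish <= exam_date
print(finish, on_track)
```

Because only the last two points are used, the estimate swings a lot from day to day. One unusually productive or unproductive day changes the projected finish date considerably, so it is best read as a rough nudge rather than a forecast.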
It was at this point I realized I should probably stop playing with this and get back to studying.
There were a couple of interesting queries I realized I could run on my data. For example, to find the day I was most productive, I had to look at the rate of change of work. Since I was using pandas, it was quite straightforward:
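In [7]: # day-over-day change in the summed points for each date
        diff = course_burndown.groupby(['date']).sum().diff()
        # row for the date with the largest jump in completed points
        diff.loc[diff['complete'].idxmax()]
Out[7]:
total          0.0
complete     325.0
remaining   -325.0
Name: 2017-08-01 00:00:00, dtype: float64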
The last line of the output tells me that I completed the most work on August 1st. Nice.
I'm confident there's plenty more I can do with this data and I plan on experimenting with it further. If this is interesting to you, the notebook with all my code and the associated data is available on GitHub.
While the two semesters I used this technique have yielded some of the highest GPAs in my university career, it's quite difficult to prove causation, especially with the many other factors involved. What I can say for sure is that this has helped take some of the stress off and is a great motivator - there is nothing more satisfying than checking off an item and watching that graph approach zero.
In terms of technology used, could I have made my life easier and done this in Excel or Google Sheets? Of course, but where's the fun in that?
All in all, this was a great exercise and a much-needed distraction from the monotony of exam season.