Unit Testing Unit Tests
I’m going to introduce the idea of testing, both unit and integration, at about week five after teaching functions, because I want students to get into the habit of writing unit tests. I also want to autograde everything I can possibly autograde.
I’m using a platform called Mimir to handle assignments, because I want to autograde. Autograde, autograde, autograde! I really do not want to read code, download and run code, or anything of the sort. I’m basically trying to minimize the amount of busywork I need to do in order to free up time for my students.
So on Mimir, I can write up an assignment description, upload starter code and solution code, and then either provide I/O test cases (i.e. feed values to input() and check what gets printed by print()) or write actual unit test cases (i.e. assert some_function(with_parameter) == some_value). This is how I test student code against my unit tests.
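To make the unit test style concrete, here’s a minimal sketch in Python. The function count_vowels and its tests are made up for illustration; they aren’t from an actual assignment or from Mimir.

```python
def count_vowels(text):
    """Hypothetical student function: count the vowels in a string."""
    return sum(1 for ch in text.lower() if ch in "aeiou")


def test_count_vowels():
    # Each assert compares one call against the value I expect.
    assert count_vowels("hello") == 2
    assert count_vowels("") == 0       # edge case: empty string
    assert count_vowels("XYZ") == 0    # no vowels at all
    assert count_vowels("AEiou") == 5  # mixed case


# Raises AssertionError on the first failing check, otherwise stays silent.
test_count_vowels()
```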
But how do I test student unit tests? I’d been trying really hard for the past few days to figure this out, and finally on Friday, I made a breakthrough. Mimir has a feature for writing custom test cases, which basically allows for bash scripting. My idea was to first run the student unit tests to see if they all pass, and then actually dig into each unit test. I haven’t written much bash, so it took me a while to run the student unit test file and compare the outcome with a string. For some reason, it took me forever to figure out how to save the output into a variable. I did search for it, but nothing I found did the right thing. But I did finally get it to work, and it was pretty fun to write. Who knew I would be writing so much code?!
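Here’s a rough sketch of that first step, written in Python rather than bash, so it is not the actual Mimir script. It assumes the student test file is a plain Python script that prints a marker line when every assert passes; the marker text "All tests passed" and the function names are placeholders I picked for the example.

```python
import subprocess
import sys


def run_test_file(path, timeout=30):
    """Run a Python test file in a separate process and capture what it prints.

    Returns (exit_code, output), where output is stdout and stderr combined.
    """
    result = subprocess.run(
        [sys.executable, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold tracebacks into the same string
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


def all_tests_pass(path, expected="All tests passed"):
    """True if the file exits cleanly and its output contains the marker string."""
    exit_code, output = run_test_file(path)
    return exit_code == 0 and expected in output
```

Saving the output into a variable is the part that corresponds to the bash step described above: here it’s just the string returned by run_test_file, which can then be compared against whatever text you expect.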
For digging into each unit test, my idea is to read each line of the test code and count how many times each function is called. I’m going to tell my students that each function needs at least three unit tests (probably; I haven’t actually decided yet, and the number should really depend on the complexity of the function rather than being the same for all functions). Then I’ll count the number of times that function shows up in the test file to check that they’ve called it at least that many times.
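One way to sketch the counting step in Python is with the standard ast module instead of scanning lines of text. This is a swapped-in technique, not the line-by-line approach described above; its advantage is that mentions of the function in comments or strings don’t get counted. The names count_calls and has_enough_tests, and the default minimum of three, are assumptions for the example.

```python
import ast


def count_calls(source, function_name):
    """Count calls to function_name in Python source, ignoring comments and strings."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain call: some_function(...)
            if isinstance(func, ast.Name) and func.id == function_name:
                count += 1
            # Attribute call: module.some_function(...)
            elif isinstance(func, ast.Attribute) and func.attr == function_name:
                count += 1
    return count


def has_enough_tests(source, function_name, minimum=3):
    """True if function_name is called at least `minimum` times in the source."""
    return count_calls(source, function_name) >= minimum


sample = '''
# count_vowels is mentioned here but not called
assert count_vowels("hello") == 2
assert count_vowels("") == 0
'''
print(count_calls(sample, "count_vowels"))       # 2
print(has_enough_tests(sample, "count_vowels"))  # False
```

Counting calls only shows that the function was exercised that many times, not that the tests are meaningful, so this is a floor rather than a full check.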
I’ll also check for basic edge cases, like the empty string, 0, negative numbers, or an empty list / dictionary / set, depending on the parameter data type. Beyond that, I’ll just spot-check their code. Again, I want to autograde, but I probably can’t autograde what each test is actually doing unless I open each file and look at it, which seems like way too much work.
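A possible sketch of the edge-case check, again in Python with the ast module: collect the literal values students pass to a function and report which basic edge cases show up. It only sees literals written directly in the call (not variables), and the function names and report categories are my own placeholders.

```python
import ast


def literal_arguments(source, function_name):
    """Collect literal values passed positionally to function_name."""
    values = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == function_name):
            for arg in node.args:
                try:
                    values.append(ast.literal_eval(arg))
                except (ValueError, TypeError):
                    pass  # not a plain literal (a variable, another call, ...)
    return values


def edge_case_report(values):
    """Report which basic edge cases appear among the argument values."""
    # bool is a subclass of int, so exclude it from the numeric checks.
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "empty string": any(isinstance(v, str) and v == "" for v in values),
        "zero": any(v == 0 for v in numbers),
        "negative number": any(v < 0 for v in numbers),
        "empty collection": any(
            isinstance(v, (list, dict, set, tuple)) and len(v) == 0
            for v in values
        ),
    }
```

Which entries of the report actually matter would depend on the parameter’s data type, so a grader would only look at the ones relevant to the function being tested.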
I’m really glad I figured this out because it was on my mind for a while!