Having just finished the final units I need to qualify for an undergraduate degree, the topic of examinations is still fresh in my mind. Generally exams fall into two categories, open-book and closed-book, with two major question types: multiple-choice and short-answer.

The exact mix of open/closed and MC/SA will vary from professor to professor and course to course. It can also vary based on the nature of the field and the ratio of teaching staff to students. As a law student I faced a common theme of open-book short-answer exams. During an “intro to psych” unit, all exams were multiple choice — there were 600 students in the course and two lecturers.

But all of these formats have one thing in common: the exam *questions* are secrets.

Much of the efficacy of an exam is tied up with protecting the questions from disclosure. In a sense this is a bit like relying on a secret key for the security of a cipher: as soon as the key is revealed, the cipher no longer protects anything.

Why have a secrecy requirement? Consider the opposite case where the questions are simply reused every year. The problem is that the student can simply memorise “the” answers. This is generally considered unacceptable because the potential set of questions is always going to be too small to properly determine the student’s mastery of the subject.

What this reveals is that exams are basically an attempt at statistical sampling: some quasi-random subset of all possible questions is selected. The student’s performance on that subset is taken as a meaningful proxy of their overall mastery of the subject.

So far so good. But note that I said it’s a quasi-random subset. Why does that subset have to be created from scratch each year? Because of the secrecy-of-questions requirement.

But what if, instead of creating new questions each year, there was instead some portfolio of (say) 1,000 questions that is reused each year? The student is then examined on (say) 10 of these in the final exam.

At no point are the questions secret. Students may study and review them whenever and however they please. They simply will not know in advance *which* of the questions will be asked of them. Some set of questions will be randomly selected immediately before the exam papers are printed. It could even be made double-blind, with lecturers not knowing which questions will be asked.
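The selection mechanism described above is just sampling without replacement from a public pool. Here is a minimal sketch in Python; the function name, parameters, and the idea of publishing the seed afterwards for auditability are my own assumptions, not part of the proposal itself:

```python
import random

def draw_exam(portfolio_size=1000, exam_size=10, seed=None):
    """Draw one exam paper from a fully public question portfolio.

    Every question ID (1..portfolio_size) is known to students in
    advance; only the draw itself is unknown. Publishing the seed
    after the exam would make the draw verifiable by anyone.
    """
    rng = random.Random(seed)
    # Sample without replacement so no question appears twice.
    return sorted(rng.sample(range(1, portfolio_size + 1), exam_size))

paper = draw_exam(seed=2024)
```

Because nobody needs to know the drawn IDs before printing, the same routine could be run by an administrator with no teaching role, which is all the "double-blind" variant requires.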

I imagine that one of three things could happen:

- Students could devise and memorise answers to all, or a large subset of, the questions. In which case, won’t they have had to learn the subject matter? Even the act of rote memorisation can lead to pre-conscious synthesis of key principles as a basis for future reasoning.
- Students with prodigious memory or trained in mnemonic techniques will do better; but they do so already.
- Some students will not be motivated and will simply fail under the new scheme. Again, no change.

Therefore I hypothesise that this approach – the “question portfolio” – would provide a better method of examination than the current approach.

Additional benefits:

- Questions are already linked with learning outcomes — students could be told what the link is.
- Students can precisely calibrate their current understanding by taking randomised tests when it suits them.
- Questions can receive much higher investment, as they will not be discarded each year.

Drawbacks:

- High initial cost of developing a large corpus of questions.
- Ongoing costs of “managing the portfolio” to reflect improvements, changes in the subject matter, and so on.
- It’s unusual and may face resistance or bureaucratic inertia. For instance, it may not be compatible with university rules.

Of course this is all mere speculation on my part. I am not an expert in education; but with the greatest possible respect, neither are my professors.

At the very least, we could put this to the test. Develop a corpus of questions for (say) 20 subjects. Then, at the beginning of the semester, randomly select 10 of them to be taught with open questions and 10 to be taught with secret questions. Compare the average performance of those two sets with historical performance. That should give a fuzzy feel for whether it works better or not. I’m sure Andrew Leigh would know a better way to do it, but that’s my gut sense of how it might work.
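The trial design above amounts to random assignment of subjects to two arms plus a comparison of average marks. A rough sketch, with hypothetical subject codes and a deliberately crude difference-of-means comparison (a real trial would want proper inference, not just a point estimate):

```python
import random
import statistics

def assign_groups(subjects, seed=None):
    """Randomly split subjects into open-portfolio and secret-question arms."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_difference(open_scores, secret_scores):
    """Average mark in the open arm minus the secret arm."""
    return statistics.mean(open_scores) - statistics.mean(secret_scores)

# 20 hypothetical subject codes, 10 per arm.
subjects = [f"SUBJ{i:02d}" for i in range(1, 21)]
open_arm, secret_arm = assign_groups(subjects, seed=1)
```

At semester's end, each arm's average marks (and the historical baseline for the same subjects) would feed into `mean_difference` or, better, whatever test an actual economist recommends.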

Thoughts?