Last week was the Seventh Circuit Electronic Discovery Committee Workshop on Computer-Assisted Review, held in Chicago, and if you missed it, you missed a lot. But like any event in which a great deal of information is thrown at you, it answered many questions and raised others.
For example, I believe I heard one of the participants compare human reviewers to predictive coding by remarking that the numbers with respect to recall weren’t that far apart, but that the “computers knocked it out of the park” with respect to precision.
Recall is the percentage of relevant documents actually retrieved by a search methodology. Say there are 1000 relevant documents in a data set of 5000 documents and predictive coding finds 850 of them; it has a recall of 85%. According to the speaker, this is not that much better than what can be expected of a human reviewer.
Precision is the percentage of the documents retrieved that are relevant. Let’s say the found set returned by the predictive coding software in the example above includes the 850 relevant documents, but also 500 documents that turn out, on closer inspection, not to be relevant. Its precision would be 850 divided by the total number of documents returned, 1350, or about 63%. According to the speaker, human reviewers are likely to return significantly more irrelevant documents for each relevant one.
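For readers who want to see the arithmetic laid out, here is a minimal sketch in Python of the two calculations above. The figures are the hypothetical ones from this post, not anything presented at the workshop or taken from a study.

```python
# A sketch of the recall and precision arithmetic from the hypothetical above.
# The numbers (1000 relevant documents, 850 found, 500 false positives) are
# made up for illustration.

relevant_in_collection = 1000   # relevant documents that exist in the 5000-document set
relevant_retrieved = 850        # relevant documents the software actually returned
irrelevant_retrieved = 500      # returned documents that turn out not to be relevant

recall = relevant_retrieved / relevant_in_collection
precision = relevant_retrieved / (relevant_retrieved + irrelevant_retrieved)

print(f"Recall:    {recall:.0%}")     # 85%
print(f"Precision: {precision:.0%}")  # 63%
```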
But I’m not sure of the practical significance of that precision advantage. “Precision” has such a nice ring to it, but what does it mean in practice? I’m not sure I care what percentage of the generally relevant documents a system returns to me; what I want to know is what percentage of the key documents – the handful likely to become case-determinative exhibits – is being returned to me.
Never bring a knife to a gun fight, I know. And when it comes to a gun fight over statistics, I can barely bring a sharpened pencil, much less a knife. But having said that, let’s take another look at the data set of 5000 documents. By hypothesis, it contains 1000 relevant documents, and predictive coding was able to find 850 of them. If I heard the speaker right, human reviewers are likely to find slightly fewer, maybe only 800. We do not, however, know the overlap between the two sets of found documents.
Think of the 1000 relevant documents schematically as a line of numbered documents stretching from 1 to 1000. From which parts of the line do the 850 found by predictive coding come? And from which parts come the 800 found by the humans? If predictive coding finds documents No. 1 through 850 and the humans find Nos. 1 through 800, but the handful of key documents sits between Nos. 930 and 970, neither system is going to find them. If predictive coding finds documents 1 through 850 and the humans find 200 through 1000, then the humans are going to find all of the key documents and predictive coding will find none of them. Likewise, if the key documents are clustered around No. 150, the situation reverses – predictive coding finds all of them, the humans none. Those are the simplest situations. Of course, the handful of key documents can be scattered irregularly throughout 1 to 1000, and the different methodologies don’t have to return contiguous ranges; they could return Nos. 1 through 37, 42 through 123, and so on, with the question being whether the key docs are hiding in the gaps.
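To make the overlap point concrete, here is a toy sketch in Python of the second scenario above, where overall recall looks comparable for both reviews but recall on the key documents diverges completely. The ranges and the location of the key documents are the made-up ones from this paragraph, not data from any study.

```python
# A toy model of the overlap question. The document numbers, the ranges each review
# returns, and where the "key" documents sit are all hypotheticals from this post.

relevant = set(range(1, 1001))            # the 1000 relevant documents, numbered 1-1000
predictive_coding = set(range(1, 851))    # predictive coding returns Nos. 1-850
human_review = set(range(200, 1001))      # human reviewers return Nos. 200-1000
key_docs = set(range(930, 971))           # the case-determinative handful, Nos. 930-970

def recall_of(found, target):
    """Fraction of the target documents that a given review actually found."""
    return len(found & target) / len(target)

print(f"Overall recall, predictive coding: {recall_of(predictive_coding, relevant):.0%}")  # 85%
print(f"Overall recall, human review:      {recall_of(human_review, relevant):.0%}")       # 80%
print(f"Key-doc recall, predictive coding: {recall_of(predictive_coding, key_docs):.0%}")  # 0%
print(f"Key-doc recall, human review:      {recall_of(human_review, key_docs):.0%}")       # 100%
```

The overall numbers sit within a few points of each other, yet one review captures every key document and the other captures none of them, which is exactly the gap that recall and precision, measured over the whole set, do not reveal.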
The need to consider the quality of the documents missed by either system – the “keyness” of overlooked documents – was mentioned at the workshop, but it is not clear how this might be done. One of the names told me afterwards that predictive coding looks for words and patterns of words and is happiest with at least six words to work with, meaning that “cryptics” (short, elliptical documents) are at risk of being overlooked. I don’t have any studies or even anecdotal evidence, but it seems to me that key docs may well be cryptic, because people seldom announce that they are going to do something unethical, illegal or wrong with full and proper syntax. It also seems to me that a human reviewer, drawing on his or her humanness and experience, is perhaps more likely to deem a cryptic document relevant because of surrounding documents that give it context, or the memory of other documents that shed light on its relevance in a way a computer cannot. So I’m not sure whether humans or machines win, and I still have to wonder why I would go with predictive coding at this point.
I would very much like to see this point addressed, shot down, disposed of, etc., by someone with the cred to do so.
Richard Neidinger, J.D.