Comparing AI performance to humans
Updated: Mar 13
If we use accuracy and efficiency as our key metrics to compare AI and human performance, we can place them in a matrix, with one metric on each axis. A simple way to do this is to create performance bands. If we take human performance as the benchmark, we can create adjacent bands for below-human and above-human performance.
We can then use the matrix (see below) to plot where AI performance lands on this scale. The benefit of this matrix is that it can be applied to many different contexts, and the standards for below-human, human, and above-human are relative to each task.
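To make the banding idea concrete, here is a minimal Python sketch of how a score could be placed in a band relative to a human benchmark. Everything in it is an assumption for illustration: the names `Band`, `band`, and `matrix_cell`, the 5% tolerance that defines the human band, the convention that higher scores are better, and the numbers in the usage example are all hypothetical rather than part of the matrix itself.

```python
from enum import Enum


class Band(Enum):
    BELOW_HUMAN = "below-human"
    HUMAN = "human"
    ABOVE_HUMAN = "above-human"


def band(ai_score: float, human_score: float, tolerance: float = 0.05) -> Band:
    """Place an AI score in a band relative to the human benchmark.

    Assumes higher scores are better. Scores within `tolerance`
    (a fraction of the human score, an arbitrary choice here)
    count as human-level.
    """
    if human_score <= 0:
        raise ValueError("human_score must be positive")
    ratio = ai_score / human_score
    if ratio < 1 - tolerance:
        return Band.BELOW_HUMAN
    if ratio > 1 + tolerance:
        return Band.ABOVE_HUMAN
    return Band.HUMAN


def matrix_cell(ai_accuracy, human_accuracy, ai_efficiency, human_efficiency):
    """Return the (accuracy band, efficiency band) cell for one task."""
    return (
        band(ai_accuracy, human_accuracy),
        band(ai_efficiency, human_efficiency),
    )


# Hypothetical figures for a single task: accuracy as a fraction
# correct, efficiency as items processed per hour.
cell = matrix_cell(0.95, 0.88, 120, 20)
print(cell[0].value, "accuracy,", cell[1].value, "efficiency")
```

Because each band is computed against the human score for that particular task, the same function applies across contexts, which is the point of keeping the standards relative.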
Let's look at a tangible example
Radiology exam - think small, discrete tasks
AI works best when automating specific tasks, rather than entire processes. When we isolate the tasks in a process and compare them using the matrix, we can see which are the best candidates for AI. For example, the tasks involved in a radiology exam may include scheduling time for a patient, operating imaging machinery, reading x-rays, and discussing results. Of these tasks, reading x-rays is ideal for AI. AI can process x-rays with above-human efficiency, and is capable of giving diagnoses with above-human accuracy. However, AI would struggle with discussing results, as the complexities around the patient, diagnosis, and treatment are too difficult for it to handle at anything better than a below-human level. By focusing on specific tasks, we can reframe our thinking from asking ‘how can AI replace people?’ to ‘how will people use AI?’