Edd Gent in Singularity Hub: Despite their usefulness, large language models still have a reliability problem. A new study shows that a team of AIs working together can score up to 97 percent on US medical licensing exams, outperforming any single AI. While recent progress in large language models (LLMs) has led to systems capable of passing professional…