● researcher: Establishes a benchmark for LLM performance on clinical discourse annotation, with kappa-based inter-rater comparisons that are useful for calibrating NLP evaluation in low-resource clinical tasks.
● health: Worth watching because automated scoring of correct information units (CIUs) could reduce the burden on trained raters in aphasia assessment, but the model's zero-shot failure means prompt engineering or fine-tuning is required before clinical use.
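The summary does not say which kappa statistic the benchmark reports. The sketch below assumes the simplest case: two raters (for example, a trained human scorer and an LLM) assign nominal labels to the same items. It computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The word-level "CIU" / "non" labels and the function name `cohens_kappa` are hypothetical illustrations and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items.

    Hypothetical helper for illustration; not the benchmark's own code.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement from each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum((count_a[lab] / n) * (count_b[lab] / n) for lab in labels)

    if p_e == 1.0:
        # Both raters used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical word-level labels for one short transcript.
human = ["CIU", "CIU", "non", "CIU", "non", "non"]
llm = ["CIU", "non", "non", "CIU", "non", "CIU"]
print(round(cohens_kappa(human, llm), 3))  # 0.333
```

In this example the raters agree on 4 of 6 words (p_o ≈ 0.667). Each rater uses each label half the time, so chance agreement is p_e = 0.5 and kappa = (0.667 - 0.5) / 0.5 ≈ 0.333. That value is much lower than the raw 67% agreement suggests, which is why chance-corrected statistics are preferred for calibrating annotators.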