[arXiv] score: 0.68
LLM Vector Embeddings Leak Sensitive Clinical Attributes Even After Access Controls
May 27, 2026
An audit of LLM representations of clinical discharge summaries shows that reducing the recoverability of a sensitive label (EHR-recorded race) from one exported artifact, either the final hidden state or the mean-pooled prompt representation, does not prevent leakage from the other. SurfaceLoRA is proposed as a mitigation.
cs.CL
HOW THIS AFFECTS YOU
● builder: If your pipeline exports LLM embeddings to downstream services, those vectors may expose sensitive attributes from restricted source documents even without direct data access.
● researcher: Demonstrates that multi-artifact auditing is necessary, since mitigating sensitive-attribute leakage in one representation type does not generalize to others.
● policy: Residual information-disclosure risk from exported LLM vectors challenges the assumption that restricting source document access is sufficient for privacy compliance.
● health: Clinical EHR data processed through LLM summarization systems may leak protected attributes (race, likely others) via exported embeddings passed to audit or analytics workflows.
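
To make the multi-artifact point concrete, here is a minimal sketch of auditing both exported artifacts separately. It is not the paper's method: the nearest-centroid probe is a simple stand-in for whatever probe the authors use, the function names (`final_and_mean`, `probe_accuracy`, `audit`) are hypothetical, and it assumes right-padded token states, a binary label in {0, 1}, and both classes present in the training split.

```python
import random

def final_and_mean(token_states, n_real):
    """token_states: list of per-token vectors (right-padded); n_real: number of real tokens.
    Returns the two exported artifacts: (final hidden state, mean-pooled vector)."""
    real = token_states[:n_real]
    final_state = list(real[-1])
    dim = len(final_state)
    mean_pooled = [sum(tok[d] for tok in real) / n_real for d in range(dim)]
    return final_state, mean_pooled

def centroid(vectors):
    # Column-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def probe_accuracy(vectors, labels, train_frac=0.7, seed=0):
    """Held-out accuracy of a nearest-centroid probe (a stand-in, not the paper's probe)."""
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    cut = int(train_frac * len(order))
    train, test = order[:cut], order[cut:]
    # One centroid per class, fitted on the training split only.
    cents = {c: centroid([vectors[i] for i in train if labels[i] == c]) for c in (0, 1)}
    hits = sum(
        min(cents, key=lambda c: sq_dist(vectors[i], cents[c])) == labels[i]
        for i in test
    )
    return hits / len(test)

def audit(examples, labels):
    """examples: list of (token_states, n_real). Probes each artifact on its own."""
    finals, means = zip(*(final_and_mean(t, n) for t, n in examples))
    return {
        "final_hidden_state": probe_accuracy(list(finals), labels),
        "mean_pooled": probe_accuracy(list(means), labels),
    }
```

Running `audit` before and after a mitigation gives one recoverability number per artifact. A drop toward the majority-class baseline on one artifact says nothing about the other, which is why both entries are reported.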