[arXiv] score: 0.68

LLM Vector Embeddings Leak Sensitive Clinical Attributes Even After Access Controls

May 27, 2026

An audit of LLM representations of clinical discharge summaries shows that reducing the recoverability of a sensitive label (EHR-recorded race) from one exported artifact, either the final hidden state or the mean-pooled prompt, does not prevent leakage from the other. SurfaceLoRA is proposed as a mitigation.

cs.CL

HOW THIS AFFECTS YOU

● builder: If your pipeline exports LLM embeddings to downstream services, those vectors may expose sensitive attributes from restricted source documents, even when the downstream service has no direct access to those documents.

● researcher: Demonstrates that multi-artifact auditing is necessary: mitigating sensitive-attribute leakage in one representation type does not generalize to others.

● policy: Residual information-disclosure risk from exported LLM vectors challenges the assumption that restricting access to source documents is sufficient for privacy compliance.

● health: Clinical EHR data processed through LLM summarization systems may leak protected attributes (race, likely others) via exported embeddings passed to audit or analytics workflows.
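
For builders who want a feel for what a multi-artifact audit looks like, here is a minimal sketch, not the paper's method. It assumes synthetic token vectors, a hand-written logistic-regression probe, and a crude "scrub one coordinate" stand-in for a mitigation. Every name in it (`make_documents`, `probe_accuracy`, the artifact keys) is hypothetical. The point is only that each exported artifact gets its own probe, so a fix applied to one artifact is never assumed to cover another.

```python
import math
import random


def make_documents(n=600, tokens=6, dim=8, signal=1.0, seed=0):
    """Synthetic stand-in for per-token hidden states (assumption, not real EHR data).

    A binary sensitive label shifts coordinate 0 of every token vector.
    """
    rng = random.Random(seed)
    docs, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = signal if label else -signal
        doc = []
        for _ in range(tokens):
            vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            vec[0] += shift
            doc.append(vec)
        docs.append(doc)
        labels.append(label)
    return docs, labels


def score(w, b, x):
    """Linear score of one vector, clamped so math.exp cannot overflow."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return max(-30.0, min(30.0, z))


def train_probe(X, y, epochs=100, lr=0.5):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-score(w, b, xi))) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(X, y, train_frac=0.75):
    """Train on the first part of the data, report accuracy on the held-out rest."""
    cut = int(len(X) * train_frac)
    w, b = train_probe(X[:cut], y[:cut])
    hits = sum((score(w, b, xi) > 0) == (yi == 1) for xi, yi in zip(X[cut:], y[cut:]))
    return hits / (len(X) - cut)


docs, labels = make_documents()

# Two exported artifacts built from the same token vectors.
final_hidden = [doc[-1] for doc in docs]
mean_pooled = [[sum(col) / len(doc) for col in zip(*doc)] for doc in docs]

# Crude stand-in mitigation, applied to the final hidden state only.
final_hidden_scrubbed = [[0.0] + vec[1:] for vec in final_hidden]

artifacts = {
    "final_hidden": final_hidden,
    "final_hidden_scrubbed": final_hidden_scrubbed,
    "mean_pooled": mean_pooled,
}

# Audit every artifact separately instead of trusting one result for all.
report = {name: probe_accuracy(X, labels) for name, X in artifacts.items()}
for name, acc in report.items():
    print(f"{name}: held-out probe accuracy = {acc:.2f}")
```

In this toy setup the scrubbed final hidden state should probe near chance while the untouched mean-pooled vector stays highly predictive of the label. A real audit would swap in actual exported vectors and a properly validated probe.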

SOURCE

https://arxiv.org/abs/2605.26433
