Patient de-identification and GDPR

What is anonymization?

Following industry best practices, eScan Academy uses a standards-based approach to de-identification of DICOM images to ensure that images are free of protected health information (PHI). Our de-identification process is developed in accordance with the requirements set forth by the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), and complies with the EU General Data Protection Regulation (GDPR).


These requirements are defined in section 164.514(b)(2) of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. The standard for de-identification of DICOM objects is defined by DICOM Standard PS 3.15-2011, Digital Imaging and Communications in Medicine (DICOM), Part 15: Security and System Management Profiles.

eScan Academy does not store sensitive patient information. The de-identification process in eScan Academy happens automatically, and all data is de-identified locally at the submitting site before it is uploaded via secure communication to the eScan Academy servers.

What is Protected Health Information (PHI)?

PHI is defined as "individually identifiable health information": information that can be used to directly or indirectly identify an individual in relation to the individual’s past, present, or future health condition and the provision of health care to the individual. Common types of PHI include patient name, address, birth date, social security number, medical and laboratory reports, physician name, hospital name, and date of examination. PHI can be embedded in both DICOM tags and pixel data.

Our de-identification process

The process of de-identification, by which PHI is removed from the health information in the data processed by eScan Academy, mitigates privacy risks to individuals and thereby supports the use of data for educational purposes. The de-identification process in eScan Academy is an automated 3-step process in which two de-identification methods are deployed: 1) automated redaction of individual PHI identifiers (in DICOM tags and pixel data), and 2) formal determination by a qualified expert (i.e. only a qualified individual can determine when PHI has been properly removed). Both methods are run locally (in your web browser), which means that no PHI leaves your closed network. Upon successful completion of the de-identification process, the de-identified data is automatically uploaded via secure communication to the eScan Academy servers.

Note, however, that both methods, even when properly applied, yield de-identified data that retains some risk of identification. Although the risk is very small, it is not zero, and there is a possibility that de-identified data could be linked back to the identity of the patient to whom it corresponds. Regardless, neither HIPAA, the EMA, nor the GDPR restricts the use or disclosure of de-identified health information, as it is no longer considered protected health information. Data processed by eScan Academy is an example of de-identified health information.

Step 1: Automated redaction of individual PHI identifiers stored in DICOM tags is the first step in our de-identification process. This step conforms to the current DICOM standard to ensure that data imported to eScan Academy is transformed using approved reduction techniques, such as generalisation (grouping values into categories) and suppression/masking (removing specific values or whole records from the dataset). See the list of de-identified DICOM tags below.

Step 2: The second step in the eScan Academy de-identification process involves our optical character recognition (OCR) engine. In this step, all images of types commonly known to store PHI (such as x-ray and mammography) are thoroughly scanned for characters embedded directly in the pixel data. This happens automatically, and in the event that PHI (or what our engine believes is PHI) is detected, the affected data is invalidated. Invalidated data cannot be uploaded and requires manual expert review, as explained in step 3.

Step 3: Invalidated data is automatically sent to manual expert review. During the review, the person responsible for the data upload gets access to an OCR report detailing all detected characters. After careful consideration, the invalidated data can be omitted from the upload, manually validated by redaction (detected characters are blanked out), or accepted (in the event that the detected characters do not hold PHI). Redacted and accepted data then proceeds to upload.

Steps 1 to 3 are performed locally in your web browser, which ensures that no data containing PHI is uploaded to the eScan Academy servers. After successful de-identification, validated data is uploaded via secure communication to the eScan Academy servers.
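The review logic of steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the eScan Academy implementation: the Image class, the Decision values, and the function names are all hypothetical, and the OCR findings are assumed to be already available as a list of flagged strings.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    """Hypothetical reviewer outcomes for an invalidated image (step 3)."""
    OMIT = "omit"      # leave the image out of the upload
    REDACT = "redact"  # blank out the detected characters, then upload
    ACCEPT = "accept"  # detected characters hold no PHI, upload unchanged


@dataclass
class Image:
    name: str
    flagged_text: list = field(default_factory=list)  # OCR findings (step 2)
    redacted: bool = False


def needs_review(image):
    """Step 2: an image with flagged text is invalidated."""
    return bool(image.flagged_text)


def apply_review(image, decision):
    """Step 3: return the image if it may be uploaded, otherwise None."""
    if decision is Decision.OMIT:
        return None
    if decision is Decision.REDACT:
        image.flagged_text = []
        image.redacted = True
    return image


def select_for_upload(images, decisions):
    """Return the images that may be uploaded, in their original order."""
    uploadable = []
    for image in images:
        if not needs_review(image):
            uploadable.append(image)
            continue
        # An invalidated image without a reviewer decision is never uploaded.
        decision = decisions.get(image.name)
        if decision is None:
            continue
        reviewed = apply_review(image, decision)
        if reviewed is not None:
            uploadable.append(reviewed)
    return uploadable
```

The default in this sketch is deliberately conservative: an invalidated image is held back unless a reviewer has explicitly redacted or accepted it.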


Base level de-identification 

Patient Name and Patient ID are either blanked or modified. eScan Academy does not perform ID mapping between the original Patient ID and the system ID that the images will have within eScan Academy. Any mapping that is performed manually at the submitting site is the sole responsibility of the submitting site, and eScan Academy never sees the original Patient ID. Such data is defined as pseudo-de-identified data. To show that the patient identity has been removed, the value “YES” is written into DICOM tag (0012,0062) “PatientIdentityRemoved”.
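As a minimal sketch of base level de-identification, assume a DICOM header is represented as a plain Python dictionary keyed by DICOM keyword. The function name and the replacement_id parameter are hypothetical and only illustrate the "blanked or modified" behaviour described above; this is not the script supplied by eScan Academy.

```python
def remove_patient_identity(dataset, replacement_id=""):
    """Return a copy with the patient name blanked and the ID blanked or replaced."""
    cleaned = dict(dataset)  # leave the caller's dataset untouched
    cleaned["PatientName"] = ""
    cleaned["PatientID"] = replacement_id
    # Flag the dataset as de-identified, DICOM tag (0012,0062).
    cleaned["PatientIdentityRemoved"] = "YES"
    return cleaned


original = {"PatientName": "DOE^JANE", "PatientID": "123456", "Modality": "CT"}
cleaned = remove_patient_identity(original)
```

If a submitting site keeps its own mapping, it would pass its site-local pseudonym as replacement_id and store the mapping itself.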

Exam identifiers

DICOM makes extensive use of unique identifiers (UIDs) that could be used to identify a subject if a user had access to the PACS at the institution where the images originated. eScan Academy assigns new UIDs under its own root UID and removes the original UIDs. UIDs have no special meaning other than serving as unique identifiers. This technique ensures that images stay associated with the appropriate series, study, and subject, and that references between secondary capture images, structured reports, PET/CT, etc. remain valid references to images within eScan Academy.
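One way to replace UIDs while keeping references intact is to hold an in-memory table for the duration of a submission, so that every occurrence of an original UID maps to the same replacement. The sketch below assumes headers are dictionaries keyed by DICOM keyword. EXAMPLE_ROOT is a placeholder rather than a registered root, and the class and function names are hypothetical; the actual eScan Academy mechanism is not described in this document.

```python
import uuid

EXAMPLE_ROOT = "1.2.3.4.5"  # placeholder only, not a registered UID root

# Instance-level UIDs to replace. Class UIDs such as SOPClassUID are left alone.
UID_KEYWORDS = (
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "SOPInstanceUID",
    "ReferencedSOPInstanceUID",
)


class UidRemapper:
    """Hands out one random replacement UID per original UID."""

    def __init__(self, root=EXAMPLE_ROOT):
        self.root = root
        self._mapping = {}  # kept in memory only, never uploaded

    def remap(self, original_uid):
        if original_uid not in self._mapping:
            # A DICOM UID may be at most 64 characters long.
            suffix = str(uuid.uuid4().int)[: 64 - len(self.root) - 1]
            self._mapping[original_uid] = f"{self.root}.{suffix}"
        return self._mapping[original_uid]


def remap_dataset_uids(dataset, remapper):
    """Return a copy of the dataset with instance-level UIDs replaced."""
    return {
        keyword: remapper.remap(value) if keyword in UID_KEYWORDS else value
        for keyword, value in dataset.items()
    }


remapper = UidRemapper()
image = {"StudyInstanceUID": "1.9.1", "SOPInstanceUID": "1.9.1.7"}
report = {"StudyInstanceUID": "1.9.1", "ReferencedSOPInstanceUID": "1.9.1.7"}
new_image = remap_dataset_uids(image, remapper)
new_report = remap_dataset_uids(report, remapper)
```

Because the replacement is random rather than derived from the original UID, someone who knows an original UID cannot recompute its replacement; discarding the table after the upload removes the link entirely.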

Patient demographics

The "Retain Patient Characteristics Option" in the DICOM standard allows some patient demographics to be kept. The allowed fields are Patient’s Sex, Patient’s Age, Patient’s Size, Patient’s Weight, Ethnic Group, Smoking Status, and Pregnancy Status. If a subject is over 90 years of age, then the age must be listed as 90+. Allergies, Patient State (this is not where they live, but rather their condition), Pre-Medication, and Special Needs are defined by the DICOM standard as “clean” and are kept by eScan Academy and examined for PHI along with all tags during curation. Other patient demographics such as birth date, address, religious affiliations, etc. are removed or emptied.
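The age rule is an example of generalisation, and the emptying of birth date and address is an example of suppression. The sketch below assumes headers are dictionaries keyed by DICOM keyword and that ages use the DICOM age-string form such as "093Y". The SUPPRESSED set is a small illustrative subset, the function names are hypothetical, and treating an age of exactly 90 as "90+" is an assumption, since the text only says "over 90".

```python
# Illustrative subset of demographics that are emptied (suppression).
SUPPRESSED = {"PatientBirthDate", "PatientAddress", "PatientReligiousPreference"}


def generalise_age(age):
    """Map a DICOM age string of 90 years or more to '90+' (generalisation)."""
    is_years = len(age) == 4 and age[:3].isdigit() and age[3] == "Y"
    if is_years and int(age[:3]) >= 90:  # assumption: 90 itself is grouped too
        return "90+"
    return age


def apply_patient_characteristics(dataset):
    """Return a copy with suppressed fields emptied and the age generalised."""
    cleaned = {}
    for keyword, value in dataset.items():
        if keyword in SUPPRESSED:
            cleaned[keyword] = ""
        elif keyword == "PatientAge":
            cleaned[keyword] = generalise_age(value)
        else:
            cleaned[keyword] = value
    return cleaned


header = {
    "PatientSex": "F",
    "PatientAge": "093Y",
    "PatientBirthDate": "19300101",
    "PatientWeight": "61.5",
}
result = apply_patient_characteristics(header)
```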

Free text

The following free text fields are removed by eScan Academy during the curation process: Allergies, Patient State, Study Description, Series Description, Admitting Diagnoses Description, Admitting Diagnoses Code Sequence, Derivation Description, Identifying Comments, Medical Alerts, Occupation, Additional Patient History, Patient Comments, Contrast Bolus Agent, Protocol Name, Acquisition Device Processing Description, Acquisition Comments, Acquisition Protocol Description, Contribution Description, Image Comments, Frame Comments, Reason for Study, Requested Procedure Description, Requested Contrast Agent, Study Comments, Discharge Diagnosis Description, Service Episode Description, Visit Comments, Scheduled Procedure Step Description, Performed Procedure Step Description, Comments on Performed Procedure Step, Requested Procedure Comments, Reason for Imaging Service Request, Imaging Service Request Comments, Interpretation Text, Interpretation Diagnosis Description, Impressions, and Results Comments.


The Retain Device Identity Option of the DICOM de-identification standard allows for the retention of information related to the scanner used. The option allows for the following relevant tags to be retained: Station Name, Device Serial Number, Device UID, Plate ID, Generator ID, Cassette ID, Gantry ID, Detector ID, Scheduled Study Location, Scheduled Study Location AE Title, Scheduled Station AE Title, Scheduled Station Name, Scheduled Procedure Step Location, Performed Station AE Title, Performed Station Name, Performed Station Name Code Sequence, Scheduled Station Name Code Sequence, Scheduled Station Geographic Location Code Sequence, and Performed Station Geographic Location Code Sequence.  The tags listed above are retained if they are found to be free of PHI after eScan Academy curation of the submitted DICOM objects.

Private tags

When a submitting site sends DICOM data to eScan Academy, all private tags are removed.
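In DICOM, private data elements are those whose group number is odd, so removing them comes down to a parity check on the group. The sketch below assumes a header represented as a dictionary keyed by (group, element) tuples; the function names and the sample values are hypothetical.

```python
def is_private(tag):
    """A DICOM tag is private when its group number is odd."""
    group, _element = tag
    return group % 2 == 1


def strip_private_tags(dataset):
    """Return a copy of the dataset without private (odd-group) tags."""
    return {tag: value for tag, value in dataset.items() if not is_private(tag)}


header = {
    (0x0008, 0x0060): "MR",            # Modality, standard tag (even group)
    (0x0029, 0x1010): b"vendor data",  # private tag (odd group)
    (0x0028, 0x0010): 512,             # Rows, standard tag (even group)
}
public_only = strip_private_tags(header)
```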

Table 1

All odd-group-numbered tags are deleted. Table 1 details the de-identification performed for even-group-numbered tags at the submitting site by way of an eScan Academy-supplied de-identification script.