- Remove the following elements of personally identifiable information from datasets: [Assignment: organization-defined elements of personally identifiable information]; and
- Evaluate [Assignment: organization-defined frequency] for effectiveness of de-identification.
De-identification is the general term for the process of removing the association between a set of identifying data and the data subject. Many datasets contain information about individuals that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records. Datasets may also contain other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information. Personally identifiable information is removed from datasets by trained individuals when such information is not (or no longer) necessary to satisfy the requirements envisioned for the data. For example, if the dataset is only used to produce aggregate statistics, the identifiers that are not needed for producing those statistics are removed. Removing identifiers improves privacy protection since information that is removed cannot be inadvertently disclosed or improperly used. Organizations may be subject to specific de-identification definitions or methods under applicable laws, regulations, or policies. Re-identification is a residual risk with de-identified data. Re-identification attacks can vary, including combining new datasets or other improvements in data analytics. Maintaining awareness of potential attacks and evaluating for the effectiveness of the de-identification over time support the management of this residual risk.
De-identify the dataset upon collection by not collecting personally identifiable information.
Prohibit archiving of personally identifiable information elements if those elements in a dataset will not be needed after the dataset is archived.
Remove personally identifiable information elements from a dataset prior to its release if those elements in the dataset do not need to be part of the data release.
Remove, mask, encrypt, hash, or replace direct identifiers in a dataset.
Manipulate numerical data, contingency tables, and statistical findings so that no individual or organization is identifiable in the results of the analysis.
Prevent disclosure of personally identifiable information by adding non-deterministic noise to the results of mathematical operations before the results are reported.
Perform de-identification using validated algorithms and software that is validated to implement the algorithms.
Perform a motivated intruder test on the de-identified dataset to determine if the identified data remains or if the de-identified data can be re-identified.