SI-19: De-identification
Control Family: System and Information Integrity
Baselines:
- Low: N/A
- Moderate: N/A
- High: N/A
- Privacy: SI-19
Control is new to this version of the control set.
Control Statement
- Remove the following elements of personally identifiable information from datasets: [Assignment: organization-defined elements of personally identifiable information]; and
- Evaluate [Assignment: organization-defined frequency] for effectiveness of de-identification.
Supplemental Guidance
De-identification is the general term for the process of removing the association between a set of identifying data and the data subject. Many datasets contain information about individuals that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records. Datasets may also contain other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.
Personally identifiable information is removed from datasets by trained individuals when such information is not (or is no longer) necessary to satisfy the requirements envisioned for the data. For example, if the dataset is only used to produce aggregate statistics, the identifiers that are not needed for producing those statistics are removed. Removing identifiers improves privacy protection, since information that has been removed cannot be inadvertently disclosed or improperly used. Organizations may be subject to specific de-identification definitions or methods under applicable laws, regulations, or policies.
Re-identification is a residual risk with de-identified data. Re-identification attacks vary and include combining the de-identified data with newly available datasets and exploiting improvements in data analytics. Maintaining awareness of potential attacks and evaluating the effectiveness of the de-identification over time support the management of this residual risk.
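The aggregate-statistics example in the guidance can be sketched in a few lines of Python. This is an illustration only: the field names (`name`, `ssn`, `date_of_birth`, `region`, `visits`) and the helper functions are hypothetical, and the actual list of elements to remove comes from the organization-defined assignment in the control statement.

```python
from collections import defaultdict

# Hypothetical identifier fields; the real list is organization-defined.
IDENTIFIERS_TO_REMOVE = {"name", "ssn", "date_of_birth"}


def strip_identifiers(records, identifiers=IDENTIFIERS_TO_REMOVE):
    """Return copies of the records without the listed identifier fields."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


def mean_by(records, group_key, value_key):
    """Compute the mean of value_key for each distinct value of group_key."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for record in records:
        totals[record[group_key]][0] += record[value_key]
        totals[record[group_key]][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up sample data for illustration.
records = [
    {"name": "Person A", "ssn": "000-00-0001", "date_of_birth": "1980-01-01",
     "region": "north", "visits": 4},
    {"name": "Person B", "ssn": "000-00-0002", "date_of_birth": "1975-06-15",
     "region": "north", "visits": 2},
    {"name": "Person C", "ssn": "000-00-0003", "date_of_birth": "1990-03-30",
     "region": "south", "visits": 6},
]

# The statistics only need region and visits, so identifiers are dropped first.
deidentified = strip_identifiers(records)
averages = mean_by(deidentified, "region", "visits")
```

The aggregate is computed from the stripped copy, so the identifiers never reach the analysis step. Whether the remaining fields are themselves identifying in combination is a separate question that the periodic evaluation in the control statement is meant to address.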
Control Enhancements
SI-19(1): Collection
De-identify the dataset upon collection by not collecting personally identifiable information.
SI-19(2): Archiving
Prohibit archiving of personally identifiable information elements if those elements in a dataset will not be needed after the dataset is archived.
SI-19(3): Release
Remove personally identifiable information elements from a dataset prior to its release if those elements in the dataset do not need to be part of the data release.
SI-19(4): Removal, Masking, Encryption, Hashing, or Replacement of Direct Identifiers
Remove, mask, encrypt, hash, or replace direct identifiers in a dataset.
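A minimal sketch of three of the listed operations (remove, mask, hash), using only the Python standard library. The record layout, the field names, and the functions `pseudonymize`, `mask_ssn`, and `deidentify` are assumptions made for illustration. A keyed hash (HMAC) is used here instead of a plain hash because low-entropy identifiers such as social security numbers can be recovered from an unkeyed hash by enumerating all possible values; the key would need to be protected separately.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_ssn(ssn: str) -> str:
    """Mask all but the last four digits of a social security number."""
    digits = [c for c in ssn if c.isdigit()]
    return "***-**-" + "".join(digits[-4:])


def deidentify(record: dict, key: bytes) -> dict:
    """Apply remove, mask, and hash to the direct identifiers of one record.

    Field names are hypothetical; adapt them to the actual dataset.
    """
    out = dict(record)
    out.pop("name", None)                                     # remove
    out["ssn"] = mask_ssn(out["ssn"])                         # mask
    out["patient_id"] = pseudonymize(out["patient_id"], key)  # hash (keyed)
    return out


# Example with made-up values; a real key would come from a key management system.
example_key = b"illustrative-key-not-for-production"
example = deidentify(
    {"name": "Person A", "ssn": "123-45-6789", "patient_id": "P-001", "diagnosis": "flu"},
    example_key,
)
```

Because the same input and key always yield the same digest, records for one individual can still be joined after pseudonymization. Retaining the last four digits of an identifier may itself be identifying in some datasets, so the masking rule shown is illustrative rather than a recommendation.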
SI-19(5): Statistical Disclosure Control
Manipulate numerical data, contingency tables, and statistical findings so that no individual or organization is identifiable in the results of the analysis.
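One common statistical disclosure control technique is small-cell suppression in a contingency table: counts below a threshold are withheld so that rare combinations cannot single out an individual. The sketch below assumes a threshold of 5 and hypothetical field names; both are illustrative choices, not values taken from the control.

```python
from collections import Counter


def suppressed_counts(records, keys, threshold=5):
    """Build a contingency table over `keys`, suppressing small cells.

    Cells with fewer than `threshold` records are reported as None.
    The threshold is an assumption; organizations set their own.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Made-up data: six records in a common cell, two in a rare one.
sample = (
    [{"age_band": "30-39", "diagnosis": "flu"}] * 6
    + [{"age_band": "80-89", "diagnosis": "rare condition"}] * 2
)
table = suppressed_counts(sample, ["age_band", "diagnosis"])
```

This covers primary suppression only. If row or column totals are also published, a suppressed cell can sometimes be recovered by subtraction, so complementary suppression or another technique such as rounding may also be needed.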
SI-19(6): Differential Privacy
Prevent disclosure of personally identifiable information by adding non-deterministic noise to the results of mathematical operations before the results are reported.
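As an illustration of adding non-deterministic noise to a result, the sketch below applies the Laplace mechanism to a counting query. It assumes that adding or removing one individual changes the count by at most 1 (sensitivity 1), so the noise scale is 1/epsilon. The function name `noisy_count` and the parameter values are hypothetical. The Laplace sample is drawn as the difference of two exponential variates, which follows a Laplace distribution with the same scale.

```python
import random


def noisy_count(true_count, epsilon, rng=None):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Assumes a counting query with sensitivity 1. Smaller epsilon means
    more noise and stronger protection.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Each call returns a different value; the seed is fixed here only so the
# illustration is repeatable.
demo_rng = random.Random(0)
released = [noisy_count(100, epsilon=1.0, rng=demo_rng) for _ in range(5)]
```

This is a teaching sketch, not a production mechanism. Python's default generator is not cryptographically secure, and naive floating-point implementations of the Laplace mechanism have known weaknesses, so real deployments would normally rely on a vetted differential privacy library, in line with SI-19(7).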
SI-19(7): Validated Algorithms and Software
Perform de-identification using validated algorithms and software that has been validated as correctly implementing those algorithms.
SI-19(8): Motivated Intruder
Perform a motivated intruder test on the de-identified dataset to determine whether identified data remains or whether the de-identified data can be re-identified.
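One simple component of such a test is a linkage check: take an auxiliary dataset that an intruder could plausibly obtain, and count how many de-identified records match exactly one named individual on shared quasi-identifiers. The sketch below assumes hypothetical quasi-identifiers (`zip`, `birth_year`, `sex`) and made-up data; a real test would also cover other strategies, such as searching public sources.

```python
from collections import defaultdict


def _index(records, quasi_identifiers):
    """Group records by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[q] for q in quasi_identifiers)].append(record)
    return groups


def linkage_matches(deidentified, auxiliary, quasi_identifiers):
    """Return (auxiliary_record, deidentified_record) pairs that link uniquely.

    A pair is reported only when the quasi-identifier combination occurs
    exactly once in each dataset, that is, when the link is unambiguous.
    """
    deid_groups = _index(deidentified, quasi_identifiers)
    aux_groups = _index(auxiliary, quasi_identifiers)
    matches = []
    for key, aux_rows in aux_groups.items():
        deid_rows = deid_groups.get(key, [])
        if len(aux_rows) == 1 and len(deid_rows) == 1:
            matches.append((aux_rows[0], deid_rows[0]))
    return matches


# Made-up de-identified release and auxiliary (for example, public) data.
release = [
    {"zip": "12345", "birth_year": 1980, "sex": "F", "diagnosis": "condition x"},
    {"zip": "12345", "birth_year": 1975, "sex": "M", "diagnosis": "condition y"},
    {"zip": "12345", "birth_year": 1975, "sex": "M", "diagnosis": "condition z"},
]
public = [
    {"name": "Person A", "zip": "12345", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "zip": "12345", "birth_year": 1975, "sex": "M"},
]
reidentified = linkage_matches(release, public, ["zip", "birth_year", "sex"])
```

In this example only Person A links unambiguously, because two release records share Person B's quasi-identifier values. A non-empty result indicates that the de-identified data can be re-identified for at least some individuals and that further generalization or suppression may be warranted.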