It is often difficult, and sometimes impossible, to share de-identified data between organisations and researchers because of ethical constraints on participant confidentiality. Synthetic datasets could facilitate data sharing. However, many current methods, which rely on multiple imputation (MI) techniques for missing data, reduce the analytical potential of the data and the quality of the results.
This project therefore aims to assess the confidentiality guarantees of a promising new data synthesis method. The method adds a data masking step to a multiple imputation technique, generating synthetic data according to the risk of each observation. In particular, the project will test attribute disclosure risk, that is, the risk that certain attributes can be inferred from other, known ones.
The feasibility of the method and the quality of the results will be tested on a dataset provided by the Institut de la statistique du Québec.