Data sharing is often limited by privacy concerns. This is especially common for health datasets, given the inherent sensitivity of this type of data. When the original dataset cannot be shared, one option is to generate a synthetic dataset, which preserves as much statistical information as possible from the original while describing fictitious individuals, thereby protecting the confidentiality of respondents. One way to ensure that these synthetic data effectively protect respondents is to use differential privacy, a rigorous measure of disclosure risk.
This project investigates how to analyze such synthetic datasets to obtain valid statistical results, since traditional methods of inference must be modified to account for the variability added by the generation of the synthetic dataset.
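The needed correction can be illustrated with a combining rule. As a minimal sketch, assuming m synthetic datasets and using Reiter's combining rule for partially synthetic data (the exact rule depends on the synthesis method, so this is illustrative only):

```python
from statistics import mean, variance

def combine_synthetic_estimates(estimates, within_vars):
    """Combine point estimates from m synthetic datasets.

    Sketch of Reiter's rule for partially synthetic data:
    total variance T = v_bar + b / m, where v_bar is the mean
    within-dataset variance and b the between-dataset variance
    of the point estimates.
    """
    m = len(estimates)
    q_bar = mean(estimates)     # overall point estimate
    b = variance(estimates)     # between-synthesis variance
    v_bar = mean(within_vars)   # average within-dataset variance
    t = v_bar + b / m           # total variance of q_bar
    return q_bar, t

# Example: estimates of a population mean from m = 3 synthetic datasets
q, t = combine_synthetic_estimates([10.2, 9.8, 10.0], [0.5, 0.6, 0.55])
```

The total variance T is larger than the naive within-dataset variance, which is precisely the extra uncertainty introduced by synthesis that standard inference would ignore.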
Data sharing is often limited by privacy concerns. This is especially common for health datasets, given the inherent sensitivity of this type of data. When the original dataset cannot be shared, one option is to generate a synthetic dataset, which preserves as much statistical information as possible from the original while describing fictitious individuals, thereby protecting the confidentiality of respondents.
This project aims to rigorously measure the confidentiality protection offered by a synthetic dataset. We will carefully examine several measures proposed in the literature to understand their guarantees, and the differences and similarities between them, in order to identify the measure(s) most relevant for the sharing of synthetic data.
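One of the best-known such measures is differential privacy, typically achieved through calibrated noise. A minimal sketch of its classic building block, the Laplace mechanism (the query value and epsilon below are illustrative):

```python
import math
import random

def laplace_mechanism(true_value, epsilon, sensitivity=1.0):
    """Release a numeric query answer with epsilon-differential privacy
    by adding noise drawn from Laplace(sensitivity / epsilon)."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    # Inverse-CDF sampling of the Laplace distribution
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Smaller epsilon -> stronger privacy -> noisier released answers
noisy_count = laplace_mechanism(42, epsilon=0.5)
```

The parameter epsilon quantifies disclosure risk: it bounds how much any single respondent can change the distribution of the released answer.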
This research project examines the suitability of the laws, legal principles and general framework surrounding health-related data, including those governing the associated medical liability, in Canada and in the European Union.
It aims to identify their weaknesses and to propose regulatory solutions better suited to the realities of artificial intelligence. These solutions should better balance the private and public, individual, social, commercial and health-related interests at stake. The project also takes a different view of the law and of our current legal systems, which so far lack satisfactory answers.
The aim of this doctoral thesis is to develop a tool that automatically segments organs of interest in computed tomography images using machine learning techniques.
This tool will then be used to calculate organ doses in order to establish personalized dosimetric records in medical imaging. Doses will be calculated using information obtained from the images, the radiographic technique and a GPU-based Monte Carlo dose calculation algorithm (GPUMCD). Automated pipelines will be implemented to process large amounts of data.
The proposed tool will provide a better evaluation of the population's exposure to ionizing radiation from medical imaging procedures.
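The final record-keeping step can be sketched as follows, with an assumed dose grid (e.g., the output of a Monte Carlo engine such as GPUMCD) and a segmentation label map; the label numbers and organ names are illustrative:

```python
import numpy as np

def organ_dose_report(dose_grid, labels, organ_names):
    """Mean dose per organ: average the voxel doses inside each
    segmented region of the label map."""
    return {name: float(dose_grid[labels == lbl].mean())
            for lbl, name in organ_names.items()}

# Toy 2x2 slice: label 1 = liver, label 2 = kidney (illustrative)
dose = np.array([[1.0, 1.0], [3.0, 5.0]])
labels = np.array([[1, 1], [2, 2]])
report = organ_dose_report(dose, labels, {1: "liver", 2: "kidney"})
# report == {"liver": 1.0, "kidney": 4.0}
```

In the real pipeline the same aggregation runs over full 3D volumes, one entry per segmented organ per examination.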
This research project aims to examine the mutational signature of ionizing radiation using single-cell sequencing techniques.
The project uses human lymphoblastoid cells donated by the Ashkenazi trio, whose genomes are well characterized. The cells are irradiated and sequenced to determine the mutations induced by exposure to ionizing radiation.
Through biostatistical analysis of the human genomic data thus obtained, we will be able to identify the mutational signature of ionizing radiation.
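Mutational signatures are commonly extracted by non-negative matrix factorization (NMF) of the mutation-count matrix (trinucleotide contexts by samples). A minimal sketch with multiplicative updates; the matrix sizes, rank and iteration count are illustrative:

```python
import numpy as np

def extract_signatures(catalog, k, iters=500, seed=0):
    """Factor a non-negative mutation-count matrix (contexts x samples)
    into k signatures (W) and exposures (H) with multiplicative-update
    NMF, the standard approach for mutational-signature analysis."""
    rng = np.random.default_rng(seed)
    n_ctx, n_samp = catalog.shape
    W = rng.random((n_ctx, k))
    H = rng.random((k, n_samp))
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ catalog) / (W.T @ W @ H + eps)
        W *= (catalog @ H.T) / (W @ H @ H.T + eps)
    # Normalize each signature to sum to 1, moving the scale into H
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]
```

Each column of W is then a candidate signature (a probability distribution over mutation contexts), and H gives each sample's exposure to it.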
The objective of this project is to extract a set of relevant data from the files produced by medical imaging devices.
The process consists of building ETL (extract-transform-load) pipelines to make the data consumable for analysis and visualization. One example analysis is to observe trends in the dose administered to patients by institution, protocol or device, in order to identify potentially non-standard practices.
The data extracted could also guide practice by making it possible to assess the relevance of certain studies, and thus to optimize resources in the health network.
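The transform-and-flag step might look like the following sketch, with toy records standing in for fields extracted from device files (the field names and the outlier rule are assumptions):

```python
from collections import defaultdict
from statistics import mean, stdev

def transform(records):
    """Transform step: group dose values by (site, protocol)."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["site"], r["protocol"])].append(r["dose_mgy"])
    return groups

def flag_outliers(groups, z=1.0):
    """Flag groups whose mean dose exceeds the mean of all group means
    by more than z standard deviations (a simple illustrative rule)."""
    means = {k: mean(v) for k, v in groups.items()}
    overall, spread = mean(means.values()), stdev(means.values())
    return sorted(k for k, m in means.items() if m > overall + z * spread)
```

A real pipeline would extract these records from the device files (e.g., DICOM headers and dose reports) before this grouping step, and load the summaries into a dashboard.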
The clinical and economic burden of prostate cancer in Canada is substantial and rising. An estimated 1 in 7 men will develop prostate cancer during their lifetime, and 1 in 27 will die from it. However, only a fraction of prostate cancer cases are clinically significant, which makes discriminating among cases crucial to avoid over-treatment. Compared to ultrasound imaging, advanced MRI modalities have demonstrated better diagnostic accuracy and are becoming a routine clinical examination for patients at risk of clinically significant prostate cancer. Even though version 2 of PI-RADS was recently published to facilitate the application of MRI modalities to prostate cancer, limitations remain. For instance, variability is reported in inter-reader agreement and diagnostic accuracy, depending mainly on reader experience.
This project aims to develop a machine learning approach for the prediction and segmentation of intraprostatic lesions to better guide radiation treatment. To accomplish this, the most advanced MRI modalities, including DTI-MRI and DWI-MRI, will be employed along with anatomical MRI. From the quantitative MRI modalities, several maps that enhance specific features of the lesion will be extracted. Texture information will then be extracted from the MRI modalities and selected maps. In this step, machine learning methods will be employed for feature selection and classification. Finally, the extent of the prostate cancer and its type will be identified.
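To give a flavor of the texture step: a classic descriptor is the gray-level co-occurrence matrix (GLCM) and features derived from it, such as contrast. A minimal sketch (the pixel offset and number of gray levels are illustrative):

```python
import numpy as np

def glcm(img, levels=4, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one pixel offset,
    normalized to a joint probability table."""
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1
    return m / m.sum()

def contrast(p):
    """GLCM contrast: large when co-occurring gray levels differ."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())
```

A radiomics pipeline computes many such features per lesion map, after which feature selection prunes the redundant ones before classification.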
High-dose-rate (HDR) brachytherapy is a standard modality for treating cancer (e.g., prostate and cervical cancer) using the ionizing radiation of a small encapsulated radioactive source. The clinical aim is to create treatment plans that maximize the dose to the tumor while minimizing the dose to normal tissues. When generating a treatment plan, manual fine-tuning of an objective function is necessary to achieve optimal trade-offs between these two conflicting objectives. Plan generation is therefore a time-consuming iterative task for practitioners, and plan quality can depend on the user's skill.
The purpose of this project is to implement efficient GPU-based optimization algorithms that can generate thousands of alternative plans with optimal trade-offs (Pareto-optimal plans) within seconds. Using real-time plan navigation tools, the user can quickly explore the trade-offs across the set of Pareto-optimal plans and select the best plan for the patient at hand. The impact of these novel optimization approaches will be quantified and compared to the standard clinical approach.
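One common way to generate such a plan set is a weighted-sum scan over the conflicting objectives: each weight yields one Pareto-optimal trade-off. A toy one-parameter sketch, where the two quadratics are illustrative stand-ins for tumor-coverage and normal-tissue dose penalties:

```python
import numpy as np

def pareto_plans(n_plans=1000):
    """Weighted-sum scan: each weight w produces one Pareto-optimal
    trade-off between two conflicting objectives, here over a toy
    one-parameter 'plan' t in [0, 1] (e.g., a dwell-time scaling)."""
    t = np.linspace(0, 1, 2001)   # candidate plans
    f1 = (1 - t) ** 2             # tumor underdose penalty (falls with t)
    f2 = t ** 2                   # normal-tissue penalty (rises with t)
    plans = []
    for w in np.linspace(0.01, 0.99, n_plans):
        best = np.argmin(w * f1 + (1 - w) * f2)
        plans.append((t[best], f1[best], f2[best]))
    return plans
```

On a GPU the inner evaluation is done for all candidate plans and weights in parallel, which is what makes generating thousands of plans in seconds feasible; real plans optimize dwell times against full dose-volume objectives rather than a single scalar.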
Prostate cancer is the most common form of cancer in men in Canada.
This research project aims to establish a prognosis for patients with prostate cancer and to predict the final pathology, by predicting the presence of lymph node metastases from FDG PET-CT images. Radiomics is the process of quantitatively extracting usable high-dimensional data from medical images; radiomic features are biomarkers that are difficult to see with the naked eye, such as texture and intensity. The database comprises 250 prostate cancer patients. After filtering, a subset of 331 radiomic features was selected. The accuracy of the model is 74.5%, an improvement of 6% over a model trained on all the extracted features.
Ultimately, the algorithm will better predict the risk of prostate cancer recurrence and help improve treatment methods and choices.
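The filtering step described above can be sketched as a simple univariate filter: rank features by absolute correlation with the outcome and keep the top k. This is an illustrative stand-in, not the specific filter used in the project:

```python
import numpy as np

def filter_features(X, y, k):
    """Return the indices of the k features most correlated
    (in absolute value) with the binary outcome y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(corr)[::-1][:k])
```

Pruning uninformative features in this way is what allows a model trained on a 331-feature subset to outperform one trained on all extracted features.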
Supervised classification makes it possible to build predictive models from complex data to support human decision-making. It has undergone impressive development in recent years, particularly thanks to neural networks and the use of big data. However, these methods are not suitable for databases in which only a few instances are available to build the model, let alone when those instances are described by a large number of features. This type of problem, called fat data, is common in the medical field, where extracting data on patients is costly but provides a large amount of information for each one. Moreover, in medicine it is common to perform several types of analysis on the same patient: genomic, metabolomic, transcriptomic, etc. Such databases are called multi-omics.
The goal of this project is to use and develop multi-view classification algorithms relevant to the processing of multi-omic fat data.
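A minimal sketch of one multi-view strategy, late fusion: train a simple learner per view (here a nearest-centroid scorer, chosen only for brevity) and combine the per-view scores. The data shapes and base learner are illustrative:

```python
import numpy as np

def centroid_score(X_train, y, X_test):
    """Signed score per test point: positive means closer to the
    class-1 centroid than to the class-0 centroid."""
    c0 = X_train[y == 0].mean(axis=0)
    c1 = X_train[y == 1].mean(axis=0)
    return (np.linalg.norm(X_test - c0, axis=1)
            - np.linalg.norm(X_test - c1, axis=1))

def late_fusion_predict(train_views, y, test_views):
    """Late fusion: sum the per-view scores (one view per omic
    layer) and threshold at zero."""
    s = sum(centroid_score(Xtr, y, Xte)
            for Xtr, Xte in zip(train_views, test_views))
    return (s > 0).astype(int)
```

The appeal for fat data is that each view's learner stays simple (few parameters per view), while the fusion still exploits complementary signal across omic layers.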
Featured project
This research project is based on the analysis of massive data on the NOL index and other intraoperative clinical parameters used by anesthesiologists during surgery. These parameters help them make analgesic treatment decisions for non-communicating patients under general anesthesia, in whom it is impossible to assess pain and analgesic needs using the standard questionnaires administered to awake patients.
First, the objective is to interpret the values of this index in relation to the decisions made by the clinician.