In this project, we explore the opportunities and challenges introduced by the growing abundance of digital data captured during the delivery of health care. In particular, we have been studying how to realize AI-based personalized risk prediction for diabetic patients.
We have proposed a novel prediction approach that seamlessly integrates a deep neural network with collaborative filtering to realize a personalized health risk prediction model. A deep neural network is employed in our approach because of its advanced capability for integrated feature learning and for handling complex, multi-modal data. The proposed approach uses multiple layers of hidden neurons to learn data representations, or features, at multiple levels of abstraction. Trained with the backpropagation algorithm, deep learning can derive features from complex medical data sets that are more sophisticated than, and difficult to express in, human descriptive terms.
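As an illustration of the mechanism described above (multiple hidden layers refining representations, trained by backpropagation), the following is a minimal sketch in NumPy. The architecture, the synthetic data, and all parameter values are hypothetical choices for demonstration, not the model proposed in this project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for patient features (200 samples, 10 features)
# and a binary risk label derived from a nonlinear rule.
X = rng.normal(size=(200, 10))
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two hidden layers of 16 neurons each (hypothetical sizes).
W1 = rng.normal(scale=0.5, size=(10, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 16)); b2 = np.zeros(16)
W3 = rng.normal(scale=0.5, size=(16, 1));  b3 = np.zeros(1)

lr = 0.5
losses = []
for epoch in range(300):
    # Forward pass: each layer produces a progressively more
    # abstract representation of the raw input features.
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    p = sigmoid(h2 @ W3 + b3)

    # Binary cross-entropy loss.
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    losses.append(loss)

    # Backward pass: backpropagate the loss gradient layer by layer.
    n = len(X)
    d3 = (p - y) / n
    dW3 = h2.T @ d3; db3 = d3.sum(0)
    d2 = (d3 @ W3.T) * (1 - h2 ** 2)   # tanh derivative
    dW2 = h1.T @ d2; db2 = d2.sum(0)
    d1 = (d2 @ W2.T) * (1 - h1 ** 2)
    dW1 = X.T @ d1; db1 = d1.sum(0)

    # Gradient-descent update.
    W3 -= lr * dW3; b3 -= lr * db3
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The hidden activations `h1` and `h2` are exactly the learned intermediate representations referred to in the text: they are produced by training rather than hand-crafted feature engineering.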
To enable deep learning to better interface with healthcare and medical data, we employ collaborative filtering (CF) as an assisting technique. CF has been widely used to predict a person's preferences from the preferences of similar people. CF-based techniques handle very sparse data sets well, and they are also capable of utilizing and processing unstructured data. Inspired by these advantages and by the analogy between predicting users' preferences and predicting patients' health risk, we use CF techniques to assist health risk prediction. To overcome the data sparsity problem from which conventional CF-based methods may suffer, we utilize auxiliary information such as that stored in unstructured data (e.g., patients' demographic and socioeconomic data). This information helps us uncover unexpected relationships. Details of this work can be found in our paper:
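The user-to-patient analogy above can be sketched as a small neighborhood-based CF example. Everything here is illustrative and assumed (the auxiliary features, the toy values, and the helper `predict_risk`); it shows only the idea of scoring a new patient from similar patients, with similarity computed on auxiliary demographic data to soften the sparsity of clinical records:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two auxiliary-feature vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(a @ b / (na * nb))

def predict_risk(target_aux, aux, risks, k=2):
    """Predict a risk score as the similarity-weighted average of the
    k most similar patients' known risk scores (user-based CF)."""
    sims = np.array([cosine(target_aux, row) for row in aux])
    top = np.argsort(sims)[::-1][:k]   # indices of the k nearest patients
    w = sims[top]
    if w.sum() == 0:
        return float(risks.mean())     # fall back to the population mean
    return float(w @ risks[top] / w.sum())

# Toy auxiliary profiles (age, BMI, income band -- all made up).
aux = np.array([[55, 31.0, 2.0],
                [60, 33.0, 1.0],
                [25, 22.0, 3.0]])
risks = np.array([0.8, 0.9, 0.1])      # known risk scores
target = np.array([58, 32.0, 1.5])     # new patient to score

print(round(predict_risk(target, aux, risks, k=2), 3))
```

Because the new patient's auxiliary profile resembles the two high-risk patients, the prediction falls between their scores, which is the CF behavior the text describes: risk is inferred from similar patients even when the patient's own clinical record is sparse.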
As our data collection is in its initial stage, we do not yet have sufficient data for statistically meaningful analysis, so we have turned to publicly available data to test our methodology. We will continue to evaluate and improve the proposed algorithm on the larger-scale and more complex data that this project will generate. We also plan to add more context dimensions (such as time) to the algorithm.