Advisor
Research Topic
Analysis of machine learning and data mining techniques on data stored in distributed environments
Research Abstract
The main aim of my research activity is the definition and implementation of a framework, built on a tiered architecture, that provides data mining tools and harnesses the power of modern distributed architectures. Within this scope, I have identified a topic of particular interest: the real-time analysis of huge data streams. Classical data mining techniques are not well suited to data generated continuously and without interruption, and the resulting time constraints make traditional solutions largely ineffective. Another objective of the research is the definition of algorithms that support the analysis of incomplete or non-homogeneous data sets, such as those that arise when streams coming from different data sources are combined. In this case, a single algorithm is unlikely to achieve good results; it is more advantageous to combine different algorithms, each trained on a different subsample of the data.
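The abstract does not name specific algorithms, so the following is only a minimal sketch of the general idea of combining models trained on different subsamples. It assumes a deliberately simple nearest-centroid base learner and plain majority voting; the class names `NearestCentroid` and `SubsampleEnsemble`, and the toy data, are hypothetical and chosen for illustration. Note how one source that only ever observed the "high" class is outvoted by the others, which is the kind of incompleteness the combination is meant to absorb.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class NearestCentroid:
    """Hypothetical base learner: predicts the class whose mean point is closest."""

    def fit(self, xs: Sequence[Point], ys: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        # Mean point (centroid) of every class seen in this subsample only.
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x: Point) -> str:
        def dist2(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda y: dist2(self.centroids[y]))


class SubsampleEnsemble:
    """Hypothetical ensemble: one base model per subsample (e.g. per data source),
    combined by majority vote."""

    def __init__(self) -> None:
        self.members: List[NearestCentroid] = []

    def add_source(self, xs: Sequence[Point], ys: Sequence[str]) -> None:
        # Each source trains its own model; no raw data is merged across sources.
        self.members.append(NearestCentroid().fit(xs, ys))

    def predict(self, x: Point) -> str:
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]


# Toy usage: three sources, the third one is incomplete (it only saw "high").
ens = SubsampleEnsemble()
ens.add_source([(0, 0), (0, 1), (5, 5), (6, 5)], ["low", "low", "high", "high"])
ens.add_source([(1, 0), (5, 6)], ["low", "high"])
ens.add_source([(6, 6), (7, 5)], ["high", "high"])

print(ens.predict((0.5, 0.5)))  # low: two of three members vote "low"
print(ens.predict((6.0, 6.0)))  # high
```

In a distributed setting each member could, under these assumptions, be trained on the node that holds its subsample, with only the small per-model state (here, the centroids) shipped to the node that performs the vote.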