Loris Belcastro

Advisor

Prof. Domenico Talia

Research Topic

Parallel and distributed algorithms for big data mining on Cloud computing systems; social network and big data analysis

Research Abstract

The research focuses mainly on the development of parallel and distributed algorithms and applications that run on the Cloud and are able to deal with big data. Big data analytics is the process of examining large data sets containing a variety of data types to discover hidden patterns, unknown correlations, market trends, customer preferences, and other useful information. To deal with big data efficiently, data analysts make extensive use of a new class of technologies that includes Hadoop MapReduce and related tools such as Spark, Hive, Pig, and NoSQL databases. Algorithms and applications written according to the MapReduce paradigm are automatically parallelized and can be executed on a large number of servers. This makes MapReduce a powerful tool for developing parallel data mining applications, with significant advantages in scalability and fault tolerance. The MapReduce paradigm can also be integrated into high-level systems and services delivered as DAaaS (Data and Analytics as a Service), such as data analysis workflow systems running on the Cloud. Data mining techniques are also used to analyze geotagged data gathered from social networks, with the aim of extracting knowledge such as frequent trajectory patterns, regions of interest, and information about people's behavior.
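To illustrate the map, shuffle, and reduce phases of the MapReduce paradigm, the sketch below counts geotagged social network posts per grid cell, a very simple way to spot densely visited areas. It is a single-process simulation written for illustration only. The function names, the fixed grid size `CELL_SIZE`, and the grid-counting approach are assumptions and do not represent the region-of-interest mining method used in this research. In a real framework such as Hadoop or Spark, the shuffle is performed by the framework and map and reduce tasks run in parallel across many servers.

```python
import math
from collections import defaultdict
from itertools import chain

# Hypothetical grid cell size in degrees (an illustrative assumption).
CELL_SIZE = 0.01


def map_post(post):
    """Map phase: emit a (grid cell, 1) pair for one geotagged post (lat, lon)."""
    lat, lon = post
    cell = (math.floor(lat / CELL_SIZE), math.floor(lon / CELL_SIZE))
    return [(cell, 1)]


def shuffle(pairs):
    """Shuffle phase: group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_cell(cell, counts):
    """Reduce phase: sum all counts emitted for a single grid cell."""
    return cell, sum(counts)


def run_job(posts):
    """Run map, shuffle, and reduce sequentially and return {cell: post count}."""
    intermediate = chain.from_iterable(map_post(p) for p in posts)
    grouped = shuffle(intermediate)
    return dict(reduce_cell(cell, counts) for cell, counts in grouped.items())


def dense_cells(cell_counts, min_posts):
    """Keep only cells with at least min_posts posts (hypothetical threshold)."""
    return {cell for cell, n in cell_counts.items() if n >= min_posts}
```

Because each `map_post` call depends on a single post and each `reduce_cell` call depends on a single key, both phases can be distributed independently, which is what allows a framework to parallelize them automatically. In PySpark, for instance, a comparable job could be expressed with `map` followed by `reduceByKey`.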