# When collecting numbers is not enough: a view on data analysis in Cross-CPP

The collection and integration of data coming from different sources will probably be one of the key elements of many future markets and services, and is indeed the vision underpinning all the efforts being made in Cross-CPP. Still, as any analyst would tell you, having data is only a necessary condition, not a sufficient one: what is also required is the capability to analyse and manipulate those data. This realisation was the seed behind the introduction of the CPP Data Analytics Toolbox, a suite of modules designed to simplify the analysis of data, ranging from basic statistical functions to complex predictive models.

Yet, could the ambition be to provide a toolbox able to solve any foreseen (and yet to be foreseen) analytics task, over data whose nature will evolve and change, while satisfying the needs of services yet to be specified? This is clearly beyond the reach of any three-year research project. Furthermore, some service providers will prefer to resort to their in-house algorithms and models, especially when these are part of their core business; to illustrate, a weather forecast company would not rely on external models to predict tomorrow’s rain. The project therefore decided to follow a different strategy: provide basic, yet comprehensive, tools that allow service providers to quickly develop prototypes and test ideas.

The Data Analytics Toolbox is based on a modular structure, with different components offering different types of analysis; yet all of them share the same way of communicating with the user, of retrieving data from the system, and of returning results to it. Here we start reviewing these modules by focusing on two of them, devoted to trajectory analysis and network analysis respectively.

**Trajectories Analysis Component.** The concept of “trajectory analysis” is a very general one, encompassing many different analyses of data that represent a spatio-temporal evolution. With the exception of buildings, all CPPs composing the Cross-CPP system are expected to move at some point in their life. With this in mind, the component aims to provide a set of basic tools that simplify the handling and manipulation of trajectories as mathematical objects. On the one hand, this includes a set of functions to analyse trajectories individually, i.e. without considering their interconnections. On the other hand, a second level deals with the analysis of multiple trajectories by taking into account the relationships between them, for instance to detect groups of similar trajectories or the presence of causal relationships between them.
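To make the two levels concrete, here is a minimal sketch in plain Python. It is not the toolbox's actual API: the function names (`path_length`, `mean_distance`, `group_similar`), the planar `(x, y)` representation and the sample data are all assumptions made for illustration. The first function works on a single trajectory; the other two compare trajectories, assuming they were sampled at the same instants, and group those that stay close to each other.

```python
import math

# Hypothetical representation: a trajectory is a list of (x, y) positions
# sampled at common, equally spaced instants.

def path_length(traj):
    """Individual-level analysis: total distance travelled."""
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))

def mean_distance(t1, t2):
    """Pairwise analysis: average distance between simultaneous positions."""
    if len(t1) != len(t2):
        raise ValueError("trajectories must share the same sampling instants")
    return sum(math.dist(a, b) for a, b in zip(t1, t2)) / len(t1)

def group_similar(trajs, threshold):
    """Group trajectories: two end up together if a chain of pairs,
    each closer than `threshold`, links them (single-linkage grouping)."""
    groups = []
    for name in trajs:
        linked = [g for g in groups
                  if any(mean_distance(trajs[name], trajs[o]) < threshold for o in g)]
        merged = {name}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups

# Illustrative data only.
trajectories = {
    "car_a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "car_b": [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)],
    "truck": [(0.0, 5.0), (0.0, 6.0), (0.0, 7.0)],
}
length_a = path_length(trajectories["car_a"])
groups = group_similar(trajectories, threshold=1.0)
```

With these sample values the two cars fall into one group and the truck into another. Real data would call for resampling to common timestamps and a distance suited to geographic coordinates, both of which this sketch leaves out.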

**Network analysis.** Sensors in the Cross-CPP ecosystem are organised in complex interaction structures. These structures may be physical: for instance, sensors in a car can be connected through the CAN bus and can therefore share information directly. Yet such structures can also be *functional*, i.e. a consequence of the sensors being embedded in a common context. To illustrate, two temperature sensors in two different cars can yield the same (or very similar) time series, provided the two cars travel along similar paths. From a mathematical point of view, such connectivity networks can be analysed by means of complex network theory, a statistical-physics take on classical graph theory. Complex networks have been used, for instance, to assess and reduce the vulnerability of the resulting communication patterns, or to optimise the spread of new information in the system. This component provides several functions to both manage and analyse networks, such as the extraction of metrics or the identification of groups of strongly connected objects.
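The idea of a functional network can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the component's real interface: the function names, the choice of Pearson correlation as the similarity measure, the 0.9 threshold and the sensor readings are all hypothetical. Two sensors are linked when their time series are strongly correlated; the node degree then serves as a simple metric, and connected components as one simple reading of "groups of connected objects".

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def functional_network(series, threshold):
    """Link two sensors when |correlation| >= threshold (assumed criterion)."""
    adj = {name: set() for name in series}
    for a, b in combinations(series, 2):
        if abs(pearson(series[a], series[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def connected_components(adj):
    """Groups of sensors joined by at least one chain of links."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

# Illustrative readings only: two cars on similar paths, one building.
readings = {
    "temp_car_1": [10.0, 12.0, 15.0, 14.0, 11.0],
    "temp_car_2": [10.5, 12.2, 15.1, 13.8, 11.3],
    "temp_building": [21.0, 20.0, 21.0, 20.0, 21.0],
}
network = functional_network(readings, threshold=0.9)
degree = {name: len(neighbours) for name, neighbours in network.items()}
components = connected_components(network)
```

Here the two car sensors end up linked, while the building sensor stays isolated. In practice the threshold and the similarity measure would need tuning to the data, and a constant series would have to be handled separately, since its correlation is undefined.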