Snowball Sampling


In Snowball Sampling, a random sample of individuals is drawn from a given finite population.

HPC

Overview / Usage

When do we say that a graph has become large, or that the amount of data in it has become big? Why do we sample a graph? When do we say that a graph is sampled? What should the size of the sampled graph be? What difference does it make to work on the sampled graph rather than the original? These questions are common when people start working with real-world graphs, which often span hundreds of millions or even billions of nodes and the interactions between them. As a rule of thumb, 'large graphs' are graphs whose exploration requires long computation time, and 'big data' is data that takes too much space to be stored on a single hard drive.
Why do we need to sample the original graph? The first and foremost reason is that the sheer size of many networks makes it computationally infeasible to study the entire network. Moreover, even when the network is not that large, the measurements required to observe the underlying network may be costly. Network sampling is therefore at the foundation of our study of network structure. A good sampled graph must retain useful knowledge about the original. Our primary goal is to find the important properties that effectively summarize the graph.

Graphs are used to represent real-life situations where entities of interest are related to each other. In such situations, entities can be represented as nodes, and the relationships between them can be represented as edges. Graph modelling of real-life situations results in networks. Thus, there are transport networks, road networks, biological networks, technology networks, etc. The analysis and importance of these networks have given rise to the recent discipline of network science.
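To make the node-and-edge representation concrete, here is a minimal Python sketch that builds an undirected graph as an adjacency list. The function name `build_graph` and the tiny road network are hypothetical, chosen only for illustration; real networks would normally be loaded from data rather than written inline.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency-list graph from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        # Each relationship is stored in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical road network: cities are nodes, roads are edges.
roads = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
graph = build_graph(roads)

num_nodes = len(graph)
# Every undirected edge appears twice, once per endpoint.
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
print(num_nodes, num_edges)
```

Storing neighbors in sets keeps membership checks cheap and silently ignores duplicate edges, which is usually what one wants for a simple undirected graph.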
Analysis of networks that are large and dense is challenging because of the associated computational expense. Focusing on smaller, dense areas of the network is often preferred for two reasons:
(i) reduced computation in terms of both time and memory
(ii) better insights.
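As an illustration of how a smaller subgraph can be extracted, the following is a minimal sketch of snowball sampling on an adjacency-list graph. It is one common formulation rather than a definitive implementation: random seeds are drawn first, then in each wave every newly added node refers up to a fixed number of its neighbors. The names `snowball_sample`, `num_seeds`, `num_waves`, and `max_referrals` are assumptions made for this example, as is the ring-shaped test graph.

```python
import random


def snowball_sample(adjacency, num_seeds, num_waves, max_referrals, rng=None):
    """Return the subgraph induced by a snowball sample (illustrative sketch).

    adjacency: dict mapping each node to a set of neighbor nodes.
    num_seeds: size of the initial random sample of nodes.
    num_waves: number of referral rounds after the seed stage.
    max_referrals: neighbors each newly added node may refer per wave.
    """
    rng = rng or random.Random()
    # Stage 0: draw a random sample of nodes from the finite population.
    seeds = rng.sample(sorted(adjacency), num_seeds)
    sampled = set(seeds)
    frontier = list(seeds)

    for _ in range(num_waves):
        next_frontier = []
        for node in frontier:
            neighbors = sorted(adjacency[node])
            count = min(max_referrals, len(neighbors))
            for referred in rng.sample(neighbors, count):
                if referred not in sampled:
                    sampled.add(referred)
                    next_frontier.append(referred)
        # Only nodes added in this wave make referrals in the next one.
        frontier = next_frontier

    # Keep only the edges whose endpoints were both sampled.
    return {u: set(adjacency[u]) & sampled for u in sampled}


# Hypothetical example graph: a ring of 10 nodes.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
sample = snowball_sample(ring, num_seeds=2, num_waves=2,
                         max_referrals=1, rng=random.Random(42))
print(sorted(sample))
```

Passing an explicit `random.Random` instance makes a run reproducible. The sample size is bounded by the number of seeds, waves, and referrals, which is what keeps time and memory costs under control on large graphs.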
