All data analysis and visualization were performed on an Intel CPU-based PC running Ubuntu Linux 8.04. We began by looking at call histograms based on caller and receiver identifiers.
The open source statistical software package called “R” was used to compute and plot the histograms.
These early investigations clearly showed a significant shift in the network structure after the seventh day. Figure 1 shows the histograms for day 7 and day 8. We could not include histograms for each of the 10 days here, but the plots for day 1 through day 7 had very similar structures, whereas those for day 8 through day 10 showed dramatic differences. Figures 1(a) and 1(b) show the “FROM” and “TO” histograms, respectively, for day 7. There do not appear to be any predominant callers. The “TO” histogram, however, clearly identifies a set of predominant receivers, in this case identifiers 1 and 5. Very similar trends are seen for all of the previous days.
Figures 1(c) and 1(d) show the caller and receiver histograms, respectively, for day 8. Figure 1(d) shows two clearly predominant receivers, identifiers 306 and 309. The presence of predominant receivers was observed on all 10 days, indicating that a large number of callers call just a handful of receivers.
Figure 1: Histograms of caller identifiers (labeled “FROM”) and receiver identifiers (labeled “TO”): (a) FROM, day 7; (b) TO, day 7; (c) FROM, day 8; (d) TO, day 8.
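The counting behind such histograms can be sketched in a few lines. The example below is in Python rather than the R code used for this study, and it is only an illustration: the call records are made-up toy data, and the function names and the 30% threshold for "predominant" are assumptions, not values taken from the analysis.

```python
from collections import Counter

# Hypothetical toy records (caller, receiver); NOT taken from the challenge data.
calls = [(10, 1), (11, 1), (12, 5), (13, 1), (14, 5), (10, 5), (15, 2)]


def id_histograms(calls):
    """Return (FROM counts, TO counts), each keyed by identifier."""
    from_counts = Counter(caller for caller, _ in calls)
    to_counts = Counter(receiver for _, receiver in calls)
    return from_counts, to_counts


def predominant(counts, share=0.3):
    """Identifiers holding at least `share` of all calls.

    The 30% default is an arbitrary threshold chosen for this sketch.
    """
    total = sum(counts.values())
    return sorted(i for i, n in counts.items() if n >= share * total)


from_counts, to_counts = id_histograms(calls)
print("predominant receivers:", predominant(to_counts))
print("predominant callers:", predominant(from_counts))
```

In this toy data the receivers 1 and 5 stand out while no caller does, which mirrors the shape of the day 7 histograms described above.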
Next, we rendered undirected network graphs using “Graphviz”. We also built a variety of filters to select subgraphs according to a set of filtering criteria.
To test the hypothesis that Ferdinando Catalano had ID 200, we extracted the subgraph of all the callers who are linked to identifier 200 directly or indirectly. Figure 2 shows the graph for day one.
Figure 2: Cell call network starting with identifier 200 for day 01
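Extracting everyone linked to an identifier "directly or indirectly" amounts to finding its connected component in the undirected call graph. The following Python sketch shows one way to do this with a breadth-first search and to emit the result as Graphviz DOT text. It is not the filter code used in this study; the function names and the toy call list are hypothetical.

```python
from collections import defaultdict, deque


def reachable_subgraph(calls, start):
    """Return (ids, edges) for everyone linked to `start` directly or indirectly.

    `calls` is a list of (caller, receiver) pairs, treated as undirected links.
    """
    adjacency = defaultdict(set)
    for caller, receiver in calls:
        adjacency[caller].add(receiver)
        adjacency[receiver].add(caller)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search outward from `start`
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)

    # An edge belongs to the component if one endpoint does (then both do).
    edges = [(a, b) for a, b in calls if a in seen]
    return seen, edges


def to_dot(edges):
    """Render edges as an undirected graph in Graphviz DOT syntax."""
    lines = ["graph calls {"]
    lines += [f"    {a} -- {b};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy records; NOT taken from the challenge data.
calls = [(200, 1), (5, 200), (1, 40), (41, 5), (70, 71)]
ids, edges = reachable_subgraph(calls, 200)
print(sorted(ids))
print(to_dot(edges))
```

The DOT text can be passed to a Graphviz layout program to draw the subgraph; the pair (70, 71) is dropped because it has no path to identifier 200.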
These graphs show that during the first seven days, identifier 200 directly calls or receives calls mostly from identifiers 1, 2, 3, and 5, and makes one call to ID 137 on day 2. In turn, IDs 1, 2, 3, and 5 call a larger number of other people. This pattern leads us to conclude with a high level of confidence that identifier 200 is Ferdinando Catalano, since the head of an organization usually communicates with a handful of immediate subordinates, who in turn communicate with a larger number of lower-ranked members. We believe IDs 1, 2, 3, and 5 belong to David Vidro, Juan Vidro, Jorge Vidro, and Estaban Catalano, though at this stage we could not yet associate these identifiers with specific names. We were also quite surprised to find that Ferdinando Catalano had no communication at all with anyone on day 8.
To gain further insight, we developed a custom interactive visualization tool called ICAVE (Interactive Call Analysis and Visualization Environment) that integrates and presents all the fields in the dataset for unified visual analysis and comprehension. Figure 3 shows a screenshot of ICAVE.
View an animated demonstration of ICAVE.
Figure 3: Screenshot of ICAVE
ICAVE is a 3D interactive visualization tool that represents each call as a semi-transparent icon composed of a green circle embedded in a pink square. The area of the circle is proportional to the duration of the call. Because of perspective issues, it was important to have the pink squares (of equal area) provide a visual reference for the maximum possible call duration. The “floor” is divided into a grid with callers along the x-axis and receivers along the y-axis. An icon is placed over grid cell (x, y) if a call was made by “x” to “y”. The height of the icon above the floor corresponds to the time of the call. The icon footprints on the floor are marked as dark boxes. The map of tower locations is shown on the left wall, and semi-transparent yellow lines join each icon to the call origination tower location on this wall. To reduce clutter, bold red lines are drawn from an icon to the floor, the tower location, and the time wall when the icon is clicked. These help in identifying the caller, the receiver, the time, and the origination tower for the call. Various filtering capabilities are built into the tool. For example, Figure 3 shows all the calls involving identifier 200 on the second day. The icon corresponding to the call made by identifier 200 to identifier 5 was clicked, resulting in the red lines.
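The mapping from a call record to an icon can be illustrated with a short Python sketch. This is not ICAVE's actual code. It assumes that the circle for a call of maximum duration exactly fills its reference square, so that a circle whose area is proportional to duration has a radius that scales with the square root of the duration; the `Icon` type, the `make_icon` function, and the units are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Icon:
    x: float       # caller identifier, along the x-axis of the floor grid
    y: float       # receiver identifier, along the y-axis of the floor grid
    height: float  # time of the call, as height above the floor
    radius: float  # radius of the circle inside the reference square


def make_icon(caller, receiver, time_s, duration_s, max_duration_s,
              square_side=1.0):
    """Place one call in the scene (all names and units are assumptions).

    Area is proportional to duration, so radius goes with sqrt(duration).
    The longest call is assumed to be inscribed in the reference square.
    """
    fraction = min(max(duration_s / max_duration_s, 0.0), 1.0)
    radius = (square_side / 2.0) * math.sqrt(fraction)
    return Icon(x=caller, y=receiver, height=time_s, radius=radius)


# A call lasting a quarter of the maximum duration gets half the maximum radius.
icon = make_icon(caller=200, receiver=5, time_s=3600,
                 duration_s=250, max_duration_s=1000)
print(icon)
```

Scaling the radius rather than the area directly by duration would exaggerate long calls, which is why the square root appears here.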
Based on the assumption that Ferdinando Catalano would communicate with Estaban Catalano most often, it was easy to identify ID 5 as Estaban, since over the 10 days he makes the most calls to ID 200 (seven) and also receives the most calls from ID 200 (seven).
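The tally behind this kind of identification is simple to express. The Python sketch below counts calls in each direction between a hub identifier and its contacts and picks the most frequent contact overall. The function names and the toy records are hypothetical and do not reproduce the counts reported above.

```python
from collections import Counter


def contact_counts(calls, hub):
    """Return (calls made by hub per receiver, calls received by hub per caller)."""
    made = Counter(r for c, r in calls if c == hub)
    received = Counter(c for c, r in calls if r == hub)
    return made, received


def most_frequent_contact(calls, hub):
    """Identifier with the most calls to and from `hub`, or None if there are none."""
    made, received = contact_counts(calls, hub)
    total = made + received
    return total.most_common(1)[0][0] if total else None


# Hypothetical toy records (caller, receiver); NOT taken from the challenge data.
calls = [(200, 5), (5, 200), (200, 1), (5, 200), (200, 5), (3, 200)]
made, received = contact_counts(calls, 200)
print(made[5], received[5], most_frequent_contact(calls, 200))
```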
With ICAVE, it was easy to see that among all the IDs Ferdinando calls directly (1, 2, 3, 5, 97, and 137), ID 1 received the most calls from callers with the widest coverage of cell towers. Therefore, ID 1 is most likely David Vidro, since he coordinates high-level Paraiso activities. IDs 2 and 3 are then likely to be Juan Vidro and Jorge Vidro, but it was not possible to determine positively which of the two has ID 2 and which has ID 3 based on this dataset alone.
One of the most significant changes to the network after day 7 was that the dominant receivers changed from 1 and 5 to 306 and 309. However, 306 and 309 never communicated during the first seven days. ICAVE also showed right away that 306 and 309 had extremely few common callers. ID 306 mostly received calls from towers 1, 12, 15, 17, 18, 19, 22, and 28, whereas 309 mostly received calls from towers 11, 13, 22, 28, and 29. Thus 306 and 309 had somewhat divided jurisdictions. They tend to work quite independently, since there was only one call between them on day 8. The high-level authority of 306 is confirmed by the fact that he is only one level removed from Ferdinando Catalano through ID 97, as 97 called 306. IDs 306 and 309 are also located in different physical locations, as 306 makes most calls from towers 12, 28, and 29, whereas 309 calls from towers 11 and 22.
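The comparison of two receivers by common callers and by originating towers can be made quantitative with set overlap. The Python sketch below uses the Jaccard index for this, which is a choice made for illustration and not a measure used in the analysis above; the function names and the toy records are hypothetical.

```python
def callers_of(calls, receiver):
    """Set of identifiers that called `receiver`."""
    return {c for c, r, _ in calls if r == receiver}


def origin_towers_of(calls, receiver):
    """Set of towers from which calls to `receiver` originated."""
    return {t for _, r, t in calls if r == receiver}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy records (caller, receiver, origin tower); NOT challenge data.
calls = [
    (400, 306, 12), (401, 306, 15), (402, 306, 28),
    (402, 309, 28), (403, 309, 11), (404, 309, 29),
]

common = callers_of(calls, 306) & callers_of(calls, 309)
print("common callers:", sorted(common))
print("caller overlap:", jaccard(callers_of(calls, 306), callers_of(calls, 309)))
print("tower overlap:", jaccard(origin_towers_of(calls, 306),
                                origin_towers_of(calls, 309)))
```

A value near 0 indicates largely separate caller populations or coverage areas, and a value near 1 indicates nearly identical ones.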
In contrast, the key high-level players, IDs 1 and 2, operate mostly from the areas of towers 11 and 29; ID 3 operates from the areas of towers 30 (mostly) and 10; and ID 5 operates from the area of tower 30.