TCS-InnovationLab,Delhi-iNePe–MC2

VAST 2009 Challenge
Challenge 2 - Social Network and Geospatial

Authors and Affiliations:

Lipika Dey, TCS Innovation Labs, Delhi, India, lipika.dey@tcs.com

Nithi, TCS Innovation Labs, Delhi, India, nithi.1@tcs.com

Tool(s):

This tool has been developed in-house over last six months at Innovation Labs, Delhi, India. The primary aim of this tool is provide a powerful, yet intuitive, visual query interface to aid investigative analysis. It is written in Java, connects to Oracle Database at the backend and uses YEd and JFreeChart for drawing graphs. The power of the tool lies in its smart interface using which a user can specify arbitrarily complex constraints on properties and relationships for entities to be retrieved from the database. The result can be viewed as graphs or tables. Graph drawing specifications can be given for arbitrary depths, with each depth having its own constraints. Users have the option of progressively tightening or relaxing the constraints. Constraints can be generic or type-specific depending on the type of entities chosen. The tool supports various generic entities and can be extended with new entity types. Unlike traditional visualization tools which rely on users to discover patterns after visual layout, this tool is primarily designed as an investigator’s aid to verify patterns and hypotheses. This greatly reduces investigation time. Investigators can use past knowledge to identify a small seed set of highly probable targets which is thereafter expanded in a focused manner. The tool is ideally suited for social network analysis where the number of interesting entities and links are very low compared to the total number of entities and links in the data. The tool was primarily developed by Ms. Nithi with inputs from Dr. Lipika Dey.

 

Video:

 iNePe.wmv

 

 

ANSWERS:


MC2.1: Which of the two social structures, A or B, most closely match the scenario you have identified in the data?

A


MC2.2:  Provide the social network structure you have identified as a tab delimitated file. It should contain the employee, one or more handler, any middle folks, and the localized leader with their international contacts. What are the Flitter names of the persons involved? Please identify only key connections (not all single links for example) as well as any other nodes related to the scenario (if any) you may have discovered that were not described in the two scenarios A and B above. 

Flitter.txt


MC2.3:  Characterize the difference between your social network and the closest social structure you selected (A or B). If you include extra nodes please explain how they fit in to your scenario or analysis. 

The network consisting of @schaffter (employee), @reitenspies(handler), @pettersson(handler), @kushnir(handler), @good (Boris), @szemerdi(Fearless Leader) satisfy all structural conditions and attributes given in A with respect to (i) number of Flitter contacts, (ii) constraints on links and (iii) having international contacts.

Our analysis was query-driven. Since our tool supports formulation of arbitrarily complex queries, we started by converting the problem statement in A to a series of queries that retrieved entities whose relationships could be verified visually.

Step 1: We formulated the first query to find all group in the network that satisfy the following conditions given in A

1.     All links are defined between two people

2.     There are four levels in the graph with links at each level constrained as mentioned below.

3.     The first level links are defined between nodes representing employees and handlers. The employee nodes are constrained to have minimum 30 Flitter contacts and maximum 50 Flitter contacts. The handler nodes are also constrained to have 30 to 40 contacts each.

4.     The second level links are defined between handlers and middlemen. The middlemen are constrained to have at most 6 contacts.

5.     The third level links are defined between middlemen and Fearless Leader. The Fearless leaders are constrained to have at least 100 Flitter contacts, with at least one International contact.

6.     The fourth level links are between the Fearless leader and his International contacts.

7.     Additional constraints are imposed on paths connecting nodes at different levels

a)     There should be at least three distinct paths with no overlapping links between an employee and a middleman.

first.png

Fig1 : Screenshot – Defining links and its constraints. Two people are linked with a FRIEND_OF relation if these two have direct Flitter contact. A person is related to his native city by the relation LOCATED_AT. 

 

 

Picture2.png

Fig2: Screenshot – Adding constraints on entities

The above query yielded the following nodes at various depths.

Possible Employee

Possible Handlers

Possible Boris

Possible Fearless Leader

Possible international contacts

@schaffter

@reitenspies

@good

@szemeredi

@nakhaeizadeh

@heyderhoff

@tolbert

@bolotov

@streng

@pettersson

@decker

@reed

@avouris

@wenocur

@kodama

@kushnir

@wotawa

@hogstedt

@chandru

@barvinok

@lafouge

@formenti

@rosch

@cornell

@konakovsky

@anderssen

@tusera

@lyonns

@haase

@lonning

@gates

@agnew

@crockett

@anily

@kechadi

@krintz

@geibel

@grabe

@singh

@baik

@konakovsky

 

Fig 3:  Auto layout of the two groups with structure described in part A after re-coloring handlers to green.

Query 2: Now we issued a query to expand the networks of the handlers to visually analyze and verify whether the condition that no two handlers are supposed to interact with each other directly, is satisfied by the resulting sub graphs or not.  It was found that both the sets satisfy the condition.

Query 3: Expand the graphs of possible Boris to find the other member of the criminal organization.

The other contacts of possible middleman Boris are given to be part of the organization.

Fig 4: Auto Layout of the possible networks formed by adding the contacts of handlers and the middleman with employee dragged to the one end.

However, we find that in the second graph (Network 2), one possible handler (@lonning) has direct contact with the possible leader (@cornell) which either contradicts the existence  of middleman – middleman exists to avoid the direct interaction of handlers with the leader or suggests that possible middleman is unaware that @cornell is LEADER.

 

Query 5. We now try to find the entire network. For this we issue a query to expand the network to include all people who have contacts with at least two of the identified persons excluding international contacts are added to the graph (colored red). The final graph is shown in Fig 4

Fig 5: Final networks including international ties.

The second network is not supported by geographical locations also. We therefore eliminate the second network

We now try to find a network similar to part B.

Query 5. A query was issued to find all groups in the network satisfying the following constraints

1.     All links are defined between two people

2.     There are four levels in the graph with links at each level constrained as mentioned below.

3.     The first level links are defined between nodes representing employees and handlers. The employee nodes are constrained to have minimum 30 Flitter contacts and maximum 50 Flitter contacts. The handler nodes are also constrained to have 30 to 40 contacts each.

4.     The second level links are defined between handlers and middlemen. The middlemen are constrained to have at most 4 contacts.

5.     The third level links are defined between middlemen and Fearless Leader. The Fearless leaders are constrained to have at least 100 Flitter contacts, with at least one International contact.

6.     The fourth level links are between the Fearless leader and his International contacts.

7.     Additional constraints are imposed on links connecting nodes at different levels

There should be at least three distinct handlers for each employee with each handler linked to at least one middleman, which in turn are all linked to the same leader.

 

The result of the query includes all groups with structure satisfying Part B. The query returned an empty set.

 

Query 6. We now relax Query 5. The maximum number of contacts for the middleman in Query 5 was then relaxed to be at most 5. This query gave us the following nodes at various levels.

 

Possible employee

Possible handler

Possible middleman

Possible leader

@ramiller

@abdennadher

 

@thery

@jantke

 

@desbois

@crescenzi

@khachiyan

@nerbonne

@klimov

Other result was same as Network 1 of Part A.

 

Query 7. This query was issued to add the contacts of handlers to the graph which was visually analyzed to verify whether the handlers interact with each other.

We found that the handlers did not interact with each other.

 

The network described in the table was discarded as the employee did not belong to Flovania.


MC2.4:  How is your hypothesis about the social structure in Part 1 supported by the city locations of Flovania? What part(s), if any, did the role of geographical information play in the social network of part one?

Fig 6: Locations of employee -  @schaffter, handlers - @reitenspies, @kushnir, @pettersson, middleman - @good and Fearless leader -@szemeredi in the first network

First network - Employee and all his handlers are in the same large city Prounov which is important for the smooth flow of information. The middleman is in a nearly medium city (Kannvic). Moreover, the LEADER is in a medium city (Kouvnic) close to location of middleman. The network appears like an information chain.

Fig 7: Locations of employee - @lafouge, handlers - @krintz, @formenti, @lonning , middleman- @rosch and Fearless leader - @cornell in the second network

Second network - Employee is in city Prounov whereas all the 3 handlers and the middleman are in Koul. Different locations of employee and his handlers does not favor information flow. The possible leader is located in a small city far from the location of middleman. The network does not form an information chain and is hence discarded.

Conclusion: First network is the targeted network.


MC2.5:  In general, how are the Flitter users dispersed throughout the cities of this challenge? Which of the surrounding countries may have ties to this criminal operation?  Why might some be of more significant concern than others?

Fig 8: Map with color intensity of the city indicating the number of Flitter users (left) and number of members of criminal organization (right) in the city

The darkest color of Koul in the first map indicates that Koul has maximum number of Flitter users (1998). Among the small and medium cities, Kouvnic (city of LEADER) has maximum number of users. Dark color of Posana amongst neighboring countries in the second map shows that criminal operation has maximum ties to Posana which validates the location of the LEADER - Kouvnic.