In mathematics, graph theory exists to study graphs, but what exactly is a graph?
Several deffinitions exist:
Graphs are mathematical structures used to study pairwise relationships between objects and entities.
A graph is a structure with a set of objects, these objects can be related to other objects in pairs. The objects correspond to mathematical concepts called vertices, and the connected or related pairs of vertices are called edges.
A graph can also be thought of as: A binary relation (=edges) between elements of a set (=vertices).
Vertices are also often called nodes or points.
Edges can be called links or lines.
Graphs use the terms vertex and edge commonly, while networks usually use node and link.
Examples:
Nodes = vertices (ex: vertices = {A, B, C, D, E})
Edges = links (ex: edges = {AB}, {AC}, {B,C}, {C,E})
Suppose you booked an Uber cab. One of the most important things that is critical to Uber’s functioning is its ability to match drivers with riders in an efficient way. Consider there are 6 possible rides that you can be matched with. So, how does Uber allocate a ride to you? We can make use of graphs to visualize how the process of allotting a ride might be.
Which Ride (driver) for the rider makes the most sense?
History:
Can you take a walk through the town, visting each part of the town and crossing each bridge only once?
Rules: 1. No doubling back. 2. You can only cross a bridge one time. 3. You end at the same place you started.
Helpful terms:
Simple path vs Euler path
A path in a graph represents a way to get from an origin to a destination node by traversing the edges of the graph.
Simple path: a route around a graph that visits every vertex one is called a simple path.
Euler path: a route around a graph that visits every edge once is called an Euler path.
You can tell which graphs have a Euler path by counting how many vertices have an odd degree. The number of vertices of odd degree must be either zero or two, if not it is not a Euler path.
Two vertices are adjacent if they share a common edge (an edge joins the vertices).
The degree of a vertex is the total number of vertices that are adjacent to the vertex.
Before computers graph theory could become quite complex depending on the application.
Recently the use of graphs and graph theory have become increasingly popular due to advancements in computing power and applications in diverse fields.
Graphs provide a better way of dealing with abstract concepts like relationships and interactions. They also offer an intuitively visual way of thinking about these concepts. Graphs also form a natural basis for analyzing relationships in a Social context
Graph Databases have become common computational tools and alternatives to SQL (structured query language) and NoSQL databases
Graphs are used to model analytics workflows in the form of DAGs (Directed acyclic graphs)
Some Neural Network Frameworks also use DAGs to model the various operations in different layers
Graph Theory concepts are used to study and model Social Networks, Fraud patterns, Power consumption patterns, Virality and Influence in Social Media. Social Network Analysis (SNA) is probably the best known application of Graph Theory for Data Science
It is used in Clustering algorithms – Specifically K-Means
System Dynamics also uses some Graph Theory concepts – Specifically loops
Path Optimization is a subset of the Optimization problem that also uses Graph concepts
From a Computer Science perspective – Graphs offer computational efficiency. The Big O complexity for some algorithms is better for data arranged in the form of Graphs (compared to tabular data)
Source: https://www.analyticsvidhya.com/blog/2018/09/introduction-graph-theory-applications-python/
Edges = lines or links, edges can have directions, signs, weights, functional equations, or locations.
Vertices = nodes or points, can have labels, weights, or locations.
Directed Graph = a graph that is made up of a set of vertices connected by edges, where the edges have a direction associated with them
Undirected Graph = a graph with a set of vertices or nodes that are connected, where all the edges are bidirectional, sometimes called an undirected network
Average path length = average number of steps along the shortest paths for all possible pairs of network nodes, measure of efficiency of information or mass transport on a network
Breadth first search (BFS) = algorithm that traverses a graph data structure, starts on a node and explores all neighbor nodes before moving to nodes at the next depth level
Depth first search (DFS) = similar to BFS, starts at a node on a graph data structure, searches as far as possible along each branch before backtracking
BFS and DFS are algorithms to search for nodes in a graph
Centrality = identify the most important vertices (nodes) within a graph, importance needs to be defined prior
Degree centrality = simple centrality measure that counts how many neighbors a node has
Closeness centrality = measure of centrality in a network, calculated as the sum of the length of the shortest paths between the node and all other nodes in the graph
Betweenness centrality = measure of centrality based on shortest paths
Isolation = an isolated vertex in a graph with, if a graph has an isolated vertex then the graph is disconnected
Adjacency/Incident = vertices are called adjacent if they connected by an edge, two edges are called incident if they share a vertex
Network density = a measure of how many edges a graph has
Isomorphic graph/s = two graphs which contain the same number of graph vertices connected in the same way are said to be isomorphic
Bipartite graph = special kind of graph with the following properties:
The embbedded graph is an example of a bipartite graph because:
• Marketing Analytics – Graphs can be used to figure out the most influential people in a Social Network. Advertisers and Marketers can estimate the biggest bang for the marketing buck by routing their message through the most influential people in a Social Network
• Banking Transactions – Graphs can be used to find unusual patterns helping in mitigating Fraudulent transactions. There have been examples where Terrorist activity has been detected by analyzing the flow of money across interconnected Banking networks
• Supply Chain – Graphs help in identifying optimum routes for your delivery trucks and in identifying locations for warehouses and delivery centers
• Pharma – Pharma companies can optimize the routes of the salesman using Graph theory. This helps in cutting costs and reducing the travel time for salesman
• Telecom – Telecom companies typically use Graphs (Voronoi diagrams) to understand the quantity and location of Cell towers to ensure maximum coverage
• Ecological flow networks using graph theory to real seqeuntial chains in which energy passes from producers to consumers in complex food webs. -Allesina, S., Bodini, A., & Bondavalli, C. (2005). Ecological subsystems via graph theory: the role of strongly connected components. Oikos, 110(1), 164-176.
• Landscape connectivity using focal-species analysis to demonstrate the utility of a mathematical graph as an ecological construct with respect to habitat connectivity. -Bunn, A. G., Urban, D. L., & Keitt, T. H. (2000). Landscape connectivity: a conservation application of graph theory. Journal of environmental management, 59(4), 265-278.
• Network neuroscience being used with graph theory methods to aid in the detection of network communities or modules, and the identification of central network elements that facilitate communication and signal transfer in brains. -Sporns, O. (2018). Graph theory methods: applications in brain networks. Dialogues in clinical neuroscience, 20(2), 111.
Some growing applications include: computer science, linguistics, physics, chemistry, social sciences, biology/ecology, medical sciences, and mathematics
IGraph - a collection of network analysis tools with the emphasis on efficiency, portability and ease of use
Rgraphviz(requires graphiz) - provides plotting capabilities for R graph objects
ggplot2 - one of the leading packages for creating graphs in R
raster - Geographic Data Analysis and Modeling Reading, writing, manipulating, analyzing and modeling of gridded spatial data
reshape2 - easily transform data between wide and long formats
ggraph (extension of ggplot) - extension of ggplot, but geared for use in graphs and networks
tidygraph - can help manipulate data used in graph and network applications, and provide tidy interfaces for common graph algorithms
networkD3 (interactive network graphs) - Creates D3 JavaScript network, tree, dendrogram, and Sankey graphs from R
visNetwork (interactive network graphs) - used for network visualization
sp - classes and methods for spatial data
dismo - species distribution modeling
NetIndices - Estimating network indices, including trophic structure of foodwebs in R
First we need to download a package to be able to make and visualize graphs
igraph is on CRAN and the igraph package can be found at: http://igraph.org/r/
V(dataset) = returns all vertices
V(dataset)[#, #:#] = shows vertices in the specified positions
V(dataset)[degree(dataset) < #] = finds vertices satisfying the specified condition
V(dataset)[nei(‘vertex name’)] = shows vertex neighbors
V(dataset)[‘vertex name’, ‘vertex name’] = selects given vertices
E(dataset) = returns all edges
E(dataset)[vertexset %–% vertexset] = shows all edges between two vertex sets
#install.packages c ("igraph", "reshape2", "ggplot2", "raster")
library(igraph) #to start we will just explore the igraph package
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
First lets create a basic network and visualize it to understand the vertices and edges
For the igraph demonstration we will first create our own data
graph1 <- graph (edges = c(1,2, 2,3, 3,4, 4,5, 1,5), n = 5) #Assign edges and number of vertices
plot(graph1) #Plot the network
Now lets take our first graph and add more edges
graph1 <- graph(edges = c(1,2, 2,3, 3,4, 4,5, 1,5, 2,4, 3,5, 5,2, 1,4, 2,4, 3,1), n =5)
plot(graph1)
Is this network directed or undirected? Why?
Now lets make a simple network that is undirected
graph2 <- graph (edges=c(1,2, 2,3, 3,1), n = 3, directed = F)
plot(graph2)
The code below generates an undirected graph with three edges. The numbers are interpreted as vertex IDs, so the edges are 1–>2, 2–>3, 3–>1.
By default if we do not specify the network as directed or undirected igraph assumes directed
What if we are working with a complex network and wanted to know what the edges were?
We can call on the assigned network and it will report the edges
class(graph2)
## [1] "igraph"
graph2 #We can call on the assigned graph/network to see what the edges are
## IGRAPH 95b6fc8 U--- 3 3 --
## + edges from 95b6fc8:
## [1] 1--2 2--3 1--3
E(graph2)
## + 3/3 edges from 95b6fc8:
## [1] 1--2 2--3 1--3
Now lets create and visualize a directed network and make it have 15 vertices, but assign the edges between vertices as directed and not connect all the vertices
graph3 <- graph(edges = c(1,2, 2,3, 3,1, 5,6, 6,9, 9,13, 13,2, 5,1), n = 15, directed = T)
plot(graph3)
graph3
## IGRAPH 95c6901 D--- 15 8 --
## + edges from 95c6901:
## [1] 1-> 2 2-> 3 3-> 1 5-> 6 6-> 9 9->13 13-> 2 5-> 1
Like the above graph we created, we can specify what vertices we want to be isolated by providing a list of the isolated vetices
graph4 <- graph( c("1", "2", "2", "3", "2", "3", "1", "1"),
isolates=c("4", "5", "6", "7") )
# In named graphs we can specify isolates by providing a list of their names.
plot(graph4, edge.arrow.size=.5, vertex.color="gold", vertex.size=15, #Plotting asthetics
vertex.frame.color="gray", vertex.label.color="black",
vertex.label.cex=0.8, vertex.label.dist=2, edge.curved=0.2)
We can also assign names to vertices instead of just numbers, this could be useful for unique identifiers in a data set
Bridgegraph <- graph( c("Bridge1","Bridge2", "Bridge3","Bridge4", "Bridge2","Bridge3", "Bridge1", "Bridge4","Bridge5", "Bridge3","Bridge5", "Bridge5","Bridge7")) # named vertices
## Warning in matrix(edges, ncol = 2, byrow = TRUE): data length [13] is not a
## sub-multiple or multiple of the number of rows [7]
# When the edge list has vertex names, the number of edges/nodes is not needed
plot(Bridgegraph)
Bridgegraph
## IGRAPH 95de9a4 DN-- 6 7 --
## + attr: name (v/c)
## + edges from 95de9a4 (vertex names):
## [1] Bridge1->Bridge2 Bridge3->Bridge4 Bridge2->Bridge3 Bridge1->Bridge4
## [5] Bridge5->Bridge3 Bridge5->Bridge5 Bridge7->Bridge1
#Directly accessing edges and vertices
E(Bridgegraph) #The edges of the object
## + 7/7 edges from 95de9a4 (vertex names):
## [1] Bridge1->Bridge2 Bridge3->Bridge4 Bridge2->Bridge3 Bridge1->Bridge4
## [5] Bridge5->Bridge3 Bridge5->Bridge5 Bridge7->Bridge1
V(Bridgegraph) #The vertices of the object
## + 6/6 vertices, named, from 95de9a4:
## [1] Bridge1 Bridge2 Bridge3 Bridge4 Bridge5 Bridge7
We can also create graphs using a dataframe or an adjacency matrix, lets start with a dataframe
#we're going to randomly generate a graph - first specify the names of the nodes and the number of edges we want
node_names = c("A", "B", "C", "D", "E", "F")
n_edges = 8
#to make graph, we're going to create 'n_edges' edges between the nodes
all_possible_edges_df = t(combn(node_names, m=2)) #find all possible combinations of the nodes
#create a graph using a data frame
#-------
set.seed(746)
edges_df = data.frame(all_possible_edges_df[sample(1:nrow(all_possible_edges_df), n_edges),], stringsAsFactors = FALSE) #randomly select 'n_edges' of these rows
edges_df
## X1 X2
## 1 C F
## 2 A C
## 3 D E
## 4 E F
## 5 C D
## 6 A B
## 7 B D
## 8 D F
dfgraph = graph_from_data_frame(edges_df, directed = FALSE, vertices = node_names)
plot(dfgraph)
Now lets use an adjacency matrix
#create a graph using an adjacency matrix
#transform the data frame to be in the form of an adjacency matrix
edges_mat = matrix(nrow = length(node_names), ncol = length(node_names), data=0, dimnames = list(node_names, node_names))
edges_mat[as.matrix(edges_df)] = 1 #put 1s in the matrix to represent which edges are connected (0 is disconnected, 1 is connected)
edges_mat
## A B C D E F
## A 0 1 1 0 0 0
## B 0 0 0 1 0 0
## C 0 0 0 1 0 1
## D 0 0 0 0 1 1
## E 0 0 0 0 0 1
## F 0 0 0 0 0 0
Agraph = graph_from_adjacency_matrix(edges_mat, mode=c("undirected"))
plot(Agraph)
Now lets compare the two graphs and see if they are isomorphic
Isomorphic graphs are: Two graphs which contain the same number of graph vertices connected in the same way are said to be isomorphic.
#compare the two graphs
isomorphic(dfgraph, Agraph)
## [1] TRUE
In the next section, we will work primarily with two small example data sets. Both contain data about media organizations. One involves a network of hyperlinks and mentions among news sources. The second is a network of links between media venues and consumers. While the example data used here is small, many of the ideas behind the analyses and visualizations we will generate apply to medium and large-scale networks.
Source: “https://kateto.net/networks-r-igraph”
#Set your working directory
#setwd("C:\Users\aig02\Desktop\Graph Theory") #Set your working directory to the location of your data
# DATASET 1: edgelist
nodes <- read.csv("Dataset1-Media-Example-NODES.csv", header=T, as.is=T) #nodes/vertices
links <- read.csv("Dataset1-Media-Example-EDGES.csv", header=T, as.is=T) #edges
# Examine the data:
head(nodes)
## id media media.type type.label audience.size
## 1 s01 NY Times 1 Newspaper 20
## 2 s02 Washington Post 1 Newspaper 25
## 3 s03 Wall Street Journal 1 Newspaper 30
## 4 s04 USA Today 1 Newspaper 32
## 5 s05 LA Times 1 Newspaper 20
## 6 s06 New York Post 1 Newspaper 50
head(links)
## from to weight type
## 1 s01 s02 10 hyperlink
## 2 s01 s02 12 hyperlink
## 3 s01 s03 22 hyperlink
## 4 s01 s04 21 hyperlink
## 5 s04 s11 22 mention
## 6 s05 s15 21 mention
nrow(nodes); length(unique(nodes$id))
## [1] 17
## [1] 17
nrow(links); nrow(unique(links[,c("from", "to")]))
## [1] 52
## [1] 49
Notice that there are more links than unique from-to combinations. That means we have cases in the data where there are multiple links between the same two nodes. We will collapse all links of the same type between the same two nodes by summing their weights, using aggregate() by “from”, “to”, & “type”. We don’t use simplify() here so as not to collapse different link types.
# Collapse multiple links of the same type between the same two nodes
# by summing their weights, using aggregate() by "from", "to", & "type":
# (we don't use "simplify()" here so as not to collapse different link types)
links <- aggregate(links[,3], links[,-3], sum)
links <- links[order(links$from, links$to),]
colnames(links)[4] <- "weight"
rownames(links) <- NULL
Two-mode or bipartite graphs have two different types of actors and links that go across, but not within each type. Our second media example is a network of that kind, examining links between news sources and their consumers.
nodes2 <- read.csv("Dataset2-Media-User-Example-NODES.csv", header=T, as.is=T)
links2 <- read.csv("Dataset2-Media-User-Example-EDGES.csv", header=T, row.names=1)
# Examine the data:
head(nodes2)
## id media media.type media.name audience.size
## 1 s01 NYT 1 Newspaper 20
## 2 s02 WaPo 1 Newspaper 25
## 3 s03 WSJ 1 Newspaper 30
## 4 s04 USAT 1 Newspaper 32
## 5 s05 LATimes 1 Newspaper 20
## 6 s06 CNN 2 TV 56
head(links2)
## U01 U02 U03 U04 U05 U06 U07 U08 U09 U10 U11 U12 U13 U14 U15 U16 U17
## s01 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## s02 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
## s03 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
## s04 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
## s05 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0
## s06 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1
## U18 U19 U20
## s01 0 0 0
## s02 0 0 1
## s03 0 0 0
## s04 0 0 0
## s05 0 0 0
## s06 0 0 0
#We can see that links2 is an adjacency matrix for a two-mode network:
# links2 is an adjacency matrix for a two-mode network:
links2 <- as.matrix(links2)
dim(links2)
## [1] 10 20
dim(nodes2)
## [1] 30 5
Turn the network data into igraph objects
# ------->> DATASET 1 --------
# Converting the data to an igraph object:
# The graph.data.frame function, which takes two data frames: 'd' and 'vertices'.
# 'd' describes the edges of the network - it should start with two columns
# containing the source and target node IDs for each network tie.
# 'vertices' should start with a column of node IDs.
# Any additional columns in either data frame are interpreted as attributes.
net <- graph_from_data_frame(d=links, vertices=nodes, directed=T)
# Examine the resulting object:
class(net)
## [1] "igraph"
net
## IGRAPH 961b5ee DNW- 17 49 --
## + attr: name (v/c), media (v/c), media.type (v/n), type.label
## | (v/c), audience.size (v/n), type (e/c), weight (e/n)
## + edges from 961b5ee (vertex names):
## [1] s01->s02 s01->s03 s01->s04 s01->s15 s02->s01 s02->s03 s02->s09
## [8] s02->s10 s03->s01 s03->s04 s03->s05 s03->s08 s03->s10 s03->s11
## [15] s03->s12 s04->s03 s04->s06 s04->s11 s04->s12 s04->s17 s05->s01
## [22] s05->s02 s05->s09 s05->s15 s06->s06 s06->s16 s06->s17 s07->s03
## [29] s07->s08 s07->s10 s07->s14 s08->s03 s08->s07 s08->s09 s09->s10
## [36] s10->s03 s12->s06 s12->s13 s12->s14 s13->s12 s13->s17 s14->s11
## [43] s14->s13 s15->s01 s15->s04 s15->s06 s16->s06 s16->s17 s17->s04
# We can look at the nodes, edges, and their attributes:
E(net) # Edges of the "net" object
## + 49/49 edges from 961b5ee (vertex names):
## [1] s01->s02 s01->s03 s01->s04 s01->s15 s02->s01 s02->s03 s02->s09
## [8] s02->s10 s03->s01 s03->s04 s03->s05 s03->s08 s03->s10 s03->s11
## [15] s03->s12 s04->s03 s04->s06 s04->s11 s04->s12 s04->s17 s05->s01
## [22] s05->s02 s05->s09 s05->s15 s06->s06 s06->s16 s06->s17 s07->s03
## [29] s07->s08 s07->s10 s07->s14 s08->s03 s08->s07 s08->s09 s09->s10
## [36] s10->s03 s12->s06 s12->s13 s12->s14 s13->s12 s13->s17 s14->s11
## [43] s14->s13 s15->s01 s15->s04 s15->s06 s16->s06 s16->s17 s17->s04
V(net) # vertices of the "net" object
## + 17/17 vertices, named, from 961b5ee:
## [1] s01 s02 s03 s04 s05 s06 s07 s08 s09 s10 s11 s12 s13 s14 s15 s16 s17
E(net)$type # Edge attribute "type"
## [1] "hyperlink" "hyperlink" "hyperlink" "mention" "hyperlink"
## [6] "hyperlink" "hyperlink" "hyperlink" "hyperlink" "hyperlink"
## [11] "hyperlink" "hyperlink" "mention" "hyperlink" "hyperlink"
## [16] "hyperlink" "mention" "mention" "hyperlink" "mention"
## [21] "mention" "hyperlink" "hyperlink" "mention" "hyperlink"
## [26] "hyperlink" "mention" "mention" "mention" "hyperlink"
## [31] "mention" "hyperlink" "mention" "mention" "mention"
## [36] "hyperlink" "mention" "hyperlink" "mention" "hyperlink"
## [41] "mention" "mention" "mention" "hyperlink" "hyperlink"
## [46] "hyperlink" "hyperlink" "mention" "hyperlink"
V(net)$media # Vertex attribute "media"
## [1] "NY Times" "Washington Post" "Wall Street Journal"
## [4] "USA Today" "LA Times" "New York Post"
## [7] "CNN" "MSNBC" "FOX News"
## [10] "ABC" "BBC" "Yahoo News"
## [13] "Google News" "Reuters.com" "NYTimes.com"
## [16] "WashingtonPost.com" "AOL.com"
plot(net, edge.arrow.size=.4,vertex.label=NA)
# Removing loops from the graph:
net <- simplify(net, remove.multiple = F, remove.loops = T)
# If you need them, you can extract an edge list or a matrix from igraph networks.
as_edgelist(net, names=T)
## [,1] [,2]
## [1,] "s01" "s02"
## [2,] "s01" "s03"
## [3,] "s01" "s04"
## [4,] "s01" "s15"
## [5,] "s02" "s01"
## [6,] "s02" "s03"
## [7,] "s02" "s09"
## [8,] "s02" "s10"
## [9,] "s03" "s01"
## [10,] "s03" "s04"
## [11,] "s03" "s05"
## [12,] "s03" "s08"
## [13,] "s03" "s10"
## [14,] "s03" "s11"
## [15,] "s03" "s12"
## [16,] "s04" "s03"
## [17,] "s04" "s06"
## [18,] "s04" "s11"
## [19,] "s04" "s12"
## [20,] "s04" "s17"
## [21,] "s05" "s01"
## [22,] "s05" "s02"
## [23,] "s05" "s09"
## [24,] "s05" "s15"
## [25,] "s06" "s16"
## [26,] "s06" "s17"
## [27,] "s07" "s03"
## [28,] "s07" "s08"
## [29,] "s07" "s10"
## [30,] "s07" "s14"
## [31,] "s08" "s03"
## [32,] "s08" "s07"
## [33,] "s08" "s09"
## [34,] "s09" "s10"
## [35,] "s10" "s03"
## [36,] "s12" "s06"
## [37,] "s12" "s13"
## [38,] "s12" "s14"
## [39,] "s13" "s12"
## [40,] "s13" "s17"
## [41,] "s14" "s11"
## [42,] "s14" "s13"
## [43,] "s15" "s01"
## [44,] "s15" "s04"
## [45,] "s15" "s06"
## [46,] "s16" "s06"
## [47,] "s16" "s17"
## [48,] "s17" "s04"
as_adjacency_matrix(net, attr="weight")
## 17 x 17 sparse Matrix of class "dgCMatrix"
## [[ suppressing 17 column names 's01', 's02', 's03' ... ]]
##
## s01 . 22 22 21 . . . . . . . . . . 20 . .
## s02 23 . 21 . . . . . 1 5 . . . . . . .
## s03 21 . . 22 1 . . 4 . 2 1 1 . . . . .
## s04 . . 23 . . 1 . . . . 22 3 . . . . 2
## s05 1 21 . . . . . . 2 . . . . . 21 . .
## s06 . . . . . . . . . . . . . . . 21 21
## s07 . . 1 . . . . 22 . 21 . . . 4 . . .
## s08 . . 2 . . . 21 . 23 . . . . . . . .
## s09 . . . . . . . . . 21 . . . . . . .
## s10 . . 2 . . . . . . . . . . . . . .
## s11 . . . . . . . . . . . . . . . . .
## s12 . . . . . 2 . . . . . . 22 22 . . .
## s13 . . . . . . . . . . . 21 . . . . 1
## s14 . . . . . . . . . . 1 . 21 . . . .
## s15 22 . . 1 . 4 . . . . . . . . . . .
## s16 . . . . . 23 . . . . . . . . . . 21
## s17 . . . 4 . . . . . . . . . . . . .
# data frames describing nodes and edges:
as_data_frame(net, what="edges")
## from to type weight
## 1 s01 s02 hyperlink 22
## 2 s01 s03 hyperlink 22
## 3 s01 s04 hyperlink 21
## 4 s01 s15 mention 20
## 5 s02 s01 hyperlink 23
## 6 s02 s03 hyperlink 21
## 7 s02 s09 hyperlink 1
## 8 s02 s10 hyperlink 5
## 9 s03 s01 hyperlink 21
## 10 s03 s04 hyperlink 22
## 11 s03 s05 hyperlink 1
## 12 s03 s08 hyperlink 4
## 13 s03 s10 mention 2
## 14 s03 s11 hyperlink 1
## 15 s03 s12 hyperlink 1
## 16 s04 s03 hyperlink 23
## 17 s04 s06 mention 1
## 18 s04 s11 mention 22
## 19 s04 s12 hyperlink 3
## 20 s04 s17 mention 2
## 21 s05 s01 mention 1
## 22 s05 s02 hyperlink 21
## 23 s05 s09 hyperlink 2
## 24 s05 s15 mention 21
## 25 s06 s16 hyperlink 21
## 26 s06 s17 mention 21
## 27 s07 s03 mention 1
## 28 s07 s08 mention 22
## 29 s07 s10 hyperlink 21
## 30 s07 s14 mention 4
## 31 s08 s03 hyperlink 2
## 32 s08 s07 mention 21
## 33 s08 s09 mention 23
## 34 s09 s10 mention 21
## 35 s10 s03 hyperlink 2
## 36 s12 s06 mention 2
## 37 s12 s13 hyperlink 22
## 38 s12 s14 mention 22
## 39 s13 s12 hyperlink 21
## 40 s13 s17 mention 1
## 41 s14 s11 mention 1
## 42 s14 s13 mention 21
## 43 s15 s01 hyperlink 22
## 44 s15 s04 hyperlink 1
## 45 s15 s06 hyperlink 4
## 46 s16 s06 hyperlink 23
## 47 s16 s17 mention 21
## 48 s17 s04 hyperlink 4
as_data_frame(net, what="vertices")
## name media media.type type.label audience.size
## s01 s01 NY Times 1 Newspaper 20
## s02 s02 Washington Post 1 Newspaper 25
## s03 s03 Wall Street Journal 1 Newspaper 30
## s04 s04 USA Today 1 Newspaper 32
## s05 s05 LA Times 1 Newspaper 20
## s06 s06 New York Post 1 Newspaper 50
## s07 s07 CNN 2 TV 56
## s08 s08 MSNBC 2 TV 34
## s09 s09 FOX News 2 TV 60
## s10 s10 ABC 2 TV 23
## s11 s11 BBC 2 TV 34
## s12 s12 Yahoo News 3 Online 33
## s13 s13 Google News 3 Online 23
## s14 s14 Reuters.com 3 Online 12
## s15 s15 NYTimes.com 3 Online 24
## s16 s16 WashingtonPost.com 3 Online 28
## s17 s17 AOL.com 3 Online 33
Check out ?igraph.plotting for more information on igraph plotting parameters
Density = the proportion of present edges from all possible edges in the network
# Density
edge_density(net, loops=F)
## [1] 0.1764706
ecount(net)/(vcount(net)*(vcount(net)-1)) #for a directed network
## [1] 0.1764706
Reciprocity = the proportion of reciprocated ties (for a directed network)
# Reciprocity
reciprocity(net)
## [1] 0.4166667
dyad_census(net) # Mutual, asymmetric, and null node pairs
## $mut
## [1] 10
##
## $asym
## [1] 28
##
## $null
## [1] 98
2*dyad_census(net)$mut/ecount(net) # Calculating reciprocity
## [1] 0.4166667
Transitivity = - global - ratio of triangles (direction disregarded) to connected triples - local - ratio of triangles to connected triples each vertex is part of
# Transitivity
transitivity(net, type="global") # net is treated as an undirected network
## [1] 0.372549
transitivity(as.undirected(net, mode="collapse")) # same as above
## [1] 0.372549
transitivity(net, type="local")
## [1] 0.2142857 0.4000000 0.1153846 0.1944444 0.5000000 0.2666667 0.2000000
## [8] 0.1000000 0.3333333 0.3000000 0.3333333 0.2000000 0.1666667 0.1666667
## [15] 0.3000000 0.3333333 0.2000000
triad_census(net) # for directed networks
## [1] 244 241 80 13 11 27 15 22 4 1 8 4 4 3 3 0
# Triad types (per Davis & Leinhardt):
#
# 003 A, B, C, empty triad.
# 012 A->B, C
# 102 A<->B, C
# 021D A<-B->C
# 021U A->B<-C
# 021C A->B->C
# 111D A<->B<-C
# 111U A<->B->C
# 030T A->B<-C, A->C
# 030C A<-B<-C, A->C.
# 201 A<->B<->C.
# 120D A<-B->C, A<->C.
# 120U A->B<-C, A<->C.
# 120C A->B->C, A<->C.
# 210 A->B<->C, A<->C.
# 300 A<->B<->C, A<->C, completely connected.
Diameter = A network diameter is the longest geodesic distance (length of the shortest path between two nodes) in the network. In igraph, diameter() returns the distance, while get_diameter() returns the nodes along the first found path of that distance.
# Diameter (longest geodesic distance)
# Note that edge weights are used by default, unless set to NA.
diameter(net, directed=F, weights=NA)
## [1] 4
diameter(net, directed=F)
## [1] 28
diam <- get_diameter(net, directed=T)
diam
## + 7/17 vertices, named, from 961df52:
## [1] s12 s06 s17 s04 s03 s08 s07
# Note: vertex sequences asked to behave as a vector produce numeric index of nodes
class(diam)
## [1] "igraph.vs"
as.vector(diam)
## [1] 12 6 17 4 3 8 7
# Color nodes along the diameter:
vcol <- rep("gray40", vcount(net))
vcol[diam] <- "gold"
ecol <- rep("gray80", ecount(net))
ecol[E(net, path=diam)] <- "orange"
# E(net, path=diam) finds edges along a path, here 'diam'
plot(net, vertex.color=vcol, edge.color=ecol, edge.arrow.mode=0)
Node degrees = The function degree() has a mode of in for in-degree, out for out-degree, and all or total for total degree.
# Node degrees
# 'degree' has a mode of 'in' for in-degree, 'out' for out-degree,
# and 'all' or 'total' for total degree.
deg <- degree(net, mode="all")
plot(net, vertex.size=deg*3)
hist(deg, breaks=1:vcount(net)-1, main="Histogram of node degree")
# Degree distribution
deg.dist <- degree_distribution(net, cumulative=T, mode="all")
plot( x=0:max(deg), y=1-deg.dist, pch=19, cex=1.2, col="orange",
xlab="Degree", ylab="Cumulative Frequency")
Centrality functions (vertex level) and centralization functions (graph level). The centralization functions return res - vertex centrality, centralization, and theoretical_max - maximum centralization score for a graph of that size. The centrality function can run on a subset of nodes (set with the vids parameter). This is helpful for large graphs where calculating all centralities may be a resource-intensive and time-consuming task.
# Centrality & centralization
# Centrality functions (vertex level) and centralization functions (graph level).
# The centralization functions return "res" - vertex centrality, "centralization",
# and "theoretical_max" - maximum centralization score for a graph of that size.
# The centrality functions can run on a subset of nodes (set with the "vids" parameter)
# Degree (number of ties)
degree(net, mode="in")
## s01 s02 s03 s04 s05 s06 s07 s08 s09 s10 s11 s12 s13 s14 s15 s16 s17
## 4 2 6 4 1 4 1 2 3 4 3 3 2 2 2 1 4
centr_degree(net, mode="in", normalized=T)
## $res
## [1] 4 2 6 4 1 4 1 2 3 4 3 3 2 2 2 1 4
##
## $centralization
## [1] 0.1985294
##
## $theoretical_max
## [1] 272
# Closeness (centrality based on distance to others in the graph)
# Inverse of the node's average geodesic distance to others in the network
closeness(net, mode="all", weights=NA)
## s01 s02 s03 s04 s05 s06
## 0.03333333 0.03030303 0.04166667 0.03846154 0.03225806 0.03125000
## s07 s08 s09 s10 s11 s12
## 0.03030303 0.02857143 0.02564103 0.02941176 0.03225806 0.03571429
## s13 s14 s15 s16 s17
## 0.02702703 0.02941176 0.03030303 0.02222222 0.02857143
centr_clo(net, mode="all", normalized=T)
## $res
## [1] 0.5333333 0.4848485 0.6666667 0.6153846 0.5161290 0.5000000 0.4848485
## [8] 0.4571429 0.4102564 0.4705882 0.5161290 0.5714286 0.4324324 0.4705882
## [15] 0.4848485 0.3555556 0.4571429
##
## $centralization
## [1] 0.3753596
##
## $theoretical_max
## [1] 7.741935
# Eigenvector (centrality proportional to the sum of connection centralities)
# Values of the first eigenvector of the graph adjacency matrix
eigen_centrality(net, directed=T, weights=NA)
## $vector
## s01 s02 s03 s04 s05 s06 s07
## 0.6638179 0.3314674 1.0000000 0.9133129 0.3326443 0.7468249 0.1244195
## s08 s09 s10 s11 s12 s13 s14
## 0.3740317 0.3453324 0.5991652 0.7334202 0.7519086 0.3470857 0.2915055
## s15 s16 s17
## 0.3314674 0.2484270 0.7503292
##
## $value
## [1] 3.006215
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 17
##
## $options$which
## [1] "LR"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 7
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 31
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 18
centr_eigen(net, directed=T, normalized=T)
## $vector
## [1] 0.6638179 0.3314674 1.0000000 0.9133129 0.3326443 0.7468249 0.1244195
## [8] 0.3740317 0.3453324 0.5991652 0.7334202 0.7519086 0.3470857 0.2915055
## [15] 0.3314674 0.2484270 0.7503292
##
## $value
## [1] 3.006215
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 17
##
## $options$which
## [1] "LR"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 7
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 31
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 18
##
##
## $centralization
## [1] 0.5071775
##
## $theoretical_max
## [1] 16
# Betweenness (centrality based on a broker position connecting others)
# (Number of geodesics that pass through the node or the edge)
betweenness(net, directed=T, weights=NA)
## s01 s02 s03 s04 s05 s06
## 24.0000000 5.8333333 127.0000000 93.5000000 16.5000000 20.3333333
## s07 s08 s09 s10 s11 s12
## 1.8333333 19.5000000 0.8333333 15.0000000 0.0000000 33.5000000
## s13 s14 s15 s16 s17
## 20.0000000 4.0000000 5.6666667 0.0000000 58.5000000
edge_betweenness(net, directed=T, weights=NA)
## [1] 10.833333 11.333333 8.333333 9.500000 4.000000 12.500000 3.000000
## [8] 2.333333 24.000000 16.000000 31.500000 32.500000 9.500000 6.500000
## [15] 23.000000 65.333333 11.000000 6.500000 18.000000 8.666667 5.333333
## [22] 10.000000 6.000000 11.166667 15.000000 21.333333 10.000000 2.000000
## [29] 1.333333 4.500000 11.833333 16.833333 6.833333 16.833333 31.000000
## [36] 17.000000 18.000000 14.500000 7.500000 28.500000 3.000000 17.000000
## [43] 5.666667 9.666667 6.333333 1.000000 15.000000 74.500000
centr_betw(net, directed=T, normalized=T)
## $res
## [1] 24.0000000 5.8333333 127.0000000 93.5000000 16.5000000
## [6] 20.3333333 1.8333333 19.5000000 0.8333333 15.0000000
## [11] 0.0000000 33.5000000 20.0000000 4.0000000 5.6666667
## [16] 0.0000000 58.5000000
##
## $centralization
## [1] 0.4460938
##
## $theoretical_max
## [1] 3840
First load/install the required packages if neccessary:
#install.packages c ("igraph", "reshape2", "ggplot2", "raster")