Emails are used for communication and therefore form a network. Is it possible to discover structure in our emails? This post answers this question.
I wanted to see what my email connection structure looks like: who are the central people who connect teams, who do I interact with most, and which communities can be discovered?
I used Python, Pandas, and NetworkX as the main analytics tools, a Jupyter notebook with matplotlib or D3.js for presentation, and a Python implementation of BigCLAM for community detection. For privacy reasons, all email addresses and names were replaced by User1, User2, etc.
There are a few steps involved in the email discovery, each described in more detail below with code examples:
1. Export the emails from Outlook using a Visual Basic program.
2. Pre-process the export into node attributes, per-pair email counts, and domain groups.
3. Filter the top connections.
4. Build and visualize the email connection graph.
5. Analyze the graph: central nodes, hubs and authorities, my top contacts, and communities.
The graph below is the result of applying steps 1-5. Each node represents a mail address and each edge represents communication between one mailer and another. The graph shows the top 500 connections - the mailers with the most interactions between them. The node color is based on the mailer's domain, the node size on its number of connections, and the link width on the number of mails between the two connected mailers. Some mailers are used more for connecting or coordinating between teams. These relations are shown in the graph by clicking the buttons, which mark the relevant nodes.
The actual node names are masked for privacy.
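As a side note, here is a minimal sketch of how a NetworkX graph like the one built later in this post could be exported to the node-link JSON that a D3.js force layout consumes. The function name, the labels/groups dictionaries, and the graph.json file name are illustrative assumptions only; this is not the exact export used for the interactive figure above.
import json

def export_for_d3(G, labels, groups, path='graph.json'):
    # labels / groups: dicts mapping a node id to its display name / group id (illustrative)
    index = {n: i for i, n in enumerate(G.nodes())}
    nodes = [{'name': labels.get(n, str(n)), 'group': groups.get(n, 0)} for n in G.nodes()]
    links = [{'source': index[u], 'target': index[v], 'value': d['count']}
             for u, v, d in G.edges(data=True)]
    with open(path, 'w') as f:
        json.dump({'nodes': nodes, 'links': links}, f)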
I'm using Outlook/Exchange as my mailing system. To extract the mail info, I used a Visual Basic program. The program exports the sender and recipients of each mail, along with the mail subject and time, into a CSV file.
Visual Basic was used because Outlook's standard export does not include the received time.
The Visual Basic program below reads all emails in a folder. For each mail it outputs the sender address, sender name, the list of recipient addresses and names, the received time, and the email subject. I saved the output in the file "nir_mails_merged.csv".
Sub ExportToExcel()
    ' Need to enable Tools --> References (in the VBE). Check Microsoft Excel XXX Object Library
    Dim appExcel As Excel.Application
    Dim wkb As Excel.Workbook
    Dim wks As Excel.Worksheet
    Dim rng As Excel.Range
    Dim strSheet
    Dim strPath As String
    Dim intRowCounter As Integer
    Dim msg As Outlook.MailItem
    Dim nms As Outlook.NameSpace
    Dim fld As Outlook.MAPIFolder
    Dim itm As Object
    Dim Header As Variant

    'Select export folder
    Set nms = Application.GetNamespace("MAPI")
    Set fld = nms.PickFolder

    'Handle potential errors with Select Folder dialog box.
    If fld Is Nothing Then
        MsgBox "There are no mail messages to export", vbOKOnly, "Error"
        Exit Sub
    ElseIf fld.DefaultItemType <> olMailItem Then
        MsgBox "There are no mail messages to export", vbOKOnly, "Error"
        Exit Sub
    ElseIf fld.Items.Count = 0 Then
        MsgBox "There are no mail messages to export", vbOKOnly, "Error"
        Exit Sub
    End If

    'Open and activate Excel workbook.
    Set appExcel = CreateObject("Excel.Application")
    strSheet = appExcel.GetSaveAsFilename(Title:="Choose Output File", FileFilter:="Excel Files *.xls* (*.xls*),")
    If strSheet = False Then
        MsgBox "No file selected.", vbExclamation, "Sorry!"
        Exit Sub
    Else
        appExcel.Workbooks.Open (strSheet)
        Set wkb = appExcel.ActiveWorkbook
        Set wks = wkb.Sheets(1)
        wks.Activate
        appExcel.Application.Visible = True
    End If

    'Add header
    Header = Array("From: (Address)", "From: (Name)", "To: (Address)", "To: (Name)", "CC: (Address)", "CC: (Name)", "BCC: (Address)", "BCC: (Name)", "Received", "Subject")
    wks.Range("A1:J1") = Header

    'Copy field items in mail folder.
    intRowCounter = 1
    For Each itm In fld.Items
        If TypeOf itm Is MailItem Then
            Set msg = itm
            If msg.MessageClass <> "IPM.Note.SMIME.MultipartSigned" Then
                intRowCounter = intRowCounter + 1
                Set rng = wks.Cells(intRowCounter, 1)
                rng.Value = msg.SenderEmailAddress
                Set rng = wks.Cells(intRowCounter, 2)
                rng.Value = msg.SenderName
                Set rng = wks.Cells(intRowCounter, 3)
                rng.Value = getRecepiantAddress(msg, olTo)
                Set rng = wks.Cells(intRowCounter, 4)
                rng.Value = msg.To
                Set rng = wks.Cells(intRowCounter, 5)
                rng.Value = msg.CC
                Set rng = wks.Cells(intRowCounter, 6)
                rng.Value = getRecepiantAddress(msg, olCC)
                Set rng = wks.Cells(intRowCounter, 7)
                rng.Value = msg.BCC
                Set rng = wks.Cells(intRowCounter, 8)
                rng.Value = getRecepiantAddress(msg, olBCC)
                Set rng = wks.Cells(intRowCounter, 9)
                rng.Value = msg.ReceivedTime
                Set rng = wks.Cells(intRowCounter, 10)
                rng.Value = msg.Subject
            End If
        End If
    Next itm
End Sub

Function getRecepiantAddress(msg As Outlook.MailItem, rtype As OlMailRecipientType) As String
    Dim recip As Recipient
    Dim allRecips As String
    allRecips = ""
    For Each recip In msg.Recipients
        If recip.Type = rtype Then
            If (Len(allRecips) > 0) Then allRecips = allRecips & "; "
            allRecips = allRecips & recip.Address
        End If
    Next
    getRecepiantAddress = allRecips
End Function
After running the Visual Basic program, I run a pre-processing script that creates the list of nodes with their addresses and IDs (attributes.csv), the number of emails sent from every sender to every recipient (email_counts.csv), and a mapping between group IDs and group names/domains (groups.csv).
The scripts are a slightly modified version of: http://flowingdata.com/2014/05/07/downloading-your-email-metadata/
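For orientation, the three output files created by the pre-processing script have the following layout (the data rows below are illustrative, not real values):
# email_counts.csv - one row per (sender, recipient) pair
"sep=,"
sender,recipient,count
12,1,143
# attributes.csv - one row per mail address (node)
"id","name","group_id","email"
12,"USER TWELVE",3,"USER12@EXAMPLE.COM"
# groups.csv - one row per domain/group
"sep=,"
group_id,group_name
3,"EXAMPLE.COM"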
import csv
import sys
import json
from datetime import datetime
import networkx as nx
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcl
%matplotlib inline
File "nir_mails_merged.csv" is the output of step 1, setting.json is given below
{ "no_special_characters":false, "group_by_address_domain":true, "include_email_in_attributes":true, "default_group_name":"DEFAULT", "address_delimiter": ";", "split_address_by_text": "CN=RECIPIENTS/CN=", "name_delimiter": ";", "output_delimiter": ",", "csv_keys": { "from_email": "From: (Address)", "to_email": "To: (Address)", "cc_email": "CC: (Address)", "bcc_email": "BCC: (Address)", "from_name":"From: (Name)", "to_name":"To: (Name)", "cc_name":"CC: (Name)", "bcc_name":"BCC: (Name)", "date":"Received", "subject":"Subject" }, "address_string_to_ignore":"/O=CISCO SYSTEMS" }
# Run parameters
csv_input_file = "nir_mails_merged.csv"
# Prompts for what to include
include_cc = 'N'
include_bcc = 'N'
trigger_max_recipients = 'N'
max_recipients = 0
to_date = "10/8/2017 23:59" # mails to this date are included
date_range_days = 365 # Include one year of mails
# Array of which types of emails to include
recipient_scope = ['to']
if include_cc == 'Y':
recipient_scope.append('cc')
if include_bcc == 'Y':
recipient_scope.append('bcc')
# Settings read into variables
settings_file = open('settings.json', 'r')
settings = settings_file.read()
settings = json.loads(settings)
csv_keys = settings['csv_keys']
no_special_characters = settings['no_special_characters']
output_delimiter = settings['output_delimiter'].encode('utf_8')
name_delimiter = settings['name_delimiter'].encode('utf_8')
address_delimiter = settings['address_delimiter'].encode('utf_8')
split_address_by_text = settings['split_address_by_text'].encode('utf_8')
group_by_address_domain = settings['group_by_address_domain']
default_group_name = settings['default_group_name'].encode('utf_8')
address_string_to_ignore = settings['address_string_to_ignore'].encode('utf_8')
include_email_in_attributes = settings['include_email_in_attributes']
# Helper functions
def strip_non_ascii(string):
# Returns the string without non ASCII characters
stripped = (c for c in string if 0 < ord(c) < 127)
return ''.join(stripped)
def process_name(string):
string = string.strip().replace("'","")
if string is None or string == "":
string = "BLANK"
if no_special_characters:
string = strip_non_ascii(string)
return string
def date_diff_days(from_date, to_date):
fd = datetime.strptime(from_date, "%m/%d/%Y %H:%M")
td = datetime.strptime(to_date, "%m/%d/%Y %H:%M")
delta = td-fd
return delta.days
# Open input file
input_file = csv.DictReader(open(csv_input_file))
# Variables to store node and tie details
nodes = {}
nodes_with_ids = {}
ties = {}
# Preparing text strings
group_text = "\"sep=" + output_delimiter + "\"" + "\n"
group_text += "group_id" + output_delimiter + "group_name\n"
ties_text = "\"sep=" + output_delimiter + "\"" + "\n"
ties_text += "sender" + output_delimiter + "recipient" + output_delimiter + "count\n"
attribute_text = "" # \"sep=" + output_delimiter + "\"" + "\n"
attribute_text += "\"id\"" + output_delimiter + "\"name\"" + output_delimiter + "\"group_id\""
if include_email_in_attributes:
attribute_text += output_delimiter + "\"email\""
attribute_text += "\n"
# Read each line in the CSV file
for row in input_file:
msg_days = date_diff_days(row[csv_keys["date"]], to_date)
if msg_days>0 and msg_days<date_range_days:
# Iterate through each scope that has been included (to, cc, bcc)
for scope in recipient_scope:
# Populating nodes with recipients
recipient_addresses = []
sender_addresses = []
if row[csv_keys[scope + '_email']] is not None and row[csv_keys[scope + '_name']] is not None:
temp_recipient_addresses = row[csv_keys[scope + '_email']].upper().replace(address_delimiter, split_address_by_text).split(split_address_by_text)
recipient_names = row[csv_keys[scope + '_name']].upper().split(name_delimiter)
for temp_recipient_address in temp_recipient_addresses:
if address_string_to_ignore not in temp_recipient_address:
recipient_addresses.append(temp_recipient_address)
if len(recipient_addresses) > len(recipient_names):
recipient_addresses.pop(0)
elif len(recipient_addresses) != len(recipient_names):
print 'PROBLEM: ' + str(len(recipient_addresses)) + " ADDRESSES VS " + str(len(recipient_names)) + " NAMES"
# Check against max count and skip if more included
if trigger_max_recipients == 'N' or (trigger_max_recipients == 'Y' and len(recipient_names) <= int(max_recipients)):
for idx, recipient_address in enumerate(recipient_addresses):
nodes[recipient_address.split(address_delimiter)[0]] = recipient_names[idx]
# Populating nodes with senders
if row[csv_keys['from_email']] is not None and row[csv_keys['from_name']] is not None:
temp_sender_addresses = row[csv_keys['from_email']].upper().replace(address_delimiter, split_address_by_text).split(split_address_by_text)
sender_names = row[csv_keys['from_name']].upper().split(";")
for temp_sender_address in temp_sender_addresses:
if address_string_to_ignore not in temp_sender_address:
sender_addresses.append(temp_sender_address)
# Populating ties
for idx, sender_address in enumerate(sender_addresses):
nodes[sender_address.upper().split(address_delimiter)[0]] = sender_names[idx]
if sender_address not in ties:
ties[sender_address] = {}
for recipient_address in recipient_addresses:
recipient_address = recipient_address.upper().split(";")[0]
if recipient_address not in ties[sender_address]:
ties[sender_address][recipient_address] = 0
ties[sender_address][recipient_address] += 1
# Variables for setting ids and capture group details
node_id = 1
groups = {}
group_id = 1
nodes_listed_by_id = {}
# Structure node information and write group text
for node_email, node_name in nodes.iteritems():
nodes_with_ids[node_email] = {}
nodes_with_ids[node_email]['id'] = node_id
nodes_with_ids[node_email]['name'] = node_name
nodes_with_ids[node_email]['email'] = node_email
nodes_listed_by_id[node_id] = node_email
# Capturing group name if relevant - TODO: also if not set
if group_by_address_domain:
split_email = node_email.split("@")
if len(split_email) > 1:
group_name = split_email[1].strip()
else:
group_name = default_group_name
if group_name not in groups.keys():
groups[group_name] = group_id
group_id += 1
else:
# Note: will be set to the same over an over again
group_name = default_group_name
groups[group_name] = group_id
nodes_with_ids[node_email]['group_name'] = group_name
nodes_with_ids[node_email]['group_id'] = groups[group_name]
node_id += 1
# Write group text
for group_name, group_id in groups.iteritems():
group_text += str(group_id) + output_delimiter + "\"" + group_name + "\"\n"
# Write tie text
for sender, recipients in ties.iteritems():
for recipient, count in recipients.iteritems():
ties_text += str(nodes_with_ids[sender]['id']) + output_delimiter + str(nodes_with_ids[recipient]['id']) + output_delimiter + str(count) + "\n"
# Write attribute text
for node_id, node_email in nodes_listed_by_id.iteritems():
node_details = nodes_with_ids[node_email]
attribute_text += str(node_details['id']) + output_delimiter + "\"" + process_name(node_details['name']) + "\"" + output_delimiter
if group_by_address_domain:
attribute_text += str(node_details['group_id'])
else:
attribute_text += str(1)
if include_email_in_attributes:
attribute_text += output_delimiter + "\"" + process_name(node_details['email']) + "\""
attribute_text += "\n"
# Write output files
attribute_file = open('attributes.csv', 'w')
attribute_file.write(attribute_text)
attribute_file.close()
groups_file = open('groups.csv', 'w')
groups_file.write(group_text)
groups_file.close()
email_counts_file = open('email_counts.csv', 'w')
email_counts_file.write(ties_text)
email_counts_file.close()
# Check that names are in ASCII form; if not, they cause an error in the graph visualization in networkx
nodeNames_df = pd.read_csv('attributes.csv', header=None, skiprows=1, names=['ID', 'Name', 'GroupID', 'Email'])
nodeNames_df.set_index(['ID'], inplace=True)
# Check ascii encoding
def check_non_ascii_labels(node_labels):
non_ascii = False
for n in node_labels.items():
try:
txt = n[1].decode('ascii')
except:
non_ascii = True
n_pos = []
for i,c in enumerate(n[1]):
oc = ord(c)
if oc >= 128: n_pos.append(i)
print n[1], " :Non ascii Pos:", n_pos
if non_ascii == False: print "Non-ascii not found - attribute list is good to go!"
node_labels =dict(nodeNames_df['Name'])
check_non_ascii_labels(node_labels)
This step reads the email_counts.csv and attributes.csv files created in step 2 and filters the top connections.
Here's how these tables look after reading them into Pandas:
email_dff = pd.read_csv('email_counts.csv', header=None, skiprows=2, names=['n1', 'n2', 'count'])
email_dff.head()
nodeNames_df = pd.read_csv('attributes.csv', header=None, skiprows=1, names=['ID', 'Name', 'GroupID', 'Email'])
nodeNames_df.set_index(['ID'], inplace=True)
# Replace names that include the ? char with the email address
hnodes = nodeNames_df['Name'].str.contains("\?")
nodeNames_df.loc[hnodes,'Name'] = nodeNames_df.loc[hnodes,'Email']
# Mask user names and keep my node as "Me"
nodeNames_df['MaskedName'] = ['User'+str(i) for i in nodeNames_df.index]
nodeNames_df.loc[nodeNames_df['Email'].str.contains("NIRBD"), 'MaskedName'] = 'Me'
nodeNames_df[['GroupID', 'MaskedName']].head()
# Filter X top count edges
use_top_vals = True
num_top_vals = 500
doMaskNames = True
email_dff.sort_values(['count'], axis=0, ascending=False, inplace=True)
if use_top_vals:
email_df = email_dff[:num_top_vals]
This step creates and visualizes the email connection graph from the selected connections.
To visualize the graph, I used a custom layout where nodes are placed according to the shortest path lengths between them.
The edge width represents the email count between two nodes.
The node color represents the email group/domain.
G = nx.from_pandas_dataframe(email_df, 'n1', 'n2', edge_attr='count', create_using=nx.DiGraph())
# Filter node labels for only those who are used in the top edges
nodeNames_df = nodeNames_df.ix[list(G.nodes()),:]
def layout_dist(G):
    # This controls the distance between nodes
    # It assigns the shortest path distance + 10, or 20 when no path exists
ret = dict()
for row, nr in enumerate(G):
rdist = dict()
for col, nc in enumerate(G):
try:
d = nx.shortest_path_length(G,source=nr,target=nc)
except:
d = 10
rdist[nc] = d + 10
ret[nr] = rdist
return ret
pos = nx.kamada_kawai_layout(G, weight='count', dist=layout_dist(G))
node_color = np.array(nodeNames_df['GroupID'])
edge_width = np.array([G[u][v]['count'] for u,v in G.edges()])
edge_width = 1.0 + 3.0 * edge_width / max(edge_width)
if doMaskNames:
node_labels = dict(nodeNames_df['MaskedName'])
else:
node_labels = dict(nodeNames_df['Name'])
# Function to draw the graph using Python and matplotlib.
# It's not used here; the interactive graph at the bottom is based on D3.js
def draw_graph(G, pos, node_color, edge_width, node_labels, node_size=300, label_fsize=8):
fig = plt.figure(figsize=(20,18))
nodes = nx.draw_networkx_nodes(G, pos, node_shape='o', alpha=0.4, node_color=node_color,
node_size=node_size, cmap=plt.cm.terrain)
nx.draw_networkx_labels(G, pos, labels=dict([(n, str(n)) for n in G.nodes()]), font_size=10, font_color='black')
lpos = dict(map(lambda v:(v[0], np.array([v[1][0], v[1][1]+0.02])), pos.items()))
nx.draw_networkx_labels(G, lpos, labels=node_labels , font_size=label_fsize, font_color='black')
nx.draw_networkx_edges(G, pos, alpha=0.4, edge_color='.4', width=edge_width, arrows=False, cmap=plt.cm.Pastel1)
# Reset graph edge color and width
node_edgecolor = ['black'] * G.number_of_nodes()
node_linewidth = [1] * G.number_of_nodes()
nodes.set_edgecolor(node_edgecolor)
nodes.set_linewidth(node_linewidth)
plt.axis('off')
plt.tight_layout();
return nodes, fig
# nodes is the PathCollection returned when drawing the nodes; node_list is the list of node labels to mark
def mark_nodes(nodes, node_list, color, reset):
node_edgecolor = nodes.get_edgecolor()
node_linewidth = nodes.get_linewidth()
for i, n in enumerate(G):
if n in node_list:
node_edgecolor[i]=mcl.colorConverter.to_rgba(color)
node_linewidth[i] = 5
else:
if reset:
node_edgecolor[i] = mcl.colorConverter.to_rgba('black')
node_linewidth[i] = 1
nodes.set_edgecolor(node_edgecolor)
nodes.set_linewidth(node_linewidth)
#nodes, fig = draw_graph(G, pos, node_color, edge_width, node_labels, node_size=400, label_fsize=12)
There are different measures of node importance.
Here, nodes are marked based on betweenness centrality: nodes that appear more often on shortest paths (i.e., that connect other nodes).
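As a reminder, the standard definition of betweenness centrality (the code below uses NetworkX's normalized variant with endpoints included) is
$$c_B(v) = \sum_{s \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.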
# Importance by betweenness: nodes that appear more often on shortest paths (connecting other nodes) are considered more important
#nodes, fig = draw_graph(G, pos, node_color, edge_width, node_labels, node_size=400, label_fsize=12)
ib = sorted(nx.betweenness_centrality(G, normalized=True, endpoints=True).items(), key=lambda x: x[1], reverse=True)
#mark_nodes(nodes, [n[0] for n in ib[0:10]], 'red', True)
Another measure is Hubs and Authorities.
Authorities are estimated based on incoming links, hubs based on outgoing links.
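Under the hood, HITS computes these two mutually reinforcing scores by repeatedly applying (and normalizing) the updates below, where $A$ is the graph's adjacency matrix:
$$a \leftarrow A^{T} h, \qquad h \leftarrow A\,a$$
A node gets a high authority score when it is pointed to by good hubs, and a high hub score when it points to good authorities.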
#nodes, fig = draw_graph(G, pos, node_color, edge_width, node_labels, node_size=400, label_fsize=12)
(hubs, auth) = nx.hits(G)
ih = sorted(hubs.items(), key=lambda x: x[1], reverse=True)
ia = sorted(auth.items(), key=lambda x: x[1], reverse=True)
#mark_nodes(nodes, [n[0] for n in ih[0:10]], 'blue', True)
#mark_nodes(nodes, [n[0] for n in ia[0:10]], 'red', False)
The following shows the people I interact with most.
People in red send the most mails to me or receive the most mails from me.
#nodes, fig = draw_graph(G, pos, node_color, edge_width, node_labels, node_size=400, label_fsize=12)
# My node
MyNodeNum = nodeNames_df[nodeNames_df['Email'].str.contains("NIRBD")].index[0]
# The count DataFrame is already sorted, so take the top senders to me and the top recipients from me
MostSendingToMe = list(email_df[email_df['n2']==MyNodeNum][:6]['n1'])
MostReceivingFromMe = list(email_df[email_df['n1']==MyNodeNum][:6]['n2'])
MostInteractingWithMe = list(MostReceivingFromMe)
MostInteractingWithMe.extend(MostSendingToMe)
MostInteractingWithMe = set(MostInteractingWithMe)
#mark_nodes(nodes, MostInteractingWithMe, 'red', True)
The following marks receive-only mailers; these are mostly mailing lists. Send-only mailers are mostly ads or distribution lists.
#nodes, fig = draw_graph(G, pos, node_color, edge_width, node_labels, node_size=400, label_fsize=12)
# Find nodes that only receive mails
only_receive = email_dff[~email_dff['n2'].isin(email_dff['n1'])].iloc[0:30]['n2'].unique()
#mark_nodes(nodes, only_receive, 'red', True)
# Find nodes that only send mails
only_send = email_dff[~email_dff['n1'].isin(email_dff['n2'])].iloc[0:]['n1'].unique()
#mark_nodes(nodes, only_send, 'blue', False)
The following is an implementation of the BigCLAM algorithm for finding communities. Each node can belong to multiple communities; the most probable community is displayed as a different color.
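For context, BigCLAM learns a nonnegative node-community affiliation matrix $F$ (one row per node, one column per community) and models the probability of an edge between nodes $u$ and $v$ as growing with the overlap of their memberships:
$$P(u,v) = 1 - \exp(-F_u \cdot F_v)$$
The code below maximizes the corresponding log-likelihood (here the edge terms are additionally weighted by the mail counts in the adjacency matrix A):
$$l(F) = \sum_{(u,v) \in E} \log\left(1 - e^{-F_u \cdot F_v}\right) - \sum_{(u,v) \notin E} F_u \cdot F_v$$
using gradient ascent on one row of $F$ at a time,
$$\nabla_{F_u} l = \sum_{v \in \mathcal{N}(u)} F_v \, \frac{e^{-F_u \cdot F_v}}{1 - e^{-F_u \cdot F_v}} \; - \sum_{v \notin \mathcal{N}(u)} F_v$$
where the cached column sums (sumF) let the second term be computed without iterating over non-neighbours.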
A = nx.adjacency_matrix(G, weight='count').todense()
def log_likelihood(F, A, IA):
# IA is a matrix with one where A = zero
As = F.dot(F.T)
part1 = A*np.log(1.-np.exp(-1.*As))
sum_edges = np.sum(part1)
part2 = IA*As
sum_nedges = np.sum(part2)
log_likeli = sum_edges - sum_nedges
return log_likeli
def gradient(F, A, sumF, i):
N, C = F.shape
neighbours = np.where(A[i])
sum_neigh = np.zeros((C,))
sum_nneigh = sumF.copy()
for nb in neighbours[1]:
dotproduct = F[nb].dot(F[i])
sum_neigh += A[nb,i]*F[nb]*(np.divide(np.exp(-1.*dotproduct),1.-np.exp(-1.*dotproduct)))
sum_nneigh -= F[nb]
grad = sum_neigh - sum_nneigh
return grad
def train(A, C, iterations = 100):
# initialize an F
N = A.shape[0]
np.random.seed(0)
F = np.random.rand(N,C)
rows, cols = np.where(A==0)
IA = np.zeros(np.shape(A), dtype=np.int)
IA[rows,cols] = 1
for n in range(iterations):
# Calculate sumF in advance so the gradient done on neighbours only
sumF = F.sum(axis=0)
for person in range(N):
grad = gradient(F, A, sumF, person)
F[person] += 0.005*grad
F[person] = np.maximum(0.001, F[person]) # Handle negatives
ll = log_likelihood(F, A, IA)
return F
F = train(A, 6, iterations=10001)
Display Communities in the Graph
def find_nodes_for_community(node_communities, c):
ret = []
# c = community number
for k,v in node_communities.items():
if c in v[1]:
ret.append(k)
return ret
# Find communities each node belongs to
node_communities = {}
for i,n in enumerate(G):
# For each graph node, include a tuple with the node index and list of communities
node_communities[n] = (i, list(np.where(F[i]>0.01)[0]))
node_colors = list(np.argmax(F,1))
node_w_community = []
max_col = np.shape(F)[1]
# Use an extra color index for nodes without a community, and build the list of nodes that belong to a community
for k,v in node_communities.items():
    if len(v[1]) == 0: # Node doesn't belong to a community; color it with the extra index
        node_colors[v[0]] = max_col
    elif len(v[1])>1 or (v[1][0] != 0): # don't include community 0
        node_w_community.append(k)
# Draw the subgraph with associated communities
nodeset = find_nodes_for_community(node_communities, 3)
# show only nodes with community
GC = G.subgraph(node_w_community)
cpos = nx.spring_layout(GC)
cnode_color = [node_colors[node_communities[n][0]] for n in GC.nodes()]
cedge_width = np.array([GC[u][v]['count'] for u,v in GC.edges()])
cedge_width = 1.0 + 3.0 * cedge_width / max(cedge_width)
if doMaskNames:
cnode_labels = dict(nodeNames_df['MaskedName'].loc[node_w_community])
else:
cnode_labels = dict(nodeNames_df['Name'].loc[node_w_community])
#cnodes, cfig = draw_graph(GC, cpos, cnode_color, cedge_width, cnode_labels, node_size=300, label_fsize=8)
It's possible to visualize our email structure and explore relationships in it.
I'm a system/software architect working in the Core Software Group at Cisco Systems, with many years of experience in software development, management, and architecture. I have worked on various network services products in the areas of security, media, application recognition, visibility, and control. I hold a Master of Science degree in electrical engineering and business management from Tel Aviv University. Machine learning and analytics are among my favorite topics. More can be found on my website: http://nir.bendvora.com/Work