- pm4py.org.discover_network_analysis(log: Union[DataFrame, EventLog, EventStream], out_column: str, in_column: str, node_column_source: str, node_column_target: str, edge_column: str, edge_reference: str = '_out', performance: bool = False, sorting_column: str = 'time:timestamp', timestamp_column: str = 'time:timestamp') → Dict[Tuple[str, str], Dict[str, Any]]
Performs a network analysis of the log based on the provided parameters.
The classical social network analysis methods are based on the order of the events inside a case. For example, the Handover of Work metric considers the directly-follows relationships between resources during the work on a case: an edge is added between two resources whenever such a relationship occurs.
Real-life scenarios may be more complicated. First, it is difficult to collect events inside the same case without running into convergence/divergence issues (see the first section of the OCEL part). Second, the type of relationship may also be important. Consider, for example, the relationship between two resources: it may be more efficient if the resources like the activity being executed than if they dislike it.
The network analysis introduced here generalizes some existing social network analysis metrics: it is independent of the choice of a case notion and allows building a multigraph instead of a simple graph.
We assume that events are linked by signals. An event emits a signal (stored as one attribute of the event) that is assumed to be received by other events (again, stored as an attribute of those events) that follow the first event in the log. In other words, we assume an OUT attribute of the emitting event whose value is identical to the IN attribute of the receiving events.
When we collect this information, we can build the network analysis graph:
- The source node of the relation is given by an aggregation over a node_column_source attribute.
- The target node of the relation is given by an aggregation over a node_column_target attribute.
- The type of edge is given by an aggregation over an edge_column attribute.
- The network analysis graph can be annotated with either frequency or performance information.
The output is a multigraph. Two events EV1 and EV2 of the log are joined (independently of the case notion) when EV1.OUT_COLUMN = EV2.IN_COLUMN. Then, an aggregation is applied to each pair of events (NODE_COLUMN) to obtain the nodes that are connected. The edges between these nodes are aggregated based on some property of the source event (EDGE_COLUMN).
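The linking step described above can be illustrated with plain pandas. The following is a minimal sketch and not pm4py's implementation: the helper sketch_network_analysis, the columns emitted_order and received_order, and the toy data are hypothetical, and the sketch assumes that every later event carrying a matching IN value counts as a receiver and that the edge label is taken from the source event.

```python
from collections import defaultdict

import pandas as pd


def sketch_network_analysis(df, out_column, in_column, node_column_source,
                            node_column_target, edge_column,
                            sorting_column="time:timestamp"):
    """Toy frequency-annotated multigraph (hypothetical helper, not pm4py code)."""
    # Sort the events and remember each event's position in the log.
    df = df.sort_values(sorting_column, kind="stable").reset_index(drop=True)
    df["pos"] = df.index

    # Emitting side: the signal, the source node and the edge label.
    src = df[["pos", out_column, node_column_source, edge_column]].copy()
    src.columns = ["pos_out", "signal", "source", "edge"]

    # Receiving side: the signal and the target node.
    tgt = df[["pos", in_column, node_column_target]].copy()
    tgt.columns = ["pos_in", "signal", "target"]

    # Join on EV1.OUT_COLUMN == EV2.IN_COLUMN, ignoring events without a signal.
    pairs = src.dropna(subset=["signal"]).merge(
        tgt.dropna(subset=["signal"]), on="signal")

    # Assumption: a receiver must come after the emitter in the sorted log.
    pairs = pairs[pairs["pos_in"] > pairs["pos_out"]]

    # Aggregate: one entry per (source, target) pair, one count per edge label.
    counts = pairs.groupby(["source", "target", "edge"]).size()
    graph = defaultdict(dict)
    for (source, target, edge), n in counts.items():
        graph[(source, target)][edge] = int(n)
    return dict(graph)


# Hypothetical toy log: Alice emits order signals that Bob later receives.
events = pd.DataFrame({
    "time:timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00",
        "2024-01-01 11:00", "2024-01-01 12:00"]),
    "concept:name": ["Create Order", "Pack Order", "Change Order", "Pack Order"],
    "org:resource": ["Alice", "Bob", "Alice", "Bob"],
    "emitted_order": ["o1", None, "o2", None],
    "received_order": [None, "o1", None, "o2"],
})

graph = sketch_network_analysis(
    events, "emitted_order", "received_order",
    "org:resource", "org:resource", "concept:name")
print(graph)
```

In this toy log the same pair of resources ends up connected by two differently labelled edges, which is what distinguishes the multigraph from a simple handover-of-work graph.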
log – event log / Pandas dataframe
out_column (str) – the source column of the link (default: the case identifier; events of the same case are linked)
in_column (str) – the target column of the link (default: the case identifier; events of the same case are linked)
node_column_source (str) – the attribute to be used for the node definition of the source event (default: the resource of the log, org:resource)
node_column_target (str) – the attribute to be used for the node definition of the target event (default: the resource of the log, org:resource)
edge_column (str) – the attribute to be used for the edge definition (default: the activity of the log, concept:name)
edge_reference (str) – decides whether the edge attribute is picked from the source or the target event. Values: _out => the source event; _in => the target event
performance (bool) – boolean value that enables the performance calculation on the edges of the network analysis
sorting_column (str) – the column that should be used to sort the log before performing the network analysis (default: time:timestamp)
timestamp_column (str) – the column that should be used as timestamp for the performance-related analysis (default: time:timestamp)
- Return type:
Dict[Tuple[str, str], Dict[str, Any]]
import pm4py

net_ana = pm4py.discover_network_analysis(dataframe, out_column='case:concept:name', in_column='case:concept:name', node_column_source='org:resource', node_column_target='org:resource', edge_column='concept:name')
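The returned dictionary can be consumed like any other nested mapping. The sketch below uses a hand-written, hypothetical result shaped like the documented return type, and assumes that each inner dictionary maps an edge label to a numeric annotation (a count in frequency mode); flatten_edges is an illustrative helper, not part of pm4py.

```python
# Hypothetical result, shaped like Dict[Tuple[str, str], Dict[str, Any]].
net_ana = {
    ("Alice", "Bob"): {"Create Order": 3, "Change Order": 1},
    ("Bob", "Carol"): {"Pack Order": 2},
}


def flatten_edges(network):
    """Turn the nested mapping into (source, target, edge, value) rows,
    sorted by decreasing annotation value."""
    rows = [
        (source, target, edge, value)
        for (source, target), edges in network.items()
        for edge, value in edges.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


for source, target, edge, value in flatten_edges(net_ana):
    print(f"{source} -> {target} [{edge}]: {value}")
```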