Decomposing Petri Nets for Process Mining – A Generic Approach

Ref:

van der Aalst, W.M.P. (2013). Decomposing Petri nets for process mining: a generic approach. Distributed and Parallel Databases, 31(4), 471-507. DOI: 10.1007/s10619-013-7127-5

Summary

This paper provides a generic approach: rather than decomposing one particular process mining algorithm, the goal is to decompose a large class of existing process discovery and conformance checking techniques.

The notation used below is consistent with the formalisms described in the previous article.

Need for decomposing conformance checking

Conformance checking investigates how well an event log $L \in \mathcal{B}(\mathbf{\Sigma}^+)$ and a process model $M \in \mathcal{M}$ fit together

Process model evaluation can be expensive when the event log contains many activities. In particular, when using genetic process mining, one needs to check the fitness of every individual model in every generation, and for each conformance check the whole log needs to be traversed.

Existing conformance checking approaches use state-space analysis or optimization over all possible alignments. These techniques do not scale linearly in the number of activities.

Decomposing the process model into model fragments enables splitting the conformance checking computation into several parallel chunks

Decomposing conformance checking

One of the metrics used for process model evaluation is fitness.

Fitness can be found by considering the fraction of perfectly fitting traces

$\cfrac{\mid \{ \sigma \in L \quad | \quad \sigma \in \mathcal{L}(M)\} \mid}{|L|} \qquad$ where $\mathcal{L}(M) \ $ is the language of the process model $M$
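
A minimal sketch of this computation, assuming the log is represented as a Python `Counter` over traces and membership in $\mathcal{L}(M)$ is decided by some predicate `in_language` (a hypothetical helper; how it is implemented depends on the model representation):

```python
from collections import Counter

def trace_fitness(log, in_language):
    """Fraction of perfectly fitting traces in a multiset log.

    log         -- Counter mapping each trace (a tuple of activities) to its frequency
    in_language -- predicate deciding whether a trace is in L(M)
    """
    total = sum(log.values())
    fitting = sum(freq for trace, freq in log.items() if in_language(trace))
    return fitting / total

# Toy model whose language is exactly {<a,b,c>, <a,c,b>}
language = {("a", "b", "c"), ("a", "c", "b")}
log = Counter({("a", "b", "c"): 40, ("a", "c", "b"): 35, ("a", "b"): 25})
print(trace_fitness(log, lambda t: t in language))  # 0.75
```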

Fitness checking can be parallelized if the following decomposed computation of fitness is possible,

$\cfrac{\mid \{ \sigma \in L \quad | \quad \sigma \in \mathcal{L}(M)\} \mid}{|L|}\quad = \quad \cfrac{\mid \{ \sigma \in L \quad | \quad \forall_{1 \leq i \leq n} \ \sigma \upharpoonright_{\mathbf{\Sigma^i}} \ \in \mathcal{L}(M^i)\} \mid}{|L|} $

Where,

$D = \{ M^1, M^2, ..., M^n\} \ $ is a decomposition of the process model $M$, such that $M = \bigcup_{1 \leq i \leq n}M^i$

$\mathbf{\Sigma^i} \ $ is the set of visible activities corresponding to subnet $i$

$\sigma \upharpoonright_{\mathbf{\Sigma^i}} \ $ is the projection of the log trace on $\mathbf{\Sigma^i}$
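
Continuing the sketch above, the decomposed variant only needs a projection helper and one language predicate per fragment (again an assumed interface, not the paper's implementation):

```python
def project(trace, alphabet):
    """Projection of a trace onto the visible activities Sigma^i of subnet i."""
    return tuple(a for a in trace if a in alphabet)

def decomposed_fitness(log, fragments):
    """Decomposed fitness: a trace counts as fitting iff every projection
    of it is in the language of the corresponding fragment.

    fragments -- list of (alphabet_i, in_language_i) pairs, one per subnet M^i
    """
    total = sum(log.values())
    fitting = sum(
        freq for trace, freq in log.items()
        if all(in_lang(project(trace, alpha)) for alpha, in_lang in fragments)
    )
    return fitting / total
```

Since each fragment only sees its projected (shorter) traces, the per-fragment checks are cheaper and can run in parallel.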

To achieve the above decomposition, the conditions for decomposition of the system net have to be derived such that the following theorem holds,

(Theorem 2) $L \ $ is perfectly fitting system net $M \ $ if and only if for all $ 1 \leq i \leq n \ : \ L\upharpoonright_{\mathbf{\Sigma^i}} \ $ is perfectly fitting $M^i \ $

That is, if the projected traces fit the individual model fragments, then the corresponding full traces fit the overall process model (and vice versa)

For Theorem 2 to hold, the decomposition should meet the following conditions (from the proof of Theorem 2)

1) Each place, invisible transition, and edge should reside in exactly one subnet

2) Only uniquely labelled (visible) transitions can be shared among subnets.

A decomposition $D$ that meets the above conditions is called a valid decomposition.
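
A minimal check of these conditions, assuming a simple set-based encoding of subnets (places, transitions, and edges as Python sets, plus a `labelling` dict for the visible transitions; this encoding is an assumption of this note):

```python
from collections import Counter
from itertools import combinations

def is_valid_decomposition(subnets, labelling):
    """Check the two validity conditions pairwise over the subnets.

    subnets   -- list of dicts with keys 'places', 'transitions', 'edges' (all sets)
    labelling -- dict mapping visible transitions to labels; invisible ones are absent
    """
    label_count = Counter(labelling.values())  # how many transitions carry each label
    for a, b in combinations(subnets, 2):
        # Condition 1: no place or edge may occur in two subnets
        if a['places'] & b['places'] or a['edges'] & b['edges']:
            return False
        for t in a['transitions'] & b['transitions']:
            if t not in labelling:             # shared invisible transition: invalid
                return False
            if label_count[labelling[t]] > 1:  # shared duplicate label: invalid
                return False
    return True
```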

Maximal decomposition

The individual subnets are made as small as possible. (This can improve parallelism, although too many chunks might increase communication overhead.)

For the maximal decomposition to be a valid decomposition, the following conditions have to be met (from the proof of Theorem 1)

1) All edges in a subnet should be "connected" only via places or invisible transitions

2) When multiple transitions with the same label appear in different partitions, these partitions have to be merged into a single subnet

The complexity of computing the maximal decomposition is quadratic in the number of edges, which is negligible compared to the complexity of the process mining problem itself (exponential in the number of unique activities)
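
A sketch of how such a maximal decomposition could be computed with union-find over the edges (an illustrative reconstruction, not the paper's exact algorithm): edges are merged whenever they meet at a place, at an invisible transition, or at transitions carrying a duplicate label; uniquely labelled visible transitions are left as boundaries that may be shared between subnets.

```python
from collections import Counter, defaultdict

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def maximal_decomposition(edges, places, labelling):
    """Group edges into the smallest clusters the validity conditions allow;
    each cluster induces one subnet (uniquely labelled visible transitions on
    a cluster boundary are duplicated into every subnet they touch).

    edges     -- set of (source, target) pairs; each edge joins a place and a transition
    places    -- set of place identifiers
    labelling -- dict mapping visible transitions to labels
    """
    uf = UnionFind()
    label_count = Counter(labelling.values())
    anchor = {}  # first edge seen at each merge point

    def merge_on(key, edge):
        uf.union(anchor.setdefault(key, edge), edge)

    for edge in edges:
        for node in edge:
            if node in places:
                merge_on(('place', node), edge)             # condition 1: places
            elif node not in labelling:
                merge_on(('tau', node), edge)               # condition 1: invisible transitions
            elif label_count[labelling[node]] > 1:
                merge_on(('label', labelling[node]), edge)  # condition 2: duplicate labels

    clusters = defaultdict(set)
    for edge in edges:
        clusters[uf.find(edge)].add(edge)
    return list(clusters.values())
```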

Note:

Fitness is just one of the four main quality metrics (the other three being precision, generalization, and simplicity).

Alignments can be used as a starting point for many other types of analysis. For example, alignments can be used as a basis for precision analysis [2]

Ref:

[2] A. Adriansyah, J. Munoz-Gama, J. Carmona, B.F. van Dongen, and W.M.P. van der Aalst. Alignment Based Precision Checking. In B. Weber, D.R. Ferreira, and B. van Dongen, editors, Workshop on Business Process Intelligence (BPI 2012), Tallinn, Estonia, 2012.

Adjusted cost function

Alignment-based fitness can be found by considering the cost of the optimal alignment

$ 1 - \cfrac{\text{Cost of optimal alignment}}{\text{Cost of worst alignment}}$

When a (unique) activity $a \in \mathbf{\Sigma}$ that is shared among subnets is misaligned, the cost of the corresponding legal move ($ \delta(x,y) $) would be counted once per subnet in which $a$ appears, i.e., it would effectively be multiplied by the number of such subnets.

Hence, the adjusted cost of a legal move ($ \delta_Q(x,y) $) is the original cost divided by the number of subnets containing that activity

$\delta_Q(x,y) = \cfrac{\delta(x,y)}{\mid \{ 1 \leq i \leq n \ | \ a \in \mathbf{\Sigma^i} \} \mid}$
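
In code, the adjustment is a one-liner (a sketch; `alphabets` stands for the visible-activity sets $\mathbf{\Sigma^i}$ of the subnets):

```python
def adjusted_cost(delta, activity, alphabets):
    """Adjusted cost delta_Q(x, y) for a legal move involving `activity`.

    delta     -- original cost delta(x, y) of the move
    alphabets -- visible-activity sets Sigma^1 ... Sigma^n of the subnets
    """
    k = sum(1 for sigma in alphabets if activity in sigma)
    return delta / k  # the move is counted k times, once per containing subnet
```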

Lower bound for cost

The optimal alignment in the overall model ($ \sigma^* $) may not be the same as the alignment obtained by composing the best alignments of the subnets ($ \sigma_i^* $)

But the associated alignment cost in the overall model ($ C_M(\sigma^*, \delta) $) will not be less than the sum of the alignment costs in the individual subnets ($ C_{M^i}(\sigma_i^* , \delta_Q) $)

$C_M(\sigma^*, \delta) \ \geq \ \sum_i C_{M^i}(\sigma_i^* , \delta_Q)$

This gives a lower bound for the overall cost (or an upper bound for the overall fitness), which can be used as an optimistic estimate.
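
A sketch of the resulting optimistic estimate, assuming the per-fragment optimal alignment costs have already been computed and `worst_cost` is the worst-case alignment cost from the fitness formula above:

```python
def fitness_upper_bound(subnet_costs, worst_cost):
    """Optimistic fitness estimate from per-fragment optimal alignment costs.

    subnet_costs -- C_{M^i}(sigma_i^*, delta_Q) for i = 1..n
    worst_cost   -- cost of the worst possible alignment in the overall model
    """
    lower_bound = sum(subnet_costs)      # lower bound on C_M(sigma^*, delta)
    return 1 - lower_bound / worst_cost  # hence an upper bound on fitness
```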

Need for decomposing process discovery

In many real-life settings, event data are distributed over several computers. Therefore, process discovery techniques should be capable of discovering the overall process model from isolated event logs (which agree on the same process instances)

Decomposing process discovery

A process model ($M$) is discovered by mapping the event log $L$, with observed activities $\mathbf{\Sigma}$, onto a system net with an injective labelling function $l$ (that is, there are no duplicate activities)

$ M : \quad \mathcal{B}(\mathbf{\Sigma}^+) \ni \ L \to (N, m_0, m_f ) \quad \in \mathcal{M}\qquad$ (See formalism notes for notations)

$ l : \mathbf{T} \nrightarrow \mathbf{\Sigma} \qquad$ Where $l$ is injective $\implies$ every transition is mapped to a unique activity

In principle, Theorem 2 also allows for duplicate activities within one fragment, but for simplicity this case is not considered here. Therefore, by imposing $l$ to be injective, we directly get a valid decomposition satisfying Theorem 2: decompose the activities $\mathbf{\Sigma}$ into overlapping subsets and discover a process model $M^i$ for each subset

$L \in \mathcal{B}(\mathbf{\Sigma}^+)$

$\mathcal{C} = \{ \mathbf{\Sigma^1}, \mathbf{\Sigma^2}, ..., \mathbf{\Sigma^n}\} \qquad$ with $\mathbf{\Sigma} \ = \ \bigcup \mathcal{C}$

$M^i : \quad \mathcal{B}((\mathbf{\Sigma^i})^+) \ni L \upharpoonright_{\mathbf{\Sigma^i}} \to (N_i, m_0^i, m_f^i ) \quad \in \mathcal{M} $

$D = \{ M^1, M^2, ..., M^n\} \qquad$ and $M = \bigcup_{1 \leq i \leq n}M^i$

$l$ is injective $\implies \ D \ $ is a valid decomposition and Theorem 2 holds
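
A sketch of this discovery pipeline, with `discover` standing in for any concrete discovery algorithm that produces a net with an injective labelling (an assumed interface):

```python
from collections import Counter

def decomposed_discovery(log, activity_sets, discover):
    """Discover one fragment per activity subset.

    log           -- Counter of traces over Sigma
    activity_sets -- overlapping subsets Sigma^1 ... Sigma^n covering Sigma
    discover      -- placeholder for a concrete miner: sublog -> system net
    """
    fragments = []
    for sigma_i in activity_sets:
        sublog = Counter()
        for trace, freq in log.items():
            sublog[tuple(a for a in trace if a in sigma_i)] += freq  # L restricted to Sigma^i
        fragments.append(discover(sublog))  # M^i
    return fragments  # composing them (after renaming places) yields M
```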

Note:

1) Each subnet is wrapped with start and end places

2) While composing the subnets, ensure that places are renamed to avoid name clashes

3) After composing the subnets, there may be redundant places (i.e., places whose removal does not change the behaviour), which can be removed.

Decomposing activity sets

One method of decomposition is to run a graph partitioning algorithm on a causal graph (most process mining algorithms build such a graph in a preprocessing step)

Causal graph : $(\mathbf{\Sigma}, R) \qquad$ where $R \subseteq \mathbf{\Sigma} \times \mathbf{\Sigma}$

There are various approaches to partitioning the graph; investigating each of them is not within the scope of this article.
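
For illustration, a causal graph can be approximated from the directly-follows relation of the log (a simplification; actual miners derive causality from frequencies and filter out noise first):

```python
def causal_graph(log):
    """Build a simple causal graph (Sigma, R) from the directly-follows relation."""
    R = set()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            R.add((a, b))  # a is directly followed by b in some trace
    Sigma = {a for trace in log for a in trace}
    return Sigma, R
```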

The quality of the resulting overall model also heavily depends on the way the set of activities is decomposed. Ideally, we need a decomposition that

1) Maximizes cohesion - groups activities that are closely related. The least cohesive case is the singleton decomposition, which results in an underfitting model

2) Minimizes coupling - there should be minimal overlap between different sets of activities (a crude measure of coupling is sketched below).

The quality of the overall model also depends on the process discovery algorithm used. Due to overhead, the decomposed approach may even take longer if an unsuitable decomposition is used.
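
As a crude illustration of coupling (this measure is an assumption of this note, not taken from the paper), one can compute the average number of activity subsets each activity belongs to; 1.0 means no overlap at all:

```python
from collections import Counter

def coupling(activity_sets):
    """Average number of subsets each activity belongs to (1.0 = no overlap)."""
    membership = Counter(a for s in activity_sets for a in s)
    return sum(membership.values()) / len(membership)

# {a,b,c} and {c,d} overlap only in c -> coupling = 5/4 = 1.25
print(coupling([{"a", "b", "c"}, {"c", "d"}]))
```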

Future work required (as of 2013)

The most challenging part is to aggregate the quality metrics per model fragment and sublog into metrics for the overall model and log. This paper focussed on the notion of fitness. Using the notion of alignment, other conformance metrics related to precision and generalization have to be explored.

The performance gains from using different graph partitioning algorithms on the causal graph also have to be explored.