Index: /reasoner/evaluation.tex
===================================================================
--- /reasoner/evaluation.tex	(revision 232)
+++ /reasoner/evaluation.tex	(revision 233)
@@ -9,7 +9,9 @@
 \subsection{Setup}\label{sectEvaluationSetup}
 
-In this section, we present the setup of the experiment in terms of subjects, treatments, data collection, and experimental procedure.
+In this section, we present the setup of the experiment in terms of subjects (Section \ref{sectEvalSetupSubjects}), treatments (Section \ref{sectEvalSetupTreatments}), data collection (Section \ref{sectEvalSetupDataCollection}), experimental procedure (Section \ref{sectEvalSetupProcedure}), and analysis (Section \ref{sectEvalSetupAnalysis}).
 
-\emph{Subjects:} The subjects in this evaluation are two versions of the EASy-Producer SSE reasoner, namely: 
+\subsubsection{Subjects}\label{sectEvalSetupSubjects}
+
+The subjects in this evaluation are two versions of the EASy-Producer SSE reasoner, namely: 
 \begin{itemize}
   \item The original reasoner implementation that acted as basis for the revision. This base version\footnote{\label{reasonerBaseVersion}Git hash 6a00aa9c5aaa37ddb3d490d36c7e9a037e792656} is part of EASy-Producer release 1.1.0, i.e., we will call this original implementation \emph{reasoner v1.1.0}.
@@ -17,9 +19,27 @@
 \end{itemize}
 
-\emph{Treatments:} Several test cases of EASy-Producer involve reasoning, in particular the test suites for the SSE reasoner (based on the \IVML{ReasonerCore} test suite), the VIL runtime extension, the scenario test cases (including the models from FP7 QualiMaster \cite{EichelbergerQinSizonenko+16}) as well as the scenario test cases for the BMWi ScaleLog\footnote{These test cases are not publicly available as they contain propretary knowledge of the industrial partner in the ScaleLog project.} project.  We use these test cases as experimental treatments, although this involves test dependencies such as jUnit. While some of the test cases rely on programmed models (in terms of the IVML object model), most of the test cases specify the model in terms of IVML, i.e., for a more realistic setup and require for execution the IVML parser as well as dependent Eclipse libraries. As we focus on the reasoning time, the actual creation of the reasoning model shall not affect the results. Moreover, it is important to note that the code for reasoner v1.1.0 does not include several test cases that have been created for v1.3.0. For this experiment, we enable for v1.1.0 as many test cases as possible, i.e., we patch back\footnote{\label{fnPatch}Patch is available from \MISSING{XXX}.} the v1.3.0 test cases into v1.1.0 and, if required, either adjust the expected test result accordingly or, in extreme cases, disable test cases that cannot be handled by the v1.1.0 reasoner (or the underlying IVML implementation). For this reason, the treatment sets differ in terms of specific tests, while most of the imporant small and large models are the same. We believe that this is acceptable for an illustrative experiment. 
+\subsubsection{Treatments}\label{sectEvalSetupTreatments}
 
-\emph{Data collection:} In the test cases mentioned above, we employ a generic measurement data collector, which can be feeded with key-value pairs representing measured (real) values. Collected values are stored when the data collection for a test case is finished. By default, the collector can automatically account for (wall) response time, which can help validating more detailed time measures collected during the execution of a treatment. For the measurements in this evaluation, we include the default statistics collected by the SSE reasoner, i.e., translation time, evaluation time, number of failed constraints, number of re-evaluated constraints, model statistics, and complexity measures (cf. Section \ref{sectModelComplexity}) delivered by EASy-Producer. However, EASy-Producer release v1.1.0 did not contain the generic data collector and several measures, so, along with the test cases, we patched the related code from v1.3.0 back\footref{fnPatch} into v1.1.0. 
+Several test cases of EASy-Producer involve reasoning, namely 
+\begin{itemize}
+\item the EASy-Producer test suites for the SSE reasoner (based on the \IVML{ReasonerCore} test suite).
+\item the EASy-Producer test suite for the VIL runtime extension involving the reasoner to detect runtime deviations and to validate runtime changes in the configuration before executing them.
+\item scenario test cases for the BMWi ScaleLog\footnote{These test cases are not publicly available as they contain proprietary knowledge of the industrial partner in the ScaleLog project.} project.
+\item the EASy-Producer scenario test cases including several (historical) model variants from FP7 QualiMaster \cite{EichelbergerQinSizonenko+16}. 
+\item an extended set of scenario test cases derived from the largest QualiMaster models in the EASy-Producer scenario test cases. These tests were specifically defined for this experiment, as the gap in model size between the QualiMaster models and the other models was too large. The QualiMaster model consists of several imported projects, various user-defined types, a topological structure for defining Big Data processing pipelines, and 16 configured pipelines. This corresponds to roughly 20,000 individual variables. In this set of test cases, we created models with the same type definitions but systematically varied the number of pipelines, i.e., we created projected models with one pipeline, two pipelines, etc. All these models contain only the required linked variables, such as algorithms or data sources, so that the respective model is structurally valid and its configuration is consistent.
+\end{itemize}
+We use these test cases as experimental treatments, although this involves test dependencies such as jUnit. While some of the test cases rely on programmed models (in terms of the IVML object model), most of the test cases specify the model in IVML for a more realistic setup and, thus, require the IVML parser as well as dependent Eclipse libraries for execution. As we focus on the reasoning time, the actual creation of the reasoning model should not affect the results. 
 
-\emph{Procedure:} We run each of the four test suites mentioned above 5 times to collect the measurements. Due to their use in continuous integration, each test suite is prepared to run in an own JVM instance. To compensate delayed JIT optimization, we include a ramp-up run that warms up the JVM. For most test cases, reasoning over a simple representative model including a compound type, a collection over that type and a quantor constraint over the container variable is sufficient. However, for the QualiMaster models, we added as ramp-up run a full run of one of the largest models without accounting for reasoning time or without performing code instantiation. We execute all tests suites in terms of an ANT script (based on existing mechanisms of the continuous integration) in the respective EASy-Producer standalone variant.  We execute the script on 
+It is important to note that the code for reasoner v1.1.0 does not include several test cases that have been created for v1.3.0. For this experiment, we enable for v1.1.0 as many test cases as possible, i.e., we patch back\footnote{\label{fnPatch}Patch is available from \MISSING{XXX}.} the v1.3.0 test cases into v1.1.0 and, if required, either adjust the expected test result accordingly or, in extreme cases, disable test cases that cannot be handled by the v1.1.0 reasoner (or the underlying IVML implementation). For this reason, the treatment sets differ in terms of specific tests, while most of the important small and large models are the same. We believe that this is acceptable for an illustrative experiment. 
+
+\subsubsection{Data Collection}\label{sectEvalSetupDataCollection}
+
+In the test cases mentioned above, we employ a generic measurement data collector, which can be fed with key-value pairs representing measured (real) values. Collected values are stored when the data collection for a test case is finished. By default, the collector can automatically account for (wall) response time, which can help validate more detailed time measures collected during the execution of a treatment. 
+
+For the measurements in this evaluation, we include the default statistics collected by the SSE reasoner, i.e., translation time, evaluation time, number of failed constraints, number of re-evaluated constraints, model statistics, and complexity measures (cf. Section \ref{sectModelComplexity}) delivered by EASy-Producer. However, EASy-Producer release v1.1.0 did not contain the generic data collector and several measures, so, along with the test cases, we patched the related code from v1.3.0 back\footref{fnPatch} into v1.1.0. 
+
+\subsubsection{Experimental Procedure}\label{sectEvalSetupProcedure}
+
+We run each of the four test suites mentioned above five times to collect the measurements. Due to their use in continuous integration, each test suite is prepared to run in its own JVM instance. To compensate for delayed JIT optimization, we include a ramp-up run that warms up the JVM. For most test cases, reasoning over a simple representative model including a compound type, a collection over that type, and a quantifier constraint over the container variable is sufficient. However, for the QualiMaster models, we added as ramp-up run a full run of one of the largest models without accounting for the reasoning time and without performing code instantiation. We execute all test suites via an ANT script (based on existing mechanisms of the continuous integration) in the respective EASy-Producer standalone variant. We execute the script on 
 \begin{itemize}
   \item an actual office computer, a Dell Latitude 7490 laptop with an Intel Core i7 vPro 8th Gen and 32 GBytes RAM, running Windows 10 and Oracle JDK 9 64 bit. 
@@ -29,7 +49,9 @@
 For both Windows machines, we first disable the virus scanner and terminate all programs that are not required for the execution of the tests. 
 
-\emph{Analysis:} After collecting the runtime measurements, we execute an R script which combines the results of all test executions for one complete run. The script produces various plots, all relating runtime and model complexity (cf. Section \ref{sectModelComplexity}). Some plots are inteded to provide an overview on the reasoning time for different modes, some how the reasoning time is composed (in terms of model translation time and constraint evaluation time) and some indicating the deviations across all series of test suite execution.
+\subsubsection{Analysis}\label{sectEvalSetupAnalysis}
 
-\subsection{Model Complexity}\label{sectModelComplexity}
+After collecting the runtime measurements, we execute an R script which combines the results of all test executions for one complete run. The script produces various plots, all relating runtime and model complexity (cf. Section \ref{sectModelComplexity}). Some plots are intended to provide an overview of the reasoning time for different modes, some show how the reasoning time is composed (in terms of model translation time and constraint evaluation time), and some indicate the deviations across all series of test suite executions.
+
+\subsection{Model Ranking and Complexity}\label{sectModelComplexity}
 
 When evaluating the reasoning capabilities of variability models, such as feature models, typically a measure is employed to characterize the model. Measures include the number of features \cite{Benavides2006AFS}, the number of constraints \cite{Benavides2006AFS} or combinations, e.g., constraints per feature or the constraint ratio \cite{Mendoca08}, graph width \cite{PohlStrickerPohl13} or more complex approaches \cite{StuikysDamasevicius09}. 
@@ -41,15 +63,30 @@
 The complexity metric applied here consists of four parts, the
 \begin{enumerate}
-\item measure of the structure (type) of a given model element $e$, denoted $cpx_v(e)$.
+\item measure of the structure of a given model element $e$, denoted $cpx_v(e)$.
 \item measure of the constraints of a certain model element $e$, denoted by $cpx_c(e)$, which, in turn, is based on the 
 \item measure of a given (constraint, default value) expression $e$, denoted as $cpx_e(e)$.
 \item weighting $w_{cpx}(e)$ for a model element, constraint or expression $e$.
 \end{enumerate}
+%
+For a given IVML configuration $cfg$, we then calculate
+%
+$$cpx(cfg) = cpx_v(cfg) + cpx_c(cfg)$$ 
+%
+as a combination of structure and constraints.
 
-Due to the nested structure of IVML models, most of the formulae are recursive. Within these formulae, the weighting function $w_{cpx}(e)$ is mostly applied in additive manner. In two cases, we use $w_{cpx}(e)$ in a multiplicative manner, in particular to disable parts of the calculation, e.g., for constraints or nested variables, so that we also can express the traditional counting approach for features and constraints through $cpx_v(e)$ and $cpx_c(e)$ in an integrated way. We explain now the formulae for the four parts. Thereby, we introduce some additional properties of IVML model elements that have not been used so far and that will only be used within this section.
+Due to the nested structure of IVML models, most of the formulas are recursive. Within these formulas, the weighting function $w_{cpx}(e)$ is used in an additive fashion if the argument is a usual (user-defined) type. If $w_{cpx}(e)$ refers to a meta-type such as \IVML{Expr}, which denotes all expression elements, then $w_{cpx}(e)$ is used in a multiplicative manner to allow for disabling the related structure or constraint calculation. 
 
-\TBD{here}
+In the sections below, we explain the formulas for the four parts. Along the way, we introduce some special functions for IVML model elements, in particular to query the meta-types of IVML elements. These functions have not been used so far and will only be used within this section.
 
-The measure of the variable structure $cpx_v(e)$ calculates a weighted sum of the number of nested variables for an IVML model element starting with a given configuration. We do not rely here on the meta-model, i.e., the project, as a configuration contains all actually available variables, i.e., also those created by assignment constraints in terms of compound or container instances. When applying $cpx_v(e)$ to a configuration, the sum of the measures for all variables is calculated. In turn, for a variable, we add a weight for the type of the variable (e.g., if we want to weight complex types like containers or compounds higher) with the (recursive) sum of $cpx_v(e)$ over all nested variables (weighted by the IVML type \IVML{Var} for decision variables to disable measuring nested variables).
+\subsubsection{Structure Measure}
+
+The measure of the structure $cpx_v(e)$ calculates a weighted sum of the (nested) variables in a given configuration. We rely here on configuration variables, as a configuration contains all available variables including nested ones as defined by assignment constraints for compounds or containers. 
+%When applying $cpx_v(e)$ to a configuration, the sum of the measures for all variables is calculated. 
+
+The measure of an individual variable consists of two additive components: 
+\begin{enumerate}
+\item  The weight for the respective type of the variable to allow for weighting complex types differently than basic types. 
+\item The (recursive) sum of $cpx_v$ over all variables nested in $e$. This sum is weighted by the IVML meta-type \IVML{Var} to be able to disable considering nested structures.
+\end{enumerate}
 %
 $$
@@ -57,18 +94,27 @@
        \sum_{v\in vars(e)}cpx_v(v) & \text{if } isConfiguration(e)\\
        w_{cpx}(type(e)) +  & \\
-       w_{cpx}(\IVML{Var}) \cdot \sum_{n\in vars(e)}cpx_v(n) & \text{if } isVariable(e)\\
+       \ w_{cpx}(\IVML{Var}) \cdot \sum_{n\in vars(e)}cpx_v(n) & \text{if } isVariable(e)\\
        0 & \text{else}
        \end{cases}
 $$
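To make the recursion concrete, the following minimal Python sketch mimics the structure measure $cpx_v(e)$; the class names, weight table, and example model are our own illustration and not the EASy-Producer API:

```python
# Illustrative sketch of the structure measure cpx_v(e).
# Class names and weights are assumptions for this example only.

class Variable:
    def __init__(self, type_name, nested=()):
        self.type_name = type_name      # e.g., "Compound", "Integer"
        self.nested = list(nested)      # nested variables, vars(e)

class Configuration:
    def __init__(self, variables):
        self.variables = list(variables)

# Example weighting: compounds/containers weigh 2, basic types 1;
# the meta-type "Var" enables (1) or disables (0) nested variables.
W = {"Compound": 2, "Container": 2, "Integer": 1, "String": 1, "Var": 1}

def cpx_v(e):
    if isinstance(e, Configuration):
        # sum of the measures for all variables
        return sum(cpx_v(v) for v in e.variables)
    if isinstance(e, Variable):
        # type weight plus (possibly disabled) recursive sum over nested vars
        return W[e.type_name] + W["Var"] * sum(cpx_v(n) for n in e.nested)
    return 0  # 'else' case of the formula

cfg = Configuration([
    Variable("Compound", nested=[Variable("Integer"), Variable("String")]),
    Variable("Integer"),
])
print(cpx_v(cfg))  # (2 + 1 + 1) + 1 = 4 + 1 = 5
```

Setting `W["Var"] = 0` in this sketch disables the nested contributions, so only the top-level type weights remain.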
+\subsubsection{Constraint Measure}
 %
-The measure of the constraints $cpx_c(e)$ considers all types of model elements that may contain constraints and sums the measures for these constraints recursively. In more details, for a configuration we consider the contained (instantiated) variables as well as all top-level elements in the underlying project. For a project, we calculate the sum of $cpx_c(e)$ for all model elements. For a decision variable (not the underlying declaration), we build the recursive sum over all nested variables and add the complexity of the default value expression. Here we use the weighting $w_{cpx}(\IVML{Expr})$ for the IVML type \IVML{Expr} (expression) to allow disabling the measure for the default value expression. For an eval-block, we sum up the measures for all nested constraints and (potentially nested) recursive eval blocks. Similarly, for annotation assignments, we sum the measures for all nested constraints and (potentially nested) recursive annotation assignments. %implicit default values missing
-For a user-defined operation (identified via $isOpDef(e)$), we just take the measure for the function defining the operation into account. For a constraint, we combine a basic additive weight (to enable just counting constraints) with the weighted measure for the expression of the constraints (to disable the calculation of the actual expression).
+The measure of the constraints $cpx_c(e)$ considers all types of model elements that may contain constraints and calculates the recursive sum of the measures for these constraints. In more detail, for a configuration we consider the contained configuration variables as well as all top-level elements in the underlying project. For a
+%
+\begin{itemize}
+  \item configuration variable, we build the recursive sum over all nested variables and add the measures of the default value expressions. Here we apply the weighting $w_{cpx}(\IVML{Expr})$ for the IVML meta-type \IVML{Expr}, denoting all expressions, to allow for disabling the measure for default value expressions.
+  \item project, we calculate the sum of $cpx_c(e)$ for all model elements, in particular top-level eval blocks, assignment blocks, and constraints as well as operation definitions. For now, we do not take the import structure of projects into account.
+  \item eval-block we sum up the measures for all nested constraints and (potentially nested) recursive eval blocks. 
+  \item annotation assignment we sum the measures for all nested constraints and (potentially nested) recursive annotation assignments. The declared variables are already considered by $cpx_v(e)$.
+  \item user-defined operation (identified via $isOpDef(e)$), we take the expression measure for the function defining the operation into account. The expression measure $cpx_e(e)$ will be introduced in Section \ref{sectExpMeasureExpr}. 
+  \item constraint, we combine an additive weight (to enable counting constraints) with the expression measure of the constraint, weighted by the meta-type \IVML{Expr} (to be able to exclude the measure for the constraint expression $expr(e)$). The expression measure $cpx_e(e)$ will be introduced in Section \ref{sectExpMeasureExpr}. 
+\end{itemize}
 %
 $$
    cpx_c(e) = \begin{cases} 
        \sum_{v\in vars(e)}cpx_c(v) + cpx_c(project(e)) & \text{if } isConfiguration(e)\\
-       \sum_{f\in elements(e)}cpx_c(f)  & \text{if } isProject(e)\\
        \sum_{v\in vars(e)}cpx_c(v) \\
        \text{ } + w_{cpx}(\IVML{Expr}) \cdot cpx_e(default(e)) & \text{if } isVariable(e)\\
+       \sum_{f\in elements(e)}cpx_c(f)  & \text{if } isProject(e)\\
        \sum_{x\in constraints(e)~\cup~evals(e)} cpx_c(x) & \text{if } isEval(e)\\ 
        \sum_{x\in constraints(e)~\cup~assignments(e)} cpx_c(x) & \text{if } isAssignment(e)\\
@@ -78,6 +124,7 @@
        \end{cases}
 $$
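A reduced Python sketch of the constraint measure may clarify the recursion; it covers only the configuration, variable, project, and constraint cases (eval blocks, annotation assignments, and operation definitions are omitted), and all names are illustrative rather than the actual implementation:

```python
# Reduced sketch of the constraint measure cpx_c(e); names are assumptions.

class Constraint:
    def __init__(self, expr_size):
        self.expr_size = expr_size    # stands in for cpx_e(expr(e))

class Variable:
    def __init__(self, default_size=0, nested=()):
        self.default_size = default_size  # stands in for cpx_e(default(e))
        self.nested = list(nested)

class Project:
    def __init__(self, elements):
        self.elements = list(elements)

class Configuration:
    def __init__(self, variables, project):
        self.variables = list(variables)
        self.project = project

W = {"Expr": 1, "Constraint": 1}  # example weights

def cpx_c(e):
    if isinstance(e, Configuration):
        return sum(cpx_c(v) for v in e.variables) + cpx_c(e.project)
    if isinstance(e, Variable):
        # recursive sum over nested variables plus weighted default measure
        return sum(cpx_c(n) for n in e.nested) + W["Expr"] * e.default_size
    if isinstance(e, Project):
        return sum(cpx_c(f) for f in e.elements)
    if isinstance(e, Constraint):
        # counting weight plus weighted expression measure
        return W["Constraint"] + W["Expr"] * e.expr_size
    return 0

cfg = Configuration(
    [Variable(default_size=2), Variable(nested=[Variable(default_size=1)])],
    Project([Constraint(expr_size=3), Constraint(expr_size=1)]),
)
print(cpx_c(cfg))  # variables: 2 + 1 = 3; constraints: (1+3) + (1+1) = 6; total 9
```

With `W["Expr"] = 0`, the default value and constraint expressions drop out, leaving only the constraint count, as exploited for the traditional measures later in this section.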
+\subsubsection{Expression Measure}\label{sectExpMeasureExpr}
 %
-The measure for an expression is calculated along the expression tree, i.e., we weight each tree node and typically sum up the connected sub-trees. A tree node can have various types that we enumerate as cases here. Most of the operations and functions that can be used in an IVML expression are internally represented as a call. For a call, we just summarize the measures of all arguments, e.g., for the plus operation the measures of the left and right hand side expression. Parentheses, container iterators, let expressions and accessors mainly consist of a single expression that makes up the measure. For an if-then-else expression, we sum up the expressions constituting the condition, the then part and the else part. For (compound or container) initializers as well as for expression blocks, we just sum up the measures for the contained expressions.
+The measure for an expression is calculated based on the expression tree, i.e., we add the individual weight $w_{cpx}(e)$ for each tree node\footnote{Here, expression (meta) types are used in an additive manner.} to the (recursive) sum over the respective expression sub-trees (argument expressions, contained expressions, value expressions). A tree node can have various types that we enumerate as cases here. Most of the operations and functions that can be used in an IVML expression are internally represented as a call node. For a call node, we sum up the measures of all arguments, e.g., for the plus operation the measures of the left and right operand expressions. Parentheses, container iterators, let-expressions, and accessors consist of a single expression that makes up the respective measure. For an if-then-else expression, we sum up the expressions constituting the condition, the then-part as well as the else-part. For (compound or container) initializers as well as for expression blocks, we calculate the sum of the measures of the contained expressions.
 %
 $$
@@ -96,22 +143,17 @@
 $$
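Since every case of the expression measure follows the same pattern (node weight plus sum over sub-trees), the whole recursion can be sketched in a few lines of Python; the node kinds and weights below are assumptions for the example, not the actual IVML expression types:

```python
# Illustrative sketch of the expression measure cpx_e(e): each tree node
# contributes its weight plus the sum over its sub-expressions.
# Node kinds and weights are assumptions, not the actual IVML types.

class Node:
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)

# example weights: structural nodes count, plain value accesses do not
W = {"call": 1, "parenthesis": 1, "varaccess": 0, "constant": 0,
     "ifthenelse": 1}

def cpx_e(e):
    return W[e.kind] + sum(cpx_e(c) for c in e.children)

# a + (b * 2): a call with two arguments, one of them parenthesized
expr = Node("call", [
    Node("varaccess"),
    Node("parenthesis", [Node("call", [Node("varaccess"),
                                       Node("constant")])]),
])
print(cpx_e(expr))  # 1 + 0 + (1 + (1 + 0 + 0)) = 3
```

The per-kind weight table plays the role of $w_{cpx}(e)$ restricted to expression nodes; changing an entry to 0 removes that node kind (but not its sub-trees) from the measure.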
 %
-The traditional measure applied to an IVML model would determine the number of top-level variables and the number of constraints. This can be achieved as follows, i.e., we disable measuring nested variables and expressions, an enable counting constraints and remaining variables by setting the type weight for $cpx_v(e)$ and the constraint weight for $cpx_c(e)$ both to 1. All remaining weights, in particular for constraint expressions are set to 0. The traditional measures can now be expressed by $cpx_c(cfg)/cpx_v(cfg)$ for a given configuration $cfg$.
+\subsubsection{Weighting Function}
+%
+The weighting function determines the relative weight of individual elements and whether parts of the structure are considered at all. 
+
+According to our experience, traditional measures such as the number of variables, the number of constraints, or their sum or ratio lead to a wrong impression of the expected `complexity' of complex IVML models. This is because neither nested variables nor types or constraint forms are considered appropriately, also since they mostly do not exist in these forms in traditional variability modeling. 
+
+Based on our impression, compound and container types as well as quantifier or iterator expressions lead to a higher perceived complexity, while accessing the actual value of a variable or of a constraint (often used in default value expressions) does not really increase complexity. For this experiment, we defined the weighting function as follows (not claiming to present a universal complexity measure for IVML models here) and calibrated it by sample runs over the subjects. 
 %
 $$
     w_{cpx}(e) = \begin{cases}
-       0 & \text{if } \IVML{Var}~\vee~\IVML{Expr}\\
-       1 & \text{if } isConstraint(e)\\ % in particular 0 for Expr
-       1 & \text{if } isType(e)\\ %count variables equally
-       0 & \text{else} % scope out \IVML{Variable}
-    \end{cases}
-$$
-%
-As mentioned above, the traditional measure leads to a wrong impression of the actual complexity, as nested variables are not counted and and the different types of constraints are just considered with the same constant value. According to our experience and the algorithms presented in this document, also compound and container types as well as quantor or iterator expressions imply a higher complexity, while accessing the actual value of a variable and of a constraint (often used in default value expressions) is typically not so complex. For this experiment, we defined the weighting function as follows (not claiming to present a universal complexity measure for IVML models here), that we calibrated by sample runs over the subjects. 
-%
-$$
-    w_{cpx}(e) = \begin{cases}
+       1 & \text{if } isType(e)\\
+       2 & \text{if } isCompound(e)~\vee~isContainer(e)\\
        1 & \text{if } \IVML{Var}~\vee~\IVML{Expr}\\
-       2 & \text{if } isCompound(e)~\vee~isContainer(e)\\
-       1 & \text{if } isType(e)\\
        1 & \text{if } isCall(e)~\vee~isLet(e)\\
        1 & \text{if } isParenthesis(e)~\vee~isAccessor(e)\\
@@ -125,6 +167,20 @@
 $$
 
-For short, we enable considering nested variables and the contents of expressions. Compounds and containers are considered with a double weight compared with all other types. Most of the constraint tree nodes are weighted by 1 including access to the actual instance of a compound (\IVMLself{}), except for container iterator operations that we weight by a higher and access to the value of decision variables and constants that we weight by a lower value. For a given IVML model, we calculate $cpx_v(cfg) + cpx_c(cfg)$ as overall complexity measure, because ratio-based measures as in the more homogeneous feature case mentioned above do not seem to correctly reflect the complexity of IVML models.
+In short, compounds and containers are taken into account by a double weight compared with all other types. As argued above, we consider nested variables as well as the contents of expressions. Most of the constraint tree nodes are weighted by 1, including access to the actual instance of a compound (\IVMLself{}), except for container iterator operations, which we weight higher, and access to the value of decision variables and constants, which we weight lower.
 
+%as overall measure, because ratio-based measures as in the more homogeneous feature case mentioned above do not seem to correctly reflect the expected 'complexity' of IVML models.
+%
+The measurement approach defined in this section also allows calculating some traditional measures, e.g., the number of top-level variables or the number of constraints. For demonstration purposes, this can be achieved using a different weighting function: we disable measuring nested variables and expressions and enable counting constraints and remaining variables by setting the type weight for $cpx_v(e)$ and the constraint weight for $cpx_c(e)$ both to 1. All remaining weights, in particular for constraint expressions, are set to 0.
+%
+$$
+    w_{cpx}(e) = \begin{cases}
+       1 & \text{if } isType(e)\\ %count variables equally
+       0 & \text{if } \IVML{Var}~\vee~\IVML{Expr}\\
+       1 & \text{if } isConstraint(e)\\ % in particular 0 for Expr
+       0 & \text{else} % scope out \IVML{Variable}
+    \end{cases}
+$$
+For a given configuration $cfg$, the number of (top-level) variables can now be determined by $cpx_v(cfg)$, the number of constraints by $cpx_c(cfg)$, the sum of both by $cpx(cfg)$ and the constraint ratio by $cpx_c(cfg)/cpx_v(cfg)$.
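The reduction to the traditional counting measures can be sketched as follows; as before, the class names and the stand-in weight table are hypothetical, not the actual implementation:

```python
# Sketch: with w(Var) = w(Expr) = 0 and type/constraint weight 1,
# cpx_v counts top-level variables and cpx_c counts constraints.
# Names and structure are illustrative assumptions.

class Variable:
    def __init__(self, nested=()):
        self.nested = list(nested)

W = {"Type": 1, "Var": 0, "Expr": 0, "Constraint": 1}

def cpx_v(variables):
    # per variable: type weight plus the disabled sum over nested variables
    return sum(W["Type"] + W["Var"] * cpx_v(v.nested) for v in variables)

def cpx_c(constraints):
    # per constraint: counting weight plus the disabled expression measure
    # (42 stands in for an arbitrary cpx_e value, multiplied away by w(Expr))
    return sum(W["Constraint"] + W["Expr"] * 42 for _ in constraints)

variables = [Variable(nested=[Variable(), Variable()]), Variable()]
constraints = ["c1", "c2", "c3"]
print(cpx_v(variables), cpx_c(constraints))          # 2 3
print(cpx_c(constraints) / cpx_v(variables))         # constraint ratio: 1.5
```

The nested variables and the (arbitrary) expression measures are multiplied away by the zero weights, so exactly the top-level variable count, the constraint count, and their ratio remain.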
+%
 \subsection{Results}\label{sectEvaluationResults}
 
