Index: /reasoner/evaluation.tex
===================================================================
--- /reasoner/evaluation.tex	(revision 230)
+++ /reasoner/evaluation.tex	(revision 231)
@@ -3,5 +3,5 @@
 The goal of this evaluation is to measure and quantify the performance of the implementation of the IVML reasoner, in particular in comparison to the initial version that we used as the starting point for the revision. Therefore, we aim here at a practical and illustrative comparison in the sense of a relaxed rather than a fully-fledged technical experiment. However, the technical measurement support that we employ would allow for an experiment in a controlled technical environment. 
 
-The practical perspective on the experiment that we take here allows us to measure the reasoner in some form of application setup, i.e., we consider variant setups including different Java versions (Java 8 and Java 9), different operating systems (Windows 7, Windows 10, Linux) as well as Eclipse as hosting platform. This allows us to discuss the impact of Java versions, operating systems and the platform, i.e., in Eclipse or standalone. In contrast, for a strict technical experiment, we would have to ensure that, e.g., only absolutely required services are running or a user interface (including the Eclipse user interface) that may influence the measurements is not present. Typically, such strict requirements exclude using Windows as operating system. We focus here exclusively on response time and leave other potential interesting performance dimensions like memory usage to future evaluations. 
+The practical perspective on the experiment that we take here allows us to measure the reasoner in a typical application setup, i.e., we consider variant setups including different Java versions (Java 8 and Java 9) and different operating systems (Windows 7, Windows 10, Linux) for the standalone variant of EASy-Producer. This allows us to discuss the impact of Java versions and operating systems. In contrast, for a strict technical experiment, we would have to ensure that, e.g., only absolutely required services are running and that no user interface that may influence the measurements is present. Typically, such strict requirements exclude using Windows as operating system. We focus here exclusively on response time and leave other potentially interesting performance dimensions like memory usage to future evaluations. 
 
 We present in Section \ref{sectEvaluationSetup} the setup of this evaluation. As the involved IVML models must be ordered for the presentation of the results, we discuss in Section \ref{sectModelComplexity} a pragmatic ranking based on the model complexity. Finally,  in Section \ref{sectEvaluationResults}, we present and discuss the results.
@@ -21,9 +21,13 @@
 \emph{Data collection:} In the test cases mentioned above, we employ a generic measurement data collector, which can be fed with key-value pairs representing measured (real) values. Collected values are stored when the data collection for a test case is finished. By default, the collector can automatically account for (wall) response time, which can help validate more detailed time measures collected during the execution of a treatment. For the measurements in this evaluation, we include the default statistics collected by the SSE reasoner, i.e., translation time, evaluation time, number of failed constraints, number of re-evaluated constraints, model statistics, and complexity measures (cf. Section \ref{sectModelComplexity}) delivered by EASy-Producer. However, EASy-Producer release v1.1.0 did not contain the generic data collector and several measures, so, along with the test cases, we patched the related code from v1.3.0 back\footref{fnPatch} into v1.1.0. 
 
-\MISSING{Here}
+\emph{Procedure:} We run each of the four test suites mentioned above 5 times to collect the measurements. Due to their use in continuous integration, each test suite is prepared to run in its own JVM instance. To compensate for delayed JIT optimization, we include a ramp-up run that warms up the JVM. For most test cases, reasoning over a simple representative model consisting of a compound type, a collection over that type, and a quantor constraint over the container variable is sufficient. However, for the QualiMaster models, we added as ramp-up run a full run of one of the largest models, neither accounting for reasoning time nor performing code instantiation. We execute all test suites via an ANT script (based on existing mechanisms of the continuous integration) in the respective EASy-Producer standalone variant. We execute the script on
+\begin{itemize}
+  \item a current office computer, a Dell Latitude 7490 laptop with an Intel Core i7 vPro (8th Gen) and 32 GB RAM, running Windows 10 and Oracle JDK 9 (64 bit),
+  \item its predecessor office computer, a Dell laptop \TBD{XXX} running Windows 7 and Oracle JDK 8 (64 bit),
+  \item our continuous integration server, an Ubuntu Linux 16.04.5 LTS VM with 4 GB RAM and OpenJDK 8 (64 bit). Here, the measurement script is part of a manual continuous integration task that prevents any other continuous integration task from running at the same time.
+\end{itemize}
+For both Windows machines, we first disable the virus scanner and terminate all programs that are not required for the execution of the tests. 
 
-\emph{Procedure:} We run the four test suites mentioned above each 5 times to collect response time and model. Typically, each test suite runs individually in a JVM. To compensate delayed JIT optimization, we include a ramp-up run that warms up the JVM. For most test cases, a simple in-memory model with a compound type, a collection over that type and a quantor constraint over the container variable is sufficient. However, for the QualiMaster models, we added as ramp-up run a full run of one of the largest models without accounting for reasoning time or without performing instantiation. \TBD{We execute all tests in a script outside Eclipse to avoid disturbances caused by functionality of the IDE. We execute this procedure on the most recent version of EASy-Producer\footnote{Version 1.3.0-SNAPSHOT, TBD{git-hash}} on an actual development machine, a Dell laptop \TBD{XXX} with Windows 10 and JDK9. We select Windows for a better comparison with the base version and also to measure the reasoner in a typical environment. For comparison, we run the same version of EASy-Producer on a Dell laptop \TBD{XXX} with Windows 7 and JDK8. For both windows machines, we disable first the virus scanner and terminate all programs that are not required for the execution of the tests. On the Windows 7 machine we also run the base version of the reasoner\footref{reasonerBaseVersion}. Finally, for curiosity, we run the the test execution script also on our continuous integration server, a Linux... VM at a point in time when no Jenkins tasks are running.}
-
-\emph{Analysis:} After collecting the runtime results, we execute an R script which combines the results of all test executions for one complete run. The script produces various plots, all relating runtime and model complexity (cf. Section \ref{sectModelComplexity}). Some plots are inteded to provide an overview on the reasoning time for different modes, some how the reasoning time is composed (in terms of model translation time and constraint evaluation time) and some indicating the deviations across all series of test suite execution.
+\emph{Analysis:} After collecting the runtime measurements, we execute an R script which combines the results of all test executions for one complete run. The script produces various plots, all relating runtime and model complexity (cf. Section \ref{sectModelComplexity}). Some plots are intended to provide an overview of the reasoning time for different modes, some show how the reasoning time is composed (in terms of model translation time and constraint evaluation time), and some indicate the deviations across all series of test suite executions.
 
 \subsection{Model Complexity}\label{sectModelComplexity}
