Index: /reasoner/evaluation.tex
===================================================================
--- /reasoner/evaluation.tex	(revision 247)
+++ /reasoner/evaluation.tex	(revision 248)
@@ -49,17 +49,18 @@
 During the experiment, we execute each of the test suites in Section \ref{sectEvalSetupTreatments}. Due to their use in continuous integration, each test suite runs in its own JVM instance. To compensate for delayed just-in-time (JIT) compilation, we include a specific ramp-up test in the experimental runs to warm up the JVM. For most test cases, reasoning over a simple representative model, including a compound type, a collection over that type and a quantifier constraint over the container variable, is sufficient. For the QualiMaster models, we add as ramp-up a full run of one of the largest models without accounting for its reasoning time. If the test cases include artifact instantiation through VIL, we disable the instantiation phase.
 
-However, pilot experiments showed that still significant differences between the first runs of a test case and subsequent runs may occur. Thus, within each suite, we repeat the reasoning functionality of each test case 10 times on a fresh configuration. These 10 repetitions make up the intra-experiement repetitions. In particular, the repetitions allow for (later) excluding  warmup runs as well as for basic descriptive statistics such as confidence intervals \MISSING{rigorous}. 
-
-On a given machine/device, we first perform an initial run and then the experimental runs. During the initial run, we execute the test suites on the target device to validate that all tests are passed successfully. The measurements of this run are stored separately. Then, for taking the experimental measures, we repeate the execution of the test suites 5 times with 5 seconds pause between two subsequent runs.
-
-
-
- We execute all tests suites in terms of an ANT script (based on existing mechanisms of the continuous integration) in the respective EASy-Producer standalone variant.  We execute the script on 
-\begin{itemize}
-  \item an actual office computer, a Dell Latitude 7490 laptop with an Intel core i7 vPro 8th Gen, 32 GBytes RAM with Windows 10 and Oracle JDK 9 64 bit. 
-  \item a predecessor office computer, a Dell laptop \TBD{XXX} with Windows 7 and Oracle JDK 8 64 bit. 
-  \item our continuous integration server, a Ubuntu Linux 16.4.5 LTS VM with 4 GBytes RAM and OpenJDK 8 64bit. Here the measurement script is part of a manual continuous integration task that prevents any other continous itegration task at the same time.
-\end{itemize}
-For both windows machines, we disable first the virus scanner and terminate all programs that are not required for the execution of the tests. 
+However, pilot experiments showed that significant differences between the first runs of a test case and subsequent runs may still occur. Thus, within each suite, we repeat the reasoning functionality of each test case 10 times on a fresh configuration. These 10 repetitions (an intuitively chosen number) make up the intra-experiment repetitions. In particular, the repetitions allow for (later) excluding warm-up runs as well as for basic descriptive statistics such as confidence intervals \cite{GeorgesBuytaertEeckhout07}.
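+For the confidence intervals, we follow the methodology of \cite{GeorgesBuytaertEeckhout07}: since the number of retained measurements per test case is small (here $n = 10$ or fewer after excluding warm-up runs), the Student $t$-distribution applies rather than the normal distribution. For $n$ measurements with sample mean $\bar{x}$ and sample standard deviation $s$, the confidence interval at level $1-\alpha$ is

```latex
\begin{equation*}
  \bar{x} \;\pm\; t_{1-\alpha/2;\,n-1}\,\frac{s}{\sqrt{n}},
\end{equation*}
```

+where $t_{1-\alpha/2;\,n-1}$ denotes the respective quantile of the $t$-distribution with $n-1$ degrees of freedom, e.g., $\alpha = 0.05$ for a 95\% confidence interval.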
+
+On a given machine/device, we first perform an initial run and then the experimental runs. During the initial run, we execute the test suites on the target device to validate that all tests pass successfully. The measurements of this run are stored separately. To take the experimental measurements, we repeat the execution of the test suites 5 times (inter-experiment repetition) with a 5-second pause between two subsequent runs.
+
+Executing the test suites is realized as an ANT script, because collecting all dependencies for the standalone version of EASy-Producer is not trivial. For this task, we reuse the respective part of the ANT build mechanism from the continuous integration. ANT supports executing JUnit test suites based on a classpath constructed from the dependencies, so executing the test suites described in Section \ref{sectEvalSetupTreatments} is rather straightforward. The ramp-up tests, the intra-experiment and inter-experiment repetitions as well as the waiting time are configured in the ANT script and passed to the EASy-Producer test suites via environment parameters. For separating the initial and the experimental runs, the ANT script defines two specific tasks. In turn, the ANT script itself can be used as a build action in the continuous integration, i.e., for collecting performance readings on the continuous integration server, e.g., for detecting performance degradation.
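+On the Java side, a test suite can pick up such parameters from the process environment (set by ANT) with a fallback to system properties and defaults for local runs. The following is an illustrative sketch only; the parameter names are hypothetical, not the actual EASy-Producer API:

```java
// Hedged sketch: parameter names are illustrative assumptions, not the
// actual EASy-Producer names. ANT would set the environment variables
// when forking the JVM for a test suite.
public class RepetitionConfig {

    /** Reads an integer parameter from the environment, falling back to a
     *  system property of the same name, then to the given default. */
    static int readParam(String name, int dflt) {
        String v = System.getenv(name);            // set by ANT for the forked JVM
        if (v == null) {
            v = System.getProperty(name);          // fallback for local runs
        }
        return (v == null) ? dflt : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        int intra = readParam("EASY_INTRA_REPETITIONS", 10);     // per test case
        int inter = readParam("EASY_INTER_REPETITIONS", 5);      // per suite
        long pauseMs = readParam("EASY_PAUSE_SECONDS", 5) * 1000L; // between runs
        System.out.println(intra + " " + inter + " " + pauseMs);
    }
}
```

+With the defaults above (no environment variables set), the sketch reproduces the configuration described in the text: 10 intra-experiment repetitions, 5 inter-experiment repetitions, and a 5-second pause.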
+
+We execute the ANT script, i.e., the experiments, on 
+\begin{itemize}
+  \item a current office computer: a Dell Latitude 7490 laptop with an Intel Core i7-8650U processor (4 physical cores at 1.9 GHz), 32 GBytes RAM, Windows 10 Professional (10.0.17134) and OpenJDK 10.0.2 64 bit. Maven 3.2.3, ANT 1.10.3 with Maven-ANT-tasks 2.1.3, and Eclipse Oxygen.3a (release 4.7.3a) are installed.
+  \item a retired office computer used for the reasoner revisions: a Dell laptop \TBD{XXX} with an Intel Core i7-3367U processor (2 physical cores at 2.00 GHz), Windows 7 SP1 (6.1.7601) and Oracle JDK 1.8.0\_66 64 bit. Maven 3.2.3, ANT 1.10.3 with Maven-ANT-tasks 2.1.3, and Eclipse Mars.2 (release 4.5.2) are installed.
+  \item our continuous integration server, an Ubuntu Linux 16.04.5 LTS VM with 4 GBytes RAM and OpenJDK 8 64 bit. The measurement script is integrated as a manual build action that prevents any other continuous build action from running at the same time.
+  \item a Raspberry Pi 3 by vendor element14 hosting an 8 GB SanDisk class-4 SD card (one from \cite{KnocheEichelberger18}) with Raspbian Stretch Lite version November 2018, Linux kernel 4.14 and Oracle JDK 1.8.\TBD{65} ARM and, as an alternative for some experiments, Oracle JDK 1.8.0\_201 ARM.
+\end{itemize}
+For both Windows machines, we terminate all programs that are not required for the execution of the tests (leaving the virus scanner in operation, as usual during development).
 
 \subsubsection{Analysis}\label{sectEvalSetupAnalysis}
Index: /reasoner/reasoner.bib
===================================================================
--- /reasoner/reasoner.bib	(revision 247)
+++ /reasoner/reasoner.bib	(revision 248)
@@ -493,3 +493,41 @@
 }
 
+@inproceedings{KnocheEichelberger18,
+ author = {Knoche, Holger and Eichelberger, Holger},
+ title = {Using the Raspberry Pi and Docker for Replicable Performance Experiments: Experience Paper},
+ booktitle = {Proceedings of the ACM/SPEC International Conference on Performance Engineering},
+ series = {ICPE '18},
+ year = {2018},
+ isbn = {978-1-4503-5095-2},
+ location = {Berlin, Germany},
+ pages = {305--316},
+ numpages = {12},
+ url = {http://doi.acm.org/10.1145/3184407.3184431},
+ doi = {10.1145/3184407.3184431},
+ acmid = {3184431},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+ keywords = {Raspberry pi, performance benchmark, replicability, single-board computer},
+}
+
+@article{GeorgesBuytaertEeckhout07,
+ author = {Georges, Andy and Buytaert, Dries and Eeckhout, Lieven},
+ title = {{Statistically Rigorous Java Performance Evaluation}},
+ journal = {SIGPLAN Not.},
+ issue_date = {October 2007},
+ volume = {42},
+ number = {10},
+ month = oct,
+ year = {2007},
+ issn = {0362-1340},
+ pages = {57--76},
+ numpages = {20},
+ url = {http://doi.acm.org/10.1145/1297105.1297033},
+ doi = {10.1145/1297105.1297033},
+ acmid = {1297033},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+ keywords = {benchmarking, data analysis, java, methodology, statistics},
+}
+
 @Comment{jabref-meta: databaseType:bibtex;}
