Design of experiments in simulation

Pau Fonseca i Casas; pau@fib.upc.edu
- Simulation is usually carried out as a programming exercise.
- Inaccurate statistical methods are applied (the observations are not IID).
- Account for the time required to collect the data that the statistical techniques need, so that the objectives of the study can be met with guarantees.
How to compare different configurations:
- The comparisons must be as homogeneous as possible.

Study the effect that the values of the experimental variables (factors) have on the response (answer) variable:
- At a checkout: response variable: queue length; factors: number of cashiers, service time, time between arrivals.
Principles
Principles for developing a good design of experiments:
- Randomization: all the factors that the experimenter does not control are left to chance.
- Repetition of the experiment (replication): a good way to reduce the variability of the estimated response.
- Statistical homogeneity of the responses: to compare alternatives on the basis of their results, the experiments must have been run under homogeneous conditions. Factorial design helps to obtain this similarity between experiments.
Replications
Calculating the number of replications. Methods to perform the replications.
Calculating the variable of interest
Experimentation

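Let x be a variable of interest, observed as:

$$
\begin{matrix}
x_{11}, \dots, x_{1i}, \dots, x_{1m} \\
x_{21}, \dots, x_{2i}, \dots, x_{2m} \\
\vdots \\
x_{n1}, \dots, x_{ni}, \dots, x_{nm}
\end{matrix}
$$

n is the number of replications and $x_i$ is the value obtained from replication i.

Sample mean

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Sample variance

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$

Confidence interval
We need to know how far the true mean $\mu$ is from $\bar{X}$. Using Student's t-distribution with n-1 degrees of freedom:

$$\bar{X} \pm t_{1-\alpha/2,\,n-1} \frac{S}{\sqrt{n}}$$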
Student’s t-distribution
What is the correct n?
| Replication | Value from the model |
|---|---|
| 1 | 28.841 |
| 2 | 35.965 |
| 3 | 31.219 |
| 4 | 37.090 |
| 5 | 38.734 |
| 6 | 30.923 |
| 7 | 30.443 |
| 8 | 32.175 |
| 9 | 30.683 |
| 10 | 28.745 |
Calculating S and $\bar{X}$

$$\bar{X} = 32.4818 \qquad S = 3.5149$$

Calculating the half-width of the confidence interval

$$h = t_{1-\alpha/2,\,n-1} \frac{S}{\sqrt{n}}, \qquad t_{0.975,\,9} = 2.26, \qquad h = 2.512$$

Confidence interval:

$$(32.4818 - 2.512,\; 32.4818 + 2.512) = (29.9698,\; 34.9938)$$

The interpretation is that, with probability 0.95, the random interval (29.9698, 34.9938) includes the true value of the mean.
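The calculation above can be reproduced with a few lines of Python. This is only a sketch: the function name `confidence_interval` is made up for illustration, and the t quantile (2.26, as quoted in the example) is passed in by hand so that nothing beyond the standard library is needed.

```python
import math
import statistics

# Output of the ten replications of the example.
values = [28.841, 35.965, 31.219, 37.090, 38.734,
          30.923, 30.443, 32.175, 30.683, 28.745]

def confidence_interval(sample, t_quantile):
    """Return (mean, s, half_width) for a two-sided interval.

    t_quantile is t_{1-alpha/2, n-1}, read from a t table by the caller.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)              # divides by n - 1
    half_width = t_quantile * s / math.sqrt(n)
    return mean, s, half_width

mean, s, h = confidence_interval(values, t_quantile=2.26)
print(f"mean={mean:.4f}  s={s:.4f}  h={h:.3f}")
print(f"interval=({mean - h:.4f}, {mean + h:.4f})")
```

Up to rounding, the printed mean, S and h should agree with the values quoted in the example.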
More replications needed.

If we require the half-width of the interval to be within 5% of the sample mean, at a 95% confidence level, we need more replications: 0.05 · 32.4818 = 1.62, but the current half-width is 2.512.
Number of needed replications

$$n^* = n \left( \frac{h}{h^*} \right)^2$$

where:
- n = initial number of replications.
- n* = total number of replications needed.
- h = half-width of the confidence interval for the initial number of replications.
- h* = half-width of the confidence interval for all the replications (the desired half-width).

Calculating the number of replications

$$n^* = 10 \left( \frac{2.512}{1.62} \right)^2 = 24.04$$

which rounds up to 25 replications.
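As a small sketch, the same formula in Python. The helper name `replications_needed` is hypothetical, and rounding up to the next whole replication is an assumption that matches the 25 replications used in the example.

```python
import math

def replications_needed(n, h, h_target):
    """n* = n (h / h*)^2, rounded up to a whole number of replications."""
    return math.ceil(n * (h / h_target) ** 2)

n_star = replications_needed(n=10, h=2.512, h_target=1.62)
print(n_star)
```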
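More replications

| Replication | Performance measure |
|---|---|
| 11 | 33.020 |
| 12 | 29.472 |
| 13 | 27.693 |
| 14 | 31.803 |
| 15 | 30.604 |
| 16 | 33.227 |
| 17 | 28.085 |
| 18 | 35.910 |
| 19 | 30.729 |
| 20 | 30.844 |
| 21 | 32.420 |
| 22 | 39.040 |
| 23 | 32.341 |
| 24 | 34.310 |
| 25 | 28.418 |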
New mean and standard deviation

$$\bar{X} = 32.1094 \qquad S = 3.1903$$

New confidence interval

$$h = t_{1-\alpha/2,\,n-1} \frac{S}{\sqrt{n}} = 1.3144 < 1.62$$

In this case 25 replications are enough, but in general the process may have to be repeated.
Replications
Methods to execute the replications.
Kinds of simulation
- Finite simulations: simulations in which a condition, usually time, defines the end of the run.
- Non-finite simulations: simulations without such a condition.
Independent repetitions


Every repetition starts from the same initial state of the model, that is, with the same parameterization and behavior; only the random numbers used in the GAV change. Using different random number streams lets us test the system again and again under the different possible values of the variables that are not controlled (the random variables).
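A minimal sketch of independent repetitions in Python. The model here, `simulate_queue`, is a hypothetical stand-in (a toy single-server queue with made-up rates); the only point it illustrates is that every replication uses the same parameters and initial state and differs only in its seed.

```python
import random
import statistics

def simulate_queue(seed, n_customers=1000, arrival_rate=0.8, service_rate=1.0):
    """Toy single-server queue (hypothetical stand-in for the real model).

    Returns the mean waiting time in queue over n_customers, computed with
    Lindley's recursion W[k+1] = max(0, W[k] + S[k] - A[k+1]).
    """
    rng = random.Random(seed)        # private stream: only the seed changes
    wait, total = 0.0, 0.0           # every run starts with an empty system
    for _ in range(n_customers):
        total += wait
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers

# Same initial state and parameters; one distinct seed per replication.
results = [simulate_queue(seed) for seed in range(10)]
print(statistics.mean(results), statistics.stdev(results))
```

Giving each run its own `random.Random(seed)` object keeps the replications reproducible: rerunning with the same seed gives the same value.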
Batch means

- Execute one long simulation and then divide it into blocks (batches).
- We work with the mean value of the observations in each block.
- Each of these batch means is treated as an independent observation.
- It is desirable to determine how long each block must be to ensure the correctness of the experiment.
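The batching step could look like the following sketch. The function name `batch_means`, the optional `warmup` argument (to discard a loading period) and the made-up output series are assumptions for illustration only.

```python
import statistics

def batch_means(observations, n_batches, warmup=0):
    """Split one long run into n_batches equal blocks and return their means.

    Observations in the warm-up (loading) period are discarded first, and
    leftover observations that do not fill a whole batch are dropped.
    """
    data = observations[warmup:]
    size = len(data) // n_batches
    if size == 0:
        raise ValueError("not enough observations for that many batches")
    return [statistics.mean(data[i * size:(i + 1) * size])
            for i in range(n_batches)]

# Made-up output series, for illustration only.
series = [float(i % 7) for i in range(1050)]
means = batch_means(series, n_batches=10, warmup=50)
print(statistics.mean(means), statistics.stdev(means))
```

The batch means can then be fed to the same mean, variance and confidence interval formulas used for independent replications, provided the batches are long enough for their means to be roughly independent.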
Regenerative methods


If the variables observed during the run of the simulation model show some kind of cyclical restart, so that cycles can be assumed to exist in the life of the variable, each of these cycles can be treated as a replication.
This method is not always applicable: it depends on the existence of cycles in the variables. The cycles must also be short; if they are long, we obtain only a small number of replications.
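One way to picture this in code, as a sketch only. The regeneration state (an empty queue, value 0), the trace values and the ratio estimate computed at the end are assumptions chosen for illustration; they are not taken from the slides.

```python
def split_cycles(trace, regeneration_state=0):
    """Cut a trace into cycles that each start at the regeneration state."""
    cycles, current = [], None
    for x in trace:
        if x == regeneration_state:
            if current:                 # close the cycle in progress
                cycles.append(current)
            current = []                # and start a new one
        if current is not None:         # ignore data before the first restart
            current.append(x)
    return cycles                       # the unfinished last cycle is dropped

# Made-up queue-length trace; the system "restarts" whenever it empties.
trace = [2, 0, 1, 2, 1, 0, 1, 0, 0, 2, 3, 1, 0, 1]
cycles = split_cycles(trace)
totals = [sum(c) for c in cycles]       # accumulated queue length per cycle
lengths = [len(c) for c in cycles]      # duration of each cycle
estimate = sum(totals) / sum(lengths)   # ratio estimate of the mean queue length
print(cycles, estimate)
```

With long cycles the list `cycles` stays short, which is the limitation mentioned above.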
Applicability
Finite simulations No finite simulations Loading period needed Independent repetitions Independent repetitions
Loading period unneeded
Independent repetitions erasing the loading period/ Batch means
Batch means
Experimental design
Factorial designs Variance reduction techniques
Non-factorial designs

Fix two factors and vary the levels of a third until a good solution is found. With that level fixed, start exploring the other factors.
Effect of A: A1B0 - A0B0. Effect of B: A0B1 - A0B0.
(Figure: design points A0B0, A1B0 and A0B1.)
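Factorial designs
Take the interactions into consideration.

$$\text{Effect of } A = \frac{A_1B_0 + A_1B_1}{2} - \frac{A_0B_0 + A_0B_1}{2}$$

$$\text{Effect of } B = \frac{A_1B_1 + A_0B_1}{2} - \frac{A_0B_0 + A_1B_0}{2}$$

(Figure: design points A0B0, A1B0, A0B1 and A1B1.)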
Factorial designs
- k factors are controlled.
- l levels for each factor ($l_i$ levels for factor i).
- $l_1 \cdot l_2 \cdots l_k$ experiments.
- The simplest factorial design is the $2^k$ design, with $l_i = 2$ for i = 1, ..., k.
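To make the experiment count concrete, here is a small sketch that enumerates a full factorial design with `itertools.product`. The helper name `full_factorial` is hypothetical, and the level values for the checkout factors are made up.

```python
from itertools import product

def full_factorial(levels):
    """levels maps factor name -> list of levels; returns every combination."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

# Factors of the checkout example; the level values are made up.
design = full_factorial({
    "cashiers": [1, 2, 3],
    "service_time": [2.0, 3.0],
    "interarrival_time": [1.0, 1.5],
})
print(len(design))   # l1 * l2 * l3 = 3 * 2 * 2 experiments
```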
$2^k$ factorial designs
Advantages:
- The trend can be determined with an economical number of experiments (smoothness).
- They can evolve into composite designs (local exploration).
- They are the basis for fractional factorial designs (a quick view of multiple factors).
- Easy analysis and interpretation.
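$2^k$ matrix

| Experiment | Factor 1 | Factor 2 | ... | Factor k | Response |
|---|---|---|---|---|---|
| 1 | - | - | ... | - | R1 |
| 2 | + | - | ... | - | R2 |
| 3 | - | + | ... | - | R3 |
| 4 | + | + | ... | - | R4 |
| 5 | - | - | ... | - | R5 |
| 6 | + | - | ... | - | R6 |
| ... | ... | ... | ... | ... | ... |
| $2^k$ | + | + | ... | + | $R_{2^k}$ |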
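$2^k$ matrix example

| Experiment | A | B | C | Response |
|---|---|---|---|---|
| 1 | - | - | - | 60 |
| 2 | + | - | - | 72 |
| 3 | - | + | - | 54 |
| 4 | + | + | - | 68 |
| 5 | - | - | + | 52 |
| 6 | + | - | + | 83 |
| 7 | - | + | + | 45 |
| 8 | + | + | + | 80 |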
Interactions for 2 and 3 factors

$$AC = \frac{y_1 + y_3 + y_6 + y_8}{4} - \frac{y_2 + y_4 + y_5 + y_7}{4} = 10$$

$$ABC = \frac{y_2 + y_3 + y_5 + y_8}{4} - \frac{y_1 + y_4 + y_6 + y_7}{4} = 0.5$$
Example of calculating the effects

$$\text{Main effect} = \bar{y}_{+} - \bar{y}_{-}$$

$$A = \frac{72 + 68 + 83 + 80}{4} - \frac{60 + 54 + 52 + 45}{4} = 23$$

$$B = \frac{54 + 68 + 45 + 80}{4} - \frac{60 + 72 + 52 + 83}{4} = -5$$

$$C = \frac{52 + 83 + 45 + 80}{4} - \frac{60 + 72 + 54 + 68}{4} = 1.5$$
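The same contrasts can be computed directly from the sign columns of the design matrix. This is a sketch under the assumption that the eight responses are listed in standard order (A changing fastest, then B, then C), which is consistent with the formulas of the example; the function name `effect` is made up.

```python
from itertools import product
from math import prod

# Responses of the 2^3 example in standard order (A changes fastest).
y = [60, 72, 54, 68, 52, 83, 45, 80]

# Rows of (A, B, C) signs in standard order: product() varies the last
# position fastest, so it yields (C, B, A) and each tuple is reversed.
design = [tuple(reversed(row)) for row in product((-1, 1), repeat=3)]

def effect(columns):
    """Mean response where the sign product is + minus mean where it is -."""
    signs = [prod(row[c] for c in columns) for row in design]
    return sum(s * r for s, r in zip(signs, y)) / (len(y) / 2)

A, B, C = 0, 1, 2
for name, cols in [("A", [A]), ("B", [B]), ("C", [C]),
                   ("AC", [A, C]), ("ABC", [A, B, C])]:
    print(name, effect(cols))
```

An interaction is handled exactly like a main effect: its sign column is the product of the sign columns of the factors involved.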
Frank Yates

A pioneer of 20th-century operations research.
Yates algorithm
A table makes the calculation of the effects and interactions systematic:
- Write the responses in a column, in the standard order of the experimental design matrix.
- Add as many auxiliary columns as there are factors. Each auxiliary column is built from the previous one by taking its values in consecutive pairs (X, Y): the first half of the new column holds the sums X+Y and the second half holds the differences Y-X.
- Add a final column that divides the first value of the last auxiliary column by the number of experimental conditions E, and the remaining values by E/2.

In the final column the first value is the mean of the responses and the remaining values are the effects. Each value is matched to its effect by locating the + signs in the corresponding row of the design matrix: a row with a single + in column B gives the main effect of B, a row with + in A and C gives the AC interaction, and so on.

For three factors (E = 8), the divisors and effects of the final column are:

| Row | Divisor | Effect |
|---|---|---|
| 1 | 8 | Mean |
| 2 | 4 | A |
| 3 | 4 | B |
| 4 | 4 | AB |
| 5 | 4 | C |
| 6 | 4 | AC |
| 7 | 4 | BC |
| 8 | 4 | ABC |
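Yates algorithm example

| Exp. | A | B | C | Resp. | (1) | (2) | (3) | Div. | Effect | Id |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | - | - | - | 60 | 132 | 254 | 514 | 8 | 64.25 | Mean |
| 2 | + | - | - | 72 | 122 | 260 | 92 | 4 | 23.0 | A |
| 3 | - | + | - | 54 | 135 | 26 | -20 | 4 | -5.0 | B |
| 4 | + | + | - | 68 | 125 | 66 | 6 | 4 | 1.5 | AB |
| 5 | - | - | + | 52 | 12 | -10 | 6 | 4 | 1.5 | C |
| 6 | + | - | + | 83 | 14 | -10 | 40 | 4 | 10.0 | AC |
| 7 | - | + | + | 45 | 31 | 2 | 0 | 4 | 0.0 | BC |
| 8 | + | + | + | 80 | 35 | 4 | 2 | 4 | 0.5 | ABC |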
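A compact sketch of the Yates algorithm in Python, assuming the responses are given in standard order. The function name `yates` is chosen here for illustration; each pass of the loop builds one auxiliary column (pairwise sums first, then pairwise differences).

```python
def yates(responses):
    """Yates algorithm for a 2^k design given in standard order.

    Returns [mean, A, B, AB, C, AC, BC, ABC, ...].
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n < 2 or 2 ** k != n:
        raise ValueError("need 2^k responses, k >= 1")
    col = list(responses)
    for _ in range(k):                       # one auxiliary column per factor
        pairs = list(zip(col[0::2], col[1::2]))
        col = [x + y for x, y in pairs] + [y - x for x, y in pairs]
    # First entry divided by E, the others by E / 2.
    return [col[0] / n] + [v / (n / 2) for v in col[1:]]

effects = yates([60, 72, 54, 68, 52, 83, 45, 80])
print(effects)
```

For the eight responses of the example this should give the mean 64.25 followed by the effects A, B, AB, C, AC, BC and ABC. The same function accepts 16 responses for a four-factor design.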
Wooden industry example

A wood-processing plant wants to reduce its costs. Four variables are considered:
- Change the lighting to natural light (open the ceiling).
- Increase the speed of the machines.
- Increase the use of lubricant.
- Increase the working space.
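Design matrix and observations. Factors: increase the working space; increase the use of lubricant; natural light; increase the speed of the machines.

| Comb. | 1 | 2 | 3 | 4 | Obs. |
|---|---|---|---|---|---|
| (1) | - | - | - | - | 71 |
| a | + | - | - | - | 61 |
| b | - | + | - | - | 90 |
| ab | + | + | - | - | 82 |
| c | - | - | + | - | 68 |
| ac | + | - | + | - | 61 |
| bc | - | + | + | - | 87 |
| abc | + | + | + | - | 80 |
| d | - | - | - | + | 61 |
| ad | + | - | - | + | 50 |
| bd | - | + | - | + | 89 |
| abd | + | + | - | + | 83 |
| cd | - | - | + | + | 59 |
| acd | + | - | + | + | 51 |
| bcd | - | + | + | + | 85 |
| abcd | + | + | + | + | 78 |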
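Yates algorithm applied to the observations:

| Comb. | Obs. | 1 | 2 | 3 | 4 | Effect | Description |
|---|---|---|---|---|---|---|---|
| (1) | 71 | 132 | 304 | 600 | 1156 | 72.25 | Mean |
| a | 61 | 172 | 296 | 556 | -64 | -8 | A |
| b | 90 | 129 | 283 | -32 | 192 | 24 | B |
| ab | 82 | 167 | 273 | -32 | 8 | 1 | AB |
| c | 68 | 111 | -18 | 78 | -18 | -2.25 | C |
| ac | 61 | 172 | -14 | 114 | 6 | 0.75 | AC |
| bc | 87 | 110 | -17 | 2 | -10 | -1.25 | BC |
| abc | 80 | 163 | -15 | 6 | -6 | -0.75 | ABC |
| d | 61 | -10 | 40 | -8 | -44 | -5.5 | D |
| ad | 50 | -8 | 38 | -10 | 0 | 0 | AD |
| bd | 89 | -7 | 61 | 4 | 36 | 4.5 | BD |
| abd | 83 | -7 | 53 | 2 | 4 | 0.5 | ABD |
| cd | 59 | -11 | 2 | -2 | -2 | -0.25 | CD |
| acd | 51 | -6 | 0 | -8 | -2 | -0.25 | ACD |
| bcd | 85 | -8 | 5 | -2 | -6 | -0.75 | BCD |
| abcd | 78 | -7 | 1 | -4 | -2 | -0.25 | ABCD |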
Variance reduction techniques
Reduce the number of replications
Motivation
- The aim is to reduce the variability that the use of random number generators (RNG) introduces in the response variable.
- The estimate of a given response variable is represented by its confidence interval, which should be as narrow as possible:

$$\left( \bar{x} - k \frac{s}{\sqrt{n}},\; \bar{x} + k \frac{s}{\sqrt{n}} \right)$$

- Increasing n, the number of observations, obviously decreases the standard error $s / \sqrt{n}$.
- Variance reduction techniques try to reduce this variability without increasing the number of observations.
Common random numbers
- Use the same random number stream for the different configurations.
- The shared stream reproduces "identical conditions" for both configurations.
- A mechanism to synchronize the streams is needed.
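A sketch of how the synchronization might be arranged. The toy queue `simulate`, its rates and the seed offsets are all assumptions for illustration. Arrivals and services draw from separate streams, so customer k sees the same arrival gap and the same underlying uniform for its service time in both configurations.

```python
import random
import statistics

def simulate(service_rate, seed, n_customers=2000, arrival_rate=0.8):
    """Toy single-server queue; returns the mean wait (hypothetical model)."""
    arrivals = random.Random(seed)            # stream 1: interarrival times
    services = random.Random(seed + 10_000)   # stream 2: service times
    wait, total = 0.0, 0.0
    for _ in range(n_customers):
        total += wait
        service = services.expovariate(service_rate)
        wait = max(0.0, wait + service - arrivals.expovariate(arrival_rate))
    return total / n_customers

seeds = range(20)
# Common random numbers: both configurations reuse the same seeds.
crn = [simulate(1.0, s) - simulate(1.1, s) for s in seeds]
# Independent streams: the second configuration gets seeds of its own.
ind = [simulate(1.0, s) - simulate(1.1, s + 500) for s in seeds]
print(statistics.stdev(crn), statistics.stdev(ind))
```

Comparing the two printed standard deviations shows how much of the noise in the difference between configurations is removed by sharing the streams.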
Antithetic variables



Use the antithetic values of the random number stream. In the first run the random numbers used are (a, b, c, ...) ∈ [0,1). In the second run we use their antithetic values, that is, (1-a, 1-b, 1-c, ...) ∈ (0,1]. A method to synchronize both streams is needed.
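A minimal sketch of an antithetic pair. The response used here (the average of exponential service times generated by inversion) and all function names are assumptions for illustration; the point is that the second run consumes 1-u exactly where the first run consumed u.

```python
import math
import random
import statistics

def service_time(u, rate=1.0):
    """Exponential variate by inversion; u is clamped so log(0) cannot occur."""
    return -math.log(1.0 - min(u, 1.0 - 2.0 ** -53)) / rate

def antithetic_pair(seed, n=200):
    """Average of two runs of a toy response driven by u and by 1 - u."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]                          # (a, b, c, ...)
    first = statistics.mean(service_time(x) for x in u)
    second = statistics.mean(service_time(1.0 - x) for x in u)    # (1-a, 1-b, ...)
    return (first + second) / 2       # the pair counts as one observation

def independent_pair(seed, n=200):
    """Same budget of 2n random numbers, with no antithetic coupling."""
    rng = random.Random(seed)
    return statistics.mean(service_time(rng.random()) for _ in range(2 * n))

anti = [antithetic_pair(seed) for seed in range(50)]
plain = [independent_pair(seed) for seed in range(50)]
print(statistics.stdev(anti), statistics.stdev(plain))
```

Because a large service time in one run is paired with a small one in the other, the two runs are negatively correlated and their average varies less than two independent runs would.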
Control variables

Simulation lets us observe how the system evolves while the experiment runs. To a certain degree, this makes it possible to compare the values of the response variables with the observed values, and a correction can be added to reduce the difference.
Analysis of the results in simulation
Comparison of two configurations of the system. Equal variance test.
Comparison of two configurations with equal variances.
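We define the hypothesis test:

$$H_0: \mu_A = \mu_B \qquad H_1: \mu_A > \mu_B$$

Thanks to the central limit theorem we obtain (the second parameter is the standard deviation):

$$\bar{y}_A \sim N\!\left(\mu_A, \frac{\sigma_A}{\sqrt{n_A}}\right) \qquad \bar{y}_B \sim N\!\left(\mu_B, \frac{\sigma_B}{\sqrt{n_B}}\right)$$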
We can deduce that:

$$\bar{y}_A - \bar{y}_B \sim N\!\left(\mu_A - \mu_B, \sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}\right)$$

$$\frac{(\bar{y}_A - \bar{y}_B) - (\mu_A - \mu_B)}{\sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}}} \sim N(0, 1)$$
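We define the test statistic and calculate s, the pooled sample standard deviation ($s^2$ is the common sample variance):

$$\frac{(\bar{y}_A - \bar{y}_B) - (\mu_A - \mu_B)}{s \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}} \sim t_n, \qquad s^2 = \frac{(n_A - 1) s_A^2 + (n_B - 1) s_B^2}{n_A + n_B - 2}$$

where $n = n_A + n_B - 2$ is the number of degrees of freedom.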
The test is defined as follows:

$$\frac{\bar{y}_A - \bar{y}_B}{s \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}} > t_{1-\alpha,\,n}$$

We reject H0 if this holds.
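Example

| Replication | Performance measure for A | Performance measure for B |
|---|---|---|
| 1 | 24.3 | 24.4 |
| 2 | 25.6 | 21.5 |
| 3 | 26.7 | 25.1 |
| 4 | 22.7 | 22.8 |
| 5 | 24.8 | 25.2 |
| 6 | 23.8 | 23.5 |
| 7 | 25.9 | 22.2 |
| 8 | 26.4 | 23.5 |
| 9 | 25.8 | 23.3 |
| 10 | 25.4 | 24.7 |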
Sample means: $\bar{y}_A = 25.14$; $\bar{y}_B = 23.62$.

$$H_0: \mu_A = \mu_B \qquad H_1: \mu_A > \mu_B$$
The sample standard deviations are $s_A = 1.242$ and $s_B = 1.237$.

$$\frac{25.14 - 23.62}{1.24 \sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = 2.74 > t_{0.05,\,18} = 1.734$$

Reject H0.
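The example can be checked with a short script. This is a sketch: the function name `pooled_t` is made up, and the critical value 1.734 is the one quoted in the example, entered by hand rather than computed.

```python
import math
import statistics

a = [24.3, 25.6, 26.7, 22.7, 24.8, 23.8, 25.9, 26.4, 25.8, 25.4]
b = [24.4, 21.5, 25.1, 22.8, 25.2, 23.5, 22.2, 23.5, 23.3, 24.7]

def pooled_t(sample_a, sample_b):
    """t statistic for H0: mu_A = mu_B, assuming equal variances."""
    na, nb = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Pooled variance: each sample weighted by its degrees of freedom.
    s2 = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / (
        math.sqrt(s2) * math.sqrt(1 / na + 1 / nb))
    return t, na + nb - 2

t, dof = pooled_t(a, b)
T_CRITICAL = 1.734        # one-sided 5% point of t with 18 degrees of freedom
print(f"t={t:.2f}, dof={dof}, reject H0: {t > T_CRITICAL}")
```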
Two configurations comparison

If we cannot assume equal variances:

$$t' = \frac{(\bar{y}_A - \bar{y}_B) - (\mu_A - \mu_B)}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$
- If $n_A = n_B = n$, the significance level is determined using a Student's t-distribution with n-1 degrees of freedom as the reference distribution.
- If $n_A \neq n_B$, the calculated value of t' gives two different significance values, $p_A$ and $p_B$, from the Student's t-distributions with $n_A - 1$ and $n_B - 1$ degrees of freedom respectively.
The significance level of the test is:

$$p = \frac{\omega_A p_A + \omega_B p_B}{\omega_A + \omega_B}$$

with:

$$\omega_A = \frac{S_A^2}{n_A}, \qquad \omega_B = \frac{S_B^2}{n_B}$$
Equal variance test

Hypothesis test:

$$H_0: \sigma_A^2 = \sigma_B^2 \qquad H_1: \sigma_A^2 \neq \sigma_B^2$$

$$\frac{S_A^2}{S_B^2} \sim F_{n,m}$$

where F is Snedecor's F distribution, with $n = n_A - 1$ and $m = n_B - 1$.
Example
$S_A^2 = 1.54$, $S_B^2 = 2.18$

$$\frac{S_B^2}{S_A^2} = \frac{2.18}{1.54} = 1.42 < F_{0.05,\,9,\,9} = 3.18$$

Accept H0.
Example
$S_A^2 = 1.54$, $S_B^2 = 16.3$

$$\frac{S_B^2}{S_A^2} = \frac{16.3}{1.54} = 10.58 > F_{0.05,\,9,\,9} = 3.18$$

Reject H0.
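Both examples follow the same recipe, sketched below. The function name `variance_ratio_test` is hypothetical; placing the larger variance in the numerator mirrors what the two examples do, and the critical value 3.18 is the one quoted there, entered by hand.

```python
def variance_ratio_test(var_a, var_b, f_critical):
    """Return (F, reject_H0), with the larger variance in the numerator."""
    f = max(var_a, var_b) / min(var_a, var_b)
    return f, f > f_critical

F_CRITICAL = 3.18          # F_{0.05, 9, 9} as quoted in the examples
first = variance_ratio_test(1.54, 2.18, F_CRITICAL)
second = variance_ratio_test(1.54, 16.3, F_CRITICAL)
print(first, second)
```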

Published under a Creative Commons License By attribution, non-commercial, non-derivative

Pau Fonseca i Casas
Department of Statistics and Operations Research

Universitat Politècnica de Catalunya - BarcelonaTECH
North Campus - C5218 Room
Barcelona, 08034, SPAIN

Tel. (+34) 93 4017035
Fax. (+34) 93 4015855
