Preface
This series aims to give electrical engineers tools for gaining confidence in the performance and reliability of their designs. The focus is on applying statistical analysis to empirical results (i.e., measurements and data sets).
Introduction
This article demonstrates how to use analysis of variance (ANOVA) to analyze the results of a design of experiment (DOE).
If you are not familiar with statistics or need a refresher, I recommend Schaum's Statistics. It provides a good overview of the material, with lots of examples and without a lot of time spent on proofs.
Concepts
Analysis of Variance: A method for analyzing the differences between group means. It is used to explain observations when multiple variables are involved.
Design of Experiment: I won't go into great detail here, but a DOE is a carefully constructed experiment consisting of a set of fixed inputs (environmental) and variable inputs (design variables). The output is measured over the range of the input variables. Statistical analysis is used to determine the effects of the variable inputs on the output.
F-Statistic: The F-statistic can be thought of as the ratio of the "explainable variance" to the "unexplainable variance". A high F-statistic therefore means that the input variable accounts for much of the variation in the output variable. If the output varies considerably but not with respect to the input variable, the F-statistic will be low, which means the output variation is explained by something else, such as a different input variable. In other words: you haven't root caused your circuit yet.
Design of Experiment Setup
In this experiment we are examining the effects of two capacitors and one resistor on the area under the curve of some scope captures.
All the testing was completed on one board, on the same lab bench, on the same day. All equipment was calibrated, and the probe attachment was verified not to load the circuit significantly. Software and register settings were identical for all tests. Each output value was measured several hundred or several thousand times and averaged. This should eliminate most of the variance from the fixed variables, so any variation in the output should be attributable to the component value changes only.
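As a rough sketch of how such a set of runs could be laid out before going to the bench, base R's expand.grid() can enumerate every combination of component values. The levels below are the ones visible in the first nine rows of the str() output later in this article, which look like a full 3 x 3 grid of cap1 and res1 with cap2 held fixed. The name `design` is only an illustration, and this is not necessarily how the original run list was built.

```r
# Sketch: enumerate a full-factorial grid of component values.
# Levels are copied from the first nine rows of the data set shown later;
# the object name `design` is hypothetical.
design <- expand.grid(
  res1 = c(10, 33, 200),     # listed first so it varies fastest
  cap1 = c(100, 360, 1000),
  cap2 = 10000               # held fixed for this block of runs
)

# Reorder the columns to match the layout of the data file.
design <- design[, c("cap1", "cap2", "res1")]
print(design)
```

Each row is one configuration to build and measure. The measured output can then be added as a fourth column.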
Importing Your Data Set
I will use the R software package for statistical analysis. It is cross-platform, free, and open source. There are several good Excel plugins, and if you have access to SAS, by all means use it.
The first row of your data set should contain the title of each column.
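To illustrate that layout, here is a minimal sketch that feeds read.csv() a few lines of text instead of a file. The two data rows are copied from the output shown in this article, while the name `csv_text` and the comma-separated layout are assumptions about what the original file looks like. read.csv() treats the first row as column names by default (header = TRUE).

```r
# Sketch: a tiny CSV with a header row, parsed from a string instead of a file.
# `csv_text` and `doe_sample` are hypothetical names used for illustration.
csv_text <- "cap1,cap2,res1,area_under_curve
100,100000,200,5.01e-10
1000,100000,10,7.83e-10"

# header = TRUE is the default for read.csv(), so the first row names the columns.
doe_sample <- read.csv(text = csv_text)
str(doe_sample)
```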
> doe<-read.csv(file.choose())
> dim(doe)
[1] 17 4
> tail(doe)
   cap1   cap2 res1 area_under_curve
12  100 100000  200         5.01e-10
13 1000 100000   10         7.83e-10
14 1000 100000   33         7.60e-10
15 1000 100000  200         5.96e-10
16 4700 100000  200         5.38e-10
17 4700 100000  540         4.45e-10
> str(doe)
'data.frame': 17 obs. of 4 variables:
$ cap1 : int 100 100 100 360 360 360 1000 1000 1000 100 ...
$ cap2 : int 10000 10000 10000 10000 10000 10000 10000 10000 10000 100000 ...
$ res1 : int 10 33 200 10 33 200 10 33 200 10 ...
$ area_under_curve: num 7.56e-10 7.19e-10 5.14e-10 7.75e-10 7.71e-10 5.65e-10 7.99e-10 7.62e-10 5.88e-10 7.13e-10 ...
> attach(doe)
Running the Analysis of Variance
Now that the test results and input variables have been imported, let's perform the ANOVA and see whether any of the input variables had an effect on the output.
> doeaov.cap1=aov(area_under_curve~cap1)
> summary(doeaov.cap1)
            Df    Sum Sq   Mean Sq F value Pr(>F)
cap1         1 4.859e-20 4.859e-20    4.18 0.0589 .
Residuals   15 1.743e-19 1.162e-20
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> doeaov.cap2=aov(area_under_curve~cap2)
> summary(doeaov.cap2)
            Df    Sum Sq   Mean Sq F value Pr(>F)
cap2         1 1.815e-20 1.815e-20   1.329  0.267
Residuals   15 2.048e-19 1.365e-20
> doeaov.res1=aov(area_under_curve~res1)
> summary(doeaov.res1)
            Df    Sum Sq   Mean Sq F value   Pr(>F)
res1         1 1.780e-19  1.78e-19   59.37 1.36e-06 ***
Residuals   15 4.497e-20  3.00e-21
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We can see here that the F-statistics for both capacitors are fairly small, while the F-statistic for the resistor is quite large. How you use this depends on your goal: if you want to adjust the output, the resistor is the component to change rather than either capacitor; if you want to limit output variation, the resistor is the component whose value should be tightly controlled.
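One detail worth noting in the tables above is that every input has Df = 1. Because the columns were read as integers, aov() fits a single straight-line trend per input rather than comparing a separate mean for each component value. The sketch below shows the difference using factor(). It uses only the 16 rows whose values are printed in this article (row 11 is not shown), so its numbers will not match the tables exactly. The names `runs`, `fit_numeric` and `fit_factor` are illustrative, and the models are given `data = runs` instead of relying on attach().

```r
# Sketch: numeric predictor (one slope, Df = 1) vs. factor (one mean per level).
# Only the 16 rows visible in the article are used; row 11 is not shown there.
runs <- data.frame(
  res1 = c(10, 33, 200, 10, 33, 200, 10, 33, 200, 10,
           200, 10, 33, 200, 200, 540),
  area_under_curve = c(7.56e-10, 7.19e-10, 5.14e-10, 7.75e-10, 7.71e-10,
                       5.65e-10, 7.99e-10, 7.62e-10, 5.88e-10, 7.13e-10,
                       5.01e-10, 7.83e-10, 7.60e-10, 5.96e-10, 5.38e-10,
                       4.45e-10)
)

# Treat res1 as a number: a single linear trend.
fit_numeric <- aov(area_under_curve ~ res1, data = runs)

# Treat res1 as a category: one group mean per distinct resistor value.
fit_factor <- aov(area_under_curve ~ factor(res1), data = runs)

print(summary(fit_numeric))
print(summary(fit_factor))
```

With four distinct resistor values in this subset, the factor version spends three degrees of freedom on res1 instead of one. Which form is appropriate depends on whether you expect the output to change roughly linearly with the component value.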
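R also reports the significance value (the Pr(>F) column, i.e. the p-value) for hypothesis testing and/or confidence levels.
A slightly different ANOVA can be run by combining all the input variables with the * operator, which in an R model formula requests each variable's main effect plus all of their interactions rather than an arithmetic product. If we do that, the output looks like this: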
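To connect these columns back to the definition of the F-statistic, the sketch below recomputes the F value and p-value for cap1 by hand from the mean squares in the first table above. The variable names are illustrative, and the inputs are the rounded values as printed, so the results agree with the table only to the printed precision.

```r
# Sketch: recompute F and Pr(>F) for cap1 from the printed mean squares.
ms_cap1  <- 4.859e-20   # "Mean Sq" for cap1 (explained variance)
ms_resid <- 1.162e-20   # "Mean Sq" for Residuals (unexplained variance)
df_cap1  <- 1           # "Df" for cap1
df_resid <- 15          # "Df" for Residuals

# F is the ratio of explained to unexplained mean square.
f_value <- ms_cap1 / ms_resid

# Pr(>F) is the upper-tail probability of the F distribution at that value.
p_value <- pf(f_value, df_cap1, df_resid, lower.tail = FALSE)

print(f_value)
print(p_value)
```

A p-value just above 0.05 is why cap1 gets only the "." code in that table rather than a "*".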
> doeaov.combined=aov(area_under_curve~(cap1*cap2*res1))
> summary(doeaov.combined)
               Df    Sum Sq   Mean Sq F value   Pr(>F)
cap1            1 4.859e-20 4.859e-20  30.158 0.000384 ***
cap2            1 2.770e-21 2.770e-21   1.717 0.222563
res1            1 1.350e-19 1.350e-19  83.768 7.44e-06 ***
cap1:cap2       1 1.290e-21 1.290e-21   0.803 0.393421
cap1:res1       1 2.058e-20 2.058e-20  12.776 0.005984 **
cap2:res1       1 2.300e-22 2.300e-22   0.145 0.712142
cap1:cap2:res1  1 0.000e+00 0.000e+00   0.002 0.964582
Residuals       9 1.450e-20 1.610e-21
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
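Here we can see again that the resistor is the primary cause of output variation, but it turns out that one of the capacitors (cap1) also has some effect on the output, although not as much. The sum of squares for cap1 is the same as in the single-variable run (4.859e-20); its F-statistic rises from 4.18 to 30.158 because the other terms now account for variance that was previously left in the residuals. The cap1:res1 interaction is significant as well, meaning the effect of cap1 depends on the value of res1.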
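The rows in the combined table come from how R expands the * operator in a model formula. As a small sketch, terms() can list the expansion without fitting anything. The names `combined_formula` and `term_labels` are just illustrations.

```r
# Sketch: show which model terms cap1 * cap2 * res1 expands to.
combined_formula <- area_under_curve ~ cap1 * cap2 * res1

# terms() parses the formula; "term.labels" holds the expanded term names.
term_labels <- attr(terms(combined_formula), "term.labels")
print(term_labels)
```

The seven labels are the three main effects, the three two-way interactions and the one three-way interaction, matching the rows above the Residuals line.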
Next Up
In the next article, we will examine statistical process control in a manufacturing environment.