Preface
This series gives electrical engineers tools for gaining confidence in the performance and reliability of their designs. The focus is on applying statistical analysis to empirical results (i.e., measurements and data sets).
Introduction
This article shows, step by step, how to test whether one categorical variable depends on a second one. The method applies when your data are counts presented in table form.
Concepts
Chi-Squared Distribution: It is the distribution of a sum of the squares of a number of independent standard normal random variables. The number of squared terms is the degrees of freedom. For more information see the Wikipedia article.
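The definition can be checked numerically. The following is a minimal sketch, not part of the original analysis: the seed, the degrees of freedom k and the sample size n are arbitrary choices made for illustration. It builds Chi-squared values by summing squared standard normal draws and compares them with R's built-in Chi-squared quantile function.

```r
# Simulate Chi-squared values as sums of squared standard normal draws.
set.seed(1)        # arbitrary seed so the run is repeatable
k <- 3             # degrees of freedom (number of squared terms), illustrative
n <- 100000        # number of simulated values, illustrative

z <- matrix(rnorm(n * k), nrow = n, ncol = k)  # n rows of k standard normals
x <- rowSums(z^2)                              # one Chi-squared draw per row

mean(x)                 # should be close to k
var(x)                  # should be close to 2 * k
quantile(x, 0.95)       # empirical 95th percentile
qchisq(0.95, df = k)    # theoretical 95th percentile for comparison
```

The simulated mean and variance should land near k and 2k, which are the theoretical mean and variance of a Chi-squared distribution with k degrees of freedom.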
Contingency Test: We compute the Chi-squared statistic for a table of count data (a contingency table). The null hypothesis is always that the two variables are independent. The null hypothesis is rejected if the p-value of the Chi-squared test is less than the chosen level of significance.
Importing Your Data Set
I will use the R software package for statistical analysis. It is cross-platform, free and open source. Several good Excel plugins also exist, and if you have access to SAS, by all means use it.
We will test whether or not field failures are related to the use of two different designs. Since our data set is small we will enter our data manually:
> B=matrix(c(20,30,80,70),nrow=2,ncol=2)
> B
     [,1] [,2]
[1,]   20   80
[2,]   30   70
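Each row is a circuit and each column an outcome: row 1 is "circuit A" and row 2 is "circuit B", column 1 counts the units that failed in the field and column 2 the units that did not. 100 samples of each circuit were taken, so "circuit A" had 20 failures and "circuit B" had 30.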
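Because matrix() fills column by column, it is easy to lose track of which cell is which. As a sketch, the same table can be entered with row and column labels. The label names used here ("A", "B", "failed", "ok") are my own, chosen only for illustration.

```r
# Same counts as above, with labels so each cell is self-describing.
B <- matrix(c(20, 30, 80, 70), nrow = 2, ncol = 2,
            dimnames = list(circuit = c("A", "B"),
                            outcome = c("failed", "ok")))
B

# Sanity check: every circuit should total 100 sampled units.
rowSums(B)

# Observed failure rate per circuit (row proportions, "failed" column).
failure_rate <- prop.table(B, margin = 1)[, "failed"]
failure_rate   # A: 0.2, B: 0.3
```

Labels do not change the test result. They only make the printed table and later output easier to read.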
Test Setup
H0: Field failures are independent of the circuit used. In other words, there is no association between the circuit used and failures in the field.
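To make H0 concrete, it helps to look at the counts we would expect if it were true. If failures are independent of the circuit, each expected cell count is (row total x column total) / grand total. A minimal sketch of that calculation for our table:

```r
# Observed counts: rows are circuits A and B, columns are failed / not failed.
B <- matrix(c(20, 30, 80, 70), nrow = 2, ncol = 2)

# Expected counts under independence:
# (row total * column total) / grand total for every cell.
expected <- outer(rowSums(B), colSums(B)) / sum(B)
expected   # 25 failures and 75 non-failures expected for each circuit

# How far each observed count is from its expected value.
B - expected
```

Both circuits would be expected to show 25 failures out of 100. The Chi-squared test measures whether the observed deviations of 5 units are larger than chance alone would explain.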
Calculate Chi-Square
Calculate the Chi-squared statistic for our data and compare the resulting p-value to a level of significance of 0.05:
> chisq.test(B)
Pearson's Chi-squared test with Yates' continuity correction
data: B
X-squared = 2.16, df = 1, p-value = 0.1416
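As you can see, the p-value is 0.1416. This is larger than our level of significance of 0.05, so we fail to reject the null hypothesis: the data do not provide sufficient evidence of an association between failures in the field and the circuit used. Note that this does not prove the two are independent. It only means that a difference of this size could plausibly arise by chance.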
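For readers who want to see where these numbers come from, here is a sketch that reproduces the statistic and p-value by hand. It assumes the standard form of Yates' continuity correction for a 2x2 table, in which each absolute deviation is reduced by 0.5 before squaring. The variable names are my own.

```r
B <- matrix(c(20, 30, 80, 70), nrow = 2, ncol = 2)

# Expected counts if failures were independent of the circuit.
expected <- outer(rowSums(B), colSums(B)) / sum(B)

# Yates' continuity correction: shrink each |observed - expected| by 0.5.
stat <- sum((abs(B - expected) - 0.5)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
df <- (nrow(B) - 1) * (ncol(B) - 1)

# p-value: probability of a statistic at least this large under H0.
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

stat      # 2.16
df        # 1
p_value   # about 0.1416
```

These values match the X-squared, df and p-value lines printed by chisq.test(B).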
What if the difference had been more dramatic? Suppose "circuit B" had 60 failures instead of 30:
> B=matrix(c(20,60,80,40),nrow=2,ncol=2)
> B
     [,1] [,2]
[1,]   20   80
[2,]   60   40
> chisq.test(B)
Pearson's Chi-squared test with Yates' continuity correction
data: B
X-squared = 31.6875, df = 1, p-value = 1.811e-08
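Here the p-value of 1.811e-08 is far below our level of significance of 0.05, so we reject the null hypothesis: there is strong evidence that field failures are associated with the circuit used.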
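If you run this comparison often, the decision can be scripted instead of read off the printed output. The following is a minimal sketch. The helper name reject_h0 is hypothetical, and it relies on the p.value element of the object returned by chisq.test.

```r
alpha <- 0.05   # level of significance used in this article

# The two tables from this article.
B_small <- matrix(c(20, 30, 80, 70), nrow = 2, ncol = 2)  # 20 vs 30 failures
B_large <- matrix(c(20, 60, 80, 40), nrow = 2, ncol = 2)  # 20 vs 60 failures

# Hypothetical helper: TRUE when the null hypothesis of independence
# is rejected at the given level of significance.
reject_h0 <- function(tab, alpha) {
  chisq.test(tab)$p.value < alpha
}

reject_h0(B_small, alpha)  # FALSE: p-value 0.1416 is above 0.05
reject_h0(B_large, alpha)  # TRUE: p-value 1.811e-08 is below 0.05
```

Keeping alpha as an explicit argument makes it easy to see how the conclusion would change at a stricter level such as 0.01.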
Next Up
In this article we compared count data. What if our data are continuous, such as the points of a scatter plot? The answer lies with correlation.