Statistics in Engineering: Correlation (Part 5 of 8)

Preface
This series is aimed at providing tools for an electrical engineer to gain confidence in the performance and reliability of their design. The focus is on applying statistical analysis to empirical results (i.e. measurements, data sets).

Introduction
This article will introduce the concept of correlation on a data set using the R Project software.

If you are not familiar with statistics or need a refresher, I recommend Schaum's Statistics. It provides a good overview of the material, with plenty of examples and without a lot of time spent on proofs.

Concepts
Correlation: Correlation refers to any of a broad class of statistical relationships involving dependence.

Dependence: Dependence is any statistical relationship between two random variables or two sets of data.
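The difference between dependence and the correlation coefficient used later in this article can be seen with a small made-up example. The vectors below are hypothetical and are not part of the DAC data set. In this sketch y is completely determined by x, yet the linear (Pearson) coefficient comes out as zero because the relationship is symmetric rather than linear.

```r
# Hypothetical data: y depends entirely on x, but not linearly.
x <- -10:10
y <- x^2

# Pearson correlation of a symmetric quadratic relationship.
r_quadratic <- cor(x, y)

# A purely linear relationship for comparison.
r_linear <- cor(x, 2 * x + 1)

print(r_quadratic)
print(r_linear)
```

A coefficient near zero therefore does not rule out dependence; it only says there is no linear trend.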

Importing Your Data Set
I will use the R software package for statistical analysis. It is cross-platform, free, and open source. Several good Excel plugins are also available, and if you have access to SAS, by all means use it.

The first row of your data set should contain the title of each column. Each column can contain anything, but for building a distribution we can assume a single column with one row per measurement.

NOTE: We are using the same data set as the DAC experiment in a previous article. The following assumes we are testing an implementation of a 5 V, 8-bit DAC on a PCB. The data set contains two samples. In a real product I would probably want many samples to build a distribution.

column1: DAC input value (dac.level)
column2: perfect DAC output (perf)
column3: sample1
column4: sample2
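The perfect DAC output column can be reproduced from the input level. The values in the listing that follows are consistent with level * 5 / 256, so a minimal sketch, assuming a 5 V full scale and levels numbered 1 to 256 as in this data set, might look like this. The names v_ref, n_bits and n_levels are introduced here for illustration only.

```r
# Assumed parameters, taken from the description of the DAC.
v_ref    <- 5            # full-scale voltage in volts
n_bits   <- 8            # DAC resolution
n_levels <- 2^n_bits     # 256 levels

# The data set numbers its levels 1..256.
dac.level <- 1:n_levels

# Ideal output voltage for each level.
perf <- dac.level * v_ref / n_levels

print(head(perf))
print(tail(perf))
```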

> dac_out<-read.csv(file.choose())
> dim(dac_out)
[1] 256   4
> tail(dac_out)
    dac.level     perf  sample1  sample2
251       251 4.902344 5.042472 5.463848
252       252 4.921875 5.099334 5.492536
253       253 4.941406 5.052925 5.479490
254       254 4.960938 5.118694 5.505745
255       255 4.980469 5.082373 5.524927
256       256 5.000000 5.124489 5.510183
> str(dac_out)
'data.frame': 256 obs. of 4 variables:
 $ dac.level: int 1 2 3 4 5 6 7 8 9 10 ...
 $ perf     : num 0.0195 0.0391 0.0586 0.0781 0.0977 ...
 $ sample1  : num 0.164 0.146 0.175 0.221 0.271 ...
 $ sample2  : num -0.437 -0.438 -0.346 -0.326 -0.369 ...
> attach(dac_out)
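The session above picks the file interactively with file.choose() and uses attach() to make the columns visible by name. For a script that has to run unattended, one possible alternative is to read from an explicit path and refer to columns through the data frame. The sketch below writes a tiny stand-in CSV to a temporary file so that it runs on its own; csv_path, demo and dac_demo are hypothetical names, and the four rows are not the measured data.

```r
# Hypothetical stand-in file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
demo <- data.frame(dac.level = 1:4,
                   perf      = (1:4) * 5 / 256)
write.csv(demo, csv_path, row.names = FALSE)

# Read by explicit path instead of choosing the file in a dialog.
dac_demo <- read.csv(csv_path)
print(dim(dac_demo))

# Two ways to reach the columns without attach().
r_with   <- with(dac_demo, cor(dac.level, perf))
r_dollar <- cor(dac_demo$dac.level, dac_demo$perf)

# Remove the temporary file.
unlink(csv_path)
```

Keeping the columns inside the data frame avoids name clashes with other variables in the workspace, at the cost of slightly longer expressions.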

Calculating the Correlation Coefficient
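We use the cor function, which by default computes the Pearson correlation coefficient, a measure of the linear relationship between two variables. A value of 1 indicates a perfect increasing linear relationship.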

> cor(dac.level,perf)
[1] 1
> cor(dac.level,sample1)
[1] 0.9997917
> cor(dac.level,sample2)
[1] 0.9998707
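To make the coefficient less of a black box, the sketch below computes Pearson's r directly from its definition and compares it with cor. It uses simulated values (ideal output plus made-up Gaussian noise), not the measured samples, and pearson_r is a helper written for this illustration. It also shows that a gain or offset error leaves r unchanged, which fits the results above: sample2 starts well below the ideal output and ends well above it, yet still correlates almost perfectly with the input level.

```r
# Pearson r from its definition: the sum of products of deviations
# divided by the square root of the product of the sums of squares.
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Simulated stand-in for a measured sample (assumed noise level).
set.seed(42)
level <- 1:256
ideal <- level * 5 / 256
noisy <- ideal + rnorm(256, sd = 0.02)

r_builtin <- cor(level, noisy)
r_manual  <- pearson_r(level, noisy)

# Apply a made-up gain error (x1.1) and offset error (-0.4 V).
shifted   <- 1.1 * noisy - 0.4
r_shifted <- cor(level, shifted)

print(c(r_builtin, r_manual, r_shifted))
```

A coefficient close to 1 therefore says the output tracks the input linearly. It does not by itself say the output matches the ideal voltages, so gain and offset need to be checked separately.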

Confidence Interval of Correlation Coefficient
tbd
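Until this section is written, one common way to obtain such an interval in R is the cor.test function, which for the Pearson method reports a confidence interval based on the Fisher z transformation. The sketch below is not necessarily the approach the article will take. It uses simulated values rather than the measured samples and repeats the calculation by hand for comparison; the 95% level and the noise level are assumptions.

```r
# Simulated stand-in for a measured sample (assumed noise level).
set.seed(7)
level <- 1:256
noisy <- level * 5 / 256 + rnorm(256, sd = 0.02)

# Built-in test; conf.int holds the interval for the coefficient.
ct <- cor.test(level, noisy, method = "pearson", conf.level = 0.95)
ci_builtin <- as.numeric(ct$conf.int)

# Same interval by hand using the Fisher z transformation.
r     <- cor(level, noisy)
n     <- length(level)
z     <- atanh(r)              # transform r to the z scale
se    <- 1 / sqrt(n - 3)       # approximate standard error of z
zcrit <- qnorm(0.975)          # two-sided 95% critical value
ci_manual <- tanh(z + c(-1, 1) * zcrit * se)

print(ci_builtin)
print(ci_manual)
```

The interval relies on the usual assumptions behind the Fisher transformation, such as approximately bivariate normal data, so it is best read as a rough guide for data like a DAC sweep.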

Next Up
The next article will demonstrate how to use analysis of variance to examine the effects of multiple variables on an output.