**Preface**

This series aims to give electrical engineers tools for gaining confidence in the performance and reliability of their designs. The focus is on applying statistical analysis to empirical results (e.g., measurements and data sets).

**Introduction**

This article introduces the concept of correlation and shows how to measure it on a data set using the R Project software.

If you are not familiar with statistics or need a refresher, I recommend Schaum's Statistics. It gives a good overview of the material, with lots of examples and little time spent on proofs.

**Concepts**

*Correlation:* Correlation refers to any of a broad class of statistical relationships involving dependence.

*Dependence:* Dependence is any statistical relationship between two random variables or two sets of data.
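
The specific measure computed by R's cor() function in the sections that follow is Pearson's product-moment correlation coefficient, which is cor()'s default method. For n paired observations (x_i, y_i) with sample means x̄ and ȳ, it is:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad -1 \le r \le 1
$$

A value of r = 1 means the points lie exactly on a straight line with positive slope, and r = -1 means they lie on a line with negative slope. A value of r = 0 means there is no linear relationship, though that alone does not imply the variables are independent.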

**Importing Your Data Set**

I will use the R software package for statistical analysis. It is cross-platform, free, and open source. There are also several good Excel plug-ins, and if you have access to SAS, by all means use it.

The first row of your data set should contain the title of each column. A column can hold anything, but for building a distribution we can assume a single column with one row per measurement.
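
The session in this article picks the file interactively with file.choose(). read.csv() can also take a file path, or inline text through its text argument, and header = TRUE (its default) turns the first row into column names. The sketch below is a minimal illustration with made-up values in the same four-column layout. The values and the names csv_text and dac_demo are hypothetical and are not the article's data.

```r
# Hypothetical CSV content: a header row, then one row per DAC input level.
# The sample values are made up for illustration only.
csv_text <- "dac.level,perf,sample1,sample2
1,0.01953125,0.16,-0.44
2,0.0390625,0.15,-0.44
3,0.05859375,0.18,-0.35"

# header = TRUE (the read.csv default) uses the first row as column names.
dac_demo <- read.csv(text = csv_text, header = TRUE)

# Inspect the column names and types, as the article does with str(dac_out).
str(dac_demo)
```

With a file on disk you would pass its path instead, e.g. read.csv("dac_out.csv"), where the file name is hypothetical.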

NOTE: We are using the same data set as the DAC experiment in a previous article. The following assumes we are testing a 5 V, 8-bit DAC implemented on a PCB. The data set contains two samples; for a real product I would want many more samples to build a distribution.

column1 (dac.level): DAC input value

column2 (perf): perfect (ideal) DAC output voltage

column3 (sample1): output measured on sample 1

column4 (sample2): output measured on sample 2
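
The perf column appears to follow the ideal transfer function V = level * 5 V / 256 for input levels 1 to 256. For example, level 251 gives 4.902344 and level 256 gives 5.000000 in the tail() output of the session. The short sketch below regenerates that column and checks it against those printed values. The names vref, bits, perf_ideal and tail_perf are illustrative, and the 1..256 level range simply mirrors the data set.

```r
# Ideal ("perfect") output of a 5 V, 8-bit DAC, assuming the convention the
# data set appears to use: V = level * Vref / 2^bits, with level from 1 to 256.
vref <- 5
bits <- 8
level <- 1:(2^bits)
perf_ideal <- level * vref / 2^bits

# The last six perf values as printed by tail(dac_out) in the article.
tail_perf <- c(4.902344, 4.921875, 4.941406, 4.960938, 4.980469, 5.000000)

# tail() returns the last six elements; compare within print precision.
all.equal(tail(perf_ideal), tail_perf, tolerance = 1e-6)
```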

    > dac_out<-read.csv(file.choose())
    > dim(dac_out)
    [1] 256 4
    > tail(dac_out)
        dac.level     perf  sample1  sample2
    251       251 4.902344 5.042472 5.463848
    252       252 4.921875 5.099334 5.492536
    253       253 4.941406 5.052925 5.479490
    254       254 4.960938 5.118694 5.505745
    255       255 4.980469 5.082373 5.524927
    256       256 5.000000 5.124489 5.510183
    > str(dac_out)
    'data.frame': 256 obs. of 4 variables:
     $ dac.level: int 1 2 3 4 5 6 7 8 9 10 ...
     $ perf     : num 0.0195 0.0391 0.0586 0.0781 0.0977 ...
     $ sample1  : num 0.164 0.146 0.175 0.221 0.271 ...
     $ sample2  : num -0.437 -0.438 -0.346 -0.326 -0.369 ...
    > attach(dac_out)
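
attach() puts the data frame's columns on R's search path, which is what lets the following commands refer to dac.level, perf and so on by name. It stays in effect until detach() is called. Two common alternatives avoid masking other objects with the same names: with(), which evaluates one expression using the columns, and explicit $ indexing. The sketch below shows all three on a small hypothetical data frame (dac_small) with the same column names.

```r
# Small hypothetical data frame using the article's column names.
dac_small <- data.frame(dac.level = 1:4, perf = (1:4) * 5 / 256)

# Option 1: attach() exposes the columns by name until detach() is called.
attach(dac_small)
r_attach <- cor(dac.level, perf)
detach(dac_small)

# Option 2: with() evaluates the expression using the data frame's columns
# without modifying the search path.
r_with <- with(dac_small, cor(dac.level, perf))

# Option 3: explicit $ indexing.
r_dollar <- cor(dac_small$dac.level, dac_small$perf)

c(r_attach, r_with, r_dollar)
```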

**Calculating the Correlation Coefficient**

We use the cor() function, which computes the Pearson correlation coefficient by default.

    > cor(dac.level,perf)
    [1] 1
    > cor(dac.level,sample1)
    [1] 0.9997917
    > cor(dac.level,sample2)
    [1] 0.9998707
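
All three coefficients are very close to 1, so each output tracks the input code almost perfectly linearly. Pearson's r measures linearity, not accuracy. A pure offset or gain error leaves r at exactly 1, so a shortfall from 1 like the one for sample1 and sample2 can only come from deviations from a straight line, such as noise or nonlinearity.

The sketch below demonstrates this. It computes r directly from its definition, compares the result with cor() on an ideal ramp that has an assumed gain and offset error, and passes a whole data frame to cor() to get the matrix of pairwise coefficients. The data (level, ideal, skewed) and the helper pearson() are hypothetical, not the article's measurements.

```r
# Hypothetical data: an ideal 5 V, 8-bit ramp, plus the same ramp with an
# assumed 5 % gain error and 0.12 V offset (no noise).
level  <- 1:256
ideal  <- level * 5 / 256
skewed <- 1.05 * ideal + 0.12

# Pearson's r computed directly from its definition.
pearson <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson(level, skewed)
r_builtin <- cor(level, skewed)   # method = "pearson" is the default

# Both equal 1 up to floating-point rounding: gain and offset errors
# do not reduce r.
c(r_manual, r_builtin)

# Passing a data frame returns the matrix of pairwise coefficients.
cor(data.frame(level, ideal, skewed))
```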

**Confidence Interval of Correlation Coefficient**

tbd
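
One common approach is sketched below using simulated data rather than the DAC measurements. R's cor.test() reports a confidence interval for Pearson's r based on Fisher's z-transformation. The transformed value z = atanh(r) is approximately normal with standard error 1/sqrt(n - 3), and its interval is mapped back to r with tanh(). The interval relies on the data being roughly bivariate normal. The data and the names x, y, ct and ci_manual are hypothetical.

```r
# Hypothetical simulated data: an ideal 5 V, 8-bit ramp plus Gaussian noise.
set.seed(7)
x <- 1:256
y <- x * 5 / 256 + rnorm(256, sd = 0.05)

# cor.test() returns the estimate and a 95 % confidence interval for r.
ct <- cor.test(x, y, conf.level = 0.95)
ct$estimate
ct$conf.int

# The same interval by hand using Fisher's z-transformation.
r <- cor(x, y)
n <- length(x)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci_manual <- tanh(atanh(r) + c(-1, 1) * half_width)
ci_manual
```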

**Next Up**

The next article will demonstrate how to use analysis of variance to examine the effects of multiple variables on an output.
