"Principal component analysis applied for outlier location and missing value problem with surface wind and pressure data"

Carlos López and Elías Kaplan

Facultad de Ingeniería

Centro de Cálculo

CC 30

Montevideo, URUGUAY


ABSTRACT

This work presents the techniques employed to treat an hourly surface wind database during the development and calibration phases of a wind field model. The model itself has been applied to estimate the regional wind energy resource, creating a layer in a GIS environment. Any model is affected to some extent by both random and systematic errors (outliers) in the input data, so it is advisable to remove them before using the data bank while keeping the required effort to a minimum. For this case, several methodologies were applied. The most successful was based on Principal Component Analysis (PCA): it was able to locate outliers with associated type I and type II errors of 49.16% and 6.44%, respectively, in a single step. The methodology is suitable for real-time use and requires minimal computer resources. For the stages described here, only errors arising from manual digitizing are considered. However, it is suggested that PCA may also help in detecting random errors made by the observer, as well as some kinds of systematic errors; this is still under investigation.
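
The abstract does not spell out how PCA is used to locate outliers, so the following is only a minimal sketch of one common approach, not the authors' procedure. It assumes that each row holds simultaneous readings from several stations, that the leading principal components capture the regional signal, and that a row is suspicious when its reconstruction from those components leaves a large residual. The function names, the number of retained components, the 0.99 quantile threshold and the synthetic data are all hypothetical.

```python
import numpy as np

def pca_residual_scores(X, n_keep):
    """Squared residual of each row after projecting the standardized
    data onto its n_keep leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_keep].T
    residual = Z - Z @ V @ V.T
    return (residual ** 2).sum(axis=1)

def flag_outliers(X, n_keep, quantile=0.99):
    """Flag rows whose residual score exceeds the given quantile (assumed rule)."""
    scores = pca_residual_scores(X, n_keep)
    return scores > np.quantile(scores, quantile)

# Synthetic example: 6 hypothetical stations driven by 2 regional factors.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.6, 0.4, 0.2, 0.0],
                   [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]])
X = rng.normal(size=(500, 2)) @ mixing + 0.05 * rng.normal(size=(500, 6))
X[123, 3] += 10.0  # simulated digitizing error at one station, one hour

scores = pca_residual_scores(X, n_keep=2)
flags = flag_outliers(X, n_keep=2)
print(int(np.argmax(scores)), bool(flags[123]))
```

In this sketch the threshold controls the trade-off between the two error types: lowering the quantile flags more values, catching more true errors at the cost of more false alarms.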

The outlier detection phase is presented in one paper, and the techniques applied to impute the missing values are described in the other. Comparative results obtained with a 15-year hourly record are presented. Two different problems were simulated numerically: systematic missing values (i.e. at fixed hours) and non-systematic ones. Three standard criteria were applied: imputation with the historical mean value; linear time interpolation within single station records; and optimum interpolation (kriging). They were compared with two newly developed methods: penalty of principal coefficients, and linear time interpolation using all station records in a multivariate fashion, which proves to be the most accurate for wind.
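
To illustrate the kind of numerical experiment described, here is a minimal sketch that compares the two simplest criteria, mean imputation and linear time interpolation within a single station record, on a synthetic hourly series. It is not the authors' code or data: the series, the gap patterns, the use of the overall station mean as the "historical mean", and all function names are assumptions made for illustration. Kriging and the two new multivariate methods are not reproduced.

```python
import numpy as np

def impute_mean(series, missing):
    """Fill gaps with the mean of the available values (assumed 'historical mean')."""
    filled = series.copy()
    filled[missing] = series[~missing].mean()
    return filled

def impute_linear_time(series, missing):
    """Fill gaps by linear interpolation in time within the same record."""
    t = np.arange(series.size)
    filled = series.copy()
    filled[missing] = np.interp(t[missing], t[~missing], series[~missing])
    return filled

def rmse(estimate, truth, missing):
    """Root mean square error over the imputed positions only."""
    return float(np.sqrt(np.mean((estimate[missing] - truth[missing]) ** 2)))

# Synthetic hourly wind speed: 30 days with a diurnal cycle plus noise.
rng = np.random.default_rng(1)
t = np.arange(30 * 24)
truth = 5.0 + 2.0 * np.sin(2 * np.pi * t / 24) + rng.normal(0.0, 0.2, t.size)

# Two simulated problems: gaps at a fixed hour, and gaps at random hours.
systematic = (t % 24) == 6
non_systematic = rng.random(t.size) < 0.10

results = {}
for name, missing in [("systematic", systematic), ("non_systematic", non_systematic)]:
    observed = truth.copy()
    observed[missing] = np.nan  # hide the values to be imputed
    results[name] = {
        "mean": rmse(impute_mean(observed, missing), truth, missing),
        "linear": rmse(impute_linear_time(observed, missing), truth, missing),
    }
print(results)
```

Because the true values are known before they are hidden, each criterion can be scored directly; the same harness could in principle be extended with further imputation methods.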


The full technical reports are available for download:

Technical report about missing values: .PS (119KB) or .PDF (176KB)

Technical report about quality control: .PS (164KB) or .PDF (176KB)


carlos.lopez@ieee.org ; elias@fing.edu.uy