Simply Explained

Attribute Construction

Attribute construction is a data preprocessing method. It applies data transformation operations to the original attributes to create new attributes with greater predictive power than the originals.

In this method, we do not collect or add new data. We only transform the existing data into a more understandable and useful form. For example, converting the day of the month to the day of the week can reveal a weekly pattern in the data.
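The day-of-week example can be sketched in a few lines of Python. This is only an illustration: the dates, the `DAY_NAMES` list and the `add_day_of_week` helper are made up for this sketch and do not come from any particular data set.

```python
from collections import Counter
from datetime import date

# Fixed English names; date.weekday() returns 0 for Monday ... 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical transaction dates (illustrative values only).
sale_dates = [
    date(2024, 1, 5), date(2024, 1, 6), date(2024, 1, 12),
    date(2024, 1, 13), date(2024, 1, 17), date(2024, 1, 20),
]

def add_day_of_week(d):
    """Keep the original day of the month and add the constructed attribute."""
    return {"day_of_month": d.day, "day_of_week": DAY_NAMES[d.weekday()]}

records = [add_day_of_week(d) for d in sale_dates]

# Counting by the new attribute exposes a weekly pattern that the
# day-of-month values (5, 6, 12, 13, 17, 20) do not show directly.
counts = Counter(r["day_of_week"] for r in records)
print(counts.most_common())
```

No new data is collected here. The `day_of_week` attribute is derived entirely from the date that was already present.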

The difficulty of attribute construction varies with what the user specifies about the kind of attribute they want.

Two methods are covered here:

1. Principal Component Analysis (PCA)

Principal Component Analysis, a feature extraction method, is a very popular attribute construction method. It constructs new attributes that represent the same data as the original attributes, but with fewer of them.

PCA constructs new features, called principal components, as linear combinations of the original attributes. The components are sorted in decreasing order of the variance they capture, and only the first few, which together capture most of the variance, are used to represent the data.
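The idea can be sketched in plain Python for the special case of two attributes, where the eigenvalues of the covariance matrix have a closed form. This is a minimal sketch under that assumption, not a general implementation. The `pca_2d` function and the sample values are hypothetical, and in practice a numerical library would normally be used for more than two attributes.

```python
import math

# Illustrative two-attribute data set (hypothetical values).
rows = [
    (2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
    (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9),
]

def pca_2d(rows):
    """PCA for exactly two attributes.

    Returns the mean-centered rows, the two unit-length components and
    their variances, with the largest variance first.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    centered = [(x - mean_x, y - mean_y) for x, y in rows]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, sorted in decreasing order.
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    variances = [mid + half_gap, mid - half_gap]

    if abs(b) < 1e-12:
        # Attributes are already uncorrelated: the axes are the components.
        if a >= c:
            components = [(1.0, 0.0), (0.0, 1.0)]
        else:
            components = [(0.0, 1.0), (1.0, 0.0)]
    else:
        components = []
        for lam in variances:
            # (b, lam - a) solves (a - lam) * x + b * y = 0.
            norm = math.hypot(b, lam - a)
            components.append((b / norm, (lam - a) / norm))
    return centered, components, variances

centered, components, variances = pca_2d(rows)

# Keep only the first principal component: one new attribute replaces two.
first = components[0]
scores = [x * first[0] + y * first[1] for x, y in centered]

explained = variances[0] / sum(variances)
print(f"share of variance kept by the first component: {explained:.1%}")
```

Each value in `scores` is a linear combination of the two original attributes, and `explained` shows how much of the total variance survives when the second component is dropped.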


The major benefit of this technique is that it reduces the processing time of the data mining algorithm, often without a significant drop in prediction accuracy.

The main drawback is the risk that prediction accuracy drops significantly, for example when the discarded components carry information that is needed for the prediction.

2. Progressive Sampling

Progressive Sampling is an iterative process. In this method, we start with a small sample, run the mining algorithm on it and measure the accuracy of the discovered patterns. We then repeat the process with a slightly larger sample each time, until the accuracy no longer changes significantly.
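The loop described above might look like the following sketch. Everything in it is an assumption made for illustration: the synthetic one-attribute data, the toy threshold learner, the doubling schedule for the sample size and the `tolerance` used to decide that accuracy has stopped changing.

```python
import random

def make_data(n, rng):
    """Hypothetical data: class 1 tends to have larger values than class 0."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        value = rng.gauss(2.0 if label else 0.0, 1.0)
        data.append((value, label))
    return data

def fit_threshold(sample):
    """Toy learner: threshold halfway between the two class means."""
    means = []
    for cls in (0, 1):
        values = [v for v, label in sample if label == cls]
        if not values:
            # One class is missing from the sample: fall back to the overall mean.
            return sum(v for v, _ in sample) / len(sample)
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2

def accuracy(threshold, data):
    """Fraction of rows where 'value > threshold' matches the label."""
    hits = sum((v > threshold) == bool(label) for v, label in data)
    return hits / len(data)

def progressive_sampling(train, holdout, start=50, growth=2,
                         tolerance=0.005, rng=None):
    """Grow the sample until holdout accuracy changes by less than tolerance."""
    rng = rng or random.Random(0)
    size = start
    previous = None
    history = []
    while True:
        size = min(size, len(train))
        sample = rng.sample(train, size)
        acc = accuracy(fit_threshold(sample), holdout)
        history.append((size, acc))
        if previous is not None and abs(acc - previous) < tolerance:
            break  # accuracy has levelled off
        if size == len(train):
            break  # no more data to add
        previous = acc
        size *= growth
    return history

rng = random.Random(42)
train = make_data(5000, rng)
holdout = make_data(2000, rng)
history = progressive_sampling(train, holdout, rng=rng)
for size, acc in history:
    print(size, round(acc, 3))
```

Doubling the sample size is only one possible schedule. Smaller steps follow the "little by little" description more closely but need more iterations before the accuracy levels off.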

The quality of the data is very important for getting the right results. No matter how good an algorithm is, it cannot produce good results from low-quality data.

