* Data transformation
: A function that maps the entire set of values of a given attribute to a new set of replacement values such that each old value can be identified with one of the new values
- Smoothing : Remove noise from data
ex) binning, regression and clustering
- Attribute/feature construction : new attributes constructed from the given ones
- Aggregation : Summarization
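ex) the daily sales data may be aggregated so as to compute monthly and annual total amounts
Some detail is lost in this process: the individual daily figures cannot be recovered from the monthly or annual totals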
- Normalization : The attribute data are scaled so as to fall within a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0
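1) min-max normalization : a linear transformation from [min_A, max_A] onto a predefined new range [new_min_A, new_max_A]; the actual minimum and maximum must be known, and a later value outside the original range falls out of bounds
v' = (v - min_A) / (max_A - min_A) × (new_max_A - new_min_A) + new_min_A

2) z-score normalization : normalizes using the mean and standard deviation of the attribute; useful when the actual minimum or maximum is unknown, or when there are outliers that would dominate min-max normalization
v' = (v - mean_A) / σ_A

3) normalization by decimal scaling : divides by a power of 10 (10^j) so that the values fall between -1 and 1
v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1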
- Discretization : The raw values of a numeric attribute are replaced by interval labels or conceptual labels
ex) age -> interval labels (0-10), (11-20), ... or conceptual labels youth, adult and senior
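Smoothing by binning can be illustrated with a short Python sketch. It assumes equal-frequency bins and smoothing by bin means, which is only one binning variant (bin medians or bin boundaries can be used instead); the function name `smooth_by_bin_means` and the price list are illustrative assumptions, not data from these notes.

```python
from statistics import mean


def smooth_by_bin_means(values, bin_size):
    """Smooth numeric values by equal-frequency binning and bin means.

    The values are sorted, cut into consecutive bins of `bin_size`
    elements (the last bin may be smaller), and every value is replaced
    by the mean of its bin. The result is returned in sorted order.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = mean(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Illustrative (made-up) price data, small enough to check by hand.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))  # [9, 9, 9, 22, 22, 22, 29, 29, 29]
```

Each bin of three sorted prices collapses to its mean (4, 8, 15 becomes 9), which is what flattens the small fluctuations treated as noise.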
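Attribute/feature construction can be as simple as deriving one attribute from others. In this sketch the `height`, `width` and constructed `area` attributes, as well as the `add_area` helper, are hypothetical; other derived quantities (ratios, differences, counts) follow the same pattern.

```python
def add_area(records):
    """Return copies of the records with a constructed `area` attribute.

    `area` is computed from the existing `height` and `width` attributes;
    the input dictionaries are left unchanged.
    """
    return [{**record, "area": record["height"] * record["width"]} for record in records]


rooms = [{"height": 3, "width": 4}, {"height": 2.5, "width": 2}]
print(add_area(rooms))
# [{'height': 3, 'width': 4, 'area': 12}, {'height': 2.5, 'width': 2, 'area': 5.0}]
```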
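The daily-sales aggregation example can be sketched in plain Python. The `aggregate_sales` function and the sales figures are hypothetical; the point is that daily rows are rolled up into monthly and annual totals keyed by (year, month) and year.

```python
from collections import defaultdict
from datetime import date


def aggregate_sales(daily_sales):
    """Sum (date, amount) pairs into monthly and annual totals."""
    monthly = defaultdict(float)
    annual = defaultdict(float)
    for day, amount in daily_sales:
        monthly[(day.year, day.month)] += amount
        annual[day.year] += amount
    return dict(monthly), dict(annual)


# Hypothetical daily sales amounts.
daily_sales = [
    (date(2023, 12, 30), 120.0),
    (date(2023, 12, 31), 80.0),
    (date(2024, 1, 1), 50.0),
    (date(2024, 1, 2), 70.0),
    (date(2024, 2, 1), 30.0),
]
monthly, annual = aggregate_sales(daily_sales)
print(monthly)  # {(2023, 12): 200.0, (2024, 1): 120.0, (2024, 2): 30.0}
print(annual)   # {2023: 200.0, 2024: 150.0}
```

The monthly totals no longer show how much each individual day contributed, which is the loss of detail that comes with aggregation.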
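The three normalization methods can be checked with a small Python sketch. The function names and sample lists are hypothetical. `z_score` assumes the population standard deviation (`statistics.pstdev`), and `decimal_scaling` assumes j starts at 0, so values already inside (-1, 1) are left unchanged.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] of `values` onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("z-score normalization needs a non-zero standard deviation")
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j >= 0 the smallest integer making every |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


print(min_max([200, 300, 400, 600, 1000]))  # [0.0, 0.125, 0.25, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))    # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(decimal_scaling([-986, 917]))         # [-0.986, 0.917]
```

Passing `new_min=-1.0, new_max=1.0` to `min_max` gives the -1.0 to 1.0 range mentioned in the Normalization item instead of 0.0 to 1.0.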
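Discretization of age into interval and conceptual labels can be sketched as follows. It assumes non-negative integer ages; the cut-offs for youth, adult and senior (20 and 65) are hypothetical, since the notes do not specify them.

```python
import math
from bisect import bisect_right

# Hypothetical cut-offs: youth < 20 <= adult < 65 <= senior.
CONCEPT_BOUNDARIES = [20, 65]
CONCEPT_LABELS = ["youth", "adult", "senior"]


def interval_label(age, width=10):
    """Replace a non-negative integer age with an interval label: 0-10, 11-20, 21-30, ..."""
    upper = max(width, math.ceil(age / width) * width)
    lower = 0 if upper == width else upper - width + 1
    return f"{lower}-{upper}"


def concept_label(age):
    """Replace an age with a conceptual label using the cut-offs above."""
    return CONCEPT_LABELS[bisect_right(CONCEPT_BOUNDARIES, age)]


ages = [0, 10, 11, 35, 70]
print([interval_label(a) for a in ages])  # ['0-10', '0-10', '11-20', '31-40', '61-70']
print([concept_label(a) for a in ages])   # ['youth', 'youth', 'youth', 'adult', 'senior']
```

`bisect_right` places a value equal to a cut-off in the higher label, so an age of exactly 20 becomes "adult" under these assumed boundaries.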