
Data transformation

빕준 2024. 3. 5. 15:03

* Data transformation

 

: A function that maps the entire set of values of a given attribute to a new set of replacement values such that each old value can be identified with one of the new values

 

- Smoothing : Remove noise from data

                         ex) binning, regression and clustering
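As an illustration of the binning case, a minimal sketch of smoothing by bin means, assuming equal-frequency bins and a small list of sorted sample prices chosen only for the example. The function name smooth_by_bin_means is hypothetical. Each value is replaced by the mean of its bin, which flattens small fluctuations (noise) inside each bin.

```python
from statistics import mean


def smooth_by_bin_means(values, n_bins):
    """Smooth numeric values by equal-frequency binning and bin means.

    The data are sorted, split into n_bins bins of (nearly) equal size,
    and every value is replaced by the mean of its bin. The result is
    returned in sorted order.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_bins <= n:
        raise ValueError("n_bins must be between 1 and len(values)")
    smoothed = []
    for i in range(n_bins):
        # Integer bin boundaries spread any remainder across the bins.
        start, end = i * n // n_bins, (i + 1) * n // n_bins
        bin_values = data[start:end]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))  # [9, 9, 9, 22, 22, 22, 29, 29, 29]
```

Smoothing by bin medians or bin boundaries would follow the same structure with a different replacement rule per bin.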

 

- Attribute/feature construction : new attributes are constructed from the given ones
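For example, an area attribute can be constructed from existing height and width attributes. The sketch below assumes records stored as Python dicts; the field names and the helper add_area are hypothetical and only illustrate the idea.

```python
def add_area(records):
    """Construct a new 'area' attribute from existing 'height' and 'width'.

    Field names are illustrative; the input dicts are not mutated.
    """
    return [{**r, "area": r["height"] * r["width"]} for r in records]


rooms = [{"height": 3.0, "width": 4.0}, {"height": 2.5, "width": 2.0}]
for room in add_area(rooms):
    print(room)
# {'height': 3.0, 'width': 4.0, 'area': 12.0}
# {'height': 2.5, 'width': 2.0, 'area': 5.0}
```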

 

- Aggregation : Summarization (summary or aggregation operations are applied to the data)

                            ex) the daily sales data may be aggregated so as to compute monthly and annual total amounts

                                  Some information (e.g., the individual daily values) is lost in this process
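The sales example can be sketched with the standard library alone, assuming the daily data arrive as (date, amount) pairs; the function name aggregate_sales and the sample amounts are hypothetical. Once the totals are computed, the per-day amounts can no longer be recovered from them, which is the loss of detail mentioned above.

```python
from collections import defaultdict
from datetime import date


def aggregate_sales(daily):
    """Aggregate (date, amount) pairs into monthly and annual totals."""
    monthly = defaultdict(float)
    annual = defaultdict(float)
    for day, amount in daily:
        monthly[(day.year, day.month)] += amount  # key: (year, month)
        annual[day.year] += amount                 # key: year
    return dict(monthly), dict(annual)


daily_sales = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 15), 50.0),
    (date(2024, 2, 3), 75.0),
]
monthly, annual = aggregate_sales(daily_sales)
print(monthly)  # {(2024, 1): 150.0, (2024, 2): 75.0}
print(annual)   # {2024: 225.0}
```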

 

- Normalization : The attribute data are scaled so as to fall within a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0

 

  1) min-max normalization : requires a clearly predefined range (the minimum and maximum of the attribute must be known)
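Min-max normalization is commonly written as v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A, a linear map of [min_A, max_A] onto [new_min_A, new_max_A]. A minimal sketch, assuming the values fit in a Python list and the target range defaults to [0.0, 1.0]; the function name and the income values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min_A, max_A] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: the denominator max_A - min_A would be zero.
        raise ValueError("cannot min-max normalize a constant attribute")
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


incomes = [12_000, 73_600, 98_000]
print([round(v, 3) for v in min_max_normalize(incomes)])             # [0.0, 0.716, 1.0]
print([round(v, 3) for v in min_max_normalize(incomes, -1.0, 1.0)])  # [-1.0, 0.433, 1.0]
```

Because min_A and max_A are taken from the data at hand, a later value outside that range is mapped outside [new_min_A, new_max_A], which is why this method needs a known, fixed range.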

 

 

 

  2) z-score normalization : useful when the actual minimum or maximum of the attribute is unknown, or when there are outliers
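Z-score normalization computes v' = (v - mean_A) / sigma_A, so it needs only the mean and standard deviation rather than the minimum and maximum. A minimal sketch using Python's statistics module; it assumes the population standard deviation (statistics.pstdev), although some texts use the sample standard deviation instead. The function name and sample scores are hypothetical.

```python
from statistics import mean, pstdev


def z_score_normalize(values):
    """Return (v - mean_A) / sigma_A for each value (population std. dev.)."""
    mu = mean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        raise ValueError("z-score is undefined when the standard deviation is 0")
    return [(v - mu) / sigma for v in values]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std. dev. 2
print(z_score_normalize(scores))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```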

 

  3) normalization by decimal scaling : divides the values by 10^j (i.e., moves the decimal point) so that they fall between -1 and 1
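In decimal scaling, j is usually taken as the smallest integer such that max(|v'|) < 1 after computing v' = v / 10^j. The sketch below assumes j is non-negative (values are never scaled up) and finds it with a simple loop; the function name and sample values are hypothetical.

```python
def decimal_scaling_normalize(values):
    """Divide by 10**j, with j the smallest non-negative integer such that
    max(|v'|) < 1. Returns (scaled_values, j)."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase j until the largest magnitude drops below 1;
    # e.g. a maximum of exactly 1000 gives j = 4, not 3.
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


scaled, j = decimal_scaling_normalize([-986, 917, 42])
print(j, scaled)  # 3 [-0.986, 0.917, 0.042]
```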

 

 

- Discretization : The raw values of a numeric attribute are replaced by interval labels or conceptual labels

                              ex) age -> interval labels (0-10), (11-20), ... or conceptual labels youth, adult and senior
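Both kinds of labels from the age example can be sketched as plain functions, assuming non-negative integer ages. The interval function reproduces the (0-10), (11-20), ... pattern; the youth/adult/senior cut-offs (under 20, 20 to 64, 65 and over) are hypothetical, since the example does not specify them.

```python
def interval_label(age):
    """Map a non-negative integer age to '0-10', '11-20', '21-30', ..."""
    if age <= 10:
        return "0-10"
    lower = (age - 1) // 10 * 10 + 1
    return f"{lower}-{lower + 9}"


def concept_label(age):
    """Map an age to a conceptual label (hypothetical cut-offs: <20, 20-64, 65+)."""
    if age < 20:
        return "youth"
    if age < 65:
        return "adult"
    return "senior"


ages = [5, 10, 11, 20, 34, 70]
print([interval_label(a) for a in ages])  # ['0-10', '0-10', '11-20', '11-20', '31-40', '61-70']
print([concept_label(a) for a in ages])   # ['youth', 'youth', 'youth', 'adult', 'adult', 'senior']
```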
