2024. 3. 5.

* Data transformation


: A function that maps the entire set of values of a given attribute to a new set of replacement values such that each old value can be identified with one of the new values


- Smoothing : Remove noise from data

                         ex) binning, regression and clustering
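
A minimal sketch of smoothing by binning, assuming equal-frequency bins and replacement by the bin mean (one common variant; the function name and the sample prices are made up for illustration):

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of `bin_size`,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical price data, already small enough to check by hand
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Each value moves toward its neighbours, so small fluctuations (noise) inside a bin disappear.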


- Attribute/feature construction : new attributes constructed from the given ones


- Aggregation : Summarization

                            ex) the daily sales data may be aggregated so as to compute monthly and annual total amounts

                                  Some of the detail in the data is lost in this process
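
A rough sketch of the daily sales example, assuming the data arrives as (date, amount) pairs. The function name and the sample figures are hypothetical:

```python
from collections import defaultdict
from datetime import date


def aggregate_sales(daily_sales):
    """Sum daily amounts into monthly and annual totals."""
    monthly = defaultdict(float)
    annual = defaultdict(float)
    for day, amount in daily_sales:
        monthly[(day.year, day.month)] += amount
        annual[day.year] += amount
    return dict(monthly), dict(annual)


# Made-up daily records for illustration
daily_sales = [
    (date(2024, 1, 30), 120.0),
    (date(2024, 1, 31), 80.0),
    (date(2024, 2, 1), 50.0),
]
monthly, annual = aggregate_sales(daily_sales)
print(monthly)  # {(2024, 1): 200.0, (2024, 2): 50.0}
print(annual)   # {2024: 250.0}
```

The totals no longer say how much was sold on each day, which is the loss of detail noted above.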


- Normalization : The attribute data are scaled so as to fall within a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0


  1) min-max normalization : requires a clearly predefined range




  2) z-score normalization : useful when the actual minimum or maximum is unknown, or when there are outliers


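  3) normalization by decimal scaling : divides the values by 10^n so that they fall between -1 and 1, where n is the smallest integer that makes the largest absolute value less than 1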
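
The three normalization methods above can be sketched as follows. This is only an illustration under some assumptions: the function names and sample numbers are made up, the population standard deviation is used for the z-score, and the values are assumed not to be all identical (otherwise min-max and z-score would divide by zero).

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (new_max - new_min) + new_min"""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]


def z_score(values):
    """v' = (v - mean) / standard deviation"""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """v' = v / 10^n, with n the smallest integer such that max(|v'|) < 1"""
    max_abs = max(abs(v) for v in values)
    n = 0
    while max_abs / 10 ** n >= 1:
        n += 1
    return [v / 10 ** n for v in values]


print(min_max([200, 300, 400, 1000]))   # [0.0, 0.125, 0.25, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
print(decimal_scaling([-986, 917]))     # [-0.986, 0.917]
```

In the min-max output the single large value 1000 squeezes the other values close to 0, which shows why an outlier is a problem for that method.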



- Discretization : The raw values of a numeric attribute are replaced by interval labels or conceptual labels

                              ex) age -> (0-10), (11-20) or youth, adult and senior
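
A small sketch of the age example, assuming non-negative integer ages. The interval pattern follows the note (0-10, 11-20, ...), but the cutoffs for youth, adult and senior are hypothetical values picked only for illustration:

```python
def interval_label(age, width=10):
    """Replace an age with an interval label: 0-10, 11-20, 21-30, ..."""
    if age <= width:
        return f"0-{width}"
    lower = ((age - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"


def concept_label(age, adult_from=20, senior_from=65):
    """Replace an age with a conceptual label.
    The cutoffs 20 and 65 are assumptions, not taken from the note."""
    if age < adult_from:
        return "youth"
    if age < senior_from:
        return "adult"
    return "senior"


for age in [7, 11, 20, 34, 70]:
    print(age, interval_label(age), concept_label(age))
# 7 0-10 youth
# 11 11-20 youth
# 20 11-20 adult
# 34 31-40 adult
# 70 61-70 senior
```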


