The function's code is at https://github.com/you1025/knn_feature_extraction/blob/master/knn_feature_extraction.R
Iris
library(tidyverse)

df.iris <- iris %>% tibble::as_tibble()
head(df.iris)
# | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
Visualizing with a scatter plot
df.iris %>%
  ggplot(aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(colour = Species), alpha = 2/3)
Applying k-NN Feature Extraction
df.knn_d_columns <- df.iris %>% add_knn_d_columns(Species)
head(df.knn_d_columns)
# | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | setosa_1 | versicolor_1 | virginica_1 |
---|---|---|---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 0.1000000 | 2.0904545 | 3.5916570 |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 0.1414214 | 1.9131126 | 3.4799425 |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 0.1414214 | 2.0856654 | 3.6083237 |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 0.1414214 | 1.9157244 | 3.4205263 |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 0.1414214 | 2.1424285 | 3.6166283 |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | 0.3316625 | 2.0566964 | 3.4263683 |
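To make the new columns concrete, here is a minimal base-R sketch of the idea: for every row, take the Euclidean distance to the nearest other row of each class. The helper name `nearest_class_distances` is hypothetical and this is not the author's `add_knn_d_columns`; in particular, excluding only the row itself (a leave-one-out scheme) is an assumption, and the linked implementation may split the data differently.

```r
# Hypothetical helper (NOT the author's add_knn_d_columns): for every row,
# compute the Euclidean distance to the nearest *other* row of each class.
# Assumption: only the row itself is excluded (leave-one-out).
nearest_class_distances <- function(x, labels) {
  x <- as.matrix(x)
  labels <- as.factor(labels)

  d <- as.matrix(dist(x))  # full pairwise Euclidean distance matrix
  diag(d) <- Inf           # a row must not count as its own neighbour

  out <- sapply(levels(labels), function(cls) {
    # minimum distance from each row to the rows belonging to class `cls`
    apply(d[, labels == cls, drop = FALSE], 1, min)
  })
  colnames(out) <- paste0(levels(labels), "_1")
  out
}

feats <- nearest_class_distances(iris[, 1:4], iris$Species)
head(round(feats, 7))
```

The result has one column per class, named in the same `<class>_1` style as the table above, so the first rows can be compared against it directly.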
Visualizing while changing the pair of axes
# setosa & versicolor
g1 <- df.knn_d_columns %>%
  ggplot(aes(setosa_1, versicolor_1)) +
  geom_point(aes(colour = Species), alpha = 2/3)

# versicolor & virginica
g2 <- df.knn_d_columns %>%
  ggplot(aes(versicolor_1, virginica_1)) +
  geom_point(aes(colour = Species), alpha = 2/3)

# virginica & setosa
g3 <- df.knn_d_columns %>%
  ggplot(aes(virginica_1, setosa_1)) +
  geom_point(aes(colour = Species), alpha = 2/3)

gridExtra::grid.arrange(g1, g2, g3)
Every pair of distance features looks like it separates the three species reasonably well.
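One rough way to put a number on that visual impression is to assign each row the class of its single nearest other row, which amounts to picking the class with the smallest nearest-neighbour distance. The sketch below does this in base R on the four original measurements. It is only an illustration under the assumption that each row is compared against all other rows; the variable names are made up here, and it is not a proper validation of the features.

```r
# Sketch: leave-one-out 1-nearest-neighbour check on the raw iris measurements.
x <- as.matrix(iris[, 1:4])

d <- as.matrix(dist(x))  # pairwise Euclidean distances
diag(d) <- Inf           # ignore each row's distance to itself

nn_index <- apply(d, 1, which.min)  # index of the nearest other row
predicted <- iris$Species[nn_index] # class of that nearest row

accuracy <- mean(predicted == iris$Species)
confusion <- table(actual = iris$Species, predicted = predicted)

print(accuracy)
print(confusion)
```

The confusion table shows which species pairs still get mixed up, which is the same information the three scatter plots convey visually.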
I also tried a 3D plot
plotly::plot_ly(
  x = df.knn_d_columns$setosa_1,
  y = df.knn_d_columns$versicolor_1,
  z = df.knn_d_columns$virginica_1,
  type = "scatter3d",
  mode = "markers",
  color = df.knn_d_columns$Species,
  size = 0
)
Looks good.