Assignment 27

Packages used (if R reports that a package does not exist, install it with install.packages("package_name")):

library('C50')
library('sjmisc')
library('sjstats')
library('ggplot2')
library('pROC')
library('dplyr') # data manipulation
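
A minimal sketch of checking which packages are missing before calling library(). The helper name missing_packages is hypothetical and not part of the assignment; requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("C50", "sjmisc", "sjstats", "ggplot2", "pROC", "dplyr")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that running the sketch does not change the local library by accident.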

Exercise 1: Data import

1. Import the dataset tele_train_small

2. Name the training set

3. Name the test set

train <- read.csv("/data/tele_train_small.csv", header = TRUE)
# Note: the test set is read from the same file as the training set.
test <- read.csv("/data/tele_train_small.csv", header = TRUE)
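
The str() output further down shows the text columns as factors. Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so on a recent R version those columns would likely be read as character unless conversion is requested. A small sketch on inline toy data (the two rows are made up for illustration and are not taken from tele_train_small):

```r
# Toy CSV content, invented for illustration only.
csv_text <- "state,churn\nKS,no\nOH,yes"

# Default behaviour depends on the R version (character since R 4.0.0).
default_read <- read.csv(text = csv_text, header = TRUE)

# Explicitly request factors, independent of the R version.
factor_read <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

class(default_read$state)
class(factor_read$state)
levels(factor_read$churn)
```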

Exercise 2: Inspecting the data

1. View the variable names of the dataset

str(train)
'data.frame':    3333 obs. of  20 variables:
 $ state                        : Factor w/ 51 levels "AK","AL","AR",..: 17 36 32 36 37 2 20 25 19 50 ...
 $ account_length               : int  128 107 137 84 75 118 121 147 117 141 ...
 $ area_code                    : Factor w/ 3 levels "area_code_408",..: 2 2 2 1 2 3 3 2 1 2 ...
 $ international_plan           : Factor w/ 2 levels "no","yes": 1 1 1 2 2 2 1 2 1 2 ...
 $ voice_mail_plan              : Factor w/ 2 levels "no","yes": 2 2 1 1 1 1 2 1 1 2 ...
 $ number_vmail_messages        : int  25 26 0 0 0 0 24 0 0 37 ...
 $ total_day_minutes            : num  265 162 243 299 167 ...
 $ total_day_calls              : int  110 123 114 71 113 98 88 79 97 84 ...
 $ total_day_charge             : num  45.1 27.5 41.4 50.9 28.3 ...
 $ total_eve_minutes            : num  197.4 195.5 121.2 61.9 148.3 ...
 $ total_eve_calls              : int  99 103 110 88 122 101 108 94 80 111 ...
 $ total_eve_charge             : num  16.78 16.62 10.3 5.26 12.61 ...
 $ total_night_minutes          : num  245 254 163 197 187 ...
 $ total_night_calls            : int  91 103 104 89 121 118 118 96 90 97 ...
 $ total_night_charge           : num  11.01 11.45 7.32 8.86 8.41 ...
 $ total_intl_minutes           : num  10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
 $ total_intl_calls             : int  3 3 5 7 3 6 7 6 4 5 ...
 $ total_intl_charge            : num  2.7 3.7 3.29 1.78 2.73 1.7 2.03 1.92 2.35 3.02 ...
 $ number_customer_service_calls: int  1 1 0 2 3 0 3 0 1 0 ...
 $ churn                        : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...

2. Check whether the training set has missing values

sum(is.na(train))
[1] 0
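
sum(is.na(...)) gives only a single total. When that total is not zero, it can help to see where the missing values are. A minimal sketch on a made-up data frame (the name toy and its values are assumptions for illustration; the column names merely mimic the dataset):

```r
# Toy data with two missing cells, invented for illustration.
toy <- data.frame(
  account_length = c(128, NA, 137),
  churn = c("no", "yes", NA),
  total_day_calls = c(110, 123, 114)
)

sum(is.na(toy))              # total number of missing cells
colSums(is.na(toy))          # number of missing cells per column
toy[!complete.cases(toy), ]  # rows with at least one missing value
```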

Exercise 3: Data manipulation

1. Drop the first column of the dataset

train_backup <- train[,-1]

2. Combine the training and test data frames

total_data <- bind_rows(train,test)
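
After stacking, it is no longer visible which rows came from which data frame. As a sketch of one way to keep that information, bind_rows() accepts named inputs together with an .id argument. The toy data frames and the column name "source" below are assumptions for illustration, not part of the assignment:

```r
library(dplyr)

# Toy stand-ins for the training and test sets, invented for illustration.
toy_train <- data.frame(state = c("KS", "OH"), churn = c("no", "yes"))
toy_test <- data.frame(state = c("NJ"), churn = c("no"))

# Named inputs plus .id add a column recording each row's origin.
combined <- bind_rows(train = toy_train, test = toy_test, .id = "source")

nrow(combined)
table(combined$source)
```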

Exercise 4: Plotting

1. Draw a bar chart of the frequency of each state value in the combined dataset

barplot(table(total_data$state))
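
With around 51 state levels the default bar chart can be hard to read. A minimal sketch, on a made-up vector of state codes, of sorting the counts and rotating the axis labels with base R graphics:

```r
# Toy state codes, invented for illustration.
states <- c("KS", "OH", "NJ", "OH", "KS", "OH")

# Count each value and order the bars from most to least frequent.
counts <- sort(table(states), decreasing = TRUE)

# las = 2 draws the axis labels perpendicular to the axis.
barplot(counts, las = 2, xlab = "state", ylab = "frequency")
```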
