R_数据接口
ref
csv文件
读写csv
-
获取和设置工作目录
- **
getwd()
**函数检查R语言工作区指向的目录。 - **
setwd()
**函数设置新的工作目录。
- **
-
读取csv文件:
read.csv()
,将输出作为数据帧-
data <- read.csv("input.csv") print(data) # result id, name, salary, start_date, dept 1 1 Rick 623.30 2012-01-01 IT 2 2 Dan 515.20 2013-09-23 Operations 3 3 Michelle 611.00 2014-11-15 IT 4 4 Ryan 729.00 2014-05-11 HR 5 NA Gary 843.25 2015-03-27 Finance 6 6 Nina 578.00 2013-05-21 IT 7 7 Simon 632.80 2013-07-30 Operations 8 8 Guru 722.50 2014-06-17 Finance
-
-
分析csv文件
-
行列数
print(ncol(data)) print(nrow(data))
-
某列最大值
sal <- max(data$salary) print(sal)
-
特定条件筛选,类似SQL语句
subset(data, condition)
# Get the person detail having max salary. retval <- subset(data, salary == max(salary)) print(retval)
# Create a data frame. data <- read.csv("input.csv") retval <- subset( data, dept == "IT") print(retval)
-
-
写入csv文件
-
**
write.csv()
**函数用于创建csv
文件,并写入数据帧。 此文件在工作目录中创建。data <- read.csv("input.csv") retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01")) # Write filtered data into a new file. write.csv(retval,"output.csv")
-
默认写入行名,可以
write.csv(retval,"output.csv", row.names = FALSE)
去除
-
Excel文件
-
安装、导入xlsl包
-
install.packages("xlsx") # Verify the package is installed. any(grepl("xlsx",installed.packages())) # Load the library into R workspace. library("xlsx")
-
-
读入xlsl文件
-
# sheetIndex = 1表示读取第一个sheet data <- read.xlsx("input.xlsx", sheetIndex = 1) print(data)
-
-
写入xlsl文件
二进制文件
R语言有两个函数WriteBin()和readBin()来创建和读取二进制文件。
-
语法
writeBin(object, con) readBin(con, what, n )
- con 是读取或写入二进制文件的连接对象。
- object 是要写入的二进制文件。
- what - 是像字符,整数等代表字节模式被读取。
- n 是从二进制文件读取的字节数。
xml文件
-
安装xml包
install.packages("XML")
-
读取xml
-
使用函数**xmlParse()**读取。 它作为列表存储在R语言中。
# Load the package required to read XML files. library("XML") # Also load the other required package. library("methods") # Give the input file name to the function. result <- xmlParse(file = "input.xml") # Print the result. print(result)
-
获取根节点(是个列表)和文件节点数
# Exract the root node form the xml file. rootnode <- xmlRoot(result) # Find number of nodes in the root. rootsize <- xmlSize(rootnode)
-
解析文件第一条记录
print(rootnode[1])
-
获取节点的不同元素
# Get the second element of the third node. print(rootnode[[3]][[2]])
-
xml转数据帧
xmldataframe <- xmlToDataFrame("input.xml") print(xmldataframe)
-
-
写入xml
json文件
-
安装rjson包
install.packages("rjson")
-
读取json文件
# Load the package required to read JSON files. library("rjson") # Give the input file name to the function. result <- fromJSON(file = "input.json") # Print the result. print(result)
-
json转数据帧
-
使用**as.data.frame()**函数将上面提取的数据转换为R语言数据帧以进行进一步分析
# Load the package required to read JSON files. library("rjson") # Give the input file name to the function. result <- fromJSON(file = "input.json") # Convert JSON file to a data frame. json_data_frame <- as.data.frame(result) print(json_data_frame)
-
-
写入json文件
web数据
许多网站提供数据供其用户使用。
使用R语言程序,我们可以从这些网站以编程方式提取特定数据。
R语言中用于从网站中提取数据的一些包是“RCurl”,XML“和”stringr“,它们用于连接到URL,识别文件所需的链接并将它们下载到本地环境。
-
安装包
install.packages("RCurl") install.packages("XML") install.packages("stringr") install.packages("plyr")
-
demo[主要是获取link,再逐个下载]
- 使用函数getHTMLLinks()来收集文件的URL
- 使用函数downlaod.file()将文件保存到本地系统
# Read the URL. url <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/" # Gather the html links present in the webpage. links <- getHTMLLinks(url) # Identify only the links which point to the JCMB 2015 files. filenames <- links[str_detect(links, "JCMB_2015")] # Store the file names as a list. filenames_list <- as.list(filenames) # Create a function to download the files by passing the URL and filename list. downloadcsv <- function (mainurl,filename) { filedetails <- str_c(mainurl,filename) download.file(filedetails,filename) } # Now apply the l_ply function and save the files into the current R working directory. l_ply(filenames,downloadcsv,mainurl = "http://www.geos.ed.ac.uk/~weather/jcmb_ws/")
数据库
数据是关系数据库系统以规范化格式存储。
-
安装包
install.packages("RMySQL")
-
连接mysql
-
R中创建一个连接对象以连接到数据库。 它使用用户名,密码,数据库名称和主机名作为输入。
-
使用dbConnect()函数创建连接对象
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost')
# Create a connection Object to MySQL database. # We will connect to the sampel database named "sakila" that comes with MySql installation. mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost') # List the tables available in this database. dbListTables(mysqlconnection)
-
-
查询
-
使用函数dbSendQuery()查询MySql中的数据库表。
result = dbSendQuery(mysqlconnection,SQL)
-
使用R语言fetch()函数返回结果集
data.frame = fetch(result, n = rows_display) # -1表示所有
-
最后,它被存储为R语言中的数据帧。
# Query the "actor" tables to get all the rows. result = dbSendQuery(mysqlconnection, "select * from actor") # Store the result in a R data frame object. n = 5 is used to fetch first 5 rows. data.frame = fetch(result, n = 5) print(data.frame)
-
-
更新
-
将更新查询传递给**dbSendQuery()**函数来更新Mysql表中的行。
dbSendQuery(mysqlconnection, "update mtcars set disp = 168.5 where hp = 110")
-
-
插入
dbSendQuery(mysqlconnection, "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb) values('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)" )
-
建表
-
使用函数dbWriteTable()创建表。 如果表已经存在,它将覆盖该表,并将数据帧用作输入。
# Create the connection object to the database where we want to create the table. mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost') # Use the R data frame "mtcars" to create the table in MySql. # All the rows of mtcars are taken inot MySql. dbWriteTable(mysqlconnection, "mtcars", mtcars[, ], overwrite = TRUE)
-
-
删除表
dbSendQuery(mysqlconnection, 'drop table if exists mtcars')
转载:https://blog.csdn.net/qq_37774098/article/details/117369171