R语言数据接口（下载、读取、写入）

2021-06-02 18:30 423人阅读评论(0)

文章目录

R_数据接口
- ref

R_数据接口

ref

w3school_R_数据接口

csv文件

读写csv

获取和设置工作目录
- **getwd()**函数检查R语言工作区指向的目录。
- **setwd()**函数设置新的工作目录。

读取csv文件：read.csv()，将输出作为数据帧

data <- read.csv("input.csv")
print(data)

# result
      id,   name,    salary,   start_date,     dept
1      1    Rick     623.30    2012-01-01      IT
2      2    Dan      515.20    2013-09-23      Operations
3      3    Michelle 611.00    2014-11-15      IT
4      4    Ryan     729.00    2014-05-11      HR
5     NA    Gary     843.25    2015-03-27      Finance
6      6    Nina     578.00    2013-05-21      IT
7      7    Simon    632.80    2013-07-30      Operations
8      8    Guru     722.50    2014-06-17      Finance

分析csv文件

行列数
```
print(ncol(data))
print(nrow(data))
```
某列最大值
```
sal <- max(data$salary)
print(sal)
```

特定条件筛选,类似SQL语句

subset(data, condition)

# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset( data, dept == "IT")
print(retval)

写入csv文件
- **write.csv()**函数用于创建csv文件，并写入数据帧。此文件在工作目录中创建。
```
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv")
```
- 默认写入行名，可以write.csv(retval,"output.csv", row.names = FALSE)去除

Excel文件

安装、导入xlsl包

install.packages("xlsx")

# Verify the package is installed.
any(grepl("xlsx",installed.packages()))

# Load the library into R workspace.
library("xlsx")

读入xlsl文件

# sheetIndex = 1表示读取第一个sheet
data <- read.xlsx("input.xlsx", sheetIndex = 1)
print(data)

写入xlsl文件

二进制文件

R语言有两个函数WriteBin()和readBin()来创建和读取二进制文件。

语法
```
writeBin(object, con)
readBin(con, what, n )
```
- con 是读取或写入二进制文件的连接对象。
- object 是要写入的二进制文件。
- what - 是像字符，整数等代表字节模式被读取。
- n 是从二进制文件读取的字节数。

xml文件

安装xml包
```
install.packages("XML")
```

读取xml

使用函数**xmlParse()**读取。它作为列表存储在R语言中。

# Load the package required to read XML files.
library("XML")

# Also load the other required package.
library("methods")

# Give the input file name to the function.
result <- xmlParse(file = "input.xml")

# Print the result.
print(result)

获取根节点(是个列表)和文件节点数

# Exract the root node form the xml file.
rootnode <- xmlRoot(result)

# Find number of nodes in the root.
rootsize <- xmlSize(rootnode)

解析文件第一条记录
```
print(rootnode[1])
```

获取节点的不同元素

# Get the second element of the third node.
print(rootnode[[3]][[2]])

xml转数据帧

xmldataframe <- xmlToDataFrame("input.xml")
print(xmldataframe)

写入xml

json文件

安装rjson包
```
install.packages("rjson")
```

读取json文件

# Load the package required to read JSON files.
library("rjson")

# Give the input file name to the function.
result <- fromJSON(file = "input.json")

# Print the result.
print(result)

json转数据帧

使用**as.data.frame()**函数将上面提取的数据转换为R语言数据帧以进行进一步分析

# Load the package required to read JSON files.
library("rjson")

# Give the input file name to the function.
result <- fromJSON(file = "input.json")

# Convert JSON file to a data frame.
json_data_frame <- as.data.frame(result)

print(json_data_frame)

写入json文件

web数据

许多网站提供数据供其用户使用。

使用R语言程序，我们可以从这些网站以编程方式提取特定数据。

R语言中用于从网站中提取数据的一些包是“RCurl”，XML“和”stringr“，它们用于连接到URL，识别文件所需的链接并将它们下载到本地环境。

安装包

install.packages("RCurl")
install.packages("XML")
install.packages("stringr")
install.packages("plyr")

demo[主要是获取link，再逐个下载]

使用函数getHTMLLinks()来收集文件的URL
使用函数downlaod.file()将文件保存到本地系统

# Read the URL.
url <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/"

# Gather the html links present in the webpage.
links <- getHTMLLinks(url)

# Identify only the links which point to the JCMB 2015 files. 
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function to download the files by passing the URL and filename list.
downloadcsv <- function (mainurl,filename) {
   filedetails <- str_c(mainurl,filename)
   download.file(filedetails,filename)
}

# Now apply the l_ply function and save the files into the current R working directory.
l_ply(filenames,downloadcsv,mainurl = "http://www.geos.ed.ac.uk/~weather/jcmb_ws/")

数据库

数据是关系数据库系统以规范化格式存储。

安装包
```
install.packages("RMySQL")
```

连接mysql

R中创建一个连接对象以连接到数据库。它使用用户名，密码，数据库名称和主机名作为输入。

使用dbConnect()函数创建连接对象

mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost')

# Create a connection Object to MySQL database.
# We will connect to the sampel database named "sakila" that comes with MySql installation.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', host = 'localhost')

# List the tables available in this database.
 dbListTables(mysqlconnection)

查询

使用函数dbSendQuery()查询MySql中的数据库表。
```
result = dbSendQuery(mysqlconnection,SQL)
```

使用R语言fetch()函数返回结果集

data.frame = fetch(result, n = rows_display) # -1表示所有

最后，它被存储为R语言中的数据帧。

# Query the "actor" tables to get all the rows.
result = dbSendQuery(mysqlconnection, "select * from actor")

# Store the result in a R data frame object. n = 5 is used to fetch first 5 rows.
data.frame = fetch(result, n = 5)
print(data.frame)

更新
- 将更新查询传递给**dbSendQuery()**函数来更新Mysql表中的行。
```
dbSendQuery(mysqlconnection, "update mtcars set disp = 168.5 where hp = 110")
```

插入

dbSendQuery(mysqlconnection,
   "insert into mtcars(row_names, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb)
   values('New Mazda RX4 Wag', 21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)"
)

建表

使用函数dbWriteTable()创建表。如果表已经存在，它将覆盖该表，并将数据帧用作输入。

# Create the connection object to the database where we want to create the table.
mysqlconnection = dbConnect(MySQL(), user = 'root', password = '', dbname = 'sakila', 
   host = 'localhost')

# Use the R data frame "mtcars" to create the table in MySql.
# All the rows of mtcars are taken inot MySql.
dbWriteTable(mysqlconnection, "mtcars", mtcars[, ], overwrite = TRUE)

删除表

dbSendQuery(mysqlconnection, 'drop table if exists mtcars')

转载：https://blog.csdn.net/qq_37774098/article/details/117369171

查看评论

飞道的博客

飞道的博客

个人资料

文章分类

文章存档

阅读排行

评论排行

推荐文章