
Data Analysis · Loading and Exporting Data


Previous article: Data Analysis · Data Visualization

Let's look at how to load and export data in Python, using two common text data formats as examples: JSON and CSV.

Loading and Exporting JSON Data

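Python's json module provides a simple way to encode and decode JSON data. Its two main functions are json.dumps() and json.loads(). The example below converts a Python data structure to a JSON string and back: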

import json

data = {'author': 'kk', 'level': 10, 'is old man': True}
jsonStr = json.dumps(data)
print(type(jsonStr), jsonStr)

data = json.loads(jsonStr)
print(type(data), data)

Running it prints:

<class 'str'> {"author": "kk", "level": 10, "is old man": true}
<class 'dict'> {'author': 'kk', 'level': 10, 'is old man': True}

json.dumps converts a Python object (usually a dict or a list) into a JSON-formatted string; json.loads parses a JSON-formatted string back into a Python object.
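As a small illustration of the two functions, the sketch below uses a few optional keyword arguments of json.dumps (indent, sort_keys, ensure_ascii) to control how the string is formatted. The dictionary with the 'title' key is made up for this example and is not part of the original data.

```python
import json

data = {'author': 'kk', 'level': 10, 'is old man': True}

# indent=2 pretty-prints the output; sort_keys=True orders keys alphabetically.
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)

# ensure_ascii=False keeps non-ASCII characters readable
# instead of writing them as \uXXXX escapes.
# (The 'title' entry is a made-up example value.)
text = json.dumps({'title': '数据分析'}, ensure_ascii=False)
print(text)

# Parsing the pretty-printed string gives back a dict equal to the original.
restored = json.loads(pretty)
print(restored == data)
```

The formatting options only change the text that is produced; json.loads accepts either form.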

Common correspondences between Python types and JSON types:

Python    JSON
None      null
True      true
False     false
int       number
float     number
str       string
list      array
dict      object
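The mapping above can be checked with a quick round trip. This is a minimal sketch with made-up keys; it also shows two behaviors that the table does not list but that the json module applies: tuples are written as arrays (and come back as lists), and non-string dictionary keys are converted to strings.

```python
import json

# Made-up sample values, one per row of the table, plus two edge cases.
original = {
    'nothing': None,
    'flag': True,
    'count': 3,
    'ratio': 0.5,
    'name': 'kk',
    'items': [1, 2],
    'pair': (1, 2),   # a tuple is encoded as a JSON array
    1: 'int key',     # a non-string key is converted to a string
}

encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)
print(decoded)
```

Because of these two conversions, the decoded object is not always identical to the original one, which is worth keeping in mind when data is saved and loaded again.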

To load JSON data from a file on disk, or to export data to a JSON file on disk, use json.load() and json.dump():

import json


data = {'author': 'kk', 'level': 10, 'is old man': True}
print(data)

with open('myjson.json', 'w') as f:
    json.dump(data, f)

with open('myjson.json', 'r') as f:
    data = json.load(f)
    print(data)
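In practice a JSON file may be missing or damaged. The following sketch, which is only one possible approach, wraps json.load in a helper that returns None in those cases. The helper name load_json and the file names are hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import json
import os
import tempfile


def load_json(path):
    """Return the parsed content of a JSON file, or None if it is missing or invalid."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, 'myjson.json')
    bad_path = os.path.join(tmp, 'broken.json')

    # Write a valid file, stating the encoding explicitly.
    with open(good_path, 'w', encoding='utf-8') as f:
        json.dump({'author': 'kk', 'level': 10}, f, ensure_ascii=False, indent=2)

    # Write a deliberately truncated file to trigger JSONDecodeError.
    with open(bad_path, 'w', encoding='utf-8') as f:
        f.write('{"author": ')

    good = load_json(good_path)
    bad = load_json(bad_path)
    missing = load_json(os.path.join(tmp, 'missing.json'))

print(good, bad, missing)
```

Returning None is a design choice for this sketch; re-raising the exception or logging it may be more appropriate in a real program.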

Reading CSV Data

Python's csv library handles most CSV reading and writing tasks. For example, suppose a file named stocks.csv contains some stock market data like this:

Symbol,Price,Date,Time,Change,Volume
"AA",39.48,"6/11/2007","9:36am",-0.18,181800
"AIG",71.38,"6/11/2007","9:36am",-0.15,195500
"AXP",62.58,"6/11/2007","9:36am",-0.46,935000
"BA",98.31,"6/11/2007","9:36am",+0.12,104800
"C",53.08,"6/11/2007","9:36am",-0.25,360900
"CAT",78.29,"6/11/2007","9:36am",-0.23,225400

Reading the file with csv.reader looks like this:

import csv


with open('./stocks.csv', 'r', newline='') as f:
    c = csv.reader(f, delimiter=',')
    header = next(c)
    print(header)
    for row in c:
        print(row)

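The csv.reader function accepts a file object and reads from it; the delimiter parameter sets the field separator. It returns an iterable object. Typically you call next() once to pull out the header row, then loop over the remaining rows with a for statement. Each row is returned as a list, like this: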

['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
['AA', '39.48', '6/11/2007', '9:36am', '-0.18', '181800']
['AIG', '71.38', '6/11/2007', '9:36am', '-0.15', '195500']
['AXP', '62.58', '6/11/2007', '9:36am', '-0.46', '935000']
['BA', '98.31', '6/11/2007', '9:36am', '+0.12', '104800']
['C', '53.08', '6/11/2007', '9:36am', '-0.25', '360900']
['CAT', '78.29', '6/11/2007', '9:36am', '-0.23', '225400']
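If accessing columns by name is more convenient than by position, the csv module also offers csv.DictReader, which uses the header row as keys. The sketch below is an alternative to the approach above, not a replacement for it. To stay self-contained it reads two of the sample rows from an in-memory string through io.StringIO rather than from stocks.csv.

```python
import csv
import io

# Two rows copied from the sample data, held in memory instead of a file.
sample = '''Symbol,Price,Date,Time,Change,Volume
"AA",39.48,"6/11/2007","9:36am",-0.18,181800
"AIG",71.38,"6/11/2007","9:36am",-0.15,195500
'''

# DictReader consumes the header row itself and maps each row to a dict.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

for row in rows:
    # Values are still strings, exactly as with csv.reader.
    print(row['Symbol'], row['Price'])
```

With a real file, the io.StringIO object would be replaced by a file opened with open(path, newline='').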

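Notice that every parsed value is a string, which does not match the original data, where prices, changes, and volumes are numbers. You can convert the values in an extra step: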

import csv


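# One converter per column, in the same order as the header.
types = (str, float, str, str, float, int)

with open('./stocks.csv', 'r', newline='') as f:
    c = csv.reader(f, delimiter=',')
    header = next(c)
    print(header)
    for row in c:
        # Apply each column's converter to the matching value.
        row = [convert(v) for convert, v in zip(types, row)]
        print(row)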

Running it prints:

['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
['AA', 39.48, '6/11/2007', '9:36am', -0.18, 181800]
['AIG', 71.38, '6/11/2007', '9:36am', -0.15, 195500]
['AXP', 62.58, '6/11/2007', '9:36am', -0.46, 935000]
['BA', 98.31, '6/11/2007', '9:36am', 0.12, 104800]
['C', 53.08, '6/11/2007', '9:36am', -0.25, 360900]
['CAT', 78.29, '6/11/2007', '9:36am', -0.23, 225400]
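The examples in this section only read CSV data. For the export direction, a minimal sketch with csv.writer might look like the following. It writes two of the converted rows to an in-memory buffer (io.StringIO) so that it runs without touching the disk. The quoting=csv.QUOTE_NONNUMERIC setting is an assumption made here to quote the text fields as in the sample file; note that it also quotes the header names, unlike the original file.

```python
import csv
import io

header = ['Symbol', 'Price', 'Date', 'Time', 'Change', 'Volume']
rows = [
    ['AA', 39.48, '6/11/2007', '9:36am', -0.18, 181800],
    ['AIG', 71.38, '6/11/2007', '9:36am', -0.15, 195500],
]

# For a real file, use: open('out.csv', 'w', newline='')  (hypothetical file name)
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(header)   # write a single row
writer.writerows(rows)    # write many rows at once
exported = buffer.getvalue()
print(exported)

# Reading back with the same quoting rule turns unquoted fields into floats.
restored = list(csv.reader(io.StringIO(exported), quoting=csv.QUOTE_NONNUMERIC))
print(restored[1])
```

When reading with QUOTE_NONNUMERIC, every unquoted field becomes a float, so the volume comes back as 181800.0 rather than an int.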

Further Learning

Python's built-in json and csv modules have many more details worth learning. If you are interested, see the official Python documentation for a full reference.

Next article: Data Analysis · Matrix Computation


Author: kk