{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 文件读写操作"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"本节可能用到的文件: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[data.tar](images/ch8/data.tar) \n",
"[9.1 出塞.txt](images/ch8/9.1 出塞.txt) \n",
"[Who Moved My Cheese.txt](images/ch8/Who Moved My Cheese.txt) \n",
"[百家姓.txt](images/ch8/百家姓.txt) \n",
"[出塞.txt](images/ch8/出塞.txt) \n",
"[What is New in Python3.11.txt](images/ch8/What is New in Python3.11.txt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 文件读取方法 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python文件对象fr提供了三个读取数据的方法: \n",
"fr.read() \n",
"fr.readline() \n",
"fr.readlines() "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| 方法 | 描述 |\n",
"| :---- | :---- |\n",
"| read(size) | 无参数或参数为-1时,读取<font color='red'>全部文件内容为一个字符串</font>;当参数size为大于或等于0的整数时,读取size个字符。 |\n",
"| readline(size) | 无参数或参数为-1时,读取并返回文件对象中的<font color='red'>一行数据 </font>,包括行末结尾标记'\\\\n',字符串类型。当参数size为大于或等于0的整数时,从指针所在处向后最多读取当前行的前size个字符,当前行剩余字符少于size时,读取到行末。 |\n",
"| readlines(hint) | 无参数时,读取文件全部数据,返回一个<font color='red'>列表 </font>,列表中每个元素是文件对象中的一行数据,包括行末的换行符’\\\\n’。 当参数hint为大于或等于0的整数时,返回文件从头到第hint个字符所在的行末。 |\n",
"| seek(offset,whence) | 改变当前文件操作<font color='red'>指针 </font>指针的位置,offset为指针偏移量,whence代表参照物,有三个取值:0文件开始、1当前位置、2文件结尾。在文本文件(模式字符串未使用 b 时打开的文件)中,只允许相对于文件开头搜索。 |\n",
"| tell() | 返回文件指针当前的位置,返回值与编码有关。 |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"在文件刚打开时,指针是指向文件内容的起始处,伴随着读写的进行指针一步一步往后移动。当指针移动到文件结尾后,再试图读取数据就没有返回值了。 \n",
"如果期望重新读取文件中的数据,可使用seek(0) 将文件读取指针移动到文件开始的位置。"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"seek(0) # 将文件指针移动到文件开始位置"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 实例 9.1 输出文件内容 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"有一个utf-8编码的文本文件“出塞.txt”,内容如下: \n",
" \n",
"出塞 \n",
"秦时明月汉时关,万里长征人未还。 \n",
"但使龙城飞将在,不教胡马度阴山。 \n",
"\n",
"编程读取文件的内容。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"访问文件里的数据,必须先用 open() 函数打开文件,只读取文件,不修改文件内容时,读取模式参数 mode 的值可设为 ‘r’ 或缺省。encoding='utf-8' 参数表示以“utf-8”编码方式处理数据"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[9.1 出塞.txt](images/ch8/9.1 出塞.txt)\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"出塞\n",
"秦时明月汉时关,万里长征人未还。\n",
"但使龙城飞将在,不教胡马度阴山。\n",
"\n"
]
}
],
"source": [
"with open( 'images/ch8/9.1 出塞.txt','r',encoding = 'utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" print(poem.read()) # 读文件为字符串输出\n",
" for line in poem: # 指针在文件末尾,后面无数据,读到空字符串\n",
" print(line.strip()) # 无输出"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"出塞\n",
"秦时明月汉时关,万里长征人未还。\n",
"但使龙城飞将在,不教胡马度阴山。\n",
"\n",
"出塞\n",
"秦时明月汉时关,万里长征人未还。\n",
"但使龙城飞将在,不教胡马度阴山。\n"
]
}
],
"source": [
"with open( 'images/ch8/9.1 出塞.txt','r',encoding = 'utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" print(poem.read()) # 读文件为字符串输出\n",
" poem.seek(0) # 将文件指针移动到文件开始位置\n",
" for line in poem: # 指针在文件开头\n",
" print(line.strip()) # 重新输出一次文件内容"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"read(size=-1) \n",
"\n",
"```\n",
"从文本文件中读取并返回最多 size 个字符,返回的数据类型为字符串\n",
"size 为负值或值是 None 时,从当前位置一直读取到文件结束(End Of File,EOF)\n",
"读取的字符包括标点符号和空格、换行符等空白字符。"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['出塞\\n秦时明月汉时关,万里长征人未还。\\n']\n",
"但使龙城飞将在,不教胡马度阴山。\n",
"\n"
]
}
],
"source": [
"with open( 'images/ch8/9.1 出塞.txt','r',encoding = 'utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" print([poem.read(20)]) # 读取前20个字符,放入列表中输出是为了能看到换行符\n",
" \n",
" print(poem.read()) # 输出剩余字符"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"readline(size=-1) \n",
"\n",
"```\n",
"每次只读取一行数据,文件指针移动到下一行开始。\n",
"如果指定了 size ,将在当前行读取最多 size 个字符,本行剩余字符少于size时,读取到本行结束。 \n",
"字符串末尾保留换行符(\\n),空行使用 '\\n' 表示,该字符串只包含一个换行符。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open( 'images/ch8/9.1 出塞.txt','r',encoding = 'utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" print([poem.read(11)]) # ['出塞\\n秦时明月汉时关,'],字符串末尾保留换行符\n",
" print(poem.readline(10)) # 万里长征人未还。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了更好的理解这个命令的用法,我们使用以下诗句:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" 出塞\n",
" 王昌龄(唐)\n",
"秦时明月汉时关,万里长征人未还。\n",
"但使龙城飞将在,不教胡马度阴山。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为了观察方便,将空格用中文数字代替:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"一二三四五六出塞\n",
"一二三四五六七八王昌龄(唐)\n",
"秦时明月汉时关,万里长征人未还。\n",
"但使龙城飞将在,不教胡马度阴山。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"file = 'images/ch8/9.1 出塞.txt'\n",
"with open(file, 'r', encoding='utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" for i in range(1, 12): \n",
" print(i, [poem.readline(i)], poem.tell()) # 从当前行可读数据中读取前i个字符,指针后移"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"poem.tell()获取当前指针位置,从输出结果看,每次只读取当前行指针位置后面的若干个字符,最多不超过当前行。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"这种方式让返回值清晰明确;只要 f.readline() 返回空字符串,就表示已经到达了文件末尾。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open( 'images/ch8/9.1 出塞.txt','r',encoding = 'utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" while txt := poem.readline(): # 逐行读文件,直至文件结束\n",
" print(txt.strip()) # 去除行末的换行符后输出当前读到的字符串"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"readlines(hint=-1)\n",
"# 一次读取文件中所有数据行,文件指针一次性就移动到文件结尾处。\n",
"# readlines()方法自动将文件内容转成一个列表,列表中每个元素是文件对象中的一行数据。\n",
"# 可以指定 hint 来,读取的直到指定字符所在的行。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open( 'images/ch8/9.1 出塞.txt','r',encoding = 'utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" print(poem.readlines()) # 输出全部内容放入列表\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open( 'images/ch8/9.1 出塞.txt','r',encoding = 'utf-8') as poem: # 打开文件创建文件对象,命名为poem\n",
" print(poem.readlines(4)) # 输出到第4个字符\"秦\"所在的行的数据, ['出塞\\n', '秦时明月汉时关,万里长征人未还。\\n']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 实例 9.2 提取文件中的英文 "
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"《谁动了我的奶酪?》是美国作家斯宾塞·约翰逊创作的一个寓言故事,该书首次出版于1998年。文件“Who Moved My Cheese.txt”中包含这个故事的中英文,提取并输出文件中的英文。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Who Moved My Cheese.txt](images/ch8/Who Moved My Cheese.txt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read_file(file):\n",
" \"\"\"接收文件名为参数,将文件中的内容读为字符串,\n",
" 过滤掉中文(中文字符及全角符号Unicode编码都大于256),返回字符串 \"\"\"\n",
" with open(file, 'r', encoding='utf-8') as novel:\n",
" txt = novel.read() # 读取文件全部内容为字符串\n",
" english_txt = ''\n",
" for x in txt:\n",
" if ord(x) < 256:\n",
" english_txt = english_txt + x # 过滤掉中文,拼接为字符串\n",
" return english_txt # 返回英文文本\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" filename = 'images/ch8/Who Moved My Cheese.txt'\n",
" print(read_file(filename))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read_file(file):\n",
" \"\"\"接收文件名为参数,将文件中的内容读为字符串,\n",
" 过滤掉中文(中文字符及全角符号Unicode编码都大于256),返回字符串 \"\"\"\n",
" with open(file, 'r', encoding='utf-8') as novel:\n",
" txt = novel.read() # 读取文件全部内容为字符串\n",
" english_txt = ''.join([x for x in txt if ord(x) < 256]) # 过滤掉中文,拼接为字符串\n",
" return english_txt # 返回英文文本\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" filename = 'images/ch8/Who Moved My Cheese.txt'\n",
" print(read_file(filename))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<font face='楷体' color='red' size=5> 练一练 </font>"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"修改上面的代码,统计小说《谁动了我的奶酪?》中的英文单词数量?\n",
"提示:\n",
"将英文文本中的符号替换为空格后,根据空格切分为列表,列表的长度即单词的数量"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read_file(file):\n",
" \"\"\"接收文件名为参数,将文件中的内容读为字符串,\n",
" 过滤掉中文(中文字符及全角符号Unicode编码都大于256),返回字符串 \"\"\"\n",
" with open(file, 'r', encoding='utf-8') as novel:\n",
" txt = novel.read() # 读取文件全部内容为字符串\n",
" english_txt = ''.join([x for x in txt if ord(x) < 256]) # 过滤掉中文,拼接为字符串\n",
" return english_txt # 返回英文文本\n",
"\n",
"\n",
"def words_lst(text):\n",
" \"\"\"接收字符串为参数,用空格替换字符串中所有标点符号,根据空格将字符串切分为列表\n",
" 返回值为元素为单词的列表。\n",
" \"\"\"\n",
" # 补充你的代码\n",
" \n",
" \n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" filename = 'images/ch8/Who Moved My Cheese.txt'\n",
" txt = read_file(filename)\n",
" print(len(words_lst(txt))) # 输出列表的长度"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<font face='楷体' color='red' size=5> 练一练 </font>"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"修改上面的代码,统计小说《谁动了我的奶酪?》中的英文单词的词频,输出出现频率最高的30个单词。\n",
"提示:\n",
"以单词为字典的键,以单词出现次数为字典的值,遍历列表,单词每出现一次词频增加1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read_file(file):\n",
" \"\"\"接收文件名为参数,将文件中的内容读为字符串,\n",
" 过滤掉中文(中文字符及全角符号Unicode编码都大于256),返回字符串 \"\"\"\n",
" with open(file, 'r', encoding='utf-8') as novel:\n",
" txt = novel.read() # 读取文件全部内容为字符串\n",
" english_txt = ''.join([x for x in txt if ord(x) < 256]) # 过滤掉中文,拼接为字符串\n",
" return english_txt # 返回英文文本\n",
"\n",
"\n",
"def words_lst(text):\n",
" \"\"\"接收字符串为参数,用空格替换字符串中所有标点符号,根据空格将字符串切分为列表\n",
" 返回值为元素为单词的列表。\n",
" \"\"\"\n",
" # 补充你的代码\n",
"\n",
" \n",
" \n",
"\n",
"def word_frequency(words_ls):\n",
" \"\"\"接收元素为单词的列表,统计每个单词出现的次数,根据出现次数排序,输出出现频率最高的50个单词及其出现次数\"\"\"\n",
" # 补充你的代码\n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" filename = 'images/ch8/Who Moved My Cheese.txt'\n",
" txt = read_file(filename)\n",
" words_list = words_lst(txt) # 单词列表\n",
" word_frequency(words_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 实例 9.3 读百家姓获得姓的列表 "
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"赵钱孙李,周吴郑王。\n",
"冯陈褚卫,蒋沈韩杨。\n",
"......\n",
"巢关蒯相,查后荆红。\n",
"游竺权逯,盖益桓公。\n",
"\n",
"万俟司马,上官欧阳。\n",
"夏侯诸葛,闻人东方。\n",
"......\n",
"墨哈谯笪,年爱阳佟。\n",
"第五言福"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"提示:\n",
"百家姓前51行为单姓,51行以后为复姓,需要分别处理。\n",
"每行读取为字符串,去除换行符、逗号和句号拼接为一个字符串,再用list()转列表,每个字为一个元素\n",
"51行以后拼接为一个字符串,每次取两个字加入列表"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def read_txt(): # 根据性别生成一个姓名\n",
" \"\"\"无参数,读百家姓文件,将单字姓和复姓拆分,返回以姓为元素的列表\"\"\"\n",
" with open('images/ch8/百家姓.txt', 'r', encoding='utf-8') as data:\n",
" # 替换掉换行符、逗号和句号,每行转为一个字符串,得到列表\n",
" last_ls = [line.strip().replace(',', '').replace('。', '') for line in data]\n",
" # print(last_ls) # ['赵钱孙李周吴郑王', '冯陈褚卫蒋沈韩杨', ...]\n",
"\n",
" single = list(''.join(last_ls[:51])) # 前51行为单字姓,拼接为一个字符串转列表\n",
" # print(single) # ['赵', '钱', '孙', '李', '周', '吴', '郑', '王',...]\n",
" \n",
" double_txt = ''.join(last_ls[51:]) # 51行后为复姓,拼接为一个字符串\n",
" double = [] # 创建空列表\n",
" for i in range(0, len(double_txt), 2): # 遍历字符串序号,步长为2\n",
" double.append(double_txt[i: i + 2]) # 当前序号向后切2个字符,加入到列表\n",
" # double = [double_txt[i: i + 2] for i in range(0, len(double_txt), 2)] # 列表推导式方法\n",
" # print(double) # ['万俟', '司马', '上官', '欧阳', '夏侯', '诸葛', '闻人', '东方', ...]\n",
" lastname = single + double # 单字姓列表与复姓列表拼接为一个列表\n",
" return lastname # 返回百家姓中所有姓的列表\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" family_names = read_txt() # 调用函数将文件读为列表\n",
" print(family_names) # 输出"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 文件写入方法 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"进行文件的写入操作,使用 open() 函数时,要将 mode 参数设置为 'w'、'x'、'a'等具有写权限的模式。 \n",
"或用“r+”为读模式打开的文件增加写权限。 \n",
"注意: \n",
"使用“r+”模式时文件处于改写状态,新写入的数据会覆盖原文件起始位置相同字符数量的数据。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"写入数据的方法:\n",
"write() \n",
"writelines() \n",
"writelines()方法不会自动在每一个元素后面增加换行,只是将列表内容直接输出,所以在构造列表时,在需要换行的位置加入一个' \\n',以控制写入时换行。这两种方法可将字符串和列表中的内容写入到文件中永久保存。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| 方法 | 描述 |\n",
"| :---- | :---- |\n",
"| write(b) | 将给定的字符串或字节流对象写入文件 |\n",
"| writelines(lines) | 将一个元素全为字符串的列表写入文件。构造列表时,在需要换行的位置加入换行符' \\n',以控制写入时换行。 |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 实例 9.4 追加写文件 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"将字符串 '姓名,C语言,Java,Python,C#\\n罗明,95,96,85,63\\n朱佳,75,93,66,85\\n' 写入到当前路径下的'score.txt'中,再把列表 ['李思,86,76,96,93\\n郑君,88,98,76,90\\n'] 中的数据追加到文件'score.txt'中,读文件查看写入是否正确。"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"姓名,C语言,Java,Python,C#\n",
"罗明,95,96,85,63\n",
"朱佳,75,93,66,85\n",
"李思,86,76,96,93\n",
"郑君,88,98,76,90\n",
"\n"
]
}
],
"source": [
"def write_str(s, filename):\n",
" \"\"\"接收要写入的字符串和文件名为参数,将字符串追加写入到文件现在数据末尾\"\"\"\n",
" with open(filename, 'w', encoding='utf-8') as f:\n",
" f.write(s) # 将字符串s写入文件\n",
"\n",
"\n",
"def write_list(ls, filename):\n",
" \"\"\"接收要写入的列表和文件名为参数,将列表中的字符串追加写入到文件现在数据末尾\"\"\"\n",
" with open(filename, 'a', encoding='utf-8') as f:\n",
" f.writelines(ls) # 将列表ls写入文件,附加到后面\n",
"\n",
"\n",
"def read_file(filename):\n",
" \"\"\"接收表示文件名的字符串为参数,读文件,以字符串形式返回\"\"\"\n",
" with open(filename, 'r', encoding='utf-8') as f:\n",
" return f.read() # 返回字符串形式的文件内容\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
" file = 'score.txt' # 定义文件名变量,方便程序扩展和修改\n",
" score_str = '姓名,C语言,Java,Python,C#\\n罗明,95,96,85,63\\n朱佳,75,93,66,85\\n'\n",
" score_lst = ['李思,86,76,96,93\\n郑君,88,98,76,90\\n']\n",
" write_str(score_str, file) # 传递要写入的字符串和文件名为参数\n",
" write_list(score_lst, file) # 传递要写入的列表和文件名为参数\n",
" print(read_file(file)) # 读文件,查看写入是否成功\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}