{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# File Reading and Writing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Files that may be used in this section:  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[data.tar](images/ch8/data.tar)  \n",
    "[9.1 出塞.txt](images/ch8/9.1 出塞.txt)  \n",
    "[Who Moved My Cheese.txt](images/ch8/Who Moved My Cheese.txt)  \n",
    "[百家姓.txt](images/ch8/百家姓.txt)  \n",
    "[出塞.txt](images/ch8/出塞.txt)  \n",
    "[What is New in Python3.11.txt](images/ch8/What is New in Python3.11.txt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Methods for reading files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A Python file object (called fr here) provides three methods for reading data:  \n",
    "fr.read()  \n",
    "fr.readline()  \n",
    "fr.readlines()  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
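    "| Method | Description |\n",
    "| :---- | :---- |\n",
    "| read(size) | With no argument, or when size is -1 (any negative value or None), reads <font color='red'>the rest of the file, from the pointer to the end, as a single string</font>; when size is a non-negative integer, reads at most size characters. |\n",
    "| readline(size) | With no argument, or when size is -1, reads and returns <font color='red'>one line</font> as a string, including the trailing newline '\\\\n'. When size is a non-negative integer, reads at most size characters of the current line, starting at the pointer; if fewer than size characters remain on the line, reads to the end of the line. |\n",
    "| readlines(hint) | With no argument, reads all remaining data and returns a <font color='red'>list</font> in which each element is one line, including its trailing newline '\\\\n'. When hint is a positive integer, reading stops as soon as the total size of the lines read so far exceeds hint; hint values of 0 or less are treated as no limit. |\n",
    "| seek(offset, whence) | Moves the file <font color='red'>pointer</font>. offset is the displacement and whence is the reference point, with three values: 0 = start of file, 1 = current position, 2 = end of file. In text files (opened without b in the mode string), only seeks relative to the start of the file are allowed (the exceptions are seek(0, 1) and seek(0, 2), which stay at the current position and jump to the end), and the offset must be 0 or a value previously returned by tell(). |\n",
    "| tell() | Returns the current position of the file pointer. In text files this is an opaque number that depends on the encoding, not a character count. |\n"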
   ]
  },
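  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the `seek()` and `tell()` rules above, run on a temporary file created only for this demonstration (the file name `demo.txt` and its content are assumptions). In text mode `seek(0, 2)` is accepted, but a nonzero offset relative to the current position raises `io.UnsupportedOperation`; a file opened in binary mode (`'rb'`) accepts any offset and whence.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "# Hypothetical test file, written in binary mode so that no newline translation happens\n",
    "path = os.path.join(tempfile.mkdtemp(), 'demo.txt')\n",
    "with open(path, 'wb') as f:\n",
    "    f.write('出塞\\nabc\\n'.encode('utf-8'))  # 7 + 4 = 11 bytes\n",
    "\n",
    "with open(path, 'r', encoding='utf-8') as f:\n",
    "    end_pos = f.seek(0, 2)   # allowed in text mode: jump to the end, returns the new position\n",
    "    try:\n",
    "        f.seek(1, 1)         # nonzero offset relative to the current position\n",
    "        text_error = None\n",
    "    except io.UnsupportedOperation as exc:\n",
    "        text_error = exc     # text mode rejects this kind of seek\n",
    "\n",
    "with open(path, 'rb') as f:\n",
    "    f.seek(-4, 2)            # binary mode: 4 bytes before the end\n",
    "    tail = f.read()\n",
    "\n",
    "print(end_pos, repr(text_error), tail)\n",
    "```\n",
    "\n",
    "`end_pos` comes out as 11, the size of the file in bytes (each Chinese character takes 3 bytes in UTF-8), which illustrates why positions in text files should not be treated as character counts."
   ]
  },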
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
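    "When a file is first opened, the pointer is at the beginning of its content, and it moves forward as data is read or written. Once the pointer has reached the end of the file, further reads return an empty string ('').  \n",
    "To read the data again, call seek(0) to move the pointer back to the beginning of the file."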
   ]
  },
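  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The behaviour described above can be checked with a small sketch on a temporary file (the file name `poem.txt` and its two lines are illustrative assumptions, not one of the course files): a second `read()` returns an empty string, and after `seek(0)` the whole content can be read again.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "path = os.path.join(tempfile.mkdtemp(), 'poem.txt')  # hypothetical scratch file\n",
    "with open(path, 'w', encoding='utf-8') as f:\n",
    "    f.write('line 1\\nline 2\\n')\n",
    "\n",
    "with open(path, 'r', encoding='utf-8') as f:\n",
    "    first = f.read()   # reads everything; the pointer is now at the end\n",
    "    again = f.read()   # nothing left: returns '' (an empty string), not None\n",
    "    f.seek(0)          # move the pointer back to the beginning\n",
    "    third = f.read()   # the full content can be read again\n",
    "\n",
    "print(repr(first), repr(again), repr(third))\n",
    "```"
   ]
  },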
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "seek(0)  # move the file pointer to the beginning of the file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 9.1 Printing the contents of a file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A UTF-8 encoded text file “出塞.txt” contains the following poem:  \n",
    "  \n",
    "出塞  \n",
    "秦时明月汉时关,万里长征人未还。  \n",
    "但使龙城飞将在,不教胡马度阴山。  \n",
    "\n",
    "Write a program that reads the contents of the file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
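    "To access the data in a file, first open it with the open() function. When you only read the file and do not modify it, set the mode argument to 'r' or omit it ('r' is the default). The argument encoding='utf-8' makes Python decode the data as UTF-8."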
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[9.1 出塞.txt](images/ch8/9.1 出塞.txt)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "出塞\n",
      "秦时明月汉时关,万里长征人未还。\n",
      "但使龙城飞将在,不教胡马度阴山。\n",
      "\n"
     ]
    }
   ],
   "source": [
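    "with open('images/ch8/9.1 出塞.txt', 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    print(poem.read())       # read the whole file as one string and print it\n",
    "    for line in poem:        # the pointer is already at the end of the file, so there is nothing left to iterate over\n",
    "        print(line.strip())  # never executed: no output"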
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "出塞\n",
      "秦时明月汉时关,万里长征人未还。\n",
      "但使龙城飞将在,不教胡马度阴山。\n",
      "\n",
      "出塞\n",
      "秦时明月汉时关,万里长征人未还。\n",
      "但使龙城飞将在,不教胡马度阴山。\n"
     ]
    }
   ],
   "source": [
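    "with open('images/ch8/9.1 出塞.txt', 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    print(poem.read())       # read the whole file as one string and print it\n",
    "    poem.seek(0)             # move the file pointer back to the beginning\n",
    "    for line in poem:        # the pointer is at the start again\n",
    "        print(line.strip())  # print the file contents a second time"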
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "read(size=-1) \n",
    "\n",
    "```\n",
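    "Reads and returns at most size characters from a text file, as a string.  \n",
    "If size is negative or None, reads from the current position to the end of the file (End Of File, EOF).  \n",
    "The characters read include punctuation as well as whitespace such as spaces and newlines."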
   ]
  },
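  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `read(size)` returns at most `size` characters and an empty string at EOF, it can also process a file in fixed-size pieces instead of loading it all at once. A sketch, assuming an `io.StringIO` object (an in-memory text stream with the same `read()` behaviour) in place of a real file; `read_in_chunks` is a hypothetical helper name:\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "\n",
    "def read_in_chunks(stream, size):\n",
    "    # Hypothetical helper: yield pieces of at most `size` characters until read() returns ''\n",
    "    while True:\n",
    "        chunk = stream.read(size)\n",
    "        if not chunk:  # an empty string means the end of the file\n",
    "            break\n",
    "        yield chunk\n",
    "\n",
    "\n",
    "stream = io.StringIO('出塞\\n秦时明月汉时关\\n')  # in-memory stand-in for a text file\n",
    "chunks = list(read_in_chunks(stream, 4))\n",
    "print(chunks)\n",
    "```\n",
    "\n",
    "With a real file, the object returned by `open()` can be passed instead of the `StringIO` object."
   ]
  },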
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['出塞\\n秦时明月汉时关,万里长征人未还。\\n']\n",
      "但使龙城飞将在,不教胡马度阴山。\n",
      "\n"
     ]
    }
   ],
   "source": [
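    "with open('images/ch8/9.1 出塞.txt', 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    print([poem.read(20)])  # read the first 20 characters; wrapping them in a list makes the newlines visible\n",
    "\n",
    "    print(poem.read())      # print the remaining characters"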
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "readline(size=-1) \n",
    "\n",
    "```\n",
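    "Reads one line at a time and moves the file pointer to the start of the next line.  \n",
    "If size is given, reads at most size characters of the current line; if fewer than size characters remain on the line, reads to the end of the line.  \n",
    "The newline character (\\n) is kept at the end of the string (it is omitted only on the last line when the file does not end with a newline); a blank line is returned as '\\n', a string containing only a newline."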
   ]
  },
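  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of these rules on an in-memory stream (`io.StringIO` stands in for a file; the three sample lines are made up): the blank line comes back as `'\\n'`, the last line has no trailing newline, and once the end is reached `readline()` returns an empty string.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "# A normal line, a blank line, and a last line without '\\n'\n",
    "stream = io.StringIO('first line\\n\\nlast line')\n",
    "\n",
    "results = [stream.readline(5),  # at most 5 characters of the current line\n",
    "           stream.readline(),   # the rest of the same line, including '\\n'\n",
    "           stream.readline(),   # a blank line is returned as '\\n'\n",
    "           stream.readline(),   # last line: no '\\n' because the data does not end with one\n",
    "           stream.readline()]   # end of file: empty string\n",
    "print(results)\n",
    "```"
   ]
  },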
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
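    "with open('images/ch8/9.1 出塞.txt', 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    print([poem.read(11)])    # ['出塞\\n秦时明月汉时关,']: the newline is kept in the string\n",
    "    print(poem.readline(10))  # 万里长征人未还。 (only 9 characters, including '\\n', remain on this line)"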
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To understand readline(size) better, consider the following version of the poem:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "          出塞\n",
    "             王昌龄()\n",
    "秦时明月汉时关,万里长征人未还。\n",
    "但使龙城飞将在,不教胡马度阴山。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the spaces easier to see, they are replaced below by Chinese numerals (一, 二, 三, ...):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一二三四五六出塞\n",
    "一二三四五六七八王昌龄()\n",
    "秦时明月汉时关,万里长征人未还。\n",
    "但使龙城飞将在,不教胡马度阴山。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file = 'images/ch8/9.1 出塞.txt'\n",
    "with open(file, 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    for i in range(1, 12):\n",
    "        print(i, [poem.readline(i)], poem.tell())  # read at most i characters from the current line; the pointer moves forward"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
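    "poem.tell() returns the current pointer position (an opaque, encoding-dependent number, not a character count). The output shows that each call reads only characters after the pointer on the current line, and never reads past the end of that line."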
   ]
  },
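  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the value returned by `tell()` is only meaningful to `seek()`, its typical use is to remember a position and return to it later. A sketch under stated assumptions (a temporary file `bookmark.txt`, written in binary mode so that no newline translation takes place):\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "# Hypothetical file, written in binary mode so that no newline translation occurs\n",
    "path = os.path.join(tempfile.mkdtemp(), 'bookmark.txt')\n",
    "with open(path, 'wb') as f:\n",
    "    f.write('出塞\\n秦时明月汉时关\\n'.encode('utf-8'))\n",
    "\n",
    "with open(path, 'r', encoding='utf-8') as f:\n",
    "    f.readline()           # consume the first line\n",
    "    mark = f.tell()        # remember the position of the second line\n",
    "    second = f.readline()  # read the second line\n",
    "    f.seek(mark)           # return to the remembered position\n",
    "    again = f.readline()   # the second line is read again\n",
    "\n",
    "print(mark, repr(second), repr(again))\n",
    "```\n",
    "\n",
    "Here `mark` is 7: 6 bytes for the two UTF-8 characters 出塞 plus 1 byte for the newline, even though only 3 characters were read."
   ]
  },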
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
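    "Because readline() keeps the trailing newline, its return value is unambiguous: a blank line is returned as '\\n', so an empty string from f.readline() can only mean that the end of the file has been reached."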
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
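    "with open('images/ch8/9.1 出塞.txt', 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    while txt := poem.readline():  # read line by line until the end of the file (:= requires Python 3.8+)\n",
    "        print(txt.strip())          # strip the trailing newline and print the line"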
   ]
  },
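  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `:=` operator in the loop above requires Python 3.8 or later (this notebook's kernel metadata lists Python 3.7.5). On older interpreters the same “read until `readline()` returns an empty string” loop can be written with the two-argument form of the built-in `iter()`, which calls a function repeatedly until it returns the sentinel value. A sketch, with `io.StringIO` standing in for the poem file:\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "poem = io.StringIO('出塞\\n秦时明月汉时关\\n')  # in-memory stand-in for the poem file\n",
    "\n",
    "# iter(callable, sentinel) keeps calling poem.readline() until it returns ''\n",
    "lines = [txt.strip() for txt in iter(poem.readline, '')]\n",
    "print(lines)\n",
    "```"
   ]
  },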
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
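    "readlines(hint=-1)\n",
    "# Reads all remaining lines at once; the file pointer moves straight to the end of the file.\n",
    "# readlines() returns the content as a list in which each element is one line of the file.\n",
    "# hint limits how much is read: reading stops once the total size of the lines read so far exceeds hint."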
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('images/ch8/9.1 出塞.txt', 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    print(poem.readlines())  # print all lines as a list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('images/ch8/9.1 出塞.txt', 'r', encoding='utf-8') as poem:  # open the file and name the file object poem\n",
    "    print(poem.readlines(4))  # the first line has only 3 characters (not more than 4), so the next line is read too: ['出塞\\n', '秦时明月汉时关,万里长征人未还。\\n']"
   ]
  },
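  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the `hint` rule concrete, the sketch below applies it to a made-up sample of three 3-character lines held in `io.StringIO` (assumed here in place of a file): reading stops once the total length of the lines read so far exceeds `hint`, and a `hint` of 0 or less means no limit.\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "text = 'ab\\ncd\\nef\\n'  # three lines of 3 characters each (made-up sample)\n",
    "\n",
    "r2 = io.StringIO(text).readlines(2)  # the first line (3) already exceeds 2\n",
    "r4 = io.StringIO(text).readlines(4)  # 3 does not exceed 4, so a second line is read\n",
    "r0 = io.StringIO(text).readlines(0)  # hint <= 0 means no limit\n",
    "print(r2, r4, r0)\n",
    "```"
   ]
  },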
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 9.2 Extracting the English text from a file"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "“Who Moved My Cheese?” is a parable by the American author Spencer Johnson, first published in 1998. The file “Who Moved My Cheese.txt” contains the story in both Chinese and English. Extract the English text from the file and print it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[Who Moved My Cheese.txt](images/ch8/Who Moved My Cheese.txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_file(file):\n",
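    "    \"\"\"Take a file name, read the whole file into a string and filter out Chinese text\n",
    "    (Chinese characters and full-width symbols have code points of 256 or above); return the resulting string.\"\"\"\n",
    "    with open(file, 'r', encoding='utf-8') as novel:\n",
    "        txt = novel.read()  # read the entire file into one string\n",
    "    english_txt = ''\n",
    "    for x in txt:\n",
    "        if ord(x) < 256:\n",
    "            english_txt = english_txt + x   # keep non-Chinese characters and concatenate them\n",
    "    return english_txt  # return the English text\n",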
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    filename = 'images/ch8/Who Moved My Cheese.txt'\n",
    "    print(read_file(filename))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_file(file):\n",
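    "    \"\"\"Take a file name, read the whole file into a string and filter out Chinese text\n",
    "    (Chinese characters and full-width symbols have code points of 256 or above); return the resulting string.\"\"\"\n",
    "    with open(file, 'r', encoding='utf-8') as novel:\n",
    "        txt = novel.read()  # read the entire file into one string\n",
    "    english_txt = ''.join([x for x in txt if ord(x) < 256])  # keep non-Chinese characters and join them into a string\n",
    "    return english_txt  # return the English text\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    filename = 'images/ch8/Who Moved My Cheese.txt'\n",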
    "    print(read_file(filename))"
   ]
  },
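  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The filter keeps every character whose code point is below 256 (ASCII plus Latin-1), so letters, digits, ASCII punctuation and newlines survive, while Chinese characters and full-width punctuation are removed. One side effect is worth checking on the real file: typographic characters such as the right single quotation mark (U+2019) are also above 255, so an apostrophe written that way disappears. A sketch on a made-up sample string; `keep_below_256` is a hypothetical name for the same test used in `read_file()`:\n",
    "\n",
    "```python\n",
    "def keep_below_256(text):\n",
    "    # Same test as read_file(): keep only characters whose code point is < 256\n",
    "    return ''.join(ch for ch in text if ord(ch) < 256)\n",
    "\n",
    "\n",
    "sample = 'Hello 你好 world.\\nDon\\u2019t panic!'  # hypothetical mixed sample\n",
    "print(keep_below_256(sample))  # the Chinese characters and the curly apostrophe are dropped\n",
    "```"
   ]
  },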
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font face='楷体' color='red' size=5> Practice </font>"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
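    "Modify the code above to count the number of English words in “Who Moved My Cheese?”.\n",
    "Hint:\n",
    "Replace the punctuation in the English text with spaces and split the text on whitespace into a list; the length of the list is the number of words."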
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_file(file):\n",
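    "    \"\"\"Take a file name, read the whole file into a string and filter out Chinese text\n",
    "    (Chinese characters and full-width symbols have code points of 256 or above); return the resulting string.\"\"\"\n",
    "    with open(file, 'r', encoding='utf-8') as novel:\n",
    "        txt = novel.read()  # read the entire file into one string\n",
    "    english_txt = ''.join([x for x in txt if ord(x) < 256])  # keep non-Chinese characters and join them into a string\n",
    "    return english_txt  # return the English text\n",
    "\n",
    "\n",
    "def words_lst(text):\n",
    "    \"\"\"Take a string, replace every punctuation mark in it with a space, split it on whitespace\n",
    "    and return a list whose elements are the words.\n",
    "    \"\"\"\n",
    "    # Add your code here\n",
    "\n",
    "\n",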
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    filename = 'images/ch8/Who Moved My Cheese.txt'\n",
    "    txt = read_file(filename)\n",
    "    print(len(words_lst(txt)))  # 输出列表的长度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font face='楷体' color='red' size=5> Practice </font>"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
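    "Modify the code above to count how often each English word appears in “Who Moved My Cheese?” and print the 30 most frequent words.\n",
    "Hint:\n",
    "Use a dictionary whose keys are words and whose values are counts; iterate over the list and add 1 to a word's count each time it appears."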
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_file(file):\n",
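    "    \"\"\"Take a file name, read the whole file into a string and filter out Chinese text\n",
    "    (Chinese characters and full-width symbols have code points of 256 or above); return the resulting string.\"\"\"\n",
    "    with open(file, 'r', encoding='utf-8') as novel:\n",
    "        txt = novel.read()  # read the entire file into one string\n",
    "    english_txt = ''.join([x for x in txt if ord(x) < 256])  # keep non-Chinese characters and join them into a string\n",
    "    return english_txt  # return the English text\n",
    "\n",
    "\n",
    "def words_lst(text):\n",
    "    \"\"\"Take a string, replace every punctuation mark in it with a space, split it on whitespace\n",
    "    and return a list whose elements are the words.\n",
    "    \"\"\"\n",
    "    # Add your code here\n",
    "\n",
    "\n",
    "def word_frequency(words_ls):\n",
    "    \"\"\"Take a list of words, count how many times each word appears, sort by count,\n",
    "    and print the 30 most frequent words with their counts.\"\"\"\n",
    "    # Add your code here\n",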
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    filename = 'images/ch8/Who Moved My Cheese.txt'\n",
    "    txt = read_file(filename)\n",
    "    words_list = words_lst(txt)  # list of words\n",
    "    word_frequency(words_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 9.3 Reading the Hundred Family Surnames (百家姓) into a list of surnames"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "赵钱孙李,周吴郑王。\n",
    "冯陈褚卫,蒋沈韩杨。\n",
    "......\n",
    "巢关蒯相,查后荆红。\n",
    "游竺权逯,盖益桓公。\n",
    "\n",
    "万俟司马,上官欧阳。\n",
    "夏侯诸葛,闻人东方。\n",
    "......\n",
    "墨哈谯笪,年爱阳佟。\n",
    "第五言福"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
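    "Hint:  \n",
    "The first 51 lines of 百家姓.txt contain single-character surnames; the lines after line 51 contain two-character (compound) surnames, so the two parts must be handled separately.  \n",
    "Read each line as a string, remove the newline, commas and full stops, join the lines into one string, and convert it to a list with list() so that each character becomes one element.  \n",
    "Join the lines after line 51 into one string and take two characters at a time to build the list of compound surnames."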
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
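    "def read_txt():  # read the surname file\n",
    "    \"\"\"Take no arguments; read the Hundred Family Surnames file, separate single-character\n",
    "    and compound surnames, and return a list of surnames.\"\"\"\n",
    "    with open('images/ch8/百家姓.txt', 'r', encoding='utf-8') as data:\n",
    "        # remove the newline, commas and full stops; each line becomes one string in the list\n",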
    "        last_ls = [line.strip().replace(',', '').replace('。', '') for line in data]\n",
    "        # print(last_ls)  # ['赵钱孙李周吴郑王', '冯陈褚卫蒋沈韩杨', ...]\n",
    "\n",
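    "    single = list(''.join(last_ls[:51]))    # the first 51 lines are single-character surnames: join into one string, then convert to a list\n",
    "    # print(single)  # ['赵', '钱', '孙', '李', '周', '吴', '郑', '王',...]\n",
    "\n",
    "    double_txt = ''.join(last_ls[51:])       # the lines after line 51 are compound surnames: join into one string\n",
    "    double = []                              # empty list for the compound surnames\n",
    "    for i in range(0, len(double_txt), 2):   # iterate over the indices with a step of 2\n",
    "        double.append(double_txt[i: i + 2])  # slice two characters from the current index and append them\n",
    "    # double = [double_txt[i: i + 2] for i in range(0, len(double_txt), 2)]  # equivalent list comprehension\n",
    "    # print(double)  # ['万俟', '司马', '上官', '欧阳', '夏侯', '诸葛', '闻人', '东方', ...]\n",
    "    lastname = single + double               # concatenate the single and compound surname lists\n",
    "    return lastname                          # return all surnames in the Hundred Family Surnames\n",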
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    family_names = read_txt()                # call the function to read the file into a list\n",
    "    print(family_names)                      # print the list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Methods for writing files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
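    "To write to a file, call open() with a mode that allows writing, such as 'w' (create the file, or truncate an existing one), 'x' (create a new file; raises FileExistsError if it already exists) or 'a' (append to the end of the file).  \n",
    "Alternatively, open the file with 'r+', which adds write permission to a file opened for reading.  \n",
    "Note:  \n",
    "In 'r+' mode the file is not truncated and the pointer starts at the beginning, so newly written data overwrites the same amount of existing data at the start of the file."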
   ]
  },
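  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of how the four modes differ, using a temporary file created only for the demonstration (the file name `modes.txt` and its content are assumptions). ASCII text is used so that characters and bytes coincide in the `'r+'` step.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical scratch file\n",
    "\n",
    "\n",
    "def content():\n",
    "    # Read the whole scratch file back as a string\n",
    "    with open(path, 'r', encoding='utf-8') as f:\n",
    "        return f.read()\n",
    "\n",
    "\n",
    "with open(path, 'w', encoding='utf-8') as f:   # 'w': create the file (or truncate it) and write\n",
    "    f.write('abcdef')\n",
    "with open(path, 'r+', encoding='utf-8') as f:  # 'r+': pointer at the start, no truncation\n",
    "    f.write('XY')                              # overwrites 'ab'\n",
    "after_r_plus = content()\n",
    "\n",
    "with open(path, 'a', encoding='utf-8') as f:   # 'a': always write at the end\n",
    "    f.write('gh')\n",
    "after_append = content()\n",
    "\n",
    "try:\n",
    "    with open(path, 'x', encoding='utf-8') as f:  # 'x': refuses to open an existing file\n",
    "        f.write('never written')\n",
    "    x_error = None\n",
    "except FileExistsError as exc:\n",
    "    x_error = exc\n",
    "\n",
    "print(after_r_plus, after_append, repr(x_error))\n",
    "```"
   ]
  },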
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
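    "Methods for writing data:  \n",
    "write()  \n",
    "writelines()  \n",
    "writelines() does not add a newline after each element; it writes the elements one after another exactly as they are. When building the list, put a newline '\\n' wherever a line break is wanted. Both methods store the contents of strings or lists permanently in the file."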
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
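    "| Method | Description |\n",
    "| :---- | :---- |\n",
    "| write(s) | Writes the given string (or, for a file opened in binary mode, a bytes-like object) to the file and returns the number of characters (or bytes) written. |\n",
    "| writelines(lines) | Writes an iterable of strings, such as a list, to the file. No newlines are added; put a newline '\\n' wherever a line break is wanted. |\n"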
   ]
  },
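  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of both methods, writing to an `io.StringIO` buffer (assumed here in place of a file opened with `'w'`; the rows reuse names and Python scores from Example 9.4):\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "buf = io.StringIO()  # in-memory stand-in for a file opened with 'w'\n",
    "\n",
    "count = buf.write('姓名,Python\\n')  # write() returns the number of characters written\n",
    "buf.writelines(['罗明', ',', '85'])  # no separator or newline is added between elements\n",
    "buf.writelines(['\\n', '朱佳,66\\n'])  # newlines must be supplied explicitly\n",
    "buf.writelines(row + '\\n' for row in ['李思,96', '郑君,76'])  # any iterable of strings works\n",
    "\n",
    "print(count)\n",
    "print(buf.getvalue())\n",
    "```\n",
    "\n",
    "`count` is 10, the number of characters in the first string, not the number of bytes it would occupy in UTF-8."
   ]
  },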
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 9.4 Appending to a file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
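    "Write the string '姓名,C语言,Java,Python,C#\\n罗明,95,96,85,63\\n朱佳,75,93,66,85\\n' to 'score.txt' in the current directory, then append the data in the list ['李思,86,76,96,93\\n郑君,88,98,76,90\\n'] to 'score.txt', and finally read the file to check that the data was written correctly."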
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "姓名,C语言,Java,Python,C#\n",
      "罗明,95,96,85,63\n",
      "朱佳,75,93,66,85\n",
      "李思,86,76,96,93\n",
      "郑君,88,98,76,90\n",
      "\n"
     ]
    }
   ],
   "source": [
    "def write_str(s, filename):\n",
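    "    \"\"\"Take the string to write and a file name; write the string to the file, replacing any existing content ('w' mode).\"\"\"\n",
    "    with open(filename, 'w', encoding='utf-8') as f:\n",
    "        f.write(s)        # write the string s to the file\n",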
    "\n",
    "\n",
    "def write_list(ls, filename):\n",
    "    \"\"\"Take a list and a file name; append the strings in the list after the file's existing data.\"\"\"\n",
    "    with open(filename, 'a', encoding='utf-8') as f:\n",
    "        f.writelines(ls)  # write the list ls to the file, after the existing data\n",
    "\n",
    "\n",
    "def read_file(filename):\n",
    "    \"\"\"Take a file name, read the file and return its content as a string.\"\"\"\n",
    "    with open(filename, 'r', encoding='utf-8') as f:\n",
    "        return f.read()   # return the file content as a string\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    file = 'score.txt'     # store the file name in a variable so it is easy to change\n",
    "    score_str = '姓名,C语言,Java,Python,C#\\n罗明,95,96,85,63\\n朱佳,75,93,66,85\\n'\n",
    "    score_lst = ['李思,86,76,96,93\\n郑君,88,98,76,90\\n']\n",
    "    write_str(score_str, file)   # pass the string to write and the file name\n",
    "    write_list(score_lst, file)  # pass the list to write and the file name\n",
    "    print(read_file(file))       # read the file to check that writing succeeded\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}