{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 7.5 Set Operations"
]
},
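{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python sets follow essentially the same concept as sets in mathematics and support the usual operations such as intersection, difference, and union, which makes set-theoretic computations straightforward. The table below lists the set-operation methods, their operator forms, and their meanings."
]
},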
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Method | Operator | Meaning |\n",
"| :------------------------- | :------: | :------------------------------------------------ |\n",
"| s.union(t) | s \\| t | Union of s and t.<br>Returns a new set containing the elements of s together with the elements of every iterable passed as t. |\n",
"| s.intersection(t) | s & t | Intersection of s and t.<br>Returns a new set containing only the elements common to s and every iterable passed as t. |\n",
"| s.difference(t) | s - t | Difference of s and t.<br>Returns a new set containing the elements of s that appear in none of the iterables passed as t. |\n",
"| s.symmetric_difference(t) | s ^ t | Symmetric difference of s and t.<br>Returns a new set containing the elements that belong to exactly one of s and t. |\n",
"| s.update(t) | s \\|= t | Updates s in place to the union of s and t. |\n",
"| s.intersection_update(t) | s &= t | Updates s in place to the intersection of s and t. |\n",
"| s.difference_update(t) | s -= t | Updates s in place to the difference of s and t, that is, removes from s every element found in t. |\n",
"| s.symmetric_difference_update(t) | s ^= t | Updates s in place to the symmetric difference of s and t, keeping the elements found in exactly one of them. |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The methods accept a set or any other iterable as an argument; the operators require both operands to be sets (`set` or `frozenset`)."
]
},
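{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between the method form and the operator form can be seen in a minimal sketch. The variable names and sample values are illustrative assumptions: the method accepts a list, while the operator raises a TypeError for the same operand.\n",
"\n",
"```python\n",
"s = {1, 2, 3}\n",
"extra = [3, 4, 5]  # a list, not a set\n",
"\n",
"# The method form accepts any iterable.\n",
"merged = s.union(extra)\n",
"print(merged == {1, 2, 3, 4, 5})  # True\n",
"\n",
"# The operator form requires both operands to be sets.\n",
"try:\n",
"    s | extra\n",
"    operator_error = None\n",
"except TypeError as err:\n",
"    operator_error = err\n",
"print(type(operator_error).__name__)  # TypeError\n",
"\n",
"# Converting the iterable to a set first makes the operator form work.\n",
"print((s | set(extra)) == merged)  # True\n",
"```\n",
"\n",
"Wrapping the iterable in set() is one way to use the operators with data that is not already a set."
]
},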
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. Union"
]
},
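{
"cell_type": "markdown",
"metadata": {},
"source": [
"+ **s.union(t)**:  \n",
"Returns a new set containing every element of s and of all the iterables passed as t. The method accepts any number of iterables of any kind.\n",
"+ **s | t**:  \n",
"Returns a new set containing every element of set s and set t. Both operands must be sets.\n",
"+ **s.update(t)**:  \n",
"Returns None and modifies s in place, so that s ends up containing its original elements plus those of all the iterables passed as t. The method accepts any number of iterables of any kind."
]
},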
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'p', 'o', 'b', 'c', 's', 'k', 'h', 'e'}\n",
"{'p', 'o', 'b', 'c', 's', 'k', 'h', 'e'}\n"
]
}
],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"new_set1 = s.union(t)\n",
"print(new_set1)\n",
"new_set2 = s | t\n",
"print(new_set2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/13.png\" style=\"zoom:50%;\">"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{1, 'm', 2, 3, 4, 5, 6, 'g', 7, 8, 9, 'p', 'a', 'n', 'i', 'r', 'o', 't', 'h', 'y'}\n"
]
}
],
"source": [
"s = set('python')\n",
"new_set3 = s.union('programming', range(1, 10))  # union() accepts any number of iterables of any kind\n",
"print(new_set3)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'p', 'b', 'o', 's', 'k', 'h'}\n",
"None {'p', 'b', 'o', 'c', 's', 'k', 'h', 'e'}\n"
]
}
],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"print(s)\n",
"result = s.update(t)  # returns None and modifies s in place; same result as s |= t\n",
"print(result, s)  # None {'b', 'o', 'k', 's', 'h', 'p', 'c', 'e'} (element order may vary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/14.png\" style=\"zoom:60%;\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Intersection"
]
},
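{
"cell_type": "markdown",
"metadata": {},
"source": [
"+ **s.intersection(t)**:  \n",
"Returns a new set whose elements belong to s and to every iterable passed as t. The method accepts any number of iterables of any kind.\n",
"+ **s & t**:  \n",
"Returns a new set whose elements belong to both set s and set t. Both operands must be sets.\n",
"+ **s.intersection_update(t)**:  \n",
"Returns None and modifies s in place, keeping only the elements that belong to the original s and to every iterable passed as t. The method accepts any number of iterables of any kind."
]
},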
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'o', 'h', 'p', 's'}\n",
"{'o', 'h', 'p', 's'}\n"
]
}
],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"new_set1 = s.intersection(t)\n",
"print(new_set1)\n",
"new_set2 = s & t\n",
"print(new_set2)"
]
},
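{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code above shows that, for two sets, the & operator and the intersection method produce the same result; the operator form is shorter and arguably easier to read.  \n",
"The binary set operators can also be combined with assignment to form augmented assignments. For example:  \n",
"`set1 |= set2`  \n",
"gives the same result as:  \n",
"`set1 = set1 | set2`  \n",
"except that set1 is modified in place; the method with the same effect as `|=` is update.  \n",
"`set1 &= set2`  \n",
"gives the same result as:  \n",
"`set1 = set1 & set2`  \n",
"and the method with the same effect as `&=` is intersection_update, as the following code shows."
]
},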
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{1, 2, 3, 4, 5, 6, 7}\n",
"{3, 6}\n",
"{2, 4}\n"
]
}
],
"source": [
"set1 = {1, 3, 5, 7}\n",
"set2 = {2, 4, 6}\n",
"set1 |= set2\n",
"# set1.update(set2)\n",
"print(set1) # {1, 2, 3, 4, 5, 6, 7}\n",
"set3 = {3, 6, 9}\n",
"set1 &= set3\n",
"# set1.intersection_update(set3)\n",
"print(set1) # {3, 6}\n",
"set2 -= set1\n",
"# set2.difference_update(set1)\n",
"print(set2) # {2, 4}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/15.png\" style=\"zoom:50%;\">"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'p', 'b', 'o', 's', 'k', 'h'}\n",
"None {'o', 'h', 'p', 's'}\n"
]
}
],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"print(s)\n",
"result = s.intersection_update(t)  # returns None and modifies s in place; same result as s &= t\n",
"print(result, s)  # None {'p', 'o', 'h', 's'} (element order may vary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/16.png\" style=\"zoom:50%;\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Difference"
]
},
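{
"cell_type": "markdown",
"metadata": {},
"source": [
"+ **s.difference(t)**:  \n",
"Returns a new set whose elements belong to s but to none of the iterables passed as t. The method accepts any number of iterables of any kind.\n",
"+ **s - t**:  \n",
"Returns a new set whose elements belong to set s but not to set t. Both operands must be sets.\n",
"+ **s.difference_update(t)**:  \n",
"Returns None and modifies s in place, keeping only the elements of the original s that belong to none of the iterables passed as t. The method accepts any number of iterables of any kind."
]
},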
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'b', 'k'}\n",
"{'b', 'k'}\n"
]
}
],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"new_set1 = s.difference(t)\n",
"print(new_set1)\n",
"new_set2 = s - t\n",
"print(new_set2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/17.png\" style=\"zoom:50%;\">"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'p', 'b', 'o', 's', 'k', 'h'}\n",
"None {'b', 'k'}\n"
]
}
],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"print(s)\n",
"result = s.difference_update(t)  # returns None and modifies s in place; same result as s -= t\n",
"print(result, s)  # None {'b', 'k'} (element order may vary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/18.png\" style=\"zoom:50%;\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Symmetric Difference"
]
},
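{
"cell_type": "markdown",
"metadata": {},
"source": [
"+ **s.symmetric_difference(t)**:  \n",
"Returns a new set whose elements belong to either s or t but not to both. Unlike union, intersection, and difference, this method takes exactly one argument, which may be any iterable.\n",
"+ **s ^ t**:  \n",
"Returns a new set whose elements belong to either set s or set t but not to both. Both operands must be sets.\n",
"+ **s.symmetric_difference_update(t)**:  \n",
"Returns None and modifies s in place, keeping the elements that belong to either the original s or t but not to both. This method also takes exactly one argument, which may be any iterable."
]
},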
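{
"cell_type": "markdown",
"metadata": {},
"source": [
"The symmetric difference can also be expressed through the other three operations. The following is a small sketch, reusing the bookshop and cheeseshop strings from this section; the names via_union and via_difference are introduced here only for illustration.\n",
"\n",
"```python\n",
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"\n",
"sym = s ^ t\n",
"# Elements in the union that are not in the intersection.\n",
"via_union = (s | t) - (s & t)\n",
"# Elements only in s, together with elements only in t.\n",
"via_difference = (s - t) | (t - s)\n",
"\n",
"print(sym == via_union == via_difference)  # True\n",
"print(sorted(sym))  # ['b', 'c', 'e', 'k']\n",
"```\n",
"\n",
"Sorting the result before printing gives a stable order, since sets themselves are unordered."
]
},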
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'e', 'k', 'b', 'c'}\n",
"{'e', 'k', 'b', 'c'}\n"
]
}
],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"new_set1 = s.symmetric_difference(t)\n",
"print(new_set1)\n",
"new_set2 = s ^ t\n",
"print(new_set2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/19.png\" style=\"zoom:50%;\">"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"s = set('bookshop')\n",
"t = set('cheeseshop')\n",
"print(s)\n",
"result = s.symmetric_difference_update(t)  # returns None and modifies s in place; same result as s ^= t\n",
"print(result, s)  # None {'c', 'b', 'k', 'e'} (element order may vary)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/ch7/20.png\" style=\"zoom:50%;\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 1: Mobile Phone Sales Analysis"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"!tar -xvf /data/bigfiles/2b2d3026-228d-46e4-8216-cb5f684ff337.tar"
]
},
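{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"images/ch7/7.4 手机销售分析.xlsx\" target=\"_blank\">7.4 手机销售分析.xlsx</a>\n",
"\n",
"In the Excel file `7.4 手机销售分析.xlsx`, the first worksheet holds the brands on the 2019 phone sales chart and their market shares (as percentages), and the second worksheet holds the same data for 2018. Read the data from the file and output, in order of sales, the brands on each year's chart, the brands on the chart in both years, all brands that appeared in either year, the brands new to the chart in 2019, and the brands that either entered or dropped off the chart in 2019.\n",
"\n",
"The brand lists can be obtained by reading the Excel file with pandas and converting the data to lists with tolist(); the lists are then turned into sets. In the code below, excel is the name of the file to read and sheet_name is the index of the worksheet: 1 selects the second worksheet, and when the argument is omitted the first worksheet is read. values.tolist() converts the DataFrame into a list of rows, and a set comprehension then builds a set from the first column of each row."
]
},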
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def data(excel):\n",
"    \"\"\"Take an Excel file name, read its two worksheets, and return the brands of each as a set.\"\"\"\n",
"    sale2018 = pd.read_excel(excel, sheet_name=1)  # read the second worksheet (2018) into a DataFrame\n",
"    sale2018 = sale2018.values.tolist()  # convert the DataFrame to a list of rows\n",
"    sale2019 = pd.read_excel(excel).values.tolist()  # the two steps combined: read the first worksheet (2019) and convert it\n",
"    set2019 = {x[0] for x in sale2019}  # set comprehension: brands on the 2019 chart\n",
"    set2018 = {x[0] for x in sale2018}  # set comprehension: brands on the 2018 chart\n",
"    return set2019, set2018"
]
},
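{
"cell_type": "markdown",
"metadata": {},
"source": [
"The union, intersection, difference, and symmetric difference of the two sets are enough to complete the analysis the task asks for."
]
},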
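{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def rank(s2019, s2018):\n",
"    \"\"\"Take two sets and print the brands on both charts, all charted brands, the brands new in 2019, and the brands that entered or dropped off.\"\"\"\n",
"    print(f'Brands on the chart in both years: {s2019 & s2018}')  # intersection, same as s2019.intersection(s2018)\n",
"    print(f'All brands on the chart in either year: {s2019 | s2018}')  # union, same as s2019.union(s2018)\n",
"    print(f'Brands new to the chart in 2019: {s2019 - s2018}')  # difference, same as s2019.difference(s2018)\n",
"    print(f'Brands that entered or dropped off the chart: {s2019 ^ s2018}')  # symmetric difference, same as s2019.symmetric_difference(s2018)"
]
},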
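{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same four operations can be tried without the Excel file. The sketch below uses two small in-memory sets; the brand names are hypothetical placeholders made up for illustration and are not taken from the data file.\n",
"\n",
"```python\n",
"# Hypothetical brand names, made up for illustration only.\n",
"s2019 = {'BrandA', 'BrandB', 'BrandC', 'BrandE'}\n",
"s2018 = {'BrandA', 'BrandB', 'BrandD'}\n",
"\n",
"both_years = s2019 & s2018       # on the chart in both years\n",
"all_brands = s2019 | s2018       # on the chart in either year\n",
"new_in_2019 = s2019 - s2018      # entered the chart in 2019\n",
"dropped_in_2019 = s2018 - s2019  # left the chart in 2019\n",
"changed = s2019 ^ s2018          # entered or left the chart\n",
"\n",
"print(sorted(both_years))       # ['BrandA', 'BrandB']\n",
"print(sorted(all_brands))       # ['BrandA', 'BrandB', 'BrandC', 'BrandD', 'BrandE']\n",
"print(sorted(new_in_2019))      # ['BrandC', 'BrandE']\n",
"print(sorted(dropped_in_2019))  # ['BrandD']\n",
"print(changed == (new_in_2019 | dropped_in_2019))  # True\n",
"```\n",
"\n",
"The last line shows why the symmetric difference answers the question about brands that entered or dropped off: it is the union of the two one-sided differences."
]
},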
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the functions above to complete the task."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"if __name__ == '__main__':\n",
"    excelName = '7.4 手机销售分析.xlsx'  # keep the file name in one place so it is easy to change\n",
"    saleSet2019, saleSet2018 = data(excelName)  # read the data and convert it to two sets\n",
"    rank(saleSet2019, saleSet2018)  # run the set operations and print the results"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}