Contents
1. Introduction
2. Intersection
3. Union
4. Difference
5. Symmetric difference

1. Introduction
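Python's set data type: a collection of distinct elements. A set is an unordered group of hashable values (immutable types), that is, values of the kind that can also serve as dictionary keys.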
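As a small illustration of the hashability requirement described above, the following sketch uses only built-in types. The variable names are made up for this example and do not come from the original article.

```python
# Elements of a set must be hashable, just like dictionary keys.
languages = {"Python", "Go", "Go", "C++"}  # the duplicate "Go" is stored once

# A tuple of strings is immutable and hashable, so it can be a set element.
pairs = {("1", "Python"), ("2", "Go")}

# A list is mutable and unhashable, so using one as an element raises TypeError.
try:
    bad = {["1", "Python"]}
except TypeError as exc:
    error_message = str(exc)

# A set itself is mutable and unhashable; frozenset is the hashable variant
# and can therefore be used as a dictionary key.
key = frozenset({"Go", "C++"})
lookup = {key: "shared languages"}
```

Representing table rows as tuples, as in `pairs`, is what makes it possible to compare DataFrame rows with ordinary set operators.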
DataFrame in pandas: a DataFrame is a tabular data structure that can be thought of as a two-dimensional array with labels.
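To make the "two-dimensional array with labels" description concrete, here is a minimal sketch, assuming pandas is installed. The two-row table is a shortened version of the data used later in the article.

```python
import pandas as pd

# Two rows and two columns; the column labels are given explicitly,
# and the row labels default to 0, 1, ...
df = pd.DataFrame([["1", "Python"], ["2", "Go"]], columns=["id", "name"])

shape = df.shape                  # (number of rows, number of columns)
column_labels = list(df.columns)  # labels along the column axis
row_labels = list(df.index)       # labels along the row axis
first_name = df.loc[0, "name"]    # label-based lookup: row 0, column "name"
```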
The common set operations are intersection, union, difference, and symmetric difference.
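As a quick reference, the four operations can be sketched with Python's built-in set operators. The two sets are the ones used throughout the article; the result variable names are chosen only for this example.

```python
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}

intersection = set1 & set2          # elements in both sets
union = set1 | set2                 # elements in either set
difference = set1 - set2            # elements in set1 but not in set2
symmetric_difference = set1 ^ set2  # elements in exactly one of the sets
```

Each operator also has a method form (`intersection`, `union`, `difference`, `symmetric_difference`) that accepts any iterable as its argument.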
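2. Intersection
The pandas merge function defaults to an inner join, so it can be used to take the intersection of two DataFrames. For a set, the & operator takes the intersection directly.
import pandas as pd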
print("CSDN叶庭云:https://yetingyun.blog.csdn.net/")
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set1 & set2  # elements present in both sets
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.merge(df1, df2, on=['id','name'])
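One way to check that the inner merge really behaves like a set intersection is to compare it with the `&` operator applied to row tuples. The sketch below assumes pandas is installed and that neither frame contains repeated rows; names such as `common` and `common_rows` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# how="inner" is already the default; it is spelled out here to show intent.
common = pd.merge(df1, df2, on=["id", "name"], how="inner")

# Turn each row into a plain tuple so the rows can be compared as sets.
rows1 = set(df1.itertuples(index=False, name=None))
rows2 = set(df2.itertuples(index=False, name=None))
common_rows = rows1 & rows2
```

If either frame held repeated rows, the merge would return one output row per matching pair, so calling `drop_duplicates()` on the inputs first is probably the safer choice in that case.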
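3. Union
For a set, the | operator takes the union. For DataFrames, use an outer merge, or concatenate the two frames and drop the duplicates. The operations are shown below: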
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set1 | set2
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.merge(df1, df2,
on=['id','name'],
how='outer')
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df3 = pd.concat([df1, df2])
df3.drop_duplicates(subset=['id'], keep="first")
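Deduplicating on `id` alone treats two rows with the same `id` but different `name` values as duplicates. When a row should count as a duplicate only if every column matches, a variant along the following lines may be closer to a true set union. This is a sketch that assumes pandas is installed; `union_df` and `merged_union` are names chosen for the example.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# Stack the frames, then drop rows that repeat across ALL columns
# (drop_duplicates compares every column when no subset is given).
union_df = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)

# The outer merge on all columns should contain the same rows.
merged_union = pd.merge(df1, df2, on=["id", "name"], how="outer")
```

The two results can differ in row order, so comparing them as sets of row tuples is more reliable than comparing them positionally.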
4. Difference
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set1 - set2
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set2 - set1
# df1-df2
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
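# Append df2 twice so that every row of df2 appears at least twice, then drop
# every row that has a duplicate; only the rows unique to df1 remain.
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)
df1 = pd.concat([df1, df2, df2])
set_diff_df = df1.drop_duplicates(subset=df1.columns, keep=False)
set_diff_df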
# df2-df1
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
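# Append df1 twice so that every row of df1 appears at least twice, then drop
# every row that has a duplicate; only the rows unique to df2 remain.
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)
df2 = pd.concat([df2, df1, df1])
set_diff_df = df2.drop_duplicates(subset=df2.columns, keep=False)
set_diff_df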
# df1-df2
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
# df2-df1
df1 = pd.DataFrame([
['1', 'Python'],
['2', 'Go'],
['3', 'C++'],
['4', 'Java'],
], columns=['id','name'])
df2 = pd.DataFrame([
['2','Go'],
['3','C++'],
['5','JavaScript'],
['6','C'],
], columns=['id','name'])
pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
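The concat-and-drop approach also discards any row that is repeated within the first frame itself. An alternative that avoids this is a left merge with `indicator=True`, keeping only the rows marked `left_only`. The sketch below assumes pandas is installed and that both frames share the same columns; `anti_join` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def anti_join(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `left` that have no identical row in `right`.

    `anti_join` is a hypothetical helper name used only for this sketch.
    """
    keys = list(left.columns)
    # Deduplicate `right` so a matching row is never multiplied by the merge.
    merged = left.merge(right.drop_duplicates(), on=keys, how="left", indicator=True)
    # The indicator column "_merge" is "left_only" for rows missing from `right`.
    only_left = merged.loc[merged["_merge"] == "left_only", keys]
    return only_left.reset_index(drop=True)


df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

df1_minus_df2 = anti_join(df1, df2)
df2_minus_df1 = anti_join(df2, df1)
```

Swapping the arguments gives the difference in the other direction, which mirrors `set1 - set2` versus `set2 - set1`.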
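5. Symmetric difference
set1 = {"Python", "Go", "C++", "Java"}
set2 = {"Go", "C++", "JavaScript", "C"}
set1 ^ set2  # symmetric difference
# Deduplicate without keeping any copy of a repeated row: this yields the symmetric difference
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)
df3 = pd.concat([df1, df2])
df3.drop_duplicates(subset=['id'], keep=False)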
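The same result can also be reached with an outer merge and `indicator=True`, keeping every row that is not marked `both`. This sketch assumes pandas is installed and matches rows on all columns rather than on `id` alone; `sym_diff` and `expected` are names chosen for the example.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["1", "Python"], ["2", "Go"], ["3", "C++"], ["4", "Java"]],
    columns=["id", "name"],
)
df2 = pd.DataFrame(
    [["2", "Go"], ["3", "C++"], ["5", "JavaScript"], ["6", "C"]],
    columns=["id", "name"],
)

# The outer merge keeps every row; "_merge" records where each row came from.
merged = df1.merge(df2, on=["id", "name"], how="outer", indicator=True)

# Rows marked "left_only" or "right_only" appear in exactly one frame.
sym_diff = merged.loc[merged["_merge"] != "both", ["id", "name"]].reset_index(drop=True)

# Cross-check against the ^ operator applied to row tuples.
rows1 = set(df1.itertuples(index=False, name=None))
rows2 = set(df2.itertuples(index=False, name=None))
expected = rows1 ^ rows2
```

Keeping the `_merge` column instead of dropping it would additionally show which frame each remaining row belongs to.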
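Article title: How to compute the intersection, union, difference, and symmetric difference of pandas DataFrames
Article link: https://lsjlt.com/news/163390.html (please cite the source link when reposting)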