根据选定的列过滤重复的行并与Pandas中的另一个数据框进行比较

给定两个数据帧,如下所示:

import pandas as pd 

# Creating a DataFrame object   
df1 = pd.DataFrame([('Stuti', 28, 'Varanasi'), 
            ('Saumya', 32, 'Delhi'), 
            ('Aaditya', 25, 'Mumbai'), 
            ('Saumya', 32, 'Delhi')],  
                   columns = ['Name', 'Score', 'City']) 

df2 = pd.DataFrame([('Saumya', 32, 'Delhi'), 
            ('Saumya', 32, 'Mumbai'), 
            ('Aaditya', 40, 'Mumbai'), 
            ('Seema', 32, 'Delhi')],  
                   columns = ['Name', 'Score', 'City'])

我怎么可以创建一个面具df2来过滤重复的基础上的行df1和列NameCity,如果同样存在PAIRE中df1,然后返回check其列Duplicated,否则,返回None

预期的结果将是:

    Name  Score      City       Check
0   Saumya     32     Delhi  Duplicated
1   Saumya     32    Mumbai        None
2  Aaditya     40  Dehradun  Duplicated
3    Seema     32     Delhi        None

更新代码:

df = pd.concat([df1, df2])

df[df.duplicated(['Name', 'City'])] 

出去:

      Name  Score    City
3   Saumya     32   Delhi
0   Saumya     32   Delhi
2  Aaditya     40  Mumbai

回答

In [65]: df2.merge(df1[['Name', 'City']].drop_duplicates(), how='left', indicator='Check').assign(Check=lambda x: np.where(x['Check'] == 'both', 'Duplicated', None))
Out[65]:
      Name  Score    City       Check
0   Saumya     32   Delhi  Duplicated
1   Saumya     32  Mumbai        None
2  Aaditya     40  Mumbai  Duplicated
3    Seema     32   Delhi        None


以上是根据选定的列过滤重复的行并与Pandas中的另一个数据框进行比较的全部内容。
THE END
分享
二维码
< <上一篇
下一篇>>