Groupby and drop NaN rows while keeping one row per group in Pandas

Given a test dataset as follows:

   id city   name
0   1   bj    NaN
1   2   bj   jack
2   3   bj    NaN
3   4   bj    jim
4   5   sh    NaN
5   6   sh    NaN
6   7   sh  steve
7   8   sh  fiona
8   9   sh    NaN

How can I group by city and drop the rows where name is NaN, while keeping only one NaN row for each group? Many thanks.

The expected result looks like this:

   id city   name
0   1   bj    NaN
1   2   bj   jack
2   4   bj    jim
3   5   sh    NaN
4   7   sh  steve
5   8   sh  fiona

Here is a new dataset, read from an Excel file with df = pd.read_clipboard(na_filter = False). Please note that N/A should not be treated as NaN:

      newcode build_name  floor  rent_id      rent_name
0  1210010403         C?     25  1765228   ????????????
1  1210010403         C?     25  1765229            N/A
2  1210010403         C?     25  1765229            N/A
3  1210010403         C?     25  1765229            N/A
4  1210010403         C?     25  1765230  ???????????? 
5  1210010403         C?     25  1765229            N/A
6  1210010403         C?     25  1765231     ??????????
7  1210010403         C?     25  1765232            N/A
8  1210010403         C?     25  1765231   ???NA???????

Code: df[df['rent_name'].ne('N/A') | ~df.duplicated(subset=['newcode', 'build_name', 'floor'])] gives the same result as df[~(df['rent_name'].eq('N/A') & df.duplicated(subset = ['newcode', 'build_name', 'floor'], keep = 'first'))]

Out:

      newcode build_name  floor  rent_id      rent_name
0  1210010403         C?     25  1765228   ????????????
4  1210010403         C?     25  1765230  ???????????? 
6  1210010403         C?     25  1765231     ??????????
8  1210010403         C?     25  1765231   ???NA???????

As you can see, one N/A row is missing from the result, and I don't know why.

Desired output:

      newcode build_name  floor  rent_id      rent_name
0  1210010403         C?     25  1765228   ????????????
1  1210010403         C?     25  1765229            N/A
4  1210010403         C?     25  1765230  ???????????? 
6  1210010403         C?     25  1765231     ??????????
8  1210010403         C?     25  1765231   ???NA???????
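
A likely reason for the missing row: with subset=['newcode', 'build_name', 'floor'], row 0 is the only first occurrence, so every N/A row counts as a duplicate and is dropped. The sketch below reproduces this and shows one possible fix, adding rent_name to the subset. The frame df2 and its rent_name placeholder strings (name_a and so on) are hypothetical stand-ins, since the original names are garbled in the listing above.

```python
import pandas as pd

# Hypothetical stand-in for the second dataset; the non-"N/A" names are placeholders.
df2 = pd.DataFrame({
    "newcode": [1210010403] * 9,
    "build_name": ["C"] * 9,
    "floor": [25] * 9,
    "rent_id": [1765228, 1765229, 1765229, 1765229, 1765230,
                1765229, 1765231, 1765232, 1765231],
    "rent_name": ["name_a", "N/A", "N/A", "N/A", "name_b",
                  "N/A", "name_c", "N/A", "name_d"],
})

keys = ["newcode", "build_name", "floor"]
is_na = df2["rent_name"].eq("N/A")

# Original attempt: row 0 is the first occurrence of the key combination,
# so every "N/A" row is flagged as a duplicate and removed.
attempt = df2[~(is_na & df2.duplicated(subset=keys, keep="first"))]

# Possible fix: include rent_name in the subset, so the first "N/A" row
# within each key combination is treated as a first occurrence and kept.
fixed = df2[~(is_na & df2.duplicated(subset=keys + ["rent_name"], keep="first"))]

print(attempt.index.tolist())
print(fixed.index.tolist())
```

With these stand-in values, fixed keeps the rows at index 0, 1, 4, 6 and 8, which matches the desired output.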

Answer

Boolean-select the NaNs and drop the ones that are duplicated on city and name, keeping the first of each:

df[~(df.name.isna() & df.duplicated(subset = ['city', 'name'], keep = 'first'))]

    id city   name
0   1   bj    NaN
1   2   bj   jack
3   4   bj    jim
4   5   sh    NaN
6   7   sh  steve
7   8   sh  fiona
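
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the test dataset by hand and applies the same mask. It relies on duplicated() treating NaN values as equal to each other; the variable names is_repeat and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the test dataset from the question.
df = pd.DataFrame({
    "id": range(1, 10),
    "city": ["bj"] * 4 + ["sh"] * 5,
    "name": [np.nan, "jack", np.nan, "jim",
             np.nan, np.nan, "steve", "fiona", np.nan],
})

# Flag rows whose (city, name) pair has already appeared earlier.
# NaN names compare as equal here, so each later NaN in a city is flagged.
is_repeat = df.duplicated(subset=["city", "name"], keep="first")

# Drop a row only if its name is NaN AND it repeats an earlier NaN in that city.
result = df[~(df["name"].isna() & is_repeat)]
print(result)
```

The original index labels are kept. Appending .reset_index(drop=True) would renumber the rows from 0 to 5, as in the expected result in the question.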

