Find rows whose group values are all NaN

html5 • September 16, 2022, 1:29 PM • Q&A

I have this DataFrame:

```python
from io import StringIO

import pandas as pd

u_cols = ['word', 'count']
audit_trail = StringIO('''
test 1
testing 24
tested 
again 5
begin 
again 6
begin
again
''')

df = pd.read_csv(audit_trail, sep=" ", names=u_cols)
```
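To reproduce the frame without relying on how `read_csv` treats short lines, it can also be built by hand. This is a minimal sketch assuming every line that has no number should become NaN. The `tested` and `begin` lines are written here without the trailing spaces from the question; a line holding only a word ends up as NaN either way. The sketch parses the text and checks that the result matches the hand-built frame.

```python
from io import StringIO

import numpy as np
import pandas as pd

u_cols = ["word", "count"]
audit_trail = StringIO("""
test 1
testing 24
tested
again 5
begin
again 6
begin
again
""")

# Parsed the same way as in the question; a line with only a word gets NaN in "count"
parsed = pd.read_csv(audit_trail, sep=" ", names=u_cols)

# The same data written out by hand
built = pd.DataFrame({
    "word": ["test", "testing", "tested", "again", "begin", "again", "begin", "again"],
    "count": [1.0, 24.0, np.nan, 5.0, np.nan, 6.0, np.nan, np.nan],
})

print(parsed.equals(built))
```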

With `groupby` I can see how many values in each group are null:

```python
df.groupby('word')['count'].apply(list)
```

```
word
again      [5.0, 6.0, nan]
begin           [nan, nan]
test                 [1.0]
tested               [nan]
testing             [24.0]
Name: count, dtype: object
```
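The list view shows the NaNs but does not count them. A minimal sketch that counts the missing values per word, rebuilding the same `df` so it runs on its own: `isna()` turns each value into a boolean, and summing those booleans within each group counts the `True`s. Here `begin` has 2 NaNs and `again` and `tested` have 1 each.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "word": ["test", "testing", "tested", "again", "begin", "again", "begin", "again"],
    "count": [1.0, 24.0, np.nan, 5.0, np.nan, 6.0, np.nan, np.nan],
})

# True where "count" is missing; grouping this mask by word and summing counts the NaNs
nan_per_word = df["count"].isna().groupby(df["word"]).sum()
print(nan_per_word)
```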

If a group has more than one row and all of its values are NaN, those entries should be excluded.

Expected:

```
again      [5.0, 6.0, nan]
test                 [1.0]
tested               [nan]
testing             [24.0]
```
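The rule can be pinned down as a predicate on one group's list of values. `should_drop` is a hypothetical name used only for this illustration, applied here to the per-word lists shown earlier. It makes explicit why `tested` stays: its only value is NaN, but its group has just one row.

```python
import math


def should_drop(values):
    """Hypothetical helper: True when a group has more than one value and all of them are NaN."""
    return len(values) > 1 and all(math.isnan(v) for v in values)


nan = float("nan")

# The per-word lists from the groupby output in the question
groups = {
    "again": [5.0, 6.0, nan],
    "begin": [nan, nan],
    "test": [1.0],
    "tested": [nan],
    "testing": [24.0],
}

for word, values in groups.items():
    print(word, "drop" if should_drop(values) else "keep")
```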

To get this result, the word `begin` has to be removed from the DataFrame:

```python
df[df.word != "begin"]
```

```
      word  count
0     test    1.0
1  testing   24.0
2   tested    NaN
3    again    5.0
5    again    6.0
7    again    NaN
```

How can I find out which words should be removed?

Answer

1. `groupby` with `filter`

```python
df.groupby('word').filter(lambda x: not (x['count'].isna().all() and len(x) > 1))
```
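The `filter` call can be wrapped in a small reusable function. The name `drop_all_nan_groups` and its `key`/`col` parameters are hypothetical, introduced only for this sketch. `filter` passes each group to the function as a sub-DataFrame and keeps the rows of every group for which it returns `True`, in their original order.

```python
import numpy as np
import pandas as pd


def drop_all_nan_groups(frame, key, col):
    """Hypothetical helper: drop groups of `key` that have more than one row and an all-NaN `col`."""
    def keep(group):
        # `group` is the sub-DataFrame for one value of `key`
        return not (group[col].isna().all() and len(group) > 1)

    return frame.groupby(key).filter(keep)


df = pd.DataFrame({
    "word": ["test", "testing", "tested", "again", "begin", "again", "begin", "again"],
    "count": [1.0, 24.0, np.nan, 5.0, np.nan, 6.0, np.nan, np.nan],
})

result = drop_all_nan_groups(df, "word", "count")
print(result)
```

Because the lambda runs once per group in Python, this is the most readable of the three answers but may be slower on frames with many groups.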

2. `groupby` with `transform`

```python
g = df.groupby('word')['count']
m = g.transform('count').eq(0) & g.transform('size').gt(1)
df[~m]
```
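Unlike `agg`, `transform` returns one value per original row, indexed like `df`, so the mask can be inspected row by row. This sketch rebuilds `df` and shows the intermediate values next to the data; the display columns `non_nan`, `group_size` and `drop` are added only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "word": ["test", "testing", "tested", "again", "begin", "again", "begin", "again"],
    "count": [1.0, 24.0, np.nan, 5.0, np.nan, 6.0, np.nan, np.nan],
})

g = df.groupby("word")["count"]

# Each transform broadcasts a per-group number back onto every row of that group
non_nan = g.transform("count")    # non-NaN values in the row's group
group_size = g.transform("size")  # total rows in the row's group
m = non_nan.eq(0) & group_size.gt(1)

print(df.assign(non_nan=non_nan, group_size=group_size, drop=m))
print(df[~m])
```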

3. `groupby` with `agg`

```python
s = df.groupby('word')['count'].agg(['count', 'size'])
m = s['count'].eq(0) & s['size'].gt(1)
df[~df['word'].isin(m[m].index)]
```

All three approaches return:

```
      word  count
0     test    1.0
1  testing   24.0
2   tested    NaN
3    again    5.0
5    again    6.0
7    again    NaN
```
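Approach 3 also answers the literal question of which words to remove: `m[m].index` holds them. This sketch prints the intermediate `s` (per word, `count` ignores NaN while `size` counts every row), extracts the words to drop, and then checks that the three approaches keep exactly the same rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "word": ["test", "testing", "tested", "again", "begin", "again", "begin", "again"],
    "count": [1.0, 24.0, np.nan, 5.0, np.nan, 6.0, np.nan, np.nan],
})

# Approach 3 intermediate: one row per word with its non-NaN count and total size
s = df.groupby("word")["count"].agg(["count", "size"])
m = s["count"].eq(0) & s["size"].gt(1)
words_to_drop = m[m].index.tolist()
print(s)
print(words_to_drop)

# Run all three approaches and confirm they keep the same rows
by_filter = df.groupby("word").filter(
    lambda x: not (x["count"].isna().all() and len(x) > 1)
)

g = df.groupby("word")["count"]
by_transform = df[~(g.transform("count").eq(0) & g.transform("size").gt(1))]

by_agg = df[~df["word"].isin(words_to_drop)]

print(by_filter.equals(by_transform) and by_transform.equals(by_agg))
```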
