How to create a column in pandas with the row count between two values in another column
I have the following DataFrame:
import pandas as pd
#Create DF
d = {
'Date': ['1/01/2021','2/01/2021','3/01/2021','4/01/2021','5/01/2021','6/01/2021','7/01/2021','8/01/2021','9/01/2021','10/01/2021','11/01/2021','12/01/2021','13/01/2021',
'14/01/2021','15/01/2021','16/01/2021'],
'Name': ['Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','John','John','John','John','John','John','John'],
'Status':['Avaiable','Unavailable','Unavailable','Unavailable','Unavailable','Unavailable','Avaiable','Unavailable','Unavailable','Avaiable','Unavailable','Unavailable'
,'Unavailable','Avaiable','Unavailable','Unavailable'],  # 'Avaiable' (sic) is the spelling used throughout this sample
'Count' : [1,2,3,4,5,6,1,2,3,1,2,3,4,1,2,3]}
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df.Date,format='%d/%m/%Y')
df
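How can I create a row count that restarts each time the word 'Available' appears in the Status column?
Thanks very much!
Edit - extension to the question:
What if there are two starting values, as in the example below, where the count starts at either "First Entry" or "Available"?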
import pandas as pd
#Create DF
d = {
'Date': ['1/01/2021','2/01/2021','3/01/2021','4/01/2021','5/01/2021','6/01/2021','7/01/2021','8/01/2021','9/01/2021','10/01/2021','11/01/2021','12/01/2021','13/01/2021',
'14/01/2021','15/01/2021','16/01/2021'],
'Name': ['Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','Joe','John','John','John','John','John','John','John'],
'Status':['First Entry','Unavailable','Available','Unavailable','Unavailable','Unavailable','Available','Unavailable','Unavailable','First Entry','Unavailable','Unavailable'
,'Unavailable','Available','Unavailable','Unavailable'],
'Count' : [1,2,1,2,3,4,1,2,3,1,2,3,4,1,2,3]}
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df.Date,format='%d/%m/%Y')
df
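Example 3 - when the names are mixed
Here is an example where the names are interleaved throughout the data. The Count column is the expected output.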
import pandas as pd
#Create DF
d = {
'Date': ['1/01/2021','2/01/2021','3/01/2021','4/01/2021','5/01/2021','6/01/2021','7/01/2021','8/01/2021','9/01/2021','10/01/2021','11/01/2021','12/01/2021','13/01/2021',
'14/01/2021','15/01/2021','16/01/2021'],
'Name': ['Joe','John','Joe','Joe','Joe','John','John','Joe','Joe','John','John','John','John','John','John','John'],
'Status':['First Entry','First Entry','Available','Unavailable','Unavailable','Unavailable','Available','Unavailable','Unavailable','Unavailable','Unavailable','Unavailable'
,'Unavailable','Available','Unavailable','Unavailable'],
'Count' : [1,1,1,2,3,2,1,4,5,2,3,4,5,1,2,3]}
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df.Date,format='%d/%m/%Y')
df
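Answer
If you need a cumulative count per Name and per block of Status, use GroupBy.cumcount, and build the block labels by comparing Status with 'Avaiable' (the spelling used in the first sample's data) and taking the cumulative sum: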
df['Count1'] = df.groupby(['Name', df['Status'].eq('Avaiable').cumsum()]).cumcount().add(1)
print (df)
         Date  Name       Status  Count  Count1
0  2021-01-01   Joe     Avaiable      1       1
1  2021-01-02   Joe  Unavailable      2       2
2  2021-01-03   Joe  Unavailable      3       3
3  2021-01-04   Joe  Unavailable      4       4
4  2021-01-05   Joe  Unavailable      5       5
5  2021-01-06   Joe  Unavailable      6       6
6  2021-01-07   Joe     Avaiable      1       1
7  2021-01-08   Joe  Unavailable      2       2
8  2021-01-09   Joe  Unavailable      3       3
9  2021-01-10  John     Avaiable      1       1
10 2021-01-11  John  Unavailable      2       2
11 2021-01-12  John  Unavailable      3       3
12 2021-01-13  John  Unavailable      4       4
13 2021-01-14  John     Avaiable      1       1
14 2021-01-15  John  Unavailable      2       2
15 2021-01-16  John  Unavailable      3       3
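To see why this works, it can help to look at the intermediate grouping key. The sketch below uses a small five-row frame made up for illustration (not the question's data), and the column name block is a hypothetical helper: the boolean comparison marks each start row, cumsum turns those marks into a block number, and cumcount then numbers the rows inside each (Name, block) group.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, same spelling as the first sample)
df = pd.DataFrame({
    'Name': ['Joe'] * 5,
    'Status': ['Avaiable', 'Unavailable', 'Unavailable', 'Avaiable', 'Unavailable'],
})

# True on every row where a new block starts
is_start = df['Status'].eq('Avaiable')

# Cumulative sum of the boolean mask gives a block number per row
df['block'] = is_start.cumsum()

# Number the rows within each (Name, block) group, starting from 1
df['Count1'] = df.groupby(['Name', 'block']).cumcount().add(1)

print(df)
```

The answer's one-liner does the same thing without storing the block number as a column.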
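For the second sample, where the count starts at either value, use isin with both starting values (note that this sample spells 'Available' correctly):
df['Count1'] = df.groupby(['Name', df['Status'].isin(['Available', 'First Entry']).cumsum()]).cumcount().add(1)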
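As a quick check on the second sample, the sketch below rebuilds that frame (the Date column is left out, since it plays no part in the count) and compares the computed Count1 with the expected Count column from the question. The name start_values is introduced here only for illustration.

```python
import pandas as pd

# Second sample from the question, without the Date column
df = pd.DataFrame({
    'Name': ['Joe'] * 9 + ['John'] * 7,
    'Status': ['First Entry', 'Unavailable', 'Available', 'Unavailable',
               'Unavailable', 'Unavailable', 'Available', 'Unavailable',
               'Unavailable', 'First Entry', 'Unavailable', 'Unavailable',
               'Unavailable', 'Available', 'Unavailable', 'Unavailable'],
    'Count': [1, 2, 1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3],
})

# Values that restart the count (hypothetical variable name)
start_values = ['Available', 'First Entry']

# Block number per row, then a 1-based counter inside each (Name, block) group
block = df['Status'].isin(start_values).cumsum()
df['Count1'] = df.groupby(['Name', block]).cumcount().add(1)

# True when the computed column matches the expected one
print(df['Count1'].equals(df['Count']))
```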
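For the third example, first sort by the Name column, apply the solution, and finally sort by the index to restore the original order. The sort should be stable so that rows keep their original relative order within each name; the default quicksort does not guarantee that:
df = df.sort_values(['Name'], kind='mergesort')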
df['Count1'] = df.groupby(['Name', df['Status'].isin(['Available', 'First Entry']).cumsum()]).cumcount().add(1)
df = df.sort_index()
print (df)
         Date  Name       Status  Count  Count1
0  2021-01-01   Joe  First Entry      1       1
1  2021-01-02  John  First Entry      1       1
2  2021-01-03   Joe    Available      1       1
3  2021-01-04   Joe  Unavailable      2       2
4  2021-01-05   Joe  Unavailable      3       3
5  2021-01-06  John  Unavailable      2       2
6  2021-01-07  John    Available      1       1
7  2021-01-08   Joe  Unavailable      4       4
8  2021-01-09   Joe  Unavailable      5       5
9  2021-01-10  John  Unavailable      2       2
10 2021-01-11  John  Unavailable      3       3
11 2021-01-12  John  Unavailable      4       4
12 2021-01-13  John  Unavailable      5       5
13 2021-01-14  John    Available      1       1
14 2021-01-15  John  Unavailable      2       2
15 2021-01-16  John  Unavailable      3       3
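A possible alternative for interleaved names, offered as a sketch and not as part of the original answer, is to compute the block number separately within each Name. That way no sorting is needed at all. The names start_values, is_start and block are hypothetical helpers, and the Date column is omitted because it does not affect the count.

```python
import pandas as pd

# Third sample from the question, without the Date column
df = pd.DataFrame({
    'Name': ['Joe', 'John', 'Joe', 'Joe', 'Joe', 'John', 'John', 'Joe',
             'Joe', 'John', 'John', 'John', 'John', 'John', 'John', 'John'],
    'Status': ['First Entry', 'First Entry', 'Available', 'Unavailable',
               'Unavailable', 'Unavailable', 'Available', 'Unavailable',
               'Unavailable', 'Unavailable', 'Unavailable', 'Unavailable',
               'Unavailable', 'Available', 'Unavailable', 'Unavailable'],
    'Count': [1, 1, 1, 2, 3, 2, 1, 4, 5, 2, 3, 4, 5, 1, 2, 3],
})

start_values = ['Available', 'First Entry']

# 1 on rows that start a new block, 0 elsewhere
is_start = df['Status'].isin(start_values).astype(int)

# Running block number computed independently for each name
block = is_start.groupby(df['Name']).cumsum()

# 1-based counter inside each (Name, block) group; row order is untouched
df['Count1'] = df.groupby(['Name', block]).cumcount().add(1)

print(df['Count1'].equals(df['Count']))
```

Because the cumulative sum is taken per name, a start row for John no longer shifts the block number of the surrounding Joe rows, which is what made the sort necessary in the approach above.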