Conditional cumulative sums across multiple pandas columns

I have a dataframe containing multiple "stacks" and their corresponding "lengths".

import pandas as pd
df = pd.DataFrame({'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'], 'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'], 'stack-1-length': [3, 1, 1, 2, 3], 'stack-2-length': [3, 1, 3, 1, 2]})

  stack-1-material stack-2-material  stack-1-length  stack-2-length
0             rock             rock               3               3
1            paper            paper               1               1
2            paper             rock               1               3
3         scissors            paper               2               1
4             rock         scissors               3               2

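I am trying to create a separate column for each material that tracks the cumulative sum of its lengths, regardless of which "stack" the material is in. I tried using groupby, but could only get the cumulative sum into a single column. This is what I am looking for: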

  stack-1-material stack-2-material  stack-1-length  stack-2-length  rock_cumsum  paper_cumsum  scissors_cumsum
0             rock             rock               3               3            6             0                0
1            paper            paper               1               1            6             2                0
2            paper             rock               1               3            9             3                0
3         scissors            paper               2               1            9             4                2
4             rock         scissors               3               2           12             4                4 

Answer

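You can use the material columns as a mask on the length columns, then sum across each row (axis=1) and take the cumsum, repeating this for each material.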
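To make the masking step concrete, here is a minimal sketch that walks through it for a single material, 'rock'. The intermediate names (mask, masked, per_row) are illustrative only and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})

material = df.filter(like='material').to_numpy()
length = df.filter(like='length')

mask = material == 'rock'        # True where a cell holds 'rock'
masked = length.where(mask)      # lengths of other materials become NaN
per_row = masked.sum(axis=1)     # NaN is skipped, so rows without rock sum to 0
rock_cumsum = per_row.cumsum()   # running total down the rows

print(rock_cumsum.tolist())      # [6.0, 6.0, 9.0, 9.0, 12.0]
```

The same three steps are then repeated once per material.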

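import numpy as np

# Separate the material columns (as a NumPy array) from the length columns
material = df.filter(like='material').to_numpy()
length = df.filter(like='length')

# Get all unique materials
l_mat = np.unique(material)

# Iterate over the unique materials: keep only the lengths whose material
# matches, sum across each row, then accumulate down the rows
for mat in l_mat:
    df[f'{mat}_cumsum'] = length.where(material == mat).sum(axis=1).cumsum()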

print(df)
  stack-1-material stack-2-material  stack-1-length  stack-2-length  
0             rock             rock               3               3   
1            paper            paper               1               1   
2            paper             rock               1               3   
3         scissors            paper               2               1   
4             rock         scissors               3               2   

   rock_cumsum  paper_cumsum  scissors_cumsum  
0          6.0           0.0              0.0  
1          6.0           2.0              0.0  
2          9.0           3.0              0.0  
3          9.0           4.0              2.0  
4         12.0           4.0              4.0  
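The loop above produces float columns, because the where mask introduces NaN, and np.unique returns the materials in sorted order. If integer columns in order of first appearance are preferred, as in the desired output in the question, one possible variation is sketched below. It assumes the material and length columns line up positionally (stack-1 with stack-1, and so on); the name materials is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})

material = df.filter(like='material').to_numpy()
length = df.filter(like='length').to_numpy()

# pd.unique keeps the order of first appearance (np.unique would sort)
materials = pd.unique(material.ravel())

for mat in materials:
    # Multiplying by the boolean mask zeroes out non-matching cells,
    # so no NaN appears and the dtype stays integer
    df[f'{mat}_cumsum'] = (length * (material == mat)).sum(axis=1).cumsum()

print(df.filter(like='cumsum'))
```

Multiplying by the mask instead of calling where trades a little readability for an integer result; appending .astype(int) to the original expression would be another way to get integer columns.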

