Conditional cumulative sums across multiple Pandas columns
I have a dataframe containing multiple "stacks" and their corresponding "lengths".
df = pd.DataFrame({'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'], 'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'], 'stack-1-length': [3, 1, 1, 2, 3], 'stack-2-length': [3, 1, 3, 1, 2]})
   stack-1-material  stack-2-material  stack-1-length  stack-2-length
0              rock              rock               3               3
1             paper             paper               1               1
2             paper              rock               1               3
3          scissors             paper               2               1
4              rock          scissors               3               2
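I'm trying to create a separate column for each material that tracks the cumulative sum of its lengths, regardless of which "stack" they appear in. I tried using groupby, but I could only get the cumulative sum into a single column. This is what I'm looking for: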
   stack-1-material  stack-2-material  stack-1-length  stack-2-length  rock_cumsum  paper_cumsum  scissors_cumsum
0              rock              rock               3               3            6             0                0
1             paper             paper               1               1            6             2                0
2             paper              rock               1               3            9             3                0
3          scissors             paper               2               1            9             4                2
4              rock          scissors               3               2           12             4                4
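Answer
You can use the material columns as a mask on the length columns: for each material, keep only the lengths whose material matches, sum along the columns (axis=1) to get a per-row total, then take the cumsum down the rows.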
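import pandas as pd

# separate the material columns and the length columns
material = df.filter(like='material').to_numpy()
length = df.filter(like='length')
# get the unique materials in order of first appearance
# (np.unique would return them sorted alphabetically instead)
l_mat = pd.unique(material.ravel())
# iterate over the unique materials
for mat in l_mat:
    # mask non-matching lengths with NaN, sum across stacks, accumulate down the rows
    df[f'{mat}_cumsum'] = length.where(material == mat).sum(axis=1).cumsum()
print(df)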
   stack-1-material  stack-2-material  stack-1-length  stack-2-length
0              rock              rock               3               3
1             paper             paper               1               1
2             paper              rock               1               3
3          scissors             paper               2               1
4              rock          scissors               3               2

   rock_cumsum  paper_cumsum  scissors_cumsum
0          6.0           0.0              0.0
1          6.0           2.0              0.0
2          9.0           3.0              0.0
3          9.0           4.0              2.0
4         12.0           4.0              4.0
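Note that the cumsum columns above come out as floats, because where() fills the non-matching cells with NaN before summing. If integer columns are preferred, as in the desired output in the question, one possible variation is to replace non-matching cells with 0 using np.where. The sketch below is only an illustration under two assumptions: the material and length columns appear in the same stack order, and the helper name add_material_cumsums is hypothetical rather than part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'stack-1-material': ['rock', 'paper', 'paper', 'scissors', 'rock'],
    'stack-2-material': ['rock', 'paper', 'rock', 'paper', 'scissors'],
    'stack-1-length': [3, 1, 1, 2, 3],
    'stack-2-length': [3, 1, 3, 1, 2],
})


def add_material_cumsums(frame):
    """Hypothetical helper: return a copy of frame with one
    integer '<material>_cumsum' column per material."""
    # Assumes material and length columns are listed in the same stack order
    material = frame.filter(like='material').to_numpy()
    length = frame.filter(like='length').to_numpy()
    out = frame.copy()
    # pd.unique keeps the order of first appearance (rock, paper, scissors here)
    for mat in pd.unique(material.ravel()):
        # Use 0 instead of NaN for non-matching cells so the dtype stays integer
        per_row = np.where(material == mat, length, 0).sum(axis=1)
        out[f'{mat}_cumsum'] = per_row.cumsum()
    return out


result = add_material_cumsums(df)
print(result.filter(like='cumsum'))
```

Working on a copy leaves the original df untouched, which makes the helper safe to call repeatedly; assigning into df directly, as in the answer, works just as well if in-place modification is acceptable.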