逐月变换逐月变换
初始表
salesman training_date 01/20 02/20 03/20 04/20 05/20 06/20 07/20 08/20 09/20 10/20 11/20 12/20
0 John 2020-11-01 100 20 200 250 0 28 80 30 150 100 300 250
1 Ruddy 2020-07-12 90 50 30 225 300 100 95 10 20 0 20 100
在 Python 中:
t1 = {'salesman': ['John', 'Ruddy'],
'training_date':['2020-11-30','2020-07-12'],
'01/20': [100, 90], '02/20':[20,50], '03/20':[200,30],'04/20':[250,225],'05/20':[0,300],'06/20':[28,100],
'07/20': [80, 95], '08/20':[30,10], '09/20':[150,20],'10/20':[100,0],'11/20':[300,20],'12/20':[250,100],
}
t1a = pd.DataFrame(data=t1)
t1a
数据帧预期:
salesman training_date training_month 1m_prior 2m_prior 3m_prior 4m_prior 5m_prior 6m_prior
0 John 2020-11-30 300 100 150 30 80 28 0
1 Ruddy 2020-07-12 95 100 300 225 30 50 90
在 Python 中:
t2 = {'salesman': ['John', 'Ruddy'],
'training_date':['2020-11-30','2020-07-12'],
'training_month': [300, 95], '1m_prior':[100,100], '2m_prior':[150,300],
'3m_prior':[30,225],'4m_prior':[80,30],'5m_prior':[28,50], '6m_prior': [0, 90]}
t2a = pd.DataFrame(data=t2)
t2a
说明:
约翰在 11 月 1 日接受了培训。在 11 月 1 日之前的 100 万美元,即 10 月,约翰产生了 100 美元。在 11 月 1 日和 9 月之前的 200 万美元,John 产生了 150 美元。
Ruddy 于 7 月 12 日接受了培训。7 月 12 日之前的 100 万,即 6 月,Ruddy 产生了 100 美元。在 5 月 12 日之前的 200 万,Ruddy 产生了 300 美元。
在理想情况下,我们开始计算 1 个整月,始终从每个月的 1 日开始。因此,如果 Ruddy 在 2020 年 7 月 12 日受雇,那么前一个月应该是 6 月 1 日 - 6 月 30 日。
到目前为止,我们在 Excel 中手动转换数据。
回答
熔化+枢轴:
t1a['training_date'] = pd.to_datetime(t1a['training_date'])
如果您确实需要在 6 个月前:
u = t1a.melt(['salesman','training_date'])
diff_ = (pd.to_datetime(u['variable'],format='%m/%y').dt.month
.sub(u['training_date'].dt.month))
u = u.assign(k=diff_).loc[lambda x: x['k'].between(-6,0)]
choices = (diff_.abs().astype(str).replace("0","").add("")+
np.select([diff_<0,diff_==0],['months_prior','training_month']))
out = (u.assign(Key=choices).pivot_table(index=["salesman","training_date"]
,columns="Key",values="value").set_index("training_month",append=True))
print(out)
Key 1months_prior 2months_prior
salesman training_date training_month
John 2020-11-30 300 100 150
Ruddy 2020-07-12 95 100 300
Key 3months_prior 4months_prior
salesman training_date training_month
John 2020-11-30 300 30 80
Ruddy 2020-07-12 95 225 30
Key 5months_prior 6months_prior
salesman training_date training_month
John 2020-11-30 300 28 0
Ruddy 2020-07-12 95 50 90
如果您想要包括之前和之后的所有月份:
u = t1a.melt(['salesman','training_date'])
diff_ = (pd.to_datetime(u['variable'],format='%m/%y').dt.month
.sub(u['training_date'].dt.month))
choices = (diff_.abs().astype(str).replace("0","").add("")+
np.select([diff_>0,diff_<0,diff_==0],
['months_later','months_prior','training_month']))
out = (u.assign(Key=choices).pivot_table(index=["salesman","training_date"]
,columns="Key",values="value").set_index("training_month",append=True))
print(out)
Key 10months_prior 1months_later
salesman training_date training_month
John 2020-11-30 300.0 100.0 250.0
Ruddy 2020-07-12 95.0 NaN 10.0
Key 1months_prior 2months_later
salesman training_date training_month
John 2020-11-30 300.0 100.0 NaN
Ruddy 2020-07-12 95.0 100.0 20.0
Key 2months_prior 3months_later
salesman training_date training_month
John 2020-11-30 300.0 150.0 NaN
Ruddy 2020-07-12 95.0 300.0 0.0
Key 3months_prior 4months_later
salesman training_date training_month
John 2020-11-30 300.0 30.0 NaN
Ruddy 2020-07-12 95.0 225.0 20.0
Key 4months_prior 5months_later
salesman training_date training_month
John 2020-11-30 300.0 80.0 NaN
Ruddy 2020-07-12 95.0 30.0 100.0
Key 5months_prior 6months_prior
salesman training_date training_month
John 2020-11-30 300.0 28.0 0.0
Ruddy 2020-07-12 95.0 50.0 90.0
Key 7months_prior 8months_prior
salesman training_date training_month
John 2020-11-30 300.0 250.0 200.0
Ruddy 2020-07-12 95.0 NaN NaN
Key 9months_prior
salesman training_date training_month
John 2020-11-30 300.0 20.0
Ruddy 2020-07-12 95.0 NaN