Filter DataFrame rows to return the first two dates for each fruit
My pandas DataFrame is:
FRUIT DATE PRICE CITY
Apple 11/5/2021 10 M
Apple 11/5/2021 11 N
Apple 11/5/2021 15 O
Apple 12/5/2021 14 A
Apple 12/5/2021 12 B
Apple 13/5/2021 8 C
Apple 13/5/2021 7 H
Apple 13/5/2021 6 K
orange 11/5/2021 13 L
orange 11/5/2021 12 J
orange 11/5/2021 33 H
Orange 11/5/2021 20 J
orange 12/5/2021 11 A
orange 12/5/2021 12 B
Orange 12/5/2021 29 C
orange 12/5/2021 20 M
Orange 13/5/2021 15 N
Banana 11/5/2021 3 A
Banana 11/5/2021 5 O
Banana 12/5/2021 7 P
Banana 12/5/2021 3 K
Banana 12/5/2021 4 N
Banana 12/5/2021 7 A
Banana 13/5/2021 6 J
Banana 13/5/2021 8 C
I need the rows that fall on the first two dates for each fruit name, for example:
FRUIT DATE PRICE CITY
Apple 11/5/2021 10 M
Apple 11/5/2021 11 N
Apple 11/5/2021 15 O
Apple 12/5/2021 14 A
Apple 12/5/2021 12 B
orange 11/5/2021 13 L
orange 11/5/2021 12 J
orange 11/5/2021 33 H
Orange 11/5/2021 20 J
orange 12/5/2021 11 A
orange 12/5/2021 12 B
Orange 12/5/2021 29 C
orange 12/5/2021 20 M
Banana 11/5/2021 3 A
Banana 11/5/2021 5 O
Banana 12/5/2021 7 P
Banana 12/5/2021 3 K
Banana 12/5/2021 4 N
Banana 12/5/2021 7 A
I have more than 100 fruit names. How can I write a condition that filters the data this way?
Answer
You can check whether the dense rank of the DATE column, grouped by FRUIT, is <= 2:
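import pandas as pd
# Parse DATE as day/month/year so the values sort chronologically
df['DATE'] = pd.to_datetime(df['DATE'], format='%d/%m/%Y')
df[df['DATE'].groupby(df['FRUIT']).rank(method='dense') <= 2]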
     FRUIT       DATE  PRICE CITY
0    Apple 2021-05-11     10    M
1    Apple 2021-05-11     11    N
2    Apple 2021-05-11     15    O
3    Apple 2021-05-12     14    A
4    Apple 2021-05-12     12    B
8   orange 2021-05-11     13    L
9   orange 2021-05-11     12    J
10  orange 2021-05-11     33    H
11  Orange 2021-05-11     20    J
12  orange 2021-05-12     11    A
13  orange 2021-05-12     12    B
14  Orange 2021-05-12     29    C
15  orange 2021-05-12     20    M
17  Banana 2021-05-11      3    A
18  Banana 2021-05-11      5    O
19  Banana 2021-05-12      7    P
20  Banana 2021-05-12      3    K
21  Banana 2021-05-12      4    N
22  Banana 2021-05-12      7    A
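
To see why the dense rank does the job, here is a minimal self-contained sketch of the same idea. The small sample frame and the names date_rank and result are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data.
df = pd.DataFrame({
    "FRUIT": ["Apple", "Apple", "Apple", "Apple", "Banana", "Banana", "Banana"],
    "DATE": ["11/5/2021", "11/5/2021", "12/5/2021", "13/5/2021",
             "11/5/2021", "12/5/2021", "13/5/2021"],
    "PRICE": [10, 11, 14, 8, 3, 7, 6],
    "CITY": ["M", "N", "A", "C", "A", "P", "J"],
})

# Parse day/month/year strings so the dates compare chronologically.
df["DATE"] = pd.to_datetime(df["DATE"], format="%d/%m/%Y")

# Dense rank within each fruit: rows sharing a date share a rank,
# and the next distinct date gets the next integer (1, 2, 3, ...).
date_rank = df.groupby("FRUIT")["DATE"].rank(method="dense")

# Keep only rows whose date is the first or second distinct date of its fruit.
result = df[date_rank <= 2]
print(result)
```

The choice of method="dense" matters here. With method="min", the two Apple rows on the first date would both get rank 1 and the next date would get rank 3, so the second date would be dropped by the <= 2 condition. Note also that grouping is case-sensitive, so "orange" and "Orange" are ranked as separate groups, as in the answer above.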