How to split rows by the count in a column in R?
For example, suppose you have the following data frame:
ID<-c("11", "12", "13", "14", "14")
Date<-c("2020-01-01", "2020-02-01", "2020-03-15", "2020-04-10", "2020-06-01")
Item<-c("Item1", "Item1", "Item2", "Item2", "Item2")
ItemPrice<-c(5, 5, 7, 7, 7)
Quantity<-c(1, 2, -2, 2, 3)
Cost<-c(5, 10, -14, 14, 21)
df<-data.frame(ID, Date, Item, ItemPrice, Quantity, Cost)
df
  ID       Date  Item ItemPrice Quantity Cost
1 11 2020-01-01 Item1         5        1    5
2 12 2020-02-01 Item1         5        2   10
3 13 2020-03-15 Item2         7       -2  -14
4 14 2020-04-10 Item2         7        2   14
5 14 2020-06-01 Item2         7        3   21
However, you want to split the rows by Quantity so that each row represents a single sale, as shown below:
   ID       Date  Item ItemPrice Quantity Cost
1  11 2020-01-01 Item1         5        1    5
2  12 2020-02-01 Item1         5        1    5
3  12 2020-02-01 Item1         5        1    5
4  13 2020-03-15 Item2         7       -1   -7
5  13 2020-03-15 Item2         7       -1   -7
6  14 2020-04-10 Item2         7        1    7
7  14 2020-04-10 Item2         7        1    7
8  14 2020-06-01 Item2         7        1    7
9  14 2020-06-01 Item2         7        1    7
10 14 2020-06-01 Item2         7        1    7
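How can this be achieved?
Answer
Create a count column ('cnt') from the absolute value (abs) of 'Quantity', change 'Quantity' to its sign, replace 'Cost' with 'Cost' divided by 'cnt', and then replicate each row 'cnt' times with uncount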
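library(dplyr)
library(tidyr)
df %>%
  # cnt = number of copies per row; Quantity becomes +1 or -1
  mutate(cnt = abs(Quantity), Quantity = sign(Quantity),
         Cost = Cost/cnt) %>%   # per-unit cost
  # repeat each row cnt times; uncount() drops the cnt column
  uncount(cnt) %>%
  as_tibble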
-output
# A tibble: 10 x 6
#   ID    Date       Item  ItemPrice Quantity  Cost
#   <chr> <chr>      <chr>     <dbl>    <dbl> <dbl>
# 1 11    2020-01-01 Item1         5        1     5
# 2 12    2020-02-01 Item1         5        1     5
# 3 12    2020-02-01 Item1         5        1     5
# 4 13    2020-03-15 Item2         7       -1    -7
# 5 13    2020-03-15 Item2         7       -1    -7
# 6 14    2020-04-10 Item2         7        1     7
# 7 14    2020-04-10 Item2         7        1     7
# 8 14    2020-06-01 Item2         7        1     7
# 9 14    2020-06-01 Item2         7        1     7
#10 14    2020-06-01 Item2         7        1     7
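Answer
Here is a base R way. I would have closed this as a duplicate of this question, but there are some small differences. The main instruction, the lapply loop, comes from this answer.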
df$Cost <- df$Cost/abs(df$Quantity)
df <- as.data.frame(lapply(df, rep, abs(df$Quantity)))
df$Quantity <- sign(df$Quantity)
df
#   ID       Date  Item ItemPrice Quantity Cost
#1  11 2020-01-01 Item1         5        1    5
#2  12 2020-02-01 Item1         5        1    5
#3  12 2020-02-01 Item1         5        1    5
#4  13 2020-03-15 Item2         7       -1   -7
#5  13 2020-03-15 Item2         7       -1   -7
#6  14 2020-04-10 Item2         7        1    7
#7  14 2020-04-10 Item2         7        1    7
#8  14 2020-06-01 Item2         7        1    7
#9  14 2020-06-01 Item2         7        1    7
#10 14 2020-06-01 Item2         7        1    7
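The replication step above relies on rep() accepting a vector of counts as its second argument, one count per element. Below is a minimal sketch of that behaviour, followed by an equivalent variant that replicates row indices instead of looping over columns. The names `x`, `times`, `toy`, `idx` and `out` are illustrative and not part of the answer.

```r
# rep() with a vector of counts repeats each element by its own count
x <- c("a", "b", "c")
times <- c(1, 2, 3)
rep(x, times)
# [1] "a" "b" "b" "c" "c" "c"

# The same idea applied to row indices of a small toy data frame
toy <- data.frame(
  ID = c("11", "12", "13"),
  Quantity = c(1, 2, -2),
  Cost = c(5, 10, -14)
)
idx <- rep(seq_len(nrow(toy)), abs(toy$Quantity))  # 1 2 2 3 3
out <- toy[idx, ]
out$Cost <- out$Cost / abs(out$Quantity)  # per-unit cost
out$Quantity <- sign(out$Quantity)        # +1 or -1, keeping the original sign
rownames(out) <- NULL                     # reset row names such as "2.1"
out
```

Indexing rows this way avoids rebuilding the data frame column by column, so the column types are kept as they were.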
Answer
A data.table option
> library(data.table)
> setDT(df)[, lapply(.SD, function(x) rep(x / abs(Quantity), abs(Quantity))), ID:ItemPrice]
    ID       Date  Item ItemPrice Quantity Cost
 1: 11 2020-01-01 Item1         5        1    5
 2: 12 2020-02-01 Item1         5        1    5
 3: 12 2020-02-01 Item1         5        1    5
 4: 13 2020-03-15 Item2         7       -1   -7
 5: 13 2020-03-15 Item2         7       -1   -7
 6: 14 2020-04-10 Item2         7        1    7
 7: 14 2020-04-10 Item2         7        1    7
 8: 14 2020-06-01 Item2         7        1    7
 9: 14 2020-06-01 Item2         7        1    7
10: 14 2020-06-01 Item2         7        1    7
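Whichever approach is used, the result can be sanity-checked: the number of rows should equal the sum of the absolute quantities, every Quantity should be 1 or -1, and the Quantity and Cost totals should match the original data. The following is only a base R sketch of such a check. The helper name `check_split` is hypothetical, and the expansion inside the block reuses plain row replication so that the example runs on its own.

```r
# Hypothetical helper: compares an expanded data frame with its source
check_split <- function(original, expanded) {
  stopifnot(
    nrow(expanded) == sum(abs(original$Quantity)),
    all(abs(expanded$Quantity) == 1),
    sum(expanded$Quantity) == sum(original$Quantity),
    isTRUE(all.equal(sum(expanded$Cost), sum(original$Cost)))
  )
  invisible(TRUE)
}

# Same Quantity and Cost values as in the question
original <- data.frame(
  ID = c("11", "12", "13", "14", "14"),
  Quantity = c(1, 2, -2, 2, 3),
  Cost = c(5, 10, -14, 14, 21)
)

# Expand with plain row replication (an assumption made for this sketch)
expanded <- original[rep(seq_len(nrow(original)), abs(original$Quantity)), ]
expanded$Cost <- expanded$Cost / abs(expanded$Quantity)
expanded$Quantity <- sign(expanded$Quantity)

check_split(original, expanded)  # stops with an error if any check fails
```

Note that a row with Quantity equal to 0 would give 0/0 in the Cost division, so such rows may need separate handling before any of the approaches is applied.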