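Combining four columns pairwise into two new columns in R
I'm working in R and have an interesting problem. I want to transform the following data frame: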
DF = data.frame(ID = c(1, 2, 3),
                Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                Sign = "artist",
                Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                Sex = c("female", "male", "female"),
                Score = c(10, 20, 30))
  ID     Person1   Sign      Person2    Sex Score
1  1 Devin Davey artist    Eli Greer female    10
2  2    Rui Butt artist Alvin Simons   male    20
3  3 Keon Dotson artist   Leona Ford female    30
into this format:
  ID         Name   Sign Score
1  1  Devin Davey artist    10
2  1    Eli Greer female    10
3  2     Rui Butt artist    20
4  2 Alvin Simons   male    20
5  3  Keon Dotson artist    30
6  3   Leona Ford female    30
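That is, the four columns should be combined pairwise into two new columns: Person1 and Person2 become Name, and Sign and Sex become Sign.
My current idea is the following: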
library(dplyr)
library(tidyr)

# stack the two person columns and number the rows
PART1 <- DF %>%
  select(ID, Person1, Person2, Score) %>%
  gather(key, Name, -c(ID, Score), na.rm = TRUE) %>%
  select(-key) %>%
  arrange(ID) %>%
  mutate(temp_id = 1:n())

# stack Sign and Sex the same way
PART2 <- DF %>%
  select(ID, Sign, Sex) %>%
  gather(key, Sign, -ID, na.rm = TRUE) %>%
  select(-key) %>%
  arrange(ID) %>%
  mutate(temp_id = 1:n())

# join the two stacked pieces by ID and row number
PART1 %>%
  left_join(PART2, by = c("ID" = "ID", "temp_id" = "temp_id")) %>%
  select(-temp_id) %>%
  relocate(Score, .after = Sign)
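However, this solution does not look very elegant to me, and I think the problem can be solved in a better way.
I would therefore appreciate a solution that uses the tidyverse.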
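Answer
We can rename 'Sign' and 'Sex' to a common stem 'Sign' with a sequence number appended as a suffix ('Sign1', 'Sign2') so that they match the naming of the Person columns, and then use pivot_longer: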
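library(dplyr)
library(tidyr)

DF %>%
  # give the two attribute columns a common stem with a numeric suffix
  rename_at(vars(c('Sign', 'Sex')), ~ c('Sign1', 'Sign2')) %>%
  # split each name between the last letter and the digit, e.g. "Person1" -> "Person", "1"
  pivot_longer(cols = -c(ID, Score), names_to = c(".value", "grp"),
               names_sep = "(?<=[a-z])(?=\\d)") %>%
  select(ID, Name = Person, Sign, Score)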
Output:
# A tibble: 6 x 4
#     ID Name         Sign   Score
#  <dbl> <chr>        <chr>  <dbl>
#1     1 Devin Davey  artist    10
#2     1 Eli Greer    female    10
#3     2 Rui Butt     artist    20
#4     2 Alvin Simons male      20
#5     3 Keon Dotson  artist    30
#6     3 Leona Ford   female    30
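As a possible variation on the same idea, the name split can also be written with names_pattern, which uses capture groups instead of a lookaround separator. This is only a sketch: the pattern "([A-Za-z]+)(\\d+)" and the object name res are illustrative choices, not part of the original answer, and stringsAsFactors = FALSE is set only so the example behaves the same on older R versions.

```r
library(dplyr)
library(tidyr)

# same example data as in the question
DF <- data.frame(ID = c(1, 2, 3),
                 Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                 Sign = "artist",
                 Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                 Sex = c("female", "male", "female"),
                 Score = c(10, 20, 30),
                 stringsAsFactors = FALSE)

res <- DF %>%
  # common stem plus numeric suffix, so the names pair up with Person1/Person2
  rename(Sign1 = Sign, Sign2 = Sex) %>%
  pivot_longer(
    cols = -c(ID, Score),
    names_to = c(".value", "grp"),
    # first group: letters (becomes the output column), second group: digits (becomes grp)
    names_pattern = "([A-Za-z]+)(\\d+)"
  ) %>%
  select(ID, Name = Person, Sign, Score)

res
```

The ".value" entry in names_to tells pivot_longer to use the first captured piece as the output column name, so Person1/Person2 end up in one column and Sign1/Sign2 in another.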
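Answer
In base R, you can use the reshape function. Because it returns the rows in a different order, we reorder them to reproduce exactly the output shown above, although this step is not strictly necessary.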
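# varying as a matrix of column indices: row 1 (columns 2, 4) pairs Person1 with Person2,
# row 2 (columns 3, 5) pairs Sign with Sex
DF1 <- reshape(DF, varying = matrix(2:5, 2), direction = "long")
DF1[order(DF1$ID), c("ID", "Person1", "Sign", "Score")]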
    ID      Person1   Sign Score
1.1  1  Devin Davey artist    10
1.2  1    Eli Greer female    10
2.1  2     Rui Butt artist    20
2.2  2 Alvin Simons   male    20
3.1  3  Keon Dotson artist    30
3.2  3   Leona Ford female    30
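If the reshape call feels opaque, a more explicit base R sketch is to take the two column pairs as separate data frames, give them the same names, and stack them with rbind. This is a different technique from reshape, shown only for comparison; the names first, second and res2 are hypothetical helpers introduced here, and it also yields the column name Name that the question asked for.

```r
# same example data as in the question
DF <- data.frame(ID = c(1, 2, 3),
                 Person1 = c("Devin Davey", "Rui Butt", "Keon Dotson"),
                 Sign = "artist",
                 Person2 = c("Eli Greer", "Alvin Simons", "Leona Ford"),
                 Sex = c("female", "male", "female"),
                 Score = c(10, 20, 30),
                 stringsAsFactors = FALSE)

target <- c("ID", "Name", "Sign", "Score")

# one data frame per (person, attribute) pair, both with the target column names
first  <- setNames(DF[c("ID", "Person1", "Sign", "Score")], target)
second <- setNames(DF[c("ID", "Person2", "Sex", "Score")], target)

# stack the pairs, then sort by ID; ties keep their stacking order,
# so each Person1 row comes before its Person2 row
res2 <- rbind(first, second)
res2 <- res2[order(res2$ID), ]
rownames(res2) <- NULL

res2
```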