Collapsing duplicate rows with different values in different columns using R

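I have a data frame with 500 observations, but my example shows only 3. These are duplicates that have different values in different columns (except for the ID column, which contains the repeated persons). Below I reproduce what the data frame looks like (df) and what it should look like after processing (df_new). Is this possible? The data frame has 10 variables, so I am not worried about "doubling" them. The values in the variables are a, b, c, d, 0, ''. However, I have made them more generic in the tables.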

df <- data.frame(ID =  c('1','1','2', '2', '3','3'),
                 Year = c('smaller year.1', 'bigger year.1', 'bigger year.2', 'smaller year.2', 'same year.3', 'same year.3'),
                 V1 = c('a', 'b','c','d','e','f'),
                 V2 = c('g', 'h', 'i', 'j', 'k', 'l'),
                 Vn = c('n1', 'n2','n3','n4','n5','n6'))


df_new <- data.frame(ID = c('1','2','3'),
                     Year_smaller = c('smaller year.1', 'smaller year.2', 'same year.3'),
                     Year_bigger = c('bigger year.1', 'bigger year.2', 'same year.3'),
                     V1 = c('a','c','e'),
                     V1.1 = c('b','d','f'),
                     V2 = c('g','i','k'),
                     V2.1 = c('h','j','l'),
                     Vn = c('n1','n3','n5'),
                     Vn.1 = c('n2','n4','n6'))

Answer

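This is for the edited data and the revised requirements. Because b comes before s in the alphabet, "bigger year" is sorted before "smaller year"; in your actual data the years will sort correctly. If you want strings like these sorted in reverse order, use sort(Year, decreasing = TRUE) instead of sort(Year).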
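To illustrate the alphabetical ordering described above, here is a minimal base R sketch. The vector name yrs is only a placeholder, and the two strings are taken from the example data.

```r
# Placeholder vector holding the two Year strings for ID 1
yrs <- c('smaller year.1', 'bigger year.1')

# Default sort is ascending, so the string starting with "b" comes first
asc <- sort(yrs)

# decreasing = TRUE reverses the order, so the string starting with "s" comes first
dsc <- sort(yrs, decreasing = TRUE)

print(asc)
print(dsc)
```

With real numeric years this issue should not arise, since the default ascending sort already puts the smaller year first.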

df <- data.frame(ID =  c('1','1','2', '2', '3','3'),
                 Year = c('smaller year.1', 'bigger year.1', 'bigger year.2', 'smaller year.2', 'same year.3', 'same year.3'),
                 V1 = c('a', 'b','c','d','e','f'),
                 V2 = c('g', 'h', 'i', 'j', 'k', 'l'),
                 Vn = c('n1', 'n2','n3','n4','n5','n6'))


library(tidyverse)

df %>% group_by(ID) %>% mutate(Year = sort(Year)) %>% 
  mutate(rid = row_number()) %>%
  pivot_wider(id_cols = ID, names_from = rid, values_from = c(Year:Vn), names_sep = '')

#> # A tibble: 3 x 9
#> # Groups:   ID [3]
#>   ID    Year1         Year2          V11   V12   V21   V22   Vn1   Vn2  
#>   <chr> <chr>         <chr>          <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1     bigger year.1 smaller year.1 a     b     g     h     n1    n2   
#> 2 2     bigger year.2 smaller year.2 c     d     i     j     n3    n4   
#> 3 3     same year.3   same year.3    e     f     k     l     n5    n6

Created on 2021-06-19 by the reprex package (v2.0.0)
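As a rough check of the claim that real years sort correctly, the same pipeline can be run on numeric years. The year values and the name df_num below are invented for illustration and are not from the question; only ID and V1 follow the example data.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the numeric years are made up for this sketch
df_num <- data.frame(ID = c('1', '1', '2', '2', '3', '3'),
                     Year = c(2001, 2005, 2010, 2003, 2007, 2007),
                     V1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                     stringsAsFactors = FALSE)

res <- df_num %>%
  group_by(ID) %>%
  # sort only Year within each ID; V1 keeps its original row order
  mutate(Year = sort(Year), rid = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = ID, names_from = rid,
              values_from = c(Year, V1), names_sep = '')

print(res)
```

Here Year1 should hold the smaller year and Year2 the bigger year for every ID, while V11 and V12 keep the order in which the rows appeared. The ungroup() call is optional and only drops the grouping before reshaping.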


library(tidyverse)

df %>% group_by(ID) %>% mutate(rid = row_number()) %>%
  pivot_wider(id_cols = ID, names_from = rid, values_from = c(Year:Variable_n), names_sep = '')

# A tibble: 3 x 9
# Groups:   ID [3]
  ID    Year1          Year2          Variable_a1 Variable_a2 Variable_b1 Variable_b2 Variable_n1 Variable_n2
  <chr> <chr>          <chr>          <chr>       <chr>       <chr>       <chr>       <chr>       <chr>      
1 1     smaller year.1 bigger year.1  va11        va12        vb11        vb12        vn11        vn12       
2 2     bigger year.2  smaller year.2 va21        va22        vb21        vb22        vn21        vn22       
3 3     same year.3    same year.3    va31        va32        vb31        vb32        vn31        vn32 

Is this what you mean?


df %>% group_by(ID) %>% arrange(desc(Year)) %>% mutate(rid = row_number()) %>%
  pivot_wider(id_cols = ID, names_from = rid, values_from = c(Year:Variable_n), names_sep = '')

# A tibble: 3 x 9
# Groups:   ID [3]
  ID    Year1          Year2         Variable_a1 Variable_a2 Variable_b1 Variable_b2 Variable_n1 Variable_n2
  <chr> <chr>          <chr>         <chr>       <chr>       <chr>       <chr>       <chr>       <chr>      
1 2     smaller year.2 bigger year.2 va22        va21        vb22        vb21        vn22        vn21       
2 1     smaller year.1 bigger year.1 va11        va12        vb11        vb12        vn11        vn12       
3 3     same year.3    same year.3   va31        va32        vb31        vb32        vn31        vn32
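Note that arrange() moves whole rows, so the other columns follow Year, and without grouping-aware sorting the IDs themselves get reordered (2, 1, 3 in the output above). A possible variant, assuming you want to keep the IDs in order, is to pass .by_group = TRUE. The sketch below uses a reduced copy of the question's df (ID, Year, V1 only); the name by_row is a placeholder.

```r
library(dplyr)
library(tidyr)

# Reduced copy of the question's data, for illustration only
df <- data.frame(ID = c('1', '1', '2', '2', '3', '3'),
                 Year = c('smaller year.1', 'bigger year.1', 'bigger year.2',
                          'smaller year.2', 'same year.3', 'same year.3'),
                 V1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                 stringsAsFactors = FALSE)

by_row <- df %>%
  group_by(ID) %>%
  # sort by ID first, then by Year descending within each ID
  arrange(desc(Year), .by_group = TRUE) %>%
  mutate(rid = row_number()) %>%
  ungroup() %>%
  pivot_wider(id_cols = ID, names_from = rid,
              values_from = c(Year, V1), names_sep = '')

print(by_row)
```

Unlike mutate(Year = sort(Year)), which reorders only the Year column, this keeps each V1 value attached to its own Year, so pick whichever behaviour matches the output you need.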

