按组查找时间间隔中的重叠并返回子集数据帧
假设我有这个数据框,它有两个 ID (1/2),它们的开始和结束时间在三个不同的区域 (A/B/C):
df <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), zone = c("A",
"B", "A", "C", "B", "A", "B", "A", "B", "C"), start = c(0, 6,
7, 8, 10, 0, 3, 5, 6, 7), end = c(6, 7, 8, 10, 11, 3, 5, 6, 7,
11)), row.names = c(NA, -10L), class = "data.frame")
df
id zone start end
1 1 A 0 6
2 1 B 6 7
3 1 A 7 8
4 1 C 8 10
5 1 B 10 11
6 2 A 0 3
7 2 B 3 5
8 2 A 5 6
9 2 B 6 7
10 2 C 7 11
如果我们查看每个区域,我们可以直观地检查 ID 处于同一区域和不在同一区域的时间:
split(df,df$zone)
$A
id zone start end
1 1 A 0 6
3 1 A 7 8
6 2 A 0 3
8 2 A 5 6
$B
id zone start end
2 1 B 6 7
5 1 B 10 11
7 2 B 3 5
9 2 B 6 7
$C
id zone start end
4 1 C 8 10
10 2 C 7 11
例如,1 和 2 都在 0-3 和 5-6 的区域 A 中,但在其他时间不是。
期望输出
我想提取三个数据帧。
- 显示它们在一起的时间和区域的数据框:
zone start end id
1 A 0 3 1-2
2 A 5 6 1-2
3 B 6 7 1-2
4 C 8 10 1-2
2 & 3:当它们不在一起时的数据帧:
#id=1
zone start end
1 A 3 5
2 A 7 8
3 B 10 11
#id=2
zone start end
1 B 3 5
2 C 7 8
3 C 10 11
我一直在尝试使用foverlapsfromdata.table 和intervals包,但似乎无法找到正确的方法。
例如,对每个区域/ID 进行子集化,我可以得到一个包含重叠的输出,但这似乎不是正确的方向:
A <- split(df,df$zone)$A
Asp <- split(A,A$id)
x <- setDT(Asp[[1]])
y <- setDT(Asp[[2]])
setkey(y, start, end)
foverlaps(x, y, type="any")
id zone start end i.id i.zone i.start i.end
1: 2 A 0 3 1 A 0 6
2: 2 A 5 6 1 A 0 6
3: NA <NA> NA NA 1 A 7 8
非常感谢任何帮助。
编辑:额外的示例数据集似乎给当前建议的解决方案带来了一些问题:
df2 <- structure(list(start = c(0, 5, 6, 8, 10, 13, 15, 20, 22, 26,
29, 37, 40, 42, 0, 3, 6, 9, 15, 20, 25, 33, 35, 40), id = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2), zone = c("A", "B", "A", "D", "C", "B", "C", "B", "A",
"B", "A", "D", "C", "D", "A", "B", "C", "D", "A", "B", "C", "B",
"A", "D"), end = c(5, 6, 8, 10, 13, 15, 20, 22, 26, 29, 37, 40,
42, 45, 3, 6, 9, 15, 20, 25, 33, 35, 40, 45)), class = c("data.table", "data.frame"), row.names = c(NA, -24L))
df2
start id zone end
1: 0 1 A 5
2: 5 1 B 6
3: 6 1 A 8
4: 8 1 D 10
5: 10 1 C 13
6: 13 1 B 15
7: 15 1 C 20
8: 20 1 B 22
9: 22 1 A 26
10: 26 1 B 29
11: 29 1 A 37
12: 37 1 D 40
13: 40 1 C 42
14: 42 1 D 45
15: 0 2 A 3
16: 3 2 B 6
17: 6 2 C 9
18: 9 2 D 15
19: 15 2 A 20
20: 20 2 B 25
21: 25 2 C 33
22: 33 2 B 35
23: 35 2 A 40
24: 40 2 D 45
start id zone end