Vertically reorder the nodes of a Sankey diagram by label value
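I'm trying to plot the flow of patients between 3 clusters in a Sankey diagram. I have a pd.DataFrame counts with from-to values, see below. To reproduce this DF, here is the counts dict that should be loaded into a pd.DataFrame (this is the input to the visualize_cluster_flow_counts function).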
from to value
0 C1_1 C1_2 867
1 C1_1 C2_2 405
2 C1_1 C0_2 2
3 C2_1 C1_2 46
4 C2_1 C2_2 458
... ... ... ...
175 C0_20 C0_21 130
176 C0_20 C2_21 1
177 C2_20 C1_21 12
178 C2_20 C0_21 0
179 C2_20 C2_21 96
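For a quick test without the full dict, the ten rows visible in the printout above can be rebuilt directly. This is only a partial reproduction (the elided rows 5 to 174 are not included), so a diagram drawn from it will be incomplete:

```python
import pandas as pd

# Only the ten rows shown in the printout; the elided middle rows are not reproduced.
rows = [
    ("C1_1", "C1_2", 867),
    ("C1_1", "C2_2", 405),
    ("C1_1", "C0_2", 2),
    ("C2_1", "C1_2", 46),
    ("C2_1", "C2_2", 458),
    ("C0_20", "C0_21", 130),
    ("C0_20", "C2_21", 1),
    ("C2_20", "C1_21", 12),
    ("C2_20", "C0_21", 0),
    ("C2_20", "C2_21", 96),
]
# Keep the original row index so it matches the printout.
counts = pd.DataFrame(rows, columns=["from", "to", "value"],
                      index=[0, 1, 2, 3, 4, 175, 176, 177, 178, 179])
```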
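The from and to values in the DataFrame encode the cluster number (0, 1 or 2) and the day on the x-axis (between 1 and 21). If I plot a Sankey diagram with these values, I get the first result.
Code: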
import plotly.graph_objects as go

def visualize_cluster_flow_counts(counts):
    # Unique node labels; each node's position in this list is its index for plotly
    all_sources = list(set(counts['from'].values.tolist() + counts['to'].values.tolist()))
    froms, tos, vals, labs = [], [], [], []
    for index, row in counts.iterrows():
        froms.append(all_sources.index(row['from']))
        tos.append(all_sources.index(row['to']))
        vals.append(row['value'])
        # Optional fourth column used as the link label (the printout shows only from/to/value)
        labs.append(row.iloc[3] if len(row) > 3 else "")
    fig = go.Figure(data=[go.Sankey(
        arrangement='snap',
        node = dict(
            pad = 15,
            thickness = 5,
            line = dict(color = "black", width = 0.1),
            label = all_sources,
            color = "blue"
        ),
        link = dict(
            source = froms,
            target = tos,
            value = vals,
            label = labs
        ))])
    fig.update_layout(title_text="Patient flow between clusters over time: 48h (2 days) - 504h (21 days)", font_size=10)
    fig.show()

visualize_cluster_flow_counts(counts)
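As an aside on the code above: all_sources.index() is a linear search for every row, and the order of list(set(...)) is not stable (string hashing can change it between runs). A minimal sketch of an alternative, assuming every label follows the C<cluster>_<day> pattern described earlier; parse_label and build_links are hypothetical helper names, not part of plotly or pandas:

```python
# Hypothetical helpers, not part of plotly or pandas.
def parse_label(label):
    """'C1_2' -> (cluster=1, day=2); assumes every label is C<cluster>_<day>."""
    cluster, day = label[1:].split("_")
    return int(cluster), int(day)


def build_links(from_labels, to_labels):
    """Return (nodes, sources, targets) for go.Sankey.

    Nodes are sorted by (day, cluster), so the order no longer depends on
    set iteration order, and a dict gives constant-time label -> index lookups.
    """
    nodes = sorted(set(from_labels) | set(to_labels),
                   key=lambda s: parse_label(s)[::-1])
    index_of = {label: i for i, label in enumerate(nodes)}
    sources = [index_of[s] for s in from_labels]
    targets = [index_of[t] for t in to_labels]
    return nodes, sources, targets
```

Usage would be nodes, sources, targets = build_links(counts['from'].tolist(), counts['to'].tolist()), then label=nodes, source=sources, target=targets and value=counts['value'].tolist() in go.Sankey. A deterministic node order makes the diagram reproducible, but it is not guaranteed that plotly's automatic layout places the nodes vertically in that order.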
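However, I want to sort the bars vertically so that C0 is always at the top, C1 always in the middle and C2 always at the bottom (or the other way round, it doesn't matter). I know that node.x and node.y can be set to specify coordinates manually. So I set the x value to the day number * (1/range of days), with increments of ±0.045, and the y value based on the cluster number: 0, 0.5 or 1. This gives a second result in which the vertical order is correct, but the vertical margins between the bars are clearly off; they should look like the first result.
The code that produces this is: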
import plotly.graph_objects as go

def find_node_coordinates(sources):
    x_nodes, y_nodes = [], []
    for s in sources:
        # x: day number scaled by 1/21 (the +-0.045 shift is not applied here)
        x = float(s.split("_")[-1]) * (1/21)
        x_nodes.append(x)
        # y: 1 for C0, 0.5 for C1 and ~0 (1e-09) for C2
        cluster_number = s[1]
        if cluster_number == "0":
            y = 1
        elif cluster_number == "1":
            y = 0.5
        else:
            y = 1e-09
        y_nodes.append(y)
    return x_nodes, y_nodes

def visualize_cluster_flow_counts(counts):
    all_sources = list(set(counts['from'].values.tolist() + counts['to'].values.tolist()))
    node_x, node_y = find_node_coordinates(all_sources)
    froms, tos, vals, labs = [], [], [], []
    for index, row in counts.iterrows():
        froms.append(all_sources.index(row['from']))
        tos.append(all_sources.index(row['to']))
        vals.append(row['value'])
        # Optional fourth column used as the link label (the printout shows only from/to/value)
        labs.append(row.iloc[3] if len(row) > 3 else "")
    fig = go.Figure(data=[go.Sankey(
        arrangement='snap',
        node = dict(
            pad = 15,
            thickness = 5,
            line = dict(color = "black", width = 0.1),
            label = all_sources,
            color = "blue",
            x = node_x,
            y = node_y,
        ),
        link = dict(
            source = froms,
            target = tos,
            value = vals,
            label = labs
        ))])
    fig.update_layout(title_text="Patient flow between clusters over time: 48h (2 days) - 504h (21 days)", font_size=10)
    fig.show()

visualize_cluster_flow_counts(counts)
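Question: how can I fix the margins of the bars so that the result looks like the first result? So, to be clear: the bars should be pushed to the bottom. Or is there another way to make the Sankey diagram automatically reorder the bars vertically based on the label value?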
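One idea to try, not a verified fix: instead of the fixed y values 0, 0.5 and 1, derive each node's y from the flow it carries, stacking each day's clusters in a fixed order with a constant gap. This sketch assumes that plotly treats node.y as the node's center in normalized [0, 1] coordinates and that node height grows linearly with flow; both are assumptions, and plotly's own padding and scaling may still shift the result. stacked_node_y and parse_label are hypothetical helpers:

```python
from collections import defaultdict


def parse_label(label):
    # "C1_2" -> (cluster=1, day=2); assumes the C<cluster>_<day> pattern
    cluster, day = label[1:].split("_")
    return int(cluster), int(day)


def stacked_node_y(froms, tos, values, cluster_order=(0, 1, 2), gap=0.02):
    """Return {label: y}, stacking each day's nodes in cluster_order.

    A node's height is max(inflow, outflow), scaled so the busiest day
    (heights plus gaps) spans [0, 1]. y is the center of the node's slot;
    the first cluster in cluster_order gets the smallest y.
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for f, t, v in zip(froms, tos, values):
        outflow[f] += v
        inflow[t] += v
    nodes = set(inflow) | set(outflow)
    size = {n: max(inflow[n], outflow[n]) for n in nodes}

    # Group nodes by day, remembering each node's rank in cluster_order
    by_day = defaultdict(list)
    for n in nodes:
        cluster, day = parse_label(n)
        by_day[day].append((cluster_order.index(cluster), n))

    # One global scale so the tallest column fits, as in a normal Sankey layout
    tallest = max(sum(size[n] for _, n in col) for col in by_day.values())
    max_nodes = max(len(col) for col in by_day.values())
    scale = (1 - gap * (max_nodes - 1)) / tallest if tallest else 0.0

    y = {}
    for col in by_day.values():
        offset = 0.0
        for _, n in sorted(col):
            h = size[n] * scale
            y[n] = offset + h / 2  # center of this node's slot
            offset += h + gap
    return y
```

It could be used as node_y_map = stacked_node_y(counts['from'], counts['to'], counts['value']) and then y=[node_y_map[label] for label in all_sources] in the node dict. If the stack comes out upside down, reversing cluster_order (for example (2, 1, 0)) flips it.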