Why is generating a large amount of random data so much slower?

html5 • November 13, 2022, 1:01 AM • Q&A

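I want to generate a large amount of random numbers. I wrote the following bash command (note that I'm using cat here for demonstration purposes; in my real use case, I'm piping the numbers into a process):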

for i in {1..99999999}; do echo -e "$(cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 1)"; done | cat

The numbers are printed at a very low rate. However, if I generate a smaller amount, it is much faster:

for i in {1..9999}; do echo -e "$(cat /dev/urandom | tr -dc '0-9' | fold -w 5 | head -n 1)"; done | cat

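Note that the only difference is 9999 instead of 99999999.

Why is this? Is the data buffered somewhere? Is there a way to optimize this, so that the random numbers are piped/streamed into cat immediately?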

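Answer

Why is this?

Expanding {1..99999999} produces roughly 100,000,000 arguments, and bash has to allocate a lot of memory to generate and then parse them before the loop body runs even once. This significantly stalls the whole system.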
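As a rough illustration of the point above, an arithmetic for loop keeps only a counter in memory instead of expanding every argument up front. This is a minimal sketch; the function name count_up is made up for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints 0..n-1 without building the full argument list.
count_up() {
    local n=$1 i
    for ((i = 0; i < n; i++)); do
        echo "$i"
    done
}

count_up 3
```

The same loop shape works for 100000000 iterations, because the shell never has to hold the whole sequence at once.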

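Also, a lot of data is read from /dev/urandom, and about 96% of it is filtered out by tr -dc '0-9' (only 10 of the 256 possible byte values are digits). This significantly depletes the entropy pool and additionally stalls the whole system.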
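The 96% figure can be checked with a quick experiment. The sketch below samples a fixed number of bytes and counts how many survive the filter; the sample size is an arbitrary choice, and LC_ALL=C is set on the assumption that some tr implementations reject raw bytes in a multibyte locale.

```bash
#!/usr/bin/env bash
sample=100000   # number of random bytes to sample (arbitrary)

# Count the bytes that tr keeps, i.e. the ASCII digits 0-9.
kept=$(head -c "$sample" /dev/urandom | LC_ALL=C tr -dc '0-9' | wc -c)
kept=$(( kept ))   # strip any padding that wc adds

echo "kept $kept of $sample bytes, expected about $(( sample * 10 / 256 ))"
```

On average about 10 of every 256 bytes remain, so roughly 26 bytes must be read for each digit that is printed.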

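Is the data buffered somewhere?

Each process has its own buffer, so:

cat /dev/urandom is buffering
tr -dc '0-9' is buffering
fold -w 5 is buffering
head -n 1 is buffering
the left side of the pipeline, the shell, has its own buffer
and the right side, | cat, has its own buffer

That's 6 buffering places. Even ignoring the input buffering of head -n 1 and of the | cat on the right side of the pipeline, that's 4 output buffers.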

Also, save the animals and stop cat abuse. Use tr </dev/urandom instead of cat /dev/urandom | tr. Fun fact: tr can't take a filename as an argument.
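For reference, the redirect form looks like this. It is only a sketch of the same pipeline with one process fewer; the function name one_number is hypothetical, and LC_ALL=C is an assumption added so that tr treats the input as plain bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one 5-digit string, with tr reading the device directly.
one_number() {
    LC_ALL=C tr -dc '0-9' </dev/urandom | fold -w 5 | head -n 1
}

one_number
```

This removes the cat process, but it still discards most of the bytes it reads.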

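Is there a way to optimize this, so that the random numbers are piped/streamed into cat immediately?

Remove the whole code.

Take only as few bytes from the random source as you need. To generate a 32-bit number you only need 32 bits, no more. To generate a 5-digit number you only need 17 bits; rounded up to 8-bit bytes, that is only 3 bytes. The tr -dc '0-9' approach is a cool trick, but it definitely shouldn't be used in any real code.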
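To make the 17-bit arithmetic concrete, here is a minimal sketch that reads exactly 3 bytes per attempt, keeps the low 17 bits, and retries when the value falls outside 0..99999. The function name rand5 is hypothetical, and the retry step is a standard rejection-sampling technique added here rather than something stated in the text above. It still starts one od process per attempt, so it illustrates the byte count, not a fast generator.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one uniform value in 00000..99999 from 3 random bytes.
rand5() {
    local b0 b1 b2 v
    while :; do
        # od prints the 3 bytes as unsigned decimals, e.g. " 12 200 45"
        read -r b0 b1 b2 < <(od -An -N3 -tu1 /dev/urandom)
        # Combine to 24 bits and keep only the low 17 bits (0..131071)
        v=$(( ((b0 << 16) | (b1 << 8) | b2) & 0x1FFFF ))
        # Reject out-of-range values instead of wrapping them around
        if (( v < 100000 )); then
            printf '%05d\n' "$v"
            return
        fi
    done
}

rand5
```

About 76% of attempts succeed (100000 out of 131072), so on average fewer than 4 bytes are consumed per number.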

Strangely, I recently answered a similar question. Copying the code from there, you could:

for ((i=0;i<100000000;++i)); do echo "$((0x$(dd if=/dev/urandom of=/dev/stdout bs=4 count=1 status=none | xxd -p)))"; done | cut -c-5
# cut to take first 5 digits

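But that would still be unacceptably slow, as it runs 2 processes for each random number (and I think taking only the first 5 digits will give a bad distribution).

I suggest using $RANDOM, available in bash. If not, use $SRANDOM if you really want /dev/urandom (and really know why you want it). If not, I suggest writing the random number generation from /dev/urandom in a real programming language, like C, C++, Python, Perl, or Ruby. I believe one could write it in awk.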
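A minimal sketch of the $RANDOM suggestion follows. Since $RANDOM yields only 15 bits (0..32767), the example combines two draws before reducing to five digits; that combination, the function name random_numbers, and the zero padding are assumptions made for illustration. The modulo step leaves a very slight bias, and $RANDOM is not suitable for cryptographic use.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints n five-digit numbers using only bash builtins.
random_numbers() {
    local n=$1 i
    for ((i = 0; i < n; i++)); do
        # Two 15-bit draws give 30 bits; modulo brings it into 0..99999
        printf '%05d\n' $(( ((RANDOM << 15) | RANDOM) % 100000 ))
    done
}

random_numbers 5
```

Because no external process is started per number, something like random_numbers 100000000 | cat should begin streaming output right away.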

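The following looks nice, but converting binary data to hex only to convert it to decimal later is still a workaround for the fact that the shell just can't handle binary data: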

count=10;
# take count*4 bytes from input
dd if=/dev/urandom of=/dev/stdout bs=4 count=$count status=none |
# Convert bytes to hex 4 bytes at a time
xxd -p -c 4 |
# Convert hex to decimal using GNU awk
awk --non-decimal-data '{printf "%d\n", "0x"$0}'

`/dev/urandom` shouldn't care about "depleting the pool", though at least the Linux implementation is slower than many other CSPRNGs (unless it's changed recently). 5 decimal digits is 5*ln(10)/ln(2) ~= 16.6 bits worth, so 17 bits can represent anything <= 99999 (2^17 is 131072). But of course the distribution is abysmal if you just take the last 5 digits without regard for whether the number is > 99999. Any change from N bits to M-digit decimal numbers gives a skewed result if you don't account for the ranges not matching.
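The skew described in this comment can be quantified with a few lines of shell arithmetic. The sketch assumes the 17-bit value is reduced with a plain modulo by 100000; the variable names are made up for the example.

```bash
#!/usr/bin/env bash
bits=17
range=100000

total=$(( 1 << bits ))       # 131072 possible 17-bit values
extra=$(( total - range ))   # 31072 values wrap around under modulo

echo "results 0..$(( extra - 1 )) are produced by two inputs, the rest by one"
```

In other words, roughly the lowest third of the 5-digit range would come up twice as often as the rest, which is why out-of-range values should be discarded rather than wrapped.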
