Using rvest (or another R package) to detect when the start of an HTML paragraph has different formatting (e.g. bold)

html5 • August 30, 2022, 1:29 pm • Q&A

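I am using the R package edgarWebR to parse SEC filings, for example https://www.sec.gov/Archives/edgar/data/1060224/000090480206000008/sa10k306.htm (the link is only an example). The parser returns a data frame in which one column, called `raw`, contains HTML. It splits the HTML page into paragraphs, one row per paragraph: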

| other columns | raw | text |
| --- | --- | --- |
| row 1 | `<p><font><i>We had a net loss of $1.</i><i><b>55</b></i><i> million for the year ended December 31, 201</i><i>6</i><i> and have an accumulated deficit of $</i><i>61.5</i><i> million as of December 31, 201</i><i>6</i><i>. To achieve sustainable profitability, we must generate increased revenue.</i></font></p>` | We had a net loss of $1.55 million for the year ended December 31, 2016 and have an accumulated deficit of $61.5 million as of December 31, 2016. To achieve sustainable profitability, we must generate increased revenue. |
| row 2 | `<div><font>We have a history of losses, and we cannot assure you that we will achieve profitability.</font></div>` | We have a history of losses, and we cannot assure you that we will achieve profitability. |
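
One possible way to approach the check, offered as a minimal sketch rather than a confirmed solution: parse each `raw` fragment with xml2 (the package rvest is built on), take the first non-blank text node, and see whether any element enclosing it is a formatting tag. The helper name `starts_with_format` and its `tags` argument are hypothetical, the first two sample strings are shortened versions of the rows above, and the third is made up to show a bold opening. The sketch assumes every entry contains markup and that formatting is expressed with tags such as `<b>` or `<i>`; bold applied through a `style` attribute or CSS would not be detected.

```r
library(xml2)

# Hypothetical helper (not part of rvest or edgarWebR): TRUE when the first
# non-blank text in an HTML fragment is nested inside one of `tags`.
starts_with_format <- function(html, tags = c("b", "strong")) {
  doc <- read_html(html)
  # Text nodes containing something other than whitespace, in document order.
  text_nodes <- xml_find_all(doc, "//body//text()[normalize-space()]")
  if (length(text_nodes) == 0) {
    return(FALSE)
  }
  # Names of every element that encloses the first piece of visible text.
  ancestors <- xml_name(xml_parents(text_nodes[[1]]))
  any(tolower(ancestors) %in% tags)
}

# Rows 1 and 2 are abbreviated from the question; row 3 is invented.
raw <- c(
  "<p><font><i>We had a net loss of $1.</i><i><b>55</b></i><i> million.</i></font></p>",
  "<div><font>We have a history of losses.</font></div>",
  "<p><b>Risk factors.</b> We have a history of losses.</p>"
)

starts_bold <- vapply(raw, starts_with_format, logical(1), USE.NAMES = FALSE)
starts_italic <- vapply(raw, starts_with_format, logical(1),
                        tags = c("i", "em"), USE.NAMES = FALSE)
```

Here `starts_bold` flags only the invented third row, while `starts_italic` flags the first row, whose opening text sits inside `<i>`. On the real data frame the same `vapply()` call could be run over the `raw` column, and the result stored as a new logical column.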
