How to Extract Content from HTML Files with htmlq - Tech News


Published: 2022-12-16 10:49:09    Author: 田易可


htmlq does for HTML data what sed or grep do for plain text: with it you can search, slice, and filter HTML. Let's look at how to install this handy tool on Linux or Unix and use it to process HTML data.

What is htmlq?

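htmlq is like jq, but for HTML: it uses CSS selectors to extract parts of an HTML file. In CSS, a selector identifies the HTML elements on a web page that a style should apply to; htmlq uses the same selectors to pick out the elements to print. This makes it easy, for example, to extract image or other URLs.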

Installing htmlq

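First install cargo (Rust's package manager) on the system, then use cargo to install htmlq. The yum command below applies to RHEL/CentOS-style systems:

[root@localhost ~]# yum -y install cargo
[root@localhost ~]# cargo install htmlq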

Setting the executable path

Make sure $HOME/.cargo/bin is on your PATH so the installed binary can be run. The following commands append an export line to ~/.bash_profile and reload it:

[root@localhost ~]# echo 'export PATH="$PATH:$HOME/.cargo/bin"' >> ~/.bash_profile
[root@localhost ~]# . ~/.bash_profile
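Running the echo command above more than once appends the same export line to ~/.bash_profile each time. A small guard avoids the duplicates. This is only a sketch, and the function name add_cargo_bin_to_path is hypothetical, not part of htmlq or cargo:

```shell
#!/bin/sh
# Hypothetical helper: add the cargo bin directory to PATH in a profile file,
# but only if that exact export line is not already there (safe to re-run).
add_cargo_bin_to_path() {
    profile="$1"
    # Single quotes keep $PATH and $HOME unexpanded, so the profile file
    # receives the same literal line as the echo command above.
    line='export PATH="$PATH:$HOME/.cargo/bin"'
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string.
    # If the profile file does not exist yet, grep fails and the line is added.
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

For example: add_cargo_bin_to_path ~/.bash_profile && . ~/.bash_profile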

How to extract content from an HTML file with htmlq

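The basic pattern is to fetch a page with curl and pipe it into htmlq together with a CSS selector (url and url2 stand for real addresses):

curl -s url | htmlq '#css-selector'
curl -s url2 | htmlq '.css-selector'
curl -s <linuxprobe-url> | htmlq --pretty '#content' | more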

To find all the links on a page, select the a elements and print their href attribute. For example:

[root@localhost ~]# curl -s <linuxprobe-url> | htmlq --attribute href a

Pretty-print the selected HTML:

[root@localhost ~]# curl --silent <mgdm-url> | htmlq --pretty '#posts'

Help

Use the following command to view the help page:

[root@localhost ~]# htmlq --help
htmlq 0.3.0
Michael Maclean <michael@mgdm…>
Runs CSS selectors on HTML

USAGE:
    htmlq [FLAGS] [OPTIONS] [selector]...

FLAGS:
    -B, --detect-base          Try to detect the base URL from the <base> tag in the document. If not found, default to
                               the value of --base, if supplied
    -h, --help                 Prints help information
    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace
    -p, --pretty               Pretty-print the serialised output
    -t, --text                 Output only the contents of text nodes inside selected elements
    -V, --version              Prints version information

OPTIONS:
    -a, --attribute <attribute>    only return this attribute (if present) from selected elements
    -b, --base <base>              Use this URL as the base for links
    -f, --filename <FILE>          The input file. Defaults to stdin
    -o, --output <FILE>            The output file. Defaults to stdout

ARGS:
    <selector>...    The CSS expression to select [default: html]
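Several of these flags can be combined in one call. As a minimal sketch, assuming htmlq is installed and using a made-up sample page: -f reads the page from a file, -t keeps only the text of the selected elements, -w drops whitespace-only text nodes, and -o writes the result to a file instead of stdout:

```shell
#!/bin/sh
# Made-up sample page and an output file, both temporary.
page=$(mktemp)
out=$(mktemp)
cat > "$page" <<'EOF'
<html><head><title>Sample page</title></head>
<body><div id="content">
  <h1>Hello</h1>
  <p>First paragraph.</p>
</div></body></html>
EOF

# '#content p' selects only the <p> inside the element with id="content".
# -t prints just the text of the selected elements, -w skips text nodes that
# are whitespace only, and -o writes the result to $out instead of stdout.
htmlq -f "$page" -t -w -o "$out" '#content p'
cat "$out"

rm -f "$page"   # the output file is kept so it can be inspected
```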

Summary

htmlq does for HTML data what sed or grep do for plain text: you can use it to search, slice, and filter HTML.

(By 田易可)
