Novice trying to learn java, html, curl or whatever toward learning how to automate an interaction with an online dictionary
Question
BACKGROUND INFO:
I am a retired duffer with moderate experience developing Windows applications for the PC. I am trying to learn Java, HTML, curl or whatever it takes to automate an interaction with an online dictionary. My purpose in doing so is to support some word puzzle games I am developing.
My system is an HP laptop running 64-bit Windows Professional. I develop my apps using MS Visual Studio 2015 Express.
Needless to say, I have no idea what I am doing vis-à-vis curl. Yes, I have read all of the online documentation and tutorials on HTML, Java and curl I could find, but I very quickly find that info over my head.
So please forgive me if this posting is vague or insufficiently specific. I’m doing the best I can.
I feel like I might be on the right track but do not know how to capture the response from the online server. Can someone steer me toward achieving my goal stated above? Thank you for attending to this.
Robert Hoech
ISSUE STATEMENT:
CMD:
C:\Users\Robert\Documents\Rob\CURL>curl -v http://www.merriam-webster/dictionary/capricious >data.txt
WRITTEN TO data.txt
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
WRITTEN TO CONSOLE:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Trying 198.105.254.23:80...
0 0 0 0 0 0 0 0 --:--:-- 0:00:20 --:--:-- 0
* connect to 198.105.254.23 port 80 failed: Timed out
* Trying 198.105.244.23:80...
0 0 0 0 0 0 0 0 --:--:-- 0:00:21 --:--:-- 0
* Connected to www.merriam-webster (198.105.244.23) port 80 (#0)
> GET /dictionary/capricious HTTP/1.1
> Host: www.merriam-webster
> User-Agent: curl/7.71.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Fri, 24 Jul 2020 20:48:42 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: close
< Location: http://localhost
< Expires: Fri, 24 Jul 2020 20:48:41 GMT
< Cache-Control: no-cache
<
{ [189 bytes data]
100 178 0 178 0 0 8 0 --:--:-- 0:00:21 --:--:-- 43
* Closing connection 0
Answer 1
Score: 0
The response you are getting via "curl" is a 301 redirect to "http://localhost". That tells curl to fetch the page from your own machine instead. Obviously, that won't work ...
I think that this is the Merriam-Webster site telling you that what you are trying to do is not permitted by their Terms of Service.
If you wanted to ignore that (and risk a possible lawsuit!) you could try changing the agent string to trick the website into thinking your application is a web browser.
A better idea would be:
- Contact Merriam-Webster to see if there is an API service (or something) that you are permitted to use. (You may have to pay for it.)
- Try to find an alternative free online dictionary. (Make sure that you read the Terms of Service thoroughly!)
- Find a free fixed word list and embed it in your application.
UPDATE - So I tried this myself, and here is what I found:
- The URL you are using is incorrect:
  http://www.merriam-webster/dictionary/capricious
  The hostname part is incorrect. It should be "www.merriam-webster.com".
- If you use "http:" with the correct hostname, it redirects to "https:".
- Curling
  https://www.merriam-webster.com/dictionary/capricious
  gives an HTML document which looks like it has the definition in it. Whether it is scrape-able in practice ... I can't say. (But I couldn't get the page to pay attention to an "Accept: text/plain" header.)
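Putting those findings together, here is a minimal sketch of capturing the page to a file. It assumes a POSIX shell (for example Git Bash or WSL on Windows), and the function names dictionary_url and fetch_entry are made up for illustration. The curl options are standard: -L follows redirects such as the http-to-https one, -s hides the progress meter, -S still reports errors, and -o names the output file.

```shell
#!/bin/sh
# Build the dictionary page URL for a word (hypothetical helper name).
dictionary_url() {
    printf 'https://www.merriam-webster.com/dictionary/%s\n' "$1"
}

# Fetch the page for word $1 and save the response body to file $2.
#   -s  silent (no progress meter)   -S  still show errors
#   -L  follow 301/302 redirects     -o  write the body to the given file
fetch_entry() {
    curl -sS -L -o "$2" "$(dictionary_url "$1")"
}
```

Calling `fetch_entry capricious data.txt` should leave the HTML in data.txt. From a plain Windows command prompt the equivalent one-liner would be `curl -sS -L -o data.txt https://www.merriam-webster.com/dictionary/capricious`. Nothing here changes the Terms of Service question raised above.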
But I also found that there is an official Merriam-Webster API; see https://dictionaryapi.com/. The page says that there is a free / non-commercial use option.
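If you go the API route, a request could look roughly like the sketch below. Treat the details as assumptions: the URL pattern is my recollection of the Collegiate dictionary endpoint described on dictionaryapi.com, so check it against their documentation, and api_url, lookup_word and the MW_API_KEY environment variable are names I made up. You would need to register on that site to get your own key.

```shell
#!/bin/sh
# Build the API request URL for word $1 and API key $2.
# ASSUMPTION: endpoint pattern as recalled from dictionaryapi.com for the
# Collegiate dictionary; verify it against the site before relying on it.
api_url() {
    printf 'https://www.dictionaryapi.com/api/v3/references/collegiate/json/%s?key=%s\n' "$1" "$2"
}

# Request the entry for word $1 and save the response body to file $2.
# MW_API_KEY is a hypothetical environment variable holding your own key.
lookup_word() {
    curl -sS -L -o "$2" "$(api_url "$1" "$MW_API_KEY")"
}
```

With the key exported, `lookup_word capricious capricious.json` would write the response to capricious.json. A structured response should be easier to pick apart from a program than the HTML page.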