Is there any way to use the tika library on PythonAnywhere?

Question

I'm working on a parsing problem and have been using the tika library on my local system to read PDF documents. Now that I'm deploying my parser on the web, it seems I'm not allowed to use the tika library on the PythonAnywhere server. I have read that PythonAnywhere doesn't support tika, but the library installs and imports on the server without error. I've been stuck on this for a couple of days.

2020-04-08 11:49:26,003: Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar to /tmp/tika-server.jar.
2020-04-08 11:49:26,006: Exception on /parser [POST]
Traceback (most recent call last):
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/tika/tika.py", line 798, in getRemoteJar
    urlretrieve(urlOrPath, destPath)
  File "/usr/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/mubasharnazar/mysite/server.py", line 1546, in hello
    response = read_pdf(name.filename)
  File "/home/mubasharnazar/mysite/server.py", line 115, in read_pdf
    file_data = parser.from_file(file)
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/tika/parser.py", line 40, in from_file
    output = parse1(service, filename, serverEndpoint, headers=headers, config_path=config_path, requestOptions=requestOptions)
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/tika/tika.py", line 338, in parse1
    rawResponse=rawResponse, requestOptions=requestOptions)
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/tika/tika.py", line 531, in callServer
    serverEndpoint = checkTikaServer(scheme, serverHost, port, tikaServerJar, classpath, config_path)
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/tika/tika.py", line 592, in checkTikaServer
    getRemoteJar(tikaServerJar, jarPath)
  File "/home/mubasharnazar/.virtualenvs/flaskk/lib/python3.7/site-packages/tika/tika.py", line 808, in getRemoteJar
    urlretrieve(urlOrPath, destPath)
  File "/usr/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Any solution would be highly appreciated.
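
To check whether the 403 comes from the download itself rather than from Flask or tika, the request shown in the log can be tried in isolation. This is only a minimal sketch, assuming the URL and destination path from the log line above; `fetch_jar` and its `retrieve` parameter are hypothetical names for illustration and are not part of tika.

```python
import urllib.error
from urllib.request import urlretrieve

# URL and destination copied from the "Retrieving ..." log line above.
JAR_URL = (
    "http://search.maven.org/remotecontent"
    "?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar"
)
JAR_DEST = "/tmp/tika-server.jar"


def fetch_jar(url=JAR_URL, dest=JAR_DEST, retrieve=urlretrieve):
    """Attempt the same download the traceback shows and report the outcome.

    Hypothetical helper: `retrieve` defaults to urllib's urlretrieve but can
    be swapped for a stand-in so the function can be exercised offline.
    """
    try:
        retrieve(url, dest)
    except urllib.error.HTTPError as err:
        # err.code is the HTTP status (403 in the traceback above);
        # err.reason is the accompanying message.
        return f"HTTP {err.code}: {err.reason}"
    return "ok"
```

Calling `fetch_jar()` from a console on the server would show whether the same `HTTP 403: Forbidden` appears outside the Flask request, which would suggest the outbound request itself is being refused.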

huangapple
  • Posted on 2020-04-07 23:26:12
  • When reposting, please keep the link to this article: https://java.coder-hub.com/61083601.html