Hello, I have a problem when querying zh.wikipedia. Here is my code and console output.
import wptools
page = wptools.page('西安', lang='zh')
page.get_query(proxy='http://127.0.0.1:1080') # local proxy
zh.wikipedia.org (query) 西安
zh.wikipedia.org (query) 西安市 (&plcontinue=7536|0|炮里街道)
Traceback (most recent call last):
File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 199, in _load_response
data = utils.json_loads(response)
File "d:\software\Anaconda\lib\site-packages\wptools\utils.py", line 95, in json_loads
return json.loads(data, encoding='utf-8')
File "d:\software\Anaconda\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "d:\software\Anaconda\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "d:\software\Anaconda\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\software\Anaconda\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\software\Anaconda\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
cli.main()
File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
run()
File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
File "d:\software\Anaconda\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "d:\software\Anaconda\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "d:\software\Anaconda\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "d:\项目代码\wiki\wikitools_exp.py", line 3, in <module>
page.get_query(proxy='http://127.0.0.1:1080') # local proxy
File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 641, in get_query
File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 183, in _get
self._set_data(action)
File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 200, in _set_data
self._set_query_data(action)
File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 295, in _set_query_data
data = self._load_response(action)
File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 201, in _load_response
raise ValueError(_query)
ValueError: https://zh.wikipedia.org/w/api.php?action=query&exintro&formatversion=2&inprop=url|watchers&list=random&pithumbsize=240&pllimit=500&ppprop=disambiguation|wikibase_item&prop=extracts|info|links|pageassessments|pageimages|pageprops|pageterms|redirects&redirects&rdlimit=500&rnlimit=1&rnnamespace=0&titles=%E8%A5%BF%E5%AE%89%E5%B8%82&plcontinue=7536|0|炮里街道
I noticed that after the query for "西安" finished, wptools went on to send a follow-up query (plcontinue=7536|0|炮里街道), which is not what I need. Reading the source code, it seems that at line 640 of page.py wptools tries to make further queries based on the "continue" field.
Issue 57 says that continuation support was newly added, but it should be optional and implemented in the function "get_querymore" (line 645). However, continuation is now also performed in "get_query". I believe this is a small bug that should be fixed.
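To illustrate what "optional" continuation could look like, here is a minimal sketch. It does not use wptools internals: `run_query`, `fetch`, and `follow_continue` are hypothetical names, and `fetch` stands in for whatever performs the HTTP request and returns the decoded JSON. The only thing assumed about the API is that a response may carry a "continue" object whose entries are merged into the next request.

```python
def run_query(fetch, params, follow_continue=False):
    """Run a query and follow "continue" only when asked to.

    fetch: hypothetical callable taking a dict of request parameters and
           returning the decoded JSON response as a dict.
    params: base request parameters.
    follow_continue: opt-in flag; when False, only one request is made.
    """
    responses = []
    request = dict(params)
    while True:
        data = fetch(request)
        responses.append(data)
        cont = data.get("continue")
        if not follow_continue or not cont:
            return responses
        # Merge the continuation parameters into the original request.
        request = {**params, **cont}


def fake_fetch(request):
    """Stand-in for a real HTTP call: one continuation, then done."""
    if "plcontinue" in request:
        return {"query": {"batch": 2}}
    return {"query": {"batch": 1}, "continue": {"plcontinue": "7536|0|炮里街道"}}


single = run_query(fake_fetch, {"titles": "西安市"})
both = run_query(fake_fetch, {"titles": "西安市"}, follow_continue=True)
print(len(single), len(both))
```

With a default of `follow_continue=False`, a plain query would stop after the first response, and continuation would only happen when the caller asks for it.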
Although redundant, this still works for en.wikipedia. But when querying zh.wikipedia, something seems to be wrong with the URL, and pycurl always returns "Bad request" (core.py line 175), which is not JSON and therefore cannot be parsed by json.loads. So I believe this is a second bug.
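In the URL from the ValueError above, the titles parameter is percent-encoded (%E8%A5%BF...) but the plcontinue value still contains raw Chinese characters. I have not confirmed that this is the cause of the "Bad request", but here is a small sketch of how the continuation value could be percent-encoded with the standard library before it is appended to the URL. The helper name `encode_continue` is only for illustration and is not part of wptools.

```python
from urllib.parse import quote, unquote


def encode_continue(value: str) -> str:
    """Percent-encode a continuation value as UTF-8.

    The '|' separators are left as-is, matching how they appear
    elsewhere in the request URL.
    """
    return quote(value, safe="|")


raw = "7536|0|炮里街道"
encoded = encode_continue(raw)
print(encoded)

# The encoded value is pure ASCII and decodes back to the original.
print(encoded.isascii(), unquote(encoded) == raw)
```

On en.wikipedia the continuation values are usually ASCII already, which might explain why the same code path works there.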
For now, I have simply deleted lines 640-641 of page.py, and it works well for me. Looking forward to your reply.