It looks like an error while fetching https://navi.cnki.net/knavi/journals/categories causes the crawl to end early. When I open https://navi.cnki.net/knavi/journals/categories in a browser, only the subject navigation is shown and the category list appears to be missing.
2024-12-26 00:02:11 [scrapy.utils.log] INFO: Scrapy 2.12.0 started (bot: cnki)
2024-12-26 00:02:11 [scrapy.utils.log] INFO: Versions: lxml 5.3.0.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.2.1, Twisted 24.11.0, Python 3.12.2 (tags/v3.12.2:6abddd9, Feb 6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)], pyOpenSSL 24.3.0 (OpenSSL 3.4.0 22 Oct 2024), cryptography 44.0.0, Platform Windows-10-10.0.19045-SP0
2024-12-26 00:02:11 [scrapy.addons] INFO: Enabled addons:
[]
2024-12-26 00:02:11 [py.warnings] WARNING: C:\Users\xmh\cnki\Lib\site-packages\scrapy\utils\request.py:120: ScrapyDeprecationWarning: 'REQUEST_FINGERPRINTER_IMPLEMENTATION' is a deprecated setting.
It will be removed in a future version of Scrapy.
return cls(crawler)
2024-12-26 00:02:11 [asyncio] DEBUG: Using selector: SelectSelector
2024-12-26 00:02:11 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-12-26 00:02:11 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
2024-12-26 00:02:11 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-12-26 00:02:11 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
2024-12-26 00:02:11 [scrapy.extensions.telnet] INFO: Telnet Password: d25b876f2a7d90a1
2024-12-26 00:02:11 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2024-12-26 00:02:11 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'cnki',
'CONCURRENT_REQUESTS': 3,
'COOKIES_ENABLED': False,
'DOWNLOAD_DELAY': 1,
'FEED_EXPORT_ENCODING': 'utf-8',
'NEWSPIDER_MODULE': 'cnki.spiders',
'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
'SPIDER_MODULES': ['cnki.spiders'],
'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2024-12-26 00:02:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.offsite.OffsiteMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'cnki.middlewares.CnkiDownloaderMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-12-26 00:02:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-12-26 00:02:12 [scrapy.middleware] INFO: Enabled item pipelines:
['cnki.pipelines.CNKIPipeline']
2024-12-26 00:02:12 [scrapy.core.engine] INFO: Spider opened
2024-12-26 00:02:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-12-26 00:02:12 [journal] INFO: Spider opened: journal
2024-12-26 00:02:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-12-26 00:02:13 [scrapy.core.engine] DEBUG: Crawled (403) <POST https://navi.cnki.net/knavi/journals/categories> (referer: None)
2024-12-26 00:02:13 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://navi.cnki.net/knavi/journals/categories>: HTTP status code is not handled or not allowed
2024-12-26 00:02:13 [scrapy.core.engine] INFO: Closing spider (finished)
2024-12-26 00:02:13 [scrapy.extensions.feedexport] INFO: Stored csv feed (0 items) in: dataset/2023-11-30.csv
2024-12-26 00:02:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 417,
'downloader/request_count': 1,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 1914,
'downloader/response_count': 1,
'downloader/response_status_count/403': 1,
'elapsed_time_seconds': 0.840805,
'feedexport/success_count/FileFeedStorage': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 12, 25, 16, 2, 13, 524053, tzinfo=datetime.timezone.utc),
'httpcompression/response_bytes': 2925,
'httpcompression/response_count': 1,
'httperror/response_ignored_count': 1,
'httperror/response_ignored_status_count/403': 1,
'items_per_minute': None,
'log_count/DEBUG': 6,
'log_count/INFO': 13,
'log_count/WARNING': 1,
'response_received_count': 1,
'responses_per_minute': None,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2024, 12, 25, 16, 2, 12, 683248, tzinfo=datetime.timezone.utc)}
2024-12-26 00:02:13 [scrapy.core.engine] INFO: Spider closed (finished)
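
The log shows the POST to the categories endpoint coming back with 403 and HttpErrorMiddleware dropping it, so the spider closes with zero items. A minimal diagnostic sketch like the one below (the headers are illustrative and the project-specific form payload is intentionally omitted) can help tell whether CNKI is blocking the request outright or serving a page with the category data missing: the standard `HTTPERROR_ALLOWED_CODES` setting lets the 403 body reach the callback for inspection instead of being ignored.

```python
# Diagnostic sketch, not the project's actual spider. The header values are
# placeholders and the form body normally sent by the cnki spider is omitted
# here because it is project-specific.
import scrapy


class CategoriesCheckSpider(scrapy.Spider):
    name = "categories_check"
    # Allow 403 responses to reach the callback instead of being dropped
    # by HttpErrorMiddleware, as happened in the log above.
    custom_settings = {"HTTPERROR_ALLOWED_CODES": [403]}

    def start_requests(self):
        yield scrapy.Request(
            "https://navi.cnki.net/knavi/journals/categories",
            method="POST",
            # Hypothetical browser-like headers; add the form payload your
            # spider normally posts to reproduce the real request.
            headers={
                "User-Agent": "Mozilla/5.0",
                "Referer": "https://navi.cnki.net/",
            },
            callback=self.parse_categories,
        )

    def parse_categories(self, response):
        self.logger.info("status=%s, body length=%s", response.status, len(response.body))
        # Save the raw body to see whether CNKI returned a block/verification
        # page or a page where the category list is simply empty.
        with open("categories_response.html", "wb") as f:
            f.write(response.body)
```

Running it with `scrapy runspider categories_check.py` and inspecting `categories_response.html` should show what the server actually returns alongside the 403.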