Blog of 小言_互联网

Scrapy Framework Tutorial, Part 1


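Development preparation: installing the Scrapy framework

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It crawls websites and extracts structured data from their pages. Scrapy has a wide range of uses, including data mining, monitoring, and automated testing.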

**1. Set up the base Python development environment: install the latest version of Python (omitted)**

Detailed installation instructions

**2. Install Scrapy**

pip install scrapy
(or)
pip3 install scrapy
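
Whether `pip` or `pip3` is the right command depends on which Python installation each name points to on your machine. As a minimal sketch, assuming you want the install tied to one specific interpreter, the equivalent `python -m pip install` command can be built from Python itself. The helper name `pip_install_command` is hypothetical and only used for illustration.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    # "python -m pip" uses the pip that belongs to sys.executable,
    # which avoids guessing whether "pip" or "pip3" is the right one.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is installed by accident.
    print(" ".join(pip_install_command("scrapy")))
```

To actually run the install, the returned list could be passed to `subprocess.run(command, check=True)`.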

Installation output

PS C:\WINDOWS\system32> pip install scrapy
Collecting scrapy
  Using cached https://files.pythonhosted.org/packages/29/4b/585e8e111ffb01466c59281f34febb13ad1a95d7fb3919fd57c33fc732a5/Scrapy-1.7.3-py2.py3-none-any.whl
Collecting lxml; python_version != "3.4" (from scrapy)
  Using cached https://files.pythonhosted.org/packages/bc/87/c3cecadcb5d7924cd71724b177343149cfc3609a89b197a991ac8593ed8c/lxml-4.4.1-cp37-cp37m-win_amd64.whl
Collecting w3lib>=1.17.0 (from scrapy)
  Using cached https://files.pythonhosted.org/packages/6a/45/1ba17c50a0bb16bd950c9c2b92ec60d40c8ebda9f3371ae4230c437120b6/w3lib-1.21.0-py2.py3-none-any.whl
Collecting queuelib (from scrapy)
  Using cached https://files.pythonhosted.org/packages/4c/85/ae64e9145f39dd6d14f8af3fa809a270ef3729f3b90b3c0cf5aa242ab0d4/queuelib-1.5.0-py2.py3-none-any.whl
Collecting service-identity (from scrapy)
  Using cached https://files.pythonhosted.org/packages/e9/7c/2195b890023e098f9618d43ebc337d83c8b38d414326685339eb024db2f6/service_identity-18.1.0-py2.py3-none-any.whl
Collecting Twisted>=13.1.0; python_version != "3.4" (from scrapy)
  Using cached https://files.pythonhosted.org/packages/ee/d9/5b79fef4a7d7dc4d526151904eae5dd207f80433ae646a258b32abbe77d4/Twisted-19.7.0-cp37-cp37m-win_amd64.whl
  ...
    Using cached https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl
Requirement already satisfied: setuptools in c:\program files (x86)\microsoft visual studio\shared\python37_64\lib\site-packages (from zope.interface>=4.4.2->Twisted>=13.1.0; python_version != "3.4"->scrapy) (40.8.0)
Collecting idna>=2.5 (from hyperlink>=17.1.1->Twisted>=13.1.0; python_version != "3.4"->scrapy)
  Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
Collecting pycparser (from cffi!=1.11.3,>=1.8->cryptography->service-identity->scrapy)
  Using cached https://files.pythonhosted.org/packages/68/9e/49196946aee219aead1290e00d1e7fdeab8567783e83e1b9ab5585e6206a/pycparser-2.19.tar.gz
Installing collected packages: lxml, six, w3lib, queuelib, attrs, pyasn1, pyasn1-modules, pycparser, cffi, asn1crypto, cryptography, service-identity, Automat, zope.interface, incremental, constantly, PyHamcrest, idna, hyperlink, Twisted, PyDispatcher, cssselect, parsel, pyOpenSSL, scrapy
  Running setup.py install for pycparser ... done
  Running setup.py install for PyDispatcher ... done
Successfully installed Automat-0.7.0 PyDispatcher-2.0.5 PyHamcrest-1.9.0 Twisted-19.7.0 asn1crypto-0.24.0 attrs-19.1.0 cffi-1.12.3 constantly-15.1.0 cryptography-2.7 cssselect-1.1.0 hyperlink-19.0.0 idna-2.8 incremental-17.5.0 lxml-4.4.1 parsel-1.5.2 pyOpenSSL-19.0.0 pyasn1-0.4.6 pyasn1-modules-0.2.6 pycparser-2.19 queuelib-1.5.0 scrapy-1.7.3 service-identity-18.1.0 six-1.12.0 w3lib-1.21.0 zope.interface-4.6.0
At this point Scrapy has been installed successfully.
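
One way to double-check the result is to ask Python which version of the package it can see. The sketch below assumes Python 3.8 or newer, because it relies on the standard-library `importlib.metadata` module (the log above was produced with Python 3.7, where this module is not available). The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("scrapy")
    if version is None:
        print("Scrapy is not installed in this environment")
    else:
        print("Scrapy", version, "is installed")
```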

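A first-time installation may fail because some dependencies may be missing on your machine. If it does, install whichever of the following packages the error message points to:
pyOpenSSL: download the wheel file from the official website.
Twisted: download the wheel file from the official website.
PyWin32: download the wheel file from the official website.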
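
When downloading a wheel by hand, the file has to match your interpreter. In the log above the wheels end in `cp37-cp37m-win_amd64`, meaning CPython 3.7 on 64-bit Windows. The sketch below prints a rough hint for the interpreter you are running. It assumes CPython (hence the `cp` prefix) and does not cover the full wheel compatibility rules. The helper name `interpreter_hint` is hypothetical.

```python
import struct
import sys


def interpreter_hint():
    """Describe the running interpreter in terms used by wheel filenames."""
    # CPython wheels carry a tag such as "cp37" for Python 3.7 (assumes CPython).
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # The pointer size distinguishes a 32-bit interpreter from a 64-bit one.
    bits = struct.calcsize("P") * 8
    return python_tag, bits


if __name__ == "__main__":
    tag, bits = interpreter_hint()
    print("Look for wheels tagged", tag, "built for a", bits, "bit interpreter")
```

A downloaded wheel can then be installed by passing its file path to `pip install` in place of a package name.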

Further reading
XPath syntax
Database operations with pymysql

###### Look up the JSON and CSV file formats (for example, with a Baidu search)
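
As a quick illustration of how the two formats differ, the sketch below serializes the same records both ways using only the standard library. The records and the helper names `to_json` and `to_csv` are hypothetical and do not come from any real crawl.

```python
import csv
import io
import json

# Hypothetical scraped records, used only for illustration.
records = [
    {"title": "First post", "views": 10},
    {"title": "Second post", "views": 3},
]


def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows):
    """Serialize rows as CSV: one header line, then one line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "views"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(records))
    print(to_csv(records))
```

JSON keeps the field names and value types with every record, while CSV states the field names once in the header and stores every value as text.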


Reposted from: https://blog.csdn.net/cetd123/article/details/102485466