Python: How to load multiple web pages in parallel
By: Konstantin Shutkin
This article describes how to fetch the content of multiple web pages from a list of URLs in parallel with Python.
Step 1. Installation of aiohttp
First you need to install the aiohttp package by running:
pip install aiohttp[speedups]
The [speedups] extra additionally installs packages that speed up aiohttp: aiodns and cchardet.
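To verify that the installation succeeded, you can print the installed version (an optional sanity check):

python3 -c "import aiohttp; print(aiohttp.__version__)"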
Step 2. Creation of a script
Then create a main.py file with this code:
import aiohttp
import asyncio
import socket

async def fetch_urls(urls):
    # AsyncResolver performs DNS lookups without blocking the event loop;
    # it relies on aiodns, which the [speedups] extra installed above.
    resolver = aiohttp.AsyncResolver()
    # family=socket.AF_INET restricts connections to IPv4;
    # use_dns_cache=False disables aiohttp's internal DNS cache.
    connector = aiohttp.TCPConnector(resolver=resolver, family=socket.AF_INET, use_dns_cache=False)
    session = aiohttp.ClientSession(connector=connector)

    async def fetch_url(url, session):
        async with session.get(url) as resp:
            print(resp.status)
            print(await resp.text())

    # Create one coroutine per URL and run them all concurrently.
    tasks = [fetch_url(url, session) for url in urls]
    await asyncio.gather(*tasks)
    await session.close()

urls = ['http://httpbin.org/get?key=value1', 'http://httpbin.org/get?key=value2', 'http://httpbin.org/get?key=value3']
# asyncio.run() creates and manages the event loop (Python 3.7+).
asyncio.run(fetch_urls(urls))
Now you can run the main.py file with the command:
python3 main.py
You will see output similar to this (since the requests run in parallel, the responses can arrive in any order):
200
{
"args": {
"key": "value2"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
...
All three requests are executed in parallel. You can add any URLs to the urls list, for example:
urls = ['https://yandex.com', 'https://google.com', 'https://yahoo.com']
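If the list grows to hundreds of URLs, you may want to cap how many requests run at the same time. Below is a minimal sketch of one way to do that with asyncio.Semaphore; the function name fetch_urls_limited and the limit of 10 are illustrative choices, not part of the original script:

import aiohttp
import asyncio

async def fetch_urls_limited(urls, limit=10):
    # The semaphore allows at most `limit` requests in flight at once.
    semaphore = asyncio.Semaphore(limit)
    async with aiohttp.ClientSession() as session:
        async def fetch_url(url):
            async with semaphore:
                async with session.get(url) as resp:
                    return await resp.text()
        return await asyncio.gather(*(fetch_url(url) for url in urls))

# Reuses the urls list defined above.
results = asyncio.run(fetch_urls_limited(urls))

Alternatively, aiohttp itself can cap concurrent connections at the transport level via aiohttp.TCPConnector(limit=...).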
To make HEAD, POST, PUT, DELETE, or other requests, just replace session.get(url) in your code with the appropriate method:
session.post('http://httpbin.org/post', data=b'data')
session.put('http://httpbin.org/put', data=b'data')
session.delete('http://httpbin.org/delete')
session.head('http://httpbin.org/get')
session.options('http://httpbin.org/get')
session.patch('http://httpbin.org/patch', data=b'data')
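For example, a complete POST request that sends a JSON body and prints the JSON response could look like the sketch below (httpbin.org/post simply echoes the request back, so this runs as-is):

import aiohttp
import asyncio

async def post_example():
    async with aiohttp.ClientSession() as session:
        # The json= argument serializes the dict and sets the
        # Content-Type header to application/json automatically.
        async with session.post('http://httpbin.org/post', json={'key': 'value'}) as resp:
            print(resp.status)
            print(await resp.json())

asyncio.run(post_example())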