Digital Studium

Blog about Linux, DevOps and cloud technologies

Python: How to load multiple web pages in parallel


By: Konstantin Shutkin

This article describes how to load the content of multiple web pages from multiple URLs in parallel with Python, using the aiohttp library and asyncio.

Step 1. Installation of aiohttp

First you need to install the aiohttp package. To install it, run the command (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "aiohttp[speedups]"

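The [speedups] extra installs optional packages that accelerate aiohttp, such as aiodns and cchardet (the exact set depends on the aiohttp version). The script in Step 2 uses aiohttp.AsyncResolver, which requires aiodns.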
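
Because aiodns is an optional dependency, it can be worth confirming that it was actually installed before running the script. A minimal sketch using only the standard library; the helper name installed is an illustration, not part of aiohttp:

```python
import importlib.util


def installed(names):
    # Map each top-level package name to True if Python can locate it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Prints a dict such as {'aiohttp': True, 'aiodns': True} when both are present.
print(installed(["aiohttp", "aiodns"]))
```

If aiodns is reported as missing, reinstalling with the [speedups] extra is the likely fix.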

Step 2. Creation of a script

Then create a main.py file with this code:

import asyncio
import socket

import aiohttp


async def fetch_url(session, url):
    # Request one URL and print its status code and body.
    async with session.get(url) as resp:
        print(resp.status)
        print(await resp.text())


async def fetch_urls(urls):
    # AsyncResolver performs DNS lookups asynchronously; it requires aiodns.
    resolver = aiohttp.AsyncResolver()
    connector = aiohttp.TCPConnector(
        resolver=resolver, family=socket.AF_INET, use_dns_cache=False
    )
    # The context manager closes the session even if a request fails.
    async with aiohttp.ClientSession(connector=connector) as session:
        # Start all requests together and wait until every one has finished.
        await asyncio.gather(*(fetch_url(session, url) for url in urls))


urls = [
    'http://httpbin.org/get?key=value1',
    'http://httpbin.org/get?key=value2',
    'http://httpbin.org/get?key=value3',
]

# asyncio.run() creates the event loop, runs the coroutine and closes the loop.
asyncio.run(fetch_urls(urls))

Now run main.py with the command:

python3 main.py

You will see output similar to this (the order of the responses may vary, because the requests complete independently):

200
{
    "args": {
        "key": "value2"
    },
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
...
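
The requests overlap in time instead of running one after another. A minimal sketch of that effect, assuming asyncio.sleep as a stand-in for network latency so that it runs without network access; fake_fetch and run_concurrently are hypothetical names, not part of aiohttp:

```python
import asyncio
import time


async def fake_fetch(url, delay):
    # Stand-in for session.get(): waits `delay` seconds instead of doing network I/O.
    await asyncio.sleep(delay)
    return url


async def run_concurrently(urls, delay=0.2):
    # gather() schedules every coroutine before waiting, so the waits overlap.
    started = time.perf_counter()
    results = await asyncio.gather(*(fake_fetch(url, delay) for url in urls))
    elapsed = time.perf_counter() - started
    return results, elapsed


results, elapsed = asyncio.run(run_concurrently(["a", "b", "c"]))
# Elapsed time should be close to one delay, not three delays added together.
print(results, f"{elapsed:.2f}s")
```

Note that everything here runs in a single thread: the event loop switches between coroutines while each one waits for I/O.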

All three requests are executed concurrently, so the total time is close to that of the slowest request rather than the sum of all three. You can add any URLs to the urls list, for example:

urls = ['https://yandex.com', 'https://google.com', 'https://yahoo.com']
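
With arbitrary URLs, some requests may fail (DNS errors, refused connections, timeouts). By default asyncio.gather raises the first exception to the caller; passing return_exceptions=True returns exceptions alongside successful results instead, in the same order as the inputs. A minimal sketch, where fake_fetch is a hypothetical stand-in for the aiohttp request so that the example runs without network access:

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for an aiohttp request; fails for one URL on purpose.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")
    return f"body of {url}"


async def fetch_all(urls):
    # return_exceptions=True keeps one failure from hiding the other results.
    results = await asyncio.gather(
        *(fake_fetch(url) for url in urls), return_exceptions=True
    )
    # Results come back in the same order as the input URLs.
    return dict(zip(urls, results))


outcome = asyncio.run(fetch_all(["https://ok.example", "https://bad.example"]))
for url, result in outcome.items():
    status = "failed" if isinstance(result, Exception) else "ok"
    print(url, status)
```

In a real script, the request coroutine would return await resp.text() instead of printing it, so that the caller can inspect each result.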

To make POST, PUT, DELETE, HEAD, OPTIONS or PATCH requests, replace session.get(url) in your code with the corresponding method:

session.post('http://httpbin.org/post', data=b'data')
session.put('http://httpbin.org/put', data=b'data')
session.delete('http://httpbin.org/delete')
session.head('http://httpbin.org/get')
session.options('http://httpbin.org/get')
session.patch('http://httpbin.org/patch', data=b'data')