
Copy Webpages with Python



This lab uses the requests module to copy the UTF-8 text from a webpage and then save that text to a file. This allows you to have a local copy of the webpage.


This is a simple example that shows how easy it is to scrape a webpage. Note that the script does not even consult the website's robots.txt file.
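For readers who do want to respect robots.txt, the following is a minimal sketch using the standard library's urllib.robotparser module. The helper name is_allowed, the sample rules, and the example.com URLs are made up for illustration. A real script would more likely point the parser at the live file with set_url() and read().

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, not taken from any real website
sample_rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "https://www.example.com/index.html"))
print(is_allowed(sample_rules, "https://www.example.com/private/page.html"))
```

With these sample rules the first URL is permitted and the second is not.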


Please note that external CSS files, images, and other embeds remain hosted on the website's servers. This script only grabs the HTML document itself, along with any JavaScript and styling written inline in it.
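To see which resources a saved page would still load from the original servers, one option is to list the external references in its HTML. This is only a sketch built on the standard library's html.parser module. The class name AssetCollector and the sample markup are assumptions for illustration, and it only looks at img, script, and link tags.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect the URLs of externally referenced images, scripts, and links."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Images and scripts point at external files through src
        if tag in ("img", "script") and attr_map.get("src"):
            self.assets.append(attr_map["src"])
        # Stylesheets and similar resources use link with href
        elif tag == "link" and attr_map.get("href"):
            self.assets.append(attr_map["href"])


# Hypothetical page content; the inline script has no src, so it is skipped
sample_html = (
    '<html><head><link rel="stylesheet" href="/site.css"></head>'
    '<body><img src="/logo.png"><script>var x = 1;</script></body></html>'
)

collector = AssetCollector()
collector.feed(sample_html)
print(collector.assets)  # ['/site.css', '/logo.png']
```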


We import the requests module.


We create a list of pages we want to copy.


We then loop through the list, calling the get() function with an f-string that builds the URL of each webpage. We use the text attribute to get the text from the page and assign the value to the response variable.
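The request step on its own might look like the sketch below. The helper names build_url and fetch_text are hypothetical, and the 10 second timeout is an arbitrary choice. The call to raise_for_status() is an addition that the lab's script does not make. It raises an exception on an HTTP error status so that an error page is not saved by mistake.

```python
import requests


def build_url(site: str) -> str:
    """Turn a bare host name into an https URL with an f-string."""
    return f'https://{site}'


def fetch_text(site: str, timeout: float = 10.0) -> str:
    """Download a page and return its decoded text."""
    response = requests.get(build_url(site), timeout=timeout)
    # Raise requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text


print(build_url('www.arstechnica.com'))  # https://www.arstechnica.com
```

Calling fetch_text('www.arstechnica.com') needs network access, so only the URL-building step is printed here.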


We create a filename using the site name and add .html as the extension.
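Using the site name directly works for bare host names like the ones in this lab. If an entry ever included a path, the slash would be read as a directory separator. The sketch below shows one possible way to guard against that. The helper make_file_name and the www.example.com/news entry are hypothetical and not part of the lab.

```python
import re


def make_file_name(site: str) -> str:
    """Build an .html file name from a site string."""
    # Replace anything that is not a letter, digit, dot, or hyphen
    safe_name = re.sub(r'[^A-Za-z0-9.-]', '_', site)
    return f'{safe_name}.html'


print(make_file_name('www.arstechnica.com'))   # www.arstechnica.com.html
print(make_file_name('www.example.com/news'))  # www.example.com_news.html
```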


We then write the contents of the page to the filename using open().
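Since the goal is to save UTF-8 text, it can help to pass the encoding to open() explicitly, because the default encoding depends on the platform. The following is a small sketch of the write step. The helper save_page, the temporary directory, and the sample text are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def save_page(directory: Path, file_name: str, page_text: str) -> Path:
    """Write page_text to directory/file_name as UTF-8 and return the path."""
    target = directory / file_name
    with open(target, 'w', encoding='utf-8') as file:
        file.write(page_text)
    return target


# Write into a throwaway directory and read the file back
with tempfile.TemporaryDirectory() as tmp:
    saved = save_page(Path(tmp), 'example.html', '<p>café</p>')
    print(saved.read_text(encoding='utf-8'))  # <p>café</p>
```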


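import requests

# Sites to copy (host names only; the scheme is added below)
site_list = ['www.arstechnica.com',
             'www.gizmodo.com']

for site in site_list:
    # Download the page and keep its decoded text
    response = requests.get(f'https://{site}', timeout=10).text

    # Name the local copy after the site and write it out as UTF-8
    file_name = f'{site}.html'
    with open(file_name, 'w', encoding='utf-8') as file:
        file.write(response)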
