Hello,

I made a simple script to scrape threads.net using Python and Selenium. The script is just a few lines long and it's easy to understand.

So what does this script do?

First it opens the Edge browser (you can change it to Firefox or Chrome). Then you have to enter your credentials to log in. Your browsing data and credentials are stored in the user_data folder, which you can move around.
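
For switching browsers, something like the helper below might work. It is only a sketch: make_driver and its parameters are names I made up, and it only covers Edge and Chrome, since both accept the same --user-data-dir flag. Firefox keeps its data in a profile and does not use this flag, so it is left out here.

```python
def make_driver(browser="edge", profile_dir="user_data"):
    """Start Edge or Chrome with a persistent profile folder.

    make_driver is a hypothetical helper, not part of Selenium.
    Selenium is imported lazily so the browser name is checked first.
    """
    if browser not in ("edge", "chrome"):
        raise ValueError(f"unsupported browser: {browser}")

    from selenium import webdriver

    if browser == "edge":
        from selenium.webdriver.edge.options import Options
        driver_cls = webdriver.Edge
    else:
        from selenium.webdriver.chrome.options import Options
        driver_cls = webdriver.Chrome

    options = Options()
    # same flag the script uses: keeps cookies and logins between runs
    options.add_argument(f"--user-data-dir={profile_dir}")
    return driver_cls(options=options)
```

With this, driver = make_driver("chrome") would take the place of the Options and webdriver.Edge lines in the script.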

It scrolls through the Threads feed/hashtag/explore page and stores the src of every image it encounters, so at the end we have a links.txt file containing links to all the images it encountered.
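
The script scrolls a fixed number of times. If you would rather keep going until nothing new shows up, the loop could be sketched like this. collect_until_stable and its parameters are hypothetical names, and the default numbers are guesses, not tuned values. It takes two plain functions so it does not depend on any particular browser.

```python
import time


def collect_until_stable(get_srcs, scroll, patience=3, max_rounds=200, pause=0.5):
    """Call get_srcs() and scroll() repeatedly and stop once `patience`
    rounds in a row bring no new links (or after max_rounds rounds)."""
    seen = set()
    idle = 0
    for _ in range(max_rounds):
        before = len(seen)
        # ignore images that have no src (get_attribute returns None)
        seen.update(src for src in get_srcs() if src)
        idle = idle + 1 if len(seen) == before else 0
        if idle >= patience:
            break
        scroll()
        time.sleep(pause)  # give the page time to load more posts
    return seen


# Possible use with the driver from the script (assumes driver and By exist):
# s = collect_until_stable(
#     lambda: [e.get_attribute("src") for e in driver.find_elements(By.XPATH, "//img")],
#     lambda: driver.execute_script("window.scrollBy(0, 1000);"),
# )
```

Passing functions instead of the driver itself keeps the stopping logic separate from Selenium, so it can be tried out without opening a browser.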

Now that we have links.txt, we can use the following command to download all the images listed in it:

wget -i links.txt
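
If wget isn't available, roughly the same thing can be done from Python. This is only a sketch using the standard library. The function names and the images folder are made up for illustration, and it assumes every line in links.txt is a plain http(s) URL.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_links(path):
    """Return the non-empty lines of the links file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def filename_for(url, index):
    """Build a local file name from the URL path, prefixed with an index
    so two images with the same name do not overwrite each other."""
    name = os.path.basename(urlparse(url).path) or "image"
    return f"{index:04d}_{name}"


def download_all(links_path="links.txt", out_dir="images"):
    """Download every link into out_dir (a hypothetical folder name)."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(read_links(links_path)):
        target = os.path.join(out_dir, filename_for(url, i))
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as err:
            # a dead, expired or malformed link should not stop the rest
            print("failed:", url, err)
```

Calling download_all() from the folder that contains links.txt should put the files into an images folder next to it.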

The script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.options import Options
import time

options = Options()
options.add_argument("--user-data-dir=user_data")

driver = webdriver.Edge(options=options)

driver.get('https://threads.net')

s = set()

# log in and open the page you want to scrape, then come back to the terminal
input("Press Enter to continue...")
for i in range(30):
    try:
        elements = driver.find_elements(By.XPATH, "//img")
        for e in elements:
            src = e.get_attribute("src")
            if src:  # an <img> without a src returns None, which would break f.write() later
                s.add(src)
        driver.execute_script("window.scrollBy(0, 1000);")
        time.sleep(0.2)
    except Exception as err:  # e.g. an element disappeared while scrolling
        print("oopsie", err)

with open("links.txt", 'w') as f:
    links = list(s)
    for l in links:
        f.write(l+"\n")

driver.quit()
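
Since //img matches every image on the page, links.txt may also pick up things wget can't fetch, such as inline data: URIs. A small filter before writing the file could look like this. It is just a sketch: clean_links is a hypothetical helper name, and it simply keeps anything that starts with http:// or https://.

```python
def clean_links(srcs):
    """Keep only http(s) image links, dropping None, empty strings
    and inline data: URIs. Returns a sorted list without duplicates."""
    keep = set()
    for src in srcs:
        if src and src.startswith(("http://", "https://")):
            keep.add(src)
    return sorted(keep)
```

In the script, links = clean_links(s) would take the place of links = list(s).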

I hope it was useful :D

Edit: here is a link to links.txt https://0x0.st/HGjx.txt

  • whoareu@lemmy.ca (OP) · 10 months ago

    Because this way I can download a lot of wallpapers and anime pictures :D

    There are tons of anime pictures and wallpapers on instagram.com.

    You can use this script to scrape Instagram too! Just change the URL in driver.get().