Build a Reddit Bot Part 1
Part 1: Read posts from reddit
Part 4: Marvin the Depressed Bot
Introduction
So we are going to build a simple Reddit Bot that will do two things:
It will monitor a particular subreddit for new posts, and when someone posts “I love Python”, it will reply “Me too!”.
It will also monitor all comments to recent posts, and if it finds one that says “I hate Python”, it will post a link to /r/learnpython and ask the commenter to ask a question there.
Prerequisite knowledge
Only a basic knowledge of Python is required, as building bots is fairly easy.
Part 1
In part one, we will see how we can read data from Reddit using the Reddit API. The source code is available on GitHub.
COMMENT: Reader Farid reports that Reddit has updated its website to a new look. If you come across a link in this article that does not work, you will have to change the URL. For example, the link https://www.reddit.com/dev/api says not found. If we change it to https://old.reddit.com/dev/api, the link should work. In short, if a Reddit link does not work, change the “www” to “old”, so the link looks like “old.reddit.com”.
Software bot
A software bot is a program that can interact with websites autonomously. Bots can be as simple or as complex as you want them to be.
The bot runs in the background and monitors a website. When it sees a change (like a post on Reddit), it can reply to it, upvote it, or do any other task it was programmed to do.
Monitoring websites
There are many ways to monitor websites. You can use web scraping tools like urllib or Beautiful Soup, or anything similar. There is a slight problem with this, though. Bots can make thousands of requests a second, and this can overload servers. So most big websites ban bots. Ignore this at your own risk. I have been banned from Google for hours, and had my Gmail locked till I entered a dozen captchas, my mobile number and the name of my first cat.
If you want to do this properly, stick to any rules the website has.
The Reddit API
Reddit provides an API, and unlike some websites, it’s actually quite easy to use. It’s based on REST and json, so in theory doesn’t require any fancy setup.
https://www.reddit.com/dev/api
The important thing is to follow the rules they set. Two of the most important ones are:
- You can’t make more than 1 request every 2 seconds (or 30 a minute)
- You must not lie about your user agent
Read the rest here.
The user agent is a string that identifies your client (a browser or a script) to the server. Reddit severely restricts the default user agents of libraries like Python’s urllib to prevent abuse. Reddit recommends you use your own unique user agent, and that’s what we’ll do.
Using the API
The API is quite easy to use, like I said. You make a REST request, which can be done via urllib2 (urllib.request in Python 3), as long as you set the user agent properly. To see what the API returns, I have put two links below. Open both in a new tab:
https://www.reddit.com/r/learnPython/
https://www.reddit.com/r/learnPython/hot/.json
The first is how a human would see it. The second is how your code sees it. As you can see, getting the json is fairly easy.
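To make that concrete, here is a minimal sketch of the raw approach using only the standard library. The user agent string and the function names (build_request, extract_titles, fetch_titles) are hypothetical, made up for this example. The sample dictionary is hand-written to mimic the shape of a Reddit listing, so the example runs without touching the network.

```python
import json
import urllib.request

# Hypothetical user agent string; pick one that describes your own bot.
USER_AGENT = "ExampleBot 0.1"
URL = "https://www.reddit.com/r/learnPython/hot/.json"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def extract_titles(listing):
    """Pull post titles out of a dictionary shaped like a Reddit listing."""
    return [child["data"]["title"] for child in listing["data"]["children"]]


def fetch_titles(url=URL):
    """Download the listing and return its titles (this makes a network call)."""
    with urllib.request.urlopen(build_request(url)) as response:
        listing = json.load(response)
    return extract_titles(listing)


# A tiny hand-written sample in the same shape, so no network is needed here.
sample = {"data": {"children": [{"data": {"title": "First post"}},
                                {"data": {"title": "Second post"}}]}}
print(extract_titles(sample))
```

Calling fetch_titles() would hit Reddit for real, so only do that while respecting the rate limit described earlier.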
The problem with this approach is that you still have to make sure you rate limit your requests. You also have to parse the json yourself. Json is easy to parse in Python, as it maps directly onto Python dictionaries and lists, but if you actually look at the json, there is a lot of data to dig through.
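If you did go the raw-request route, the rate limiting could be sketched roughly like this. It assumes the limit of one request every 2 seconds quoted earlier. The RateLimiter class and its wait() method are hypothetical names invented for this example, not part of any Reddit library.

```python
import time


class RateLimiter:
    """Allow at most one call per `min_interval` seconds."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None   # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough to respect the interval, then record the call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

You would create one limiter and call limiter.wait() right before every request. The clock and sleep arguments are only there so the behaviour can be checked with fake time.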
Introducing PRAW
[Update Dec 2016: Reddit and PRAW now force you to use OAuth. I’ve updated the article to use that]
PRAW is a library that fixes many of these problems for you. It rate limits your requests for you, and makes it easy to extract the data from the json. Install it with:
pip install praw
You need to do some setup first.
Create Reddit App
Go to: https://www.reddit.com/prefs/apps/
And select Create App:
You will now get a client_id (red box below) and a secret (blue box below). Note them down, but keep them secret.
This page describes how to change praw.ini files: https://praw.readthedocs.io/en/v4.0.0/getting_started/configuration/prawini.html
You will find the file in your Python install folder, under Lib\Site-Packages\praw\praw.ini
COMMENT: As Bryce points out in the comments: I don’t recommend modifying the package-level praw.ini, as those changes will be overwritten every time the package is updated. Instead, praw.ini should be placed in the directory that the program is run from (often the same directory as the file). Other options are specified here.
Add the values we noted down:
There is a new field: user_agent.
Remember I said the Reddit rules say you have to have a specific user agent? I’m choosing the name PyEng Bot, followed by a version number. Adding a version is recommended, because once your code is out there, people might abuse it. If someone spams Reddit with your code, Reddit will ban that user agent.
In that case, you just move the version up. Not ideal, but you have to accept that your code may be misused by spammers.
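As a rough sketch, a bot1 section in praw.ini might look like the text below. All values are placeholders and the user agent is a hypothetical one, so substitute your own. The text is wrapped in a few lines of Python that use the standard configparser module to check that no field has been left out.

```python
import configparser

# Placeholder values only; substitute the ones from your own Reddit app page.
PRAW_INI = """
[bot1]
client_id = YOUR_CLIENT_ID
client_secret = YOUR_CLIENT_SECRET
user_agent = ExampleBot 0.1
"""

config = configparser.ConfigParser()
config.read_string(PRAW_INI)

# Confirm the section has the fields discussed above before running the bot.
required = ("client_id", "client_secret", "user_agent")
missing = [key for key in required if key not in config["bot1"]]
print("Missing settings:", missing)
```

Only the part between the triple quotes would go into the praw.ini file itself. A bot that posts replies would most likely also need username and password entries.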
Let’s go over the code now. Download it at GitHub.
import praw
We import praw.
reddit = praw.Reddit('bot1')
subreddit = reddit.subreddit("learnpython")
We create a Reddit instance using the values we saved under bot1.
Then we get the subreddit learnpython.
Now, if you look on the subreddit, you can see that there is a hot tab. This does not indicate that the temperature there is high or that there are racy swimsuit models. It means the most popular posts. That’s what we are going to read now. The function to do so is hot().
for submission in subreddit.hot(limit=5):
We get the top 5 hot submissions. At this stage, you can call dir() on a submission to see which attributes and methods are available (you can do that at any stage, or look at PRAW’s documentation).
Here is a snipped list:
dir(submission)
['approve',
'approved_by',
'author',
'domain',
'downs',
'downvote',
'edit',
'edited',
'saved',
'score',
'secure_media',
'secure_media_embed',
'selftext',
'selftext_html',
'title',
'ups',
'upvote',
'url',
'user_reports',
'visited',
'vote']
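The full dir() output also includes a lot of internal names that start with an underscore. One way to trim it is sketched below. public_names is a hypothetical helper, and FakeSubmission is a made-up stand-in so the example runs without a Reddit connection.

```python
def public_names(obj):
    """Return attribute names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class FakeSubmission:
    """Hypothetical stand-in for a PRAW submission, for illustration only."""
    title = "Example title"
    selftext = "Example body"
    score = 1


print(public_names(FakeSubmission()))
```

With a real submission from the loop above, you would call public_names(submission) instead.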
I’ll point out a few important ones. title is the title, as it appears on Reddit’s main page. selftext is the optional text you can put on posts; most posts don’t have it. learnpython is unusual in that most posts do have text (usually the poster asking their question), which is why I’ve chosen it. score is the net score, that is, upvotes minus downvotes (both of which are also available).
These are the three we will print:
for submission in subreddit.hot(limit=5):
    print("Title: ", submission.title)
    print("Text: ", submission.selftext)
    print("Score: ", submission.score)
    print("---------------------------------\n")
That’s it. Run the script, and open Reddit in a browser at the same time. Check that you are getting the right results.
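If you want to check the output format without hitting Reddit at all, one option is to pull the formatting into its own function and feed it stand-in objects. This is only a sketch. format_submission is a hypothetical helper, and the fake post is made-up data with the same three attributes.

```python
from types import SimpleNamespace


def format_submission(submission):
    """Build the text printed for one submission."""
    return (
        f"Title: {submission.title}\n"
        f"Text: {submission.selftext}\n"
        f"Score: {submission.score}\n"
        "---------------------------------\n"
    )


# Hypothetical stand-in objects; with PRAW you would loop over subreddit.hot(limit=5).
fake_posts = [
    SimpleNamespace(title="Example question", selftext="How do I loop?", score=3),
]
for post in fake_posts:
    print(format_submission(post))
```

Because the function only reads title, selftext and score, it should work the same way on the submissions the loop above produces.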
Next time
Next time we will look at how to send a reply to a post on Reddit.