Partly a wrapper around the RedditExtractoR package; pulls posts and comments from specified subreddits or searches.

Usage

reddit(topics, search = NULL, ..., sort = "hot", filename = "reddit.csv",
  write = TRUE, lim = 100, filter = "\\[removed\\]", clean = TRUE,
  comments_only = FALSE, posts_only = FALSE,
  useragent = paste("R LUSI @", date()))

reddit.karma(users)

reddit.usercomments(users, filename = NULL, subreddits = NULL, lim = 100,
  type = "comments", useragent = paste("R LUSI @", date()))

reddit.lsm(data)

Arguments

topics

A string or vector of strings corresponding to subreddit names (e.g., 'trees' referring to reddit.com/r/trees/). Only the first value is used if search is specified.

search

Passed to find_thread_urls as the search_terms argument. If search is specified and topics is also specified, the first topics value is passed as the subreddit argument, restricting the search to that subreddit.

...

Additional arguments passed to find_thread_urls if search is specified.

sort

How to sort initial comments. Only applies if search is not specified. Default is 'hot', with 'new', 'rising', 'top', 'gilded', and 'ads' as options.

filename

Name of the file to be saved in the current working directory. This will currently always be a CSV file.

write

Logical; if FALSE, data will not be saved to a file (they will just be stored as objects if you've assigned the result of your reddit call).

lim

Numeric: sets the number of posts to pull per topic. Only applies if search is not specified.

filter

Passed to grepl. A pattern used to filter posts by the content of their comments. Default is '\[removed\]', which filters out comments that have been removed.
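As an illustrative sketch of this kind of pattern-based filtering (not the package's internal code), dropping rows whose comment text matches the default pattern might look like:

```r
# Illustrative only: filter out rows whose comment text matches the
# default '\\[removed\\]' pattern, as described above.
comments <- data.frame(comment = c("great point", "[removed]", "thanks!"))
kept <- comments[!grepl("\\[removed\\]", comments$comment), , drop = FALSE]
kept$comment
# "great point" "thanks!"
```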

clean

Logical; if TRUE, curly quotes are converted to straight quotes.

comments_only

Logical; if TRUE, will exclude original posts from each thread.

posts_only

Logical; if TRUE, only the initial post of each thread is collected. This changes what is returned, relative to posts returned along with comments (see Value).

useragent

String to set as the request's User-Agent.

users

A vector of user names (as in reddit.com/user/username; such as those in the user column from the reddit function). Information is never gathered twice for the same user; reddit.usercomments simply removes duplicate user names (i.e., unique(users)), whereas reddit.karma will return karma scores in the same order as the input, filling in the same information for duplicate users.

subreddits

A vector of subreddits to filter for; only comments within the specified subreddits are returned. Should exactly match the subreddit_name_prefix field (e.g., 'r/Anxiety' for https://www.reddit.com/r/Anxiety/), including 'r/' before each subreddit name, though this will be added if missing.
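The prefix handling described above can be sketched as follows (a hypothetical helper for illustration, not the package's internal code):

```r
# Hypothetical helper illustrating the described behavior: add the
# 'r/' prefix to any subreddit name that is missing it.
normalize_subreddits <- function(subreddits) {
  missing_prefix <- !grepl("^r/", subreddits)
  subreddits[missing_prefix] <- paste0("r/", subreddits[missing_prefix])
  subreddits
}
normalize_subreddits(c("Anxiety", "r/trees"))
# "r/Anxiety" "r/trees"
```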

type

Type of user data to download; either 'comments' or 'submissions' (posts).

data

The data.frame returned from a reddit call (e.g., comments = reddit('trees'); reddit.lsm(comments))

Value

From reddit if posts_only is FALSE: A data.frame with a row for each post / comment:

  • url: URL of the thread.

  • author: Username of the author.

  • date: Date of the submission in YYYY-MM-DD format.

  • timestamp: Time of the submission in seconds since 1970-01-01; use as.POSIXct to convert to a date-time object.

  • score: Score.

  • upvotes: Number of upvotes.

  • downvotes: Number of downvotes.

  • golds: Number of golds.

  • comment: Text of the submission. Called comment for compatibility, though it may be a post or comment, as indicated by type.

  • comment_id: Position of the submission within the thread, where 0 means original post.

  • type: Whether the submission is a comment or post.

If posts_only is TRUE, a data.frame with all variables from the Reddit response, where, for instance, text will be selftext.
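For example, the timestamp column can be converted to date-times with as.POSIXct, as noted above (the two-row data.frame here is a made-up stand-in for real reddit() output):

```r
# 'comments' is a made-up stand-in for a data.frame returned by reddit().
comments <- data.frame(timestamp = c(1287360000, 1287446400))

# Convert seconds since 1970-01-01 (Unix epoch) to POSIXct date-times.
comments$date_time <- as.POSIXct(
  comments$timestamp, origin = "1970-01-01", tz = "UTC"
)
format(comments$date_time, "%Y-%m-%d")
# "2010-10-18" "2010-10-19"
```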

Examples

if (FALSE) { # \dontrun{
# these will all save a file called 'reddit.csv' to the current working directory.
# pulls from a single, depression related subreddit:
reddit("depression")

# pull from a few subreddits, also saving the data as an object ('comments'):
topics <- c("trees", "Meditation")
comments <- reddit(topics)

# pull comments from a search
reddit(search = "politics")

# calculate language style matching between each comment and the comment it's replying to
# within the first thread of the trees subreddit
thread_lsm <- reddit.lsm(comments[comments$title == comments$title[1], ])

# download the 5 most recent comments from 5 users who commented in the trees subreddit
user_comments <- reddit.usercomments(comments$user[1:5], lim = 5)
} # }