Reddit — reddit • lusilab

In part, a wrapper for the RedditExtractoR package; pulls posts and comments from specified subreddits or searches.

Usage

reddit(topics, search = NULL, ..., sort = "hot", filename = "reddit.csv",
  write = TRUE, lim = 100, filter = "\\[removed\\]", clean = TRUE,
  comments_only = FALSE, posts_only = FALSE,
  useragent = paste("R LUSI @", date()))

reddit.karma(users)

reddit.usercomments(users, filename = NULL, subreddits = NULL, lim = 100,
  type = "comments", useragent = paste("R LUSI @", date()))

reddit.lsm(data)

Arguments

topics: A string or vector of strings corresponding to subreddit names (e.g., 'trees' referring to reddit.com/r/trees/). Only the first value is used if search is specified.
search: Passed to find_thread_urls as the search_terms argument. If both search and topics are specified, the first topics value is passed as the subreddit argument (which restricts the search to that subreddit).
...: Additional arguments passed to find_thread_urls if search is specified.
sort: How to sort initial comments. Only applies if search is not specified. Default is 'hot'; other options are 'new', 'rising', 'top', 'gilded', and 'ads'.
filename: Name of the file to be saved in the current working directory. This will currently always be a csv file.
write: Logical; if FALSE, data will not be saved to a file (they will only be stored as an object if you've assigned your reddit call to a name).
lim: Numeric: sets the number of posts to pull per topic. Only applies if search is not specified.
filter: Passed to grepl. A pattern used to filter posts by the content of their comments. Default is '\\[removed\\]', which filters out comments that have been deleted.
clean: Logical; if TRUE, converts curly to straight quotes.
comments_only: Logical; if TRUE, will exclude original posts from each thread.
posts_only: Logical; if TRUE, will only collect the initial post for each thread. This changes what is returned, relative to posts returned along with comments.
useragent: String to set as the request's User-Agent.
users: A vector of user names (as in reddit.com/user/username; such as those in the user column from the reddit function). Information is never gathered twice for the same user; reddit.usercomments simply removes duplicate user names (i.e., unique(users)), whereas reddit.karma will return karma scores in the same order as the input, filling in the same information for duplicate users.
subreddits: A vector of subreddits to filter for; only comments within the specified subreddits are returned. Each should exactly match the subreddit_name_prefixed field (e.g., 'r/Anxiety' for https://www.reddit.com/r/Anxiety/), including 'r/' before the subreddit name, though this will be added if missing.
type: Type of user data to download; either 'comments' or 'submissions' (posts).
data: The data.frame returned from a reddit call (e.g., comments = reddit('trees'); reddit.lsm(comments))
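
The behavior described for filter, subreddits, and users can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: the toy vectors and karma values are made up, and dropping the rows that match filter is inferred from the argument description above.

```r
# Toy comment texts (hypothetical, for illustration only).
comments <- c("Great post!", "[removed]", "I disagree.", "[removed]")

# filter is a regular expression passed to grepl; the brackets are escaped
# so they match literally instead of defining a character class.
is_removed <- grepl("\\[removed\\]", comments)
kept <- comments[!is_removed]

# subreddits: add the 'r/' prefix only where it is missing.
subreddits <- c("Anxiety", "r/trees")
prefixed <- ifelse(grepl("^r/", subreddits), subreddits, paste0("r/", subreddits))

# users: look each unique name up once, then map results back to the input
# order, as described for reddit.karma.
users <- c("alice", "bob", "alice")
unique_users <- unique(users)
karma_by_user <- c(alice = 10, bob = 25) # hypothetical lookup result
karma <- unname(karma_by_user[users])
```

Indexing a named vector by the original users vector is one simple way to repeat a result for duplicate names without requesting it twice.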

Value

From reddit if posts_only is FALSE: A data.frame with a row for each post / comment:

url: URL of the thread.
author: Username of the author.
date: Date of the submission in YYYY-MM-DD format.
timestamp: Time of the submission in seconds since 1970-01-01; use as.POSIXct to convert to a date object.
score: Score.
upvotes: Number of upvotes.
downvotes: Number of downvotes.
golds: Number of golds.
comment: Text of the submission. Called comment for compatibility, though it may be a post or comment, as indicated by type.
comment_id: Position of the submission within the thread, where 0 means original post.
type: Whether the submission is a comment or post.
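
As a minimal sketch of working with these columns, the following converts timestamp with as.POSIXct and separates original posts from replies using comment_id. The data.frame is a hypothetical stand-in for a reddit() result, and the literal type values ('post', 'comment') are assumptions rather than confirmed output.

```r
# Toy stand-in for the data.frame returned by reddit(); values are hypothetical.
thread <- data.frame(
  timestamp = c(1500000000, 1500000600),
  comment_id = c(0, 1),
  type = c("post", "comment"), # assumed labels
  stringsAsFactors = FALSE
)

# timestamp is in seconds since 1970-01-01, so the origin is given explicitly;
# tz = "UTC" keeps the result independent of the local time zone.
thread$time <- as.POSIXct(thread$timestamp, origin = "1970-01-01", tz = "UTC")
formatted <- format(thread$time, "%Y-%m-%d %H:%M:%S")

# A comment_id of 0 marks the original post of a thread.
original_posts <- thread[thread$comment_id == 0, ]
replies <- thread[thread$comment_id != 0, ]
```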

If posts_only is TRUE, a data.frame with all variables from the Reddit response, where, for instance, text will be selftext.

Examples

if (FALSE) { # \dontrun{
# these will all save a file called 'reddit.csv' to the current working directory.
# pull from a single, depression-related subreddit:
reddit("depression")

# pull from a few subreddits, also saving the data as an object ('comments'):
topics <- c("trees", "Meditation")
comments <- reddit(topics)

# pull comments from a search
reddit(search = "politics")

# calculate language style matching between each comment and the comment it's replying to
# within the first thread of the trees subreddit
thread_lsm <- reddit.lsm(comments[comments$title == comments$title[1], ])

# download the 5 most recent comments from 10 users who commented in the trees subreddit
user_comments <- reddit.usercomments(comments$user[1:10], lim = 5)
} # }