Partly a wrapper for the RedditExtractoR package; pulls posts and comments from specified subreddits or searches.
Usage
reddit(topics, search = NULL, ..., sort = "hot", filename = "reddit.csv",
write = TRUE, lim = 100, filter = "\\[removed\\]", clean = TRUE,
comments_only = FALSE, posts_only = FALSE,
useragent = paste("R LUSI @", date()))
reddit.karma(users)
reddit.usercomments(users, filename = NULL, subreddits = NULL, lim = 100,
type = "comments", useragent = paste("R LUSI @", date()))
reddit.lsm(data)
Arguments
- topics
A string or vector of strings corresponding to subreddit names (e.g., 'trees' referring to reddit.com/r/trees/). Only the first value is used if search is specified.
- search
Passed to find_thread_urls as the search_terms argument. If topics is also specified, its first value is added as the subreddit argument, which restricts the search to that subreddit.
- ...
Additional arguments passed to find_thread_urls if search is specified.
- sort
How to sort initial comments. Only applies if search is not specified. Default is 'hot'; other options are 'new', 'rising', 'top', 'gilded', and 'ads'.
- filename
Name of the file to be saved in the current working directory. This will currently always be a csv file.
- write
Logical: if FALSE, data will not be saved to a file (they will only be returned, and kept as an object if you assign the result of your reddit call).
- lim
Numeric: sets the number of posts to pull per topic. Only applies if search is not specified.
- filter
Passed to grepl. A pattern used to filter posts by the content of their comments. Default is '\[removed\]', to filter out comments that have been deleted.
- clean
Logical; if FALSE, converts curly to straight quotes.
- comments_only
Logical; if TRUE, excludes the original post from each thread.
- posts_only
Logical; if TRUE, only the initial post of each thread is collected. This also changes the structure of the returned data.frame (see Value).
- useragent
String to set as the request's User-Agent.
- users
A vector of user names (as in reddit.com/user/username; such as those in the user column from the reddit function). Information is never gathered twice for the same user: reddit.usercomments simply removes duplicate user names (i.e., unique(users)), whereas reddit.karma returns karma scores in the same order as the input, repeating the same information for duplicate users.
- subreddits
A vector of subreddits to filter for; only comments within the specified subreddits are returned. Should exactly match the subreddit_name_prefix field (e.g., 'r/Anxiety' for https://www.reddit.com/r/Anxiety/), including 'r/' before each subreddit name, though this will be added if missing.
- type
Type of user data to download; either 'comments' or 'submissions' (posts).
- data
The data.frame returned from a reddit call (e.g., comments = reddit('trees'); reddit.lsm(comments)).
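The filter, quote-conversion, subreddits, and users behaviors described above can be illustrated with base R. This is a minimal sketch on a made-up data.frame, not the package's internal code: the toy column values, the example subreddit and user names, and the stand-in lookup function fake_karma are all hypothetical, and the quote step only shows the curly-to-straight replacement itself.

```r
# Hypothetical sketch (base R only); not the package's implementation.

# Toy data loosely shaped like reddit() output (values invented)
posts <- data.frame(
  author = c("a", "b", "a", "c"),
  comment = c("I \u201clove\u201d it", "[removed]", "It\u2019s fine", "ok"),
  stringsAsFactors = FALSE
)

# filter: drop rows whose text matches the pattern (grepl, as documented)
filter <- "\\[removed\\]"
posts <- posts[!grepl(filter, posts$comment), ]

# quote conversion: replace curly quotes with straight ones
quote_map <- c("\u201c" = "\"", "\u201d" = "\"", "\u2018" = "'", "\u2019" = "'")
for (q in names(quote_map)) {
  posts$comment <- gsub(q, quote_map[[q]], posts$comment, fixed = TRUE)
}

# subreddits: add the "r/" prefix where it is missing
subreddits <- c("Anxiety", "r/trees")
subreddits <- ifelse(grepl("^r/", subreddits), subreddits, paste0("r/", subreddits))

# users: query each unique name once, then map results back to input order
lookups <- 0
fake_karma <- function(user) {  # hypothetical stand-in for a per-user request
  lookups <<- lookups + 1
  nchar(user) * 10
}
users <- c("ann", "bo", "ann")
uniq <- unique(users)
looked_up <- vapply(uniq, fake_karma, numeric(1))
karma <- unname(looked_up[match(users, uniq)])
```

Here fake_karma runs twice for three input names, and karma comes back in input order with the duplicate filled in, mirroring the reddit.karma description.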
Value
From reddit, if posts_only is FALSE:
A data.frame with a row for each post / comment:
- url: URL of the thread.
- author: Username of the author.
- date: Date of the submission in YYYY-MM-DD format.
- timestamp: Time of the submission in seconds since 1970-01-01; use as.POSIXct to convert to a date-time object.
- score: Score.
- upvotes: Number of upvotes.
- downvotes: Number of downvotes.
- golds: Number of golds.
- comment: Text of the submission. Called comment for compatibility, though it may be a post or comment, as indicated by type.
- comment_id: Position of the submission within the thread, where 0 means original post.
- type: Whether the submission is a comment or post.

If posts_only is TRUE, a data.frame with all variables from the Reddit response, where, for instance, the text will be in selftext.
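Because timestamp is in seconds since 1970-01-01, it can be converted with as.POSIXct, and original posts can be picked out with comment_id == 0. A sketch on a hypothetical two-row data.frame; UTC is an assumption here, since the time zone reddit() uses for its date column is not stated above.

```r
# Hypothetical rows shaped like reddit() output (values invented)
res <- data.frame(
  timestamp = c(1577836800, 1577840400),
  comment_id = c(0, 1),
  stringsAsFactors = FALSE
)

# Seconds since 1970-01-01 to a date-time object (UTC assumed)
res$time <- as.POSIXct(res$timestamp, origin = "1970-01-01", tz = "UTC")

# Same YYYY-MM-DD form as the date column
res$day <- format(res$time, "%Y-%m-%d")

# Original posts have comment_id 0
original_posts <- res[res$comment_id == 0, ]
```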
Examples
if (FALSE) { # \dontrun{
# these will all save a file called 'reddit.csv' to the current working directory.
# pulls from a single, depression-related subreddit:
reddit("depression")
# pull from a few subreddits, also saving the data as an object ('comments'):
topics <- c("trees", "Meditation")
comments <- reddit(topics)
# pull comments from a search
reddit(search = "politics")
# calculate language style matching between each comment and the comment it's replying to
# within the first thread of the trees subreddit
thread_lsm <- reddit.lsm(comments[comments$title == comments$title[1], ])
# download the 5 most recent comments from 5 users who commented in the trees subreddit
user_comments <- reddit.usercomments(comments$user[1:5], lim = 5)
} # }
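The reddit.lsm example computes language style matching between each comment and the comment it replies to. The exact categories and formula reddit.lsm uses are not documented here; as an assumption, this sketch shows the per-category formula commonly used for LSM, 1 - |a - b| / (a + b + 0.0001), where a and b are the percentages of words in a function-word category for the two texts, averaged across categories. The category names and percentages below are hypothetical.

```r
# Commonly used per-category LSM score (assumed; reddit.lsm may differ)
lsm_category <- function(a, b) 1 - abs(a - b) / (a + b + 1e-4)

# Hypothetical function-word percentages for a reply and its parent
reply  <- c(ppron = 10, article = 5, prep = 12)
parent <- c(ppron = 8,  article = 5, prep = 15)

# One score per category, then the average as an overall match score
scores <- lsm_category(reply, parent)
overall <- mean(scores)
```

Identical category rates give a score of 1 (article here), and the small constant in the denominator avoids division by zero when both texts lack a category.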