Expand your Bluesky network with R
Find people followed by the people you follow, but who you don't follow, using R and the atrrr package
This post is inspired by the Bluesky Network Analyzer made by @theo.io.
I recently wrote a post encouraging everyone I know online to join the scientific community on Bluesky.
In that post I link to several starter packs — lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network.
I started by following accounts of people I knew from X, plus accounts from a few starter packs I came across. One way to expand your network is to take all the accounts you follow and find the accounts they follow that you don’t. You can then sort this list in descending order by how many of your follows follow each account, and use it to fill out your network.
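To make the ranking idea concrete before touching the API, here’s a minimal base R sketch on made-up data. The handles (`alice`, `bob`, `dave`, and so on) and `me.bsky.social` are hypothetical placeholders, not real accounts. It pools everyone your follows follow, drops accounts you already follow (and yourself), and counts how many of your follows follow each remaining account.

```r
# Hypothetical toy data: my own handle and the accounts I follow...
me <- "me.bsky.social"
my_follows <- c("alice", "bob", "carol")

# ...and who each of those accounts follows (all made-up handles)
follows_of_follows <- list(
  alice = c("dave", "erin", "bob", "me.bsky.social"),
  bob   = c("dave", "frank", "me.bsky.social"),
  carol = c("dave", "erin", "alice")
)

# Pool every follow-of-a-follow into one character vector
candidates <- unlist(follows_of_follows, use.names = FALSE)

# Drop accounts I already follow, and my own account
candidates <- candidates[!candidates %in% c(my_follows, me)]

# Count how many of my follows follow each candidate, most-followed first
ranked <- sort(table(candidates), decreasing = TRUE)
ranked
```

Here `dave` comes out on top because all three of the hypothetical follows follow that account, then `erin` (two) and `frank` (one).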
Let’s do this with just a few lines of code in R. The atrrr package (CRAN, GitHub, Docs) is one of several packages that wraps the AT protocol behind Bluesky, allowing you to interact with Bluesky through a set of R functions. It’s super easy to use and the docs are great.
The code below does this. It first authenticates with an app password, then retrieves all the accounts you follow. Next, it gets the accounts each of those accounts follows and removes the ones you already follow.¹
library(dplyr)
library(atrrr)
# Authenticate first (switch out with your username)
bsky_username <- "youraccount.bsky.social"
# If you already have an app password:
bsky_app_pw <- "change-me-change-me-123"
auth(user=bsky_username, password=bsky_app_pw)
# Or, if you don't have an app password yet, run this instead
# to be guided through the process
# auth()
# Get the people you follow
f <- get_follows(actor=bsky_username, limit=Inf)
# Get just their handles
fh <- f$actor_handle
# Get who your follows are following
ff <-
  fh |>
  lapply(get_follows, limit=Inf) |>
  setNames(fh)
# Make it a data frame
ffdf <- bind_rows(ff, .id="follow")
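# Get counts, removing people you already follow (and yourself,
# since many of your follows probably follow you back)
ffcounts <-
  ffdf |>
  count(actor_handle, sort=TRUE) |>
  anti_join(f, by="actor_handle") |>
  filter(actor_handle!="handle.invalid",
         actor_handle!=bsky_username)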
# Join back to account info, add URL
ffcounts <-
  ffdf |>
  distinct(actor_handle, actor_name) |>
  inner_join(x=ffcounts, y=_, by="actor_handle") |>
  mutate(url=paste0("https://bsky.app/profile/",
                    actor_handle))
This returns a data frame of all the accounts followed by the people you follow but not by you, sorted in descending order by how many of your follows follow each one (a mouthful, I know).
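If you want to see the shape of `ffcounts` without calling the API, here’s a sketch that runs the same count, `anti_join()`, and join-back steps on a tiny hypothetical `f` and `ffdf`. All handles and names are made up, and only the columns these steps use are included.

```r
library(dplyr)

# Hypothetical stand-ins: `f` holds the accounts I follow, and `ffdf` has
# one row per (account I follow, account they follow) pair
f <- tibble(actor_handle = c("alice.bsky.social", "bob.bsky.social"))

ffdf <- tibble(
  follow       = c("alice.bsky.social", "alice.bsky.social",
                   "bob.bsky.social",   "bob.bsky.social"),
  actor_handle = c("dave.bsky.social",  "bob.bsky.social",
                   "dave.bsky.social",  "erin.bsky.social"),
  actor_name   = c("Dave", "Bob", "Dave", "Erin")
)

# Count, drop accounts already followed, then add display names and URLs
result <-
  ffdf |>
  count(actor_handle, sort = TRUE) |>
  anti_join(f, by = "actor_handle") |>
  inner_join(distinct(ffdf, actor_handle, actor_name), by = "actor_handle") |>
  mutate(url = paste0("https://bsky.app/profile/", actor_handle))

result
```

The result has one row per suggested account with columns `actor_handle`, `n`, `actor_name`, and `url`: `dave.bsky.social` first with `n = 2`, then `erin.bsky.social` with `n = 1`. `bob.bsky.social` drops out because it’s already in `f`.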
Optionally, you can use the gt package to turn `ffcounts` into a nicer table with clickable links.
# Optional, clean up and create a nice table
library(gt)
library(glue)
top <- 20L
ffcounts |>
  head(top) |>
  rename(Handle=actor_handle, N=n, Name=actor_name) |>
  mutate(Handle=glue("[{Handle}]({url})")) |>
  mutate(Handle=lapply(Handle, gt::md)) |>
  select(-url) |>
  gt() |>
  tab_header(
    title=md(glue("**My top {top} follows' follows**")),
    subtitle="Collected November 19, 2024") |>
  tab_style(
    style=cell_text(weight="bold"),
    locations=cells_column_labels()
  ) |>
  cols_align(align="left") |>
  opt_row_striping(row_striping = TRUE)
I can’t embed an HTML file here, but here’s what that output looks like. You can click any of the handles to open that account and follow it if you find it useful.
You could also do this iteratively: follow some of your top follows’ follows, then rerun the process a few times to discover second-degree connections you might otherwise miss.
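Here’s one way the iterative version could be sketched. Both functions are hypothetical helpers (not part of atrrr), and to keep the sketch runnable offline the lookup is passed in as `fetch_follows`: any function that takes a handle and returns the handles that account follows. With atrrr you might pass something like `function(h) get_follows(actor = h, limit = Inf)$actor_handle`, though I haven’t run it that way. It also assumes you’d simply follow the top `top_k` suggestions each round; in practice you’d probably look over the list and pick yourself.

```r
# Hypothetical helper: rank accounts followed by my follows but not by me.
# `fetch_follows(handle)` is an assumed function returning a character
# vector of the handles that `handle` follows.
rank_follows_of_follows <- function(me, my_follows, fetch_follows) {
  # Pool everyone my follows follow (as.character() turns NULL into character(0))
  pooled <- as.character(unlist(lapply(my_follows, fetch_follows), use.names = FALSE))
  # Drop accounts I already follow, and myself
  pooled <- pooled[!pooled %in% c(my_follows, me)]
  # Most-followed candidates first
  sort(table(pooled), decreasing = TRUE)
}

# Hypothetical helper: simulate following the top `top_k` suggestions,
# then re-ranking, for `rounds` rounds
expand_network <- function(me, my_follows, fetch_follows, rounds = 3, top_k = 5) {
  for (i in seq_len(rounds)) {
    ranked <- rank_follows_of_follows(me, my_follows, fetch_follows)
    if (length(ranked) == 0) break  # nothing new left to suggest
    picks <- head(names(ranked), top_k)
    my_follows <- c(my_follows, picks)
  }
  my_follows
}

# Offline usage with a made-up follow graph
graph <- list(a = c("b", "c"), x = "b", b = c("c", "d"))
fetch <- function(h) graph[[h]]  # unknown handles return NULL
expand_network("me", c("a", "x"), fetch, rounds = 2, top_k = 1)
```

Each round re-fetches the follows of everyone you follow, so this multiplies the number of API calls and makes hitting rate limits more likely.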
The code here essentially replicates what @theo.io’s Bluesky Network Analyzer does, but runs locally in R. That web app is faster and easier to use, and does some smart caching and throttling to avoid API rate limits. See the footnote for more.
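I don’t know exactly how the web app implements its caching, but in R a simple in-memory version might look like this minimal sketch. `make_cached_fetch()` is a hypothetical helper: it wraps a fetch function so that each handle is only requested once per session, which matters if you rerun the analysis several times. The memoise package offers a more complete, ready-made take on the same idea.

```r
# Hypothetical helper: wrap `fetch` so each handle is fetched at most once
# per session; repeat calls are served from an in-memory environment
make_cached_fetch <- function(fetch) {
  cache <- new.env(parent = emptyenv())
  function(handle) {
    if (!exists(handle, envir = cache, inherits = FALSE)) {
      assign(handle, fetch(handle), envir = cache)
    }
    get(handle, envir = cache, inherits = FALSE)
  }
}

# Offline usage: a fake fetch function that counts how often it is called
calls <- 0
fake_fetch <- function(h) {
  calls <<- calls + 1
  paste0(h, "-follows")
}
cached <- make_cached_fetch(fake_fetch)
cached("a")
cached("a")  # served from the cache, no new call
cached("b")
calls
```

With atrrr, something like `cached_get_follows <- make_cached_fetch(function(h) get_follows(actor = h, limit = Inf))` could stand in for `get_follows` in the `lapply()` call, though I haven’t tested that against the live API.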
¹ Be careful here if you follow a lot of people, or you may get rate-limited. This code uses the API to retrieve all the people you follow, then finds all the people each of those accounts follows. This worked fine when I ran it on the ~300 people I follow, but I got rate-limited when I tried to run it over the ~5,000 accounts that follow me. If this happens to you, you may need to write a small helper function that calls Sys.sleep() between requests, or use a loop instead of lapply() and pause every few hundred queries.
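A minimal sketch of such a helper, assuming a pause every few hundred queries is enough. `lapply_throttled()` is a hypothetical function (not part of atrrr), and the default `batch_size` and `pause_secs` values are placeholder guesses, not Bluesky’s documented rate limits.

```r
# Hypothetical helper: like lapply(), but pauses for `pause_secs` seconds
# after every `batch_size` calls to FUN (default values are guesses)
lapply_throttled <- function(X, FUN, ..., batch_size = 200, pause_secs = 60) {
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    # out[i] <- list(...) keeps the slot even if FUN returns NULL
    out[i] <- list(FUN(X[[i]], ...))
    # Pause after each full batch, but not after the final element
    if (i %% batch_size == 0 && i < length(X)) {
      message("Pausing after ", i, " of ", length(X), " queries...")
      Sys.sleep(pause_secs)
    }
  }
  names(out) <- names(X)
  out
}

# Offline usage: pause_secs = 0 so the demo runs instantly
res <- lapply_throttled(1:5, function(x, k) x * k, k = 2,
                        batch_size = 2, pause_secs = 0)
```

It could drop in where `lapply()` was used above, e.g. `ff <- fh |> lapply_throttled(get_follows, limit = Inf) |> setNames(fh)`, with extra arguments like `limit` passed through to `FUN`.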