R
We’re not going to teach you RStudio comprehensively—excellent resources already exist for that purpose. We particularly recommend R for Data Science, Data Science in Education Using R, the RStudio primers, and Happy Git and GitHub for the useR.
Instead, this chapter covers just enough to reproduce our workflows and extend them to your own data science work. Think of this as the minimal viable setup for doing the work in this book.
A note on Positron: Throughout this book, we use RStudio, but Positron (Posit’s newer IDE built on VS Code) offers similar functionality and may appeal to users coming from VS Code or working across R and Python. The R code in this book will work identically in either environment.
The Four Essential Panes
When you open RStudio, you’ll see four panes (though the layout is customizable). Here’s what matters for our work:
- Source (top-left): Where you write and edit your code files
- Console (bottom-left): Where R actually executes code and shows results
- Environment (top-right): Shows the objects (data, functions, etc.) you’ve created
- Files/Plots/Help (bottom-right): Navigate files, view visualizations, and access documentation
You don’t need to customize anything yet—the defaults work fine.
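To see how the Console and the Environment pane relate, here is a tiny example. The object names are arbitrary and chosen only for illustration. `ls()` prints in the Console the same objects that the Environment pane lists.

```r
# Create two objects; each appears in the Environment pane as you run the line
scores <- c(90, 85, 77)
n_students <- length(scores)

# ls() lists the objects in your global environment, the same ones the pane shows
ls()
```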
R Projects: The Foundation of Reproducible Work
Always work in an R Project. This is the single most important RStudio habit for reproducibility.
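An R Project is a folder that contains all files for a particular analysis (data, scripts, outputs) and sets your working directory to that folder automatically. As long as your code refers to files with paths relative to the project folder (for example, data/mydata.csv rather than a path on your own desktop), it will run on any computer without changing file paths.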
To create a new project:
- File → New Project
- Choose “New Directory” or “Existing Directory”
- Name your project (e.g., “my-first-css-analysis”)
- Click “Create Project”
Notice the .Rproj file in your folder? That’s your project file. Double-click it to open RStudio with the correct working directory already set.
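Here is a minimal sketch of what working relative to the project folder looks like in practice. Run it from the Console of an open project. The file data/mydata.csv is hypothetical and used only for illustration.

```r
# In an open R Project, the working directory is the project folder
getwd()

# Build a path relative to the project folder (hypothetical file).
# file.path() joins the pieces with "/", which R accepts on every operating system.
data_path <- file.path("data", "mydata.csv")
data_path
#> [1] "data/mydata.csv"

# Avoid absolute paths such as "C:/Users/jane/Desktop/mydata.csv":
# they exist only on one person's computer.
file.exists(data_path)  # TRUE only if the file is actually there
```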
Three Ways to Write R Code
RStudio supports multiple file types for writing R code. You’ll encounter all three in data science work:
R Scripts (.R)
Plain text files containing R code, executed line-by-line or all at once. Great for data processing pipelines, functions, and code you’ll reuse.
Create one: File → New File → R Script
# This is an R script
# Lines starting with # are comments
data <- read.csv("mydata.csv")
summary(data)
R Markdown (.Rmd)
Documents that weave together narrative text (in Markdown) and code chunks. When you “knit” an R Markdown file, it executes the code and generates a formatted document (HTML, PDF, or Word).
You should know R Markdown exists because you’ll encounter it in older resources and projects, but we’re not using it in this book.
Quarto (.qmd)
Quarto is the successor to R Markdown, with better multi-language support (R, Python, Julia) and more consistent syntax. It’s what we’ll use throughout this book.
Create one: File → New File → Quarto Document
The structure looks similar to R Markdown, but the rendering engine is more powerful:
---
title: "My Analysis"
format: html
---
## Introduction
This is regular text.
```{r}
# This is a code chunk
x <- 1:10
mean(x)
```
Click the “Render” button (or Ctrl/Cmd + Shift + K) to execute all code and generate your document.
Key difference for beginners: Think of R scripts as “just code” and Quarto as “code + explanation + output” in one document. Use scripts for behind-the-scenes work, Quarto for analysis you want to share or understand later.
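A common way to combine the two is to keep reusable helpers in a script and run the whole file with `source()`, for example from a chunk near the top of a Quarto document. In this sketch, the helper script and the `rescale01()` function are hypothetical. The script is written to a temporary file only so the example runs anywhere. In a real project you would save it as a regular file and call something like `source("R/helpers.R")`, where that path is also just an illustration.

```r
# Write a tiny hypothetical helper script to a temporary file.
# In a real project this would be an ordinary file such as "R/helpers.R".
helper_script <- tempfile(fileext = ".R")
writeLines(c(
  "# Behind-the-scenes helper: rescale a numeric vector to the range 0 to 1",
  "rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))"
), helper_script)

# source() runs every line of the script at once, defining rescale01()
source(helper_script)

# A Quarto chunk (or the Console) can now use the helper
rescale01(c(2, 4, 6))
#> [1] 0.0 0.5 1.0
```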
Console vs. Source: Exploration vs. Preservation
A common pattern in data science work:
- Console: Quick exploration, testing ideas, checking output. Code typed here isn’t saved as part of your project, so once you close RStudio it is effectively gone.
- Source (scripts/Quarto): Code you want to keep, reproduce, or share. This is your permanent record.
Early on, you might type everything in the Console. That’s fine for learning! But as soon as you do something you want to remember, put it in a script or Quarto file.
Version Control with Git and GitHub
Git is how computational social scientists share, version, and collaborate on their work. GitHub is the most popular platform for hosting Git repositories.
Why Git Matters
Without version control, you end up with files like analysis_final.R, analysis_final2.R, analysis_ACTUALLY_final.R. Git tracks every change you make, so you can:
- Go back to any previous version
- Experiment without fear of breaking things
- Collaborate with others without emailing files back and forth
- Share your work publicly for transparency and reproducibility
The Git Pane in RStudio
If Git is installed and your project is a Git repository, you’ll see a “Git” tab in the top-right pane, next to the Environment tab. This is where you stage changes, commit them with messages, and push to GitHub, all without leaving RStudio.
Using Git isn’t strictly required to follow this book, but it’s strongly encouraged. Most data science work lives in Git repositories.
Setting Up Git and GitHub: A Friendlier Path
Git authentication is famously difficult. We’re going to use an opinionated approach that minimizes pain: personal access tokens (PATs), managed with the usethis and gitcreds packages.
This approach is:
- R-native (stays in your comfort zone)
- Guided (the package walks you through it)
- Good enough for this book’s purposes
Step 1: Install Required Packages
install.packages("usethis")
install.packages("gitcreds")
Step 2: Configure Your Git Identity
Tell Git who you are (one-time setup):
usethis::use_git_config(
user.name = "Jane Doe", # Your actual name
user.email = "jane@example.com" # Email associated with GitHub
)
Important: Use the email address associated with your GitHub account.
Step 3: Create a GitHub Personal Access Token
usethis::create_github_token()
This opens GitHub in your browser. You’ll need to:
- Log in to GitHub if you haven’t already
- Leave the default scopes checked (or at minimum select “repo” and “workflow”)
- Set expiration to 90 days (we recommend this for security, though you’ll need to regenerate the token when it expires)
- Click “Generate token” at the bottom
- Copy the token immediately—GitHub only shows it once!
The token looks like a long random string: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Step 4: Store Your Token in R
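gitcreds::gitcreds_set()
When prompted, paste your token and press Enter. gitcreds stores it with Git’s credential helper (typically the operating system’s credential manager on Windows and macOS), so Git and R packages such as usethis can find it later.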
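If you want a quick yes/no check before running the full diagnostic, here is a minimal sketch. It assumes gitcreds is installed and relies on `gitcreds_get()` raising an error when it can’t find a token for github.com. It never prints the token itself.

```r
# TRUE if gitcreds can find a GitHub token, FALSE otherwise.
# gitcreds_get() signals an error when no token is found (or if gitcreds
# isn't installed); tryCatch() turns that error into FALSE.
token_stored <- tryCatch({
  gitcreds::gitcreds_get()
  TRUE
}, error = function(e) FALSE)

token_stored
```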
Step 5: Verify Everything Worked
usethis::git_sitrep()
This gives you a diagnostic report. If you see:
- Your name and email listed
- “Personal access token for ‘https://github.com’: ‘<discovered>’”
You’re all set! If something looks wrong, see the troubleshooting section below.
Common Problems & Solutions
Problem: “I closed the browser before copying my token”
Solution: No worries—tokens are disposable by design. Run usethis::create_github_token() again to generate a new one.
Problem: “It worked yesterday, now Git asks for credentials again”
Solution: Your token likely expired. Run usethis::create_github_token() again, copy the new token, and run gitcreds::gitcreds_set() to update it.
Problem: “I get ‘Permission denied’ or authentication errors”
Solution: Check usethis::git_sitrep(). Your token may have expired or wasn’t stored correctly. Try running gitcreds::gitcreds_set() again.
Problem: “I don’t see a Git pane in RStudio”
Solution: Either Git isn’t installed on your computer, or RStudio can’t find it. See Happy Git with R, Chapter 6 for installation instructions.
Alternative: SSH Keys (For the Adventurous)
If you plan to use Git extensively beyond this book, SSH keys are more convenient—you set them up once and never think about authentication again. However, they’re more complex to configure initially.
If you’re interested in the SSH approach, see Happy Git with R, Chapters 10-12. For now, the PAT method will serve you well.
Using Git in RStudio
We’ll introduce Git commands as you need them throughout the book. For now, know that the Git pane in RStudio gives you a graphical interface for the most common operations:
- Pull: Download changes from GitHub
- Commit: Save a snapshot of your changes with a descriptive message
- Push: Upload your commits to GitHub
You can also use terminal commands (git status, git add, git commit, git push) if you prefer, but the RStudio interface works great for our purposes.
When You Get Stuck
Reading Error Messages
When something goes wrong, R prints an error message in the Console (usually in red). These messages often look cryptic at first, but they’re trying to help:
Error in mean(x) : object 'x' not found
This tells you exactly what went wrong: R couldn’t find an object called x. Either you misspelled it, or you haven’t created it yet.
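You can reproduce this error yourself and then fix it. This sketch uses `tryCatch()` only so the message can be captured and printed instead of halting a script. In everyday work, you would simply read the red message in the Console.

```r
# Make sure no object called x exists yet in the current environment
rm(list = intersect("x", ls()))

# Capture the error message instead of stopping
msg <- tryCatch(mean(x), error = function(e) conditionMessage(e))
msg
#> [1] "object 'x' not found"

# Fix: create x first, then call mean() again
x <- c(2, 4, 6)
mean(x)
#> [1] 4
```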
Getting Help on Functions
R has built-in documentation for every function:
?mean # Opens help for the mean() function
??regression # Searches all help files for "regression"
Help files include descriptions, arguments, examples, and related functions.
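A few other built-in helpers from base R’s utils package complement `?` and `??`. Note that `example()` runs the help page’s code in your workspace, so it may create objects there (for `mean`, this includes an object named x).

```r
# Show just the arguments a function accepts, without opening the help page
args(mean)

# Run the examples from the bottom of a function's help page
example(mean)

# List functions whose names contain a pattern (case-insensitive by default)
apropos("mean")
```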
Large Language Models and AI Tools
In Section 3, we describe using large language models (LLMs) as part of a workflow. They are also useful for getting unstuck, and we recommend two main ways to use them:
- Asking an LLM directly (pick your favorite)
- Using GitHub Copilot (setup described below)
Going Deeper
This chapter gave you the essentials for working through this book. When you’re ready to level up your RStudio skills:
- R for Data Science (2e): Comprehensive introduction to data science with R
- Data Science in Education Using R: Applied data science in educational contexts
- RStudio User Guide: Official documentation for all RStudio features
- Quarto Documentation: Deep dive into Quarto’s capabilities
- Happy Git and GitHub for the useR: Everything you need for Git/GitHub workflows
Advanced RStudio features worth exploring later:
- Keyboard shortcuts (press Alt+Shift+K, or Option+Shift+K on macOS, to see them all)
- Code snippets (type “fun” + Tab to scaffold a function)
- Debugger (for tracking down errors in complex code)
- Visual markdown editor (WYSIWYG editing for Quarto documents)
- Background jobs (run long scripts without freezing RStudio)
For now, you have everything you need to get started. Let’s dive in to