LLM Methods

5.1 Why Large Language Models Matter in Educational Research

Large Language Models (LLMs) have transformed how researchers analyze, generate, and interpret text. Unlike traditional NLP pipelines that rely on token counts and surface patterns, LLMs reason over context, semantics, and discourse structure.

In educational research, this means:

  • Summarizing and coding open-ended student reflections
  • Analyzing institutional policy documents
  • Generating scaffolds or rubrics for teaching materials
  • Synthesizing qualitative and quantitative findings into narratives

LLMs thus expand the researcher’s computational toolkit—from statistical pattern recognition to context-aware meaning-making.


5.2 Cloud-based LLMs: Capabilities and Setup

Cloud-based Large Language Models (LLMs) offer access to state-of-the-art generative and analytical capabilities without requiring local hardware or model management. They run on remote servers and are accessed via APIs—making them ideal for rapid prototyping, large-scale text processing, and exploratory analyses in educational research.


5.2.1 Overview of Major Cloud Providers

The landscape of LLMs evolves quickly. As of 2025, the following providers represent the most common cloud-based options for educational data analysis.

Provider     Example Models                    Access                              R / HTTP Wrapper
OpenAI       GPT-4o / GPT-5                    API key via platform.openai.com     {httr2}, {openai}, {ellmer}
Anthropic    Claude 3 Opus · Claude 4          API key via console.anthropic.com   {httr2}, {anthropic}, {ellmer}
Google       Gemini 2.5 · Gemini Ultra (2025)  AI Studio                           {googleGenerativeAI}, {ellmer}
Mistral AI   Mixtral 8×22B · Mistral 7B v0.3   mistral.ai                          HTTP via {httr2}
DeepSeek     DeepSeek-V3.1 (MoE, 128K ctx)     API via deepseek.ai                 HTTP via {httr2}

💡 Tip: Always confirm the latest model names, context-window sizes, and pricing tiers on each provider’s documentation page. Many vendors update models quarterly, affecting reproducibility.


5.2.2 Connecting to a Cloud API from R

Below is a minimal reproducible example using the OpenAI API through the {httr2} package. The same structure applies to most HTTP-based providers.

library(httr2)

# Build the request: authenticate with the API key stored in the environment,
# then send a chat-completion payload (req_body_json sets the JSON content type)
resp <- request("https://api.openai.com/v1/chat/completions") |>
  req_headers(
    "Authorization" = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))
  ) |>
  req_body_json(list(
    model = "gpt-4o-mini",
    messages = list(
      list(role = "system", content = "You are an assistant helping with educational data analysis."),
      list(role = "user",   content = "Summarize these student reflections in three themes.")
    )
  )) |>
  req_perform()

# Parse the JSON response and print the model's reply
content <- resp_body_json(resp)
cat(content$choices[[1]]$message$content)

🧩 This pattern underlies all API workflows: (1) authenticate → (2) send prompt → (3) receive structured response.
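
To reuse this pattern across analyses, it can be wrapped in a small helper. Below is a minimal sketch assuming the same endpoint and environment variable as above; ask_openai() is an illustrative name, not a package function.

library(httr2)

# Illustrative helper (not a package function) wrapping the pattern:
# authenticate -> send prompt -> receive structured response
ask_openai <- function(prompt,
                       model  = "gpt-4o-mini",
                       system = "You are an assistant helping with educational data analysis.") {
  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_headers("Authorization" = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))) |>
    req_body_json(list(
      model = model,
      messages = list(
        list(role = "system", content = system),
        list(role = "user",   content = prompt)
      )
    )) |>
    req_perform()

  # Return only the assistant's reply text
  resp_body_json(resp)$choices[[1]]$message$content
}

# ask_openai("Summarize these student reflections in three themes.")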


5.2.3 Using the {ellmer} Package — A Unified LLM Interface for R

The {ellmer} package, developed within the Posit/Tidyverse ecosystem, provides a single consistent interface for accessing multiple LLM providers from R.

Key advantages of {ellmer}

  • Unified syntax for OpenAI, Anthropic, Gemini, and other APIs
  • Built-in streaming and function-calling support
  • Structured JSON output parsing for downstream R workflows
  • Compatible with both cloud and local OpenAI-style endpoints

Installation and Setup

install.packages("ellmer")
library(ellmer)

# Set API key (example: OpenAI)
Sys.setenv(OPENAI_API_KEY = "your_api_key_here")

# Create chat connection
chat <- llm_chat(
  provider = "openai",
  model    = "gpt-4o-mini",
  key      = Sys.getenv("OPENAI_API_KEY")
)

Example: Clustering Student Reflections

responses <- c(
  "I learned how to write better R code.",
  "Collaborating with peers improved my understanding.",
  "I struggled with data visualization."
)

prompt <- paste("Cluster these reflections into 3 short themes:\n-",
                paste(responses, collapse = "\n- "))

# Send the clustering prompt through the chat object created above
result <- chat$chat(prompt)
cat(result)
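
Because reflections often feed downstream tidy workflows, {ellmer} can also return structured output instead of free text. A sketch using the structured-chat interface available in recent {ellmer} releases (check your installed version; the type specification below is illustrative):

# Ask for a fixed JSON shape instead of prose (recent {ellmer} releases)
spec <- type_object(
  themes = type_array(items = type_string(), description = "Short theme labels")
)

themes <- chat$chat_structured(prompt, type = spec)
themes$themes   # a character vector, ready for downstream R workflows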

✅ With {ellmer}, swapping providers means changing only the constructor (e.g., chat_openai() to chat_anthropic()) and the model argument; the rest of your R code stays the same.
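
For example, the clustering workflow above could target Anthropic instead, assuming an ANTHROPIC_API_KEY is set; the model name is illustrative and should be checked against current documentation:

# Same workflow, different provider: only the constructor and model change
chat <- chat_anthropic(
  model         = "claude-3-opus-20240229",  # illustrative model name
  system_prompt = "You are an assistant helping analyze educational reflections."
)
result <- chat$chat(prompt)
cat(result)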


5.2.4 Typical Educational Use Cases

  • Automated qualitative coding of survey or interview data (see the sketch after this list)
  • Summarizing student feedback at scale
  • Drafting analytic memos or interpretive summaries
  • Generating or evaluating teaching materials
  • Rapid prototyping for mixed-methods research designs
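
As a sketch of the first use case, the snippet below assigns one code per response with {ellmer}; the category labels and the one-call-per-response design are illustrative choices, not a fixed method:

library(ellmer)

responses <- c(
  "I learned how to write better R code.",
  "Collaborating with peers improved my understanding.",
  "I struggled with data visualization."
)

# A fresh chat per response keeps each code independent of earlier turns
code_response <- function(text) {
  chat <- chat_openai(
    model         = "gpt-4o-mini",
    system_prompt = paste(
      "Label the reflection with exactly one code:",
      "skills, collaboration, or challenges. Reply with the code only."
    )
  )
  chat$chat(text)
}

codes <- vapply(responses, code_response, character(1))
data.frame(response = responses, code = codes)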

5.2.5 Reproducibility and Ethical Considerations

Because cloud services update frequently, reproducibility requires careful documentation:

  • Record model name, version, and query date (see the logging sketch below)
  • De-identify any personal or institutional information before upload
  • Monitor API usage and costs (token-based billing)
  • Disclose clearly how LLMs assisted analysis or interpretation
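
The first point is straightforward to automate. A minimal logging sketch in base R; the file name and fields are just one possible convention:

# Append one row per model call to a provenance log
log_llm_run <- function(model, purpose, file = "llm_run_log.csv") {
  entry <- data.frame(
    timestamp = format(Sys.time(), tz = "UTC", usetz = TRUE),
    model     = model,
    purpose   = purpose
  )
  if (!file.exists(file)) {
    write.csv(entry, file, row.names = FALSE)
  } else {
    write.table(entry, file, sep = ",", row.names = FALSE,
                col.names = FALSE, append = TRUE)
  }
}

log_llm_run("gpt-4o-mini", "theme clustering of student reflections")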

⚖️ Responsible use of cloud LLMs means balancing analytical convenience with ethical data stewardship.


5.2.6 Summary

Cloud-based LLMs provide high-performance, instantly accessible text-analysis capabilities for educational research. They are ideal for large-scale data exploration, qualitative summarization, and prototyping analytical workflows.

Next we shift focus from remote APIs to local LLMs, exploring how offline tools such as LM Studio can achieve similar capabilities while maintaining full data privacy.

5.3 Local LLMs: Privacy-Preserving and Offline Analysis

While cloud models offer convenience, they also raise concerns around data privacy, cost, and IRB compliance. Local LLMs address these challenges by running entirely on your own computer.

5.3.1 What Are Local LLMs?

Local LLMs are open or custom models executed on your own hardware. They process text without sending it to external servers, ensuring full data sovereignty.

Common open-source families: Llama 3, Qwen, DeepSeek, Mistral, gpt-oss.

Key Advantages

  • Data never leaves your device
  • No API keys or internet required
  • Highly customizable and often cost-free
  • Enables fully offline reproducible analysis

5.3.2 Getting Started with LM Studio

LM Studio is a free, cross-platform desktop application for managing and running local LLMs.

It provides a GUI for model downloads, prompt testing, and an optional REST API for automation.

Supported Platforms: macOS (Apple Silicon), Windows (x64/ARM64), Linux (x64)

Docs: lmstudio.ai/docs

Installation Steps

  1. Download LM Studio for your system from the official site.
  2. Install and launch the application.
  3. Download a model such as Llama 3, Qwen, Mistral, or DeepSeek.
  4. (Optional) Enable API access for scripting.
  5. (Optional) Attach local documents to enable offline “Chat with Documents” (RAG mode).

Main Features

Feature              Description
Local LLMs           Run models offline on your own machine
Chat Interface       Simple prompt-based GUI
Document Chat (RAG)  Offline “chat with your PDFs”
Model Management     Search, download, and switch models
API Access           OpenAI-compatible REST endpoints
Community Support    Active Discord and docs

5.3.3 Calling the LM Studio API from R

LM Studio exposes an OpenAI-compatible REST API, so R code can look almost identical to the cloud example:

library(httr2)

prompt <- "Summarize the following open-ended survey responses: ..."

# LM Studio serves an OpenAI-compatible API on localhost; no key is needed
resp <- request("http://localhost:1234/v1/chat/completions") |>
  req_body_json(list(
    messages   = list(list(role = "user", content = prompt)),
    max_tokens = 200
  )) |>
  req_perform()

content <- resp_body_json(resp)
cat(content$choices[[1]]$message$content)

🔐 Because the request stays within your local network, no data ever leaves your computer.
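
Before scripting against the server, it is worth checking that it is reachable and seeing which models are loaded; LM Studio mirrors the OpenAI /v1/models endpoint:

library(httr2)

# List models currently served by the local LM Studio instance
models <- request("http://localhost:1234/v1/models") |>
  req_perform() |>
  resp_body_json()

vapply(models$data, function(m) m$id, character(1))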


5.4 Cloud vs Local LLMs: Choosing the Right Tool

Criterion      Cloud-based LLMs                             Local LLMs
Cost           Pay-per-token or subscription                Free (after hardware)
Privacy        Data sent to provider                        Data stays local
Performance    Highest accuracy & speed                     Depends on hardware
Maintenance    Automatic updates                            Manual model management
Customization  Limited fine-tuning                          Fully modifiable
Best for       Large public datasets or prototype analysis  Sensitive or regulated data

🧭 Many educational researchers prototype analyses on the cloud for speed, then reproduce them locally for privacy and reproducibility.


5.5 Practical Setup Checklist

Before running LLM-based analyses:

  • ✅ Select your preferred model and platform
  • ✅ Configure API key (cloud) or local endpoint (LM Studio)
  • ✅ Test connectivity with a short prompt
  • ✅ Log model name, version, and date
  • ✅ De-identify data and store outputs securely

✨ Following these steps ensures your AI-assisted research remains ethical, reproducible, and IRB-compliant.
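
The connectivity and logging steps can be combined into a quick smoke test. The snippet below assumes the {ellmer} setup from Section 5.2.3, with a commented variant for a local LM Studio endpoint:

library(ellmer)

# Cloud smoke test (requires OPENAI_API_KEY)
chat <- chat_openai(model = "gpt-4o-mini")
chat$chat("Reply with the single word: ready")

# Local variant: point the same interface at LM Studio's server
# chat <- chat_openai(base_url = "http://localhost:1234/v1",
#                     api_key  = "lm-studio")   # key is ignored locally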


5.6 Summary

Both cloud-based and local LLMs enable researchers to integrate generative AI into educational inquiry.

Use Case                     Cloud LLM   Local LLM
Rapid prototyping            ✓
Large-scale text processing  ✓
Sensitive student data                   ✓
Offline analysis                         ✓
Long-term reproducibility                ✓

In short, cloud LLMs excel in convenience and scale, while local LLMs excel in privacy and control. Most projects benefit from combining both.


Looking Ahead

  • Chapter 6 will demonstrate thematic and qualitative text analysis using LM Studio, showing how local LLMs can perform end-to-end qualitative coding and synthesis.
  • Chapter 7 extends this workflow to multimodal data such as images, illustrating how AI can connect diverse data types in educational contexts.