Cloud vs. Local
5.1 Why Large Language Models Matter in Educational Research
If you have ever looked at a pile of open-ended student responses and thought, “I know there is something important in here, but where do I even start?”, then you are exactly the kind of researcher LLMs can help.
Large Language Models (LLMs) have changed how we analyze, generate, and interpret text. Traditional NLP workflows are still valuable, but they usually focus on counts and patterns at the word level. LLMs add something different: they can reason across context, meaning, and discourse.
In educational research, that opens new possibilities:
- Summarizing and coding open-ended reflections
- Analyzing institutional policy documents
- Drafting instructional scaffolds and rubrics
- Connecting qualitative insights with quantitative findings
In short, LLMs expand your toolkit from pattern detection toward context-aware meaning-making.
5.2 Cloud-Based LLMs: Capabilities and Setup
Cloud-based LLMs are the quickest way to get started. You do not need to manage model files or local hardware. You send a request to a provider API, get a response, and continue your workflow.
That makes cloud models especially useful for rapid prototyping, large-scale text processing, and exploratory analysis.
Overview of Major Cloud Providers
The landscape moves fast. As of 2025, these are among the most common cloud options used in educational data analysis:
| Provider | Example Models | Access | R / HTTP Wrapper |
|---|---|---|---|
| OpenAI | GPT-4o · GPT-5 | API key via platform.openai.com | {httr2}, {openai}, {ellmer} |
| Anthropic | Claude 3 Opus · Claude 4 | API key via console.anthropic.com | {httr2}, {anthropic}, {ellmer} |
| Google | Gemini 2.5 · Gemini Ultra | API key via AI Studio | {googleGenerativeAI}, {ellmer} |
| Mistral AI | Mixtral 8×22B · Mistral 7B v0.3 | API key via mistral.ai | HTTP via {httr2} |
| DeepSeek | DeepSeek-V3.1 (MoE, 128K context) | API key via deepseek.ai | HTTP via {httr2} |
Tip: always verify model names, context limits, and pricing before you run a study. Providers update frequently, and those updates can affect reproducibility.
Connecting to a Cloud API from R
Here is a minimal example using the OpenAI API with {httr2}. The same pattern applies to most HTTP-based providers.
```r
library(httr2)

# Build the request: authenticate, attach the JSON body, send
resp <- request("https://api.openai.com/v1/chat/completions") |>
  req_headers(
    "Authorization" = paste("Bearer", Sys.getenv("OPENAI_API_KEY")),
    "Content-Type" = "application/json"
  ) |>
  req_body_json(list(
    model = "gpt-4o-mini",
    messages = list(
      list(role = "system", content = "You are an assistant helping with educational data analysis."),
      list(role = "user", content = "Summarize these student reflections in three themes.")
    )
  )) |>
  req_perform()

# Parse the JSON response and print the model's reply
content <- resp_body_json(resp)
cat(content$choices[[1]]$message$content)
```

Core API pattern: authenticate -> send prompt -> parse response.
Using {ellmer}: One Interface for Many Providers
The {ellmer} package (from the Posit ecosystem) gives you a unified interface across multiple providers. So instead of rewriting your code every time you switch models, you mostly change configuration values.
Why researchers like {ellmer}
- Unified syntax for OpenAI, Anthropic, Gemini, and other APIs
- Built-in streaming and function-calling support
- Structured JSON output parsing for downstream R workflows
- Compatible with both cloud and local OpenAI-style endpoints
Installation and Setup

```r
install.packages("ellmer")
library(ellmer)

# Set the API key (example: OpenAI); in practice, store it in .Renviron
# rather than hard-coding it in a script
Sys.setenv(OPENAI_API_KEY = "your_api_key_here")

# Create a chat connection; ellmer reads OPENAI_API_KEY automatically
chat <- chat_openai(model = "gpt-4o-mini")
```

Example: Clustering Student Reflections
```r
responses <- c(
  "I learned how to write better R code.",
  "Collaborating with peers improved my understanding.",
  "I struggled with data visualization."
)

prompt <- paste("Cluster these reflections into 3 short themes:\n-",
                paste(responses, collapse = "\n- "))

# The system prompt is set when the chat object is created
chat <- chat_openai(
  system_prompt = "You are an assistant helping analyze educational reflections.",
  model = "gpt-4o-mini"
)

result <- chat$chat(prompt)
cat(result)
```

Nice bonus: you can usually switch providers by changing only the constructor (e.g., from `chat_openai()` to the Anthropic or Gemini constructor) and the `model =` argument.
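{ellmer} can also return structured data instead of free text, which is what makes the "structured JSON output parsing" feature useful for downstream R workflows. The method name has varied across releases (`extract_data()` in early versions, `chat_structured()` more recently), so treat this as a sketch to adapt to your installed version:

```r
library(ellmer)

chat <- chat_openai(model = "gpt-4o-mini")

# Declare the shape of the output we want back
theme_spec <- type_object(
  theme    = type_string("A short theme label"),
  evidence = type_string("A quote supporting the theme")
)

# Recent ellmer versions: returns an R list matching theme_spec
coded <- chat$chat_structured(
  "Identify the main theme in: 'Collaborating with peers improved my understanding.'",
  type = theme_spec
)
str(coded)
```

Structured output turns free-text replies into predictable fields you can bind straight into a data frame.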
Typical Educational Use Cases
- Automated qualitative coding of survey or interview data
- Summarizing student feedback at scale
- Drafting analytic memos or interpretive summaries
- Generating or evaluating teaching materials
- Rapid prototyping for mixed-methods research designs
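As a sketch of the first two use cases, a coding prompt can be mapped over many responses with {purrr}. The `code_response()` helper below is illustrative, not a library function, and assumes a `chat` object as in the earlier {ellmer} example:

```r
library(ellmer)
library(purrr)

# Note: an ellmer chat object accumulates conversation history;
# cloning per response keeps each coding independent
base_chat <- chat_openai(model = "gpt-4o-mini")

# Illustrative helper: ask the model for one short code per response
code_response <- function(text) {
  base_chat$clone()$chat(
    paste("Assign one short qualitative code to this reflection:", text)
  )
}

responses <- c(
  "I learned how to write better R code.",
  "I struggled with data visualization."
)

codes <- map_chr(responses, code_response)
data.frame(response = responses, code = codes)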
Reproducibility and Ethical Considerations
Because cloud services change often, careful documentation is essential:
- Record model name, version, and query date
- De-identify any personal or institutional information before upload
- Monitor API usage and costs (token-based billing)
- Disclose clearly how LLMs assisted analysis or interpretation
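One lightweight way to meet the first point is to log run metadata alongside your outputs. This sketch appends entries to a hypothetical `llm_log.csv`:

```r
# Append model name, a run note, and the query date to a run log
log_llm_run <- function(model, note, file = "llm_log.csv") {
  entry <- data.frame(
    date  = as.character(Sys.Date()),
    model = model,
    note  = note
  )
  write.table(entry, file, sep = ",",
              append    = file.exists(file),
              col.names = !file.exists(file),
              row.names = FALSE)
}

log_llm_run("gpt-4o-mini", "clustered student reflections, prompt v2")
```

A plain CSV like this is easy to version-control next to your analysis scripts.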
Responsible AI use means balancing convenience with privacy, transparency, and data stewardship.
Summary
Cloud LLMs are fast, powerful, and easy to start with. For many projects, they are the best environment for exploration and iteration.
Next, we turn to local LLMs, where privacy and institutional control become the priority.
5.3 Local LLMs: Privacy-Preserving and Offline Analysis
Cloud models are convenient, but they can raise real concerns around privacy, cost, and IRB compliance. Local LLMs address these concerns by running directly on your machine.
What Are Local LLMs?
Local LLMs are open or custom models executed on local hardware. They process text without sending data to external servers, giving you full data sovereignty.
Common open-source families: Llama 3, Qwen, DeepSeek, Mistral, gpt-oss.
Key Advantages
- Data never leaves your device
- No API keys or internet required
- Highly customizable and often cost-free
- Enables fully offline reproducible analysis
Getting Started with LM Studio
LM Studio is a free cross-platform desktop app for running and managing local LLMs. It gives you a GUI for downloading models, testing prompts, and (optionally) exposing a REST API for automation.
Supported Platforms: macOS (Apple Silicon), Windows (x64/ARM64), Linux (x64)
Docs: lmstudio.ai/docs
Installation Steps
- Download LM Studio for your system from the official site.
- Install and launch the application.
- Download a model such as Llama 3, Qwen, Mistral, or DeepSeek.
- (Optional) Enable API access for scripting.
- (Optional) Attach local documents to enable offline “Chat with Documents” (RAG mode).
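Once the optional API server is enabled, you can confirm that R can reach it by listing the locally available models. This assumes LM Studio's default port 1234:

```r
library(httr2)

# LM Studio exposes an OpenAI-compatible endpoint;
# /v1/models lists whatever models are available locally
models <- request("http://localhost:1234/v1/models") |>
  req_perform() |>
  resp_body_json()

sapply(models$data, function(m) m$id)
```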
Main Features
| Feature | Description |
|---|---|
| Local LLMs | Run models offline on your own machine |
| Chat Interface | Simple prompt-based GUI |
| Document Chat (RAG) | Offline “chat with your PDFs” |
| Model Management | Search, download, and switch models |
| API Access | OpenAI-compatible REST endpoints |
| Community Support | Active Discord and docs |
Calling the LM Studio API from R
LM Studio provides an OpenAI-compatible API, so your R code can look very similar to cloud-based examples:
```r
library(httr2)

prompt <- "Summarize the following open-ended survey responses: ..."

# Same request pattern as the cloud examples, but pointed at localhost
resp <- request("http://localhost:1234/v1/completions") |>
  req_body_json(list(prompt = prompt, max_tokens = 200)) |>
  req_perform()

resp_body_json(resp)
```

Because requests stay local, sensitive data does not leave your environment.
5.4 Cloud vs. Local LLMs: Choosing the Right Tool
| Criterion | Cloud-based LLMs | Local LLMs |
|---|---|---|
| Cost | Pay-per-token or subscription | Free after hardware costs |
| Privacy | Data sent to provider | Data stays local |
| Performance | Typically highest accuracy and speed | Depends on local hardware |
| Maintenance | Automatic updates | Manual model management |
| Customization | Limited fine-tuning | Fully modifiable |
| Best for | Large public datasets or prototype analysis | Sensitive or regulated data |
A common workflow is hybrid: prototype quickly in the cloud, then reproduce locally for privacy and auditability.
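Because LM Studio speaks the OpenAI protocol, the hybrid workflow can be as simple as parameterizing the base URL. The `ask_llm()` helper below is illustrative, not a library function:

```r
library(httr2)

# Same request code for cloud and local: only base_url and key change
ask_llm <- function(prompt, base_url, model, key = NULL) {
  req <- request(paste0(base_url, "/chat/completions")) |>
    req_body_json(list(
      model = model,
      messages = list(list(role = "user", content = prompt))
    ))
  if (!is.null(key)) {
    req <- req |> req_headers(Authorization = paste("Bearer", key))
  }
  resp_body_json(req_perform(req))$choices[[1]]$message$content
}

# Prototype in the cloud...
# ask_llm(p, "https://api.openai.com/v1", "gpt-4o-mini", Sys.getenv("OPENAI_API_KEY"))
# ...then reproduce locally (model name depends on what you loaded)
# ask_llm(p, "http://localhost:1234/v1", "llama-3-8b-instruct")
```

Keeping one request function makes the cloud-to-local migration a configuration change rather than a rewrite.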
5.5 Practical Setup Checklist
Before running LLM-based analyses, check the following:
- ✅ Select your preferred model and platform
- ✅ Configure API key (cloud) or local endpoint (LM Studio)
- ✅ Test connectivity with a short prompt
- ✅ Log model name, version, and date
- ✅ De-identify data and store outputs securely
This small checklist can save you from major reproducibility and compliance problems later.
5.6 Summary
Both cloud and local LLMs can support strong educational research. The right choice depends on your project constraints.
| Use Case | Cloud LLM | Local LLM |
|---|---|---|
| Rapid prototyping | ✅ | ⚪ |
| Large-scale text processing | ✅ | ⚪ |
| Sensitive student data | ⚪ | ✅ |
| Offline analysis | ⚪ | ✅ |
| Long-term reproducibility | ⚪ | ✅ |
In short: cloud LLMs excel at convenience and scale; local LLMs excel at privacy and control. Most research programs benefit from using both strategically.
Looking Ahead
- Chapter 7 demonstrates thematic and qualitative text analysis with LM Studio, including end-to-end coding and synthesis.
- Chapter 8 extends this workflow to multimodal data (images), showing how AI can connect diverse forms of evidence in educational research.