Cloud vs. Local LLMs

5.1 Why Large Language Models Matter in Educational Research

If you have ever looked at a pile of open-ended student responses and thought, “I know there is something important in here, but where do I even start?”, you are exactly the kind of researcher LLMs can help.

Large Language Models (LLMs) have changed how we analyze, generate, and interpret text. Traditional NLP workflows are still valuable, but they usually focus on counts and patterns at the word level. LLMs add something different: they can reason across context, meaning, and discourse.

In educational research, that opens new possibilities:

  • Summarizing and coding open-ended reflections
  • Analyzing institutional policy documents
  • Drafting instructional scaffolds and rubrics
  • Connecting qualitative insights with quantitative findings

In short, LLMs expand your toolkit from pattern detection toward context-aware meaning-making.

5.2 Cloud-Based LLMs: Capabilities and Setup

Cloud-based LLMs are the quickest way to get started. You do not need to manage model files or local hardware. You send a request to a provider API, get a response, and continue your workflow.

That makes cloud models especially useful for rapid prototyping, large-scale text processing, and exploratory analysis.

Overview of Major Cloud Providers

The landscape moves fast. As of 2025, these are among the most common cloud options used in educational data analysis:

| Provider | Example Models | Access | R / HTTP Wrapper |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-5 | API key via platform.openai.com | {httr2}, {openai}, {ellmer} |
| Anthropic | Claude 3 Opus, Claude 4 | API key via console.anthropic.com | {httr2}, {anthropic}, {ellmer} |
| Google | Gemini 2.5, Gemini Ultra (2025) | AI Studio | {googleGenerativeAI}, {ellmer} |
| Mistral AI | Mixtral 8×22B, Mistral 7B v0.3 | mistral.ai | HTTP via {httr2} |
| DeepSeek | DeepSeek-V3.1 (MoE, 128K context) | API via deepseek.ai | HTTP via {httr2} |

Tip: always verify model names, context limits, and pricing before you run a study. Providers update frequently, and those updates can affect reproducibility.

Connecting to a Cloud API from R

Here is a minimal example using the OpenAI API with {httr2}. The same pattern applies to most HTTP-based providers.

library(httr2)
library(jsonlite)

resp <- request("https://api.openai.com/v1/chat/completions") |>
  req_headers(
    "Authorization" = paste("Bearer", Sys.getenv("OPENAI_API_KEY")),
    "Content-Type"  = "application/json"
  ) |>
  req_body_json(list(
    model = "gpt-4o-mini",
    messages = list(
      list(role = "system", content = "You are an assistant helping with educational data analysis."),
      list(role = "user",   content = "Summarize these student reflections in three themes.")
    )
  )) |>
  req_perform()

content <- resp_body_json(resp)
cat(content$choices[[1]]$message$content)

Core API pattern: authenticate -> send prompt -> parse response.
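That three-step pattern is worth wrapping in a reusable helper so every analysis script does not repeat the boilerplate. The sketch below is illustrative, not an official client: `build_chat_body()` and `openai_chat()` are hypothetical names, and the endpoint and response fields follow the OpenAI Chat Completions format used above.

```r
# Build the JSON body for a chat request (pure function, easy to test).
build_chat_body <- function(user_prompt,
                            system_prompt = "You are a helpful assistant.",
                            model = "gpt-4o-mini") {
  list(
    model = model,
    messages = list(
      list(role = "system", content = system_prompt),
      list(role = "user",   content = user_prompt)
    )
  )
}

# Authenticate -> send prompt -> parse response, with simple retries.
openai_chat <- function(user_prompt, ...) {
  httr2::request("https://api.openai.com/v1/chat/completions") |>
    httr2::req_headers(
      "Authorization" = paste("Bearer", Sys.getenv("OPENAI_API_KEY")),
      "Content-Type"  = "application/json"
    ) |>
    httr2::req_body_json(build_chat_body(user_prompt, ...)) |>
    httr2::req_retry(max_tries = 3) |>   # retry transient failures
    httr2::req_perform() |>
    httr2::resp_body_json() |>
    (\(x) x$choices[[1]]$message$content)()
}
```

Separating the pure body-builder from the network call makes the request format testable without spending tokens.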

Using {ellmer}: One Interface for Many Providers

The {ellmer} package (from the Posit ecosystem) gives you a unified interface across multiple providers. So instead of rewriting your code every time you switch models, you mostly change configuration values.

Why researchers like {ellmer}

  • Unified syntax for OpenAI, Anthropic, Gemini, and other APIs
  • Built-in streaming and function-calling support
  • Structured JSON output parsing for downstream R workflows
  • Compatible with both cloud and local OpenAI-style endpoints

Installation and Setup

install.packages("ellmer")
library(ellmer)

# Store your API key in .Renviron rather than in scripts; ellmer reads
# OPENAI_API_KEY from the environment automatically.
# Sys.setenv(OPENAI_API_KEY = "your_api_key_here")  # quick testing only

# Create a chat object (one chat_*() constructor per provider)
chat <- chat_openai(
  model         = "gpt-4o-mini",
  system_prompt = "You are an assistant helping with educational data analysis."
)

Example: Clustering Student Reflections

responses <- c(
  "I learned how to write better R code.",
  "Collaborating with peers improved my understanding.",
  "I struggled with data visualization."
)

prompt <- paste("Cluster these reflections into 3 short themes:\n-",
                paste(responses, collapse = "\n- "))

# The system prompt is set on the chat object itself;
# chat$chat() sends the user message and returns the reply text.
result <- chat$chat(prompt)
cat(result)

Nice bonus: switching providers usually means swapping the constructor (for example, the Claude or Gemini equivalent of chat_openai()) and updating model =.

Typical Educational Use Cases

  • Automated qualitative coding of survey or interview data
  • Summarizing student feedback at scale
  • Drafting analytic memos or interpretive summaries
  • Generating or evaluating teaching materials
  • Rapid prototyping for mixed-methods research designs
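The first two use cases above share a common shape: many short texts, coded or summarized in batches to stay within context limits. A minimal sketch, assuming any chat helper from earlier in the chapter (passed in here as the hypothetical `ask` function):

```r
# Split a vector of responses into batches of `size` (pure helper).
make_batches <- function(x, size = 20) {
  split(x, ceiling(seq_along(x) / size))
}

# Sketch: code each batch with one LLM call. `ask` stands in for any
# chat function (cloud or local) that takes a prompt and returns text.
code_responses <- function(responses, ask, size = 20) {
  batches <- make_batches(responses, size)
  vapply(batches, function(b) {
    prompt <- paste(
      "Assign one short qualitative code to each reflection:\n-",
      paste(b, collapse = "\n- ")
    )
    ask(prompt)
  }, character(1))
}
```

Batching keeps prompts small, makes failures recoverable per batch, and gives you a natural unit for spot-checking the model's codes against human judgment.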

Reproducibility and Ethical Considerations

Because cloud services change often, careful documentation is essential:

  • Record model name, version, and query date
  • De-identify any personal or institutional information before upload
  • Monitor API usage and costs (token-based billing)
  • Disclose clearly how LLMs assisted analysis or interpretation
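The first documentation step can be automated so it never gets skipped. A minimal sketch, assuming a hypothetical `log_llm_run()` helper and an arbitrary CSV file name:

```r
# Append one row of run metadata to a CSV log (base R only).
log_llm_run <- function(provider, model, purpose,
                        file = "llm_run_log.csv") {
  entry <- data.frame(
    date     = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    provider = provider,
    model    = model,
    purpose  = purpose,
    stringsAsFactors = FALSE
  )
  write.table(entry, file,
              sep = ",", row.names = FALSE,
              col.names = !file.exists(file),  # write header only once
              append = file.exists(file))
  invisible(entry)
}
```

Calling this once per analysis run gives you a dated audit trail you can cite when describing methods.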

Responsible AI use means balancing convenience with privacy, transparency, and data stewardship.

Summary

Cloud LLMs are fast, powerful, and easy to start with. For many projects, they are the best environment for exploration and iteration.

Next, we turn to local LLMs, where privacy and institutional control become the priority.

5.3 Local LLMs: Privacy-Preserving and Offline Analysis

Cloud models are convenient, but they can raise real concerns around privacy, cost, and IRB compliance. Local LLMs address these concerns by running directly on your machine.

What Are Local LLMs?

Local LLMs are open or custom models executed on local hardware. They process text without sending data to external servers, giving you full data sovereignty.

Common open-source families: Llama 3, Qwen, DeepSeek, Mistral, gpt-oss.

Key Advantages

  • Data never leaves your device
  • No API keys or internet required
  • Highly customizable and often cost-free
  • Enables fully offline reproducible analysis

Getting Started with LM Studio

LM Studio is a free cross-platform desktop app for running and managing local LLMs. It gives you a GUI for downloading models, testing prompts, and (optionally) exposing a REST API for automation.

Supported Platforms: macOS (Apple Silicon), Windows (x64/ARM64), Linux (x64)

Docs: lmstudio.ai/docs

Installation Steps

  1. Download LM Studio for your system from the official site.
  2. Install and launch the application.
  3. Download a model such as Llama 3, Qwen, Mistral, or DeepSeek.
  4. (Optional) Enable API access for scripting.
  5. (Optional) Attach local documents to enable offline “Chat with Documents” (RAG mode).
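Once the API is enabled (step 4), it helps to verify from R that the server is up before scripting against it. A sketch, assuming LM Studio's default port 1234 and its OpenAI-compatible /v1/models route; the function names are illustrative:

```r
# Build a URL against the local server (pure helper).
lmstudio_url <- function(path, base = "http://localhost:1234") {
  paste0(base, path)
}

# TRUE if the local LM Studio server answers, FALSE otherwise.
lmstudio_alive <- function(base = "http://localhost:1234") {
  tryCatch({
    resp <- httr2::req_perform(httr2::request(lmstudio_url("/v1/models", base)))
    httr2::resp_status(resp) == 200
  }, error = function(e) FALSE)
}
```

A check like this at the top of a script gives a clear failure message instead of a cryptic connection error mid-analysis.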

Main Features

| Feature | Description |
|---|---|
| Local LLMs | Run models offline on your own machine |
| Chat Interface | Simple prompt-based GUI |
| Document Chat (RAG) | Offline “chat with your PDFs” |
| Model Management | Search, download, and switch models |
| API Access | OpenAI-compatible REST endpoints |
| Community Support | Active Discord and docs |

Calling the LM Studio API from R

LM Studio provides an OpenAI-compatible API, so your R code can look very similar to cloud-based examples:

library(httr2)

prompt <- "Summarize the following open-ended survey responses: ..."

resp <- request("http://localhost:1234/v1/chat/completions") |>
  req_body_json(list(
    model = "local-model",  # LM Studio serves the currently loaded model;
                            # check the server tab for its exact identifier
    messages = list(list(role = "user", content = prompt)),
    max_tokens = 200
  )) |>
  req_perform()

content <- resp_body_json(resp)
cat(content$choices[[1]]$message$content)

Because requests stay local, sensitive data does not leave your environment.


5.4 Cloud vs. Local LLMs: Choosing the Right Tool

| Criterion | Cloud-based LLMs | Local LLMs |
|---|---|---|
| Cost | Pay-per-token or subscription | Free (after hardware) |
| Privacy | Data sent to provider | Data stays local |
| Performance | Highest accuracy & speed | Depends on hardware |
| Maintenance | Automatic updates | Manual model management |
| Customization | Limited fine-tuning | Fully modifiable |
| Best for | Large public datasets or prototype analysis | Sensitive or regulated data |

A common workflow is hybrid: prototype quickly in the cloud, then reproduce locally for privacy and auditability.
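Because local servers such as LM Studio expose an OpenAI-style API, the hybrid workflow can come down to switching a base URL. A sketch, assuming a hypothetical `llm_endpoint()` helper and LM Studio's default port:

```r
# Return connection details for one OpenAI-style backend.
llm_endpoint <- function(backend = c("cloud", "local")) {
  backend <- match.arg(backend)
  if (backend == "cloud") {
    list(base_url = "https://api.openai.com/v1",
         api_key  = Sys.getenv("OPENAI_API_KEY"))
  } else {
    list(base_url = "http://localhost:1234/v1",
         api_key  = "not-needed")  # local servers typically ignore the key
  }
}
```

The rest of the analysis code stays identical; only the endpoint returned by this function changes between prototyping and the privacy-preserving rerun.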


5.5 Practical Setup Checklist

Before running LLM-based analyses, check the following:

  • ✅ Select your preferred model and platform
  • ✅ Configure API key (cloud) or local endpoint (LM Studio)
  • ✅ Test connectivity with a short prompt
  • ✅ Log model name, version, and date
  • ✅ De-identify data and store outputs securely

This small checklist can save you from major reproducibility and compliance problems later.
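The de-identification step in the checklist can be partially automated. The sketch below masks only two obvious patterns (emails and long digit runs such as student IDs); the regexes and the `redact_pii()` name are illustrative, and real projects need human review on top of any automated pass:

```r
# Minimal de-identification pass before text leaves your environment.
redact_pii <- function(x) {
  x <- gsub("[[:alnum:]._-]+@[[:alnum:].-]+", "[EMAIL]", x)  # email addresses
  gsub("\\b[0-9]{6,}\\b", "[ID]", x)                          # long ID numbers
}
```

Running every batch through a filter like this, and logging that you did so, makes the privacy claim in your methods section verifiable.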


5.6 Summary

Both cloud and local LLMs can support strong educational research. The right choice depends on your project constraints.

| Use Case | Cloud LLM | Local LLM |
|---|---|---|
| Rapid prototyping | ✓ | |
| Large-scale text processing | ✓ | |
| Sensitive student data | | ✓ |
| Offline analysis | | ✓ |
| Long-term reproducibility | | ✓ |

In short: cloud LLMs excel at convenience and scale; local LLMs excel at privacy and control. Most research programs benefit from using both strategically.


Looking Ahead

  • Chapter 7 demonstrates thematic and qualitative text analysis with LM Studio, including end-to-end coding and synthesis.
  • Chapter 8 extends this workflow to multimodal data (images), showing how AI can connect diverse forms of evidence in educational research.