LLM Methods

5.1 Why Large Language Models Matter in Educational Research

Large Language Models (LLMs) have transformed how researchers analyze, generate, and interpret text. Unlike traditional NLP pipelines that rely on token counts and surface patterns, LLMs reason over context, semantics, and discourse structure.

In educational research, this means:

  • Summarizing and coding open-ended student reflections
  • Analyzing institutional policy documents
  • Generating scaffolds or rubrics for teaching materials
  • Synthesizing qualitative and quantitative findings into narratives

LLMs thus expand the researcher’s computational toolkit—from statistical pattern recognition to context-aware meaning-making.


5.2 Cloud-based LLMs: Capabilities and Setup

Cloud-based Large Language Models (LLMs) offer access to state-of-the-art generative and analytical capabilities without requiring local hardware or model management. They run on remote servers and are accessed via APIs—making them ideal for rapid prototyping, large-scale text processing, and exploratory analyses in educational research.


5.2.1 Overview of Major Cloud Providers

The landscape of LLMs evolves quickly. As of 2025, the following providers represent the most common cloud-based options for educational data analysis.

Provider     Example Models                    Access                              R / HTTP Wrapper
OpenAI       GPT-4o / GPT-5                    API key via platform.openai.com     {httr2}, {openai}, {ellmer}
Anthropic    Claude 3 Opus · Claude 4          API key via console.anthropic.com   {httr2}, {anthropic}, {ellmer}
Google       Gemini 2.5 · Gemini Ultra (2025)  AI Studio                           {googleGenerativeAI}, {ellmer}
Mistral AI   Mixtral 8×22B · Mistral 7B v0.3   mistral.ai                          HTTP via {httr2}
DeepSeek     DeepSeek-V3.1 (MoE, 128K ctx)     API via deepseek.ai                 HTTP via {httr2}

💡 Tip: Always confirm the latest model names, context-window sizes, and pricing tiers on each provider’s documentation page. Many vendors update models quarterly, affecting reproducibility.


5.2.2 Connecting to a Cloud API from R

Below is a minimal reproducible example using the OpenAI API through the {httr2} package. The same structure applies to most HTTP-based providers.

library(httr2)

# Build the request: authenticate with the API key stored in the environment,
# then send a chat-completion payload (req_body_json sets the JSON content type)
resp <- request("https://api.openai.com/v1/chat/completions") |>
  req_headers(
    "Authorization" = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))
  ) |>
  req_body_json(list(
    model = "gpt-4o-mini",
    messages = list(
      list(role = "system", content = "You are an assistant helping with educational data analysis."),
      list(role = "user",   content = "Summarize these student reflections in three themes.")
    )
  )) |>
  req_perform()

# Parse the JSON response and print the model's reply
content <- resp_body_json(resp)
cat(content$choices[[1]]$message$content)

🧩 This pattern underlies all API workflows: (1) authenticate → (2) send prompt → (3) receive structured response.
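
To reuse this pattern across analyses, it can be wrapped in a small helper. Below is a minimal sketch assuming the same endpoint and environment variable as above; ask_openai() is an illustrative name, not a package function.

library(httr2)

# Illustrative helper (not a package function) wrapping the pattern:
# authenticate -> send prompt -> receive structured response
ask_openai <- function(prompt,
                       model  = "gpt-4o-mini",
                       system = "You are an assistant helping with educational data analysis.") {
  resp <- request("https://api.openai.com/v1/chat/completions") |>
    req_headers("Authorization" = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))) |>
    req_body_json(list(
      model = model,
      messages = list(
        list(role = "system", content = system),
        list(role = "user",   content = prompt)
      )
    )) |>
    req_perform()

  # Return only the assistant's reply text
  resp_body_json(resp)$choices[[1]]$message$content
}

# ask_openai("Summarize these student reflections in three themes.")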


5.2.3 Using the {ellmer} Package — A Unified LLM Interface for R

The {ellmer} package, developed within the Posit/Tidyverse ecosystem, provides a single consistent interface for accessing multiple LLM providers from R.

Key advantages of {ellmer}

  • Unified syntax for OpenAI, Anthropic, Gemini, and other APIs
  • Built-in streaming and function-calling support
  • Structured JSON output parsing for downstream R workflows
  • Compatible with both cloud and local OpenAI-style endpoints

Installation and Setup

install.packages("ellmer")
library(ellmer)

# Set API key (example: OpenAI)
Sys.setenv(OPENAI_API_KEY = "your_api_key_here")

# Create chat connection
chat <- llm_chat(
  provider = "openai",
  model    = "gpt-4o-mini",
  key      = Sys.getenv("OPENAI_API_KEY")
)

Example: Clustering Student Reflections

responses <- c(
  "I learned how to write better R code.",
  "Collaborating with peers improved my understanding.",
  "I struggled with data visualization."
)

prompt <- paste("Cluster these reflections into 3 short themes:\n-",
                paste(responses, collapse = "\n- "))

# Send the clustering prompt through the chat object created above
result <- chat$chat(prompt)
cat(result)
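
Because reflections often feed downstream tidy workflows, {ellmer} can also return structured output instead of free text. A sketch using the structured-chat interface available in recent {ellmer} releases (check your installed version; the type specification below is illustrative):

# Ask for a fixed JSON shape instead of prose (recent {ellmer} releases)
spec <- type_object(
  themes = type_array(items = type_string(), description = "Short theme labels")
)

themes <- chat$chat_structured(prompt, type = spec)
themes$themes   # a character vector, ready for downstream R workflows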

✅ With {ellmer}, swapping providers means changing only the constructor (e.g., chat_openai() to chat_anthropic()) and the model argument; the rest of your R code stays the same.
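
For example, the clustering workflow above could target Anthropic instead, assuming an ANTHROPIC_API_KEY is set; the model name is illustrative and should be checked against current documentation:

# Same workflow, different provider: only the constructor and model change
chat <- chat_anthropic(
  model         = "claude-3-opus-20240229",  # illustrative model name
  system_prompt = "You are an assistant helping analyze educational reflections."
)
result <- chat$chat(prompt)
cat(result)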


5.2.4 Typical Educational Use Cases

  • Automated qualitative coding of survey or interview data (see the sketch after this list)
  • Summarizing student feedback at scale
  • Drafting analytic memos or interpretive summaries
  • Generating or evaluating teaching materials
  • Rapid prototyping for mixed-methods research designs
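
As a sketch of the first use case, the snippet below assigns one code per response with {ellmer}; the category labels and the one-call-per-response design are illustrative choices, not a fixed method:

library(ellmer)

responses <- c(
  "I learned how to write better R code.",
  "Collaborating with peers improved my understanding.",
  "I struggled with data visualization."
)

# A fresh chat per response keeps each code independent of earlier turns
code_response <- function(text) {
  chat <- chat_openai(
    model         = "gpt-4o-mini",
    system_prompt = paste(
      "Label the reflection with exactly one code:",
      "skills, collaboration, or challenges. Reply with the code only."
    )
  )
  chat$chat(text)
}

codes <- vapply(responses, code_response, character(1))
data.frame(response = responses, code = codes)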

5.2.5 Reproducibility and Ethical Considerations

Because cloud services update frequently, reproducibility requires careful documentation:

  • Record model name, version, and query date (see the logging sketch below)
  • De-identify any personal or institutional information before upload
  • Monitor API usage and costs (token-based billing)
  • Disclose clearly how LLMs assisted analysis or interpretation
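
The first point is straightforward to automate. A minimal logging sketch in base R; the file name and fields are just one possible convention:

# Append one row per model call to a provenance log
log_llm_run <- function(model, purpose, file = "llm_run_log.csv") {
  entry <- data.frame(
    timestamp = format(Sys.time(), tz = "UTC", usetz = TRUE),
    model     = model,
    purpose   = purpose
  )
  if (!file.exists(file)) {
    write.csv(entry, file, row.names = FALSE)
  } else {
    write.table(entry, file, sep = ",", row.names = FALSE,
                col.names = FALSE, append = TRUE)
  }
}

log_llm_run("gpt-4o-mini", "theme clustering of student reflections")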

⚖️ Responsible use of cloud LLMs means balancing analytical convenience with ethical data stewardship.


5.2.6 Summary

Cloud-based LLMs provide high-performance, instantly accessible text-analysis capabilities for educational research. They are ideal for large-scale data exploration, qualitative summarization, and prototyping analytical workflows.

Next we shift focus from remote APIs to local LLMs, exploring how offline tools such as LM Studio can achieve similar capabilities while maintaining full data privacy.

5.3 Local LLMs: Privacy-Preserving and Offline Analysis

While cloud models offer convenience, they also raise concerns around data privacy, cost, and IRB compliance. Local LLMs address these challenges by running entirely on your own computer.

5.3.1 What Are Local LLMs?

Local LLMs are open or custom models executed on your own hardware. They process text without sending it to external servers, ensuring full data sovereignty.

Common open-source families: Llama 3, Qwen, DeepSeek, Mistral, gpt-oss.

Key Advantages

  • Data never leaves your device
  • No API keys or internet required
  • Highly customizable and often cost-free
  • Enables fully offline reproducible analysis

5.3.2 Getting Started with LM Studio

LM Studio is a free, cross-platform desktop application for managing and running local LLMs.

It provides a GUI for model downloads, prompt testing, and an optional REST API for automation.

Supported Platforms: macOS (Apple Silicon), Windows (x64/ARM64), Linux (x64)

Docs: lmstudio.ai/docs

Installation Steps

  1. Download LM Studio for your system from the official site.
  2. Install and launch the application.
  3. Download a model such as Llama 3, Qwen, Mistral, or DeepSeek.
  4. (Optional) Enable API access for scripting.
  5. (Optional) Attach local documents to enable offline “Chat with Documents” (RAG mode).

Main Features

Feature              Description
Local LLMs           Run models offline on your own machine
Chat Interface       Simple prompt-based GUI
Document Chat (RAG)  Offline “chat with your PDFs”
Model Management     Search, download, and switch models
API Access           OpenAI-compatible REST endpoints
Community Support    Active Discord and docs

5.3.3 Calling the LM Studio API from R

LM Studio exposes an OpenAI-compatible REST API, so R code can look almost identical to the cloud example:

library(httr2)

prompt <- "Summarize the following open-ended survey responses: ..."

# LM Studio serves an OpenAI-compatible API on localhost; no key is needed
resp <- request("http://localhost:1234/v1/chat/completions") |>
  req_body_json(list(
    messages   = list(list(role = "user", content = prompt)),
    max_tokens = 200
  )) |>
  req_perform()

content <- resp_body_json(resp)
cat(content$choices[[1]]$message$content)

🔐 Because the request stays within your local network, no data ever leaves your computer.
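
Before scripting against the server, it is worth checking that it is reachable and seeing which models are loaded; LM Studio mirrors the OpenAI /v1/models endpoint:

library(httr2)

# List models currently served by the local LM Studio instance
models <- request("http://localhost:1234/v1/models") |>
  req_perform() |>
  resp_body_json()

vapply(models$data, function(m) m$id, character(1))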


5.4 Cloud vs Local LLMs: Choosing the Right Tool

Criterion      Cloud-based LLMs                             Local LLMs
Cost           Pay-per-token or subscription                Free (after hardware)
Privacy        Data sent to provider                        Data stays local
Performance    Highest accuracy & speed                     Depends on hardware
Maintenance    Automatic updates                            Manual model management
Customization  Limited fine-tuning                          Fully modifiable
Best for       Large public datasets or prototype analysis  Sensitive or regulated data

🧭 Many educational researchers prototype analyses on the cloud for speed, then reproduce them locally for privacy and reproducibility.


5.5 Practical Setup Checklist

Before running LLM-based analyses:

  • ✅ Select your preferred model and platform
  • ✅ Configure API key (cloud) or local endpoint (LM Studio)
  • ✅ Test connectivity with a short prompt
  • ✅ Log model name, version, and date
  • ✅ De-identify data and store outputs securely

✨ Following these steps ensures your AI-assisted research remains ethical, reproducible, and IRB-compliant.
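
The connectivity and logging steps can be combined into a quick smoke test. The snippet below assumes the {ellmer} setup from Section 5.2.3, with a commented variant for a local LM Studio endpoint:

library(ellmer)

# Cloud smoke test (requires OPENAI_API_KEY)
chat <- chat_openai(model = "gpt-4o-mini")
chat$chat("Reply with the single word: ready")

# Local variant: point the same interface at LM Studio's server
# chat <- chat_openai(base_url = "http://localhost:1234/v1",
#                     api_key  = "lm-studio")   # key is ignored locally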


5.6 Summary

Both cloud-based and local LLMs enable researchers to integrate generative AI into educational inquiry.

Use Case                     Cloud LLM   Local LLM
Rapid prototyping            ✓
Large-scale text processing  ✓
Sensitive student data                   ✓
Offline analysis                         ✓
Long-term reproducibility                ✓

In short, cloud LLMs excel in convenience and scale, while local LLMs excel in privacy and control. Most projects benefit from combining both.


Looking Ahead

  • Chapter 6 will demonstrate thematic and qualitative text analysis using LM Studio, showing how local LLMs can perform end-to-end qualitative coding and synthesis.
  • Chapter 7 extends this workflow to multimodal data such as images, illustrating how AI can connect diverse data types in educational contexts.