LLM Methods

The (Responsible) Use of LLMs in Data Science

As we integrate large language models into data analysis, our responsibilities as researchers evolve in important ways. There are three primary approaches to this integration:

  • first, using LLMs for code autocompletion and inline coding assistance, such as through integrated development environments like Positron;

  • second, using LLMs to support the analysis process through complete code generation and supervised execution, such as with tools like Posit Databot; and

  • third, deploying LLMs to directly analyze qualitative data (e.g., text).

The Responsible Use Framework

The responsible use framework for LLMs in research, presented by Joe Cheng at the R+AI Conference 2025 (Cheng 2025), can be used as a practical evaluation tool centered on three essential criteria. Correctness refers to whether the LLM produces accurate and reliable results that can be verified and trusted. Transparency addresses whether the process and reasoning behind the AI’s outputs are visible and understandable to researchers, allowing them to inspect how conclusions were reached. Reproducibility concerns whether the same analysis can be repeated and yield consistent results, a foundational requirement of scientific research. These three categories align with broader responsible AI governance principles commonly found in organizational frameworks, but they are specifically tailored to the research context, where verifiability and scientific rigor are key. For an LLM application in research to be considered responsibly used, it should ideally achieve “yes” answers across all three criteria, ensuring that the technology enhances rather than compromises research integrity.

Applying the Responsible Use Framework

When we examine the first two usage types (i.e., code autocompletion and code generation) through the responsible use framework, we find encouraging alignment with the three essential criteria of correctness, transparency, and reproducibility. Code-generating and code-assisting LLMs produce verifiable output that researchers can inspect, test, and validate before execution, ensuring correctness through human review. The process maintains transparency because the generated code itself is visible and interpretable, allowing researchers to understand exactly what analytical steps are being performed. Reproducibility is achieved because the same code can be run multiple times on the same data to yield consistent results, and the code can be shared alongside research findings.
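The reproducibility criterion can be illustrated with a minimal sketch: a seeded analysis script, run twice over the same data, produces identical results regardless of who executes it. The bootstrap analysis below is a hypothetical example, not taken from any specific chapter.

```python
import random
import statistics

def bootstrap_mean_ci(data, n_boot=1000, seed=42):
    """Bootstrap a 95% CI for the mean; the fixed seed makes the result reproducible."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]

data = [2.1, 3.4, 2.8, 3.9, 2.5, 3.1, 2.9, 3.6]
run1 = bootstrap_mean_ci(data)
run2 = bootstrap_mean_ci(data)
assert run1 == run2  # identical across runs: the analysis is reproducible
```

Because the generated code is plain text, it can be shared alongside the findings, and any reader can rerun it to obtain the same interval. An LLM chat session, by contrast, offers no equivalent guarantee.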

In contrast, the third approach, in which the LLM itself directly analyzes qualitative text data inside a black box, raises critical questions about research integrity. Using LLMs for this purpose presents inherent challenges to these principles: they are notorious for generating convincing but incorrect answers, they operate as black boxes with limited transparency, and they lack reproducibility because of their non-deterministic nature. When we apply the framework to direct text analysis by LLMs, significant concerns emerge: we cannot guarantee correctness, the process lacks transparency, and reproducibility remains uncertain at best.

Achieving Responsible Usage Through Evidence-Based Results

However, for the third type of usage, where LLMs directly analyze data (as we do in Chapters 7 and 8), we can work toward achieving “yes” answers on all three framework criteria (or as many as possible) by requiring LLMs to produce evidence-based results. As we cover in Chapter 7, it is possible to use a local LLM that is connected to R/Positron. In this approach, the researcher retains control over the data analysis rather than simply pasting text into an LLM’s chat box. For example, prompts can be structured to demand transparent, verifiable outputs: requiring the LLM to identify themes, provide verbatim quotes from the source data, report frequencies with counts and percentages, and present findings in standardized tabular formats. In this way, we transform the black box into a more transparent analytical tool.

This approach ensures correctness by grounding every claim in specific textual evidence that researchers can verify, enhances transparency by making the analytical reasoning traceable through quoted examples and quantitative metrics, and improves reproducibility by standardizing the output format and maintaining clear documentation of the analytical process. When LLMs are constrained to cite their sources, quantify their observations, and structure their findings systematically, they shift from opaque pattern generators to accountable research assistants whose work can be validated against the original data. This methodology aligns with the recommendation for constrained use and micromanaged implementation, ensuring that LLMs remain within appropriate boundaries while still providing valuable analytical support for qualitative research.
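The evidence-grounding step described above can be sketched as follows, assuming the LLM has been prompted to return each theme with verbatim quotes. The prompt wording and the `verify_quotes` helper are illustrative, not part of any specific tool: the point is that a mechanical post-hoc check can flag any quote that does not appear verbatim in the source data, surfacing likely fabrications for the researcher to review.

```python
# Hypothetical prompt template demanding evidence-based, tabular output.
PROMPT_TEMPLATE = """Identify the main themes in the responses below.
For each theme, report: the theme name, a count of supporting responses,
and one verbatim quote copied exactly from the data.
Return the result as a markdown table.

Responses:
{responses}"""

def verify_quotes(quotes, source_texts):
    """Check that each quote the LLM returned appears verbatim in the source data.

    Returns the list of quotes that could NOT be found, i.e. likely fabrications.
    """
    corpus = "\n".join(source_texts)
    return [q for q in quotes if q not in corpus]

responses = [
    "The new schedule gives me more time with my family.",
    "Commuting less has lowered my stress a lot.",
]
# Suppose the model returned one genuine quote and one fabricated quote:
llm_quotes = [
    "Commuting less has lowered my stress a lot.",
    "I finally sleep eight hours a night.",  # not present in the data
]
unverified = verify_quotes(llm_quotes, responses)
print(unverified)  # → ['I finally sleep eight hours a night.']
```

Any quote flagged as unverified is returned to the researcher rather than accepted into the findings, which is precisely the correctness check the framework requires.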

Cheng, Joe. 2025. “Harnessing the Power of LLMs for Responsible Data Science and Research.” Conference presentation at R+AI 2025.