AI Models and Tools

Llama 3 – RLRGD Educational Trustworthy AI

13B-parameter pedagogy-aware dialogue model for K–12 math tutors

Llama 3 – RLRGD Educational Trustworthy AI is a 13B-parameter model fine-tuned from Meta’s Llama 3 using our RLRGD (Reinforcement Learning from Reverse-Generated Data) framework to support trustworthy, pedagogy-aware dialogue in K–12 education, especially in math. Trained on anonymized ALTER-Math student–teacher conversations, reverse-generated unsafe or inadequate samples, and Safe-LLM–based synthetic data, the model is optimized via a custom RL objective to favor teacher-like, facilitative responses while reducing unsafe, irrelevant, or excessively verbose outputs. It is designed for use in interactive math tutors, AI teaching assistants, and educational chatbot environments where safety, personalization, and instructional quality are critical.
Model page: uf-aice-lab/llama3-rlrgd-edu-trustworthy
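
A minimal usage sketch with Hugging Face Transformers is shown below; the tutoring prompt format and generation settings are illustrative assumptions, not values prescribed by the model card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uf-aice-lab/llama3-rlrgd-edu-trustworthy"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Illustrative tutoring prompt; check the model card for the expected dialogue format.
prompt = "Student: I got 3/4 + 1/2 = 4/6. What did I do wrong?\nTutor:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(reply)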

Multimodal ViT-L14

Vision–language model for automatic scoring of mathematics homework

Multimodal ViT-L14 is a vision–language model for automatic scoring of mathematics homework that jointly leverages screenshots of problems and student work together with textual descriptions and responses. Built on the OpenCLIP ViT-L14 encoder with a three-layer regression head, it produces embeddings suitable for retrieval (e.g., via cosine similarity to past graded examples) and generates continuous scores that approximate teacher-assigned grades. Trained on a multimodal math dataset with teacher scores between 0 and 1 and optimized with MSE loss using LoRA fine-tuning, the model supports flexible pipelines that combine nearest-neighbor retrieval and automated scoring to assist teachers in evaluating student work.
Model page: uf-aice-lab/Multimodal_ViT_L14
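
The retrieval idea can be sketched as follows. This sketch uses the stock OpenCLIP ViT-L-14 checkpoint, a naive sum fusion of image and text embeddings, and random placeholder data, purely for illustration; the fine-tuned weights and the three-layer regression head are loaded via the code on the model page.

import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")

image = preprocess(Image.open("student_work.png").convert("RGB")).unsqueeze(0)  # screenshot of problem and work
text = tokenizer(["Solve 2x + 3 = 11. Student answer: x = 4"])                  # textual description and response

with torch.no_grad():
    query = model.encode_image(image) + model.encode_text(text)  # naive fusion (assumption)
    query = query / query.norm(dim=-1, keepdim=True)

# Placeholder bank standing in for embeddings and teacher scores of past graded work.
graded_bank = torch.randn(100, query.shape[-1])
graded_bank = graded_bank / graded_bank.norm(dim=-1, keepdim=True)
graded_scores = torch.rand(100)

sims = (query @ graded_bank.T).squeeze(0)  # cosine similarity to past graded examples
print("nearest-neighbour score estimate:", graded_scores[sims.argmax()].item())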

BERT-based Multi-label Cognitive Load Classifier

Text model for multi-label cognitive and affective state detection

The BERT-based Multi-label Cognitive Load Classifier is a fine-tuned bert-base-uncased model that identifies multiple cognitive and affective states from K–12 student–AI dialogues in math learning environments. Given a conversation, it predicts binary labels for constructs such as math confidence/anxiety, AI confidence/concerns, and intrinsic, extraneous, and germane cognitive load, based on data collected from 160 students over 1,440 interactions with an AI-powered teachable agent. Trained with a standard multi-label text classification setup (BCE-with-logits loss, AdamW, and a held-out test set), the model is intended for unobtrusive assessment in learning analytics pipelines, helping researchers, developers, and teachers monitor student experience and adapt AI tutors in real time.
Model page: uf-aice-lab/congtive_load_multilabel
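
A minimal inference sketch is given below; the 0.5 decision threshold is a common multi-label default rather than a value specified by the authors, and the construct names come from the model's id2label mapping.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "uf-aice-lab/congtive_load_multilabel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

dialogue = "Student: I'm not sure I can solve this... Agent: Let's break it into steps."
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.sigmoid(logits).squeeze(0)        # independent probability per construct
predicted = (probs > 0.5).nonzero(as_tuple=True)[0]
for idx in predicted:
    print(model.config.id2label[idx.item()], round(probs[idx].item(), 3))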

DeepSeek-1.5B 23-label Adapter

Parameter-efficient adapter for 23-label NLP classification on DeepSeek-R1 1.5B

deepseek1.5b_23labels is a parameter-efficient adapter built on top of the reasoning-focused deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B model, turning it into a lightweight 23-label classifier for downstream NLP tagging and prediction tasks. Distributed in PEFT format, it reuses the strong language understanding and reasoning capabilities of the base 1.5B model while adding a compact classification head, making it suitable for scenarios where you need to score or categorize text along multiple dimensions without retraining a full large language model.
Adapter page: uf-aice-lab/deepseek1.5b_23labels
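
One way to load the adapter with the PEFT library is sketched below; treating the task as sequence classification with num_labels=23 and sigmoid-scored outputs is an assumption that should be verified against the adapter configuration on the adapter page.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
adapter_id = "uf-aice-lab/deepseek1.5b_23labels"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForSequenceClassification.from_pretrained(base_id, num_labels=23)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the PEFT adapter to the frozen base

inputs = tokenizer("Example text to tag along 23 dimensions.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                  # shape: (1, 23)

print(torch.sigmoid(logits))                         # per-label scores if labels are independent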

BLIPNet Model

BLIP-based vision–language model for math captioning and classification

BLIPNet is a PyTorch-based vision–language architecture that combines BLIP's image-to-text generation capabilities with a lightweight classification head for math-related tasks. Built on top of the Salesforce/blip-image-captioning-base checkpoint and aligned with the UF AICE Lab BLIP-Math pipeline, it lets users generate text (e.g., math captions or explanations) from images while simultaneously producing 5-class classification logits from the same embeddings. You can either load the released BLIP_Math_Generation_Classification weights directly or use the architecture as a template, scaling up the classification head (for example, by increasing the hidden dimension) for your own domain-specific applications.
Model page: uf-aice-lab/BLIP_Math_Generation_Classification
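
A schematic re-implementation of this architecture is sketched below; the pooling choice and head layout are illustrative assumptions, and the released BLIP_Math_Generation_Classification weights with the exact head configuration are available on the model page.

import torch
import torch.nn as nn
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

class BLIPNetSketch(nn.Module):
    """BLIP caption generation plus a small classification head over the same image embeddings."""

    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.blip = BlipForConditionalGeneration.from_pretrained(
            "Salesforce/blip-image-captioning-base"
        )
        hidden = self.blip.config.vision_config.hidden_size
        self.classifier = nn.Linear(hidden, num_classes)  # lightweight head (illustrative size)

    def forward(self, pixel_values):
        vision_out = self.blip.vision_model(pixel_values=pixel_values)
        pooled = vision_out.last_hidden_state[:, 0]        # CLS-style image embedding
        logits = self.classifier(pooled)                   # 5-class classification logits
        caption_ids = self.blip.generate(pixel_values=pixel_values, max_new_tokens=40)
        return caption_ids, logits

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BLIPNetSketch()
pixel_values = processor(images=Image.open("math_problem.png").convert("RGB"), return_tensors="pt").pixel_values

with torch.no_grad():
    caption_ids, logits = model(pixel_values)
print(processor.decode(caption_ids[0], skip_special_tokens=True), logits)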

Open-source Repositories

RAG Knowledge Graph

Retrieval-augmented generation framework with vector, tree, and knowledge-graph retrieval

RAG Knowledge Graph is a research and development repository for exploring retrieval-augmented generation (RAG) with large language models such as GPT-4 and Llama 3. It implements multiple retrieval pipelines—including vector-based, tree-based, and knowledge-graph–based context retrieval—alongside a semantic classifier that leverages these representations. The repo also provides a web application that uses knowledge-graph–based retrieval to support interactive querying and analysis over document collections.
Repository: uf-aice-lab/rag-knowledge-graph
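
The vector-based branch of such a pipeline can be illustrated in a few lines; the encoder choice, example documents, and prompt template below are generic assumptions for illustration, not the repository's own API.

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # encoder choice is illustrative

documents = [
    "A knowledge graph stores entities and typed relations between them.",
    "Tree-based retrieval summarizes documents hierarchically before search.",
    "Vector retrieval ranks chunks by embedding similarity to the query.",
]
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

query = "How does vector-based context retrieval work?"
query_embedding = encoder.encode(query, convert_to_tensor=True)

# Rank document chunks by embedding similarity and build an augmented prompt.
hits = util.semantic_search(query_embedding, doc_embeddings, top_k=2)[0]
context = "\n".join(documents[hit["corpus_id"]] for hit in hits)
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)  # pass this prompt to GPT-4, Llama 3, or another LLM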

Adversarial Math Word Problem Generation

EMNLP 2024 codebase for adversarial math word problem generation and LLM robustness analysis

Adversarial Math Word Problem Generation is the official EMNLP 2024 codebase for creating adversarial variants of math word problems that reliably break large language models while preserving the original problem structure and difficulty. The repository provides pipelines for generating adversarial problems using abstract syntax tree–based numeric perturbations, attacking a range of open- and closed-source LLMs, and analyzing their failure modes to reveal shared vulnerabilities in math reasoning.
Repository: ruoyuxie/adversarial_mwps_generation
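
The core idea of structure-preserving numeric perturbation can be illustrated with Python's ast module; this toy sketch is not the repository's generation pipeline, which operates on full word problems and evaluates LLM failure modes, but it shows how numeric constants can be rewritten while the expression structure stays intact.

import ast

class NumberPerturber(ast.NodeTransformer):
    """Shift every numeric constant in an expression while preserving its structure."""

    def __init__(self, shift: int = 7):
        self.shift = shift

    def visit_Constant(self, node: ast.Constant):
        if isinstance(node.value, (int, float)):
            return ast.copy_location(ast.Constant(node.value + self.shift), node)
        return node

solution_expr = "(12 + 8) * 3"  # arithmetic structure underlying the original problem
tree = ast.parse(solution_expr, mode="eval")
perturbed = ast.fix_missing_locations(NumberPerturber().visit(tree))

print("original :", solution_expr, "=", eval(solution_expr))
print("perturbed:", ast.unparse(perturbed), "=", eval(compile(perturbed, "<perturbed>", "eval")))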