NeuroLite-4M: A Compact Small Language Model for Real-Time Edge Processing

Abstract

This paper details the foundational research behind Nexora AI’s initial prototype: a compact, energy-efficient small language model (SLM) designed for real-time edge processing and scalable intent detection. Nexora introduces NeuroLite-4M, a minimal architecture built on lightweight transformer variants and quantized inference strategies, capable of high-throughput semantic inference with just 4 million parameters. The paper includes architectural blueprints, early benchmarks, and theoretical justifications for low-complexity natural language understanding.

1. Introduction

Large-scale NLP models often demand significant computational resources, making them impractical for edge devices or privacy-sensitive applications. Nexora AI, a project under COREA Starstroupe’s open-source initiative, develops NeuroLite-4M, a small language model (SLM) optimized for speed, efficiency, and cognitive compression. Designed for intent detection, query classification, and semantic entity extraction, NeuroLite-4M operates with minimal resources, supporting COREA Starstroupe’s mission to advance accessible human-machine interaction.

2. Model Architecture Overview

2.1 Base Configuration

NeuroLite-4M is defined by:

2.2 Layer Composition

Each transformer block consists of:

The attention mechanism employs Linformer-style key compression, reducing complexity from O(n²) to O(n), where n is sequence length.
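
The compression step can be sketched as follows. This is a minimal NumPy illustration, not the trained model’s weights: learned projections E and F map the n key and value rows down to k rows before the softmax, so the score matrix is n × k rather than n × n. The shapes and the choice k = 16 are illustrative assumptions.

```python
import numpy as np

def linformer_attention(Q, K, V, E, F):
    """Linformer-style attention: project keys/values along the
    sequence axis (n -> k) before the softmax, so the score matrix
    is n x k instead of n x n -- linear in sequence length."""
    # Q, K, V: (n, d); E, F: (k, n) learned projections (random stand-ins here)
    K_proj = E @ K                            # (k, d) compressed keys
    V_proj = F @ V                            # (k, d) compressed values
    d = Q.shape[-1]
    scores = Q @ K_proj.T / np.sqrt(d)        # (n, k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V_proj                   # (n, d)

# toy shapes: sequence length 64 compressed to k = 16
rng = np.random.default_rng(0)
n, d, k = 64, 32, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E, F = (rng.standard_normal((k, n)) / np.sqrt(n) for _ in range(2))
out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # (64, 32)
```

The softmax is taken over the k compressed positions, so both memory and compute for the score matrix grow linearly with n.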

3. Quantization Strategy

Weights are quantized to 8-bit integers using symmetric per-tensor quantization for deployment on low-power CPUs and microcontrollers.

3.1 Quantization Function

Q(w) = round(w / s) * s, where s = max(|w|) / 127

Quantization is applied post-training and minimizes distortion relative to the full-precision weights. It reduced the model size from 14.5 MB to 3.8 MB and improved inference speed by 42% on an ARM Cortex-A72.
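
A minimal NumPy sketch of the per-tensor scheme above; the clip to ±127 is standard int8 handling and is an assumption beyond the stated formula:

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Symmetric per-tensor quantization (Section 3.1):
    s = max(|w|) / 127, Q(w) = round(w / s) * s."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for int8
    s = np.abs(w).max() / qmax               # single per-tensor scale
    q = np.clip(np.round(w / s), -qmax, qmax).astype(np.int8)
    return q, s                              # int8 weights + float scale

def dequantize(q, s):
    """Recover approximate float weights for inspection."""
    return q.astype(np.float32) * s

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32) * 0.1
q, s = quantize_symmetric(w)
err = np.abs(dequantize(q, s) - w).max()
print(q.dtype, float(err) <= float(s) / 2)
```

With round-to-nearest, the per-weight reconstruction error is bounded by s/2, which is why the symmetric scheme keeps distortion low without per-channel bookkeeping.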

4. Training Regime

4.1 Corpus Composition

The training corpus comprises:

4.2 Objective

Masked Language Modeling (MLM) with 15% token masking per sequence.
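
A sketch of the corruption step. The 80/10/10 replacement split below is the standard BERT recipe and is assumed here; the paper states only the 15% masking rate.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """MLM corruption: select ~15% of positions; of those, 80% become
    [MASK], 10% a random token, 10% are left unchanged (assumed split)."""
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok                  # model must predict this token
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_token
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the token unchanged (still predicted)
    return corrupted, labels

toks = "turn on the kitchen lights please".split()
corrupted, labels = mask_tokens(toks)
print(corrupted, labels)
```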

4.3 Optimization

Training parameters:

Loss converged after 120k steps. Consistent with the MLM objective in Section 4.2, the loss is computed over the masked positions only:

L = -Σi∈M log P(wi | w̃)

where M is the set of masked positions and w̃ is the corrupted input sequence.
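
Under the MLM objective of Section 4.2, the per-sequence loss sums negative log-probabilities at the masked positions. A toy sketch with an illustrative 4-token vocabulary:

```python
import numpy as np

def mlm_loss(log_probs, labels):
    """Negative log-likelihood over masked positions only.
    log_probs: (n, V) per-position log-probabilities from the model;
    labels: length-n list with the original token id at masked
    positions and None elsewhere."""
    masked = [(i, t) for i, t in enumerate(labels) if t is not None]
    return -sum(log_probs[i, t] for i, t in masked)

# toy example: vocab of 4, sequence of 3, positions 0 and 2 masked
log_probs = np.log(np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],   # unmasked position, ignored
    [0.10, 0.10, 0.60, 0.20],
]))
labels = [0, None, 2]
print(round(float(mlm_loss(log_probs, labels)), 4))  # → 0.8675
```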

5. Intent Detection via Latent Embedding

Final-layer token embeddings hL,i are extracted and mean-pooled over the n tokens of the sequence:

hpool = (1/n) Σi=1..n hL,i

A 3-layer intent classifier is applied:

Early benchmarks:

Task                    Accuracy
Binary Intent Match     91.2%
Multi-class Intent ID   84.5%
Semantic Clustering     78.3%

Intent classes: query, command, question, affirmation, cancellation.
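
Putting Section 5 together, a minimal sketch of mean-pooling plus a 3-layer head over the five intent classes. The hidden sizes and the random stand-in weights are illustrative assumptions; the paper does not specify the head’s dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

INTENTS = ["query", "command", "question", "affirmation", "cancellation"]

d_model, d_hidden = 128, 64   # assumed sizes, not from the paper

# random stand-in weights for the 3-layer head
W1, b1 = rng.standard_normal((d_model, d_hidden)) * 0.05, np.zeros(d_hidden)
W2, b2 = rng.standard_normal((d_hidden, d_hidden)) * 0.05, np.zeros(d_hidden)
W3, b3 = rng.standard_normal((d_hidden, len(INTENTS))) * 0.05, np.zeros(len(INTENTS))

def classify_intent(h_last):
    """Mean-pool final-layer token embeddings, then apply a
    3-layer MLP head with a softmax over the intent classes."""
    h_pool = h_last.mean(axis=0)              # hpool = (1/n) Σ hL,i
    h = relu(h_pool @ W1 + b1)
    h = relu(h @ W2 + b2)
    probs = softmax(h @ W3 + b3)
    return INTENTS[int(probs.argmax())], probs

h_last = rng.standard_normal((12, d_model))   # 12 tokens, untrained toy input
intent, probs = classify_intent(h_last)
print(intent, probs.shape)
```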

6. Energy Efficiency Analysis

6.1 Compute Profile (per 100 inferences)

Performance on Raspberry Pi 4:

Compared to distilled BERT:

7. Conclusion and Next Steps

NeuroLite-4M, developed under COREA Starstroupe’s Nexora AI, demonstrates the efficacy of small language models in constrained environments, supporting ambient intent recognition and privacy-first applications. Future work in 2024 will focus on multilingual support, context memory, and task-specific fine-tuning, advancing COREA Starstroupe’s open-source mission.

Appendix: Layer Weight Distribution (Truncated)

Layer   Mean Weight   Std Dev   Max Weight   Min Weight
L1      0.034         0.122     0.98         -0.91
L4      0.029         0.087     0.76         -0.66
L6      0.031         0.113     0.82         -0.73
