Auralis: A Next-Generation Small Language Model for Nuanced NLP in Resource-Constrained Environments
Abstract
This paper presents Auralis, COREA Starstroupe's next-generation small language model (SLM), designed for nuanced natural language processing in resource-constrained environments. Building upon NeuroLite, Auralis introduces hierarchical token routing, dynamic context embedding, and neurosymbolic fusion. At 8.7 million parameters, it more than doubles the parameter count of Nexora's NeuroLite-4M while preserving energy efficiency. We outline its architecture, multistage training process, intent decoding framework, and quantitative benchmarks demonstrating domain transfer across productivity, conversational, and medical use cases.
1. Introduction
Small language models (SLMs) are increasingly vital for scalable, sustainable NLP in edge devices and low-bandwidth platforms. Auralis, developed by COREA Starstroupe, builds on Nexora’s NeuroLite-4M to deliver nuanced understanding of temporality, subjectivity, and context with minimal computational overhead. This paper details Auralis’s architecture, training pipeline, and benchmarks, advancing COREA Starstroupe’s open-source mission to enhance human-machine interaction.
2. System Architecture
2.1 Summary
Auralis is defined by:
- Parameters: 8.7M
- Layers: 10 Transformer-lite blocks
- Embedding Dimension: 192
- Sequence Limit: 384 tokens
- Custom Vocabulary: 24,000 tokens (domain-adaptive)
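For concreteness, these settings can be captured in a small configuration object. The sketch below is illustrative only; the field names do not come from the Auralis implementation, and the head count is taken from Section 5.1.

```python
from dataclasses import dataclass

@dataclass
class AuralisConfig:
    # Values from Section 2.1; the head count comes from Section 5.1.
    # Field names are illustrative, not identifiers from the Auralis codebase.
    n_layers: int = 10          # Transformer-lite blocks
    d_model: int = 192          # embedding dimension
    max_seq_len: int = 384      # sequence limit in tokens
    vocab_size: int = 24_000    # domain-adaptive custom vocabulary
    n_heads: int = 4            # attention heads
    n_params: int = 8_700_000   # total parameter budget (8.7M)

config = AuralisConfig()
```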
2.2 Key Innovations
Auralis introduces:
- Context Drift Correction Module (CDCM): Realigns embeddings during long utterances using temporal attention weights.
- Hierarchical Routing Transformer: Dynamically assigns attention depth based on token relevance, reducing computational load.
- Symbolic-Augmented Decoder (SAD): Integrates basic logic rules to enhance low-context inference accuracy.
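To make the CDCM concrete, the sketch below shows one way temporal attention weights could realign embeddings during long utterances: a position-conditioned gate interpolates each token embedding toward the running context mean. This is an assumption-laden reconstruction of the idea, not the released module.

```python
import torch
import torch.nn as nn

class ContextDriftCorrection(nn.Module):
    """Hypothetical CDCM-style sketch: a temporal attention weight, computed
    from each token's embedding and its normalized position, gates how strongly
    the token is pulled back toward the utterance-level context mean.
    Internals are assumed; only the module's stated role comes from the paper."""

    def __init__(self, d_model: int = 192):
        super().__init__()
        self.temporal_score = nn.Linear(d_model + 1, 1)  # embedding + normalized position

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_model)
        b, n, _ = h.shape
        pos = torch.linspace(0, 1, n, device=h.device).view(1, n, 1).expand(b, n, 1)
        w = torch.sigmoid(self.temporal_score(torch.cat([h, pos], dim=-1)))  # (b, n, 1)
        context = h.mean(dim=1, keepdim=True)            # utterance-level context
        return w * h + (1 - w) * context                 # realigned embeddings

out = ContextDriftCorrection()(torch.randn(2, 384, 192))  # shape preserved
```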
2.3 Architecture Schematic (Abbreviated)
Token Embed → Positional Encode → CDCM → 10x {LayerNorm → MH Attention → Residual → FFN → Routing Dropout} → SAD → Output Layer
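Read as code, the schematic corresponds roughly to the following forward-pass skeleton. CDCM, routing dropout, and SAD are stubbed with placeholders, since only their position in the pipeline is specified above; hidden sizes other than those in Section 2.1 are assumptions.

```python
import torch
import torch.nn as nn

class AuralisBlock(nn.Module):
    """One Transformer-lite block: LayerNorm -> MH attention -> residual -> FFN -> routing dropout.
    Routing dropout is stubbed as plain dropout; the FFN width is assumed."""
    def __init__(self, d_model: int = 192, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 384), nn.GELU(), nn.Linear(384, d_model))
        self.route_drop = nn.Dropout(0.1)

    def forward(self, x):
        h = self.norm(x)
        a, _ = self.attn(h, h, h)
        x = x + a                                   # residual around attention
        return x + self.route_drop(self.ffn(x))     # FFN + routing-dropout stand-in

class AuralisSkeleton(nn.Module):
    """Pipeline skeleton following the schematic; CDCM and SAD are identity placeholders."""
    def __init__(self, vocab=24_000, d_model=192, n_layers=10, max_len=384):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.cdcm = nn.Identity()                   # Context Drift Correction Module (placeholder)
        self.blocks = nn.ModuleList([AuralisBlock(d_model) for _ in range(n_layers)])
        self.sad = nn.Identity()                    # Symbolic-Augmented Decoder (placeholder)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, ids):                         # ids: (batch, seq_len)
        x = self.tok(ids) + self.pos(torch.arange(ids.size(1), device=ids.device))
        x = self.cdcm(x)
        for blk in self.blocks:
            x = blk(x)
        return self.out(self.sad(x))                # token logits

logits = AuralisSkeleton()(torch.randint(0, 24_000, (2, 384)))  # (2, 384, 24000)
```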
3. Training Pipeline
3.1 Corpus
The training corpus includes:
- 6.2M tokenized conversational snippets
- Sources: educational Q&A, medical dialogues, productivity commands
- Preprocessing: noise reduction, coreference resolution, punctuation normalization
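The noise-reduction and punctuation-normalization steps can be approximated with a few regular expressions, as in the sketch below; coreference resolution is omitted because the resolver used for the corpus is not specified.

```python
import re
import unicodedata

def preprocess(snippet: str) -> str:
    """Hypothetical cleanup pass mirroring Section 3.1: noise reduction and
    punctuation normalization. Coreference resolution is not shown because
    the resolver used for the corpus is not documented here."""
    text = unicodedata.normalize("NFKC", snippet)       # unify unicode punctuation forms
    text = re.sub(r"[^\w\s.,?!'-]", " ", text)          # drop markup and noise characters
    text = re.sub(r"([.,?!])\1+", r"\1", text)          # collapse repeated punctuation
    text = re.sub(r"\s+", " ", text).strip()            # normalize whitespace
    return text

print(preprocess("set a  timer for 10 mins!!!"))        # -> "set a timer for 10 mins!"
```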
3.2 Objectives
Combined objectives:
- Masked Language Modeling (MLM) with 18% dynamic token masking
- Sentence Order Prediction
- Intent Discrimination
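The sketch below illustrates the 18% dynamic masking step. Only the masking ratio comes from the list above; the mask-token id and the BERT-style 80/10/10 replacement split are assumptions.

```python
import torch

def dynamic_mask(ids: torch.Tensor, mask_id: int, vocab_size: int = 24_000,
                 mask_prob: float = 0.18):
    """Hypothetical MLM masking pass, re-sampled every batch ("dynamic"),
    masking 18% of tokens. The 80/10/10 replacement split is a BERT-style
    assumption, not specified in the paper."""
    labels = ids.clone()
    selected = torch.rand_like(ids, dtype=torch.float) < mask_prob
    labels[~selected] = -100                                   # ignore unselected positions in the loss
    ids = ids.clone()
    replace = torch.rand_like(ids, dtype=torch.float)
    ids[selected & (replace < 0.8)] = mask_id                  # 80%: mask token
    random_ids = torch.randint_like(ids, vocab_size)
    rand_pos = selected & (replace >= 0.8) & (replace < 0.9)   # 10%: random token
    ids[rand_pos] = random_ids[rand_pos]
    return ids, labels                                         # remaining 10%: keep original

# mask_id=3 is a placeholder; the real special-token id is not published.
masked, labels = dynamic_mask(torch.randint(0, 24_000, (2, 384)), mask_id=3)
```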
3.3 Optimization Parameters
Training setup:
- Optimizer: LAMB
- Initial LR: 2.5e-4, cosine warmup over 20k steps
- Epochs: 12 full passes
- Hardware: 2× A100 GPUs (32GB)
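The schedule can be sketched as a cosine warmup to 2.5e-4 over 20k steps followed by cosine decay. The 120k-step horizon is borrowed from the convergence point in Section 3.4, and AdamW stands in for LAMB, which PyTorch does not ship; both substitutions are assumptions.

```python
import math
import torch

def auralis_lr(step: int, peak_lr: float = 2.5e-4,
               warmup_steps: int = 20_000, total_steps: int = 120_000) -> float:
    """Cosine-shaped warmup to the peak LR over 20k steps, then cosine decay.
    The decay shape and the 120k-step horizon are assumptions."""
    if step < warmup_steps:
        return peak_lr * 0.5 * (1 - math.cos(math.pi * step / warmup_steps))
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * min(1.0, progress)))

# AdamW is used here only as a stand-in for LAMB.
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.AdamW(params, lr=1.0)                       # base LR 1.0 so the lambda returns absolute LR
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=auralis_lr)
```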
3.4 Loss Composition and Convergence
Total training loss over time t is defined as:
L(t) = L_MLM(t) + L_SOP(t) + L_ID(t)
Here L_MLM is token-level cross-entropy over masked positions, L_SOP is binary cross-entropy for sentence-order prediction, and L_ID is softmax cross-entropy over intent classes. For a batch at step t:
L_MLM(t) = -Σ_i [y_i log(p_i)]
L_SOP(t) = -Σ_j [y_j log(q_j) + (1 - y_j) log(1 - q_j)]
L_ID(t) = -Σ_k [y_k log(r_k)]
Training converges empirically at t ≈ 120k steps with L(t) ≈ 0.85 (weighted sum of the three losses, normalized by batch size).
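Under these definitions, the combined objective reduces to a weighted sum of three standard losses, as in the sketch below; the unit weights are placeholders, since the weighting used in training is not published.

```python
import torch
import torch.nn.functional as F

def total_loss(mlm_logits, mlm_labels, sop_logits, sop_labels, id_logits, id_labels,
               w=(1.0, 1.0, 1.0)):
    """L(t) = w1*L_MLM + w2*L_SOP + w3*L_ID, per Section 3.4.
    The unit weights are assumptions; the actual weighting is not published."""
    l_mlm = F.cross_entropy(mlm_logits.reshape(-1, mlm_logits.size(-1)),
                            mlm_labels.reshape(-1), ignore_index=-100)           # masked-token CE
    l_sop = F.binary_cross_entropy_with_logits(sop_logits, sop_labels.float())   # sentence order
    l_id = F.cross_entropy(id_logits, id_labels)                                 # intent discrimination
    return w[0] * l_mlm + w[1] * l_sop + w[2] * l_id

loss = total_loss(torch.randn(2, 384, 24_000), torch.randint(0, 24_000, (2, 384)),
                  torch.randn(2), torch.randint(0, 2, (2,)),
                  torch.randn(2, 8), torch.randint(0, 8, (2,)))
```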
4. Intent Embedding and Query Decoding
The pooled embedding h_pool is computed by mean-pooling the final layer's token representations:
h_pool = (1/n) Σ_{i=1}^{n} h_{L,i}
h_pool is then fed into a 4-head intent decoder:
- Dense(192→96) → ReLU
- Dense(96→32) → ReLU
- Dense(32→8) → Softmax (intent logits)
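The pooling and decoder stack described above maps directly onto a small module; the sketch below mirrors the listed dimensions, while anything beyond them (dropout, initialization) is unspecified and therefore omitted.

```python
import torch
import torch.nn as nn

class IntentDecoder(nn.Module):
    """Mean-pools the final-layer hidden states (h_pool = (1/n) * sum_i h_{L,i})
    and applies the Dense(192->96)->ReLU->Dense(96->32)->ReLU->Dense(32->8)->Softmax
    stack from Section 4. Illustrative sketch only."""
    def __init__(self, d_model: int = 192, n_intents: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 96), nn.ReLU(),
            nn.Linear(96, 32), nn.ReLU(),
            nn.Linear(32, n_intents),
        )

    def forward(self, h_last: torch.Tensor) -> torch.Tensor:
        # h_last: (batch, seq_len, d_model) hidden states of the final layer
        h_pool = h_last.mean(dim=1)                     # (batch, d_model)
        return torch.softmax(self.mlp(h_pool), dim=-1)  # intent probabilities

probs = IntentDecoder()(torch.randn(2, 384, 192))       # (2, 8)
```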
Real-world benchmarks:
Domain | Intent Accuracy | Latency (ms) | Energy (Wh per 100 inferences) |
---|---|---|---|
Productivity | 93.4% | 15.2 | 0.13 |
Conversational | 89.1% | 14.7 | 0.12 |
Medical | 87.3% | 18.5 | 0.15 |
5. Computational Efficiency
5.1 FLOPs Estimate
For layers L=10, heads H=4, embedding dim d=192:
FLOPs ≈ 2 * L * (4 * d * d * n + H * n * d)
For sequence length n = 384:
FLOPs ≈ 2 * 10 * (4 * 192 * 192 * 384 + 4 * 384 * 192)
= 20 * (56,623,104 + 294,912)
= 20 * 56,918,016
≈ 1.14 × 10⁹ FLOPs per forward pass
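The estimate reduces to a one-line function of the hyperparameters; the formula itself, taken from above, is the only assumption.

```python
def auralis_flops(L: int = 10, H: int = 4, d: int = 192, n: int = 384) -> int:
    """FLOPs estimate from Section 5.1: FLOPs ~= 2 * L * (4*d*d*n + H*n*d)."""
    return 2 * L * (4 * d * d * n + H * n * d)

print(f"{auralis_flops():.3e}")  # ~1.138e+09 FLOPs per forward pass
```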
5.2 Model Size (Quantized)
Model footprint and on-device speed:
- Full precision: 34.2 MB
- 8-bit quantized: 9.6 MB
- Inference speedup: +63% on Raspberry Pi 5 (vs. NeuroLite-4M)
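The 8-bit figure is consistent with standard post-training dynamic quantization of the linear layers. The sketch below uses PyTorch's built-in dynamic quantizer on a stand-in model; the actual toolchain behind the 9.6 MB artifact is not documented here.

```python
import io
import torch

def size_mb(m: torch.nn.Module) -> float:
    """Serializes the state dict in memory and reports its size in MB."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

# Stand-in model; the real Auralis weights are not public.
model = torch.nn.Sequential(torch.nn.Linear(192, 768), torch.nn.ReLU(),
                            torch.nn.Linear(768, 192))
q_model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
print(size_mb(model), size_mb(q_model))  # quantized linear weights shrink roughly 4x
```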
6. Ablation Analysis
Ablation results for productivity domain:
Configuration | Intent Accuracy |
---|---|
Base (no CDCM) | 83.5% |
w/ CDCM | 87.4% |
+ SAD module | 90.2% |
+ Routing Transformer | 93.4% |
The CDCM and the Hierarchical Routing Transformer contributed the largest gains, significantly improving resilience to ambiguous phrasing.
7. Conclusion
Auralis, developed by COREA Starstroupe, advances small language models with modular, neurosymbolic, and efficient NLP capabilities. Its performance across domains and suitability for edge devices position it as a benchmark for mobile cognition and on-device assistants, aligning with COREA Starstroupe’s non-profit mission. Future extensions include multilingual fine-tuning, incremental context memory, and deployment on wearables.
Appendix A: LayerNorm Parameter Distribution (averaged over epoch 10)
Layer | Mean Gamma | Mean Beta |
---|---|---|
L3 | 0.97 | -0.12 |
L7 | 1.04 | 0.08 |
L10 | 0.99 | -0.03 |
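Statistics like these can be collected by iterating over a model's LayerNorm modules; the generic sketch below is not the script used to produce the table.

```python
import torch.nn as nn

def layernorm_stats(model: nn.Module):
    """Collects mean gamma (weight) and beta (bias) for every affine LayerNorm,
    as reported in Appendix A. Generic sketch, not the original analysis script."""
    stats = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.LayerNorm) and module.weight is not None:
            stats[name] = (module.weight.mean().item(), module.bias.mean().item())
    return stats
```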