Auralis: A Next-Generation Small Language Model for Nuanced NLP in Resource-Constrained Environments
Abstract
This paper presents Auralis, COREA Starstroupe's next-generation small language model (SLM), designed for nuanced natural language processing in resource-constrained environments. Building upon NeuroLite, Auralis introduces hierarchical token routing, dynamic context embedding, and neurosymbolic fusion. At 8.7 million parameters, it more than doubles the parameter count of Nexora's NeuroLite-4M while preserving energy efficiency. We outline its architecture, multistage training process, intent decoding framework, and quantitative benchmarks demonstrating domain transfer across productivity, conversational, and medical use cases.
1. Introduction
Small language models (SLMs) are increasingly vital for scalable, sustainable NLP in edge devices and low-bandwidth platforms. Auralis, developed by COREA Starstroupe, builds on Nexora’s NeuroLite-4M to deliver nuanced understanding of temporality, subjectivity, and context with minimal computational overhead. This paper details Auralis’s architecture, training pipeline, and benchmarks, advancing COREA Starstroupe’s open-source mission to enhance human-machine interaction.
2. System Architecture
2.1 Summary
Auralis is defined by:
- Parameters: 8.7M
- Layers: 10 Transformer-lite blocks
- Embedding Dimension: 192
- Sequence Limit: 384 tokens
- Custom Vocabulary: 24,000 tokens (domain-adaptive)
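For concreteness, these settings can be captured in a small configuration object. The sketch below is illustrative only; the field names do not come from the Auralis implementation, and the head count is taken from Section 5.1.

```python
from dataclasses import dataclass

@dataclass
class AuralisConfig:
    # Values from Section 2.1; the head count comes from Section 5.1.
    # Field names are illustrative, not identifiers from the Auralis codebase.
    n_layers: int = 10          # Transformer-lite blocks
    d_model: int = 192          # embedding dimension
    max_seq_len: int = 384      # sequence limit in tokens
    vocab_size: int = 24_000    # domain-adaptive custom vocabulary
    n_heads: int = 4            # attention heads
    n_params: int = 8_700_000   # total parameter budget (8.7M)

config = AuralisConfig()
```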
2.2 Key Innovations
Auralis introduces:
- Context Drift Correction Module (CDCM): Realigns embeddings during long utterances using temporal attention weights.
- Hierarchical Routing Transformer: Dynamically assigns attention depth based on token relevance, reducing computational load.
- Symbolic-Augmented Decoder (SAD): Integrates basic logic rules to enhance low-context inference accuracy.
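To make the CDCM concrete, the sketch below shows one way temporal attention weights could realign embeddings during long utterances: a position-conditioned gate interpolates each token embedding toward the running context mean. This is an assumption-laden reconstruction of the idea, not the released module.

```python
import torch
import torch.nn as nn

class ContextDriftCorrection(nn.Module):
    """Hypothetical CDCM-style sketch: a temporal attention weight, computed
    from each token's embedding and its normalized position, gates how strongly
    the token is pulled back toward the utterance-level context mean.
    Internals are assumed; only the module's stated role comes from the paper."""

    def __init__(self, d_model: int = 192):
        super().__init__()
        self.temporal_score = nn.Linear(d_model + 1, 1)  # embedding + normalized position

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_model)
        b, n, _ = h.shape
        pos = torch.linspace(0, 1, n, device=h.device).view(1, n, 1).expand(b, n, 1)
        w = torch.sigmoid(self.temporal_score(torch.cat([h, pos], dim=-1)))  # (b, n, 1)
        context = h.mean(dim=1, keepdim=True)            # utterance-level context
        return w * h + (1 - w) * context                 # realigned embeddings

out = ContextDriftCorrection()(torch.randn(2, 384, 192))  # shape preserved
```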
2.3 Architecture Schematic (Abbreviated)
Token Embed → Positional Encode → CDCM → 10x {LayerNorm → MH Attention → Residual → FFN → Routing Dropout} → SAD → Output Layer
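Read as code, the schematic corresponds roughly to the following forward-pass skeleton. CDCM, routing dropout, and SAD are stubbed with placeholders, since only their position in the pipeline is specified above; hidden sizes other than those in Section 2.1 are assumptions.

```python
import torch
import torch.nn as nn

class AuralisBlock(nn.Module):
    """One Transformer-lite block: LayerNorm -> MH attention -> residual -> FFN -> routing dropout.
    Routing dropout is stubbed as plain dropout; the FFN width is assumed."""
    def __init__(self, d_model: int = 192, n_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 384), nn.GELU(), nn.Linear(384, d_model))
        self.route_drop = nn.Dropout(0.1)

    def forward(self, x):
        h = self.norm(x)
        a, _ = self.attn(h, h, h)
        x = x + a                                   # residual around attention
        return x + self.route_drop(self.ffn(x))     # FFN + routing-dropout stand-in

class AuralisSkeleton(nn.Module):
    """Pipeline skeleton following the schematic; CDCM and SAD are identity placeholders."""
    def __init__(self, vocab=24_000, d_model=192, n_layers=10, max_len=384):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.cdcm = nn.Identity()                   # Context Drift Correction Module (placeholder)
        self.blocks = nn.ModuleList([AuralisBlock(d_model) for _ in range(n_layers)])
        self.sad = nn.Identity()                    # Symbolic-Augmented Decoder (placeholder)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, ids):                         # ids: (batch, seq_len)
        x = self.tok(ids) + self.pos(torch.arange(ids.size(1), device=ids.device))
        x = self.cdcm(x)
        for blk in self.blocks:
            x = blk(x)
        return self.out(self.sad(x))                # token logits

logits = AuralisSkeleton()(torch.randint(0, 24_000, (2, 384)))  # (2, 384, 24000)
```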
3. Training Pipeline
3.1 Corpus
The training corpus includes:
- 6.2M tokenized conversational snippets
- Sources: educational Q&A, medical dialogues, productivity commands
- Preprocessing: noise reduction, coreference resolution, punctuation normalization
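The noise-reduction and punctuation-normalization steps can be approximated with a few regular expressions, as in the sketch below; coreference resolution is omitted because the resolver used for the corpus is not specified.

```python
import re
import unicodedata

def preprocess(snippet: str) -> str:
    """Hypothetical cleanup pass mirroring Section 3.1: noise reduction and
    punctuation normalization. Coreference resolution is not shown because
    the resolver used for the corpus is not documented here."""
    text = unicodedata.normalize("NFKC", snippet)       # unify unicode punctuation forms
    text = re.sub(r"[^\w\s.,?!'-]", " ", text)          # drop markup and noise characters
    text = re.sub(r"([.,?!])\1+", r"\1", text)          # collapse repeated punctuation
    text = re.sub(r"\s+", " ", text).strip()            # normalize whitespace
    return text

print(preprocess("set a  timer for 10 mins!!!"))        # -> "set a timer for 10 mins!"
```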
3.2 Objectives
Combined objectives:
- Masked Language Modeling (MLM) with 18% dynamic token masking
- Sentence Order Prediction
- Intent Discrimination
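The sketch below illustrates the 18% dynamic masking step. Only the masking ratio comes from the list above; the mask-token id and the BERT-style 80/10/10 replacement split are assumptions.

```python
import torch

def dynamic_mask(ids: torch.Tensor, mask_id: int, vocab_size: int = 24_000,
                 mask_prob: float = 0.18):
    """Hypothetical MLM masking pass, re-sampled every batch ("dynamic"),
    masking 18% of tokens. The 80/10/10 replacement split is a BERT-style
    assumption, not specified in the paper."""
    labels = ids.clone()
    selected = torch.rand_like(ids, dtype=torch.float) < mask_prob
    labels[~selected] = -100                                   # ignore unselected positions in the loss
    ids = ids.clone()
    replace = torch.rand_like(ids, dtype=torch.float)
    ids[selected & (replace < 0.8)] = mask_id                  # 80%: mask token
    random_ids = torch.randint_like(ids, vocab_size)
    rand_pos = selected & (replace >= 0.8) & (replace < 0.9)   # 10%: random token
    ids[rand_pos] = random_ids[rand_pos]
    return ids, labels                                         # remaining 10%: keep original

# mask_id=3 is a placeholder; the real special-token id is not published.
masked, labels = dynamic_mask(torch.randint(0, 24_000, (2, 384)), mask_id=3)
```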
3.3 Optimization Parameters
Training setup:
- Optimizer: LAMB
- Initial LR: 2.5e-4, cosine warmup over 20k steps
- Epochs: 12 full passes
- Hardware: 2× A100 GPUs (32GB)
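The schedule can be sketched as a cosine warmup to 2.5e-4 over 20k steps followed by cosine decay. The 120k-step horizon is borrowed from the convergence point in Section 3.4, and AdamW stands in for LAMB, which PyTorch does not ship; both substitutions are assumptions.

```python
import math
import torch

def auralis_lr(step: int, peak_lr: float = 2.5e-4,
               warmup_steps: int = 20_000, total_steps: int = 120_000) -> float:
    """Cosine-shaped warmup to the peak LR over 20k steps, then cosine decay.
    The decay shape and the 120k-step horizon are assumptions."""
    if step < warmup_steps:
        return peak_lr * 0.5 * (1 - math.cos(math.pi * step / warmup_steps))
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * min(1.0, progress)))

# AdamW is used here only as a stand-in for LAMB.
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.AdamW(params, lr=1.0)                       # base LR 1.0 so the lambda returns absolute LR
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=auralis_lr)
```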
3.4 Loss Composition and Convergence
Total training loss over time t is defined as:
L(t) = L_MLM(t) + L_SOP(t) + L_ID(t)
Here L_MLM is token-level cross-entropy over masked positions, L_SOP is binary cross-entropy for sentence-order prediction, and L_ID is softmax cross-entropy over intent classes. For a batch at step t:
L_MLM(t) = -Σ_i [y_i log(p_i)]
L_SOP(t) = -Σ_j [y_j log(q_j) + (1 - y_j) log(1 - q_j)]
L_ID(t) = -Σ_k [y_k log(r_k)]
Training converges empirically at t ≈ 120k steps with L(t) ≈ 0.85 (weighted sum of the three losses, normalized by batch size).
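Under these definitions, the combined objective reduces to a weighted sum of three standard losses, as in the sketch below; the unit weights are placeholders, since the weighting used in training is not published.

```python
import torch
import torch.nn.functional as F

def total_loss(mlm_logits, mlm_labels, sop_logits, sop_labels, id_logits, id_labels,
               w=(1.0, 1.0, 1.0)):
    """L(t) = w1*L_MLM + w2*L_SOP + w3*L_ID, per Section 3.4.
    The unit weights are assumptions; the actual weighting is not published."""
    l_mlm = F.cross_entropy(mlm_logits.reshape(-1, mlm_logits.size(-1)),
                            mlm_labels.reshape(-1), ignore_index=-100)           # masked-token CE
    l_sop = F.binary_cross_entropy_with_logits(sop_logits, sop_labels.float())   # sentence order
    l_id = F.cross_entropy(id_logits, id_labels)                                 # intent discrimination
    return w[0] * l_mlm + w[1] * l_sop + w[2] * l_id

loss = total_loss(torch.randn(2, 384, 24_000), torch.randint(0, 24_000, (2, 384)),
                  torch.randn(2), torch.randint(0, 2, (2,)),
                  torch.randn(2, 8), torch.randint(0, 8, (2,)))
```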
4. Intent Embedding and Query Decoding
The pooled embedding h_pool is computed by mean-pooling the final layer's token representations:
h_pool = (1/n) Σ_{i=1}^{n} h_{L,i}
h_pool is then fed into a 4-head intent decoder:
- Dense(192→96) → ReLU
- Dense(96→32) → ReLU
- Dense(32→8) → Softmax (intent logits)
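The pooling and decoder stack described above maps directly onto a small module; the sketch below mirrors the listed dimensions, while anything beyond them (dropout, initialization) is unspecified and therefore omitted.

```python
import torch
import torch.nn as nn

class IntentDecoder(nn.Module):
    """Mean-pools the final-layer hidden states (h_pool = (1/n) * sum_i h_{L,i})
    and applies the Dense(192->96)->ReLU->Dense(96->32)->ReLU->Dense(32->8)->Softmax
    stack from Section 4. Illustrative sketch only."""
    def __init__(self, d_model: int = 192, n_intents: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 96), nn.ReLU(),
            nn.Linear(96, 32), nn.ReLU(),
            nn.Linear(32, n_intents),
        )

    def forward(self, h_last: torch.Tensor) -> torch.Tensor:
        # h_last: (batch, seq_len, d_model) hidden states of the final layer
        h_pool = h_last.mean(dim=1)                     # (batch, d_model)
        return torch.softmax(self.mlp(h_pool), dim=-1)  # intent probabilities

probs = IntentDecoder()(torch.randn(2, 384, 192))       # (2, 8)
```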
Real-world benchmarks:
Domain | Intent Accuracy | Latency (ms) | Energy (Wh per 100 inferences) |
---|---|---|---|
Productivity | 93.4% | 15.2 | 0.13 |
Conversational | 89.1% | 14.7 | 0.12 |
Medical | 87.3% | 18.5 | 0.15 |
5. Computational Efficiency
5.1 FLOPs Estimate
For layers L=10, heads H=4, embedding dim d=192:
FLOPs ≈ 2 * L * (4 * d * d * n + H * n * d)
For sequence length n = 384:
FLOPs ≈ 2 * 10 * (4 * 192 * 192 * 384 + 4 * 384 * 192)
= 20 * (56,623,104 + 294,912)
= 20 * 56,918,016
≈ 1.14 × 10⁹ FLOPs per forward pass
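The estimate reduces to a one-line function of the hyperparameters; the formula itself, taken from above, is the only assumption.

```python
def auralis_flops(L: int = 10, H: int = 4, d: int = 192, n: int = 384) -> int:
    """FLOPs estimate from Section 5.1: FLOPs ~= 2 * L * (4*d*d*n + H*n*d)."""
    return 2 * L * (4 * d * d * n + H * n * d)

print(f"{auralis_flops():.3e}")  # ~1.138e+09 FLOPs per forward pass
```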
5.2 Model Size (Quantized)
Model footprint and on-device speed:
- Full precision: 34.2 MB
- 8-bit quantized: 9.6 MB
- Inference speedup: +63% on Raspberry Pi 5 (vs. NeuroLite-4M)
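The 8-bit figure is consistent with standard post-training dynamic quantization of the linear layers. The sketch below uses PyTorch's built-in dynamic quantizer on a stand-in model; the actual toolchain behind the 9.6 MB artifact is not documented here.

```python
import io
import torch

def size_mb(m: torch.nn.Module) -> float:
    """Serializes the state dict in memory and reports its size in MB."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

# Stand-in model; the real Auralis weights are not public.
model = torch.nn.Sequential(torch.nn.Linear(192, 768), torch.nn.ReLU(),
                            torch.nn.Linear(768, 192))
q_model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
print(size_mb(model), size_mb(q_model))  # quantized linear weights shrink roughly 4x
```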
6. Ablation Analysis
Ablation results for productivity domain:
Configuration | Intent Accuracy |
---|---|
Base (no CDCM) | 83.5% |
w/ CDCM | 87.4% |
+ SAD module | 90.2% |
+ Routing Transformer | 93.4% |
The CDCM and the Hierarchical Routing Transformer contributed the largest gains, significantly improving resilience to ambiguous phrasing.
7. Conclusion
Auralis, developed by COREA Starstroupe, advances small language models with modular, neurosymbolic, and efficient NLP capabilities. Its performance across domains and suitability for edge devices position it as a benchmark for mobile cognition and on-device assistants, aligning with COREA Starstroupe’s non-profit mission. Future extensions include multilingual fine-tuning, incremental context memory, and deployment on wearables.
Appendix A: LayerNorm Parameter Distribution (averaged over epoch 10)
Layer | Mean Gamma | Mean Beta |
---|---|---|
L3 | 0.97 | -0.12 |
L7 | 1.04 | 0.08 |
L10 | 0.99 | -0.03 |
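Statistics like these can be collected by iterating over a model's LayerNorm modules; the generic sketch below is not the script used to produce the table.

```python
import torch.nn as nn

def layernorm_stats(model: nn.Module):
    """Collects mean gamma (weight) and beta (bias) for every affine LayerNorm,
    as reported in Appendix A. Generic sketch, not the original analysis script."""
    stats = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.LayerNorm) and module.weight is not None:
            stats[name] = (module.weight.mean().item(), module.bias.mean().item())
    return stats
```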