CompleteTinyModelRaven Top Patched
A lightweight safety filter is included in the safety/ folder of the repository. Enable it via:
Note: The Raven Top is slightly less accurate than models 10x its size, but it is 20x faster and smaller. For 90% of edge tasks, the trade-off is worth it.

Even with a "Complete" model, you may encounter hiccups.

Issue 1: "Unknown architecture 'raven'"
Solution: Update your transformers library by running pip install --upgrade transformers. The Raven architecture was merged in PR #28745.

Issue 2: Slow generation on the first run
Solution: The "Top" version precomputes positional encodings on first load. This is normal. Subsequent runs will be fast.

Issue 3: Memory leak during long generation (>2000 tokens)
Solution: The Raven Top requires manual cache clearing.
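The exact cache-clearing call for Issue 3 is not shown here, so the following is only a minimal sketch of one common approach. It assumes that "cache" means unreferenced Python objects plus PyTorch's CUDA allocator cache. The helper names clear_generation_cache and generate_in_chunks, and the step_fn callback, are hypothetical and are not part of the Raven repository.

```python
import gc

try:
    import torch  # optional: only needed to release cached CUDA memory
except ImportError:
    torch = None


def clear_generation_cache():
    """Free unreferenced Python objects and, if CUDA is in use, cached GPU blocks.

    Assumption: this is a generic cleanup, not a Raven-specific API.
    """
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected


def generate_in_chunks(step_fn, total_tokens, chunk_size=512):
    """Call step_fn(n) for successive chunks of at most chunk_size tokens.

    step_fn is a hypothetical callback that generates n tokens and returns
    the result; memory is cleaned up after every chunk.
    """
    pieces = []
    produced = 0
    while produced < total_tokens:
        n = min(chunk_size, total_tokens - produced)
        pieces.append(step_fn(n))
        produced += n
        clear_generation_cache()
    return pieces
```

Splitting a long generation into chunks keeps each step short enough that cleanup can run before memory use grows past the >2000-token range described above. Whether this resolves the leak depends on how the model holds its internal state.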
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("completetinymodelraven_top")
# Tokenize the prompt and move the tensors to the GPU (requires CUDA).
inputs = tokenizer("Explain quantum computing in one sentence:", return_tensors="pt").to("cuda")
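The tokenizer call above stops before any text is generated. As a rough sketch, a full round trip might look like the function below. It assumes the model loads through AutoModelForCausalLM, which the text does not confirm. The function name generate_once and the max_new_tokens default are illustrative choices.

```python
def generate_once(prompt, model_id="completetinymodelraven_top", max_new_tokens=64):
    """Load the model, generate a completion for prompt, and return the decoded text."""
    # Local imports keep this module importable without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Fall back to CPU when no GPU is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Assumption: the checkpoint is a causal language model.
    model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) sequence in the batch.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling generate_once("Explain quantum computing in one sentence:") would download or load the weights on first use, which is also when the first-run slowdown from Issue 2 would be expected to show up.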
After fine-tuning, export the adapters. The resulting model will still run on the edge, but it is now specialized for your use case.

Because the CompleteTinyModelRaven Top runs locally, no data is sent to external API endpoints. However, the model is not aligned against harmful content by default. The base "Raven Top" was trained on a filtered Common Crawl subset, but developers should implement their own safety guardrails when deploying in public-facing applications.
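To make the guardrail advice concrete, here is a deliberately simple sketch of a wrapper that screens both the prompt and the model output against a blocklist. It is not the filter shipped in the safety/ folder, whose interface is not described here. The patterns, the function names, and the generate_fn callback are all hypothetical, and a production system would more likely use a maintained policy list or a classifier.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_PATTERNS = [r"\bcredit card number\b", r"\bbuild a bomb\b"]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS]


def is_allowed(text):
    """Return True when no blocked pattern occurs in text."""
    return not any(p.search(text) for p in _COMPILED)


def guarded_generate(prompt, generate_fn, refusal="Sorry, I can't help with that."):
    """Run generate_fn(prompt) only if the prompt and its output pass the filter."""
    if not is_allowed(prompt):
        return refusal
    output = generate_fn(prompt)
    return output if is_allowed(output) else refusal
```

Checking the output as well as the input matters because a harmless-looking prompt can still produce disallowed text. Keyword matching is easy to evade, so treat this as a first layer and not a complete defense.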