VBART

Don't settle for less. Elevate your Turkish applications with VBART today!

VBART is a standout solution and the first LLM built specifically for the Turkish language, designed by leading Turkish engineers.

The model aims to provide accurate, context-aware results tailored to a broad range of Turkish LLM applications. It is a sequence-to-sequence transformer pre-trained from scratch for 2.7M steps on 135 GB of clean Turkish data, during which it was exposed to 708B tokens.

When fine-tuned, it achieves state-of-the-art results on text generation tasks, outperforming multilingual models up to 3× its size while remaining compact, efficient, and cheap to run.

Model Variants

It comes in two sizes:

VBART-Large: 374M parameters
VBART-XLarge: 740M parameters

Tokenizer & Representation Power

VBART uses a SentencePiece Unigram tokenizer with a vocabulary size of 32,000. On average, the VBART tokenizer needs 1.33 tokens to encode a Turkish word.

This number is 3.10 tokens for OpenAI's BPE tokenizer. Moreover, the VBART vocabulary is one-third the size of the OpenAI tokenizer's.

Consequently, VBART is 7.28 times more compact than OpenAI models in terms of representation power, roughly the product of the per-word token ratio (3.10 / 1.33 ≈ 2.3×) and the roughly threefold difference in vocabulary size.
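As a minimal sketch, you can measure the tokens-per-word ratio yourself with the Hugging Face tokenizer. The checkpoint name below is an assumption for illustration; check the vngrs-ai page on the Hugging Face Hub for the exact model names.

```python
# Minimal sketch: measure tokens per Turkish word with the VBART tokenizer.
# "vngrs-ai/VBART-Large-Summarization" is an assumed checkpoint name; verify
# the exact name on the Hugging Face Hub before running.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vngrs-ai/VBART-Large-Summarization")

text = "Türkçe metinler için geliştirilmiş bir dil modelidir."
tokens = tokenizer.tokenize(text)
words = text.split()

print(f"{len(tokens)} tokens for {len(words)} words "
      f"({len(tokens) / len(words):.2f} tokens per word)")
print(f"Vocabulary size: {tokenizer.vocab_size}")
```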

Context Length

The default context length for both the encoder and the decoder is 2023 tokens; however, it can easily be extended for longer sequences.

Framework Support

The model is available for TensorFlow, PyTorch, and Hugging Face, and can be fine-tuned using any of these frameworks.
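For example, a fine-tuning run with the Hugging Face Trainer API might look like the sketch below. The checkpoint name, the data file, and the column names ("text" and "summary") are placeholders to adapt to your own setup, not part of the official VBART distribution.

```python
# Minimal fine-tuning sketch with the Hugging Face Trainer API.
# Assumptions: "vngrs-ai/VBART-Large-Summarization" as the checkpoint name
# (verify on the Hub) and a hypothetical train.csv with "text"/"summary" columns.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "vngrs-ai/VBART-Large-Summarization"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

dataset = load_dataset("csv", data_files="train.csv")["train"]  # hypothetical file

def preprocess(batch):
    # Tokenize source texts and target summaries for seq2seq training.
    model_inputs = tokenizer(batch["text"], truncation=True)
    labels = tokenizer(text_target=batch["summary"], truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="vbart-finetuned",
                                  per_device_train_batch_size=4,
                                  num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```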

Time to Unlock Success

VBART is offered under a variety of engagement models (consultancy, licenses to use or fine-tune) according to your specific needs. If you're interested in using this model, please reach out to us.

Contact Us