Meet our new ruDALL-E neural network!

Write a text query and get an image generated by ruDALL-E

Try it out

We recently released the new, larger ruDALL-E Kandinsky model! 🎉🎊🥳 It is even better at generating beautiful and complex images!

You can already try it in the Salute app if you know Russian. In the app, just say «Позови художника» ("Call the artist") and then ask the ruDALL-E artist, by voice, to draw something 🧑‍🎨

We also have a Discord bot, where you can not only generate pictures but also see what other users have generated! However, the generation queue in the Salute app is significantly shorter than in Discord.

Bright bedroom with a large bed and large green palm trees around

Goal

Our goal was to create a multimodal neural network that learns concepts in several modalities, namely verbal and visual, in order to build a better understanding of the world. The transformer is trained to autoregressively model text and image tokens as a single data stream.
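As a rough illustration of what "a single data stream" can mean, the sketch below joins text tokens and image tokens into one sequence and builds next-token training pairs. The function names, the toy vocabulary size, and the choice to shift image token ids past the text vocabulary are assumptions made for this example, not details confirmed for ruDALL-E.

```python
from typing import List, Tuple


def build_stream(text_tokens: List[int], image_tokens: List[int],
                 text_vocab_size: int) -> List[int]:
    """Concatenate text and image tokens into one sequence.

    Assumption for this sketch: image token ids are shifted by
    text_vocab_size so both modalities share one id space.
    """
    shifted_image = [text_vocab_size + t for t in image_tokens]
    return list(text_tokens) + shifted_image


def next_token_pairs(stream: List[int]) -> Tuple[List[int], List[int]]:
    """Autoregressive setup: the target at step i is the token at step i + 1."""
    return stream[:-1], stream[1:]


# Toy example: 3 text tokens, 4 image tokens, a text vocabulary of 10 ids.
stream = build_stream([1, 5, 7], [0, 3, 3, 2], text_vocab_size=10)
inputs, targets = next_token_pairs(stream)
print(stream)   # [1, 5, 7, 10, 13, 13, 12]
print(targets)  # [5, 7, 10, 13, 13, 12]
```

Because the text comes first in the stream, every image token is predicted with the whole text description already in context.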

Application

Image generation solves two important problems that information retrieval cannot: 1) it lets you specify exactly what you want, and 2) it creates an original image that did not exist before. Image generation can be used, for example, to illustrate articles, in copywriting, or in advertising.

The biggest computational challenge in Russian history

On the Christofari cluster, the model was trained for 37 days on 512 Tesla V100 GPUs and then for another 11 days on 128 GPUs, a total of 20,352 GPU-days. Our largest trained model, XXL (12 billion parameters), is comparable to the English-language DALL-E from OpenAI!
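The GPU-day total follows directly from the two training phases. A quick check, using only the figures quoted above (the variable names are just for this example):

```python
# (days, GPUs) for each training phase, as quoted in the text.
phases = [(37, 512), (11, 128)]

gpu_days_per_phase = [days * gpus for days, gpus in phases]
total_gpu_days = sum(gpu_days_per_phase)

print(gpu_days_per_phase)  # [18944, 1408]
print(total_gpu_days)      # 20352
```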

ruDALL-E Malevich (XL)

Based on a short text description, ruDALL-E generates bright and colourful images on a variety of topics and subjects. The model understands a wide range of concepts and generates completely new images and objects that do not exist in the real world.

Training and model parameters:

  • 1.3 billion parameters
  • Image encoder - a custom VQGAN model that converts an image into a sequence of 32×32 tokens
  • YTTM text tokenizer with a vocabulary of 16,000 tokens
  • Specialized attention masks for visual sequences
  • Support for re-ranking of results by the ruCLIP model
  • Upscaling support using the RealESRGAN model
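Re-ranking can be pictured as follows: generate several candidate images, score each one against the text query, and keep the best. The sketch below is only a generic illustration of that idea. It assumes the text and each image have already been turned into embedding vectors (here hand-written toy vectors standing in for what a model such as ruCLIP would produce) and that cosine similarity is the score; `cosine` and `rerank` are hypothetical helper names, not part of any ruDALL-E or ruCLIP API.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(text_emb: Sequence[float],
           image_embs: List[Sequence[float]],
           top_k: int) -> List[Tuple[int, float]]:
    """Return (candidate index, score) for the top_k best-matching images."""
    scored = [(i, cosine(text_emb, emb)) for i, emb in enumerate(image_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings: candidate 1 matches the query exactly, candidate 0 not at all.
text_emb = [1.0, 0.0]
image_embs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
best = rerank(text_emb, image_embs, top_k=2)
print([i for i, _ in best])  # [1, 2]
```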

Beautiful mountain landscape
Very beautiful dog
Beautiful yellow bird with a red beak

ruDALL-E Kandinsky (XXL)

A Russian-language text-to-image model. The architecture is the same as ruDALL-E XL, but the new version has even more parameters!

Training and model parameters:

  • 12 billion parameters
  • Image encoder - a custom VQGAN model that converts an image into a sequence of 32×32 tokens
  • YTTM text tokenizer with a vocabulary of 16,384 tokens
  • Specialized attention masks for visual sequences
  • Support for re-ranking of results by the ruCLIP model
  • Upscaling support: RealESRGAN or guided diffusion
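To make the "sequence of 32×32" concrete: a 32×32 grid of image codes gives 1,024 positions in the sequence. The sketch below converts between a grid cell and its position in the flat sequence. Row-major order (left to right, then top to bottom) is an assumption for this example, and `to_position` and `to_cell` are hypothetical helper names.

```python
from typing import Tuple

GRID = 32  # the encoder produces a 32x32 grid of image codes


def to_position(row: int, col: int, grid: int = GRID) -> int:
    """Index of a grid cell in the flat sequence (row-major order assumed)."""
    if not (0 <= row < grid and 0 <= col < grid):
        raise ValueError("cell is outside the grid")
    return row * grid + col


def to_cell(position: int, grid: int = GRID) -> Tuple[int, int]:
    """Inverse of to_position: flat index back to (row, col)."""
    if not 0 <= position < grid * grid:
        raise ValueError("position is outside the sequence")
    return divmod(position, grid)


print(GRID * GRID)        # 1024
print(to_position(1, 0))  # 32
print(to_cell(1023))      # (31, 31)
```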

Sunflowers in a vase, Vincent van Gogh
Surrealism, style
An anime-stylized potato with electrical discharge effects against the background of a modern city in neon cyberpunk style
Sunset and the city

ruDALL-E Emojich

Based on a short text description, ruDALL-E generates bright and colourful images on a variety of topics and subjects. The model understands a wide range of concepts and generates completely new images and objects that do not exist in the real world.

Training and model parameters:

  • ruDALL-E Emojich is a fine-tuned version of ruDALL-E Malevich. For fine-tuning, 2,749 emoji icons and their corresponding Russian-language descriptions were collected
  • 1.3 billion parameters
  • Image encoder - a custom VQGAN model that converts an image into a sequence of 32×32 tokens
  • YTTM text tokenizer with a vocabulary of 16,000 tokens
  • Specialized attention masks for visual sequences
  • Upscaling support using the RealESRGAN model

Gandalf
Lego Donald Trump