Amazon Web Services (AWS), the cloud computing division of Amazon, has introduced a new family of multimodal AI models named Nova at its annual re:Invent conference.
The announcement, made by Amazon CEO Andy Jassy, marks a significant step in generative AI development, with models designed to process text, images, and video.
The Nova lineup includes four text-generating models—Micro, Lite, Pro, and Premier—each offering distinct capabilities. Micro is optimized for speed, handling text-only tasks with minimal latency, while Lite and Pro support multimodal inputs such as text, images, and video, with Pro balancing speed, accuracy, and cost. Meanwhile, Premier, set for release in early 2025, is designed for advanced workloads, including model customization and fine-tuning.
These models support 15 languages and can process large datasets, with token limits ranging from 128,000 to 300,000 and plans to expand to over 2 million tokens in 2025.
AWS also launched Nova Canvas and Nova Reel, tools aimed at generative media.
Nova Canvas allows users to create and edit images through text prompts, offering features like background removal and style adjustments, while Nova Reel generates short videos of up to six seconds from text prompts or reference images, with future updates expected to extend video lengths to two minutes.
Both tools include controls for responsible use, such as watermarking and content moderation, to limit the creation of harmful or inappropriate material.
Jassy emphasized the models’ efficiency and cost-effectiveness, highlighting their integration with Amazon Bedrock for seamless application development.
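For developers, Bedrock exposes the Nova models through its standard runtime APIs. As a rough sketch of what invoking a Nova text model might look like via boto3's Converse API, the following builds the request payload; the model ID (`amazon.nova-lite-v1:0`), inference parameters, and helper function are illustrative assumptions rather than confirmed details from the announcement:

```python
# Sketch of calling a Nova text model through Amazon Bedrock's Converse API.
# The model ID and parameter values below are assumptions for illustration.
import json

MODEL_ID = "amazon.nova-lite-v1:0"  # assumed identifier for Nova Lite


def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble keyword arguments for a bedrock-runtime converse() call."""
    return {
        "modelId": MODEL_ID,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.7},
    }


request = build_converse_request("Summarize the AWS Nova announcement.")
print(json.dumps(request, indent=2))

# With AWS credentials configured, the actual call would resemble:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])
```

The payload-building step is separated from the network call so the request shape can be inspected or tested without AWS credentials.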
Looking ahead, AWS is working on a speech-to-speech model for release in early 2025 and an “any-to-any” model, capable of converting inputs like text, speech, images, and video into any other format, by mid-2025.
These developments, according to Jassy, represent the future of frontier AI technology.
