Amazon has unveiled Amazon Nova, a new generation of foundation models (FMs), as the next step in its AI journey. Because the models accept text, images, and video as prompts, customers can use Amazon Nova-powered generative AI applications to understand videos, charts, and documents, or to create videos and other multimedia content.
“Inside Amazon, we have about 1,000 Gen AI applications in motion, and we’ve had a bird’s-eye view of what application builders are still grappling with,” said Rohit Prasad, SVP of Amazon Artificial General Intelligence. “Our new Amazon Nova models are intended to help with these challenges for internal and external builders, and provide compelling intelligence and content generation while also delivering meaningful progress on latency, cost-effectiveness, customisation, information grounding, and agentic capabilities.”
All Amazon Nova models are efficient, cost-effective, and designed for easy integration with customer systems and data. They support a variety of tasks across 200 languages and multiple modalities. Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro are at least 75% less expensive than the best-performing models in their respective intelligence classes in Amazon Bedrock, and they are also the fastest models in those classes.
Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro are generally available from today, while Amazon Nova Premier will follow in the first quarter of 2025. The foundation models mark a bold move by Amazon to establish itself in the generative AI space, competing directly with rivals such as Adobe, Meta, and OpenAI.
The models are integrated with Amazon Bedrock, a fully managed service that provides access to high-performing foundation models from leading AI companies and Amazon through a single API. With Amazon Bedrock, customers can easily experiment with and evaluate Amazon Nova models, as well as other foundation models, to find the best option for their applications.
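For developers, that single API is the Bedrock Runtime Converse interface exposed through the AWS SDK. The snippet below is a minimal sketch only, assuming the boto3 Converse API, a us-east-1 region, and an Amazon Nova Lite model identifier of "amazon.nova-lite-v1:0"; the exact model ID and regional availability should be confirmed in the Bedrock console.

```python
# Minimal sketch: calling an Amazon Nova model through Amazon Bedrock's
# unified Converse API via boto3. The region and model ID below are
# assumptions; confirm both in the Amazon Bedrock console.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed identifier for Amazon Nova Lite
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarise this product description in two sentences: ..."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.3},
)

# The Converse API returns the assistant reply as a list of content blocks.
print(response["output"]["message"]["content"][0]["text"])
```

Because every Bedrock-hosted model is reached through the same Converse call, switching between Nova variants, or to another provider's model, is in principle a one-line change to the modelId parameter, which is how Bedrock supports the kind of side-by-side evaluation described above.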
In 2025, two additional Amazon Nova models will be introduced: a speech-to-speech model and a native multimodal-to-multimodal, or “any-to-any,” model. The speech-to-speech model will understand streaming speech input in natural language, interpreting verbal and nonverbal cues like tone and cadence to deliver natural human-like interactions. The any-to-any model will handle text, images, audio, and video as both input and output. It will simplify application development by allowing the same model to perform a variety of tasks, such as translating content across modalities, editing content, and powering AI agents capable of understanding and generating all modalities.