AI Models
Generative AI has advanced significantly over the last two years, introducing models that create text, images, and more with remarkable sophistication. Here’s an overview of a few prominent generative AI models, each with its own capabilities, release date, and intriguing trivia:
1. GPT-4: Released by OpenAI in March 2023, GPT-4 is a state-of-the-art language model renowned for its deep understanding of context and nuanced language generation. It excels in tasks such as content creation, chatbots, and coding assistance, surpassing its predecessors in scale and versatility. Interestingly, GPT-4 was trained on Microsoft’s Azure AI supercomputers, highlighting the collaboration between OpenAI and Microsoft. A good timeline of the release dates of the ChatGPT models can be found here.
2. DALL-E 3: Announced by OpenAI in September 2023, DALL-E 3 is a text-to-image generation model capable of creating detailed and diverse images from textual descriptions. It enhances prompt fidelity, allowing for the generation of complex visuals without extensive manual input. A notable feature of DALL-E 3 is its integration with ChatGPT, enabling users to generate images directly through conversational prompts. DALL-E has been the premier image generation model, but there are now many alternatives to pick from, a lot of them with their own specific use case.
3. Llama 4: Released by Meta in April 2025, Llama 4 is an advanced language model family trained on an extensive dataset. Its weights are openly available, which appeals to startups and researchers seeking control over their models and data. An interesting aspect of Llama 4 is that it was trained on a GPU cluster described as “bigger than anything” else, showcasing Meta’s commitment to large-scale AI research. The model is trained on publicly available data.
4. Gemini 2.0: Launched by Google DeepMind in December 2024, Gemini 2.0 is a multimodal large language model with expanded capabilities, including image and audio generation. It integrates advanced AI into autonomous agents, enhancing real-time interactive media environments. A fascinating feature of Gemini 2.0 is its “Deep Research” capability, allowing it to compile information from the web into user-friendly reports, demonstrating its advanced reasoning abilities.
5. Claude 3.5 Sonnet: Introduced by Anthropic in June 2024, Claude 3.5 Sonnet is recognized for its intelligence and conversational abilities. It introduces “artifacts,” enabling real-time content updates during tasks like coding or web design, and emphasizes user privacy. Interestingly, Claude 3.5 Sonnet is described as being twice as fast and five times cheaper to run compared to its predecessor, Claude 3 Opus.
6. Cosmos: Developed by Nvidia and announced at CES in January 2025, Cosmos is a family of AI models designed to generate images and 3D models for training humanoid robots and self-driving cars. It enhances robots’ understanding of the physical world, facilitating realistic training scenarios. Its use in training AI systems for self-driving cars contributes to advancements in autonomous vehicle technology.
7. Mixtral: Created by Mistral AI, with the first model (Mixtral 8x7B) released in December 2023, Mixtral uses a Mixture of Experts (MoE) architecture: a router sends each token to a small subset of expert sub-networks (two of eight in Mixtral 8x7B), so only a fraction of the model’s parameters is active at any time. This design enables efficient handling of diverse language generation tasks with improved performance. A notable feature of Mixtral is its openly released weights, providing accessibility to a wide range of developers and researchers.
8. Stable Diffusion XL Base 1.0: Released by Stability AI in July 2023 as an evolution in the Stable Diffusion series, this model offers enhanced image generation capabilities, producing high-quality visuals with greater detail and coherence. It’s widely used in creative industries for art and design. Interestingly, Stable Diffusion XL Base 1.0 has been used in various applications, including AI-generated art competitions and design projects. There are many Stable Diffusion models, and you can try SD 1.5 for free over at NightCafe (while you are there, check out the large number of models available for image creation!).
9. Gen-2: Released by Runway in 2023, Gen-2 is best known for generating video from text and image prompts. It supports artists and designers by providing a tool for rapid visual concept development. An interesting aspect of Gen-2 is its ability to produce results from minimal input, streamlining the creative process for users.
10. PanGu-Coder2: Developed by Huawei and introduced in 2023, PanGu-Coder2 is a code generation model that assists developers by generating code snippets based on natural language descriptions. It streamlines the software development process, enhancing productivity. A notable feature of PanGu-Coder2 is its support for multiple programming languages, making it versatile for various development needs. You probably haven’t heard of this one. Whilst ChatGPT and other models can produce code from a given prompt, PanGu-Coder2 is built specifically for coding.
11. DeepSeek: Developed by DeepSeek AI and introduced in 2024, DeepSeek is a high-performance language model designed to push the boundaries of text generation, comprehension, and multilingual capabilities. Built to offer strong reasoning and problem-solving abilities, it competes with major players like GPT-4 and Claude 3.5. One of DeepSeek’s standout features is its ability to generate detailed, structured responses with minimal hallucination, making it particularly useful for knowledge-intensive tasks. The model has gained traction in research and enterprise applications, particularly in fields requiring high levels of accuracy and contextual awareness. As of writing, DeepSeek is a very popular LLM.
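If the Mixture of Experts idea from the Mixtral entry sounds abstract, here is a toy sketch of the routing step. It is not Mixtral’s actual code: the experts are plain scalar functions standing in for feed-forward blocks, and the router scores are made-up numbers. The only detail borrowed from the real model is the general pattern of picking the top two experts per token and blending their outputs.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalise their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical router scores for a single token.
router_scores = [0.1, 2.0, -1.0, 2.0]

chosen = route_top_k(router_scores, k=2)
output = moe_layer(1.0, experts, router_scores, k=2)
print(chosen, output)
```

The point to notice is that experts 0 and 2 never run for this token, which is where the efficiency comes from: the model can hold many parameters while using only a slice of them per token.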
What about LLMs and models you can run locally at home? I’m glad you asked. There are a few frameworks to pick from, but I use Ollama to run LLMs locally, and it gives me quite a few models to choose between. I will report back on Ollama at a later stage as I’m still trying it out and discovering what it can do, but the features I found attractive are that it integrates fairly easily into web/desktop apps and mobile platforms, and that it can import models from PyTorch.
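To give a feel for that integration, here is a minimal sketch of talking to a locally running Ollama server from Python using only the standard library. It assumes Ollama is running on its default port (11434) and that you have already pulled a model; the helper names `build_request` and `ask` are my own, not part of Ollama.

```python
import json
import urllib.request

# Default local endpoint for Ollama's text generation API.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build the HTTP request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt, timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, something like `ask("llama3", "Why is the sky blue?")` should return the model’s answer as a string, where `llama3` is just a placeholder for whichever model you have pulled with `ollama pull`.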
This is in no way a complete list of AI models due to the pace at which the AI landscape is changing (and I mean daily!), but it’s a starting list for you to explore and learn about if you haven’t already.
What’s your favourite?
