
What is DSPy? Will it Challenge LLM Frameworks?

DSPy now stands for Declarative Self-improving Language Programs (in Python), according to Omar Khattab, the author of DSPy. DSPy is a framework developed by Stanford NLP for algorithmically optimizing language model (LM) prompts and weights, particularly when LMs are used multiple times within a pipeline. It separates the flow of a program from its parameters, such as prompt instructions, few-shot examples, and LM weights.

This separation simplifies building complex systems with language models: it removes the need to manually tweak prompts and fine-tune LMs, which is tedious and error-prone. DSPy abstracts LM pipelines as text transformation graphs, allowing prompt structures to be optimized automatically for a given problem. It also provides a clean, class-based representation of workflows and a way to solve for the best prompt structure, promising to eliminate tedious prompt engineering. Essentially, DSPy aims to streamline the use of LMs in complex systems by automating the optimization of prompts and fine-tuning steps, reducing the manual effort and complexity involved in using LMs within a pipeline.
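To make this concrete, here is a minimal sketch of what a DSPy program can look like. It is illustrative only: the module and client names follow DSPy's public documentation, the model name is just an example, and details may differ across versions.

import dspy

# Point DSPy at an underlying LM (the model name here is only an example).
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

# A signature declares what a step should do, not how to prompt for it.
class AnswerQuestion(dspy.Signature):
    """Answer a question with a short factual answer."""
    question = dspy.InputField()
    answer = dspy.OutputField()

# ChainOfThought adds intermediate reasoning automatically; the resulting prompt
# is something DSPy's optimizers can later tune against a metric.
qa = dspy.ChainOfThought(AnswerQuestion)
print(qa(question="What does DSPy separate a program into?").answer)

Note how the program only declares inputs and outputs; the concrete prompt text is left to DSPy to generate and optimize.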

DSPy Key Features

DSPy is a framework for optimizing large language model (LM) prompts and weights, especially in complex pipelines. Its key features include:

  1. Separation of program flow and parameters: DSPy separates the flow of the program (modules) from the parameters (LM prompts and weights) of each step, making it easier to optimize and modify the system.
  2. LM-driven optimizers: DSPy introduces new optimizers that can tune the prompts and/or the weights of LM calls to maximize a given metric. These optimizers are LM-driven algorithms that generate effective prompts and weight updates for each LM in the pipeline.
  3. Improved reliability and performance: DSPy can teach powerful models like GPT-3.5 or GPT-4 to be more reliable and avoid specific failure patterns. It can also improve the performance of local models like T5-base or Llama2-13b.
  4. Systematic approach: DSPy provides a more systematic approach to solving hard tasks with LMs, reducing the need for manual prompting and one-off synthetic data generators.
  5. General-purpose modules: DSPy provides general-purpose modules like ChainOfThought and ReAct, which replace string-based prompting tricks and make it easier to build complex systems with LMs.
  6. Compilation process: DSPy compiles the same program into different instructions, few-shot prompts, and/or weight updates for each LM, allowing for more effective and efficient use of LMs in the pipeline.

How does DSPy work?

DSPy also integrates LM Assertions, a programming construct for expressing computational constraints that language models (LMs) should satisfy. These constraints ensure that an LM pipeline's behavior aligns with specified invariants or guidelines, improving the reliability, predictability, and correctness of its output. LM Assertions come in two well-defined constructs, Assert and Suggest, which enforce constraints and guide an LM pipeline's execution flow. The Assert construct offers a retry mechanism and supports a number of other optimizations. When an Assert fails, the pipeline transitions to a special retry state, allowing it to reattempt the failing LM call while being aware of its previous attempts and the error message raised. If the assertion still fails after a maximum number of self-refinement attempts, the pipeline transitions to an error state, raises an AssertionError, and terminates.

Essentially, LM Assertions make language models more reliable and predictable by letting you specify rules or guidelines that the LM should follow when generating output. There are two types of assertions: Assert and Suggest. The Assert construct enforces a strict rule that the LM must satisfy, while the Suggest construct provides a guideline that the LM should try to follow. If an Assert fails, the LM tries to fix the error and retries the failing call, up to a maximum number of times; if it still fails after that, an error is raised and the pipeline is terminated. This retry mechanism and related optimizations make it easier to build complex LM pipelines that produce reliable, correct output and avoid common failure patterns.
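Here is a hedged sketch of how Assert and Suggest might appear inside a DSPy module. The dspy.Assert and dspy.Suggest calls follow the LM Assertions paper and DSPy docs; exact signatures and how assertions are activated can vary by version.

import dspy

class GenerateQuery(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("context, question -> query")

    def forward(self, context, question):
        result = self.generate(context=context, question=question)
        # Hard constraint: on failure, DSPy retries the call with the error
        # message in context; after the retry budget it raises an error.
        dspy.Assert(len(result.query) <= 100, "Keep the query under 100 characters.")
        # Soft guideline: retried a few times, but execution continues even if
        # it is never satisfied.
        dspy.Suggest("?" not in result.query, "Phrase the query as keywords, not a question.")
        return result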

Advantages of using DSPy

  1. Improved reliability and predictability: By specifying constraints and guidelines for the LM pipeline, you can ensure that the output is reliable and predictable, even in complex scenarios.
  2. Enhanced correctness: LM Assertions help ensure that the LM pipeline’s output is correct and aligns with the specified invariants or guidelines.

Also note that DSPy is not a direct competitor to LangChain; in fact, the two can be used together.

Examples and Use Cases

DSPy isn’t just another LLM framework; it’s a potential game-changer for agent development. Unlike pre-defined workflows in tools like LangChain, DSPy lets you programmatically guide LLMs with declarative modules. No more hand-crafted prompts: you build agents that reason, retrieve information, and learn through composed modules like ChainOfThought and ReAct.

This opens doors to agents that answer your questions with clear steps, summarize complex topics with external knowledge, and even engage in creative content generation with defined styles. While both DSPy and LangChain aim to empower LLMs, DSPy’s focus on programmability and learning gives you unmatched control and interpretability. It’s akin to building modular robots instead of pre-programmed machines, opening a new chapter in the evolution of intelligent agents. Note that much of this is still in its early days and is changing and updating constantly.

Getting Started with DSPy

Here are some resources to get you started with DSPy. In another blog post, we’ll discuss and walk through setting up DSPy for a beginner.

Official Documentation and Tutorials:

Installation:

  • Follow the installation instructions for your environment (Python, Google Colab) on the official website; a typical local install is shown below.
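For a typical local Python setup, installation is a single pip command. The package name below is the one published on PyPI at the time of writing, so check the docs if it has changed:

pip install dspy-ai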

Additional Resources:

Tips:

  • Start with the tutorials to get a basic understanding of DSPy’s concepts and workflow.
  • Explore the community projects for inspiration and learn from others’ implementations.
  • Don’t hesitate to experiment and try different modules and functionalities.
  • Join the DSPy community Discord or discussion forums to ask questions and connect with other users.

Remember, DSPy is an actively developed framework, so stay updated with the latest documentation and releases. Most importantly, have fun and explore the possibilities of programming LLMs with DSPy.

What is LangGraph?

LangGraph is a library for building applications with large language models (LLMs) that maintain state, and it is built upon LangChain with the intention of being used in conjunction with it.

LangGraph expands the capabilities of the LangChain Expression Language by enabling the coordination of multiple chains or actors across multiple computational steps in a cyclical manner. This design is influenced by Pregel and Apache Beam. The current interface is modeled after NetworkX. The primary function of LangGraph is to introduce cycles into your LLM application. It is essential to note that this is NOT a directed acyclic graph (DAG) framework. If you wish to create a DAG, you should utilize the LangChain Expression Language directly. Cyclical structures are vital for agent-like behaviors, as they allow you to repeatedly call an LLM in a loop and request its next action.

How it works

Concept of stateful, multi-actor applications and how LangGraph enables their creation using LLMs

LangGraph is a library that enables the creation of stateful, multi-actor applications with LLMs (large language models) using LangChain. It extends the LangChain Expression Language, allowing the coordination of multiple chains or actors across multiple steps of computation in a cyclic manner. This is particularly useful for building agent-like behaviors, where an LLM is called in a loop to determine the next action.

The concept of stateful, multi-actor applications is central to LangGraph. It allows the creation of applications where multiple actors (representing different components or entities) maintain their state and interact with each other in a coordinated manner. This is achieved by defining a StatefulGraph, which is parameterized by a state object that is passed around to each node. Each node then returns operations to update that state. These operations can either set specific attributes on the state or add to the existing attributes. The main type of graph in LangGraph is the StatefulGraph, which facilitates the management of state within the application.

Essentially, LangGraph enables the creation of stateful, multi-actor applications by providing a framework for coordinating multiple actors and managing their state using LLMs and LangChain.
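As a rough illustration, here is a minimal sketch of a cyclic graph in LangGraph. The class and method names (StateGraph, add_conditional_edges, END) follow the early LangGraph API, and the node functions are stand-ins for real LLM and tool calls; details may differ in newer releases.

import operator
from typing import Annotated, TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    # messages accumulate across steps instead of being overwritten
    messages: Annotated[list, operator.add]


def agent(state: AgentState) -> dict:
    # placeholder for an LLM call that decides the next action
    return {"messages": ["agent thought"]}


def tool(state: AgentState) -> dict:
    # placeholder for executing the chosen tool
    return {"messages": ["tool result"]}


def should_continue(state: AgentState) -> str:
    # loop back to the agent until some stop condition is met
    return "end" if len(state["messages"]) > 4 else "continue"


graph = StateGraph(AgentState)
graph.add_node("agent", agent)
graph.add_node("tool", tool)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, {"continue": "tool", "end": END})
graph.add_edge("tool", "agent")  # this edge creates the cycle

app = graph.compile()
print(app.invoke({"messages": []}))

The conditional edge plus the tool-to-agent edge is what makes the graph cyclic rather than a DAG: the agent is re-invoked with the updated state until the stop condition returns "end".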

LangGraph vs directed acyclic graph (DAG).

The main difference between LangGraph and a directed acyclic graph (DAG) is that LangGraph allows for cycles, while a DAG does not. A DAG is a directed graph with no directed cycles, meaning it is impossible to start at a vertex and follow the edges in such a way that eventually loops back to the same vertex. On the other hand, LangGraph specifically provides the ability to create cyclic behavior, allowing for repeated actions and interactions between actors in the graph.

LangGraph Use Cases

Some examples of applications that can benefit from LangGraph include:

  1. Agent-like Behaviors: LangGraph is useful for applications that require agent-like behaviors, where an LLM is called in a loop, asking it what action to take next. This can be applied in chatbots, conversational agents, or any system where an agent needs to make sequential decisions based on the state of the conversation or environment. CrewAI is building something similar to this using LangChain.
  2. Coordinating Multiple Chains or Actors: LangGraph extends the LangChain Expression Language with the ability to coordinate multiple chains or actors across multiple steps of computation in a cyclic manner. This feature is beneficial for applications that involve coordinating and managing multiple interconnected processes or actors.
  3. Web-Enabled Agents: WebVoyager, built with LangGraph, is a new kind of web-browsing agent that uses multimodal AI.
  4. Stateful Applications: Applications that need to maintain and update a state as they progress, such as task-oriented dialogue systems, can benefit from the stateful nature of LangGraph.
  5. Custom Tool Integration: LangGraph allows the integration of custom tools, making it suitable for applications that require the use of diverse external tools and services in their decision-making processes.

LangGraph is beneficial for applications that require agent-like behaviors, coordination of multiple actors, cyclic behavior, stateful processing, and integration of custom tools. It is particularly well-suited for building complex, interactive, and stateful language-based applications.

Compare and Contrast LangGraph

It would be interesting to see how LangGraph compares to LlamaIndex or PyTorch-style approaches; LLM frameworks in general seem to get a lot of flak for over-complicating things. DSPy has also been gaining popularity. DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used multiple times within a pipeline. DSPy separates the flow of the program from the parameters of each step, allowing for more systematic and powerful optimization of LM prompts and weights. DSPy also introduces new optimizers, LM-driven algorithms that can tune the prompts and/or weights of LM calls to maximize a given metric.

Final Thoughts

LangGraph shows promise as a valuable addition to the growing ecosystem of LangChain. With its ability to enable the creation of stateful, multi-actor applications using LLMs and LangChain, LangGraph opens up new possibilities for building complex and interactive language-based systems.

The future of agents looks promising, as they are expected to have massive use cases. While agents in the past may have been ineffective and token-consuming, advancements in technologies like LangGraph can help address these challenges. By allowing for more advanced agent runtimes from academia, such as LLM Compiler and plan-and-solve, LangChain aims to enhance the effectiveness and efficiency of agents.

Stateful tools are also on the horizon for LangChain, which would enable tools to modify the state of applications. This capability would further enhance the flexibility and adaptability of stateful, multi-actor applications, enabling them to better respond to dynamic environments.

Moreover, LangChain is actively exploring the integration of more controlled human-in-the-loop workflows. This would provide opportunities for human involvement and guidance in decision-making processes, augmenting the capabilities of automated systems.

In the future, LangGraph and LangChain are expected to continue evolving and growing, offering more advanced features and expanding their capabilities. This opens up exciting research directions and potential applications that could benefit from advancements in LangGraph and similar technologies.

Overall, LangGraph’s potential to improve agent performance, enable stateful applications, and support multi-agent workflows positions it as a promising tool in the realm of language-based systems. As the LangChain ecosystem continues to thrive and innovate, we can anticipate even more exciting developments in the near future.

Alibaba Releases Qwen 1.5

Alibaba, China’s e-commerce giant, has released Qwen 1.5, a groundbreaking language model family that has been making waves in the AI community. Developed in-house by Alibaba’s AI lab, Qwen 1.5 is the latest in its line of innovative models. Back in November, Alibaba released version 1 of Qwen 72B. This release includes several models, including their largest open-source model, the 72B chat model, which has surpassed the performance of other state-of-the-art models such as Claude 2.1 and GPT-3.5 on both MT-Bench and Alpaca-Eval v2. With a total of six models, Qwen 1.5 supports a 32K context length, making it a versatile and powerful tool for a wide range of applications.

Benchmarks & Performance

When it comes to benchmarks, Qwen 1.5 truly shines. In particular, the Qwen1.5-7B model has shown impressive results in tool use, outperforming the Mistral-7B model. This achievement highlights the robust capabilities of Qwen 1.5 in tasks requiring specialized knowledge and application.

The largest model in the Qwen 1.5 lineup, the 72B chat, delivers performance that is comparable to that of GPT-4, a highly advanced language model. This demonstrates the immense power and potential of Qwen 1.5 in leveraging artificial intelligence for complex language processing tasks.

With overall strong metrics across its different models, Qwen 1.5 offers users a reliable and efficient solution for a wide range of applications. Its impressive performance in various benchmarks showcases Alibaba’s commitment to pushing the boundaries of AI technology and delivering cutting-edge solutions to the e-commerce industry and beyond.

Closing Thoughts

In closing, Qwen 1.5 has demonstrated remarkable capabilities and performance, particularly with its 72B model, which is comparable to, and in some cases surpasses, Mistral Medium. This comparison serves as encouragement for Mistral to release a proper Mistral Medium model instead of relying on leaked Miqu weights; doing so would open up the opportunity for further fine-tuning and improvement.

It’s worth noting that Qwen 1.5 has already paved the way for the development of a flagship LLM series called Quyen. This highlights the immense potential and impact of Qwen 1.5 in driving innovation and progress in the field of AI and language processing.

As we embrace the advancements brought forth by Qwen 1.5, we can anticipate further breakthroughs and discoveries that will shape the future of AI and its applications in various industries. Alibaba’s commitment to pushing the boundaries of AI technology is evident in the development and release of Qwen 1.5, ultimately driving progress and innovation in the e-commerce industry and beyond.

Nomic AI Releases Embeddings, A truly Open Source Embedding Model

Nomic’s nomic-embed-text-v1 is the newest SOTA long-context embedding model. Tired of drowning in unstructured data – text documents, images, audio, you name it – that your current tools just can’t handle? Welcome to the open seas of understanding, where Nomic AI’s Embeddings act as your life raft, transforming this chaos into a treasure trove of insights.

Forget rigid spreadsheets and clunky interfaces. Nomic Atlas, the platform redefining how we interact with information, empowers you to explore, analyze, and structure massive datasets with unprecedented ease. But what truly sets Nomic apart is its commitment to openness and accessibility. That’s where Embeddings, their latest offering, comes in.

Embeddings are the secret sauce, the vector representations that unlock the meaning within your data. Imagine each data point as a ship on a vast, trackless ocean. Embeddings act as lighthouses, guiding you towards similar data, revealing hidden connections, and making sense of the seemingly incoherent.

And the best part? Nomic’s Embeddings are truly open source, meaning they’re free to use, modify, and share. This transparency fosters collaboration and innovation, putting the power of AI-powered analysis directly in your hands.

The Struggle with Unstructured Data

AI loves structured data. Imagine feeding spaghetti to a baby – that’s like throwing unstructured data at AI. Text documents, images, videos – a tangled mess AI struggles to digest. It craves the neat rows and columns of structured data, the spreadsheets and databases where information sits organized and labeled. Nomic AI’s open-source Embeddings transform that spaghetti into bite-sized insights that are ready for AI, unlocking the hidden potential within your data.

Understanding Embeddings

Where Embedding Can Help

Embedding models have the potential to assist companies and developers in several key ways:

  • Handling Long-Form Content: Many organizations have vast troves of long-form content in research papers, reports, articles, and other documents. Embedding models can help make this content more findable and usable. By embedding these documents, the models can enable more semantic search and retrieval, allowing users to find relevant content even if the exact search keywords don’t appear in a document.
  • Auditing Model Behavior: As AI and machine learning models permeate more sensitive and critical applications, explainability and auditability become crucial. Embedding models can assist by providing a meaningful vector space that developers can analyze to better understand model behavior. By examining how certain inputs map to vector spaces, developers can gain insight into how models handle different data points.
  • Enhancing NLP Capabilities: Embedding models serve as a foundational layer that enhances many other natural language processing capabilities. By structuring language in vector spaces, embedding enables better performance downstream on tasks like sentiment analysis, topic modeling, text generation, and more. Embedding essentially extracts more understanding from text.

Embedding models empower more semantic search and retrieval, auditable model behaviors, and impactful NLP capabilities. Companies need embedders to help structure and exploit long-form content. And developers need embedding to infuse AI transparency and interpretability into sensitive applications. The vector spaces embedding provides for language are critical for many modern NLP breakthroughs.
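To illustrate the semantic search use case, here is a small sketch using the open nomic-embed-text-v1 weights through sentence-transformers. The Hugging Face model id and the task prefixes follow my reading of the model card; treat both as assumptions to verify.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# The model expects task prefixes: "search_document:" for corpus text and
# "search_query:" for queries (per the model card).
docs = [
    "search_document: Quarterly report on supply-chain performance.",
    "search_document: Research paper on contrastive learning for text embeddings.",
]
query = "search_query: how are text embeddings trained?"

doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# Cosine similarity ranks documents by semantic relevance, not keyword overlap.
print(util.cos_sim(query_vec, doc_vecs))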

Nomic AI’s Training Details

Nomic AI’s Embeddings boast impressive performance, and understanding their training process sheds light on this achievement. Instead of relying on a single training stage, Nomic employs a multi-stage pipeline, meticulously crafted to extract the most meaning from various sources.

Imagine baking a delicious cake. Each ingredient plays a specific role, and their careful combination creates the final masterpiece. Similarly, Nomic’s pipeline uses different “ingredients” in each stage:

Stage 1: Unsupervised Contrastive Learning:

  • Think of this as building the cake’s foundation. Nomic starts with a large, pre-trained BERT model. Think of BERT as a skilled baker with a repertoire of techniques.
  • Next, they feed BERT a unique dataset of weakly related text pairs. This might include question-answer pairs from forums like StackExchange, reviews with titles and bodies, or news articles with summaries. These pairings help BERT grasp semantic relationships between different types of text.
  • Think of this stage as BERT learning the basic grammar and flavor profiles of different ingredients.

Stage 2: Finetuning with High-Quality Labeled Data:

  • Now, the cake gets its delicious details! Here, Nomic introduces high-quality labeled datasets, like search queries and corresponding answers. These act like precise instructions for the baker, ensuring the cake isn’t just structurally sound but also flavorful.
  • A crucial step in this stage is data curation and hard-example mining. This involves selecting the most informative data points and identifying challenging examples that push BERT’s learning further. Think of this as the baker carefully choosing the freshest ingredients and mastering complex techniques.

This two-stage approach allows Nomic’s Embeddings to benefit from both the broad knowledge base of the pre-trained BERT model and the targeted guidance of high-quality labeled data. The result? Embeddings that capture rich semantic meaning and excel at various tasks, empowering you to unlock the true potential of your unstructured data.
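To make the stage-1 objective a little more concrete, here is a toy sketch of an InfoNCE-style contrastive loss over embedded text pairs. This is a generic illustration of contrastive learning, not Nomic's actual training code; the batch size, temperature, and embedding dimension are arbitrary.

import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.05):
    # anchor, positive: (batch, dim) embeddings of weakly related text pairs
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(anchor.size(0))       # the i-th positive matches the i-th anchor
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(4, 768), torch.randn(4, 768))
print(loss.item())

In words: each text is pulled toward its paired text and pushed away from every other text in the batch, which is what teaches the encoder that related passages should land near each other in the vector space.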

Conclusion

Nomic AI’s Embeddings offer a compelling proposition: powerful performance, unparalleled transparency, and seamless integration. By reportedly surpassing OpenAI’s text-embedding-3-small model and sharing their entire training recipe openly, Nomic empowers anyone to build and understand state-of-the-art embeddings. This democratization of knowledge fosters collaboration and innovation, pushing the boundaries of what’s possible with unstructured data.

Seamless integration with popular LLM frameworks like LangChain and LlamaIndex also makes Nomic Embeddings instantly accessible to developers working on advanced search and summarization tasks. This translates to more efficient data exploration, uncovering hidden connections, and ultimately, deriving deeper insights from your information ocean.

So, whether you’re a seasoned data scientist or just starting your AI journey, Nomic Embeddings are an invitation to dive deeper. With their open-source nature, powerful performance, and seamless integration, they unlock a world of possibilities, empowering you to transform your unstructured data into a gold mine of insights.

Meta Open Sources Code Llama 70B

Meta has released Code Llama 70B, the latest and most powerful iteration of its open-source language model for code generation. That’s right: Meta is not only pushing the boundaries of AI-powered coding, but making it freely accessible. With improved performance over previous iterations, Code Llama 70B is now available under the same permissive license as prior Code Llama releases.

Notably, Code Llama 70B achieves 67.8% on the HumanEval benchmark, reaching performance on par with GPT-4.

This isn’t just a bigger engine under the hood – it’s a leap forward in code-wrangling capabilities. Code Llama 70B boasts significant performance improvements on key benchmarks, meaning you can say goodbye to tedious boilerplate and hello to lightning-fast generation, smarter autocompletion, and even tackling diverse programming languages.

Code Llama 70B comes in three distinct versions:

  • CodeLlama-70B: Your all-around powerhouse for general code generation across multiple languages.
  • CodeLlama-70B-Python: Tailored for Python-specific tasks.
  • CodeLlama-70B-Instruct: A version fine-tuned for instruction following.

So, ready to unleash the Code Llama 70B in your projects? Buckle up, grab your access key, and prepare to experience the future of coding, where the only limit is your imagination. Dive deeper in the following sections to explore the model’s capabilities, access instructions, and see how Code Llama 70B can turbocharge your workflow. The future of code is open, and it’s here to stay.

Get ready to code smarter, not harder, with Code Llama 70B.

Decoding Code Llama 70B: Under the Hood

Let’s examine the engine driving Code Llama 70B. This section dives into the technical details, giving you a peek at the brains behind the magic.

Model Core:

  • Parameter Powerhouse: This version boasts a whopping 70 billion parameters, allowing it to process and generate complex code structures with stunning accuracy. Reportedly trained on 500B tokens.
  • Input/Output Dance: It takes plain text as input, weaving its insights into eloquent lines of code as output. Think of it as your AI translator for turning natural language into programming magic. Code Llama 70B has a 16,384-token context window; a minimal loading sketch follows this list.
  • Transformer Tango: The model’s architecture leans on the optimized transformer approach, a well-established technique in the LLM world known for its flexibility and power.
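As mentioned above, here is a minimal, hedged sketch of loading the instruct variant with Hugging Face transformers. The model id follows the Hub naming convention for this release, and the 70B weights require substantial GPU memory (or quantization) to run at all.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-70b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))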

Fine-Tuning for Focus:

  • Scale Matters: Code Llama 70B was fine-tuned on massive datasets with up to 16,000 tokens, ensuring it’s familiar with diverse coding structures and styles.
  • Supersized Inference: During inference, it can handle even larger chunks of code, up to 100,000 tokens, making it ideal for tackling complex projects and intricate problems.
  • Model Dates: Code Llama and its variants were trained between January 2023 and January 2024.

A Look Ahead:

  • Static Stability: This is a static model, trained on a carefully curated dataset. This ensures consistency and predictability in its outputs.
  • Community-Driven Evolution: Future versions of Code Llama – Instruct will focus on enhancing safety and refining its abilities. Meta will be actively incorporating community feedback into this process, so stay tuned for even more exciting developments!

Testing the model

Thanks to Perplexity, we can test the open-source model through the Perplexity Labs feature. Here I wanted to ask Code Llama a random hard Leetcode question, one that hopefully is not in its training data. I asked it question 2179, Count Good Triplets in an Array. Here is the prompt I gave it:

You are given two 0-indexed arrays nums1 and nums2 of length n, both of which are permutations of [0, 1, ..., n - 1].

A good triplet is a set of 3 distinct values which are present in increasing order by position both in nums1 and nums2. In other words, if we consider pos1v as the index of the value v in nums1 and pos2v as the index of the value v in nums2, then a good triplet will be a set (x, y, z) where 0 <= x, y, z <= n - 1, such that pos1x < pos1y < pos1z and pos2x < pos2y < pos2z.

class Solution:
    def goodTriplets(self, nums1: List[int], nums2: List[int]) -> int:

Notice I gave the prompt the main part of the question along with the start code, I did not mention the Leetcode number or question.

This is the code result:

class Solution:
    def goodTriplets(self, nums1: List[int], nums2: List[int]) -> int:
        n = len(nums1)
        count = 0
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(j + 1, n):
                    if nums1[i] < nums1[j] < nums1[k] and nums2[i] < nums2[j] < nums2[k]:
                        count += 1
        return count

The code compiled, but it did not pass all the tests. I then gave it a follow-up prompt, “try again, think step by step”. It still gave the wrong answer. I then gave the same prompt to Mistral Medium, which returned this:

class Solution:
    def goodTriplets(self, nums1: List[int], nums2: List[int]) -> int:
        n = len(nums1)
        # Create two arrays to store the position of each element in nums1 and nums2
        pos1 = [0] * n
        pos2 = [0] * n
        for i in range(n):
            pos1[nums1[i]] = i
            pos2[nums2[i]] = i

        # Initialize the count of good triplets to 0
        count = 0
        # Iterate through all possible triplets
        for i in range(n):
            for j in range(i+1, n):
                for k in range(j+1, n):
                    # Check if the triplet is good
                    if pos1[i] < pos1[j] < pos1[k] and pos2[i] < pos2[j] < pos2[k]:
                        count += 1
        return count

Still, neither got the question right; better prompting or showing examples might have helped. But let’s also note that Leetcode isn’t the best way to determine a language model’s coding abilities.

Code Llama 70B licensing

Code Llama 70B is free and open source as well as available for commercial use.

Nous-Hermes-2 Mixtral 8x7B: New Flagship LLM

Nous Research has just unveiled its latest and most impressive creation to date—the Nous-Hermes-2 Mixtral 8x7B. This groundbreaking flagship Large Language Model (LLM) represents a significant leap forward, being the company’s first model to be fine-tuned using Reinforcement Learning from Human Feedback (RLHF). It’s also the first to surpass the renowned Mixtral Instruct across a wide array of popular benchmarks, setting a new standard for AI performance.

Today marks the release of two distinct configurations of Nous-Hermes-2: the SFT-only (Supervised Fine-Tuning) model and the enhanced SFT+DPO (Direct Preference Optimization) model, alongside a QLoRA adapter designed specifically for the DPO variant. Both models are now available to the public via HuggingFace, offering users the unique opportunity to test and determine the best fit for their specific applications.

Advancements in Nous-Hermes-2

Benchmarks

Here’s how it compares to Mixtral Instruct.

From Twitter, an example of the model writing code for data visualization:

Model Configurations

The Mixtral 8x7B-based model Nous-Hermes-2 comes in two variants: SFT+DPO and SFT-only. The SFT-only model has only supervised fine-tuning (SFT) applied, while the SFT+DPO model adds Direct Preference Optimization (DPO) on top of the supervised fine-tuning. These two configurations allow users to choose between the SFT-only or the combined SFT+DPO model based on their specific requirements and performance preferences.

They also released a QLoRA adapter that can be attached or merged onto any Mixtral-based model, potentially transferring the benefits of the DPO training to other Mixtral fine-tunes, and perhaps even to the base model. This means you can potentially improve other Mixtral fine-tunes by applying the QLoRA adapter, even if you’re not using the SFT+DPO variant of the Mixtral 8x7B model.

Conclusion

The advent of Nous-Hermes-2 Mixtral 8x7B marks a milestone in the progress of open-source AI, illustrating the rapid advancements being made each day. This significant release from the Nous team not only meets but surpasses the capabilities of the best open-source model on the market. With its superior performance in 10-shot MMLU, it sets a new bar for the industry, and while showcasing 5-shot MMLU would have been a valuable addition, the current achievements are no less impressive. In my experience, the DPO version seems better.

The model’s use of ChatML as the prompt format and the integration of system prompts for steerability highlight the forward-thinking approach of Nous Research. This not only enhances the model’s versatility but also makes it incredibly user-friendly. The seamless transition for developers and researchers currently using OpenAI APIs to Nous-Hermes-2 is a testament to the thoughtful engineering and user-centric design of the new model.
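For reference, here is a small sketch of what a ChatML-formatted prompt looks like. The special tokens follow the ChatML convention the model card describes, and the system and user content below is just an example.

# Build a ChatML prompt for Nous-Hermes-2 (illustrative content only).
system = "You are Hermes 2, a helpful assistant."
user = "Summarize the difference between SFT and DPO in one sentence."

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
print(prompt)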

It’s clear that the gap between proprietary and open-source AI solutions is narrowing with each passing day. The Nous team’s commitment to innovation and openness is not just commendable but a driving force in the democratization of AI technology. Users across the globe can now harness the power of cutting-edge language models, thanks to the relentless efforts of researchers and developers in pushing the boundaries and expanding what’s possible in the realm of AI. With Nous-Hermes-2, the future of open-source AI looks brighter than ever.

Mixtral 8x7B outperforms or matches Llama 2 70B and GPT-3.5 across various benchmarks.

In the ever-evolving landscape of natural language processing, the pursuit of more powerful and versatile language models has led to remarkable breakthroughs. Among these, Mixtral 8x7B stands tall as a Sparse Mixture of Experts (SMoE) language model, showcasing a paradigm shift in performance and efficiency. This cutting-edge model, built upon the foundation of Mistral 7B, introduces a novel architecture with eight feedforward blocks (experts) per layer, revolutionizing the way tokens are processed.

With a keen focus on optimizing parameter usage, Mixtral 8x7B provides each token access to an impressive 47 billion parameters, all while utilizing a mere 13 billion active parameters during inference. Its unique approach, where a router network dynamically selects two experts for each token at every layer, allows for unparalleled adaptability and responsiveness.

Under the Hood: Mixtral 8x7B Architecture

Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model that has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference.
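To illustrate the idea of routing each token to two of eight experts, here is a toy top-2 mixture-of-experts layer in PyTorch. It is a didactic sketch only: the hidden sizes are arbitrary, and the real Mixtral implementation is far more optimized and includes details (such as load balancing) not shown here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    def __init__(self, dim: int = 64, hidden: int = 256, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.router(x)
        weights, idx = torch.topk(logits, k=2, dim=-1)   # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = Top2MoELayer()
print(layer(torch.randn(5, 64)).shape)

Only the two selected expert networks run for each token, which is why the active parameter count at inference is much smaller than the total parameter count.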

Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. The model architecture parameters are summarized in Table 1, and a comparison of Mixtral with Llama is provided in Table 2. Mixtral outperforms or matches Llama 2 70B performance on almost all popular benchmarks while using 5x fewer active parameters during inference.

Here’s why Mixtral is special:

  • It’s very good at different tasks like math, coding, and languages.
  • It uses less power than other similar models because it doesn’t have to use all its experts all the time.
  • This makes it faster and more efficient.

Think of it like this:

  • You need to solve a math problem and a coding problem.
  • Mixtral picks the math expert for the math problem and the coding expert for the coding problem.
  • They both work on their tasks and give you the answers, but you only talk to them one at a time.
  • Even though you don’t see all 8 experts all the time, they’re all ready to help if needed.

Benchmark Performances

The benchmark performances of the Mixtral 8x7B model, a Sparse Mixture of Experts (SMoE) language model, are compared to Llama 2 70B and GPT-3.5 across various tasks. Mixtral outperforms or matches Llama 2 70B and GPT-3.5 on most benchmarks, particularly in mathematics, code generation, and multilingual understanding.

It uses a subset of its parameters for every token, allowing for faster inference speed at low batch sizes and higher throughput at large batch sizes. Mixtral’s performance is reported on tasks such as commonsense reasoning, world knowledge, reading comprehension, math, and code generation. It is observed that Mixtral largely outperforms Llama 2 70B on all benchmarks except reading comprehension, while using 5x fewer active parameters. Detailed results for Mixtral, Mistral 7B, Llama 2 7B/13B/70B, and Llama 1 34B are provided, showing that Mixtral outperforms or matches Llama 2 70B performance on almost all popular benchmarks while using 5x fewer active parameters during inference.

Impressive Retrieval

The retrieval accuracy of the Mixtral model is reported to be 100% regardless of the context length or the position of the information in the sequence. The model is able to successfully retrieve information from its context window of 32k tokens, regardless of the sequence length and the location of the information in the sequence.

Licensing and Open Source Community

Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model that is licensed under the Apache 2.0 license, making it free for academic and commercial usage, ensuring broad accessibility and potential for diverse applications. The model is released with open weights, allowing the community to run Mixtral with a fully open-source stack. The French startup recently raised $415M in venture funding and has one of the fastest-growing open-source communities.

It’s worth noting that the details regarding the data used for pre-training and the specific loss function employed are conspicuously absent from the available information. This omission leaves a gap in our understanding of the model’s training process. There is no mention of whether any additional loss for load balancing is being utilized, which could provide valuable insights into the model’s optimization strategy and robustness. Despite this gap, the outlined architectural and performance characteristics of Mixtral 8x7B offer a compelling glimpse into its capabilities and potential impact on the field of natural language processing.

Microsoft Fully Open Sources Phi-2

Microsoft has announced that Phi-2, its highly regarded Transformer model, will now be completely open source under the MIT License. This is a groundbreaking development that promises to usher in a new era of innovation and exploration within the field.

What is Phi-2?

Phi-2 is a state-of-the-art Transformer model boasting a whopping 2.7 billion parameters. It’s built for handling a variety of NLP tasks and was trained with an extensive dataset comprising 250 billion tokens, sourced from a combination of NLP synthetic data and carefully filtered web data.

Key Features of Phi-2:

  • Transformer Model: Phi-2 operates on the transformer architecture, renowned for its effectiveness in processing sequential data and powering major advancements in natural language processing. Despite having only 2.7 billion parameters, Phi-2 has demonstrated strong performance on various benchmarks, often surpassing larger models. This suggests that it might offer a good balance of performance and efficiency.
  • Massive Dataset: Phi-2 was trained on a massive dataset of 250 billion tokens, which includes both synthetic and real-world data. This diversity of data helps the model learn a broader range of language patterns and styles.
  • QA, Chat, and Code: Specifically designed to perform well with QA formats, chat formats, and code generation, Phi-2 is versatile in its application (a minimal loading sketch follows this list).
  • Research-Oriented: The model has not been fine-tuned with reinforcement learning from human feedback, positioning it as an ideal candidate for pure research purposes.
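As noted in the list above, here is a minimal sketch of trying Phi-2 with Hugging Face transformers. The "microsoft/phi-2" model id and the "Instruct:/Output:" prompt style follow the public release notes; treat both as assumptions to double-check against the current model card.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float32, trust_remote_code=True
)

# Phi-2 is a base model (no RLHF), so a simple instruction-style prompt is used here.
prompt = "Instruct: Explain what a transformer model is in two sentences.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))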

A Leap Towards Open Innovation

The recent shift to an MIT License for Phi-2 signifies a momentous occasion for developers, researchers, and hobbyists alike. Open-source licensing removes barriers to access, allowing for greater collaboration and transparency in research and development efforts.

What the MIT License Means for Phi-2:

  • Unrestricted Access: Developers can use, modify, and distribute the model with fewer legal implications, fostering an environment of open innovation.
  • Community Contributions: The open-source community can now contribute to Phi-2’s development, potentially accelerating improvements and enhancements.
  • Wider Adoption: With fewer restrictions, Phi-2 could see increased utilization across various projects and domains, leading to a better understanding of its capabilities and limitations.

Outperforming the Competitors

In my weeks of exploration, it’s become evident that Phi-2 stands out among its peers. Compared with smaller models like Gemini Nano 2, Phi-2 has shown superior performance on common benchmarks such as MMLU (Massive Multitask Language Understanding) and BBH (BIG-Bench Hard).

As the AI community starts to leverage the now open-sourced Phi-2, the potential to bridge the performance gap with larger models on complex tasks and reasoning becomes more tangible. The added MIT License is set to catalyze innovation, paving the way for new breakthroughs in the utility and efficiency of AI models like Phi-2.

Conclusion: A New Chapter for AI Research

The decision by Microsoft to fully open source Phi-2 under the MIT License marks a pivotal point in AI research. By lowering the barriers to entry, Microsoft is not only promoting transparency but also empowering a broad range of researchers and developers to contribute to the advancement of AI.

Stay tuned, as I continue to delve into Phi-2’s capabilities and prepare to release an extensive guide that will complement our series of publications. The future of AI research has never looked brighter, and with tools like Phi-2 readily available, the possibilities are endless. Join us in exploring this remarkable model and become a part of the next wave of AI innovation!

Midjourney releases v6


The world of AI art generation takes a leap forward with Midjourney’s latest release. Version 6 of this popular tool provides creators with greater control, detail, and creativity. In this groundbreaking update, Midjourney empowers users with longer prompt lengths, finer control over elements like color and shading, the ability to incorporate text, and more conversational fine-tuning.

v6 represents a major milestone for Midjourney as it aims to stay ahead of stiff competition from the likes of DALL-E 3 and other AI image generators. While these alternatives offer impressive features, Midjourney’s focus remains on artistic quality and user experience. This update even allows Midjourney to comprehend nuanced differences in punctuation and grammar to render prompts more accurately.

v6 gives creators the improved tools they need to bring their imaginative visions to life. With enhanced understanding of prompts and an expanded set of artistic capabilities, the possibilities are brighter than ever for Midjourney users to push boundaries in AI-assisted art. We can’t wait to see the beautiful, weird, and wonderful images this latest innovation inspires.

Midjourney takes leap forward with latest release

MidJourney has taken a significant leap forward with its latest release, version 6. This new release includes several notable improvements, such as a longer prompt length, more granular control over color and shading, the ability to add text to images, and the capability to fine-tune the output through a conversation with the AI. One of the most striking updates is the AI’s improved understanding of prompts, including nuances in punctuation and grammar. Additionally, MidJourney v6 is available through Discord, and access to a web version is being opened for users who have generated more than 10,000 pictures. The images generated by MidJourney v6 exhibit greater detail and realism compared to the previous version, showcasing the significant progress made in image generation capabilities. To summarize, the latest release of MidJourney, version 6, brings several advancements, including:

  • Longer prompt length
  • More granular control over color and shading
  • Ability to add text to images
  • Improved understanding of prompts, including nuances in punctuation and grammar
  • Accessible through Discord, with the possibility of a web version for users who have generated more than 10,000 pictures

The images generated by MidJourney v6 demonstrate enhanced detail and realism compared to the previous version, reflecting a substantial advancement in image generation capabilities.

Midjourney v6 provides Greater Control, Detail & Creativity

The Midjourney v6 model offers several improvements over its predecessor, v5. These include much more accurate prompt following, longer prompts, improved coherence, and model knowledge. Additionally, v6 features improved image prompting and remix mode, as well as minor text drawing ability. The upscalers in v6 have both ‘subtle’ and ‘creative’ modes, which increase resolution by 2x. The model also supports various features and arguments at launch, such as --ar, --chaos, --weird, --tile, --stylize, and --style raw. However, some features are not yet supported but are expected to be added in the coming month.

Prompting with v6 is significantly different than v5, as the model is much more sensitive to the prompt. Users are advised to be explicit about what they want, as v6 is now much better at understanding explicit prompts. Lower values of --stylize (default 100) may have better prompt understanding, while higher values (up to 1000) may have better aesthetics. The model is available for alpha testing and is expected to be available for billable subscribers soon. It’s important to note that v6 is an alpha test, and things will change frequently and without notice as the model is taken to full release. The engineering team has also increased the moderation systems to enforce community standards with increased strictness and rigor. Overall, v6 represents a significant advancement in the capabilities of the Midjourney model, offering greater control, detail, and creativity in generating imagery.
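As an illustrative (not official) example of how these arguments combine, a v6 prompt issued through Discord might look like the following, with the subject and values chosen arbitrarily:

/imagine prompt: a weathered lighthouse at dusk, watercolor illustration --ar 16:9 --stylize 250 --v 6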

Midjourney v6 Can Now do Text

Here is a tweet of a side-by-side comparison with DALL-E 3, which debuted earlier this year with the ability to add text.

Final Thoughts

With MidJourney’s latest marvel, it is undeniable that v6 stands as a massively impressive leap in AI-powered image generation. Although v6’s rollout took longer than previous iterations of MidJourney, the patience of its user base has been rewarded with a suite of robust features that solidify the platform’s place at the forefront of digital artistry.

It’s important to note that despite this release being groundbreaking, it is still in its Alpha phase. This means that what we see today is merely the beginning of v6’s journey. The platform is ripe for further refinement and enhancements, promising an even more polished and versatile tool for creators in the near future.

Currently, MidJourney continues to operate primarily through Discord, maintaining its unique approach. Access is also exclusive to those with a subscription, emphasizing its premium position in a market where the democratization of AI art is becoming increasingly significant.

MidJourney’s v6 stands not only as a testament to the progress of AI technology but also as an invitation to artists and enthusiasts alike to engage with the future of creativity. Its delayed but substantial delivery hints at a thoughtful developmental process, one that prioritizes quality and user experience. As the platform continues to evolve and respond to user feedback, we can anticipate v6 to mature into an even more refined version, further revolutionizing the way we conceive, interact with, and ultimately manifest our creative ideas into visual realities.

Stable Zero123: Pushing the Boundaries of 3D Object Generation


Stability AI has unveiled its latest breakthrough in AI-generated 3D imagery – Stable Zero123. This new model sets a new high bar for creating photorealistic 3D renderings of objects from a single input image.

Stable Zero123 leverages three key innovations to achieve superior image quality compared to previous state-of-the-art models like Zero123-XL. First, the team curated a high-quality dataset from Objaverse, filtering out low-quality 3D objects and re-rendering the remaining objects with enhanced realism. Second, the model is provided with estimated camera angle data during training and inference, allowing it to generate images with greater precision. Finally, optimizations like pre-computed latents and an improved dataloader enabled much more efficient training, with a 40X speed-up over Zero123-XL.

Early tests show Stable Zero123 generates remarkably vivid and consistent 3D renderings across various object categories. Its ability to extrapolate realistic 3D structure from limited 2D image cues highlights the rapid progress in this blossoming field. With further advancements, AI-assisted 3D model creation could soon become indispensable across industries like gaming, VR, and 3D printing.

Enhanced Training Dataset

The enhanced training dataset for the Stable Zero123 model is based on renders from the Objaverse dataset, using an improved rendering method. The model is a latent diffusion model and was trained on the Stability AI cluster on a single node with eight 80GB A100 GPUs. The training dataset and infrastructure used are specific to the development of the Stable Zero123 model.

Applications and Impact

The enhancements unveiled in Stable Zero123 could have wide-ranging impacts across several industries that rely on 3D digital content. Sectors like gaming and VR are constantly pushing the boundaries of realism in asset creation, and Stable Zero123’s ability to extrapolate intricate 3D models from basic 2D sketches could significantly accelerate development timelines. More consumer-focused applications like 3D printing may also benefit, as users can quickly iterate through design ideas without intensive modeling expertise.

Perhaps most promising is Stable Zero123’s potential to democratize advanced 3D creation capabilities. While photorealistic CGI rendering currently requires specialized skills and tools, Stable Zero123 provides a glimpse of more automated workflows. If ongoing research continues to enhance these generative AI systems, nearly anyone may soon possess the powers of professional 3D artists at their fingertips. Brand-new creative possibilities could emerge when designers and artists of all skill levels can experiment rapidly with 3D concepts that once seemed unattainable. In the near future, Stable Zero123’s innovations could unlock newfound productivity and imagination across industries.

Conclusion

With the launch of Stable Zero123, Stability AI continues its relentless pace of innovation in AI-generated media. Coming on the heels of breakthroughs like Stable Diffusion for image generation and Stable Diffusion Video for text-to-video creation, Stability AI is establishing itself as a leading force in this rapidly evolving landscape. Stable Zero123 delivers their most impressive achievement yet in photorealistic 3D model generation from limited 2D inputs.

The enhancements in data curation, elevation conditioning, and training efficiency have enabled unprecedented image quality leaps over previous state-of-the-art models. As Stability AI continues to push boundaries, applications spanning gaming, VR, 3D printing, and more may see transformative productivity gains from AI-assisted content creation. If progress maintains this velocity, the future looks bright for next-generation creative tools that capture imaginations and unlock new possibilities. Stable Zero123 provides a glimpse into this exciting frontier, where AI equips people across skill levels with once-unfathomable 3D creation superpowers. You can check out the weights on Huggingface.