
Introducing LLaVA: The Next Gen Visual Assistant

LLaVA is a large language and vision assistant that is making significant strides in multimodal AI. The model brings together language and vision understanding, offering a comprehensive grasp of both visual and textual data. LLaVA is a novel, end-to-end trained large multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities reminiscent of the multimodal GPT-4 and setting a new state-of-the-art accuracy on ScienceQA.

LLaVA: Blending Vision and Language

LLaVA combines two major components: a vision encoder and Vicuna, a large language model. Through this combination, LLaVA is capable of understanding visual information and generating text grounded in both images and language. But what sets LLaVA apart is its unique approach to visual instruction tuning – it uses machine-generated instruction-following data to enhance large language models’ capabilities in understanding and generating content in multimodal domains.

Surprisingly, though LLaVA is trained on a small multimodal instruction-following dataset (~80K unique images), it demonstrates reasoning results similar to the multimodal GPT-4 on the examples shown in the arXiv paper.

The authors show impressive results on multimodal reasoning and instruction following with just this small dataset, further illustrating the effectiveness and efficiency of the LLaVA model.

The Underlying Mechanism: CLIP Image Encoder and LLaMA Decoder

LLaVA is built on a CLIP image encoder and a LLaMA decoder. LLaMA is a large language model developed by Meta, known for its strong text understanding capabilities. For LLaVA, LLaMA is fine-tuned on the new image-grounded task: image and word tokens are passed together to the LLaMA decoder, which produces the output.

The development of LLaVA involved an extensive data collection process, in which language-image instruction-following samples were generated based on the COCO dataset. The model was then trained with a two-stage instruction-tuning procedure and evaluated against GPT-4. Impressively, LLaVA achieved an 85.1% relative score compared to GPT-4, underlining the effectiveness of the self-instruct method in multimodal settings.

The project also provides detailed information about the data files used, along with usage and license notices. LLaVA further leverages language-only models to generate language-image instruction pairs, enabling effective instruction following in the multimodal domain.

Constantly Evolving: Open-source and Regularly Updated

LLaVA isn’t just a static model – it’s constantly evolving. As an open-source project, it allows contributions from a wide array of developers and AI enthusiasts. It has already set new state-of-the-art accuracy on science question answering tasks. Notably, when combined with GPT-4, LLaVA achieves even more impressive results. The arXiv paper reports, ‘Surprisingly, GPT-4 is able to provide consistent improvement over all question classes, and achieves a new SoTA accuracy of 92.53%.’ This speaks to the potential and adaptability of LLaVA, showing how it continues to evolve and adapt for better performance.

LLaVA has also shown impressive results with unseen images and instructions, further attesting to its robust capabilities.

How does LLaVA-1.5 deal with OCR?

Here we tested LLaVA’s optical character recognition capabilities by using a screenshot of the LLaVA paper on arXiv as input. Overall, LLaVA’s OCR performed very well – it was able to correctly extract nearly all of the plain text from the paper. I would estimate its accuracy at around 95-98% on normal body text without any special formatting or characters.

The few errors LLaVA made were primarily in extracting text from in-line citation brackets and numeric superscripts. It seems citations and special symbols like brackets still pose challenges for the model’s OCR system. Additionally, in some cases LLaVA omitted or merged together punctuation and spaces between words. However, it robustly recognized all standard letters and numbers in paragraphs, section headings, and figure captions.

Compared to other state-of-the-art multimodal AI models, LLaVA’s raw OCR abilities appear on par with, if not better than, similar large language models that incorporate computer vision. The high accuracy on plain body text suggests it has learned strong optical recognition abilities.

While there is still room for improvement, especially with specialized text, LLaVA’s overall OCR performance is impressive given a simple screenshot as input. As multimodal models continue to advance, extracting high-quality text from images to enrich language understanding will only become more important. LLaVA sets a strong foundation in this regard and points toward future OCR enhancements.

LLaVA Explaining Graphs & Flowcharts

Here the LLaVA model struggled. The chart actually has six different time frames and four funds on the left. The next paragraph again appears to hallucinate: it mentioned that there are four sections, even though we can clearly see six, and it stated that the 12-month section was in the top left, even though the table does not have a “12 months” section at all. Overall, we would not trust LLaVA to read tables, graphs, or complex diagrams.

Comparison with Bing GPT-4

I compared the same image and prompt with Bing GPT-4.

It came to essentially the same conclusion. Both models missed the “1 Month”, “3 Month”, and “YTD” sections, and Bing also invented the name “Bing Global Growth Fund”. Both models said some things that were correct, but there were too many errors to trust either as an assistant for visually analyzing data.

Comparing to Bard

And for those of you wondering how it compares to Bard, this is what we got.

It honestly did better than expected: it got all the time frames correct and most of the numbers. However, it mistook the MIWO column for the SPX column. Still not too bad overall.

Conclusion

As we journey deeper into the world of AI, we encounter incredible innovations like LLaVA that continue to push the boundaries of what is possible. LLaVA is more than just another AI model. It’s a game-changer, a stride towards the future, bringing language and vision understanding into a seamless, potent blend.

With an uncanny knack for matching the behavior of AI giants like GPT-4 while training on relatively small instruction-following datasets, this tool has swiftly set a new standard in multimodal AI. When coupled with GPT-4, the duo hits an impressive SoTA accuracy of 92.53% on ScienceQA. Impressive, isn’t it?

But what truly sets LLaVA apart is its adaptability. In a rapidly evolving technological landscape, it isn’t merely a static invention. It grows, learns, and adapts, just like us. As an open-source project, it invites the collective genius of developers and AI enthusiasts to keep refining and improving it.

What’s more, the real-life applications of this tool are boundless. Imagine having a digital assistant that doesn’t just hear you, but sees your world as well. With LLaVA, we’re edging closer to that reality. In the end, LLaVA symbolizes a step into a future where AI doesn’t just understand our words but also sees our world.

How to Use Llama 2 locally


Llama 2 has arrived! The highly anticipated update to Meta’s language model is now available for local installation. We know many of you have been eager to get your hands on this powerful AI assistant. In this post, we’ll walk you through the steps for setting up Llama 2 locally on your own machine.


Whether you’re an AI enthusiast, developer, or business leader, having access to Llama 2 locally unlocks a world of possibilities. You’ll be able to utilize Llama’s advanced natural language capabilities for a wide range of applications, while keeping your data private and secure.

We’re thrilled to help guide you through the local setup process. With some simple configuration, you’ll have this remarkable AI assistant running smoothly in no time. The team at Meta has put in long hours to deliver this major update, and we think you’re going to love exploring everything Llama 2 has to offer.

Preparing for Local Use

Running Llama 2 locally provides a lot of flexibility since it doesn’t require an Internet connection. We’ve seen fascinating examples of its use, such as creating websites to showcase the cool factors of llamas. And with the release of Llama 2, we now have access to open-source tools that allow running it locally. Here are the main ones:

  • Llama.cpp (Mac/Windows/Linux)
  • Ollama (Mac)
  • MLC LLM (iOS/Android)

Let’s dive into each one of them.

Llama.cpp: A Versatile Port of Llama

Llama.cpp is a C/C++ port of Llama that enables running Llama 2 locally with 4-bit integer quantization on Macs. It also supports Linux and Windows.

To install it on your M1/M2 Mac, here is a line you can use:

```bash
curl -L "https://replicate.fyi/install-llama-cpp" | bash
```

This installation command also works on an Intel Mac or a Linux machine, but without the `LLAMA_METAL=1` flag:

```bash
curl -L "https://replicate.fyi/install-llama-cpp-cpu" | bash
```

For Windows on WSL, use:

```bash
curl -L "https://replicate.fyi/windows-install-llama-cpp" | bash
```

Ollama: A macOS App

Ollama is a macOS open-source app that lets you run, create, and share large language models with a command-line interface, and it already supports Llama 2.

To use the Ollama CLI, download the macOS app at ollama.ai/download. Once installed, you can freely download Llama 2 and start chatting with the model.

Here are the lines you can use to download the model:

```bash
# download the 7B model (3.8 GB)
ollama pull llama2

# or the 13B model (7.3 GB)
ollama pull llama2:13b
```

And then run the model:

```bash
ollama run llama2
```

Windows: A Detailed Guide

To install Llama on Windows, you need to follow these steps:

  1. Clone and download the Llama repository.
  2. Visit the Meta website and register to download the model/s. Once registered, you will get an email with a URL to download the models. You will need this URL when you run the download.sh script.
  3. Once you get the email, navigate to your downloaded llama repository and run the download.sh script. Make sure to grant execution permissions to the download.sh script.
  4. During this process, you will be prompted to enter the URL from the email. Do not use the “Copy Link” option but rather make sure to manually copy the link from the email.
  5. Once the model/s you want have been downloaded, you can run the model locally using the command provided in the Quick Start section of the Llama repository.

Windows users can follow a step-by-step guide for downloading and running the Llama model with an Nvidia GPU, using the CUDA Toolkit and cloning the relevant GitHub repository.

After following these steps, you can create a PowerShell function to quickly run prompts with `llama "prompt goes here"`.

Conclusion

Running Llama 2 locally is becoming easier thanks to the open-source tools designed to support its deployment across various platforms. Whether you are on a Mac, Windows, Linux, or even a mobile device, you can now harness the power of Llama 2 without the need for an Internet connection. As Llama 2 continues to evolve, we can expect even more exciting developments in the near future.

Microsoft Autogen: Orchestrating and Automating LLM Workflows

In a world where large language models (LLMs) are becoming increasingly crucial, Microsoft researchers are introducing AutoGen, a framework that simplifies the orchestration, optimization, and automation of workflows for LLM applications.

Introduction

AutoGen promises to drive a new wave of innovation, offering robust technology for developing large language model (LLM) applications that use multiple agents. Developed at Microsoft, AutoGen is a framework that allows LLMs, human inputs, and tools to be combined into agents that can work together to solve tasks.

According to Doug Burger, a Technical Fellow at Microsoft, “Capabilities like AutoGen are poised to fundamentally transform and extend what large language models are capable of. This is one of the most exciting developments I have seen in AI recently.”

The Power of AutoGen

One of the most challenging aspects of LLM applications is the intricate design, implementation, and optimization of workflows. AutoGen simplifies this process, providing a framework for the automation and optimization of LLM workflows.

With AutoGen, you can create customized agents that leverage the advanced capabilities of LLMs like GPT-4. Moreover, it integrates with humans and tools, and supports the automation of chats between multiple agents.

How to Use AutoGen

Building a complex multi-agent conversation system with AutoGen involves two simple steps:

  • Defining a set of agents with specialized capabilities and roles.
  • Defining the interaction behavior between agents, such as the reply when an agent receives messages from another agent.

AutoGen makes the whole process intuitive and modular, allowing agents to be reusable and composable.
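To make those two steps concrete, here is a minimal sketch using pyautogen's group-chat classes. It is illustrative only: the agent names, system messages, task, and the OAI_CONFIG_LIST file are assumptions, not code from the AutoGen documentation.

```python
from autogen import (
    AssistantAgent,
    UserProxyAgent,
    GroupChat,
    GroupChatManager,
    config_list_from_json,
)

# Load LLM endpoint settings (same pattern as the quickstart shown later).
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
llm_config = {"config_list": config_list}

# Step 1: define agents with specialized capabilities and roles.
planner = AssistantAgent(
    "planner",
    system_message="You break the user's task into small, concrete steps.",
    llm_config=llm_config,
)
coder = AssistantAgent(
    "coder",
    system_message="You write Python code for each step the planner proposes.",
    llm_config=llm_config,
)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",                      # run fully automated
    code_execution_config={"work_dir": "coding"},  # execute the coder's code locally
)

# Step 2: define how the agents interact - a group chat whose manager
# decides which agent replies to each incoming message.
groupchat = GroupChat(agents=[user_proxy, planner, coder], messages=[], max_round=10)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="Collect NVDA's closing prices for the last week and summarize them.")
```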

Capabilities of AutoGen Agents

The agents in AutoGen can leverage LLMs, tools, humans, or a combination of these elements. This means you can configure the role of LLMs in an agent, ensure human intelligence and oversight through a proxy agent, and execute code/functions driven by LLM with the agents.

Key Features of Microsoft Autogen

AutoGen has several distinguishing features:

    • Automated Workflow Generation: AutoGen eliminates the need for manual coding, making it easy to create, modify, and optimize workflows.
    • Workload Mapping and Scheduling: AutoGen helps in mapping the computational workloads to the available resources and schedules them for optimal efficiency.
    • Insightful Analytics: AutoGen comes with powerful analytics, offering real-time visibility into the performance of workflows, which aids in smart decision-making and future planning.
    • Scalability: AutoGen is built to handle large-scale workflows effortlessly, unbounded by the number of tasks or the size of the datasets involved.
    • Efficiency: Designed to automate and optimize, AutoGen drastically cuts down the time required for set-up and performance tuning.
    • Flexibility: With AutoGen, adapting workflows to new tasks becomes less of a challenge, thanks to its support for flexible and dynamic adaptation.
    • Integration: AutoGen facilitates easy integration with a range of external tools and platforms, further amplifying its effectiveness in diverse application contexts.
    • Security: Ensuring secure processing of data, AutoGen adheres strictly to the principles of data privacy and follows standardized security protocols.

Benefits of AutoGen

AutoGen’s agent conversation-centric design offers numerous benefits. Not only does it naturally handle ambiguity and feedback, but it also enables effective coding-related tasks and allows users to opt in or out via an agent in the chat.

Above all, AutoGen supports automated chat and diverse communication patterns. It makes it easy to orchestrate a complex, dynamic workflow and experiment with versatility.

Getting Started With Microsoft AutoGen

AutoGen is freely available as a Python package that can be easily installed via pip. Just run `pip install pyautogen` to get started. With just a few lines of code, you can enable powerful conversational experiences between LLMs, tools, and humans.

Check out the examples page for a wide variety of tasks that can be automated with AutoGen’s multi-agent framework. The docs provide sample code snippets for each example so you can quickly get up and running.

You can also browse the Github repo to see the full codebase.

Installation

AutoGen requires Python >= 3.8 and has minimal dependencies by default. You can install extra dependencies based on the features needed, for example:

pip install "pyautogen[blendsearch]"

See the Installation page for full details.

The FAQ covers configuring LLMs for inference.

Quickstart

The quickstart guide provides a simple example to try AutoGen’s multi-agent conversation for a stock data plotting task:

from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

# Load LLM endpoints 
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")

assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})

user_proxy.initiate_chat(assistant, message="Plot a chart of NVDA and TESLA stock price change YTD.")

This automatically runs a conversation between the Assistant and UserProxy agents to accomplish the task.

See twoagent.py for the full code.

Conclusion

As LLM applications become increasingly complex, frameworks like AutoGen are poised to become indispensable. AutoGen is more than just a robust framework—it’s a powerful tool that simplifies and optimizes the design, implementation, and automation of LLM workflows, thereby helping developers to create next-generation applications.

AutoGen is an open-source project under active development and encourages contributions from individuals of all backgrounds. With it, the future of LLM applications looks promising.

Google DeepMind’s Advances in General-Purpose Robotics Learning

In our increasingly interconnected world, robots excel as specialists but lag behind as generalists. To address this, Google DeepMind, in collaboration with 33 academic labs, has developed an innovative set of resources aimed at advancing general-purpose robotic learning. This blog post explores the strides Google DeepMind has made in this field, with a focus on the Open X-Embodiment dataset and the RT-1-X robotics transformer model.

The Dawn of Robotic Generalists: Open X-Embodiment dataset and RT-1-X model

Traditionally, robots are trained per task, per robot type, and per environment with variations requiring retraining from scratch. Google DeepMind, in a game-changing move, has released resources that can significantly enhance a robot’s ability to learn across different types. These resources include the Open X-Embodiment dataset and the RT-1-X robotics transformer model.

Through pooling data from 22 robot types, Google DeepMind and its partners have been able to create a highly diverse dataset. This led to the development of the RT-1-X model, trained on this dataset, which displays skill transfer across various robot embodiments.

A Leap Forward in Performance: The Benefits of Multiple Embodiment Training

Training a single model using data from numerous embodiments results in improved performance across many robots, compared to models trained on data from individual embodiments. This fact was confirmed when the RT-1-X model was tested in five different research labs, yielding a 50% success rate improvement on average across five commonly-used robots. Additionally, the RT-2 visual language action model tripled its performance on real-world robotic skills when trained on data from multiple embodiments.

Open X-Embodiment Dataset: A Step Towards Robotic Mastery

The Open X-Embodiment dataset, comprising data from 22 robot embodiments, is a critical step towards training a generalist model capable of controlling various types of robots. This dataset, created in collaboration with over 20 academic research institutions, is a monumental achievement and the most comprehensive robotics dataset to date.

Introducing RT-X: A General-Purpose Robotics Model

The RT-X model combines two of Google DeepMind’s robotics transformer models, demonstrating how a diverse, cross-embodiment dataset enables significantly improved performance. In tests, the RT-1-X model trained with the Open X-Embodiment dataset outperformed the original model by 50% on average.

Emergent Skills in RT-X

Experiments demonstrated that co-training with data from different platforms imbues the RT-2-X model with additional skills not present in the original dataset. This equips it to perform novel tasks, demonstrating the power of diverse training data.

The emergent skills in RT-X are skills that the RT-2-X model could not perform previously but learned by combining data from other robots into its training. RT-2-X was three times as successful as the previous best model on these emergent skills, demonstrating that adding data from other robots broadens the range of tasks a robot can perform, even when that robot already has large amounts of its own data. This shows how combining data from multiple embodiments can lead to better performance across many robots than training on individual embodiments, and how scaling learning with more diverse data and better models can lead to more useful helper robots[1].

Responsible Advancements in Robotic Research

Google DeepMind’s work shows that models which generalize across embodiments can lead to significant performance improvements. Future research could explore combining these advancements with self-improvement aspects to enable the models to improve with their own experience. Another potential area of exploration could be how different dataset mixtures may affect cross-embodiment generalization.

By advancing robotics research in an open and responsible manner, we are one step closer to a future where general-purpose robots make our lives easier, more efficient, and more enjoyable.

Closing Thoughts

The work being done by Google DeepMind and its partners represents an unprecedented paradigm shift in general-purpose robotics. By developing datasets and models that can generalize across many robot embodiments, they’re not only enhancing robotic performance but significantly broadening the range of achievable tasks. This advancement in technology has vital implications for various industries, from healthcare to manufacturing and beyond. Moreover, as general-purpose robots become more capable, we can anticipate a considerable positive impact on societal productivity and efficiency. We stand at the cusp of a future where robots can seamlessly adapt to an array of tasks, transforming not only the way we work but also how we live.

Mistral 7B: An Open-Source LLM Pushing the Frontiers of AI

The field of artificial intelligence has seen rapid advances in recent years, particularly in the domain of large language models (LLMs). These models, with their vast parameters and ability to understand and generate natural language, are unlocking new possibilities for AI. One of the most promising new LLMs is Mistral 7B, an open-source model developed by startup Mistral AI. With 7.3 billion parameters, Mistral 7B represents the cutting-edge of generative AI capabilities.

We’ll provide an overview of Mistral 7B and its key features. We’ll explore how its sliding window attention mechanism provides enhanced context understanding. We’ll also discuss benchmark performance, comparisons to other models, open-source accessibility, and fine-tuning capabilities. Mistral 7B demonstrates the potential of large language models to empower new AI applications and use cases. As an open-source model, it signals a shift towards greater openness and customization in the AI field.

Overview of Mistral 7B

Mistral 7B is an open-source large language model (LLM) developed by Mistral AI, a startup in the AI sector. It is a 7.3 billion parameter model that uses a sliding window attention mechanism. Mistral 7B is designed to revolutionize generative artificial intelligence and offer superior adaptability, enabling customization to specific tasks and user needs. Some key features of Mistral 7B include:

Parameter size: Mistral 7B is a 7.3 billion parameter model, making it one of the most powerful language models for its size to date.

Sliding window attention mechanism: Mistral 7B uses a sliding window attention mechanism, in which each layer attends to the previous 4,096 hidden states.

Open-source: Mistral 7B is an open-source model released under the Apache 2.0 license, which means it can be used without restrictions.

Fine-tuning capabilities: Mistral 7B can be fine-tuned for specific tasks, such as chat or instruction datasets, and has shown compelling performance.

Mistral 7B has been compared to other large language models, such as Llama 2 13B and Llama 1 34B, and has outperformed them on many benchmarks. It also approaches CodeLlama 7B performance on code while remaining good at English tasks. Mistral 7B’s raw model weights are distributed via BitTorrent and on Hugging Face.

Key Features and Capabilities

Mistral 7B is a language model released by the Mistral AI team. It is a 7.3B-parameter model that outperforms Llama 2 13B on all benchmarks, outperforms Llama 1 34B on many benchmarks, and approaches CodeLlama 7B performance on code while remaining good at English tasks. It uses grouped-query attention (GQA) for faster inference and sliding window attention (SWA) to handle longer sequences at a smaller cost. Mistral 7B is easy to fine-tune on any task and can be used without restrictions. It can be downloaded and used anywhere with the reference implementation, deployed on any cloud using the vLLM inference server and SkyPilot, and used via Hugging Face. A version of Mistral 7B fine-tuned for chat outperforms Llama 2 13B chat.
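As a rough illustration of using the Hugging Face weights, here is a minimal sketch. It assumes the `mistralai/Mistral-7B-v0.1` checkpoint, a recent transformers release with Mistral support, and the accelerate package for `device_map="auto"`; the prompt and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # base (non-chat) checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Plain completion-style prompt for the base model.
prompt = "Sliding window attention is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```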


Sliding Window Attention for Enhanced Context

Vanilla attention


First off, we have vanilla attention. Attention is the mechanism by which information is shared between tokens in a sequence. In vanilla transformers, attention follows a causal mask: each token in the sequence can attend to itself and to all tokens in the past. This ensures that the model is causal, i.e. it can only use information from the past to predict the future.

Sliding window to speed-up inference and reduce memory pressure

The number of operations of attention is quadratic in the sequence length, and the memory pressure is linear in the sequence length. At inference time, this incurs higher latency and smaller throughput due to reduced cache availability. To alleviate this issue, we use a sliding window attention [1,2]: each token can attend to at most W tokens in the past (here, W=3).
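To make the idea concrete, here is a small NumPy sketch of such a banded causal mask. It is purely illustrative and not Mistral’s implementation; in the real model the window is 4,096 tokens rather than 3.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Return a boolean matrix where entry (i, j) is True if token i may attend to token j."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i                # never attend to future tokens
    windowed = (i - j) <= window   # attend to at most `window` tokens in the past
    return causal & windowed

# Each row shows which earlier positions a token can see with W=3.
print(sliding_window_mask(6, 3).astype(int))
```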


Rolling buffer cache

We implement a rolling buffer cache. The cache has a fixed size of W, and we store the (key, value) for position i in cache position i % W. When the position i is larger than W, past values in the cache are overwritten.
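Here is a toy Python sketch of that rolling buffer idea. It is illustrative only: a real implementation stores key/value tensors per layer and per attention head, not strings.

```python
class RollingKVCache:
    """Fixed-size cache: the (key, value) for position i lives at slot i % W."""

    def __init__(self, window: int):
        self.window = window
        self.slots = [None] * window  # each slot holds (position, key, value)

    def store(self, position: int, key, value):
        # Once position exceeds the window size, older entries are overwritten.
        self.slots[position % self.window] = (position, key, value)

    def lookup(self, position: int):
        entry = self.slots[position % self.window]
        if entry is None or entry[0] != position:
            return None  # never cached, or already overwritten by a newer position
        return entry[1], entry[2]


cache = RollingKVCache(window=3)
for pos in range(5):
    cache.store(pos, key=f"k{pos}", value=f"v{pos}")

print(cache.lookup(4))  # ('k4', 'v4') - still inside the window
print(cache.lookup(1))  # None - overwritten by position 4
```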

Benchmark Performance and Comparisons

The benchmarks are categorized by their themes. The data shows the performance of different models on various metrics. The models compared are Llama 2 7B, Llama 2 13B, Code Llama 7B, and Mistral 7B.


Some possible analysis that we can make from this data are:

  • It compares pretrained models like LLAMA and Mistral to a finetuned model (Code LLAMA). Finetuning generally improves performance on specific tasks.
  • LLAMA and Mistral are large pretrained models with 7B and 13B parameters. More parameters usually lead to better performance, as seen by LLAMA 13B outperforming LLAMA 7B.
  • Performance varies significantly across datasets. For example, all models perform very poorly on the math dataset compared to the other NLU tasks. This suggests these models still struggle with mathematical and symbolic reasoning.
  • The finetuned Code LLAMA model does much better on the HumanEval dataset compared to the pretrained models. This dataset seems to benefit more from finetuning towards a specific type of text.
  • Mistral outperforms LLAMA on most datasets, suggesting Mistral is a better pretrained model overall. The GSM8K dataset shows the biggest difference, indicating Mistral has an advantage on grade-school math reasoning.

In summary, the data compares different NLP models across a variety of tasks and datasets. It highlights model size, finetuning, and choice of pretrained model as key factors influencing performance. More analysis could examine specific model architectures, training data, etc. to further understand differences.

Open-Source Accessibility

Mistral is open-sourced under the Apache 2.0 license. You can try it for free with Perplexity Labs. It is a new addition to the family of openly available models such as LLaMA and Falcon.

Fine-Tuning for Customization

One of the key strengths of Mistral 7B is its ability to be fine-tuned for specific tasks or datasets. While the base model demonstrates strong general performance, customization through fine-tuning allows it to excel at more specialized applications.

Early testing shows that Mistral 7B fine-tunes well and is able to follow instructions clearly after fine-tuning. It appears to be a robust and adaptable model overall. This makes it well-suited for fine-tuning on tasks like conversational AI, classification, summarization, and more.

Given Mistral 7B’s strong performance on code tasks already, there is significant potential to fine-tune it for specialized coding and software engineering applications. We can expect to see fine-tuned versions of Mistral 7B for code generation, bug fixing, and other coding domains in the near future.

The ability to easily customize and adapt Mistral 7B to specific use cases makes it a versatile option for organizations and developers. Fine-tuning unlocks its full potential while retaining its general intelligence capabilities.

Falcon 180B Takes Flight: Exploring the Capabilities

The field of natural language processing is advancing at a rapid pace, empowered by the development of ever-larger and more capable AI models. The latest entrant aiming to push boundaries in this space is Falcon 180B – a newly released open-source language model boasting 180 billion parameters.

With its vast scale and cutting-edge design, Falcon 180B represents an exciting milestone in AI’s quest to reach human levels of language proficiency. We’ll take a closer look at this powerful new system, examining its origins, capabilities, and performance benchmarks.

Developed by the Technology Innovation Institute (TII) in Abu Dhabi, Falcon 180B demonstrates remarkable prowess on natural language tasks. It currently ranks at the top of the Hugging Face Leaderboard for open-source large language models, even surpassing models from leading AI labs like Meta and Google in certain tests.


Under the hood, Falcon 180B implements a novel training methodology focused on Constitutional AI principles like safety, honesty and truth-seeking. This rigorous approach has yielded a model with state-of-the-art natural language understanding, while also aligning its goals and incentives with human values.

As AI continues its relentless march forward, models like Falcon 180B underscore the rapid progress being made. In this article, we’ll analyze its strengths, benchmark its abilities, and assess its implications for the future of language AI. The flight of Falcon 180B is just beginning – let’s explore where its wings may take us next.

Model Specifications

The recently released Falcon-180B is a language model with 180 billion parameters, trained on 3.5 trillion tokens. It is a causal decoder-only model trained on a causal language modeling task, which means it predicts the next token. Falcon-180B is currently the most powerful open LLM and ranks first on the Hugging Face leaderboard for open-access LLMs. It is reported to be on par with Google’s PaLM 2 and to approach GPT-4 on some benchmarks.

Setup & Performance

To swiftly run inference with Falcon-180B, you will need at least 400GB of memory. For fine-tuning with QLoRA, roughly 160GB (2x A100 80GB) is required, and for GPTQ/int4 inference, roughly 320GB (8x A100 40GB). The quantized files weigh in at 102GB for falcon-180B-q4_K_M.gguf and 138GB for falcon-180B-q6_K.gguf. To put its size into perspective, Falcon-180B has 2.5 times more parameters than Meta’s LLaMA 2 model.
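For illustration only, here is roughly how one could load the model with transformers, assuming access to the gated tiiuae/falcon-180B checkpoint on Hugging Face, the accelerate package for `device_map="auto"`, and hardware along the lines described above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"  # gated checkpoint; you must accept the license first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all available GPUs
)

inputs = tokenizer("Falcon 180B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```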

Data Use

The table below summarizes the data sources used to train Falcon 180B, along with the fraction of the training mix and the number of tokens each source contributes.

| Data source | Fraction | Tokens |
| --- | --- | --- |
| RefinedWeb (massive English web crawl) | 75% | 750B |
| European web crawl | 7% | 70B |
| Ebooks | 6% | 60B |
| Conversations (Reddit, StackOverflow, HackerNews) | 5% | 50B |
| Code | 5% | 50B |
| Technical content (arXiv, PubMed, USPTO) | 2% | 20B |


However, there is a concern about the limited representation of code in the training mix, as it only comprises 5%. Code is seen as highly valuable for boosting reasoning, mastering tool use, and empowering AI agents. In fact, GPT-3.5 is finetuned from a Codex base. Without sufficient coding benchmark numbers and considering the limited code pretraining, it is assumed that the model may not perform well in coding-related tasks. Therefore, claims of being “better than GPT-3.5” or approaching GPT-4 may not be justified without incorporating coding as an integral part of the pretraining recipe, rather than an afterthought in finetuning.

Beyond that, it is suggested that it is time to explore the use of Mixture of Experts (MoE) for models with a capacity of 30B+ parameters. While there have been MoE LLMs of less than 10B parameters, it is essential to scale up significantly in order to advance the field.

Testing

To evaluate the performance of the model, we tested it with a straightforward LeetCode question: inverting a binary tree in Java.

Commercial Use

Here are the key points of the Falcon 180B license:

  • Grants a royalty-free, worldwide, non-exclusive copyright and patent license to use, reproduce, distribute, and create derivative works of Falcon 180B (Sections 2 and 3)
  • Requires distribution agreements for the model to incorporate enforceable Acceptable Use Policy and hosting restrictions (Section 4)
  • Requires compliance with the Acceptable Use Policy, which may be updated by TII (Section 5)
  • Requires public statements about derivative works to credit TII and Falcon 180B (Section 6)
  • Contributions are under the terms of the license, unless stated otherwise (Section 7)
  • Separate license required for “Hosting Use” like offering Falcon 180B through an API (Section 9)
  • Provides the model on an “as is” basis, disclaiming warranties (Section 10)
  • Limits liability of contributors (Section 11)
  • Allows offering warranties/liability only on own behalf, not other contributors (Section 12)

In summary, the license enables broad use of Falcon 180B, but with restrictions such as the Acceptable Use Policy, attribution requirements, and no hosting without permission, while limiting contributor liability. All in all, Falcon 180B does not appear to be truly open source.

Impact

The release of Falcon 180B represents an exciting milestone in the development of large language models. While not open source, its impressive capabilities highlight the rapid progress in AI. However, as adoption grows, it will be important to ensure these models are used responsibly and their benefits shared as widely as possible.

More openness and contribution from the AI community would be ideal to improve Falcon 180B. But the proprietary nature of the model and its specialized computational requirements currently limit participation. Running the 180 billion parameter model requires high-end GPUs, making consumer access difficult. And as we experienced, Falcon 180B has limitations in areas like coding, so open development could help strengthen its skills.

I hope the popularity of models like Falcon 180B accelerates work on optimized deployment. It’s disappointing quantized inference on CPUs isn’t more feasible yet. Supporting reduced precision and int8 or 4 bit quantization on CPUs would make these large models far more accessible. Theoretically, with a powerful CPU and enough RAM, Falcon 180B could already run acceptably for some applications. But full open source CPU support for ultra-low precision would be a gamechanger.

It’s unclear why quantized CPU inference receives so little focus compared to GPUs. There don’t appear to be fundamental technical barriers. And affordable, high-core CPUs can match GPU pricing. Unlocking fast int8 and lower quantization on CPUs would enable more participatory AI development. As large language models continue advancing, we need quantization and optimizations that keep pace so more users can benefit from these breakthroughs.

4 Ways to Use Stable Diffusion for Free

Money doesn’t grow on trees, but AI images sure seem to! Stable Diffusion has exploded in popularity for its ability to generate incredibly detailed and varied images with just a text prompt. But while big tech companies charge an arm and a leg for access to similar AI tools, Stable Diffusion is totally free and open source.

We’ll share four sneaky ways you can get your hands on this amazing AI art generator without spending a penny. Forget hiring a professional illustrator or buying licenses for expensive design software – with a bit of creativity, anyone can take advantage of Stable Diffusion’s magic, for free!

So grab your thinking caps and get ready to meme, dream, and scheme as we explore the wondrous world of free Stable Diffusion. Let’s get creating!

Clipdrop

Clipdrop.co is a website that offers a suite of tools for modifying images using AI technology. Some of the tools available on the website include:

  • Stable Diffusion XL: generate high-resolution realistic images with AI
  • Uncrop: uncrop photos to any image format
  • Reimagine XL: create multiple variants of an image with Stable Diffusion
  • Stable Doodle: transform doodles into real images in seconds
  • Cleanup: automatically remove objects, people, text, and defects
  • Remove Background: accurately extract main subjects
  • Relight: relight images beautifully
  • Image Upscaler: instantly upscale images 2x or 4x and enhance details
  • Replace Background: seamlessly teleport anything anywhere with AI
  • Text Remover: erase text from images

With this impressive range of AI-powered editing tools available for free, Clipdrop.co makes it easy for anyone to unlock the potential of Stable Diffusion.

Note: For users on the free tier, Clipdrop’s Stable Diffusion XL allows you to generate up to 400 images per day, but these will bear a watermark. If you require watermark-free images, consider upgrading to a paid subscription.

Leonardo

Embrace the freedom of creative exploration with Leonardo. This advanced AI platform stands out in the creative industry by integrating the latest in AI technology, without sacrificing the human touch.

At every stage of content creation, Leonardo offers fine-grain control to ensure your creative vision is perfectly realized. With its cutting-edge features, it leads in model fine-tuning, prompt adherence, training speed, inference pace, and multi-image prompting capabilities. Leonardo tackles common challenges like image degradation head-on by offering custom upscaling, and is committed to ongoing improvements and enhancements.

Leonardo isn’t just free, it’s unconditionally free. No expiration date, no hidden charges. It offers a daily quota of tokens for you to use in your creative projects. And for an added layer of benefits, Leonardo provides paid subscriptions that include an increased token allowance, faster image generation, and access to premium features.

But how does Leonardo work? It offers a range of meticulously fine-tuned models for a broad content generation spectrum. Premium users have the privilege of refining their own set of models using a handful of images, allowing for the creation of unique styles and content types. Furthermore, these customized models can be shared with others on the platform, promoting a culture of collaboration and innovation.

This is just the beginning for Leonardo, with promises of even more impressive features down the line. For more about their subscription plans, just click the “Upgrade” button in the top left corner of the page. Leonardo is all about empowering creators and innovators, so why wait? Transform your creative process today.

Stable Diffusion Online

Delve into the realm of AI-based creativity with Stable Diffusion Online. This platform provides a comprehensive suite of tools for generating photo-realistic images from text prompts using cutting-edge AI technology. With a user-friendly interface and the latest Stable Diffusion model, creating high-quality images of anything you can envision has never been easier or faster!

This GPU-enabled website offers quick image generation capabilities. Simply type in a text prompt, hit Generate, and watch as Stable Diffusion Online brings your imagination to life in mere seconds. As such, it’s ideal for those moments when you need a quick creative output without compromising on quality.

Stable Diffusion Online values your privacy. It operates without collecting or using any personal information and does not store your text or resultant images. Hence, you can focus on your creative process without worrying about the security of your data.

The platform presents no restrictions on the type of content you can enter, and its Stable Diffusion Playground allows users to generate images without any coding required. Should you face any user traffic-related errors, simply retry until successful.

Further enhancing its services, Stable Diffusion Online recently introduced a new Prompt Database feature. Here, users can explore and draw inspiration from over 9 million Stable Diffusion prompts submitted by users worldwide. This feature not only provides a rich source of creative inspiration but also fosters a global community of innovative thinkers. Transform your creative endeavors with Stable Diffusion Online – a gateway to AI-powered artistic brilliance!

Ideogram

Enter the innovative world of AI-generated image creation with Ideogram. Founded by former Google Brain researchers and backed with considerable seed funding, Ideogram utilizes the power of Stable Diffusion to offer a suite of tools for generating photo-realistic images from text input.

One of Ideogram’s major selling points is its ability to reliably generate text within images. Need lettering on signs or company logos? Ideogram has you covered. The company offers various preset image generation styles on its web app, including a unique “typography” style. This mode unleashes the power of text rendering, providing an array of colors, fonts, sizes, and styling options.

That’s not all. Other preset styles include 3D rendering, cinematic, painting, fashion, product, illustration, conceptual art, ukiyo-e, and more. You can select multiple styles at once and apply them to your creations, affording a degree of customization that is truly impressive.

Ideogram is currently available for sign-up in beta, with its Discord server and web app already buzzing with the amazing creations of its user base. The examples demonstrate the high quality of lettering and images that can be generated, showcasing a significant advancement compared to other state-of-the-art options. Join the creative revolution with Ideogram, the next generation of AI-powered image creation.

Conclusion

The world of AI and image generation is vast and constantly evolving, with Stable Diffusion at the forefront of this creative revolution. We’ve explored four ways you can access and utilize this powerful tool for free, but remember, this is by no means exhaustive.

With an enormous open-source community, there are likely even more resources out there offering free access to Stable Diffusion, each with its own unique advantages. It’s evident that Stable Diffusion’s flexibility and wide scope of application have fostered an active, thriving community of creators and innovators.

That being said, initiating Stable Diffusion models can be technically demanding, and not everyone possesses the hardware capabilities to run it. This is precisely why free resources like Clipdrop, Leonardo, Stable Diffusion Online, and Ideogram are so invaluable. They democratize access to this cutting-edge technology, overcoming technical and hardware barriers and enabling us all to harness the full potential of Stable Diffusion.

So, push the bounds of your creativity, experiment with these platforms, and above all else – have fun creating! Remember, the only limit is your imagination. We can’t wait to see where your creative journey with Stable Diffusion takes you next.

WizardCoder: The AI Model That Beats GPT-4 at Coding

The field of AI coding assistants took a major leap forward this week with the announcement of a new model called WizardCoder. WizardCoder represents a breakthrough in instruction-following and code generation capabilities. Early benchmark results indicate that WizardCoder can surpass even the formidable coding skills of models like GPT-4 and ChatGPT-3.5. This impressive performance stems from WizardCoder’s unique training methodology, which adapts the Evol-Instruct approach to specifically target coding tasks. By fine-tuning advanced Code LLMs like StarCoder using newly generated code instruction datasets, the researchers have produced a model that appears poised to set a new bar for AI programming. The release of WizardCoder promises to further accelerate the integration of AI into software development workflows.

You can try it here

Comparison With Other Models🔥

The reported results show that WizardCoder-Python-34B-V1.0 attains the second position on the HumanEval benchmark, surpassing GPT-4 (2023/03/15 version, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5), and Claude 2 (73.2 vs. 71.2).


Comparing with Closed Source Models


While the preliminary results for WizardCoder are extremely promising, it is important to note that its capabilities have yet to be independently verified outside of the lab environment. The researchers emphasize that the performance is reproducible given access to the same training data and computing resources. However, until WizardCoder is released for public testing, some skepticism remains warranted when comparing its results to established models like GPT-4 and ChatGPT that have been rigorously benchmarked. Real-world coding tasks involve complexities not fully captured by current benchmarks. Additional rigorous testing will be needed to determine if WizardCoder can maintain its superiority once deployed for general use. For now, we await with cautious optimism the public release of this potentially game-changing AI assistant.

Testing The Model

You can go ahead and try the Streamlit demo. Here we tested LeetCode 141, Linked List Cycle, solving it in Python using WizardCoder 34B, which is based on the Code Llama architecture. WizardCoder 34B passed the LeetCode test case for detecting cycles in a linked list. When we submitted the solution, it beat 79% of other users’ submissions on runtime and 97% on memory usage.

This demonstrates WizardCoder’s proficiency on algorithmic coding challenges like those on Leetcode. By leveraging the Code Llama foundation and fine-tuning on programming tasks, WizardCoder is able to generate optimized solutions that outperform many human coders. Passing the Linked List Cycle test case shows WizardCoder’s capabilities on classical computer science problems like cycle detection in pointers/references. The strong runtime and memory results highlight WizardCoder’s efficiency gained through deep learning and reinforcement learning techniques.

This was our end result:

class Solution:
    def hasCycle(self, head: ListNode) -> bool:
        # Floyd's tortoise-and-hare: the fast pointer advances two nodes per
        # step and the slow pointer one; they can only meet if there is a cycle.
        slow = head
        fast = head

        while fast and fast.next:
            slow = slow.next
            fast = fast.next.next

            if slow == fast:
                return True

        return False

Future of Open Source

The development of powerful AI systems like WizardCoder that can generate high-quality code has very interesting implications for the future of open source software:

  • It could greatly expand the number of people able to meaningfully contribute to open source projects. With an AI assistant handling much of the actual coding, participation may open up to those with domain expertise but limited programming skills.
  • New open source projects could potentially be launched much more rapidly by leveraging AI to generate core code components. This increased velocity could lead to faster innovation.
  • An abundance of AI-generated code could negatively impact some of the learning and skill development that comes from contributing to open source today. Maintaining coding proficiency may require additional effort.
  • There may be risks from low-quality or insecure code if proper oversight and testing of AI outputs is not maintained, undermining the reliability of some open source projects.
  • The economics and incentives around open source may shift if AI can replace or devalue certain types of human contributions. New models may emerge.
  • Overall, the open ethos of sharing knowledge and collaborating could be strengthened as AI lowers the barriers to participating in open source. But managing the impacts of increased automation on open source communities will also be an important challenge.

The promise and perils of AI-powered code generation will likely inspire lively debate as these technologies evolve. Maintaining the values and benefits of open source in an AI-enabled future will require insight, vision and cooperation from all involved.

Companies like Meta seem to be committed to open source

The reported plans for Meta’s GenAI team to develop and open source an AI model comparable to GPT-4 in capability suggest the company remains committed to advancing open source artificial intelligence. Despite facing criticism from some alignment researchers concerned about potential harms, Meta seems intent on contributing Llama-3 to the open source ecosystem. They appear to believe the benefits of enabling widespread research and innovation with the model outweigh the risks. This choice aligns with Meta’s long track record of open sourcing key technologies like PyTorch and fairseq to empower both internal and external AI development. While increased capabilities like that of GPT-4 do raise important societal questions, Meta’s stance underscores their continued devotion to open source as a crucial means of driving progress in AI. Other tech companies and researchers may make different judgments on model access, but Meta’s provision of open resources has demonstrably accelerated innovation across the field.

Meta Unveils Code Llama

Meta has unveiled a new AI tool called Code Llama that leverages the company’s Llama 2 large language model to generate, complete, and debug code. Code Llama aims to streamline developer workflows by automating routine coding tasks.

According to Meta, Code Llama can produce code based on text prompts, fill in incomplete code snippets, and identify and fix errors in existing code bases. In addition to a general Code Llama model, the company has released Code Llama-Python, which specializes in Python, and Code Llama-Instruct, which understands instructions in natural language.

Code Llama outperformed other publicly available AI coding assistants in benchmark testing, Meta claims. The different versions of Code Llama are tailored to specific use cases and are not interchangeable. Meta notes that the base Code Llama and Code Llama-Python should not be used for processing natural language instructions.

Code Llama is available in three model sizes, including a compressed version designed for low-latency applications. The tool will be released under the same community license as Llama 2, allowing free use for research and commercial projects.

Code Llama represents Meta’s entry into a rapidly evolving space of AI coding assistants. Competitors include GitHub’s Copilot, Amazon’s CodeWhisperer, and Google’s AlphaCode. The launch of Code Llama highlights the growing role of AI in augmenting and automating software development. According to Meta, these tools will allow programmers to focus more time on creative, strategic tasks.

How Code Llama Works

Code Llama is a code-specialized version of Llama 2 that has been further trained on code-specific datasets. It can generate code and natural language about code from both code and natural language prompts. It supports popular programming languages such as Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash. Code Llama can be used for code completion, code generation, and debugging. It is designed to help programmers write robust and well-documented software.

Code Llama models provide stable generations with up to 100,000 tokens of context. The models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens. Having longer input sequences unlocks new use cases for Code Llama, such as providing more context from a codebase to make the generations more relevant. It also helps in debugging scenarios in larger codebases, where staying on top of all code related to a concrete issue can be challenging for developers. When faced with debugging a large chunk of code, developers can pass the entire length of the code into the model.


Benchmark Testing Against Other AI Coding Tools

Code Llama’s benchmark performance is impressive compared to other models. It outperforms open-source, code-specific language models (LLMs) and even surpasses Llama 2. For example, Code Llama 34B achieved a score of 53.7% on the HumanEval benchmark and 56.2% on the Mostly Basic Python Programming (MBPP) benchmark. These scores are the highest among other state-of-the-art open solutions and are on par with ChatGPT.

The advantage of Code Llama lies in its enhanced coding capabilities. It is a code-specialized version of Llama 2, trained on code-specific datasets and sampled for longer. It can generate code and natural language about code, complete code, and assist in debugging. It supports popular programming languages like Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash.

Code Llama offers three different sizes with 7B, 13B, and 34B parameters, each trained with 500B tokens of code and code-related data. The smaller models (7B and 13B) have fill-in-the-middle (FIM) capability, enabling code completion out of the box. The 34B model provides the best results and superior coding assistance. However, the smaller models are faster and more suitable for low-latency tasks like real-time code completion.
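As a hedged sketch of that fill-in-the-middle capability, here is how the Transformers integration of the base Code Llama checkpoints can be asked to complete a missing span via a `<FILL_ME>` marker; the model id, prompt, and generation settings here are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # 7B base model with fill-in-the-middle support
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# <FILL_ME> marks the span the model should infill between the prefix and suffix.
prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, i.e. the infilled middle part.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```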

In summary, Code Llama’s benchmark performance is excellent, surpassing other models in code-specific tasks. Its enhanced coding capabilities, support for multiple programming languages, and different model sizes make it a powerful tool for developers, improving efficiency and productivity in coding workflows.

Best Postgres GUIs to Try in 2023: Streamlining Your Database Management


Greetings database lovers! Do you dream of effortless Postgres management? Are you exhausted by the soul-crushing tedium of slogging through command lines and SQL scripts? Have you ever angrily shaken your fist at pgAdmin, lamenting its user hostility? Well suffer no more my data-wrangling friends! The GUIs of your salvation have arrived.

In 2023, there’s no reason to endure clunky, outdated tools or bang your head against manual database administration. The future is here, and it’s visually pleasing with intuitive UX! Ditch those 1990s-era GUIs and step into a new age of elegant Postgres management.

We’ve vetted and curated the latest and greatest options to simplify your Postgres workflow. Whether you’re a developer, analyst or IT specialist, these user-friendly GUIs will save you time and sanity. Say goodbye to convoluted interfaces that require a PhD to navigate! You’re busy and important, so we found the most streamlined tools that won’t make you pull your hair out.

Life’s too short for fighting wonky databases and dated software, so come with us on a tour of Postgres brilliance! We promise blissful productivity and minimal frustration. You’ll be managing Postgres faster than you can say “declarative referential integrity constraints.” OK, that’s enough database humor – let’s get GUI-ing!

pgAdmin

One of the most popular and fully-featured PostgreSQL GUIs. Open source and free to use. Supports multiple database connections, query building, data editing, and much more.

pgAdmin is arguably the best graphical user interface (GUI) for working with PostgreSQL databases. It offers users an intuitive way to manage PostgreSQL servers, databases, schemas, tables, columns, indexes, constraints, triggers, functions, and more.

One of pgAdmin’s standout features is its ability to visually design and manage database objects through a desktop-style interface. Adding a new table, editing a column’s data type, building a query – these tasks are made simpler in pgAdmin compared to working directly with SQL scripts. The GUI allows you to see all the objects in a server or database in an expandable tree view, making it easy to navigate complex databases.

pgAdmin also simplifies many administration tasks. You can easily add and configure new PostgreSQL servers from within the interface. Backup and restore operations are supported with just a few clicks. pgAdmin provides a syntax-highlighted SQL query tool for executing queries against your databases. There is also built-in support for visually designing and executing batch jobs.

Performance monitoring and diagnostics are another strong suit of pgAdmin. You can view active queries, session information, lock information, and more. This insight helps you optimize your PostgreSQL implementation.

In summary, pgAdmin’s combination of visual database management, administration tools, and performance diagnostics make it a leading choice for managing PostgreSQL in a graphical way. The open source tool is well-supported and trusted by thousands of PostgreSQL users.

DBeaver

Free and open source multi-platform database tool for developers and database administrators. Supports PostgreSQL and many other databases. Has SQL editor, data viewer, ER diagrams, and more.

DBeaver stands out as one of the best open source PostgreSQL GUIs available today. It provides database developers and administrators with a robust toolset for managing PostgreSQL databases and working with data in a visual way.

One of DBeaver’s key strengths is its wide database support – it can connect to over 80 different database engines beyond just PostgreSQL. This makes it easy to use the same interface when working with different databases. The UI is cleanly designed and intuitive for navigating database objects.

For regular users, DBeaver allows you to query, analyze, and export data in a spreadsheet-like interface. For advanced users, it includes an excellent SQL editor with auto-complete, syntax highlighting, and query parameter support. Database administrators will appreciate features like schema/data migration, user management, and monitoring active connections.

DBeaver also simplifies database development tasks. Developers can visually design database schemas, build queries across multiple tables, and generate analytical reports. The tool integrates well with source control like Git for team collaboration.

In summary, DBeaver excels as a cross-platform, universal database tool that offers a full suite of features needed by database developers, analysts, and administrators alike. Its focus on usability makes it one of the top graphical choices for managing PostgreSQL databases.

Valentina Studio

Free GUI tool for PostgreSQL, MySQL, MariaDB, SQLite, and other databases. Cross-platform and easy to use. Good for query building, data browsing, and database administration.

DataGrip

Commercial GUI by JetBrains. Advanced IDE with code completion, on-the-fly error checking, version control integration, and more features. 30-day free trial available.

Postico

Mac-only PostgreSQL GUI client. Simple and elegant interface focused on commonly used database operations like query building, data browsing, and table structure management. Free with paid Pro version also available.