How to 10x Your LLM Prompting With DSPy

Tired of spending countless hours tweaking prompts for large language models (LLMs), only to see marginal improvements? Enter DSPy, a groundbreaking framework that’s set to revolutionize how we work with LLMs. We’ll explore how DSPy can help you dramatically enhance your LLM prompting efficiency and effectiveness.

DSPy isn’t just another tool in the AI developer’s toolkit—it’s a paradigm shift. By separating the logic of your AI pipeline from the specifics of LM prompts and weights, DSPy introduces a level of abstraction that allows for unprecedented optimization. Whether you’re working with powerhouses like Claude 3.5 and GPT-4o or more compact models like Gemma and Llama-3-8B, DSPy offers a systematic approach to maximizing their potential.

  • Beyond hand-crafted prompts: Say goodbye to tedious prompt engineering. DSPy automates prompt optimization with sophisticated algorithms, letting you focus on the core logic of your AI system.
  • Unlocking the full potential of LLMs: DSPy doesn’t just improve prompt quality; it optimizes the entire LLM pipeline. Achieve significant performance gains, even with challenging tasks.
  • A new paradigm for AI development: DSPy shifts the focus from painstaking prompt tuning to building robust and adaptable AI systems.

We’ll look into how DSPy can help you achieve higher quality outputs, avoid common pitfalls, and streamline your workflow.

DSPy Basics

Using DSPy effectively for solving a new task is all about iterative machine learning with LLMs. You start with initial choices, which will likely be imperfect, and refine them over time. Here’s a brief overview:

  • DSPy is all about iterative development, starting simple and adding complexity over time.
  • DSPy’s optimizers automate the generation of optimized prompts, instructions, and even LM weights.
  • The community Discord server is a great resource for support and guidance.

How does DSPy help with prompting?

Imagine you’re trying to teach an LLM (like GPT-4o) to do a specific task, such as summarizing a news article.

Traditional prompting is like giving a bunch of random commands, hoping one sticks. You might say “Summarize!”, “Tell me the key points!”, “Give me a short version”, etc. Some commands might work, some won’t, and it’s hard to know which ones are best.

DSPy is like having a special training tool for your language model. It takes your desired task (summarizing an article) and your examples (news articles with summaries you like), then it automatically figures out the best “commands” (prompts) to give your language model.

Here’s how it works:

  1. Define your task: Tell DSPy what you want your language model to do (e.g., summarize an article).
  2. Give it examples: Provide some examples of what you want (e.g., news articles and their summaries).
  3. DSPy optimizes: DSPy figures out the best way to communicate your task to the language model, using the examples you provided. This might involve crafting new prompts, adding specific instructions, or even adjusting the model’s internal settings. The sketch after this list shows what the loop looks like in code.
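
Here’s a minimal sketch of that loop. The model name, metric, and example text below are illustrative assumptions, not from the original article:

import dspy
from dspy.teleprompt import BootstrapFewShot

# Assumed setup: an OpenAI API key in the environment; the model name is illustrative
lm = dspy.OpenAI(model='gpt-4o-mini')
dspy.configure(lm=lm)

# 1. Define the task as a signature
summarize = dspy.ChainOfThought('article -> summary')

# 2. Provide labeled examples (placeholder text stands in for real data)
trainset = [
    dspy.Example(article="<full article text>", summary="<reference summary>").with_inputs('article'),
]

# 3. A toy metric: reward non-empty summaries shorter than the article
def summary_metric(example, pred, trace=None):
    return bool(pred.summary) and len(pred.summary) < len(example.article)

# The optimizer bootstraps demonstrations and compiles an improved program
optimizer = BootstrapFewShot(metric=summary_metric)
optimized_summarize = optimizer.compile(summarize, trainset=trainset)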

So, instead of guessing and trying different prompts, DSPy does the heavy lifting for you, making your language model much better at performing the task. It’s like having a personal trainer for your language model.

Example

Here is an example of how you might achieve this in DSPy. We read from an external document to sketch a simple RAG-style use case. I prefer using Groq since it offers free API credits and fast inference.

import dspy

# Replace 'document.txt' with your actual file path
document_path = './document.txt'
key = "my-key"  # your Groq API key

# llama3-8b-8192 is one of Groq's hosted models
lm = dspy.GROQ(model='llama3-8b-8192', api_key=key)
dspy.configure(lm=lm)

with open(document_path, 'r') as file:
    document = file.read()

# Declare the task as a signature: a document in, a summary out
summarize = dspy.ChainOfThought('document -> summary')
response = summarize(document=document)

# The full Prediction includes the model's reasoning; .summary is the output field
print(response)
print(response.summary)

Using MultiChainComparison

MultiChainComparison takes several candidate completions (reasoning-and-answer pairs) for the same signature, has the LLM compare them, and produces a refined final answer. Rather than fine-tuning the model’s weights, it works at the prompt level: the candidate completions act as reference attempts that steer the LLM toward the desired output format and style, improving its accuracy and consistency on a specific task. Say, for example, you wanted to use JavaScript or CSS with LLMs: you could pass in examples of code and ask the LLM to do something similar, or tailor the solution toward your own codebase.

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Example completions generated by a model for reference
completions = [
    dspy.Prediction(rationale="What is a key advantage of using non-blocking methods.", answer="higher throughput"),
    dspy.Prediction(rationale="Give example to illustrate blocking and non-blocking calls.", answer="File System"),
    dspy.Prediction(rationale="Provide an example of a non-blocking asynchronous operation.", 
                    answer="```python\nimport asyncio\n\nasync def main():\n    async with asyncio.open_connection('www.example.com', 80) as (reader, writer):\n        writer.write('GET / HTTP/1.0\n\n'.encode())\n        await writer.drain()\n        data = await reader.read(100)\n        print(f'Received {data!r}')\n\nasyncio.run(main())\n```")
]

# Pass signature to MultiChainComparison module
compare_answers = dspy.MultiChainComparison(BasicQA)

# Call the MultiChainComparison on the completions
question = 'Provide an example of a non-blocking asynchronous operation in Python using asyncio. Use an api request example'
final_pred = compare_answers(completions, question=question)

print(f"Question: {question}")
print(f"Final Predicted Answer (after comparison): {final_pred.answer}")
print(f"Final Rationale: {final_pred.rationale}")
Output:

Question: Provide an example of a non-blocking asynchronous operation in Python using asyncio. Use an api request example
Final Predicted Answer (after comparison): `asyncio.get(aiohttp.ClientSession()).get('api.example.com')`
Final Rationale: Answer: `asyncio.get(aiohttp.ClientSession()).get('api.example.com')`

(This is an example of using the aiohttp library to make a non-blocking asynchronous API request using Python's asyncio library.)
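
As an aside, the model’s one-liner above isn’t valid Python. A minimal working version of the idea it’s gesturing at, using aiohttp (assumed installed via pip install aiohttp), might look like this:

import asyncio
import aiohttp

async def fetch():
    # Both the session and the request are non-blocking; the event loop stays free
    async with aiohttp.ClientSession() as session:
        async with session.get('https://api.example.com') as resp:
            print(await resp.text())

asyncio.run(fetch())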

Using TypedChainOfThought

from dspy import InputField, OutputField, Signature
from dspy.functional import TypedChainOfThought
from pydantic import BaseModel

class CodeOutput(BaseModel):
    code: str
    api_reference: str

class CodeSignature(Signature):
    function_description: str = InputField()
    solution: CodeOutput = OutputField()

# TypedChainOfThought validates the LM's output against the pydantic model
cot_predictor = TypedChainOfThought(CodeSignature)

prediction = cot_predictor(function_description="Write a function that adds two numbers.")

# prediction.solution is a CodeOutput instance, not raw text
print(prediction)
Output:

Prediction(
    reasoning='{\n  "code": "def add_numbers(a, b): return a + b",\n  "api_reference": "https://docs.python.org/3/library/functions.html#sum"\n}',
    solution=CodeOutput(code='def add_numbers(a, b): return a + b', api_reference='https://docs.python.org/3/library/functions.html#sum')
)

In the realm of software development, producing structured and predictable outputs is crucial for maintaining code quality and improving developer productivity. The code snippet provided demonstrates how to leverage the DSPy library to achieve this, particularly when working with Large Language Models (LLMs). By structuring the output in JSON format, this approach not only enhances the readability and usability of the generated code but also facilitates better integration with other tools and systems.
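
For instance, because prediction.solution is a pydantic model, handing it to other tools takes one line of serialization. A small sketch, assuming the TypedChainOfThought example above and pydantic v1 (the version the original code's output.json() call implies):

import json

# prediction comes from the TypedChainOfThought example above
solution_dict = prediction.solution.dict()  # use .model_dump() on pydantic v2
print(json.dumps(solution_dict, indent=2))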

Effective prompting is a vital skill for maximizing the potential of LLMs. With tools like prompt generators and frameworks like DSPy, developers can craft clear and concise prompts that yield high-quality, relevant, and consistent outputs. As LLMs evolve into more sophisticated agents, mastering the art of prompting will be essential for fully capitalizing on their capabilities and enhancing developer productivity.
