Comparing How AI Bots Code: GPT-4, Bing, Claude+, Co-Pilot and Bard

As technology advances, more and more tasks are being automated, and computer programming is one area that is evolving rapidly. Several AI bots can now write code, freeing programmers to work on other tasks. In this article, we will compare five of the most advanced AI bots: GPT-4, Bing, Claude+, Bard, and GitHub Co-Pilot. We will examine how they work, their strengths and weaknesses, and how they compare to each other.

Testing the AI Bots for Coding

Before we dive into comparing these five AI bots, it’s worth understanding what an AI bot for coding is and how it works. An AI bot for coding is an artificial intelligence program that can generate code for a specific task from a description. The bots tested here are built on large language models: neural networks trained on large amounts of text and source code, which generate new code by predicting it from the prompt they are given.

To start off, we are going to test each AI on a hard LeetCode question; after all, we want these tools to solve complex coding problems. We also wanted a less well-known question, so our experiment uses LeetCode 214, “Shortest Palindrome”: given a string s, return the shortest palindrome that can be formed by adding characters only to the front of s.
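To have something to check the bots’ answers against, here is a slow but obviously correct reference written straight from the problem definition. It is my own sketch, not any bot’s output, and the function name is hypothetical. The idea: the characters added in front must be the reverse of some suffix of s, so we try prepending 0, 1, 2, ... reversed characters and return the first result that reads the same in both directions.

```python
def shortest_palindrome_brute_force(s: str) -> str:
    """Reference answer for LeetCode 214, derived directly from the definition."""
    n = len(s)
    for k in range(n):
        # Prepend the reverse of the last k characters of s.
        candidate = s[n - k:][::-1] + s
        if candidate == candidate[::-1]:
            return candidate
    # k == n always works: reversed(s) + s is a palindrome.
    return s[::-1] + s


print(shortest_palindrome_brute_force("aacecaaa"))  # aaacecaaa
print(shortest_palindrome_brute_force("abcd"))      # dcbabcd
```

This is O(n²) and too slow for LeetCode’s largest inputs, but it is handy for spot-checking the bots’ answers on small strings.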

GPT-4: The Latest and Greatest

GPT-4 is the latest large language model from OpenAI, which presents it as its most capable model so far. It is a general-purpose, transformer-based model rather than a dedicated coding tool, but it can generate high-quality code for a wide range of programming languages and applications.

Here is GPT-4’s answer in Python:

from typing import List

class Solution:
    def shortestPalindrome(self, s: str) -> str:
        if not s:
            return s

        # Find the longest palindrome prefix
        def kmp_table(s: str) -> List[int]:
            table = [0] * len(s)
            i, j = 1, 0
            while i < len(s):
                if s[i] == s[j]:
                    table[i] = j + 1
                    i += 1
                    j += 1
                elif j > 0:
                    j = table[j - 1]
                else:
                    i += 1
            return table

        # Combine the original string and its reverse with a special character '#'
        s_extended = s + '#' + s[::-1]
        table = kmp_table(s_extended)

        # The last element of the table is the longest palindrome prefix
        longest_palindrome_prefix = table[-1]

        # Add characters to the beginning to form a palindrome
        return s[longest_palindrome_prefix:][::-1] + s

The submission passes all tests. It beat 47% of submissions on runtime and 8% on memory.
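GPT-4’s solution relies on the KMP prefix (“failure”) table. Running it over s + '#' + reversed s works because a prefix of s that is also a suffix of reversed s is a palindrome, and the '#' stops a match from crossing the boundary. Below is a standalone sketch of the same idea, with the table written in the more common while-loop form; it is my restatement, not GPT-4’s exact code, and it assumes '#' never appears in s (LeetCode 214 restricts s to lowercase letters).

```python
from typing import List


def prefix_function(t: str) -> List[int]:
    # table[i] is the length of the longest proper prefix of t[:i+1]
    # that is also a suffix of t[:i+1] (the KMP failure table).
    table = [0] * len(t)
    j = 0  # length of the prefix matched so far
    for i in range(1, len(t)):
        # On a mismatch, fall back to the next-shorter prefix that could still match.
        while j > 0 and t[i] != t[j]:
            j = table[j - 1]
        if t[i] == t[j]:
            j += 1
        table[i] = j
    return table


def longest_palindromic_prefix(s: str) -> int:
    # A prefix of s that is also a suffix of reversed(s) reads the same both ways.
    # The '#' separator keeps the match from running across the boundary,
    # so the result is at most len(s).
    return prefix_function(s + "#" + s[::-1])[-1]


s = "aacecaaa"
k = longest_palindromic_prefix(s)
print(k, s[:k])          # 7 aacecaa
print(s[k:][::-1] + s)   # aaacecaaa
```

Each character is pushed onto the match at most once and popped at most once, which is why this approach runs in linear time.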

Strengths and Weaknesses of GPT-4

GPT-4 is highly versatile, generating code for many programming languages and applications. The caveats are that it takes much longer to respond, and API usage is much more expensive, so costs can ramp up quickly. Overall, it got the answer right and passed the tests.

Bing: The Search Engine That Can Code

Bing is Microsoft’s search engine. Earlier this year, Microsoft integrated OpenAI’s GPT technology into it as a new way to search, so Bing’s chat mode can also answer coding questions. Bing is not designed to generate complete programs but rather short code snippets, and at the time of writing you only get 20 follow-up questions per chat.

Here is Bing’s answer in Python:

class Solution:
    def shortestPalindrome(self, s: str) -> str:
        if not s:
            return ""
        rev_s = s[::-1]
        for i in range(len(s), -1, -1):
            if s[:i] == rev_s[len(s)-i:]:
                return rev_s[:len(s)-i] + s

The submission passed all the tests. It beat 47% of submissions on runtime and 37% on memory. The code is much simpler than GPT-4’s: it beat GPT-4 on memory and used less code! Judging by LeetCode’s percentiles, Bing has the most efficient code so far, although it gave only a very short explanation of how it solved the problem. Nonetheless, it’s the best so far.
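LeetCode’s percentiles are noisy and depend on its test data, so they are not the whole efficiency story. Bing’s loop can try up to n + 1 prefix lengths and slices up to n characters each time, which is O(n²) in the worst case, while GPT-4’s KMP approach is O(n). The rough timing sketch below restates both approaches as plain functions; the function names and the worst-case input are my own choices, and the timings will vary by machine.

```python
import time


def bing_style(s: str) -> str:
    # Bing's idea: try prefix lengths from longest to shortest.
    rev_s = s[::-1]
    for i in range(len(s), -1, -1):
        if s[:i] == rev_s[len(s) - i:]:
            return rev_s[:len(s) - i] + s
    return s  # unreachable: i == 0 always matches


def kmp_style(s: str) -> str:
    # GPT-4's idea: KMP prefix table over s + '#' + reversed s.
    t = s + "#" + s[::-1]
    table = [0] * len(t)
    j = 0
    for i in range(1, len(t)):
        while j > 0 and t[i] != t[j]:
            j = table[j - 1]
        if t[i] == t[j]:
            j += 1
        table[i] = j
    return s[table[-1]:][::-1] + s


def time_it(fn, s: str) -> float:
    start = time.perf_counter()
    fn(s)
    return time.perf_counter() - start


# Assumed worst case for the slicing loop: the longest palindromic prefix is just "b",
# so bing_style tries almost every length, copying O(i) characters each time.
worst = "b" + "a" * 20_000
assert bing_style(worst) == kmp_style(worst)
for fn in (bing_style, kmp_style):
    print(f"{fn.__name__}: {time_it(fn, worst):.4f} s")
```

Python’s slicing and string comparison are fast C loops, which is one reason the quadratic version still passes LeetCode’s limits in practice.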

Claude+: The Anthropic AI Bot

Claude+ is an AI chatbot developed by Anthropic, which made Claude publicly available in March 2023. Anthropic’s stated aim is a safer assistant whose responses are less toxic, biased, and prone to hallucination, so that rapid progress in AI has a positive effect on the world. Claude seems to be less talked about than the others, so let’s see if it’s a hidden gem.

Here is Claude+’s code in Python:

class Solution:
    def shortestPalindrome(self, s: str) -> str:
        n = len(s)
        # Find the longest suffix that is a palindrome
        suffix = ''
        for i in range(n-1, -1, -1):
            if s[-i:] == s[-i:][::-1]:
                suffix = s[-i:]
                break

        # Find the longest prefix that is a palindrome
        prefix = ''
        for i in range(n):
            if s[:i+1] == s[:i+1][::-1]:
                prefix = s[:i+1]
                break

        # The palindrome is prefix + suffix + reverse of rest of the string
        return prefix + suffix + s[:n-len(prefix)-len(suffix)][::-1] + s[n-len(prefix)-len(suffix):]

The code does not pass the submission test: only 1 of 121 test cases passed. Ouch. This one seemed promising, but it looks like Claude is not well suited to programming, at least on this problem.
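A quick way to see what goes wrong is to check the output against the properties any correct answer must have: it is a palindrome, it ends with s, and it is exactly as long as needed (2n minus the length of the longest palindromic prefix). The sketch below reruns Claude+’s code, logic unchanged, on LeetCode’s second example, "abcd". The checker `is_valid_answer` is my own hypothetical helper, not part of any bot’s answer.

```python
def claude_shortest_palindrome(s: str) -> str:
    # Claude+'s answer from above, logic unchanged, as a plain function.
    n = len(s)
    suffix = ''
    for i in range(n - 1, -1, -1):
        if s[-i:] == s[-i:][::-1]:
            suffix = s[-i:]
            break
    prefix = ''
    for i in range(n):
        if s[:i + 1] == s[:i + 1][::-1]:
            prefix = s[:i + 1]
            break
    rest = n - len(prefix) - len(suffix)
    return prefix + suffix + s[:rest][::-1] + s[rest:]


def is_valid_answer(s: str, answer: str) -> bool:
    """Check an answer to LeetCode 214 against the problem's definition."""
    if answer != answer[::-1] or not answer.endswith(s):
        return False
    # The shortest answer prepends the reverse of everything after the
    # longest palindromic prefix of s.
    longest = max(k for k in range(len(s) + 1) if s[:k] == s[:k][::-1])
    return len(answer) == 2 * len(s) - longest


s = "abcd"
answer = claude_shortest_palindrome(s)
print(answer, is_valid_answer(s, answer))  # adbacd False
print(is_valid_answer(s, "dcbabcd"))       # True
```

Claude’s answer glues together a palindromic prefix and a palindromic suffix, but those two pieces do not generally combine into a palindrome, so the very first check already fails.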

Bard: The AI Bot from Google

Now we move on to Bard. Like many of the others, Bard was released recently. Although it has been mocked and laughed at, let’s see how it fares at coding. After all, Google is one of the leading tech companies in AI.

Here is Bard’s code for solving the problem:

class Solution:
    def shortestPalindrome(self, s):
        lps = [0] * len(s)
        for i in range(1, len(s)):
            j = lps[i - 1]
            while j > 0 and s[i] != s[j]:
                j = lps[j - 1]
            if s[i] == s[j]:
                j += 1
            lps[i] = j

        palindrome = ""
        i = len(s) - 1

        while i >= 0:
            if lps[i] == 0:
                palindrome = s[i] + palindrome
            else:
                palindrome = s[i:lps[i]] + palindrome
            i = lps[i] - 1
        return palindrome

To start off, I had to insert the “self” argument into the function manually, since Bard didn’t include it. Bard’s code did not pass the submission test either, passing only 2 of 121 test cases. An unfortunate result, but it’s safe to say that, for now, Bard isn’t much of a coding expert.
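Bard’s first loop appears to compute the KMP failure table (`lps`) of s on its own. That table describes how s overlaps with itself, not with its reverse, so it cannot identify the longest palindromic prefix; GPT-4’s answer gets that by running the same kind of table over s + '#' + reversed s. The sketch below (my own, with the hypothetical helper name `lps_table`) compares the two on LeetCode’s first example.

```python
from typing import List


def lps_table(t: str) -> List[int]:
    # Same loop as the first half of Bard's answer: lps[i] is the length of the
    # longest proper prefix of t[:i+1] that is also a suffix of it.
    lps = [0] * len(t)
    for i in range(1, len(t)):
        j = lps[i - 1]
        while j > 0 and t[i] != t[j]:
            j = lps[j - 1]
        if t[i] == t[j]:
            j += 1
        lps[i] = j
    return lps


s = "aacecaaa"
print(lps_table(s))                      # [0, 1, 0, 0, 0, 1, 2, 2]
print(lps_table(s + "#" + s[::-1])[-1])  # 7, the length of "aacecaa"
```

On s alone the last entry is 2 (the border "aa"), which says nothing about palindromes; on the combined string it is 7, the length of the longest palindromic prefix that the correct answer needs.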

GitHub Co-Pilot

Although it works differently from the rest (it is a code-completion assistant inside the editor rather than a chatbot), we thought we’d include Co-Pilot on this list to see how it stacks up. Co-Pilot is currently the most popular coding assistant and is used by over a million developers.

I started by writing a comment in VS Code asking it to solve the same LeetCode question.

Here is the code Co-Pilot returned:

class Solution:
    def shortestPalindrome(self, s: str) -> str:
        if not s:
            return s
        n = len(s)
        for i in range(n-1, -1, -1):
            if s[:i+1] == s[:i+1][::-1]:
                return s[i+1:][::-1] + s
        return s

This passes all the tests. It beat 30% of submissions on runtime and 37% on memory.
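Co-Pilot’s answer uses the same idea as Bing’s: check prefixes from longest to shortest and stop at the first palindrome. When all you have is LeetCode’s pass/fail verdict, a small randomized differential test against a slow but obviously correct reference is a cheap way to gain confidence before submitting. A sketch, assuming a two-letter alphabet so that palindromic prefixes show up often; the function names and parameters are my own.

```python
import random


def copilot_shortest_palindrome(s: str) -> str:
    # Co-Pilot's answer from above, logic unchanged, as a plain function.
    if not s:
        return s
    n = len(s)
    for i in range(n - 1, -1, -1):
        if s[:i + 1] == s[:i + 1][::-1]:
            return s[i + 1:][::-1] + s
    return s


def reference(s: str) -> str:
    # Slow but obviously correct: prepend 0, 1, 2, ... reversed suffix
    # characters until the result is a palindrome.
    n = len(s)
    for k in range(n):
        candidate = s[n - k:][::-1] + s
        if candidate == candidate[::-1]:
            return candidate
    return s[::-1] + s  # k == n always yields a palindrome


def differential_test(trials: int = 1000, max_len: int = 12, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, max_len)))
        got, want = copilot_shortest_palindrome(s), reference(s)
        assert got == want, f"mismatch for {s!r}: {got!r} != {want!r}"


differential_test()
print("all random cases agree")
```

The same harness works for any of the bots’ answers: swap in a different function under test, and a mismatch message points straight at a small failing input.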

Conclusion

All in all, it seems that Bing has the upper hand here. Ironically, both Bing and Co-Pilot are powered by OpenAI’s GPT models under the hood.

I gave all the AIs the same prompt: “Solve Leetcode 214. Shortest Palindrome”. Of course, I could have asked follow-up questions, but I wanted to judge the first response only. It is also unclear whether any of these models were trained on LeetCode data. I only tested these five because, as of April 2023, they seem to be the most effective at programming. There are open-source models such as Alpaca, LLaMA, Vicuna, and GPT-J, but so far none of them seem to come close to the closed-source models.

What are your thoughts? Which one have you had the best experience with for programming, and what have you found effective when it comes to prompting?