AI-UX: Designing Coding Collaborations

Jeff Axup, Ph.D.
NYC Design
Published in
12 min readAug 30, 2023

--

Artist: Jeff Axup, “Thinking Together”, 2023, Medium: Midjourney on pixels.

Summary: AI is on the cusp of enabling anyone to be a programmer and produce their own tools and products. There are several UX approaches for how to design these new programming tools, many of which remain currently unexplored or underutilized.

Programming is entering a renaissance where non-technical people are now more capable of programming projects, and also less intimidated to get started. Simultaneously, software engineers are able to code faster and more efficiently than before. However, the AI programming tools we have now are often a bi-product of having a generalized language model capable of coding, instead of a special-purpose programming tool. Also, legacy IDEs (Integrated Development Environment: a program used to write and manage code) set the bar fairly low by just making auto-complete systems marginally smarter. The future is a more comprehensive integration of the AI with the entire IDE, and permitting the user to interact at varying levels of abstraction, from lines of code to an entire application.

There are three approaches that computers could take to help you code.

1. In a chat window, give me code blocks to paste into my IDE.

  • This is what ChatGPT4 does. You tell it what language you are using, and tell it what you are trying to accomplish, and then it generates a block of code to paste into the IDE.
  • This approach presents problems because ChatGPT doesn’t have access to your overall program and programming environment. It could suggest things that don’t work due to the context it is missing, which happens frequently in practice. Also, it doesn’t know the current state of your code. For example, if it provides a few different possible code changes, it doesn’t know which one you actually implemented, and whether you did it correctly. It also has no way to test if the changes you made actually had the desired result in your actual application and server setup.
  • Last but not least, it means the user is constantly copying and pasting code blocks back and forth between a browser and the IDE. Sometimes this results in errors, particularly when it is sections of code that need to be positioned correctly, and not the entire file at once.

2. In the IDE, give me suggestions as I write (code-completion tips).

  • This is what the standard Co-Pilot for Visual Studio Code (VSC) does currently.
  • Suggestions are provided with a limited scope, such as offering to write code comments, or suggesting short blocks of code to write.
  • This works well for experienced programmers who already know what to code and what to ask for, but just don’t want to type it out.
  • It does not work well for novice programmers, or users who want to ask for larger goals (think: “Write an entire web application to show recent stock prices for the SP500 in a table”, instead of simply “Write a function to connect to the stock API service.”
  • The current version of Copilot is suffering from an outdated mental-model. IDEs are accustomed to auto-completing small things, like a variable name that is half-written for example. Instead, it should be thinking of completing the entire application.

3. In the IDE, create and change the entire application based on the goals I have for it.

  • Some steps in this direction are being taken by Microsoft in their newly released beta of the GitHub Copilot Chat plugin for VSC.

This does far more than suggest code. GitHub Copilot chat is not just a chat window. It understands what code a developer has typed, what error messages are shown, and it’s deeply integrated into the IDE. A developer can get in-depth analysis and explanations of what code blocks are intended to do, generate unit tests, and even get proposed fixes to bugs. — GitHub Copilot Chat Waitlist

  • It has a chat box next to the code in the IDE, where the user can ask questions separately from where they are actually writing code, which has some definite benefits.
  • There is a “move proposed new code from chat to code window” button, which is exactly what is needed. Apparently it is automatically aware of error messages, and also presumably the current state of the code.
  • A scenario showing Copilot Chat being used to revise a block of code is provided in this Youtube video. Further details are provided in their documentation.
  • Fresh on the heels of Copilot Chat was another announcement from Meta, of the the availability of an open-source version of Code Llama — an LLM tuned to work on code. Presumably this will drive a new wave of open-source IDE plugins similar to Copilot.
  • So it seems that all three methods above are converging on a unified solution inside of the IDE via plugins. However, there may be an opportunity for a “high-level application development interface” where the user doesn’t have to see code at all. The user would operate at a higher level of abstraction, simply describing the product behavior they want, and iterating on the results.

Potential Design Improvements

ChatGPT4 wants me to manually replace one block of code with another. This requires locating the target block in a long file and copying in the new code without making any mistakes.

Replace the new code for me

  • The AI needs to be aware of the current state of my code, and be able to change it for me.
  • Currently ChatGPT has a mental model of what the current state of your code is, but it’s not integrated with your IDE, so it doesn’t really know. Often the user chooses not to do what the AI suggests, and the AI has no idea that the intended change was never made, which can lead to errors.
  • Manual labor by the user (such as the example above) should be replaced with automatic code insertion and replacement. That said, the user should still be in the loop, able to review or approve changes, reverse actions taken, and request further modifications. It is likely that being actively involved in reviewing changes will result in the user actually learning how to code along the way, which is an added benefit.
  • This has already been solved by GitHub Copilot Chat and it brings in to question how “general purpose” chat services will be able to compete with custom-built AI services inside of apps designed to support specific use cases. Arguably Copilot is using ChatGPT4 under the hood, so perhaps this is the likely model moving forward. Copilot Chat has a “insert into my code” button which automates this process while retaining user control.
ChatGPT4 is surprised by a server error resulting from its code. It suggests a further change to resolve it. Sometimes these “proposed fixes” can go on for a long time with no solution and become increasingly unlikely to be a solution over time.

Test the new changes and make sure they work before showing it to me

  • The AI needs to be able to run code, not just write it. ChatGPT 4 is capable of running some code by itself to test it. However, this mostly applies to simple local code, and it can’t test on your actual server or using advanced libraries yet.
  • Nothing should be shown to the user until has gone through a reasonable testing process, to see if it is valid, runs, and meets the goals of the user. Human time is valuable and it shouldn’t be wasted with poor quality suggestions that don’t work. To do this, the AI needs access to your testing environment and be able to analyze results it receives from running code on its own.
  • It also needs to be able to “see” the results of running code (e.g. using vision: did the text appear pink or did the extra border actually disappear). Currently if it tests the code it can only see the results programmatically. Tools such as Selenium can already run Chrome in “headless” mode and export images of the actual web page rendered which enables real visual verification with computer vision models.
  • It is not clear how much Copilot Chat supports this currently. It will copy the code over to your active file and let you run it. Apparently it is aware of any resulting errors automatically. It probably does not yet support automatically testing several different solution paths before letting you select one, and it doesn’t seem to be able to run the code by itself or do much in a multi-modal fashion.
ChatGPT want to programmatically capture a screenshot of a web page for testing, but it is unable to either check the directory to see if it was created, nor view the image and analyze the results itself, so it is asking me to.

Be willing to do it all for me, but give me the option to review, approve, and learn

  • The AI should start by assuming it needs to do the entire project end-to-end. Meaning that it writes the entire application, web site, or feature. It should move ahead with that presumption and then let the user review the outcome and tweak it as needed.
  • Everything should be reversible. Any steps that the AI chose to take may need to be removed if they don’t end up being a good solution path.
  • While the user may want everything done for them, they may want it explained and validated along the way. The user may want to learn how to do the actions themselves, or they may want to understand it enough to look for logical problems or alternate solution paths that are better. Consequently, transparency and demonstrating the logical flow will be a necessary default behavior.
  • As discussed, ChatGPT4 (chat) doesn’t have any context for this and can only do piecemeal suggestions and guess at the current state of your code. Copilot Chat is much better positioned to support this. You can have iterative discussions about what code you want to move ahead with with and start testing. It isn’t clear that it can create new files or write an entire application for you yet. The focus still seems to be on “blocks of code” as opposed to “change the application to have this new feature that does this”, but this may be changing.

Make it easy to iterate with you as my goals change and we learn what is possible or desired

  • Many AI systems still make it hard to iterate over time. Some don’t have an ongoing interaction model at all. Some don’t store conversations and allow them to be re-started. Some don’t let you engage with outputs and modify them slightly to improve them. Some forget the things you originally told it to do.
  • Humans will change their goals over time, and that is a good thing. Once we achieve something, we immediately get bored by it and set our goals for something higher. Sometimes we find a goal is more difficult than expected, and decide to aim for something more immediately achievable. AIs should embrace this and provide interfaces to refine, make gradual progress, and change course when needed.
  • Github Copilot Chat has a memory of prior development and seems well-positioned to help with this. However, it seems like it may be very focused on an individual file or sections of code, rather than on high-level project goals, developing a project across multiple files, or working at the “features of this application” type of level.

Scenario: The Future Of Programming

Artist: Jeff Axup, “Coding Gedankenexperiment”, 2023, Medium: Midjourney on pixels.

Sometimes we have to tell a story about the future, and then work backwards from that goal, to the tools we need to achieve it. The following is such a thought experiment:

  • Me: I log in to my IDE, which connects to my remote server, and shows my remote files. There is a thin chat box stretching across the top of the IDE. It accepts spoken, textual, or visual input:
  • Me: Speaking: “I want to start a new application in Python to track SP500 stocks and show recent prices in a table. I drew a quick sketch of the layout I want. I’ll snap a pic and share it via your mobile app.”
  • AI: Verbally and with subtitles in chat window: “Certainly, I have created a new file called stocks.py and added it to your usual development directory. Let me know if you’d like to change the name. Here is a quick pic of what the result looks like.”
  • Me: I see the new file pop up in the IDE, as well as a window on the side showing an image of what the new front page looks like after loading. It already has a lot of code in the file, including connecting to an external stocks API, generating a table, and generating HTML.
  • Me: I click a ‘Review’ button and it launches the app and loads it in a separate browser window. I review the current design and click around some of the interactive elements to test it.
  • Me: Speaking: “You used Alpha Vantage for the stocks data, but I’d prefer if you used Yahoo instead. Also, the page is loading slowly, so please test the run time, and try to optimize any calculations and replace any use of external files with something faster. Also, there needs to be a login system, so create a username and password for me, and let me know how to log in.”
  • AI: “Sure, generating that now. Your new username is ‘jeff’ and your password is ‘password123’. Most of the slowness was coming from API calls, so I have offloaded that to a separate file that is scheduled to run once a day in the morning instead of each time you load the front page.”
  • Me: See the new code appear in the main file, with new changes highlighted in a different color. I can see a history of my previous chat requests as well. I click ‘Review’ again and try running the code and logging in. I notice the page is even slower to load.
  • Me: Typing: “Good job on the log in system, however the page is slower now. Please revert back to the previous code you made regarding the speed optimization changes, but leave everything else.
  • AI: “At once Sir.”
  • Me: See all of the code updated below the chat bar. The code that was removed is still shown, but it is in light gray, and it is crossed-out to show the action that was taken for potential review later. This “recent-change-feedback” will disappear after my next request, but it will remain in the logs for potential re-use in the future if needed.
  • Me: Typing: “Please add a pie chart animation showing the monthly change in different industry sectors over the last 2 years, and place it in the bottom right corner. Each pie segment should have a label with a line pointing to the pie segment.
  • AI: “‘Of course. There are several different open-source charting packages available. I tried implementing it using three different ones. I couldn’t get the animation to work in one of them, and one of them couldn’t do labels with lines. The last attempt using Matplotlib was successful and you can view it now. Let me know if you also want to see the other failed experiments.

The above scenario is near-future sci-fi, meaning that there are no products to my knowledge that do it yet. It dramatically raises the bar on how much automation the IDE does, the multi-modal nature of input and output, and also the AI making assumptions and doing some tasks automatically. The user has the option to revert or revise, after the system has progressed in a logical direction on its own. Perhaps more importantly the user isn’t really writing code, but is instead directing the AI on what to write code for. The AI also purposefully has a bias towards making progress as soon as it can, in order to move the task along more rapidly. It can actually try and test different hypotheses to find the best solution, and it can actually “see” the results of the code visually, not just test to see if it runs without errors.

Conclusion

Before ChatGPT4 or Copilot Chat, a novice programmer probably would have taken several weeks to find and implement a solution to the above goal. Possibly they would have given up due to “insurmountable” complexity and problems encountered along the way.

Currently with ChatGPT4, the above task probably takes half a day. Not all of it is done for the user, and errors will likely be encountered along the way, which result in extra troubleshooting steps with the AI.

In the future scenario painted above, the task probably takes 10 minutes. So we are seeing an acceleration, or exponential growth curve, in the decreasing amount of time it takes to complete a project. This has a lot of potential for humanity, and it would largely eliminate the need to be a programmer, so that the user can focus on their larger goals, such as building a company instead.

My opinions are my own and not related to any current or past employers. You should make your own life, design and investing decisions. I hope you find my ideas thought-provoking.

--

--

Jeff Axup, Ph.D.
NYC Design

UX, AI, Investing, Quant, Travel. 20+ years of UX design experience.