The New Bing With ChatGPT: What Does a UX Expert Think?

Jeff Axup, Ph.D.
16 min read · Mar 10, 2023
Artist: Jeff Axup, “User With Bing Questions”, 2023, Medium: DALL-E on pixels.

Summary

  • A UX review was conducted of the new Bing “chat” service based on ChatGPT. Significant changes were made to ChatGPT before it was integrated into the Bing search engine, and not all of them were improvements.
  • Out of 16 attributes reviewed, Bing only received a (good) rating for 4 of them.
  • ChatGPT was also recently reviewed using the same methodology, and some comparisons are made below — it fared quite a bit better.
  • Many of the changes made to ChatGPT while integrating it into Bing make it more difficult to start using the tool, and remove useful features such as keeping multiple conversations. It has also seemingly been hurriedly wedged into the existing search interface, leaving some navigation problems. More concerning, it currently suggests the next step in the conversation, and these suggestions are often wildly erratic and don’t get the user closer to the desired answer. It also deletes conversations after feedback is submitted, and has no memory of prior conversations.
  • Suggestions for how chat-bots should change their design and interaction style are provided at the end.

Methodology: For an overview of how I perform UX reviews and the types of traits it is advisable to look for, see my previous article: What does a UX expert think of the design of ChatGPT?

Note: Microsoft says that Bing is an “early preview”, which makes it unclear whether it is in its “final form” currently. ChatGPT is in a similar “research prototype” phase. For the purposes of both reviews I have presumed that the prototypes show their intended design direction, and can be evaluated as real products in their current form.

Personas

  • 👤 General search users
  • 👩‍💻 Software developers
  • 🤓 Startup folks
  • 👨‍🎓 Younger demographic that plays with new apps/tech (aka students)
  • 👨‍🎤 Edgy creative types (aka artists)
  • 👨‍💼 News media / investors

The above personas are not an exhaustive list, but they do give an idea of some of the demographics that should be targeted, and who should be able to get immediate value out of the product and not experience problems.

Use Cases

I tested the product using the five most common and important use cases I could think of, along with some standard heuristic evaluation methods.

UC-1: Access the product and get fully onboarded.

The initial invite calls the product an “early preview” and makes no mention of any requirements to start using it. They do request feedback, which is good.

The exploration leads to an overview page, with example links going to a Bing search page — so it’s clear they intend to have it be a side-by-side tool providing context and advanced support for typical search queries. This opened just fine on my Chrome browser on a Mac laptop.

Bing purposefully designed the teaser to look like you can chat. However, when I clicked the button to start, I got a message that confusingly told me I need access (I already have it) and that I need to download Microsoft’s Edge browser to use it. So they have deliberately put hoops in place for users to jump through.

After downloading the Edge browser, installing it, getting logged in, and wading through a bunch of onboarding advertisements, you get another invite to explore Bing if you do a web search first.

Clicking on ‘Explore’ gets us to a new page that looks startlingly similar to ChatGPT, and is positioned as a new ‘chat’ tab next to search in the Bing interface. So unlike the initial teaser UI, which was “search AND chat”, this is “search OR chat”.

Upon further poking around in the IA (information architecture / navigation), it becomes clear they are still trying to figure out how to integrate the new feature into the main search engine. This is a basic usability problem that shouldn’t exist in a demo being used by millions of users.

UC-2: Understand what the product is for and how to begin using it.

Microsoft changed the chat home page from what ChatGPT originally had. Originally it had nine table cells, with the right six devoted to explaining capabilities and limitations. Bing removed those and placed a generic message below saying that it may make mistakes.

It does give some ideas about how to start communicating with it and the examples are clickable.

UC-3: Use it for a personal task and achieve a goal.

I will use the same example queries used in my previous review of ChatGPT for consistency.

Bing has made some changes to how it communicates with the user in comparison with ChatGPT:
• It says it is “Searching for” things. So it is positioning itself as a search engine, whereas ChatGPT is more like a person chatting with you.
• ChatGPT just starts progressively typing out an answer, which is similar to how a real person would respond. Bing gives you a “generating” prompt, which feels much more artificial. Eventually it does type out the answer, but it feels slower than ChatGPT due to the initial “searching” and “generating” status messages before a response starts to appear.
• Bing is very up-front about where it gets its information, and it’s somewhat alarming how often it uses only a handful of sources to generate an answer. In this case (see below) it is leaning heavily on bankrate.com. It is admirable that it tries to cite its sources, but it’s also concerning that it isn’t doing a more comprehensive review before arriving at an answer.
• Bing often tries to direct you to websites as your next step. It also often provides shortcut responses to its answers. This does simplify interaction, but it misses the point of encouraging a self-guided, personalized conversation that builds understanding of a topic. So this product is far more focused on search results and web-page destinations.
• Bing tries to guide your conversational flow. It typically asks a question at the end of each answer, which ChatGPT did not. (See long-term use case section below for more on this.) The questions it asks frequently guide the conversation into odd paths, and sometimes are completely off topic.

Continuing on with the same evaluation tasks as in my ChatGPT review, I then tried to compare two of its answers from the previous response:

Unlike ChatGPT, it got confused by this and responded about the wrong items, which is odd because they are supposedly using the same back-end.

So I tried again with a clarification of what I had meant:

First time using the new Bing, and I’ve already been rejected! I guess it can’t handle a user telling it that it is incorrect. I submitted feedback about this problem, and it wiped my chat history as thanks for reporting it.

UC-4: Use the product long-term for repeated use.

Bing departs from ChatGPT by providing only one chat session at a time, with no historical record of other conversations to return to later.

I first tried giving some feedback, but to my surprise, when I clicked ‘Cancel’ it refreshed the entire page and my previous conversation was lost. Also, even though I am using their own browser, which they required me to install, it is still buggy: clicking the ‘Feedback’ button repeatedly broke it, and it stopped producing the popup below.

The conversations in Bing are guided to some degree. When it finishes a response, it typically asks a new question. This may mimic normal human interaction better than simply giving an answer, but it also reproduces the annoying part of human interactions. For example, with a salesperson: you ask a question, get an answer, and then they try to steer you toward one product or another, when all you want to do is think about it and ask your own follow-up questions. Also, some of the “quick responses” it offers are oddly specific and wouldn’t be typical answers. “I have two cats and hardwood floors”? Really?

In other examples I found that Bing was really trying to lead the conversation in specific directions that weren’t really relevant, such as asking my opinion: “What is your perspective on nuclear deterrence policy? And what are some alternatives or complements to it?” A teacher with a specific agenda for a lesson might do this, but the chat-bot should probably let me find my own agenda. Talking with Bing is like talking to a condescending professor who thinks you need to demonstrate your understanding for the upcoming test.
After leaving Bing for a few days, I went back to the window and typed in a question; it simply hung, with no response. Perhaps understandable, but it’s clear that the design team isn’t focusing on long-term use, ongoing relationships, or responsiveness.

If a question is asked in a chat window that has been open for a few days, Bing shows a ‘stop responding’ button but produces no response. No major search engine would behave this way.

UC-5: As a developer, use the Bing search API to create other products.

I am not going to evaluate this use case in depth, as I am not a developer. Chat does appear to be available as part of the Bing Search API, which is going up in price. ChatGPT offers a similar API, which is apparently fairly affordable.
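For developers curious what API-level access looks like, here is a minimal sketch of calling the Bing Web Search v7 API and pulling results out of its JSON response. The endpoint and `Ocp-Apim-Subscription-Key` header follow Microsoft's documented Bing Search API; the helper names and the abbreviated sample response are my own illustration, not official code.

```python
import json
import urllib.parse
import urllib.request

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def build_search_request(query: str, api_key: str) -> urllib.request.Request:
    """Construct an authenticated GET request for the Bing Web Search v7 API."""
    url = f"{BING_ENDPOINT}?q={urllib.parse.quote(query)}"
    return urllib.request.Request(url, headers={"Ocp-Apim-Subscription-Key": api_key})

def extract_results(response_body: str) -> list:
    """Pull (title, url, snippet) records out of a raw JSON response body."""
    data = json.loads(response_body)
    return [
        {"title": page["name"], "url": page["url"], "snippet": page["snippet"]}
        for page in data.get("webPages", {}).get("value", [])
    ]

# Abbreviated example of the response shape the API returns:
sample = (
    '{"webPages": {"value": [{"name": "Bankrate", '
    '"url": "https://www.bankrate.com", "snippet": "Compare rates."}]}}'
)
print(extract_results(sample))
```

To actually run a query you would pass the built request to `urllib.request.urlopen` with a valid subscription key; pricing and quota tiers are set in the Azure portal.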

Results Summary

Ratings Key: ❌ = Bad | 🤔 = OK | ✅ = Good

  • ❌ UC-1: Access the product and get fully onboarded.
    Result: Microsoft had an opportunity to simply add the chat interface alongside their typical Bing search results. Instead they have a non-interactive (no chat) version of it, which just leads into a sales funnel for their proprietary browser, Edge. They also made installing and using Edge a requirement for using Bing Chat. This purposefully puts a usability barrier in the way of starting to use the new product, and it puts business goals before user goals. This is exactly what I would expect from Microsoft, and not from more user-centric companies such as Amazon.
    The number of steps needed to simply get to the “home page” for the product is very high, and it forces the user to do a task they don’t necessarily want to do (install a new browser). Compare this to the relative ease of access of ChatGPT.
    Basic navigational questions, such as how to integrate chat into the standard IA and whether to put chat in a sidebar next to search or on its own page, have not been settled yet, and the result is a clunky experience.
  • ✅ UC-2: Understand what the product is for and how to begin using it.
    Result: This was about on par with the original ChatGPT. It offers examples, and perhaps reduces some of the detail on how to interact with it effectively, but it’s easy enough to get started with a conversation.
  • ❌ UC-3: Use it for a personal task and achieve a goal.
    Result: I expected this task to pass with flying colors after trying the same query with ChatGPT. To my surprise, it misunderstood one of my follow-up questions, and then refused to continue the conversation after I disagreed with it. Not only is this rude (rejecting a user asking a valid, well-intentioned question), it is also the opposite of what a patient teacher with limitless time would do. This bot doesn’t have good manners, and it will cut you off at a moment’s notice.
  • ❌ UC-4: Use the product long-term for repeated use.
    Result: Bing seems to view a chat as a one-time event similar to a search result. The design doesn’t support the use case of “having different conversations on different topics and going back to continue those threads later.” Add to this bugs and interaction problems: easily losing your entire conversation, or accidentally swiping back to the search tab (two fingers up or down) while you’re reading a response.
    Bing feels like interacting with a search engine, while ChatGPT feels like talking with a butler or concierge. While there are dangers to unwarranted personification, I think I prefer a “polite human” to a “cold machine” in terms of chat-bot interactions (see Discussion section below).
  • 🤔 UC-5: As a Developer, use ChatGPT APIs to interactively do a task for an external user.
    Result: I can’t comment on the usability of their APIs and services, but it is available as part of the overall Bing Search API.
  • 🤔 Personas
    Result: It isn’t clear to me that the needs of many of the primary personas are being met with this product. Students might be able to use it for some things, but it can’t necessarily be trusted on its own, it doesn’t actually act like a good teacher, and you can’t go back to previous discussions to dig deeper. Programmers and startup folks may find it useful to leverage at the API level for other applications and products. Artists probably love it due to its wild answers and erratic conversation flow. The media would find it extremely easy to write critical articles based on the quality of responses and the bot’s lack of polite demeanor.
  • ❌ Use Cases
    Result: 3 out of 5 use cases were pretty much failures from a design perspective, although I was able to complete them in a substandard way.
  • ❌ Happy Path
    Result: It is pretty clear that the happy path was not on anyone’s mind at Microsoft. They purposefully put barriers in the way of using the product and used it to drive adoption of a little-used browser, which will produce an artificial and short-term uptick in users of that product. Eventually Bing Chat may be placed alongside standard search queries and be accessible to anyone on any browser doing a search, but that’s not how they chose to launch it.
  • ✅ Mental Model
    Result: The mental model is fairly simple and thus it doesn’t run into many problems. You ask a question, you get an answer, along with homework assignments and suggestions for where to take the conversation next. It is ephemeral and it may opt out of talking with you about particular topics. A little exploration by the user rapidly makes this clear.
  • 🤔 Navigation
    Result: Getting into the product to evaluate it is not simple, as described above. Once you get into it, there isn’t much to do other than talk, and they removed the ability to list multiple conversations. Retaining this would have complicated the navigation, but made the product much more useful and more similar to an actual private tutor. The product has a bad habit of “hanging” and not processing new requests immediately if you leave it open for a day.
  • 🤔 Task Completeness
    Result: It is possible that a user might get all the answers they need from an interaction, but it is equally likely that the bot won’t understand some questions, won’t reply in the way intended, and may refuse to continue talking with you. So it might get a pass, but it certainly wouldn’t get an award for “best personal tutor”.
  • ✅ Business Goals
    Result: It appears that Microsoft wanted to show that they owned part of the best technology around and were willing to sacrifice additional users in order to drive adoption of a new browser. Business goals shouldn’t come at the expense of a good initial user experience and product usability; instead, they should be balanced so that all goals are met.
  • ❌ Emotional Response
    Result: Personally I am more likely to go back to ChatGPT than Bing. Bing’s search page is cluttered with advertising and tabloid-style news, and then the chat interface is also more busy and more mechanical. We all had teachers we liked and teachers we didn’t, and Bing needs to research a bit more about the latter. Teachers shouldn’t refuse to answer questions when they are challenged, and they should have a memory of what I’ve discussed with them in the past.
  • 🤔 Feedback
    Result: ChatGPT supports a thumbs-up or -down on every answer; Bing doesn’t offer this. There is a feedback button in the corner, which isn’t too bad and does support including a screenshot. It is, however, buggy: it doesn’t always pop up, and it disposes of the entire current conversation history when you cancel or submit, which is really not appropriate.
  • 🤔 Perceived Value
    Result: See the use cases section above. It is possible that some answers will be valid and users will come away with questions answered and have new ideas. However, it seems likely that users would get more value if it supported multiple simultaneous conversations, more accurately understood questions, and did less of trying to artificially guide the conversation.
  • ✅ Growth Plan
    Result: As shown above, it appears the plan is to add Bing Chat to the side of the standard search window. If that brings value to users then it should immediately result in millions of daily users.

Discussion / Ideas

  • Embrace Personification: Personas and Roles of Chat-bots
    It is common to identify target personas to design an interface or product for. It is not common to identify a target persona for the interface itself. At the risk of overly anthropomorphizing our technology, I think this might actually help the user, because we are already calling these “chats” and “conversations” and formatting them like texting on our phones. There is no going back at this point: embrace the fact that users will treat this as a conversation with a human, and that the bot needs to conform to societal customs for polite behavior. Embrace that the user will automatically form a mental model of the role of the entity they are communicating with, be that a teacher, tutor, advisor, butler, secretary, concierge, therapist, or friend. The moment a user starts an ongoing series of questions, they will perceive it as a conversation, notice the story it tells, and start judging their conversational partner. This is a feature, not a bug: run with it.
    Currently Bing has a “preview feature” for “conversational style” (see below). While this might be technically feasible, the question becomes whether the user really wants that or understands it, and whether a real human would act in this way. Would you expect the personal tutor assigned by your parent to ask you if they should be “more creative” or “more balanced”?
    Instead, perhaps the user should choose the persona they want the chat-bot to embody, and it should then stick to that, like a voice you choose for Waze guidance. Most students should interact with the persona of a helpful and patient personal tutor during school hours. In their off-time, they might choose a persona with informal hip-hop slang for entertainment value, though it wouldn’t serve their learning goals as well. Business professionals might choose the persona of an executive assistant. More importantly, the chat-bot actually needs to stick to these roles and act the part. A professional tutor doesn’t take offense at being challenged and then refuse to discuss the topic further; that would be unprofessional.
    Chat-bot designers should look up a “Miss Manners” book on polite human behavior (or a Dale Carnegie class) and in most cases have their bots behave that way. It’s OK if the bot introduces itself as an “artificial human”, a “simulation”, or a “digital assistant”, but it should still strive to be as human as it can be. In summary: we don’t want them to impersonate a real human, but we still want them to be human.
Imagine if a tutor asked you if they should be more “balanced”.
  • The Role of Memory and Relationship-Building
    Human history (and pre-history) is based around stories. Those stories relate to human interactions, communications, and events. Being able to remember people’s faces, names, life events, goals, preferences and other nuances is an important part of EQ and successful social interactions.
    Currently Bing Chat (and to a lesser extent ChatGPT) doesn’t have much of a memory of what came before. It loses track of what was last discussed after a day of being away. It gives the wrong suggested responses based on new questions (see below). ChatGPT was able to compare and contrast things from the previous discussion, but Bing failed this when I tested it.
    If Bing Chat wants to be taken seriously as an advisor, it needs to know who you are. It needs a complete, accurate history of everything you’ve talked about and every preference you’ve stated. It needs to pick up where you last left off. It needs to learn from you and slowly evolve into the perfect assistant for you over time. It needs to gain your trust and maintain its reputation. It needs to become more accurate, insightful, and personalized over time. Many of these things are probably currently feasible and are actually much less complex than making a coherent argument about nuclear deterrence policies.
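The kind of per-user memory described above could be sketched as a small store that keeps named topic threads and stated preferences, replaying them as context so the bot can pick up where it left off. Everything here (the class, its fields, the example topic) is a hypothetical illustration of the design idea, not how Bing actually works.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationMemory:
    """Hypothetical per-user memory: topic threads plus stated preferences."""
    threads: dict = field(default_factory=dict)    # topic -> list of (speaker, text)
    preferences: list = field(default_factory=list)

    def record(self, topic: str, speaker: str, text: str) -> None:
        """Append one turn of conversation to the named topic thread."""
        self.threads.setdefault(topic, []).append((speaker, text))

    def remember_preference(self, statement: str) -> None:
        """Store a long-lived user preference that applies across topics."""
        self.preferences.append(statement)

    def context_for(self, topic: str) -> str:
        """Replay preferences plus a topic thread so a later session can resume it."""
        lines = [f"User preference: {p}" for p in self.preferences]
        lines += [f"{speaker}: {text}" for speaker, text in self.threads.get(topic, [])]
        return "\n".join(lines)

memory = ConversationMemory()
memory.remember_preference("I have two cats and hardwood floors")
memory.record("vacuums", "user", "Which robot vacuum handles pet hair best?")
print(memory.context_for("vacuums"))
```

The point of the sketch is that returning to the “vacuums” thread days later would start from the accumulated context rather than a blank slate, which is exactly what Bing failed to do in my test.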
Bing doesn’t remember what it was talking about. After leaving it for a day it got very confused. It couldn’t process new questions (e.g. orange food), then it provided suggested responses that related to the prior topic instead (e.g. abolished immediately), then it claimed it didn’t know what I was referring to, then it suggested “animal testing” as a further topic, which is completely random. If a human did this to me I would ask to talk to a manager, or think they were mentally unstable.

Conclusion

This article has combined a UX expert and heuristic review of Bing (often contrasted with ChatGPT), with a broader discussion of how the product interacts with users, and what it might be able to do in the future.

Community Questions

  • Do you think Bing Chat offers a satisfactory chat experience in comparison with a human tutor?
  • Do you prefer the ChatGPT prototype or the Bing prototype and why?

My opinions are my own and not related to any current or past employers. You should make your own design and investing decisions. I hope you find my ideas thought-provoking.


Jeff Axup, Ph.D.

UX, AI, Investing, Quant, Travel. 20+ years of UX design experience.