Imagine a world where your virtual assistant not only understands your typed queries but can also see images you show it, hear your voice, and respond in kind. OpenAI is turning this vision into reality with the introduction of new voice and image capabilities in ChatGPT, offering users a more intuitive and interactive way to communicate with the AI.
The What
OpenAI is rolling out voice and image functionalities to ChatGPT, allowing users to hold voice conversations and show images to the AI for a richer interaction. Users can snap a picture of a landmark, the contents of their fridge, or a math problem and have a live conversation with ChatGPT about it.
These features are being introduced to Plus and Enterprise users initially, with voice available on iOS and Android and images accessible on all platforms.
The voice capability is powered by a new text-to-speech model, capable of generating human-like audio, and utilizes Whisper, OpenAI’s speech recognition system, to transcribe spoken words into text.
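For readers who want to picture the flow, the voice round trip described above can be sketched as three chained stages: speech recognition, a text reply, and speech synthesis. This is only an illustrative sketch, not OpenAI's implementation. The names `run_voice_turn`, `VoiceTurn`, and the three `fake_*` functions are hypothetical stand-ins, and the middle step (the language model answering the transcript) is an assumption inferred from how ChatGPT works rather than something the announcement spells out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round trip of a voice conversation (hypothetical structure)."""
    transcript: str     # what the speech recognizer heard
    reply_text: str     # what the language model answered
    reply_audio: bytes  # synthesized speech for the answer


def run_voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain speech-to-text, a text reply, and text-to-speech for one turn."""
    transcript = transcribe(audio_in)          # role played by Whisper
    reply_text = generate_reply(transcript)    # role played by the chat model
    reply_audio = synthesize(reply_text)       # role played by the TTS model
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages so the sketch runs offline; real systems would call models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_voice_turn(
    b"what is in my fridge", fake_transcribe, fake_reply, fake_synthesize
)
print(turn.reply_text)
```

Passing each stage in as a function keeps the sketch independent of any particular model or service, which is the point: the article only tells us which kind of component sits at each end of the pipeline.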
The image understanding feature is fueled by multimodal GPT-3.5 and GPT-4, applying language reasoning skills to a wide range of images, from photographs to screenshots and documents containing both text and images.
The Why
OpenAI frames the integration of voice and image capabilities in ChatGPT as a step towards its goal of building AGI (Artificial General Intelligence) that is safe and beneficial, with new tools made available gradually.
OpenAI aims to address the challenges and risks associated with voice and vision technology, such as impersonation and privacy concerns, by deploying these features responsibly and refining them based on real-world usage and feedback.
OpenAI has collaborated with Be My Eyes, an app for blind and low-vision people, to understand the uses and limitations of vision technology. The goal is to make vision both useful and safe, respecting individuals’ privacy while assisting users in their daily lives by enabling the AI to see what they see.
What OpenAI says
OpenAI stated, “We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.”
The company emphasized the importance of gradual deployment and refining risk mitigations over time, saying, “We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future.”
Addressing the challenges and safety measures, OpenAI mentioned, “We’ve taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.”
It also highlighted the collaboration with Be My Eyes and the commitment to transparency about the model’s limitations.
Conclusion
The advent of voice and image capabilities in ChatGPT marks a significant milestone in the journey towards more interactive and intuitive AI. OpenAI’s careful and considered approach to deploying these features demonstrates a commitment to user safety and privacy.
As access to these functionalities broadens, we move closer to a world where our virtual assistants can truly see, hear, and speak.