2021 is bringing us an acceleration of Artificial Intelligence (AI) evolution, which will undoubtedly change every single aspect of our lives in some way or another. Let’s just say, AI isn’t going anywhere, and hopefully, neither are we. Here are the most significant changes so far.
GPT-3 is the largest language model ever created; it generates human-like text on demand. OpenAI first described GPT-3 in a research paper published in May 2020, and the software is now being drip-fed to a select group of techy people who have requested access to a private beta version. The tool will probably be turned into a commercial product later in 2021. So what is it exactly, and how does it work?
In short, it’s a very powerful language tool with the ability to churn out convincing streams of text when prompted with an opening sentence. What makes this different from past language generators is that this particular model has 175 billion parameters (which are the values that a neural network optimises during training).
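GPT-3 itself sits behind a private beta, but the core idea — predicting the next word given the words so far — can be sketched with a toy bigram model. This is purely illustrative: GPT-3 replaces these raw counts with a neural network whose 175 billion parameters are learned during training, and the corpus here is made up.

```python
from collections import defaultdict, Counter

# Toy "language model": count which word follows which in a tiny corpus.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(prompt, length=5):
    """Greedily continue a prompt by always picking the most frequent next word."""
    words = prompt.split()
    for _ in range(length):
        followers = bigrams.get(words[-1])
        if not followers:
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

print(generate("the cat"))  # continues the prompt word by word
```

The difference between this sketch and GPT-3 is scale and representation, not the basic loop: both take a prompt and repeatedly predict a likely continuation.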
The tool generates short stories, songs, press releases, technical manuals… you name it. As reported by the MIT Technology Review, Mario Klingemann, an artist who works with machine learning, shared a short story called ‘The importance of being on Twitter’, written in the style of Jerome K. Jerome, which starts: “It is a curious fact that the last remaining form of social life in which the people of London are still interested is Twitter. I was struck with this curious fact when I went on one of my periodical holidays to the sea-side, and found the whole place twittering like a starling-cage.” Klingemann says all he gave the AI was the title, the author’s name and the initial “It.” Pretty deep for a machine, wouldn’t you think?
Writing poetically isn’t the only thing GPT-3 can do, though. It can generate any kind of text, including code, which might be the most important thing to consider here. The tool can be tweaked so that it produces HTML rather than natural language, and web developer Sharif Shameem demonstrated that he could get it to create web-page layouts simply by giving it prompts like ‘a button that looks like a watermelon’. This might have web developers a little unnerved.
That all being said, it is just a tool, and it still needs some fine-tuning. It’s prone to spewing sexist and racist language, which is a rather large problem if you ask me. GPT-3 mainly seems to be good at synthesising text found elsewhere on the internet, and it lacks much common sense. However, a tool like this has enormous potential, and will be very useful when developed further.
Evidently, language models like GPT-3 are trained purely on text, which is a big part of why they lack common sense — but now that limitation is being flipped on its head. To hold GPT-3’s hand, a group of researchers from the University of North Carolina, Chapel Hill, have designed something they call ‘vokenisation’, which gives language models like GPT-3 the ability to ‘see’.
Vokenisation gets its name from AI lingo: the words used to train language models such as GPT-3 are known as ‘tokens’, so the researchers decided to call the image associated with each token in their visual-language model a ‘voken’. The algorithm that finds a voken for each token is the ‘vokeniser’, and the process as a whole is, hence, ‘vokenisation’.
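The retrieval step at the heart of this can be sketched in a few lines: give every token and every candidate image a vector, and assign each token the image whose vector points in the most similar direction. Everything below is a minimal, made-up sketch — the embeddings are hand-written and the image file names are hypothetical, whereas the real vokeniser uses learned text and image encoders.

```python
import math

# Hand-made embeddings for illustration only; in the real system these
# come from trained encoders that project text and images into a shared space.
token_embeddings = {
    "cat":  [0.9, 0.1, 0.0],
    "sits": [0.1, 0.8, 0.1],
    "mat":  [0.0, 0.2, 0.9],
}
voken_candidates = {  # candidate images, identified by made-up file names
    "cat_photo.jpg":       [0.8, 0.2, 0.1],
    "rug_photo.jpg":       [0.1, 0.1, 0.9],
    "person_sitting.jpg":  [0.2, 0.9, 0.0],
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vokenise(tokens):
    """Assign each token the image ('voken') with the most similar embedding."""
    return {
        t: max(voken_candidates,
               key=lambda img: cosine(token_embeddings[t], voken_candidates[img]))
        for t in tokens
    }

print(vokenise(["cat", "sits", "mat"]))
```

With these toy vectors, ‘cat’ pairs with the cat photo, ‘sits’ with the photo of a person sitting, and ‘mat’ with the rug — one voken per token, which is exactly the kind of supervision the researchers feed back into the language model.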
Combining language models with computer vision has been a rapidly growing area of AI research. GPT-3 is trained through unsupervised learning and requires no manual data labelling, while image models learn directly from visual reality rather than from the world of text: an image model can, for example, label a sheep as white because it sees that the sheep is white.
However, combining these two kinds of model is complicated. You can’t just mush the two AIs together in a robotic form; a combined model needs to be built and trained from scratch on a visual-language data set. By pairing images with descriptive captions, such a model may be able to recognise objects and also see how they relate to each other, through verbs and prepositions.
In basic terms, AI’s senses are being expanded by overlapping text and image. This will undoubtedly require an obscene amount of text and image data; however, it is the first step a system has taken towards ‘human-like’ intelligence, or, more realistically, a flexible intelligence. It’s a pretty big deal.