What We Talk About When We Talk About Robots
If you listen to our podcast and find us going on about something you’ve never heard of or don’t quite understand, here’s a basic glossary of terms and tools (in no particular order) that we use a lot. We welcome suggestions on terms to add.
Computational Creativity
The study of computers and creativity together. This can include computers being creative, using computers to aid human creativity, and using computers to better understand or measure creativity.
Creating data randomly, through some algorithm, rather than measuring or creating something manually.
Machine Learning
The study of how computers can get better at producing some specific output from their input without explicit instructions from the programmer. Typically, a program is “trained” on one set of data, shown both the inputs and the outputs it should produce, and then “tested” on similar data it hasn’t seen, without being shown the expected outputs.
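The train-then-test routine above can be sketched with a deliberately tiny “program”: a one-nearest-neighbor classifier. The points and labels below are made up purely for illustration.

```python
import math

# "Training" data: (feature pair, label) examples the program gets to see.
train = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
         ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]

def predict(point):
    """Label a new point with the label of its closest training example."""
    nearest = min(train, key=lambda ex: math.dist(point, ex[0]))
    return nearest[1]

# "Testing" data the program has never seen; comparing its guesses to the
# expected outputs measures how well it learned.
test = [((1.2, 1.4), "small"), ((8.5, 9.2), "large")]
accuracy = sum(predict(p) == label for p, label in test) / len(test)
print(accuracy)  # 1.0 on this tiny toy set
```

No one is told *how* to classify points; the rule falls out of the training examples, which is the core of the definition above.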
Artificial Neural Network
A computer program loosely based on the brain, in which some data is given to the program and some data comes out. In between input and output, individual “neurons” decide whether to change the data they get, how to change it, and how much, based on initial settings and instructions for learning set by the programmer.
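A minimal sketch of that input → neurons → output flow, with two “neurons” feeding a third. The weights here are hand-picked, made-up numbers; in a real network they would be adjusted automatically during training.

```python
import math

def neuron(inputs, weights, bias):
    """One 'neuron': weigh each input, sum, then squash with a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # squashes to a value between 0 and 1

def tiny_network(inputs):
    # A hidden layer of two neurons, each looking at both inputs...
    h1 = neuron(inputs, weights=[0.5, -0.6], bias=0.1)
    h2 = neuron(inputs, weights=[-0.3, 0.8], bias=0.0)
    # ...feeding one output neuron.
    return neuron([h1, h2], weights=[1.2, -0.7], bias=0.2)

print(tiny_network([1.0, 0.0]))  # some number between 0 and 1
```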
Recurrent Neural Networks (RNN)
A particular type of neural network that keeps a memory of what it has processed so far, feeding that memory back into itself at each step so that earlier inputs can influence later outputs.
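The “memory” idea can be sketched as a single recurrent unit whose hidden state is fed back in at every step. The weights are hypothetical, untrained numbers chosen just to show the mechanics.

```python
import math

W_IN, W_HIDDEN, BIAS = 0.8, 0.5, 0.1  # fixed, made-up weights

def step(x, hidden):
    """Combine the current input with the memory of everything seen so far."""
    return math.tanh(W_IN * x + W_HIDDEN * hidden + BIAS)

hidden = 0.0  # the network's "memory" starts empty
for x in [1.0, 0.0, -1.0]:  # a toy input sequence
    hidden = step(x, hidden)
    print(round(hidden, 3))
```

Note that the same input arriving at different points in the sequence produces different hidden states, because the memory it is combined with differs.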
textgenrnn
An RNN created by Max Woolf that takes text as its input and output, learning which letter or character should come after a given character.
Transformer
A type of text-based neural network, created by Google researchers, that specializes in learning and remembering the most important words in a text. These networks are useful for generating longer texts that stay on topic and have consistent writing patterns.
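The trick Transformers use to weigh “important” words is called attention. A minimal sketch, using made-up two-number word vectors (real models learn vectors with hundreds of dimensions):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Score every word against the query, then blend the words' values
    in proportion to those scores."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Three "words"; the query is most similar to the first one, so the
# output leans heavily toward that word's value.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
print(attention([2.0, 0.0], keys, values))
```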
GPT-2
A Transformer-based language model created by OpenAI: a pre-trained neural network that’s been fed a very large amount of text data. Three sizes have been released, and even the smallest is very big, with 117 million parameters (textgenrnn uses fewer than 500 parameters). We like to use tools like Talk to Transformer, Write With Transformer, and Max Woolf’s Colaboratory notebook to add our own text to GPT-2 for custom generations.
Predictive Text
Text generation that guesses the next word based on previous words, often accomplished with Markov chains (the way your phone suggests words while you text). The process uses the previous word, or the previous several words, plus some body of text (like your own texting history) to guess what the most probable next word should be.
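A minimal sketch of the Markov-chain version: count which word follows which in a corpus, then suggest the most common follower. The sample “texting history” below is made up.

```python
from collections import Counter, defaultdict

corpus = ("i am going to the store . i am going to bed . "
          "i am so tired . the store is closed .").split()

# For each word, tally every word that ever follows it.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def suggest(word):
    """Most probable next word after `word`, like a phone keyboard's top pick.
    (Only meaningful for words that appeared in the corpus.)"""
    return followers[word].most_common(1)[0][0]

print(suggest("am"))   # "going" — it followed "am" twice, vs. "so" once
print(suggest("the"))  # "store" — it followed "the" twice
```

Chaining the suggestions word by word turns this same table into a crude text generator.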
Voicebox
Predictive text interfaces made with the Voicebox app (by Jamie Brew), available for free on botnik.org, which users can train on any text or combination of texts.
Generative Adversarial Networks (GAN)
A particular type of neural network in which two networks are trained together: one generates new “fake” data based on the original data, and the other tries to determine whether a given piece of data came from the original dataset or is one of the fakes. Both get better the more they “play” this game, ending with one network that is good at making new data similar to the real dataset and another that is good at determining whether some data belongs to the original group.
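A toy sketch of that two-player game, boiled down to one dimension: the “real” data are numbers near 4, the generator is a single adjustable number line, and the discriminator is a simple logistic classifier. All sizes, learning rates, and update rules here are illustrative choices, not any particular published GAN.

```python
import math
import random

random.seed(0)

wd, bd = 0.0, 0.0   # discriminator parameters
wg, bg = 1.0, 0.0   # generator parameters (starts producing numbers near 0)
LR = 0.02

def discriminate(x):
    """Probability the discriminator assigns to 'this came from real data'."""
    return 1 / (1 + math.exp(-(wd * x + bd)))

for _ in range(3000):
    real = random.gauss(4.0, 0.5)   # a sample of "real" data
    z = random.gauss(0.0, 1.0)
    fake = wg * z + bg              # the generator's forgery

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = discriminate(real), discriminate(fake)
    wd += LR * ((1 - d_real) * real - d_fake * fake)
    bd += LR * ((1 - d_real) - d_fake)

    # Generator step: adjust the forgery so the discriminator says "real".
    d_fake = discriminate(fake)
    grad = (1 - d_fake) * wd        # how D's verdict changes with x
    wg += LR * grad * z
    bg += LR * grad

print(round(bg, 2))  # the generator's center has drifted from 0 toward 4
```

The forgeries start nowhere near the real data; the only signal pulling them toward it is the discriminator’s verdict, which is the whole point of the game.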
BigGAN
A particular GAN made by Andrew Brock, Jeff Donahue, and Karen Simonyan that’s gained popularity for the incredibly realistic synthetic images it generates, as well as for the highly disturbing images that have come out of GANbreeder (by Joel Simon), a site where many users can blend multiple BigGAN generations together over time to produce new images.
Corpus
A dataset, the thing a machine learning program learns from. Often this refers to a text dataset, but Justin, as someone who mostly deals with language as data, will call any dataset a corpus.
An algorithm that selects or creates data in an arbitrary way (that is to say, the data itself does not impact what is picked or created). Often this is simply picking a random number, or picking the item from a list at that randomly chosen position.
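The “random position in a list” case in one tiny sketch (the list itself is made up):

```python
import random

random.seed(42)  # seeded only so the example is repeatable

flavors = ["vanilla", "chocolate", "strawberry", "pistachio"]
position = random.randrange(len(flavors))  # an arbitrary index: the data
pick = flavors[position]                   # itself never influences the pick
print(position, pick)
```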
Tracery
A text generation tool made by Kate Compton, built on JSON files, intended to let people without much technical background create stories and pieces of fiction with randomized elements. Used as a Twitter bot-building tool on cheapbotsdonequick.com, a website by George Buckenham.
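The grammars look roughly like the JSON-style dictionary below: each rule is a list of options, and #symbol# slots get filled in at random. This mini-expander is a simplified re-implementation for illustration, not the real tool.

```python
import random

grammar = {
    "origin": ["The #animal# #action#."],
    "animal": ["robot", "ghost", "heron"],
    "action": ["sings", "waits", "dreams of #animal#s"],
}

def expand(symbol):
    """Pick one option for a rule, then recursively fill in its #slots#."""
    text = random.choice(grammar[symbol])
    while "#" in text:
        before, slot, after = text.split("#", 2)
        text = before + expand(slot) + after
    return text

random.seed(7)
print(expand("origin"))  # a randomized sentence built from the grammar
```

Because rules can reference other rules (“action” can contain another #animal#), small grammars can yield a surprisingly large space of sentences.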