AI Glossary

Note: Some of the following definitions were written with the help of ChatGPT and may appear as verbatim or revised output.

AI-generated text detector: A tool or system designed to identify text that has been generated by a computer program rather than being written by a human. May analyze text characteristics such as fluency, word frequency, punctuation patterns, and sentence length.
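
For illustration, a minimal Python sketch of the kind of surface statistics such a detector might compute. A real detector would feed features like these into a trained classifier; the feature names here are illustrative assumptions, not any particular tool's method:

```python
# Sketch: surface statistics a simple detector might compute.
# A real detector trains a classifier on features like these.
import re
from collections import Counter

def text_features(text: str) -> dict:
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    return {
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(counts) / max(len(words), 1),  # vocabulary variety
        "punctuation_rate": sum(text.count(p) for p in ",;:") / max(len(words), 1),
    }

print(text_features("This is a short example. It has two sentences."))
```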

Artificial intelligence: Computer-performed tasks previously thought to be the exclusive domain of human intelligence and creativity. Commonly divided into levels from weakest to strongest ability: artificial narrow intelligence (ANI), artificial general intelligence (AGI), and artificial superintelligence (ASI). The term “artificial intelligence” was coined in the 1950s (Williamson & Eynon, 2020).

Artificial neural network (ANN): A type of machine learning model inspired by the structure of the human brain. It consists of interconnected nodes, or artificial neurons, that process information by passing signals from one to another. ANNs can be used to solve a variety of tasks, including image classification, speech recognition, and language translation. 
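
A minimal sketch of the idea in Python with NumPy: each layer multiplies its inputs by weights, adds a bias, and applies a nonlinear activation. The weights here are random and untrained, so the output is meaningless; the point is the signal-passing structure:

```python
# Sketch of one artificial "neuron" layer: inputs are multiplied by
# weights, summed with a bias, and passed through a nonlinear activation.
import numpy as np

def layer(x, weights, bias):
    return np.tanh(x @ weights + bias)  # tanh squashes the weighted sum

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))                                   # one input, 4 features
hidden = layer(x, rng.normal(size=(4, 3)), np.zeros(3))       # 4 inputs -> 3 neurons
output = layer(hidden, rng.normal(size=(3, 1)), np.zeros(1))  # 3 neurons -> 1 output
print(output)  # the network's (untrained) prediction
```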

ChatGPT: A conversational AI chatbot based on the Generative Pre-trained Transformer (GPT) family of models. A prototype was released on Nov. 30, 2022, and made available to the general public free of charge during its research preview phase.

Computational linguistics: The scientific field concerned with the computational aspects of the human ability to process and generate natural language. Involves the development of algorithms and models that can analyze, generate, and understand natural language text and speech. 

Deep learning models: Deep learning models are a type of artificial neural network designed to learn and make decisions by analyzing and interpreting complex data inputs. They are called “deep” because they are composed of multiple layers of interconnected nodes, with each layer responsible for extracting different types of features and patterns from the data.
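
A minimal sketch of “depth” using Keras: several stacked layers, each transforming the output of the layer before it. The layer sizes here are illustrative assumptions:

```python
# Sketch of a "deep" model: multiple stacked layers, each extracting
# higher-level features from the previous layer's output.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(784,)),               # e.g. a flattened 28x28 image
    layers.Dense(128, activation="relu"),     # first feature-extraction layer
    layers.Dense(64, activation="relu"),      # deeper layer, higher-level features
    layers.Dense(10, activation="softmax"),   # output: class probabilities
])
model.summary()  # shows the stack of layers that makes the model "deep"
```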

Generative language models: Models that generate text in response to open-ended prompts by repeatedly predicting a likely next word.

GPT-3: GPT-3 (short for “Generative Pre-trained Transformer 3”) is a third-generation language generation model developed by OpenAI. According to Panchotia (2021), GPT-3 has 175 billion parameters and was trained on over 400 billion tokens from the Internet, over 60 billion tokens from books, 3 billion tokens from Wikipedia, and more.

Hallucinating: AI is said to be hallucinating when it generates fictional information. A common example is a cited source that looks plausible but cannot be verified. Hallucinations, like all AI-generated text, are simply the result of predicting the next word, and they may appear credible.

Machine-generated text [Also, synthetic text]: Text produced by a computer program rather than produced by a human. Can be generated using a variety of techniques: text-generation algorithms, language processing models like GPT, or simple rule-based systems that substitute words or phrases according to a set of predefined rules.

Machine-learning models: Algorithms (sets of instructions for solving a problem), mathematical models (step-by-step procedures for specific tasks), or both together, trained on data to recognize certain types of patterns for certain types of tasks. The models use statistical and computational methods to learn patterns and relationships in the data, then use this knowledge to make predictions or take actions in new situations. There are different types of machine-learning models; which one is used depends on the type of problem to be solved and the type of data available for training. Some examples of machine-learning models and their associated purposes are as follows (a short code sketch of the common usage pattern appears after this list):

  • Autoencoders are used for unsupervised representation learning.
  • Convolutional Neural Networks (CNNs) are used for computer vision tasks.
  • Decision Trees and Random Forests are used for classification and regression.
  • Generative Adversarial Networks (GANs) are used for generative modeling.
  • Gradient Boosting is used for classification and regression.
  • K-Nearest Neighbors (KNN) is used for classification and regression.
  • Naive Bayes is used for classification.
  • Principal Component Analysis (PCA) is used for dimensionality reduction.
  • Recurrent Neural Networks (RNNs) are used for sequential data processing.
  • Support Vector Machines (SVMs) are used for classification and regression.
  • Transformers are used for natural language processing tasks.
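
Most of these models share the same fit-on-data, predict-on-new-data usage pattern. A minimal sketch with scikit-learn's decision tree on the built-in iris dataset:

```python
# Sketch of the fit/predict pattern most machine-learning models share,
# using a decision tree classifier on scikit-learn's iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier().fit(X_train, y_train)  # learn patterns from data
print(clf.score(X_test, y_test))                      # accuracy on unseen data
```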

Natural Language Processing (NLP): Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human (natural) languages. It involves the development of algorithms and models that can analyze, understand, and generate human language. Examples of NLP at work: spam detection, Google Translate, Alexa, sentiment analysis (extracting sentiment and emotions from text).
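
A minimal sketch of one NLP task named above, sentiment analysis, using TextBlob (one of the libraries listed under “Tokenize” below). Polarity runs from -1 (negative) to 1 (positive):

```python
# Sketch: sentiment analysis with TextBlob.
from textblob import TextBlob

print(TextBlob("I love this course!").sentiment.polarity)           # positive, near 1
print(TextBlob("This was a terrible mistake.").sentiment.polarity)  # negative, below 0
```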

Neural language modeling (NLM): The use of artificial neural networks (ANNs) to predict the probability of the next word in a sequence based on the previous words. Neural language models often include an encoding layer to represent input text as continuous vectors, followed by one or more recurrent or self-attention layers to capture the dependencies between the words, and a decoding layer to produce the predicted next word or sequence of words.
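
A minimal sketch of that layer stack in Keras; the vocabulary size, sequence length, and layer widths are illustrative assumptions:

```python
# Sketch of a neural language model: an encoding (embedding) layer,
# a recurrent layer for word dependencies, and a decoding (softmax)
# layer that scores every word in the vocabulary as the next word.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len = 10_000, 20
model = keras.Sequential([
    layers.Input(shape=(seq_len,)),                  # a sequence of token ids
    layers.Embedding(vocab_size, 64),                # encode words as vectors
    layers.LSTM(128),                                # capture dependencies between words
    layers.Dense(vocab_size, activation="softmax"),  # probability of each next word
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```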

Statistical language modeling: Defined as “the development of probabilistic models that are able to predict the next word in the sequence given the words that precede it” (Brownlee, 2017).
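
A minimal sketch of the simplest such model, a bigram model that estimates next-word probabilities directly from counts:

```python
# Sketch of a statistical (bigram) language model: count which word
# follows which, then convert the counts to next-word probabilities.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word_probs(word):
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

print(next_word_probs("the"))  # {'cat': 0.667, 'mat': 0.333}
```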

Text spinner: Software used to automatically rewrite or “spin” existing pieces of text in order to create new, unique versions of the text.
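
A minimal sketch of a rule-based spinner that substitutes words from a small hand-made synonym table; real spinners use much larger thesauri or machine-learning models:

```python
# Sketch of a rule-based text spinner: swap words for synonyms
# chosen at random from a predefined table.
import random

SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "cheerful"]}

def spin(text: str) -> str:
    return " ".join(random.choice(SYNONYMS.get(w, [w])) for w in text.split())

print(spin("the quick happy dog"))  # e.g. "the rapid glad dog"
```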

Transformer language model: A neural network machine learning model proposed in 2017 that uses self-attention to make predictions based on the full context of a sequence rather than processing it strictly in order.
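
A minimal NumPy sketch of the transformer's core operation, scaled dot-product self-attention, in which every position attends to every other position at once:

```python
# Sketch of scaled dot-product self-attention: each token's output is a
# weighted mix of all tokens, with weights computed from similarity scores.
import numpy as np

def self_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how much each token attends to each other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ V                       # context-weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))             # 5 tokens, 8-dimensional embeddings
print(self_attention(x, x, x).shape)    # (5, 8): one context vector per token
```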

Token: In natural language processing (NLP), a token is a unit of text (a word, word part, or individual character, including punctuation) used as the basic unit of analysis. Tokens are created by breaking a body of text into smaller units; algorithms and techniques are then applied to the tokens to extract meaning, understand the context in which they are used, and perform other kinds of analysis.

Tokenize: To break paragraphs and sentences into smaller units (tokens) and assign them numerical values. Tokenizing a text dataset is an early NLP step in processing text input. Tokenization libraries include Keras, Gensim, SpaCy, the Natural Language Toolkit (NLTK), and TextBlob.
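
A minimal sketch of tokenizing with NLTK (one of the libraries listed above) and then assigning each token a numerical value:

```python
# Sketch: tokenize a sentence with NLTK, then map tokens to ids.
import nltk
nltk.download("punkt", quiet=True)  # tokenizer data; newer NLTK versions may need "punkt_tab"
from nltk.tokenize import word_tokenize

tokens = word_tokenize("ChatGPT generates text, one token at a time.")
ids = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
print(tokens)                    # ['ChatGPT', 'generates', 'text', ',', ...]
print([ids[t] for t in tokens])  # the same tokens as numerical values
```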

Thanks to the Center for eLearning Initiatives at Penn State Erie, the Behrend College for their support in compiling this list.