The terminology of AI

From algorithms and deep learning to neural networks and data models, let’s dive into the essential terms you’ll need for a thorough grasp of AI.

Language can be both a clarifier and a confounder. Whether it's corporate buzzwords or teenage slang, words can enlighten or obscure, forming in-groups and sometimes skirting clear communication. Language constantly evolves, borrowing and adapting across domains. Jargon, intended as a shorthand for intricate ideas, can instead cloud them. The term “synergy” serves as a prime example, its clear meaning often diluted through ambiguous use.

On the other hand, love them or hate them, linguistic shifts across analogous contexts can be effective ways to compress meaning. “Bandwidth,” once a tech term, now hints at personal capacity. “Cascade” has migrated from waterfalls to strategic interplays in a business context.

AI jargon might seem cryptic at first. However, understanding often arises from linking the expressions to the known ideas at their roots. In this glossary, we’ll demystify AI terms, exploring their origins and relationships, guiding you through this novel linguistic terrain with assurance.

Activation Function

In the realm of neural networks, neurons are computational units that take one or more inputs and produce an output. The concept is inspired by biological neurons, where certain stimuli can activate or “fire” a neuron to signal other neurons. To mimic this behaviour computationally, neural network models needed a mechanism to decide when a neuron should be activated given its inputs. This mechanism was termed the “activation function” as it determines the neuron’s output based on its input, essentially deciding the activation level of the neuron.

An activation function in artificial neural networks is thus a mathematical function applied to a neuron’s output before it is passed to the next layer. It introduces non-linearity into the network, allowing it to learn complex relationships in the data that a purely linear model could not capture. Common activation functions include the sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU).
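
To make these functions concrete, here is a minimal sketch in Python of the three activation functions named above, using only the standard library; the function names are purely illustrative.

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + math.exp(-x))

def tanh(x):
    # Squashes any real number into the range (-1, 1).
    return math.tanh(x)

def relu(x):
    # Passes positive values through unchanged; outputs 0 otherwise.
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x))
```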

AI ethics

AI ethics is a multidisciplinary domain concerned with studying and providing guidelines for the moral behaviour of AI systems and their creators. It addresses a range of topics including fairness, transparency, accountability, bias mitigation, privacy, human rights, and the broader societal impacts of AI. The goal is to ensure AI technologies benefit humanity while minimising harms and unintended consequences.

Alignment

In the context of AI, alignment refers to the process of designing AI systems, especially powerful ones, to have objectives, behaviours, and outcomes that are beneficial and in line with human values, intentions, and goals. An AI system is said to be “aligned” if its actions and decisions can be trusted to be in the best interest of humans and not lead to harmful or unintended consequences.

Architecture

The term “architecture” traditionally refers to the design and structure of buildings. In the computing realm, as the design and organisation of complex systems became crucial, the term was co-opted to describe the high-level structuring of software or hardware systems. It captures the blueprint or framework upon which various components of a system are constructed and integrated.

For AI models, especially neural networks, the architecture specifies the arrangement and connections of nodes or neurons, layers, and other components. Different architectures, such as convolutional neural networks (CNN) or recurrent neural networks (RNN), are optimised for different types of tasks.
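
As a loose illustration, an architecture can be thought of as a declarative description of the layers and how they connect. The sketch below is only a toy description in plain Python, with arbitrary layer sizes, not a working model.

```python
# A toy description of a small feedforward architecture:
# the choice, sizing, and ordering of layers is the "architecture".
architecture = [
    {"layer": "input",  "units": 784},                            # e.g. a flattened 28x28 image
    {"layer": "dense",  "units": 128, "activation": "relu"},
    {"layer": "dense",  "units": 64,  "activation": "relu"},
    {"layer": "output", "units": 10,  "activation": "softmax"},   # e.g. 10 output classes
]

for spec in architecture:
    print(spec)
```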

Artificial General Intelligence (AGI)

AGI refers to a type of artificial intelligence that possesses the ability to understand, learn, and perform any intellectual task that a human being can. Unlike narrow or specialised AI, which is designed and trained for a specific task, AGI can transfer knowledge from one domain to another, adapt to new tasks autonomously, and reason through complex problems in a generalised manner.

Algorithm

The term “algorithm” is derived from the name of 9th-century Persian mathematician al-Khwarizmi, whose influential book on algebra contained the word al-jabr in the title and was the source of the term algebra itself.

An algorithm is a well-defined sequence of steps or set of rules designed to perform a specific task or solve a particular problem. Algorithms are sometimes contrasted with heuristics, which are general strategies or mental shortcuts for quick and efficient decision-making. Heuristics are based on experience, intuition, or common sense, rather than step-by-step procedures.

Here is an example of a very simple algorithm, written in prose, for finding the largest number in an unordered list of numbers:

  1. If there are no numbers in the list, then there is no highest number.

  2. Assume the first number in the list is the largest.

  3. For each remaining number in the list, if this number is larger than the assumed largest, update the assumed largest to the current number.

  4. The largest number is the last assumed largest when you've gone through the whole list.
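
The same algorithm expressed in Python, as a direct transcription of the steps above, might look like this:

```python
def find_largest(numbers):
    # Step 1: an empty list has no largest number.
    if not numbers:
        return None
    # Step 2: assume the first number in the list is the largest.
    largest = numbers[0]
    # Step 3: compare every remaining number against the current assumption.
    for number in numbers[1:]:
        if number > largest:
            largest = number
    # Step 4: after the whole list has been checked, the assumption is the answer.
    return largest

print(find_largest([7, 42, 3, 19]))  # prints 42
```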

How is an algorithm expressed?

Algorithms can be expressed in various forms, from natural language to pseudocode, flowcharts, or specific programming languages.

What might a heuristic approach to the same problem involve?

By contrast, tackling the same problem with a heuristic approach might mean quickly identifying a number that is large, with no guarantee that it is the largest. For instance, if the list were long, a heuristic might involve skimming it and picking a number that looks larger than those around it, rather than thoroughly comparing it to every other number.

Artificial Intelligence

The term “artificial intelligence” was first used in its modern sense by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon in connection with the 1956 Dartmouth College conference. Combining the notion of being human-made rather than naturally occurring with the capacity to acquire and apply knowledge, Artificial Intelligence refers to the capability of machines or software to perform tasks in a manner that simulates the abilities of humans, animals or other forms of life.

Attention

Inspired by human selective focus on sensory or salient details, attention in deep learning refers to a mechanism that enables neural networks, especially in sequence-to-sequence tasks, to focus on specific parts of the input when producing an output. The attention mechanism produces a weighted combination of input features where the weights determine the amount of ‘attention’ each feature receives when generating a particular output.
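
A minimal sketch of this idea in Python with NumPy: a query is scored against a set of input features, the scores are turned into weights with a softmax, and the output is the weighted combination. This is a simplified single-query version of dot-product attention, not any particular library's implementation.

```python
import numpy as np

def attention(query, keys, values):
    # Score each input feature by its similarity to the query (scaled dot product).
    scores = keys @ query / np.sqrt(query.shape[0])
    # Softmax turns the scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The output is the weighted combination of the values.
    return weights @ values, weights

keys = values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
output, weights = attention(query, keys, values)
print(weights, output)  # inputs similar to the query receive more attention
```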

Autoregression

“Autoregression” is derived from two components: “auto-”, which means self, and “regression”, which is a statistical method to predict the value of a dependent variable based on one or more independent variables. Combined, the term points to a regression model where the current value of a series is predicted based on its own past values.
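
For instance, a first-order autoregressive model, often written AR(1), predicts the next value in a series as a scaled copy of the previous value plus a constant. A minimal sketch, with made-up coefficients rather than values fitted to real data:

```python
# AR(1): next value = c + phi * previous value
# The coefficients below are illustrative, not estimated from data.
c, phi = 0.5, 0.8

def predict_next(previous_value):
    return c + phi * previous_value

series = [10.0]
for _ in range(5):
    series.append(predict_next(series[-1]))
print(series)
```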

Backpropagation

“Backpropagation” is short for “backward propagation of errors”; the original meaning of propagate was to breed specimens of a plant or animal by natural processes from the parent stock. It is a supervised learning algorithm used for training artificial neural networks, particularly feedforward networks, and is a method of computing the gradient of the loss function with respect to all the weights in the network. The algorithm works by propagating the error backwards through the network, from the output layer to the input layer, adjusting the weights along the way to minimise the error. This process is repeated multiple times until the network converges to a set of weights (or parameters) that produce the desired output for a given input.
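
As a highly simplified sketch (one weight, one training example, and a squared-error loss), the essence of the method is using the chain rule to get the gradient of the loss with respect to the weight, then nudging the weight against that gradient. Real networks do the same thing layer by layer, backwards from the output.

```python
# One "neuron" with no activation: prediction = w * x
# Loss: (prediction - target) ** 2
x, target = 2.0, 10.0
w = 0.5                       # initial weight
learning_rate = 0.05

for step in range(20):
    prediction = w * x                        # forward pass
    loss = (prediction - target) ** 2
    # Backward pass: chain rule gives dLoss/dw = 2 * (prediction - target) * x
    grad_w = 2 * (prediction - target) * x
    w -= learning_rate * grad_w               # gradient descent update

print(w, w * x)  # w approaches 5.0, so the prediction approaches the target 10.0
```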

Batch size

Batch size is a hyperparameter of gradient descent that determines the number of training examples utilised in one iteration. The word “batch” originates from the computational practice of processing data in chunks, rather than one data point at a time or the entire dataset at once. Batching is typically used for computational efficiency and to work within memory constraints. In other words, the batch size specifies the number of samples the model is exposed to in each forward and backward pass of the training algorithm.
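
A small sketch of batching in Python: splitting a dataset into fixed-size chunks that would be fed to the model one chunk at a time.

```python
def batches(dataset, batch_size):
    # Yield the dataset in consecutive chunks of batch_size examples.
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

dataset = list(range(10))              # ten toy training examples
for batch in batches(dataset, batch_size=4):
    print(batch)                       # [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]
```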

Biases (as in, weights & biases)

In this usage, the word “bias” has its roots in the idea of shifting or offsetting something from a baseline or reference. In the context of neural networks and many other machine learning algorithms, biases are additional parameters alongside weights that are learned during the training process, and that can help shift the output up or down, similar to the constant (or y-intercept) in linear equations. In neural networks, biases play a crucial role in ensuring the model isn't strictly tied to the origin and can fit the data more flexibly by offering an added degree of freedom to adjust the model's output independent of its input.
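
In the simplest case, a single neuron computes a weighted sum of its inputs plus a bias, exactly like the constant term in a linear equation. A minimal sketch:

```python
def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs, shifted up or down by the bias term.
    return sum(w * x for w, x in zip(weights, inputs)) + bias

inputs = [1.0, 2.0]
weights = [0.4, -0.1]
print(neuron_output(inputs, weights, bias=0.0))   # 0.2
print(neuron_output(inputs, weights, bias=0.5))   # 0.7: same inputs, output shifted by the bias
```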

Bias (in data)

The term “bias” in data has its origins in the broader notion of prejudice, or a disproportionate weight for or against an idea or thing, usually in a way that is closed-minded, or unfair. When this concept was transferred to machine learning and data science, it was used to refer to the presence of systematic (non-random) errors, skews, or patterns in the data that deviate from a true or representative distribution of the underlying phenomenon being studied. When a machine learning model is trained on biased data, it can lead to unfair, discriminatory, or incorrect model predictions or conclusions.

Bias (in algorithm design)

Bias in algorithm design refers to the inclinations, preferences, or systematic patterns that are intentionally or unintentionally introduced into an algorithm due to choices made by its designers. For instance, an algorithm designed to recommend movies might be biased towards recent releases if it heavily weights the release date, thereby potentially overshadowing classic movies that might be more relevant to a user's tastes. These biases can manifest as consistent tendencies in the algorithm's behaviour, performance, or decisions, which may favour certain outcomes, data types, characteristics, or conditions over others.
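
To make the movie example concrete, here is a hypothetical scoring function whose hand-picked weights strongly favour recency; every name and number is illustrative, not drawn from any real recommender system.

```python
def recommendation_score(relevance, years_since_release):
    # A designer's choice: recency is weighted far more heavily than relevance,
    # so a recent but less relevant film can outrank a highly relevant classic.
    return 0.2 * relevance + 0.8 * (1 / (1 + years_since_release))

print(recommendation_score(relevance=0.9, years_since_release=50))  # relevant classic
print(recommendation_score(relevance=0.4, years_since_release=0))   # less relevant new release
```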

Convolutional Neural Networks (CNN)

CNNs are a type of specialised deep neural network developed predominantly for visual tasks. CNNs utilise convolutional layers outfitted with filters to examine local patches of data, enabling them to adeptly identify and extract layered spatial features from images. In mathematics, the term “convolution” pertains to an operation that merges two functions to create a third. Within the scope of CNNs, this denotes the process by which input data, typically images, are amalgamated with a filter or kernel to produce a feature map.

A CNN sequence to classify an image.

The key feature of a CNN is its ability to autonomously extract and learn hierarchical features from images. This is achieved by using the convolutional layers and filters to detect diverse patterns, followed by pooling layers that reduce spatial dimensions. These networks have become highly regarded in the computer vision domain, setting benchmarks in tasks such as image classification, object detection, and image segmentation.
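
A minimal sketch of the convolution operation itself: sliding a small filter over a 2D array and summing the element-wise products at each position (no padding, stride 1). In a real CNN the filter values are learned during training; here a fixed, hand-written filter is used purely for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter with one local patch, summed.
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# Responds where brightness increases from left to right (a vertical edge).
vertical_edge_filter = np.array([[-1.0, 1.0]])
print(convolve2d(image, vertical_edge_filter))
```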

The structure of CNNs is inspired by studies of the visual cortex in animals. These studies revealed that certain groups of neurons in the cortex were responsive to particular areas of the visual field. This led to the concept of employing local receptive fields, similar to filters, to analyse an image for specific features.

Common Crawl

Common Crawl is a non-profit organisation that crawls the web and freely provides its archives and datasets to the public. Common Crawl's datasets are massive, containing petabytes of information representing a broad snapshot of the web.

In the world of the internet, “crawling” refers to the act of systematically browsing the web to collect information about websites and their pages. The adjective “common” in this context suggests accessibility and shared use, implying that the data from this crawl is meant for the broader public.

Compression

In computing, compression refers to encoding data in a way that requires fewer bits. This enables more compact storage and faster transmission. The term traces its origins to the physical process of compressing matter into a more concentrated form. Similarly, data compression condenses the informational content into a compressed representation. Common compression techniques include Run-Length Encoding (RLE), Huffman Coding, and Lempel-Ziv (LZ). In deep learning, compression techniques are used to reduce the size of large models and datasets for more efficient storage, sharing and deployment.
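
As a toy illustration of the idea (not of the techniques used in practice to compress models), here is run-length encoding, which replaces runs of repeated symbols with a count and the symbol:

```python
def run_length_encode(text):
    # Collapse runs of identical characters into count-character pairs.
    encoded = []
    i = 0
    while i < len(text):
        run_start = i
        while i < len(text) and text[i] == text[run_start]:
            i += 1
        encoded.append(f"{i - run_start}{text[run_start]}")
    return "".join(encoded)

print(run_length_encode("aaaabbbcca"))  # "4a3b2c1a"
```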

Computer vision

Computer vision is a subfield of artificial intelligence focused on enabling machines to interpret and understand visual data such as images and videos. Inspired by biological vision systems, it seeks to automate tasks that rely on visual perception. Key techniques include image classification, object detection, image segmentation, and activity recognition. Computer vision powers many practical applications from autonomous vehicles and facial recognition to medical image analysis and robotics. The term first appeared in the 1960s as researchers began using computers to process and gain insights from images.

Context window

In linguistics, context is the surrounding text or talk of an expression which can influence its interpretation. A context window, in terms of AI, typically pertains to the number of words or tokens around a specific word (or token) in a sequence that the model considers when making predictions or representations. This is especially relevant in natural language processing models, where the meaning of a word often depends on its neighbours.
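
A small sketch of the idea: collecting the tokens within a fixed number of positions either side of a target token. (The function name and window size are illustrative only.)

```python
def context_window(tokens, index, window_size):
    # Return the tokens within window_size positions either side of tokens[index].
    start = max(0, index - window_size)
    end = min(len(tokens), index + window_size + 1)
    return tokens[start:index] + tokens[index + 1:end]

tokens = "the cat sat on the mat".split()
print(context_window(tokens, index=2, window_size=2))  # ['the', 'cat', 'on', 'the']
```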

Conversational AI

Conversational AI refers to technologies that enable automated systems to communicate with humans using natural language. Also known as conversational agents or chatbots, they interpret human inputs like text or voice, determine meaning and intent, and respond sensibly through dialogue. Conversational AI combines natural language processing, machine learning and dialog management to model human conversation patterns and domain knowledge for useful applications like virtual assistants, customer service chatbots and voice interfaces.

The term evolved as researchers sought to create interfaces that move beyond menu-based interactions towards more flexible and intuitive human-machine communication.

Corpus

Corpus is Latin for 'body'. In linguistic and AI contexts, a corpus is a large and structured collection of texts. Corpora (the plural form) are often used to train models in natural language processing, providing them with rich linguistic data to understand patterns, semantics, and structures of languages.

CPU

CPU is an abbreviation for central processing unit. Originating from the idea of centralised operations and processing tasks, the CPU is the primary component of a computer responsible for interpreting and executing program instructions and coordinating other components. It carries out the fundamental calculations and logical operations that drive computation.

CPUs contain cores, with each core able to independently run machine code instructions in parallel. More cores allow parallel processing of more computational threads. In AI, while much of the heavy lifting for model training has migrated to Graphics Processing Units (GPUs), CPUs remain vital for various tasks and for orchestrating broader processes. CPUs work closely with GPUs and specialised AI chips to handle the intense mathematical computations required for deep learning.

Decoder

In its base form, to decode is to convert coded data back into its original form. In machine learning, a decoder is a component of certain models, such as autoencoders and sequence-to-sequence models, that converts the internal representation back into the original input format. It effectively operates in reverse of the encoder, reconstructing meaningful outputs from the encoded state or latent space. For example, in language translation models, the decoder generates translated sentences from the encoder's compressed semantic representation of the source sentence. In an autoencoder, the encoder and decoder are trained together to reconstruct the model's own input, which is where the “auto” (self) in the name comes from.
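
A deliberately tiny sketch of the encoder/decoder pairing: the encoder compresses an input vector to fewer numbers, and the decoder maps that compressed code back to the input space. The matrices here are random and fixed purely for illustration; in a real autoencoder they would be learned so that the reconstruction closely matches the input.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 4))   # encoder: 4 input values -> 2 latent values
W_dec = rng.normal(size=(4, 2))   # decoder: 2 latent values -> 4 output values

def encode(x):
    return W_enc @ x              # compress the input to a latent representation

def decode(z):
    return W_dec @ z              # map the latent representation back to the input space

x = np.array([1.0, 0.5, -0.5, 2.0])
z = encode(x)
print(z, decode(z))               # untrained weights, so the reconstruction is rough
```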

Deep fake

Borrowing the word “deep” from deep learning, deep fakes are synthetic media generated by deep learning techniques that very convincingly replace a person in an image or video with someone else's likeness. Deep fakes leverage powerful generative models like GANs and autoencoders to manipulate visual and audio content. The resulting falsified media can mislead and deceive if used for purposes like political sabotage or revenge porn, or make it appear someone said or did something they did not. Deep fakes illustrate both the creative potential and risks of AI generative technologies.

Deep learning

Deep learning is a subset of machine learning that utilises neural networks with multiple layers (hence deep) to extract hierarchical features from data. Each deeper layer builds on the previous layer’s output. This layered, hierarchical feature learning enables the modelling of highly complex functions. It has shown remarkable results, especially in tasks like image and speech recognition.

The term was coined in 1986 by Rina Dechter to refer to certain graph-based machine learning models. The modern usage emerged around 2006 when Geoffrey Hinton and Ruslan Salakhutdinov showed that many-layered artificial neural networks could be pre-trained one layer at a time, treating each layer in turn as an unsupervised restricted Boltzmann machine, then fine-tuned using supervised backpropagation. This breakthrough enabled practical training of deep neural networks.

Epoch

Rooted in the idea of a distinct period or era, in deep learning, an epoch refers to one complete pass through the entire training dataset during the training of a neural network model. So, if a dataset has 1000 examples, and the batch size is 100, then it will take 10 iterations to complete 1 epoch. The model's weights are updated after each batch using optimisation techniques like gradient descent. Training neural networks typically requires many epochs to iterate through the data multiple times and minimise the loss function to acceptable levels.
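
A sketch of how epochs, batches, and iterations relate, using the numbers from the example above:

```python
dataset_size = 1000
batch_size = 100
epochs = 3

iterations_per_epoch = dataset_size // batch_size   # 10 iterations per epoch
for epoch in range(epochs):
    for iteration in range(iterations_per_epoch):
        pass  # one forward/backward pass and weight update would happen here
    print(f"epoch {epoch + 1}: {iterations_per_epoch} iterations")
```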
