Set theory is the dominant foundation for mathematics. The idea is that everything else in mathematics—numbers, functions, etc.—can be written in terms of sets, so that if you have a consistent description of how sets behave, then you have a consistent description of how everything built on top of them behaves. If predicate logic is the machine code of mathematics, set theory would be assembly language.
Set theory is defined by a collection of axioms, which describe the behavior of its only predicate symbol, ∈, a mutated version of the Greek letter epsilon. The interpretation of x∈y is that x is a member of (also called an element of) y. There is also the symbol ∉ (is not an element of), where x ∉ y is defined to mean ¬(x∈y); and ∋ (contains), where y ∋ x is the same as x ∈ y. Any set contains zero or more elements, and has no other properties other than what elements it contains. Formally, this means that if two sets contain the same elements, they're the same set. To make this explicit, we have the
- Axiom of Extensionality
- ∀x ∀y (x = y ⇔ ∀z (z ∈ x ⇔ z ∈ y)).
Extensionality is the axiom that defines what a set is. The rest of the axioms tell us what possible sets exist. There are two main traditions for filling in the collection of sets that exist, known as naive set theory and axiomatic set theory. Naive set theory essentially assumes the existence of any set that you can describe, either by listing its elements explicitly, or by giving a rule (i.e. a predicate) that tells whether a particular object is in the set or not. Because the naive approach quickly leads to paradoxes, hard-core logicians instead favor axiomatic set theory, in which no set is presumed to exist unless you can generate it using a specific list of axioms. Axiomatic set theory usually assumes that there are no objects in the universe except sets (so that all quantifiers apply only to sets); other useful mathematical objects like numbers or functions must be represented as sets before they can be used. Naive set theory is more generous about what can be included in a set, so that you can have, for example, a set of baseball players or planets or philosophical concepts without having to somehow reduce these objects to sets.
Despite the fear and trembling that axiomatic set theory inspires in some readers, the difference between axiomatic set theory and naive set theory is not great, and with one noteworthy exception naive set theory can be thought of as just a pretty front-end to axiomatic set theory. So for these notes we will adopt the approach of presenting set theory from a naive perspective, but bring in standard axioms as needed to justify the existence of particular sets we may encounter. The axiomatization we will use is the standard Zermelo-Fraenkel set theory with the Axiom of Choice, usually abbreviated ZFC.
1. Set notation and some simple sets
Sets whose elements are known are written by listing the elements inside curly braces, e.g. { 1, 2, 3 }, { Groucho, Chico, Harpo, Zeppo, Karl }. The smallest set is the empty set, the set with no elements. This can be written either as { } or using the special symbol ∅ (which is actually a mutated Greek letter Phi). One of the ZFC axioms says that ∅ exists:
- Axiom of Existence
- ∃x ∀y y ∉ x.
With just the axiom of extensionality, it is possible that no sets at all exist; the axiom of existence gives us at least one set (but possibly no others).
2. Operations on sets
Given two objects x and y, there is a set { x, y } that contains both x and y but nothing else. The existence of this set is asserted by the
- Axiom of Pairing
- ∀x ∀y ∃z ∀q (q ∈ z ⇔ (q = x ∨ q = y)).
There's a general rule of thumb that any expression with more than three quantifiers is incomprehensible to human beings, but the way to interpret this one is that if you give me x and y, I can find z = { x, y }. The reason I know that z = { x, y } is that for any q, either (a) q = x and q is in z, (b) q = y and q is in z, or (c) q is not in z. This means that z contains only x and y and nothing else, as we wanted.
Note that pairing two identical objects yields a one-element set: { x, x } = { x }. This follows from extensionality: sets only know whether they contain something or not, and there is no notion of containing the same element more than once, or of containing multiple copies of an element. It also follows from extensionality that result of pairing two non-identical objects is an unordered pair: { x, y } = { y, x }, since the set only knows that it contains both x and y, but not which comes first. To encode an ordered pair (x, y) with a first element x and a second element y (which may even be equal to each other), the usual trick is to encode the ordering with some additional structure, e.g. (x, y) = { { x }, { x, y } }.
Once we have pairing, we can apply it repeatedly to generate an infinite collection of sets starting with { }: e.g. { { } , { } } = { { } }, { { }, { { } } }, { { { { { { } } } } } }, and so forth. But we can only generate sets with one or two elements. To get bigger sets, we need more powerful operations.
The union of two sets x and y is a set x ∪ y that contains every element that appears in either x or y: for example, { 1, 2, 3 } ∪ { 2, 4, 6 } = { 1, 2, 3, 4, 6 }. Union is to sets what OR is to predicates: we could define x ∪ y as the set z for which ∀q (q ∈ z ⇔ (q ∈ x ∨ q ∈ y). But the formal definition is a little more complicated. The reason is that sometimes we like to be able to take the union of more than two sets at once. Suppose we have a set S = { A₁, A₂, A₃, ... }; then the union of S, written ∪S, contains every object that appears in at least one of S's elements. This more general notion of union includes the pairwise version because of the Axiom of Pairing: we can always rewrite x ∪ y as ∪{x,y}. To assert that unions always exist, we have the
- Axiom of Union
- ∀x ∃y ∀z (z ∈ y ⇔ ∃q (z ∈ q ∧ q ∈ x)).
In reading the statement of the axiom, you should think of x as the set we are taking a union over, y as the union ∪x, z as some element that might or might not be in y, and q as an element of x that contains z and thus puts z in the union. It may help to think of this general union operation in terms of family trees: if the members of x are x's children, then the members of y = ∪x are x's grandchildren (and q stands in for the child that connects x and a particular grandchild z).
(Exercise: show that our first characterization of x ∪ y is equivalent to the more general definition given the definition x ∪ y = ∪{x,y}.)
Between pairing and union, we can build sets of arbitrary (though finite) size: just build a tree of pairings, and then take unions to flatten them out. For example, { 1, 2, 3, 4 } = { 1, 2 } ∪ { 3, 4 }.
3. Subsets and power sets
A set x is a subset of a set y, written x ⊆ y, if every element of x is also an element of y. Formally,
- x ⊆ y ≡∀z (z ∈ x ⇒ z ∈ y).
- Warning
- Don't confuse ∈ ("is in", "is an element of", or "is a member of") with ⊆ ("is a subset of"). It's possible for a set A to be a subset of B without being an element of B. For example, { 0 } is a subset of { 0, 1 } but it's not an element of { 0, 1 }. It is, however, an element of { { 0 }, 1 }. On the other hand, { 0 } is not a subset of of { { 0 }, 1 }, because { 0 } has an element 0 that isn't one of the two elements { 0 } and 1 of { { 0 }, 1 }. Nested braces matter.
As seen in the definition, the predicate-logic analogue of x ⊆ y is implication: membership in x implies membership in y.
Given any set x, there is a set ℘(x), the power set of x, that has as its elements precisely all the subsets of x. (The funky symbol here is intended to be a cursive P; you can write P(x) if you can't do a cursive P for some reason). The existence of the power set is asserted by the
- Axiom of Power Set
- ∀x ∃y ∀z (z ⊆ x ⇔ z ∈ y).
The power set of a finite set with n elements will have 2n elements: for example, ℘ { 1, 2 } = { { }, { 1 }, { 2 }, { 1, 2 } }. The power set of the empty set is not empty, as it contains the empty set itself: ℘ { } = { { } }.
4. Set-builder notation
So far we have seen lots of ways to make big sets by combining small sets. There are also operations on sets for extracting particular subsets, or for transforming a set into a new set by mapping over its elements. These operations are written using set-builder notation, also called set comprehension. This is also the point at which naive set theory parts company with axiomatic set theory, and gets into trouble because of it.
We can construct a subset of a given set that contains only those elements that satisfy a given predicate. The notation for this is { x ∈ y | Px }, and means the set of elements of y that satisfy P. The existence of this subset is asserted by the
- Axiom Schema of Specification
- ∀x ∃y ∀z (z ∈ y ⇔ (z ∈ x ∧ Px)).
This is an "axiom schema" instead of an axiom, because it defines an infinite family of axioms, one for each choice of P.
Naive set theory adopts the simpler approach of getting rid of the parent set x, and allowing you to define a set { x | Px } to consist of all objects that satisfy a given predicate, e.g. { x | x is prime }. This is a safe operation provided that there is some set that contains all the values of x for which Px is true, or if there is some default "universe" (again a set) that the set comprehension is implicitly restricted to. But you can get into trouble if you try to build sets that are too big: Russell's_paradox constructs the set S = { x | x ∉ x }, and asks if S is an element of S (either answer yields a contradiction). One of the reasons why ZFC is so careful not to allow you to define sets in arbitrary ways is to avoid precisely this paradox, which arises as soon as you can construct a set of all sets. Such a set of all sets does not exist in standard set theory, although there are variants that allow something like it, while avoiding Russell's Paradox using other restrictions. To avoid this problem, it's best to use the notation { x | Px } only if x is restricted to values that you know are elements of some set, e.g. integers or baseball players (when suitably encoded as sets).
Specification allows for some other specialized set operations. For example, the intersection x ∩ y of two sets x and y consists of all elements that appear in both x and y; it is to sets what AND is to predicates. It can be defined formally as { z ∈ x ∪ y | z ∈ x ∧ z ∈ y }; and like union there is a generalization ∩x which consists of all elements that appear in every element of x (definition left to the reader). A similar operation is set difference, where x - y (sometimes written x \ y) is defined as { z ∈ x | z ∩ y }. Given a particular universe U, the complement of a set x is U - x; this acts like negation. Note that the complement is not defined except in the context of some universe.
5. Proving things about sets
Before throwing in any more axioms, let's take a brief break to talk about strategies for proving statements about sets:
5.1. To prove that x is an element of A
Use the definition of membership in A and show that x satisfies it. Examples:
- Theorem
- 2 ∈ { x | ∃y x = 2y }.
- Proof
- Let y = 1, then x = 2y.
- Theorem
4 ∈ ({ x | ∃y x = 2y } ∩ { x | x > 1 }).
- Proof
From the definition of intersection, x is in ({ x | ∃y x = 2y } ∩ { x | x > 1 }) if and only if x is in both sets. In the case x = 4 we can show (a) 4 ∈ { x | ∃y x = 2y } because 4 = 2*2, and (b) 4 ∈ { x | x > 1 } because 4 > 1.
5.2. To prove A is a subset of B
Use the definition: A ⊆ B if and only if ∀x x∈A ⇒ x∈B. So pick some x and show that the implication holds, either directly or by showing every x is either not in A or in B.
- Theorem
- ∀A ∅⊆A.
- Proof
- Fix A. Expanding the definition gives ∀x x∈∅ ⇒ x∈A, which we will now prove. Choose some x. By definition of ∅, x∉∅. So x∈∅ is false and x∈∅ ⇒ x∈A is true.
Here's a somewhat more abbreviated proof that leaves out all the boilerplate stuff like "expand the defintiion of ..." and "fix x." This is closer to what a typical published mathematical proof might look like; the assumption is that the reader can fill in the details.
- Theorem
- { x | ∃y x = 4y } ⊆ { x | ∃y x = 2y }
- Proof
- Pick x and let y be such that x = 4y. Then x = 2(2y).
Here's a proof of a theorem about powersets. Keep an eye out for words like "fix" and "choose" that indicate we are diving inside universal quantifiers.
- Theorem
- ∀A∀B A ⊆ B ⇒ ℘A ⊆ ℘B.
- Proof
- Fix A and B. Recall ℘A = { C | C ⊆ A }. We'll show that any C in ℘A is also in ℘B. To do so, choose some C in ℘A. For any x∈C, x∈A and thus x∈B (since A ⊆ B). It follows that C ⊆ B. Since this holds for all C in ℘A, we have ℘A ⊆ ℘B.
5.3. To prove that A = B
This can be done directly by showing ∀x x∈A ⇔ x∈B and applying Extensionality, but it is often done by showing both A ⊆ B and B ⊆ A.
- Theorem
{ x ∈ N | ∃y x = 2y ∧ x < 1 } = { 0 }.
- Proof
Call the set on the left-hand side A and the one on the right-hand side B. First we'll show A ⊆ B. Let x ∈ N. Then either (a) x = 0 and thus x ∈ B, or (b) x > 0 and x ∉ A. In either case we have x ∈ A ⇒ x ∈ B and so A ⊆ B. Now we'll show B ⊆ A; observe that x ∈ B implies x = 0 (it's the only element), and we have both 0 = 2*0 and 0 < 1. So x ∈ B ⇒ x ∈ A and thus B ⊆ A.
6. Infinite sets
We now continue with the axioms. We need one more axiom to get all the usual sets we know and love; this is the axiom of infinity. Before presenting the axiom (which is a little hairy), it may help to talk a bit about how to define the natural numbers 0, 1, 2, ... in set theory.
Zero is easy: we'll represent 0 as ∅ (i.e., { }). For larger numbers n, we'll represent n as the n-element set { 0, 1, 2, 3, ..., n-1 }. Note that the elements are defined according to the same rule, so that we really have
- 0 = { }
- 1 = { 0 } = { { } }
- 2 = { 0, 1 } = { { }, { { } } }
- 3 = { 0, 1, 2 } = { { }, { { } }, { { }, { { } } } }
and so forth. The nice thing about this representation is that many simple operations on numbers translate into simple operations on sets: n+1 = n ∪ { n }, n-1 = ∪n, and n < m if and only if n ∈ m.
Now that we can define individual natural numbers, what about N, the set of all natural numbers? With the axioms we have so far, we can't prove that N exists. The problem is that N has infinitely many members: but all of our current set operations produce only finite sets (i.e., those with a number of elements equal to some natural number) starting from finite sets. (Exercise: check this.) So we'll add another axiom, which says that either N or some superset of N exists:
- Axiom of Infinity
- ∃x (∅ ∈ x ∧ ∀y (y ∈ x ⇒ y ∪ { y } ∈ x)).
Using the representation of natural numbers we just defined, the axiom of infinity says that there exists some set x that (a) contains 0, and (b) contains n+1 whenever it contains n. This implies that x contains at least all of the natural numbers. It may also contain other sets that don't represent natural numbers, but we can prune these out using Specification to get N alone.
Of course, the other axioms still apply to our new infinite set, so we get a vast collection of other infinite sets as well. Some of these include such favorites as Z, the set of integers, which can be represented as { (x, y) | x ∈ N ∧ y ∈ N ∧ (x = 0 ∨ y = 0) }, with the interpretation that (x, 0) represents +x and (0, x) represents -x; Q, the set of rational numbers, which can be represented by ordered pairs of integers; and R, the set of real numbers, where each x in R is represented by the set of rationals that are less than or equal to x. Other structures can easily be built out of sets and ordered pairs (which are of course themselves just a particular kind of set).
7. Cartesian products, relations, and functions
Given sets a and b, their Cartesian product a × b is the set { (x,y) | x ∈ a ∧ y ∈ b }, or in other words the set of all ordered pairs that can be constructed by taking the first element from a and the second from b. If a has n elements and b has m, then a × b has nm elements. For example, { 1, 2 } × { 3, 4 } = { (1,3), (1,4), (2,3), (2,4) }.
The existence of the Cartesian product of any two sets can be proved using the axioms we already have: if (x,y) is defined as { {x}, {x,y} }, then ℘(a ∪ b) contains all the necessary sets {x} and {x,y}, and ℘℘(a ∪ b) contains all the pairs { {x}, {x,y} }. It also contains a lot of other sets we don't want, but we can get rid of them using Specification.
A subset of the Cartesian product of two sets is called a relation. An example would be the < relation on the natural numbers: { (0, 1), (0, 2), ...; (1, 2), (1, 3), ...; (2, 3), ... }. Just as sets can act like predicates of one argument (where Px corresponds to x ∈ P), relations act like predicates of two arguments. Relations are often written between their arguments, so xRy is shorthand for (x,y) ∈ R.
A special class of relations are functions. A function from a domain A to a codomain (or range) B is a relation on A and B (i.e., a subset of A × B) such that every element of A appears on the left-hand side of exactly one ordered pair. We write f: A⇒B as a short way of saying that f is a function from A to B, and for each x ∈ A write f(x) for the unique y ∈ B with (x,y) ∈ f. (Technically, knowing f alone does not tell you what the codomain is, since some elements of B may not show up at all; this can be fixed by representing a function as a pair (f,B), but it's generally not useful unless you are doing CategoryTheory.) Most of the time a function is specified by giving a rule for computing f(x), e.g. f(x) = x2.
A function f:A⇒B that covers every element of B is called onto, surjective, or a surjection. If it maps distinct elements of A to distinct elements of B (i.e., if x≠y implies f(x)≠f(y)), it is called one-to-one, injective, or an injection. A function that is both surjective and injective is called a one-to-one correspondence, bijective, or a bijection. (The terms onto, one-to-one, and bijection are probably the most common, although injective and surjective are often used as well, as they avoid the confusion between one-to-one and one-to-one correspondence.) Any bijection f has an inverse f-1; this is the function { (y,x) | (x,y) ∈ f }. Two functions f:A->B and g:B->C can be composed to give a composition g∘f; g∘f is a function from A to C defined by (g∘f)(x) = g(f(x)).
8. Other axioms
The six axioms and one axiom schema given so far are 70% of ZFC. They are enough to construct all the standard sets used in mathematics. Two additional axioms and one additional axiom scheme are include in ZFC, in order to exclude certain ill-behaved sets that might otherwise exist, and include certain sets whose existence makes life more convenient for mathematicians. If you are a computer scientist, you will most likely never need to use these axioms unless you are doing serious theoretical mathematics, as pretty much everything that is directly relevant to computer science fits comfortably inside N and its close relatives.
- Axiom of Foundation
- ∀x (x = ∅ ∨ ∃y (y ∈ x ∧ x ∩ y = ∅).
This rather technical axiom says that any nonempty set x has an element y that has no element in common with x. The reason it is included is that it implies (though the proof requires a bit of work) that you cannot have an "infinite descending chain"—a sequence of sets x₁, x₂, x₃, ..., such that xi+1 ∈ xi for each i. This means that though sets may be very wide, in the sense of containing a lot of elements, they cannot be nested too deeply; each occurrence of the empty set is wrapped in only finitely many layers of sets. It is worth noting that this is the only axiom whose purpose is showing that certain bad sets do not exist.
- Axiom Schema of Replacement
- ∀x (∀y ∀z₁ ∀z₂ (y ∈ x ∧ P(y,z₁) ∧ P(y,z₂) ⇒ z₁ = z₂) ⇒ ∃q ∀z (z ∈ q ⇔ ∃y (y ∈ x ∧ P(y,z))).
Like Specification, Replacement defines an infinite collection of axioms, one for each predicate P.
This rather hideous-looking axiom schema actually says something quite simple. Suppose that you have a two-argument predicate P that acts like a function when applied to elements y of x: if there is any z that makes P(y,z) true, then there is only one of them. For each y, let f(y) be this unique z. Then Replacement lets you construct { f(y) | y ∈ x }.
Most of the time you do not need Replacement, because if the f(y) all fit inside some set that you already have (e.g. f(y) is always a natural number), then you can use Selection instead: { z ∈ N | ∃y (y ∈ x ∧ P(y,z) }. Where Replacement is needed is to construct certain large infinite sets whose existence is not guaranteed otherwise.
Finally, we have one final axiom, which is generally accepted by mathematicians because it's convenient but is in a very real sense optional, in that the rest of the axioms are consistent with either it or its negation. It says that for any collection of sets, there is a function that picks out one element of each set in the collection.
- Axiom of Choice
- ∀x ∃f (f:x→∪x ∧ ∀y y ∈ x ⇒ f(y) ∈ y).
Another way of stating this is that given any collection x of sets, we can construct a new set by choosing one element from each set. This follows easily from Power Set and Specification for small enough sets, but for infinite sets it can cause odd results, and it is in fact impossible to prove that such a choice set always exists using only the other axioms. The most famous bizarre consequence of Choice is the Banach-Tarski_paradox, which lets you chop a sphere into five pieces and then reassemble the pieces—without any gaps or overlap—into two spheres identical to the first. The fact that most mathematicians use Choice despite such weird outcomes is a sign of just how useful the axiom can be in other contexts.
9. Further reading
See RosenBook §§2.1–2.2, BiggsBook Chapter 2, or Naive_set_theory.
PineWiki