Surviving Advanced Calculus

Welcome to your first undergraduate course in Advanced Calculus/Real Analysis! Advanced Calculus is the first of (hopefully) many math classes where you get to do “real math”, possibly for the first time in your life. Most students (myself included) don’t have a great grasp on day one of what they’ll be expected to do in advanced calculus, even if they’ve had an intro-to-proofs or discrete math/set theory course.

While proof-based math is objectively difficult, advanced calc offers a lifeline in the form of structure. You can roughly think of everything you’ll see as an element of one of five clusters:

1. Definitions
2. Elementary properties
3. Powerful theorems
4. Counter examples
5. Corollaries and interesting variations on 3. and 4.

This article goes through these five main clusters and takes you on a short tour of what to expect and offers some guidance I wish I had when I started my mathematical adventures.

Definitions

Definitions are either the easiest or the most difficult aspect of the course. This is an inherently technical discipline (surprise!), so now is a good time to start learning how to communicate clearly and concisely (but I digress). Definitions are the foundation of math. We all have to agree on what a word means - and we’ve sculpted these definitions quite precisely over the last 50-400 years.

Definitions capture an idea and assign it inherent significance; a definition is more than a term used to shorten a statement. Lots and lots of things get named, but only the most important, absolutely fundamental definitions get reused for centuries. Don’t focus initially on memorizing the specific sequence of quantifiers (\(\forall\) and \(\exists\)); learn the idea that the definition captures (it helps to draw a picture). The syntax will follow quickly.

Many students struggle initially with uniform continuity (a function with this property is called uniformly continuous). The idea is that if two points \(a\) and \(b\) in the function’s domain are sufficiently close, then \(f(a)\) and \(f(b)\) should also be sufficiently close, regardless of where in the domain \(a\) and \(b\) sit. Go ahead and re-read that until you understand it. This property allows you to say things like “if the function is uniformly continuous, it doesn’t have a vertical asymptote”. Knowing this, it’s much simpler to look at the formal definition and start to take it apart.

Definition: uniform continuity

A function \(f: A \to B\) is uniformly continuous if

\begin{equation} \forall \epsilon > 0 \exists \delta > 0 : \forall x, y \in A: |y-x| < \delta \implies |f(y) - f(x)| < \epsilon \end{equation}

There’s a classic “mathematical statement” that can send people running towards their advisor’s office. But we already have an intuitive version of the statement: if two points \(a\) and \(b\) are close together, their images (\(f(a)\) and \(f(b)\)) must be close together. In the mathematical syntax, we start with \( \forall \epsilon > 0 \exists \delta > 0 \); at this point we’re only naming two positive numbers which will satisfy some later property. So far, so good. Next, we have \( \forall x, y \in A: \vert y-x\vert < \delta \): the first part of our condition is that any two points are close (within a \( \delta \) tolerance). The second part of our condition is \( \vert f(y) - f(x)\vert < \epsilon \), which is the “\(f(a)\) and \(f(b)\) are sufficiently close” part. How close? The tolerance \(\epsilon\) is arbitrary - you can choose any \(\epsilon > 0\), and there is always a \(\delta\) (depending on \(\epsilon\), but not on the points \(x\) and \(y\)) for which the pair of values describes “close enough”.
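As a small worked example of the definition, here is a sketch of how a single \(\delta\) can serve every pair of points at once. The function \(f(x) = 3x\) and the domain \(A = \mathbb{R}\) are my own choices for illustration; nothing in the definition singles them out.

```latex
\begin{equation}
\begin{aligned}
&\text{Let } f(x) = 3x \text{ on } A = \mathbb{R}, \text{ fix } \epsilon > 0, \text{ and choose } \delta = \epsilon / 3. \\
&\text{If } |y - x| < \delta, \text{ then }
|f(y) - f(x)| = |3y - 3x| = 3\,|y - x| < 3\delta = \epsilon.
\end{aligned}
\end{equation}
```

The thing to notice is that \(\delta = \epsilon/3\) was chosen before \(x\) and \(y\) were mentioned, so the same \(\delta\) works everywhere in the domain.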

Elementary properties

These are the foundations of the proofs you’ll be writing. If you stare at them long enough, they look obvious, which makes proving them all that much more difficult. These tend to be properties of the construct (like the integral) or an underlying structure (like the natural numbers).

A deceptively simple example is the Archimedean property of the real numbers: for any two positive real numbers \(x\) and \(y\) there exists a natural number \(n\) such that \begin{equation} nx > y \end{equation}

This is rather short compared to our previous definition and therefore may appear simple. There are a few common proof strategies; my favorite uses the fact that there is no largest natural number. That fact is logically equivalent to the statement “for any natural number, there exists a larger natural number”. It’s rather elegant, and easy to remember.

It’s worth learning and understanding a proof of every elementary property you encounter - not just because it allows a later result to be true, but it offers you a simple, well understood, example to observe different proof methods and strategies.

But that leaves the question: how on earth do I prove something that seems so obviously true? Ahah! You’ve just answered your question - suppose that the obviously true statement is false, and apply the falsified (negated) statement until you reach a conclusion that is also obviously false. If this sounds loosely familiar, this strategy is called proof by contradiction, and it’s a very old method.
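To make the strategy concrete, here is one possible sketch of a proof by contradiction for the Archimedean property stated earlier. It assumes the least upper bound property of \(\mathbb{R}\) (every nonempty set of reals that is bounded above has a supremum), which your course may introduce under a different name; the symbols \(s\) and \(m\) are mine.

```latex
\begin{equation}
\begin{aligned}
&\text{Suppose not: there are } x, y > 0 \text{ with } nx \leq y \text{ for every } n \in \mathbb{N}. \\
&\text{Then } n \leq y/x \text{ for every } n, \text{ so } \mathbb{N} \text{ is bounded above and } s = \sup \mathbb{N} \text{ exists.} \\
&\text{Since } s - 1 < s, \text{ the number } s - 1 \text{ is not an upper bound: some } m \in \mathbb{N} \text{ has } m > s - 1. \\
&\text{But then } m + 1 \in \mathbb{N} \text{ and } m + 1 > s, \text{ contradicting } s = \sup \mathbb{N}.
\end{aligned}
\end{equation}
```

The last line is exactly where “for any natural number, there exists a larger natural number” does its work.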

Elementary properties aren’t usually the most interesting thing to prove, but they are absolutely fundamental. Calculus would look radically different if \(\mathbb{R}\) did not satisfy the Archimedean property.

Elementary properties come in a second flavor: properties that are deliberately true. What exactly does that mean? These are properties that a construct (like an integral) is specifically designed to have.

You may remember that the integral is additive over adjacent intervals (a property often lumped in with linearity), namely

\begin{equation} \int_{a}^b f(x) \mathrm{d} x + \int_{b}^c f(x) \mathrm{d} x = \int_{a}^c f(x) \mathrm{d} x \end{equation}

(where \( a \leq b \leq c \) and \(f\) is integrable.) Additivity and linearity are great examples of elementary properties that integrals are designed to have, even though they’re not usually part of the definition. (Linearity is sometimes part of the definition, see this math.SE post.)
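For comparison, the property that usually goes by the name linearity concerns sums and scalar multiples of the integrand, whereas the identity above concerns splitting the interval. A sketch, where the second function \(g\) and the constants \(\alpha, \beta \in \mathbb{R}\) are symbols I am introducing for illustration and both \(f\) and \(g\) are assumed integrable on \([a, b]\):

```latex
\begin{equation}
\int_{a}^{b} \bigl( \alpha f(x) + \beta g(x) \bigr) \, \mathrm{d} x
= \alpha \int_{a}^{b} f(x) \, \mathrm{d} x + \beta \int_{a}^{b} g(x) \, \mathrm{d} x
\end{equation}
```

Both identities are the kind of “deliberately true” property worth proving once from whatever definition of the integral your text uses.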

Powerful Theorems

Like elementary properties, most powerful theorems come in two general flavors: those which are simple and easy to state (“\(\sqrt{2}\) is not rational”) and those which hide complexity in definitions, such as “every function that is differentiable everywhere is continuous everywhere”. Let’s dive in.

Short theorems

Short theorems are deceptively hard to prove because you don’t have as much material to start with. The quantifiers used in a short theorem (\(\forall\) and \(\exists\)) give a lot away. With a “for all” (\(\forall\)), you are usually stuck working with properties.

The general direct proof of a \(\forall\) statement is to take a general object (map, function, topological space, sequence, ring, etc) that has the assumed properties, then directly apply properties until you arrive at the result. If it sounds like you need brute force, it’s because you often do: you manipulate properties and expansions until you arrive at the conclusion. The alternative, “there exists” (\(\exists\)) statements, can be proved by brute force, but are also valid targets for constructions.

A classic example of a construction, simplified from graduate analysis, is: prove that for any two distinct real numbers \(a\) and \(b\) there exists a continuous function \(f\) (defined for all real numbers) such that \(f(a)=0\) and \(f(b)=1\) and the image of \(f\) is contained in \([0,1]\). This is a stepping stone towards a Urysohn function from topology. The easiest way to prove that such a function exists is to give a method (“algorithm”) that always produces a function with this property, regardless of your choice of \(a\) and \(b\).

So how is this method different from a direct proof of a \(\forall\) statement? The bulk of our proof is actually a method to generate/find an object with the desired properties, which we have carefully constructed so that applying it to any input returns an output (function, sequence, etc) with the desired properties. With a “for all” proof, we manipulate the fundamental objects presented in the proof until they have the desired properties.
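For the two-point problem above, one possible construction (certainly not the only one, and my own choice rather than a canonical answer) is to scale the distance from \(a\) and cap it at \(1\):

```latex
\begin{equation}
f(x) = \min\left\{ 1, \; \frac{|x - a|}{|b - a|} \right\}
\end{equation}

\begin{equation}
\begin{aligned}
f(a) &= \min\{1, 0\} = 0, \\
f(b) &= \min\{1, 1\} = 1, \\
0 &\leq f(x) \leq 1 \quad \text{for all } x \in \mathbb{R}.
\end{aligned}
\end{equation}
```

The denominator is nonzero because \(a \neq b\), and \(f\) is continuous because it is the minimum of two continuous functions. The formula is the “algorithm”: hand it any distinct \(a\) and \(b\) and it returns a function with the required properties.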

Some definitions are simply shorthand for “the following list of properties are all satisfied”. Compactness is a prime example: on the real line, a set is compact if and only if it is closed and bounded (this is the Heine-Borel theorem). When presented with a theorem “Compact implies …”, you really get two properties (closed and bounded) to work with. These kinds of simplifications using definitions to hide important complexity offer additional tools for you to add to your mathematical toolbox.

Definition-Heavy Theorems

Unfortunately, not all theorems can yet be expressed in an elegant form, but they are important nonetheless. No field has a monopoly on long theorem statements, so you can’t hide from them. Fortunately, these offer two pieces of insight.

First, you’ve got a long list of properties to work with, and hopefully you’ve seen some of them together before. For example, when presented with a continuous smooth function and a bounded compact set, you may recognize that a continuous function and a compact set are the hypotheses for a set of theorems, all of which you get ‘for free’ as you have already satisfied the relevant hypotheses. Keeping a mental library of theorems and pattern matching hypotheses and conclusions can help greatly for a definition-heavy theorem.

Secondly, proof by contradiction takes a different flavor when presented with a list of true properties - the logical negation of A and B is not A or not B (the not operator means to negate the following clause). On initial inspection, this doesn’t seem to be earth-shaking power. However, if you negate A and B and C, you obtain not A or not B or not C. This gives three possible statements that contain exactly one negated assumption:

not A and B and C
A and not B and C
A and B and not C

With this (a recursive application of De Morgan’s laws) you can generate a litany of possible logical starting places for a proof by contradiction. This often helps when devising counter-examples and finding contradictions, because you have more theoretical ammunition.
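The recursive step can be written out explicitly. This is just De Morgan’s law for two clauses applied twice, with \(\land\), \(\lor\) and \(\neg\) standing for “and”, “or” and “not”:

```latex
\begin{equation}
\begin{aligned}
\neg (A \land B \land C)
&\iff \neg \bigl( (A \land B) \land C \bigr) \\
&\iff \neg (A \land B) \lor \neg C \\
&\iff \neg A \lor \neg B \lor \neg C
\end{aligned}
\end{equation}
```

The three statements listed above are the cases of the final line in which exactly one clause fails.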

Counter examples

There are a lot of counter examples that have been devised over the last several hundred years. I could write a book on these, as others have done. Fortunately, in Advanced Calculus there are a few that are more popular than others, so we’ll look at these to give you a sense of what’s out there. Counter examples are all about breaking things, while having some other nice properties.

Counter examples accomplish one of two goals: they show that a condition is needed (or unneeded) by demonstrating what happens when it is removed (or included), or they show that a set of hypotheses is insufficient to force a conclusion. Take a sample generic statement, “A and B imply C”. The first kind looks like “\(f\) is a function that satisfies A but not B, and C is false for \(f\)”. The second looks like “\(g\) is a function that satisfies A and B, but does not satisfy C”.

Dirichlet function

A classic, the Dirichlet function is defined as

\begin{equation} f(x) = \begin{cases} 1 & x \in \mathbb{Q} \\ 0 & x \notin \mathbb{Q} \end{cases} \end{equation}

The Dirichlet function is periodic (every nonzero rational number is a period), nowhere continuous, nowhere differentiable, and not Riemann integrable on any interval.

What can we do with this? It’s really good at breaking limits and providing counter examples for what can go wrong with sequences when a function isn’t continuous.
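Two short computations illustrate the claims above. The sequence \(x_n = \sqrt{2}/n\), the interval \([a, b]\) with \(a < b\), and the partition \(P = \{x_0 < x_1 < \dots < x_m\}\) are my own choices for illustration, and I am assuming the Darboux (upper and lower sum) formulation of the Riemann integral with sums written \(U(f, P)\) and \(L(f, P)\).

```latex
\begin{equation}
x_n = \frac{\sqrt{2}}{n} \to 0, \qquad
f(x_n) = 0 \text{ for every } n, \qquad
\lim_{n \to \infty} f(x_n) = 0 \neq 1 = f(0).
\end{equation}

\begin{equation}
\begin{aligned}
U(f, P) &= \sum_{i=1}^{m} 1 \cdot (x_i - x_{i-1}) = b - a, \\
L(f, P) &= \sum_{i=1}^{m} 0 \cdot (x_i - x_{i-1}) = 0.
\end{aligned}
\end{equation}
```

Every subinterval contains both rational and irrational points, so the supremum of \(f\) on it is \(1\) and the infimum is \(0\). The upper and lower sums never get closer than \(b - a\), no matter how fine the partition.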

Step functions and Point Discontinuities

Another classic piecewise function, the step function is defined as

\begin{equation} u_k(x) = \begin{cases} 1 & x \geq k \\ 0 & x < k \end{cases} \end{equation}

This particular variant is sometimes called the Heaviside step function; you can generate a different variant by changing what happens at the middle point:

\begin{equation} u_k(x) = \begin{cases} 1 & x > k \\ \frac{1}{2} & x = k \\ 0 & x < k \end{cases} \end{equation}

Both functions present additional limit-breaking behavior, but only at a single point (at \(k\)), unlike the Dirichlet function. We can generate a perhaps more interesting example:

\begin{equation} p_k(x) = \begin{cases} 0 & x \neq k \\ 1 & x = k \end{cases} \end{equation}

This function has left and right limits at \(k\) (both equal to \(0\)) yet is discontinuous at \(k\). Incidentally, it’s a function that attains a nonzero value yet has an integral of \(0\), and has zero derivative almost everywhere.
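Written out as one-sided limits, the difference between the two kinds of discontinuity looks like this (a sketch; the interval endpoints \(a < b\) in the last line are symbols I am adding):

```latex
\begin{equation}
\begin{aligned}
\lim_{x \to k^-} u_k(x) &= 0 \neq 1 = \lim_{x \to k^+} u_k(x), \\
\lim_{x \to k^-} p_k(x) &= \lim_{x \to k^+} p_k(x) = 0 \neq 1 = p_k(k), \\
\int_{a}^{b} p_k(x) \, \mathrm{d} x &= 0.
\end{aligned}
\end{equation}
```

For the step function the one-sided limits disagree, so no value at \(k\) could make it continuous. For \(p_k\) the limits agree and only the value at \(k\) is “wrong”.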

Variations and Corollaries

This is the final “catch-all” of interesting and noteworthy results. These generally fall into two distinct categories:

Armed with the proof, we might ask what happens if we strengthen or weaken an assumption (such as regularity, e.g. continuity to uniform continuity)? Can we subsequently strengthen (or weaken) the conclusion?

Often, a line of investigation is motivated by a specific problem with a specific set of assumptions. For example, can a result that relies on a convergent sequence be derived if it contains a convergent subsequence? Perhaps if every subsequence contains a convergent subsequence, but the limit is not unique? If a norm is replaced by one that admits an inner product, can we derive an exact relationship? These “what if” questions make for excellent homework problems to check your understanding of a concept. Once you leave the realm of classical mathematics and encounter novel problems, this approach to mathematics allows you to take one major result and obtain 6 additional, smaller results, in rapid succession.

Armed with a result, we might ask if there are any immediate consequences of adding to the set of assumptions? Where does this result fit in the hierarchy of other properties and results?

Perhaps the most well known example of this case is the Bolzano-Weierstrass Theorem (BWT). In some texts, BWT is stated as “every bounded sequence of real numbers contains a convergent subsequence”. Often a compact space is defined as a set of points for which every sequence contained in it has a subsequence converging to a point of the set. Working on the real line, it would appear that BWT implies that every closed bounded interval is a compact set. BWT can be proved in a strictly real analysis context without applying topological definitions or theorems. Yet we might wish to define a compact set using the topological definition: a set is compact if every open cover of the set has a finite subcover. Then, we can prove the equivalence of the topological definition and the analysis definition. Thinking back to my analysis courses, this approach is common. We prove BWT, introduce compact sets using the topological definition, prove a sequential characterization is equivalent to the topological definition, then note that if we apply the BWT, it follows that all closed intervals are compact.
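The chain of reasoning for a closed bounded interval \([a, b]\) can be sketched in three lines. The phrase “sequentially compact” is the standard name for the sequence-based definition (I am introducing it here as a label), and \((x_n)\) is an arbitrary sequence in \([a, b]\):

```latex
\begin{equation}
\begin{aligned}
&\text{(BWT)} && (x_n) \subset [a, b] \implies (x_n) \text{ is bounded} \implies \text{some subsequence } x_{n_j} \to x \in \mathbb{R}. \\
&\text{(closedness)} && a \leq x_{n_j} \leq b \text{ for all } j \implies a \leq x \leq b. \\
&\text{(equivalence)} && [a, b] \text{ sequentially compact} \iff [a, b] \text{ compact (open-cover definition)}.
\end{aligned}
\end{equation}
```

The second line is where closedness of the interval matters: on an open interval such as \((0, 1)\), the sequence \(1/n\) converges to a limit outside the set.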

Such complex chains of arguments become natural with practice, time, and reading/seeing these arguments unfold time and time again.