Lecture 13

2023-05-15 | Week 7 | edited by Matt Wang

(originally written 2022-11-07 by Matt Wang, Siddarth Krishnamoorthy, Carey Nachenberg)

Matt here! This lecture covers polymorphism (Function Palooza, slides 110 onwards) and the first bit of OOP Palooza - up to slide . It duplicates some of the content from Lecture 12 (the beginning of the polymorphism slides).

Table of Contents

Polymorphism

Polymorphism is the process of defining a function or class that can operate on many different types of values. Our goal is to express algorithms in their most abstract sense - making our code more interoperable and generalizable. While this might sound complicated, you’ve already used many polymorphism features in your CS journey - and in this class!

One common example is a vector. A vector<int>, vector<str>, and any other vector share the same operations - how can we reduce code duplication (making a VectorInt, VectorStr, …) without taking a huge performance hit? By the end of this class, you’ll be able to answer that quesiton!

We’ll explore a few polymorphism approaches, focusing on ad-hoc and parametric polymorphism in statically-typed languages. We’ll then talk about subtype polymorphism in OOP palooza!

Ad-hoc Polymorphism

Ad-hoc polymorphism is when we (as the programmer) define specialized versions of a function for each type of object we wish it to support. The language decides which version of the function to call based on the types of the arguments.

You’ve likely seen this with operator overloading:

bool greater(Dog a, Dog b) {
  return a.bark() > b.bark();
}
bool greater(Student a, Student b) {
  return a.gpa() > b.gpa();
}
int main() {
  Dog spot, penny;
  if (greater(spot, penny))
    cout << "Spot wins!\n";

  Student carey, david;
  if (greater(carey, david))
    cout << "Carey wins!\n";
}

We call it “ad-hoc” since there’s no formal structure in the language to support it.

Ad-hoc polymorphism isn’t possible in dynamically typed languages. This is because we usually don’t specify types for formal parameters in dynamic languages, so there’s no way to define multiple versions of a function with different parameter types. Instead, such languages can specialise behaviour using type reflection.

Parametric Polymorphism

With parametric polymorphism, we define a single, parameterized version of a class or function that can operate on many, potentially unrelated types. Parameteric polymorphism is implemented in two major ways, templates or generics. The syntax for the two may look similar, but they’re implemented in entirely different ways - with big implications!

People often colloquially use “generics” to cover the entire field of parametric polymorphism, and call templates “compile-time generics” and generics “run-time generics”. For this class, we’ll use the templates vs generics naming convention.

Templates

Templates do almost all of the work at compile-time. In languages that use templates (like C++ or Rust), the compiler will see all the types that are used in each template. For each invocation, the compiler will do some type-checking. Then, it will generate a concrete version of the function/class by substituting in the type parameter. Finally, it’ll continue compiling the code “as normal”. After template substitution, the code is “normal” C++ - there’s nothing special about it!

Here’s one such example:

template <typename T>
void takeANap(T &x) {
  x.sleep();
}
class Dog {
  void sleep() { /* ... */ }
};

class Person {
  void sleep() { /* ... */ }
};

int main() {
  Dog puppers;
  takeANap(puppers); // OK!

  Person carey;
  takeANap(carey); // OK!

  string val;
  takeANap(val); // error: no member named 'sleep' in 'string'
}

In the above example, since takeANap uses the sleep() method internally, only types that support sleep() can be used. If you try to use a type that doesn’t support sleep(), you’ll get a compile time error.

When you create a new template, e.g., vector, does the compiler ensure the templated code is type-safe? If so, how?

Answer Yes - since the compiler basically generates a concrete version of the function/class with the specified type, and the compiles it as it would any other class, type safety is guaranteed!

Is templated code more or less run-time efficient than an equivalent function that doesn’t use templates, but otherwise has the same logic? Why?

Answer Both implementations have the same runtime efficiency since a custom version of the function/class is generated for each distinct type, and it can be optimized just as if you wrote a dedicated function for that type.

C-style macros are sort of the OG templates. The pre-processor would basically do a textual search-and-replace of the arguments. But, the compiler doesn’t generate a new function for each parameterized type.

// C macros used to implement template-like functionality

#define swap(T,a,b) { T temp; temp = a; a = b; b = temp; }

int main() {
 int p = 5, q = 6;
 swap(int, p, q);

 std::string s = "foo", t = "bar";
 swap(std::string, s, t);
}

In Eggert’s 131, we would have learned quite a bit more about macros (especially in the context of Scheme). If you’re interested in macros, take a look at the concept of Hygienic macros, which demonstrate why C macros are not the exact same as templating.

Generics

Generics (or “runtime generics”) take a different approach. Instead of generating a new function for each parametrized type, we compile just one “version” of the generic function or class - independent of the types that actually use our generics.

Because of this, the generic can’t make assumptions about what types it might be used it! So, you can only do “generic operations” - hence the name.

Here’s an example with C#; note how similar it looks to templates:

// C# generic container

class HoldItems<T> {
  public void add(T val)
    { arr_[size_++] = val; }
  public T get(int j)
    { return arr_[j]; }
 public void beADuck(int j)
    { arr_[j].quack(); } // ILLEGAL!!

  T[] arr_ = new T[10];
  int size_ = 0;
}

HoldItems<Duck> ducky =
          new HoldItems<Duck>();
ducky.add(new Duck("Daffy"));
Duck d = ducky.get(0);

Calling quack() on the generic type would be illegal, even though we only ever use HoldItems with Duck (which implicitly has a .quack() method).

Later, code that uses the generic is then checked to make sure that the generic’s interfaces are used correctly.

Alone, this seems useless - there are very few things you can do on all types. That’s why we can bound types: adding restrictions on what types are allowed.

Here is an example of bounding in C#:

interface DuckLike {
  public void quack();
  public void swim();
}

class HoldItems<T> where T: DuckLike {
  public void add(T val)
    { arr_[size_++] = val; }
  public T get(int j)
    { return arr_[j]; }
 public void beADuck(int j)
    { arr_[j].quack(); } // LEGAL!!

  T[] arr_ = new T[10];
  int size_ = 0;
}

Here’s another example in Haskell!

qsort :: (Ord t) => [t] -> [t]
qsort [] = []
qsort (x:xs) =
  let leq = qsort [j | j <- xs, j <= x]
      geq = qsort [k | K <- xs, k > x]
  in leq ++ [x] ++ geq

Now we finally know what the Eq, Ord, etc. type classes mean - they place bounds on what types the t type variable can take on.

Why Parametric Polymorphism?

This all seems unnecessarily complicated. Why do we need this? Couldn’t we just use inheritance?

Let’s take a look at an instance where just inheritance fails us. For context, languages like C# and Java have an Object class that superclasses all other classes. Let’s assume we can do polymorphism just with subclassing:

class Duck : Object {
  public void quack() {
    Console.WriteLine("Quack");
  }
  public void swim() {
    Console.WriteLine("Paddle");
  }
}

class HoldItems {
  public void add(Object val)
    { arr_[size_++] = val; }
  public Object get(int j)
    { return arr_[j]; }

  Object[] arr_ = new Object[10];
  int size_ = 0;
}

Any object can be stored in HoldItems! So, we could add both a string and a Duck to a container, since both subclass Object. So, this code would be possible…

HoldItems items = new HoldItems();
items.add(new Duck("Daffy"));
items.add("more stuff");

string s = (string)items.get(0); // Should be illegal!

Which is really bad! And, there’s no way for us (or C#, at compile-time) to detect a bug! So, we can’t allow behaviour like this.

With generics, we tell the compiler which types our generics will use:

HoldItems<Dog> items = new HoldItems();
items.add(new Duck("Daffy")); // type error!!

Parametric Polymorphism in Dynamically Typed Languages

We can’t really have “type-parameterized” classes or functions, since variables don’t specify types!

But, duck typing occupies the same feature space. Here is an example in Python:

def qsort(lst):
  if len(lst) <= 1:     # base case
      return lst

  pivot = lst[0]
  rest  = lst[1:]
  lessEq =  [x for x in rest if x <= pivot]
  greater = [x for x in rest if x >  pivot]
  return qsort(lessEq) + [pivot] + qsort(greater)

strs = ["b","c","a"]
ints = [1,5,3,2,4]

qsort(strs)    # ["a","b","c"]
qsort(ints)    # [1,2,3,4,5]

But, note that this isn’t as safe - we hvae no idea at compile-time if it’ll work!

Classify That Langauge

Consider the following parameterized class and some code that uses it below.

class StudyList<T> {
    private values: T[] = [];

    public constructor (values: T[]) {
        this.values = values;
    }

    public add(value: T): void {
        this.values.push(value);
    }

    public get(index: number): T {
        return this.values[index];
    }

    public studySession(): void {
        for (var v in this.values)
            v.study();
    }
}
...
// The Nerd class has a study() method.
const nerds = new StudyList<Nerd>([]);

nerds.add(new Nerd("Carey"));
nerds.add(new Nerd("Paul"));
nerds.studySession();

The code results in a compiler error

TypeError: v.study is not a function

Is the languages using templates or generics?

Answer Generics! This is because even though the `Nerd` class has a `study()` method, the code does not compile. If this were a template, a version of `StudyList` would be generated with `T=Nerd` and this should work just fine. This is TypeScript!

Object-Oriented Programming

Object-Oriented Programming is a programming paradigm based on the concept of objects. While classes are often used, they’re not required (like in JS!). The core idea is that objects talk to each other with “messages” - a fancy word for method calls.

A Brief History

We covered a brief history of OOP languages in class, including:

  • Sketchpad: Developed by Ivan Sutherland, it was an oscilloscope-based CAD program from 1963. It pioneered concepts like classes, objects, inheritance, and late binding/polymorphism. It wasn’t a language but a CAD program, but it embodied many OOP concepts.
  • The record: The notion of a record or struct was developed by Tony Hoare in 1966. He noticed that many things in the real world can be modeled as objects with a set of related fields, and that objects of the same category or type (e.g., Dogs) often had similar sets of fields, defining classes of objects.
  • Simula: Between 1962-1966, Ole-Johan Dahl and Kristen Nygaard, two scientists in Norway developed the Simula language to make it easier to build military simulations. One big leap with Simula was that it defined objects as having both data fields (like Hoare) and functions/methods to operate on that data. Simula pioneered the following items:
    • Classes have both functions and data
    • Objects can be instantiated from Classes
    • Use of the dot notation for object member access (e.g., doggo.bark())
    • Inheritance
    • Virtual functions/procedures to facilitate polymorphism/dynamic binding
    • Dynamic binding (we’ll talk about this later)

Take a look at some Simula classes to see how close they look to modern C++, Java, etc.:

CLASS POINT(X,Y); REAL X, Y;
  COMMENT***CARTESIAN REPRESENTATION
BEGIN
  BOOLEAN PROCEDURE EQUALS(P); REF(POINT) P;
    IF P =/= NONE THEN
      EQUALS := ABS(X-P.X) + ABS(Y-P.Y) < 0.00001;
  REAL PROCEDURE DISTANCE(P); REF(POINT) P;
    IF P == NONE THEN ERROR ELSE
      DISTANCE := SQRT( (X-P.X)**2 + (Y-P.Y)**2 );
END***POINT***
POINT CLASS COLOREDPOINT(C); COLOR C;
BEGIN
  BOOLEAN PROCEDURE EQUALS(Q); REF(COLOREDPOINT) Q;
    ...;
END***COLOREDPOINT**
REF(POINT) P; REF(COLOREDPOINT) CP;
P :- NEW POINT(1.0,2.5);
CP :- NEW COLOREDPOINT(2.5,1.0,RED);
CP.C := BLUE;
  • Smalltalk: Meanwhile, Alan Kay (who is a UCLA emeritus prof!) was at Xerox PARC developing the DynaBook - a precursor to the modern laptop. Alan wanted to design a language that could be used to implement an OS, apps, everything. Inspired by his own background in math and biology, SketchPad, Hoare’s records, and Simula, he came up with a broad conception of OOP, and developed SmallTalk. While most languages have adopted Simula-like syntax, Kay’s ideas about OOP have continued to shape the field, and Kay is credited with coining the term OOP. In Smalltalk:
    • Everything is an object
    • Objects send messages to other objects to request behaviors (like cells in an organism, or nodes on the arpanet - the precursor to the internet)
    • Each object decides how/if to respond/behave when it receives a message
    • Objects may pass messages to other objects to get their work done
    • Methods may return objects as their result to a sender

Here’s an example smalltalk program below.

"Snippet of Smalltalk code that sings '99 bottles of beer'"
99 to: 1
   by: -1
   do: [:j | Transcript show: (j printString),' bottles of beer on the wall ';cr].

Transcript show: 'No more bottles of beer on the wall!;cr.

This code implements a loop by sending the object 99 (EVERYTHING is an object in Smalltalk) a message with three parameters: (1) to, (2) by, and (3) do. The first parameter tells the 99 object what it should iterate to. The second parameter tells 99 how much to increment during each iteration, and the final parameter is a block of code to run during each iteration.

Think of this code as being:

Number n(99);
n.to_by_do(1,-1,lambda to print stuff)

To briefly conclude our history, in the 70s and 80s, Bjarne Stroustrup created C With Classes, later known as C++, by marrying ideas from Simula with C. And then Java, C#, Scala, Kotlin and Swift followed in the 2000s and 2010s.

Essential Components of OOP

So what are the essential components of OOP?

  • Classes: A class is a blueprint for creating new objects – it defines a public interface, code for methods, and data fields.
  • Interfaces: An interface is a related group of function prototypes that we want one or more classes to implement.
  • Objects: An object represents a particular “thing” - like a circle of radius 5 at location (1,3). Each object has its own interface, code, and field values.
  • Inheritance: A derived class inherits either the code, the interface, or both from a base class. The derived class can override the base class’s code or add to its interface.
  • Subtype Polymorphism: Code designed to operate on an object of type T (e.g., Person) can operate on any object that is a subtype of T (e.g., Student).
  • Dynamic Dispatch: The actual code that runs when a method is called depends on the target object, and can only be determined at runtime.
  • Encapsulation: Encapsulation is the bundling of a public interface and private data fields/code together into a cohesive unit.

Encapsulation

Encapsulation is the guiding principle behind the OOP paradigm. There are two facets to encapsulation:

  • We bundle related public interface, data, and code together into a cohesive, problem-solving machine.
  • We hide data/implementation details of that machine from its clients – forcing them to use its public interface.

Encapsulation has many benefits:

  • Simpler programs: We have reduced coupling between modules, simplifying our programs
  • Easier Improvements: We can improve implementations without impacting other components
  • Better Modularity: We can build a class once and use it over and over in different contexts

A big part of software engineering is anticipating future changes. One reason for OOP’s success is that it helps people build software that is resilient to many types of change.

Classes, Interfaces, Objects

  • Class: A class is a blueprint that specifies a public interface, code, and fields that make up a type of object.
  • Interface: An interface is a related group of function prototypes that describes behaviors that we want one or more classes to implement.
  • Object: An object is a distinct value/entity, often created from a class blueprint – each object has its own logical copy of data and methods.