Skip to main content Link Search Menu Expand Document (external link)

Lecture 07

2022-10-17 | Week 4 | by Siddarth Krishnamoorthy

Hi! Siddarth here. This lecture note covers the first 41 slides of data palooza.

Table of Contents

Data Palooza

In the next few lectures, we will be covering the internals of how many languages manage data (including types, variables and values). We won’t be covering specific languages, but will instead look at patterns that occur across many languages. Before that, let’s do a brief introduction of some of the terms we will be discussing.

Variables and Values

A variable is the symbolic name associated with a location that contains a value or a pointer. A value is a piece of data with a type (usually) that is either referred to by a variable, or computed by a program. For a concrete example, consider the statement

a = 42

Here a is a variable, and 42 is a value.

What are the facets that make up a variable?

  • names: variables almost always have a name
  • types: a variable may (or may not) have a type associated with it
  • values: a variable stores a value (and its type)
  • binding: how a variable is connected to its current value
  • storage: the slot of memory that stores the value associated with the variable
  • lifetime: the timeframe over which a variable exists
  • scope: when/where a variable is accessible by code
  • mutability: can a variable’s value be changed?

What are the facets that make up a value?

  • names: variables almost always have a name
  • types: a value will always have a type associated with it
  • values: a variable stores a value (and its type)
  • binding: how a variable is connected to its current value
  • storage: the slot of memory that stores the value
  • lifetime: the timeframe over which a value exists
  • scope: when/where a variable is accessible by code
  • mutability: can a value be changed?

Lifetime and scope seem similar, but they are not the same. Lifetime refers to the existence of the variable, whereas scope refers to the accessibility of a variable. It is entirely possible for a variable to be out-of-scope but still be alive.

Variable names

What must a language designer consider when deciding variable naming rules for a language? There are multiple possible choices here, and there isn’t one single correct answer.

  • Almost all languages stipulate that names should contain valid characters
  • Almost all languages stipulate that names should not be the same as keywords or constants
  • Most languages have a rule that disallows spaces in variable names
  • Some languages have rules about special characters in names, some enforce length restrictions, and some even enforce some sort of case sensitivity rule.

Practically, variable naming conventions are more important than the naming rules of languages. These conventions are usually designed to enforce some degree of standardisation across a codebase. A good example of a standard would be Google’s standard.

Why do most loops idiomatically use i or j for loop variables? It all goes back to Fortran, the first standardised programming langauge. Fortran had a quirky rule where if you didn’t explicitly declare a variable type, it would the first letter of the name to determine the type. Variables starting from a-h and o-z defaulted to floats, while variables starting from i-n defaulted to integers.

Variable storage

Variables and values are stored in one of three different places: the stack, the heap or the static data area.

stack
Layout of program memory. Function parameters and local variables are stored on the stack. Variables dynamically allocated during runtime lie in the heap. Static members and globals are stored in the static data area.

Usually local variables and function parameters are stored on the stack. Dynamically allocated objects and values are usually stored on the heap. Most languages store global and static variables in the static data area. Of course, you can also have combinations of these; for example, a pointer that is stored on the stack, but whose value lies on the heap.

Variable types

What can you infer about a value, given its type?

  • The set of legal values it can hold
  • The operations we can perform on it
  • How much memory you need
  • How to interpret the bytes stored in RAM
  • How values are converted between types

Variable lifetime and scope

Every variable and value has a lifetime, over which they are valid and may be accessed. Note that it is possible for a variable to be “alive” during execution, and yet not be accessible. Some languages give the programmer explicit control over the lifetime of a value, while others completely abstract this away.

void bar() {
  ...
}

int main() {
  int x = 5;
  bar();
  ...
}

A variable is in scope if it can be explicitly accessed by name in that region.

void bar(int *ptr) {
  cout << x;    // ERROR! x isn’t in-scope here!
  cout << *ptr; // Even though its value can be accessed
}

void foo() {
  int x;
  cout << x;    // x is in-scope here
  bar(&x);
}

There are two primary approaches to scoping: lexical (or static) scoping and dynamic scoping. We will cover these topics in the next lecture.

Variable binding

As we have seen in Lecture 1, different languages have different ways of binding variable names with values. For example a C++ variable name directly refers to the storage location holding it’s value, while in Python, a variable holds an “object reference”, which then in turn points to the actual value. We will cover the major binding approaches in detail later on.

Mutability

If a variable is immutable, then it’s value cannot be changed once assigned, and vice versa if a variable is mutable. Most languages have some form of immutability. In C++, variables can be made immutable using the const keyword. In Haskell, all variables are by default immutable. Immutability might seem like it makes the language less flexible (and it does to some extent), it also allows for the program to be less buggy, and can also allow for some compiler optimisations to take place.

Types

In this section, we will take a deep dive into typing and type checking. You should hopefully get a much better idea of the implications of a type system.

Before proceeding, it’s worth asking if are types even necessary to create a programming language? Yes! It is possible to have a language with no types. Assembly languages are one such example of languages with no type system. They just have a register that holds a 32 (or 64) bit value. The value could represent anything (an integer, float, pointer, etc.). BLISS is another example of a language with no types.

MODULE E1 (MAIN = CTRL) =
BEGIN
FORWARD ROUTINE
    CTRL,
    STEP;
ROUTINE CTRL =
BEGIN
    EXTERNAL ROUTINE
        GETNUM,     ! Input a number from terminal
        PUTNUM;     ! Output a number to terminal
    LOCAL
        X,          ! Storage for input value
        Y;          ! Storage for output value
    GETNUM(X);
    Y = STEP(.X);
    PUTNUM(.Y)
    END;
ROUTINE STEP(A) =
    (.A+1);
END
ELUDOM

But virtually all modern languages have a type system, since it makes programming so much easier and safer.

Must every variable have a type in a typed language? No! In Python for example, variables are not associated with types.

# Python
def foo(q):
  if q:
    x = "What's my type?" # string
  ...
  x = 10 # int

In general, many dynamically typed (don’t worry, we will cover them in this lecture) languages don’t associate variables with types. However, note that a value is always associated with a type.

Types can broadly be classified into two categories, primitive and composite. Primitive data types are a set of types from which other types are constructed. Composite data types are those types which can be constructed using primitive and composite data types.

Examples of primitive data types:

  • Integers
  • Booleans
  • Characters
  • Enumerated types (Enums)
  • Floats
  • Pointers

Examples of derived data types:

  • Arrays
  • Structs
  • Variants/Unions
  • Objects
  • Tuples
  • Containers

There can also be data types that don’t fall into either category.

  • Functions
  • Generic types
  • Boxed types

A boxed type is just an object whose only data member is a primitive (like an int or a float)

class Integer {
 public:
   int get() const { return val_; }
 private:
   int val_;
};

Type checking

Type checking is the process of verifying and enforcing constriants on types. Type checking can occur during compile time (static) or during run time (dynamic). The language can also specify the degree of strictness for type checking (strong and weak type checking). We will go into more detail on all of these topics.

type-checking
Type checking approaches. Type checkers can be classified on the basis of whether they run during compile time or during run time. Type checkers can also be classified on the basis of strictness (strong or weak).

Static typing

Static typing is the process of verifying that all operations in a program are consistent with the types of the operands prior to program execution. Types can either be explicitly specified (like in C++) or can be inferred from code (like in Haskell). For inferred types, if the type checker cannot assign distinct types to all variables, functions, and expressions or cannot verify type safety, then it generates an error. But if a program type checks, then it means the code is (largely) type safe, and few if any run time checks need to be performed.

To support static typing, a language must have a fixed type bound to each variable at the time of definition.

Type Inference

Type inference refers to the automatic detection of types of expressions or variables in a language. Consider the following example:

void foo(_____ x, _____ y) {
  cout << x + 10;
  cout << y + " is a string!";
}

What would be the types of x and y? int and string, right! The compiler can infer that y is an string from the fact that we perform the operation y + " is a string", which is only valid if y is a string. What about x? It seems like x can only be an int. But consider this example:

void foo(_____ x, _____ y) {
  cout << x + 10;
  cout << y + " is a string!";
}
void bar() {
  double d = 3.14;
  foo(d,"barf");
}

Here, it makes more sense for x to be a double. So type inference is actually a complex constraint satisfaction problem.

Many statically typed languages now offer some form of type inference. C++ has the auto keyword

// C++ type inference with auto
int main() {
  auto x = 3.14159; // x is of type int
  vector<int> v;
  ...

  for (auto item: v) { // item is of type int
    cout << item << endl;
  }

  auto it = v.begin(); // it is of type vector<int>::iterator
  while(it != v.end()) {
    cout << *it << endl;
    ++it;
  }
}

Go similarly infers the type of variables from the right-hand-side expression

// GoLang type inference
func main() {
  msg := "I like languages"; // string
  n := 5 // int
  for i := n; i > 0; i-- {
    fmt.Println(msg);
  }
}

Conservatism

Static type checking is inherently conservative. This means that the static type checker will disallow perfectly valid programs that never violate typing rules.

This is due to the fact that for Turing-complete languages, it is not possible to have a sound (meaning all incorrect programs are rejected), decidable (meaning that it is possible to write an algorithm that determines whether a program is well-typed), and complete (meaning no correct programs are rejected) type system. This follows from Gödel’s incompleteness theorems. (Don’t worry if you don’t understand this, it won’t be on the exam :))

For an example, consider the following program:

class Mammal {
public:
 virtual void makeNoise() { cout << "Breathe\n"; }
};
class Dog: public Mammal {
public:
 void makeNoise() override { cout << "Ruff\n"; }
 void bite() { cout << "Chomp\n"; }
};
class Cat: public Mammal {
public:
 void makeNoise() override { cout << "Meow!\n"; }
 void scratch() { cout << "Scrape!\n"; }
};

void handlePet(Mammal& m, bool bite, bool scratch) {
 m.makeNoise();
 if (bite)
   m.bite();
 if (scratch)
   m.scratch();
}

int main() {
 Dog spot;
 Cat meowmer;
 handlePet(spot, true, false);
 handlePet(meowmer, false, true);
}

The compiler generates an error during the compilation of this code

error: no member named 'scratch' in 'Mammal'

even though the code only asks Dogs to bite and Cats to scratch.

Pros of static type checking

  • Produces faster code (since we don’t have to type check during run time)
  • Allows for earlier bug detection (at compile time)
  • No need to write custom type checks

Cons of static type checking

  • Static typing is conservative and may error our perfectly valid code
  • It requires a type checking phase before execution, which can be slow

Dynamic typing

Dynamic type checking is the process of verifying the type safety of a program at run time. If the code is attempting an illegal operation on a variable’s value, an exception is generated and the program crashes. Here are some examples of dynamic type checking at run time.

def add(x,y):
 print(x + y)

def foo():
 a = 10
 b = "cooties"
 add(a,b)

outputs

TypeError: unsupported operand type(s) for +: 'int' and 'str'
def do_something(x):
 x.quack()

def main():
 a = Lion("Leo")
 do_something(a)

outputs

AttributeError: 'Lion' object has no attribute 'quack'

Most dynamically typed languages don’t require explicit type annotations for variables. This means that a given variable name could refer to values of multiple types over the course of execution of the program.

def guess_my_number(secret_value):
 val = get_number_from_user()

 if val > 100:
   val = "Your guess is too large!"
 else:
   ...

How does a program written in a dynamically typed language detect type violations? Usually the compiler/interpreter would use a type tag, an extra piece of data stored along with each value that indicates the type of the value.

def add(x,y):
 print(x + y)

def foo():
 a = 10
 b = "cooties"
 add(a, b)

So in this example, the compiler would store the fact that the value a is pointing to is an int with value 10, and b is a string with value cooties.

Sometimes statically typed languages also need to perform run time type checks. This is most often seen during down-casting (when casting an object from a child class to a parent class). Consider the following example.

class Person { ... };
class Student : public Person { ... };
class Doctor : public Person { ... };
void partay(Person *p) {
  Student *s = dynamic_cast<Student *>(p);
  if (s != nullptr)
    s->getDrunkAtParty();
}
int main() {
  Doctor *d = new Doctor("Dr. Fauci");
  partay(d);
  delete d;
}

dynamic_cast does a run time check to ensure that the type conversion (from Person* to Student*) we are performing is valid. If it isn’t valid, dynamic_cast will return nullptr.

In strongly typed languages, variants would also require dynamic type checking.

Pros of dynamic type checking

  • Increased flexibility
  • Often easier to implement generics that operate on many different types of data
  • Simpler code due to fewer type annotations
  • Makes for faster prototyping

Cons of dynamic type checking

  • We detect errors much later
  • Code is slower due to run time type checking
  • Requires more testing for the same level of assurance
  • No way to guarantee safety across all possible executions (unlike static type checking)

Duck typing

Duck typing emerges as a consequence of dynamic typing. With static typing, we determine the what operations work on a particular value/variable based on the type of the value/variable. But a consequence of dynamic typing is that variables no longer have a fixed type. This means that we can pass a value of any type to a function, and as long as the type of the value implements all of the operations used in the function, the code should work fine.

Consider the following example.

# Python "duck" typing
class Duck:
 def swim(self):
  print("Paddle paddle paddle")

class OlympicSwimmer:
 def swim(self):
  print("Back stroke back stroke")

class Professor:
 def profess(self):
  print("Blah blah blah blah blah")

def process(x):
 x.swim()

def main():
 d = Duck()
 s = OlympicSwimmer()
 p = Professor()
 process(d) # Paddle paddle paddle
 process(s) # Back stroke back stroke
 process(p) # throws AttributeError

Since Duck and OlympicSwimmer both implement the swim method, the code works. When it gets to Professor, which doesn’t implement swim, that’s when you get the error. The values you pass to process only need to implement swim, they do not have to be related to each other. Duck typing has some interesting applications.

Supporting enumeration

In Python, you can make any class iterable by just implementing the __iter__ and __next__ methods.

# Python duck typing for iteration
class Cubes:
  def __init__(self, lower, upper):
    self.upper = upper
    self.n = lower
  def __iter__(self):
    return self
  def __next__(self):
    if self.n < self.upper:
      s = self.n ** 3
      self.n += 1
      return s
    else:
      raise StopIteration

for i in Cubes(1, 4):
  print(i)             # prints 1, 8, 27

Making any class printable

In Python, you can make any class printable (using print) by just implementing the __str__ function

# Python duck typing for printing objects
class Duck:
 def __init__(self, name, feathers):
  self.name = name
  self.feathers = feathers

 def __str__(self):
  return self.name + " with " + \
         str(self.feathers) + " feathers."

d = Duck("Daffy", 3)
print(d)

Testing for equality

In Python, if you add the __eq__ method to any class, you can make it’s objects “comparable”.

# Python duck typing for equality
class Duck:
 def __init__(self, name, feathers):
  self.name = name
  self.feathers = feathers

 def __eq__(self, other):
  return (
    self.name == other.name and
    self.feathers == other.feathers
  )

duck1 = Duck("Carey",19)
duck2 = Duck("Carey",19)

if duck1 == duck2:
 print("Those are the same duck!")

Other dynamically typed languages also offer duck typing!

Ruby

# ruby duck typing
class Duck
 def quack
  puts "Quack, quack"
 end
end

class Dog
 def quack
  puts "Woof... I mean quack!"
 end
end

animals = [Duck.new,Dog.new]
animals.each do |animal|
 animal.quack()
end

JavaScript

// JavaScript duck typing
var cyrile_the_duck = {
  swim: function ()
          { console.log("Paddle paddle!"); },
  color: "brown"
};

var michael_phelps = {
  swim: function ()
          { console.log("Back stroke!"); },
  outfit: "Speedos"
};

function process(who) {
  who.swim();
}

process(cyrile_the_duck);  // Paddle paddle!
process(michael_phelps);   // Back stroke!

It is possible to have something similar to duck typing in statically typed languages (like C++) as well. In C++, this is done using templates.