Lecture 03

2023-04-10 | Week 2 | edited by Matt Wang

(originally written 2022-10-03 by Boyan Ding)

This is Matt! We continue from last lecture and start at slide 44 on the Intro to FP deck to slide 81 (note that slides 39-43 were covered in discussion). Let us know if you have questions!

Table of Contents

Functions

Our First Haskell Function

Haskell functions have three core components:

  1. (optional): type information about the function’s parameters and return value
  2. the function name and parameter names
  3. an expression that defines the function’s behaviour

Here’s a simple function in C++:

String insult(String name, String smell) {
  String s = name + " smells like " +
             smell + " and doesn't floss.";
  return s;
}

And here’s how we’d define the same function in Haskell:

insult :: String -> String -> String
insult name smell =
  name ++ " smells like " ++ smell ++ " and doesn't floss."

Some things to note:

  • there’s no explicit return since the right-hand side of a function is what it returns!
  • spaces and indentation are used to define code blocks; no braces necessary (think Python)
  • String -> String -> String may seem strange - it seems like there’s no difference between parameters and return types. We’ll talk about this later when we discuss currying.

In class, Carey said that ghci does not allow multi-line functions. This is true if you just hit the enter key. However, you can use :{ and :} to “capture” multiple lines. Here’s an example:

ghci> :{
ghci| insult name smell =
ghci|   name ++
ghci|   " smells like " ++
ghci|   smell ++
ghci|   " and doesn't floss."
ghci| :}
ghci> insult "carey" "cheese"
"carey smells like cheese and doesn't floss."

Optional Type Declarations

Type declaration is optional because of Haskell’s type inference! How does it work?

insult :: String -> String -> String
insult name smell =
  name ++ " smells like " ++ smell ++ " and doesn't floss."
  • we can only concatenate (++) lists, so we know name and smell are some type of list
  • " smells like " and " and doesn't floss." are lists of characters (recall that String = [Char])
  • since we can only concatenate lists of the same type, name and smell are Strings!!
  • the concatenation of Strings is String, and thus insult returns a String

So … why would we even use type declarations? Unlike who we’re insulting, it’s good hygiene (and, best practice)!

  • type declarations can provide better compiler warnings and errors
  • type declarations make your code easier to read
    • arguably the most important trait in software
    • the person reading it may be you six months later!
  • and, it can help you think about how to write the function itself!

So, we recommend that you annotate your functions - or at least, the user-facing/top-level ones!

Main Functions and (avoiding) Monads

This is intentionally hand-wavy. We won’t be covering monads or impure FP at all in this class.

What we haven’t talked about yet is the main function - in other words, how can we have Haskell run a function when we open a file?

(and, while we’re at it, how can we get user input - which is a side effect?)

To do it, we’ll create a function called main, and annotate it with the IO() “type” (really - a system and monad). We then add this strange do block, and write what seems like imperative code:

-- insult.hs
insult :: String -> String -> String
insult name smell =
  name ++ " smells like " ++ smell ++ " and doesn't floss."

main::IO()
main = do
  putStr "What is your name? "
  name <- getLine
  putStrLn (insult name "unwashed dog")

We can then call our main function and get some input.

ghci> :load insult
[1 of 1] Compiling Main             ( insult.hs, interpreted )
Ok, one module loaded.
ghci> main
What is your name? Carey
Carey smells like unwashed dog and doesn't floss.

Is this hand-wavy? Yes! IO (and monads, a way to deal with side effects) is not a focus for this class. If you’re interested, we suggest:

Someone in class asked if this example runs as a “main function”. It doesn’t - you need a bit more legwork (and an understanding of Haskell’s package system); more on that in GHC’s docs on the main package.

More Examples!

Let’s see some other examples!

isBigger :: Double -> Double -> Bool
isBigger a b = a > b
average :: [Double] -> Double
average list =
  (sum list) / fromIntegral (length list)

In class, Carey mentioned that fromIntegral is used to cast the value of length to be divisible, and said it maps from Integral -> Double. This is mostly true, though it’s actually more powerful than that - to see why, try :t fromIntegral!

This last example showcases the power of Haskell’s type system:

get_last_item :: [any_type] -> any_type
get_last_item lst =
  head (reverse lst)

This function works for a list of any type! So, Haskell doesn’t care what type it is. The any_type is a type variable, which we can think of as a generic or template type. We’ll refine this notion in a later lecture.

What’s cool is that we didn’t have to do anything special - it just works out of the box! And, we didn’t even need to tell Haskell that it’s generic: the type system figured it out by itself!

(out of scope of this class) Someone in class asked a great question: is this polymorphism (or generic programming) at runtime, or is a copy of this function generated for each type? In Haskell, it’s the former: there is one version of the language that “just works” for all types. This is similar to how Java does it. In other languages, like C++ or Rust, generics are compile-time: a copy of the function is made for each type that uses it (this is called monomorphization).

Practice Problem

In class, we went over a brain teaser. Consider:

f :: Int -> Int
f x = x^2

g :: Int -> Int
g x = 3 * x

What do the following two lines do?

y = f g 2
z = f 5 * 10

Well, it turns out:

  • y = f g 2 is a syntax error!
    • Haskell evaluates functions from left to right (keyword: left-associative)
    • so, f g 2 is the same as ((f g) 2), which doesn’t make sense - since f doesn’t operate on functions
  • f 5 * 10 = 250
    • in Haskell, functions have higher precedence than operators, so they’re always evaluated first
    • in other words, this is (f 5) * 10
-- f(g(x))
compute_f_of_g x = f (g x)

-- f(x * 10)
compute_f_of_x_times_ten x = f (x * 10)

If you’re curious, the Haskell spec defines operator precedence (and, how associativity works).

Local Bindings

This time, the majority of topics we cover are so called “syntactic sugar” in Haskell. They can make the programming easier. When we write code, sometime it is helpful to create “temporary variables” to simplify the code. In Haskell, it is achieved with let and where constructs.

The let keyword

Here’s an example that demonstrates how to use let:

-- let.hs
get_nerd_status gpa study_hrs =
  let
      gpa_part = 1 / (4.01 - gpa)
      study_part = study_hrs * 10
      nerd_score = gpa_part + study_part
  in
      if nerd_score > 100 then
           "You are super nerdy!"
      else "You're a nerd poser."

The let construct consists of two parts:

  • the first part is between the let and the in; here, you define one or more “bindings” to associate a name with an expression
    • for example, the third line of the function binds the name gpa_part to the expression 1 / (4.01 - gpa).
  • the second part follows in; it contains an expression where the bindings are used.

The bindings are immutable variables - you can’t change them!

The where keyword

The where keyword is similar; it just changes the syntax. With where, you first write the code, then specify the bindings after the where keyword. The equivalent example is:

-- let.hs
get_nerd_status gpa study_hrs =
  if nerd_score > 100
     then "You are super nerdy!"
     else "You're a nerd poser."
  where
      gpa_part = 1 / (4.01 - gpa)
      study_part = study_hrs * 10
      nerd_score = gpa_part + study_part

When would you use let and when do you use where? A nice rule of thumb is:

  • when defining bindings (variables) for a single expression (e.g. the example here), either one is fine
  • when defining bindings for use across multiple expressions, use where

The second case is mainly for guards and case statements. We haven’t learned them yet but we will see them shortly. It’s also more of an opinion - :)

Nested functions with let or where

We can also use let and where to define nested functions:

-- nestfunc.hs
whats_the_behavior_of name =
 if name == "Carey"
    then behaves_like name "twelve year-old"
    else behaves_like name "grown-up"
 where
   behaves_like n what =
      n ++ " behaves like a " ++ what ++ "!"

This is very helpful in breaking down bigger problems into small, reusable functions – and helps us with naming!

Nested functions can use all of their enclosing function’s bindings. So, we can make our example more concise by using name:

-- nestfunc.hs
-- Function to describe someone's behavior
whats_the_behavior_of name =
 if name == "Carey"
    then behaves_like "twelve year-old"
    else behaves_like "grown-up"
 where
   behaves_like what =
      name ++ " behaves like a " ++ what ++ "!"

Lazy execution with let and where

In the previous lecture, we touched on Haskell’s lazy evaluation for lists. This also applies to let and where: bindings are not evaluated when they are defined; only when they are actually used!

potentially_slow_func arg =
  let
      val1 = really_slow_function arg
      val2 = very_fast_function arg
      val3 = pretty_fast_function arg
  in
      if val3 > 100 then val1
                    else val2

Haskell’s behavior when running the code above is:

  • in the let block, Haskell binds val1, val2, and val3 to the expressions - but doesn’t evaluate them!
  • when the function body runs, the if condition requires val3.
  • so, Haskell evaluates the expression associated with val3, calling pretty_fast_function.
  • then, only one of val1 and val2 will evaluate.
  • so, if pretty_fast_function returns 20, then we go to the else branch. To compute val2 very_fast_function is called. We will never call really_slow_function in this case!

Note that Haskell doesn’t have to re-evaluate the binding every time it’s used - after doing it once, it can just cache it!

Laziness can be confusing with IO, since the sequence of IO actually matters!. Haskell’s solution is to disallow IO (and other side-effects) to happen in normal code. Try to think about what will happen if the code below is valid in Haskell.

-- Which getLine will run first?!?!
laziness_example arg =
  let
      val1 = getLine
      val2 = getLine
  in
      if arg > 100 then val1 + val2
                   else val2 * val1

Control Flow

We’ve been using some control flow features already, but we haven’t formalized them. Let’s do that now!

if-then-else

The general syntax of the if statement (or more precisely, expression) is:

if <expression> then <expression>
                else <expression>

For example:

ageist_greeting age =
  if age > 30 then "Hey boomer!"
              else "Hey fam!"

A key insight is that every if statement must have an accompanying else. If there wasn’t an else, and we didn’t take the if - our function wouldn’t return anything. That’s not allowed!

Think about how if-then-else works with types - if the two branches had different types, we’d break some of Haskell’s type rules. This formalizes the above argument - why is that?

Guards

Guards are a compact verison of the if-then expression (but really, are just syntactic sugar).

It allows us to write

if <condition> then <expr>

as

  | <condition> = <expr>

We can define a function as a series of one or more guards, like

somefunc param1 param2
  | <if-x-is-true> = <run-this>
  | <if-y-is-true> = <run-that>
  | <if-z-is-true> = <run-the-other>
  | otherwise = <run-this-otherwise>

A real example in Haskell:

major_guesser salary
  | salary > 150000 = "CS"
  | salary > 120000 = "EE"
  | salary < 30000 = "Any major at USC"
  | otherwise = "Probably Poli-sci"

Note:

  • we don’t need the equal sign after defining the function!
  • each guard begins with the pipe character: |
  • after that, the guard contains two parts:
    • the first part is a boolean expression; we want to evaluate to see if it is True.
    • the second part is the “payload” expression. It will return if the expression is True.
  • Haskell evaluates all of the guards from top to bottom, and returns the payload of the first guard that is True.

In guards, otherwise acts like a wildcard. In fact, otherwise evaluates to True (try it in ghci)!

Let’s take a look at some examples:

-- Factorial using if/then
fact n =
  if n <= 0 then 1
            else n * fact (n-1)

-- Factorial using guards
fact n
 | n == 0 = 1
 | otherwise = n * fact (n-1)

-- Find sm0l-est value in a list
sm0lest lst
 | lst == [] = error "empty list"
 | length lst == 1 = first
 | otherwise = min first (sm0lest rest)
 where
   first = head lst
   rest = tail lst

Note that in the last example, the values defined in the where clause are used by multiple guards.

We can use guard to implement quicksort in eight lines of code (or less!). The following implementation uses where, guards, and list comprehensions:

qsort lst
 | lst == [] = []
 | otherwise = less_eq ++ [pivot] ++ greater
 where
    pivot = head lst
    rest_lst = tail lst
    less_eq = qsort [a | a <- rest_lst, a <= pivot]
    greater = qsort [a | a <- rest_lst, a > pivot]

Neat!

Pattern Matching

Now, we’ll switch gears to an even more powerful type of syntactic sugar called pattern matching. While it’s not directly related to functions (or indeed, functional programming), it draws heavy inspiration from functional programming principles. In particular, it’ll make list and tuple processing easy!

First, we’ll explore parameter-based pattern matching, where:

  • we define multiple versions of the same function
  • each version of the function must have the same number and types of arguments
  • when the function is called, Haskell checks each definition in-order from top to bottom, and calls the first function which has matching parameters

Pattern Matching with Constants

Let’s start with matching constant values: we can place constants instead of the name of an expected argument.

-- genz.hs
genz_critic :: String -> String -> String
genz_critic "Carey" word =
  "Oh Carey... OK boomer!"
genz_critic name "lit" =
  name ++ ", you're a lit genz-er"
genz_critic name "wrekt" =
  name ++ ", git wrekt cheugy"
genz_critic name word =
  name ++ ", " ++ word ++ " is not gucci!"

In the function above, the first version has "Carey" in the place of the first argument. It only runs when user passes in "Carey" for the name argument. The same rule applies to other versions.

Here are two straightforward calls:

ghci> :load genz
[1 of 1] Compiling Main             ( genz.hs, interpreted )
Ok, one module loaded.

ghci> genz_critic "Ed" "wrekt"
"Ed, git wrekt cheugy"

ghci> genz_critic "Paul" "unix"
"Paul, unix is not gucci!"

Here’s a more tricky call, where it could match two different versions:

ghci> genz_critic "Carey" "lit"
"Oh Carey... OK boomer!

Here, we use the Carey version since it’s defined first. Order matters!

Pattern matching with constants work with other types, like numbers or lists. There is no limit on how many constant patterns you can have:

factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)

list_len :: [a] -> Int
list_len [] = 0
list_len lst = 1 + list_len (tail lst)

has_this_many_eyes :: Int -> String
has_this_many_eyes 1 = "Cyclops"
has_this_many_eyes 2 = "Pupper"
has_this_many_eyes 3 = "Sphenodon Punctatus"
has_this_many_eyes 4 = "A nerd with glasses"
has_this_many_eyes n = "A freak!"

Pattern Matching with Tuples

One powerful use-case of pattern matching is with tuples; since we know how many items are in the tuple, we can “match” each item!

-- tuples.hs

-- Original way
exp :: (Int,Int) -> Int
exp t = (fst t) ^ (snd t)

-- Version #2: With pattern matching
expv2 :: (Int,Int) -> Int
expv2 (a,b) = a ^ b

-- Version #3: Adding a constant parameter
expv3 :: (Int,Int) -> Int
expv3 (a,0) = 1
expv3 (a,b) = a ^ b

Here, the advantage of versions 2 and 3 are their conciseness: we didn’t need to think about fst or snd!

Earlier, students asked how to extract values out of tuples that have more than 2 elements. With pattern matching, we can do that now!

-- tuples.hs
-- Extract the first value of a tuple
extract_1st :: (type1,type2,type3) -> type1
extract_1st (a,_,_) = a

-- Extract the second value of a tuple
extract_2nd :: (type1,type2,type3) -> type2
extract_2nd (_,b,_) = b

-- Extract the third value of a tuple
extract_3rd :: (type1,type2,type3) -> type3
extract_3rd (_,_,c) = c
ghci> :load tuples
[1 of 1] Compiling Main             ( tuples.hs, interpreted )
ghci> extract_3rd ("q",5,1.7)
1.7

The underscore _ is a wildcard: anything can match this item. In particular, note that we don’t even care about its type!

New learners are often tempted to write code like:

tuple_eq (a, a) = True   -- WRONG!!!
tuple_eq _ = False

Haskell will complain about “conflicting definitions for 'a'” on the first line. To compare the two values obtained from pattern matching, they have to be bound to different names and compared afterwards.

-- using guards
tuple_eq (a, b)
  | a == b = True
  | otherwise = False

-- or, use (==) directly
tuple_eq (a, b) = a == b

This differs from Prolog, a language we’ll explore much later in the course.

You might’ve noticed that _ is the exception to the above rule. Haskell has some special logic for _ – it doesn’t perform an actual binding, so you can use multiple _ in a tuple with no problems!

Pattern Matching with Lists

We can apply similar logic to pattern matching with lists. Here, we’ll rely on the cons operator (:) to form our structure. In particular, I (Matt) like to think of this as “inverse cons”. Something of the form x:xs matches the head x (first item) and the tail xs (last item); we can think of it as matching the entire list of x : xs!

-- lists.hs
-- Extract the first item from a list
get_first :: [a] -> a
get_first (x:xs) = x

-- Extract all but the first item of a list
get_rest :: [a] -> [a]
get_rest (x:xs) = xs

-- Extract the second item from a list
get_second :: [a] -> a
get_second (x:y:xs) = y

get_second comes naturally from our view of pattern matching: we can think of this as “inverse cons” for x : (y : xs), which means that x is the head, and y is the next item after the head.

ghci> :load lists
[1 of 1] Compiling Main             ( lists.hs, interpreted )
ghci> get_first [5,10,15,20]
5

ghci> get_rest [5,10,15,20]
[10,15,20]

ghci> get_second [5,10,15,20]
10

ghci> get_rest [5,6]
[6]

ghci> get_rest [5]
[]

We can also use the constant [] in place of the tail. As you’d expect, this only matches patterns where the tail is exactly the empty list.

-- favs.hs
favorites :: [String] -> String
favorites [] = "You have no favorites."
favorites (x:[]) = "Your favorite is " ++ x
favorites (x:y:[]) = "You have two favorites: "
 ++ x ++ " and " ++ y
favorites ("chocolate":xs) =
  "You have many favs, but chocolate is #1!"
favorites (x:y:_) =
   "You have at least three favorites!"
ghci> :load favs
[1 of 1] Compiling Main             ( favs.hs, interpreted )
ghci> favorites []
"You have no favorites."

ghci> favorites ["chocolate"]
"Your favorite is chocolate"

ghci> favorites ["mint","earwax"]
"You have two favorites: mint and earwax"

ghci> favorites ["chocolate","egg","dirt"]
"You have many favs, but chocolate is #1!"

ghci> favorites ["tea","coffee","carrot"]
"You have at least three favorites!"

There is another notation that uses brackets ([]) instead of parens (()). It’s not used as commonly as the parenthetic version. The core difference here is that it only matches lists of an exact length; each item in the list is just an item (there are no tails, etc.).

-- lists.hs\
whats_in_your_list [10] =
    "Your list has just 1 value which is 10."
whats_in_your_list [a,b] = "Your list has 2 values: "
    ++ (show a) ++ " and " ++ (show b)
whats_in_your_list [9,_,c] =
    "Starts with 9 and its third item is:"  ++ (show c)
whats_in_your_list _ =
    "Your list had something else in it."
ghci> :load lists
[1 of 1] Compiling Main             ( lists.hs, interpreted )
ghci> whats_in_your_list [10]
"Your list has just 1 value which is 10."

ghci> whats_in_your_list [10,20]
"Your list has 2 values: 10 and 20"

ghci> whats_in_your_list [9,10,20]
"Starts with 9 and its third item is 20."

ghci> whats_in_your_list [9,10,20,30]
"Your list had something else in it."

Broadly, pattern matching with lists makes our code more concise and readable! Consider our previous guard-based sm0lest:

sm0lest lst
 | lst == [] = error "empty list"
 | length lst == 1 = first
 | otherwise = min first (sm0lest rest)
 where
   first = head lst
   rest = tail lst

With pattern matching, the logic becomes more compact!

sm0lest [] = error "empty list"
sm0lest [x] = x
sm0lest (x:xs) = min x (sm0lest xs)

The same applies to the following reverse list example:

-- Reverse a list w/o pattern matching
rev lst
 | lst == [] = []
 | otherwise = (rev rest) ++ [first]
 where
  first = head lst
  rest  = tail lst

-- Reverse a list with pattern matching
rev [] = []
rev (x:xs) = (rev xs) ++ [x]

This reverse algorithm is not optimal; it has O(N^2) complexity (in the length of the list). Can you figure out an O(N) algorithm?

Influence

Pattern matching is mostly used in functional languages, but it’s notably been adopted by Python and Rust.

Python added pattern matching very recently (in version 3.10, released in 2021). Python’s pattern matching works on tuples, lists, dictionaries, and more! Python is a dynamically typed language, so match cases can have different types.

def func(val):  # Python pattern matching example
  match val:
    case (0, x):        # match tuple w/first elem zero
        print(f"Tuple of 0 and {x}")
    case ['a', x, 'c']: # match list w/3 elems: 'a', ??, 'c'
        print(f"List of a, {x}, and c")
    case [1, 2, *rest]: # seq of: 1, 2, ... other elements
        print(f'A list with 1, 2, and *{rest}')
    case {'foo': bar}:  # match dict w/key 'foo'
        print(f"foo maps to {bar} ")
    case _:
        print('no match')

Rust has almost always had pattern matching. It’s statically typed, so this example looks quite like Haskell. Here’s some pattern matching that Matt literally wrote two days ago:

match vtree_type {
  VTreeType::LeftLinear => VTree::left_linear(vars),
  VTreeType::RightLinear => VTree::right_linear(vars),
  VTreeType::EvenSplit(num) => VTree::even_split(vars, num),
  VTreeType::FromDTree => {
    let dtree = DTree::from_cnf(&cnf, &VarOrder::linear_order(cnf.num_vars()));
    VTree::from_dtree(&dtree).unwrap()
  }
}