Lecture 03
2023-04-10 | Week 2 | edited by Matt Wang
(originally written 2022-10-03 by Boyan Ding)
This is Matt! We continue from last lecture and start at slide 44 on the Intro to FP deck to slide 81 (note that slides 39-43 were covered in discussion). Let us know if you have questions!
Table of Contents
Functions
Our First Haskell Function
Haskell functions have three core components:
- (optional): type information about the function’s parameters and return value
- the function name and parameter names
- an expression that defines the function’s behaviour
Here’s a simple function in C++:
String insult(String name, String smell) {
String s = name + " smells like " +
smell + " and doesn't floss.";
return s;
}
And here’s how we’d define the same function in Haskell:
insult :: String -> String -> String
insult name smell =
name ++ " smells like " ++ smell ++ " and doesn't floss."
Some things to note:
- there’s no explicit
return
since the right-hand side of a function is what it returns! - spaces and indentation are used to define code blocks; no braces necessary (think Python)
String -> String -> String
may seem strange - it seems like there’s no difference between parameters and return types. We’ll talk about this later when we discuss currying.
In class, Carey said that ghci
does not allow multi-line functions. This is true if you just hit the enter key. However, you can use :{
and :}
to “capture” multiple lines. Here’s an example:
ghci> :{
ghci| insult name smell =
ghci| name ++
ghci| " smells like " ++
ghci| smell ++
ghci| " and doesn't floss."
ghci| :}
ghci> insult "carey" "cheese"
"carey smells like cheese and doesn't floss."
Optional Type Declarations
Type declaration is optional because of Haskell’s type inference! How does it work?
insult :: String -> String -> String
insult name smell =
name ++ " smells like " ++ smell ++ " and doesn't floss."
- we can only concatenate (
++
) lists, so we knowname
andsmell
are some type of list " smells like "
and" and doesn't floss."
are lists of characters (recall thatString = [Char]
)- since we can only concatenate lists of the same type,
name
andsmell
areString
s!! - the concatenation of
String
s isString
, and thusinsult
returns aString
So … why would we even use type declarations? Unlike who we’re insulting, it’s good hygiene (and, best practice)!
- type declarations can provide better compiler warnings and errors
- type declarations make your code easier to read
- arguably the most important trait in software
- the person reading it may be you six months later!
- and, it can help you think about how to write the function itself!
So, we recommend that you annotate your functions - or at least, the user-facing/top-level ones!
Main Functions and (avoiding) Monads
This is intentionally hand-wavy. We won’t be covering monads or impure FP at all in this class.
What we haven’t talked about yet is the main function - in other words, how can we have Haskell run a function when we open a file?
(and, while we’re at it, how can we get user input - which is a side effect?)
To do it, we’ll create a function called main
, and annotate it with the IO()
“type” (really - a system and monad). We then add this strange do
block, and write what seems like imperative code:
-- insult.hs
insult :: String -> String -> String
insult name smell =
name ++ " smells like " ++ smell ++ " and doesn't floss."
main::IO()
main = do
putStr "What is your name? "
name <- getLine
putStrLn (insult name "unwashed dog")
We can then call our main
function and get some input.
ghci> :load insult
[1 of 1] Compiling Main ( insult.hs, interpreted )
Ok, one module loaded.
ghci> main
What is your name? Carey
Carey smells like unwashed dog and doesn't floss.
Is this hand-wavy? Yes! IO (and monads, a way to deal with side effects) is not a focus for this class. If you’re interested, we suggest:
- taking CS 231 (here are some wonderful notes from Galen Wong)
- read the Learn You a Haskell chapters on IO and Monads
Someone in class asked if this example runs as a “main function”. It doesn’t - you need a bit more legwork (and an understanding of Haskell’s package system); more on that in GHC’s docs on the main
package.
More Examples!
Let’s see some other examples!
isBigger :: Double -> Double -> Bool
isBigger a b = a > b
average :: [Double] -> Double
average list =
(sum list) / fromIntegral (length list)
In class, Carey mentioned that fromIntegral
is used to cast the value of length
to be divisible, and said it maps from Integral -> Double
. This is mostly true, though it’s actually more powerful than that - to see why, try :t fromIntegral
!
This last example showcases the power of Haskell’s type system:
get_last_item :: [any_type] -> any_type
get_last_item lst =
head (reverse lst)
This function works for a list of any type! So, Haskell doesn’t care what type it is. The any_type
is a type variable, which we can think of as a generic or template type. We’ll refine this notion in a later lecture.
What’s cool is that we didn’t have to do anything special - it just works out of the box! And, we didn’t even need to tell Haskell that it’s generic: the type system figured it out by itself!
(out of scope of this class) Someone in class asked a great question: is this polymorphism (or generic programming) at runtime, or is a copy of this function generated for each type? In Haskell, it’s the former: there is one version of the language that “just works” for all types. This is similar to how Java does it. In other languages, like C++ or Rust, generics are compile-time: a copy of the function is made for each type that uses it (this is called monomorphization).
Practice Problem
In class, we went over a brain teaser. Consider:
f :: Int -> Int
f x = x^2
g :: Int -> Int
g x = 3 * x
What do the following two lines do?
y = f g 2
z = f 5 * 10
Well, it turns out:
y = f g 2
is a syntax error!- Haskell evaluates functions from left to right (keyword: left-associative)
- so,
f g 2
is the same as((f g) 2)
, which doesn’t make sense - sincef
doesn’t operate on functions
f 5 * 10 = 250
- in Haskell, functions have higher precedence than operators, so they’re always evaluated first
- in other words, this is
(f 5) * 10
-- f(g(x))
compute_f_of_g x = f (g x)
-- f(x * 10)
compute_f_of_x_times_ten x = f (x * 10)
If you’re curious, the Haskell spec defines operator precedence (and, how associativity works).
Local Bindings
This time, the majority of topics we cover are so called “syntactic sugar” in Haskell. They can make the programming easier. When we write code, sometime it is helpful to create “temporary variables” to simplify the code. In Haskell, it is achieved with let
and where
constructs.
The let
keyword
Here’s an example that demonstrates how to use let
:
-- let.hs
get_nerd_status gpa study_hrs =
let
gpa_part = 1 / (4.01 - gpa)
study_part = study_hrs * 10
nerd_score = gpa_part + study_part
in
if nerd_score > 100 then
"You are super nerdy!"
else "You're a nerd poser."
The let
construct consists of two parts:
- the first part is between the
let
and thein
; here, you define one or more “bindings” to associate a name with an expression- for example, the third line of the function binds the name
gpa_part
to the expression1 / (4.01 - gpa)
.
- for example, the third line of the function binds the name
- the second part follows
in
; it contains an expression where the bindings are used.
The bindings are immutable variables - you can’t change them!
The where
keyword
The where
keyword is similar; it just changes the syntax. With where
, you first write the code, then specify the bindings after the where
keyword. The equivalent example is:
-- let.hs
get_nerd_status gpa study_hrs =
if nerd_score > 100
then "You are super nerdy!"
else "You're a nerd poser."
where
gpa_part = 1 / (4.01 - gpa)
study_part = study_hrs * 10
nerd_score = gpa_part + study_part
When would you use let
and when do you use where
? A nice rule of thumb is:
- when defining bindings (variables) for a single expression (e.g. the example here), either one is fine
- when defining bindings for use across multiple expressions, use
where
The second case is mainly for guards and case statements. We haven’t learned them yet but we will see them shortly. It’s also more of an opinion - :)
Nested functions with let
or where
We can also use let
and where
to define nested functions:
-- nestfunc.hs
whats_the_behavior_of name =
if name == "Carey"
then behaves_like name "twelve year-old"
else behaves_like name "grown-up"
where
behaves_like n what =
n ++ " behaves like a " ++ what ++ "!"
This is very helpful in breaking down bigger problems into small, reusable functions – and helps us with naming!
Nested functions can use all of their enclosing function’s bindings. So, we can make our example more concise by using name
:
-- nestfunc.hs
-- Function to describe someone's behavior
whats_the_behavior_of name =
if name == "Carey"
then behaves_like "twelve year-old"
else behaves_like "grown-up"
where
behaves_like what =
name ++ " behaves like a " ++ what ++ "!"
Lazy execution with let
and where
In the previous lecture, we touched on Haskell’s lazy evaluation for lists. This also applies to let
and where
: bindings are not evaluated when they are defined; only when they are actually used!
potentially_slow_func arg =
let
val1 = really_slow_function arg
val2 = very_fast_function arg
val3 = pretty_fast_function arg
in
if val3 > 100 then val1
else val2
Haskell’s behavior when running the code above is:
- in the
let
block, Haskell bindsval1
,val2
, andval3
to the expressions - but doesn’t evaluate them! - when the function body runs, the
if
condition requiresval3
. - so, Haskell evaluates the expression associated with
val3
, callingpretty_fast_function
. - then, only one of
val1
andval2
will evaluate. - so, if
pretty_fast_function
returns 20, then we go to theelse
branch. To computeval2
very_fast_function
is called. We will never callreally_slow_function
in this case!
Note that Haskell doesn’t have to re-evaluate the binding every time it’s used - after doing it once, it can just cache it!
Laziness can be confusing with IO, since the sequence of IO actually matters!. Haskell’s solution is to disallow IO (and other side-effects) to happen in normal code. Try to think about what will happen if the code below is valid in Haskell.
-- Which getLine will run first?!?!
laziness_example arg =
let
val1 = getLine
val2 = getLine
in
if arg > 100 then val1 + val2
else val2 * val1
Control Flow
We’ve been using some control flow features already, but we haven’t formalized them. Let’s do that now!
if
-then
-else
The general syntax of the if
statement (or more precisely, expression) is:
if <expression> then <expression>
else <expression>
For example:
ageist_greeting age =
if age > 30 then "Hey boomer!"
else "Hey fam!"
A key insight is that every if
statement must have an accompanying else
. If there wasn’t an else
, and we didn’t take the if
- our function wouldn’t return anything. That’s not allowed!
Think about how if
-then
-else
works with types - if the two branches had different types, we’d break some of Haskell’s type rules. This formalizes the above argument - why is that?
Guards
Guards are a compact verison of the if-then
expression (but really, are just syntactic sugar).
It allows us to write
if <condition> then <expr>
as
| <condition> = <expr>
We can define a function as a series of one or more guards, like
somefunc param1 param2
| <if-x-is-true> = <run-this>
| <if-y-is-true> = <run-that>
| <if-z-is-true> = <run-the-other>
| otherwise = <run-this-otherwise>
A real example in Haskell:
major_guesser salary
| salary > 150000 = "CS"
| salary > 120000 = "EE"
| salary < 30000 = "Any major at USC"
| otherwise = "Probably Poli-sci"
Note:
- we don’t need the equal sign after defining the function!
- each guard begins with the pipe character:
|
- after that, the guard contains two parts:
- the first part is a boolean expression; we want to evaluate to see if it is
True
. - the second part is the “payload” expression. It will return if the expression is
True
.
- the first part is a boolean expression; we want to evaluate to see if it is
- Haskell evaluates all of the guards from top to bottom, and returns the payload of the first guard that is
True
.
In guards, otherwise
acts like a wildcard. In fact, otherwise
evaluates to True
(try it in ghci
)!
Let’s take a look at some examples:
-- Factorial using if/then
fact n =
if n <= 0 then 1
else n * fact (n-1)
-- Factorial using guards
fact n
| n == 0 = 1
| otherwise = n * fact (n-1)
-- Find sm0l-est value in a list
sm0lest lst
| lst == [] = error "empty list"
| length lst == 1 = first
| otherwise = min first (sm0lest rest)
where
first = head lst
rest = tail lst
Note that in the last example, the values defined in the where
clause are used by multiple guards.
We can use guard to implement quicksort in eight lines of code (or less!). The following implementation uses where
, guards, and list comprehensions:
qsort lst
| lst == [] = []
| otherwise = less_eq ++ [pivot] ++ greater
where
pivot = head lst
rest_lst = tail lst
less_eq = qsort [a | a <- rest_lst, a <= pivot]
greater = qsort [a | a <- rest_lst, a > pivot]
Neat!
Pattern Matching
Now, we’ll switch gears to an even more powerful type of syntactic sugar called pattern matching. While it’s not directly related to functions (or indeed, functional programming), it draws heavy inspiration from functional programming principles. In particular, it’ll make list and tuple processing easy!
First, we’ll explore parameter-based pattern matching, where:
- we define multiple versions of the same function
- each version of the function must have the same number and types of arguments
- when the function is called, Haskell checks each definition in-order from top to bottom, and calls the first function which has matching parameters
Pattern Matching with Constants
Let’s start with matching constant values: we can place constants instead of the name of an expected argument.
-- genz.hs
genz_critic :: String -> String -> String
genz_critic "Carey" word =
"Oh Carey... OK boomer!"
genz_critic name "lit" =
name ++ ", you're a lit genz-er"
genz_critic name "wrekt" =
name ++ ", git wrekt cheugy"
genz_critic name word =
name ++ ", " ++ word ++ " is not gucci!"
In the function above, the first version has "Carey"
in the place of the first argument. It only runs when user passes in "Carey"
for the name
argument. The same rule applies to other versions.
Here are two straightforward calls:
ghci> :load genz
[1 of 1] Compiling Main ( genz.hs, interpreted )
Ok, one module loaded.
ghci> genz_critic "Ed" "wrekt"
"Ed, git wrekt cheugy"
ghci> genz_critic "Paul" "unix"
"Paul, unix is not gucci!"
Here’s a more tricky call, where it could match two different versions:
ghci> genz_critic "Carey" "lit"
"Oh Carey... OK boomer!
Here, we use the Carey version since it’s defined first. Order matters!
Pattern matching with constants work with other types, like numbers or lists. There is no limit on how many constant patterns you can have:
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)
list_len :: [a] -> Int
list_len [] = 0
list_len lst = 1 + list_len (tail lst)
has_this_many_eyes :: Int -> String
has_this_many_eyes 1 = "Cyclops"
has_this_many_eyes 2 = "Pupper"
has_this_many_eyes 3 = "Sphenodon Punctatus"
has_this_many_eyes 4 = "A nerd with glasses"
has_this_many_eyes n = "A freak!"
Pattern Matching with Tuples
One powerful use-case of pattern matching is with tuples; since we know how many items are in the tuple, we can “match” each item!
-- tuples.hs
-- Original way
exp :: (Int,Int) -> Int
exp t = (fst t) ^ (snd t)
-- Version #2: With pattern matching
expv2 :: (Int,Int) -> Int
expv2 (a,b) = a ^ b
-- Version #3: Adding a constant parameter
expv3 :: (Int,Int) -> Int
expv3 (a,0) = 1
expv3 (a,b) = a ^ b
Here, the advantage of versions 2 and 3 are their conciseness: we didn’t need to think about fst
or snd
!
Earlier, students asked how to extract values out of tuples that have more than 2 elements. With pattern matching, we can do that now!
-- tuples.hs
-- Extract the first value of a tuple
extract_1st :: (type1,type2,type3) -> type1
extract_1st (a,_,_) = a
-- Extract the second value of a tuple
extract_2nd :: (type1,type2,type3) -> type2
extract_2nd (_,b,_) = b
-- Extract the third value of a tuple
extract_3rd :: (type1,type2,type3) -> type3
extract_3rd (_,_,c) = c
ghci> :load tuples
[1 of 1] Compiling Main ( tuples.hs, interpreted )
ghci> extract_3rd ("q",5,1.7)
1.7
The underscore _
is a wildcard: anything can match this item. In particular, note that we don’t even care about its type!
New learners are often tempted to write code like:
tuple_eq (a, a) = True -- WRONG!!!
tuple_eq _ = False
Haskell will complain about “conflicting definitions for 'a'
” on the first line. To compare the two values obtained from pattern matching, they have to be bound to different names and compared afterwards.
-- using guards
tuple_eq (a, b)
| a == b = True
| otherwise = False
-- or, use (==) directly
tuple_eq (a, b) = a == b
This differs from Prolog, a language we’ll explore much later in the course.
You might’ve noticed that _
is the exception to the above rule. Haskell has some special logic for _
– it doesn’t perform an actual binding, so you can use multiple _
in a tuple with no problems!
Pattern Matching with Lists
We can apply similar logic to pattern matching with lists. Here, we’ll rely on the cons operator (:
) to form our structure. In particular, I (Matt) like to think of this as “inverse cons”. Something of the form x:xs
matches the head x
(first item) and the tail xs
(last item); we can think of it as matching the entire list of x : xs
!
-- lists.hs
-- Extract the first item from a list
get_first :: [a] -> a
get_first (x:xs) = x
-- Extract all but the first item of a list
get_rest :: [a] -> [a]
get_rest (x:xs) = xs
-- Extract the second item from a list
get_second :: [a] -> a
get_second (x:y:xs) = y
get_second
comes naturally from our view of pattern matching: we can think of this as “inverse cons” for x : (y : xs)
, which means that x
is the head, and y
is the next item after the head.
ghci> :load lists
[1 of 1] Compiling Main ( lists.hs, interpreted )
ghci> get_first [5,10,15,20]
5
ghci> get_rest [5,10,15,20]
[10,15,20]
ghci> get_second [5,10,15,20]
10
ghci> get_rest [5,6]
[6]
ghci> get_rest [5]
[]
We can also use the constant []
in place of the tail. As you’d expect, this only matches patterns where the tail is exactly the empty list.
-- favs.hs
favorites :: [String] -> String
favorites [] = "You have no favorites."
favorites (x:[]) = "Your favorite is " ++ x
favorites (x:y:[]) = "You have two favorites: "
++ x ++ " and " ++ y
favorites ("chocolate":xs) =
"You have many favs, but chocolate is #1!"
favorites (x:y:_) =
"You have at least three favorites!"
ghci> :load favs
[1 of 1] Compiling Main ( favs.hs, interpreted )
ghci> favorites []
"You have no favorites."
ghci> favorites ["chocolate"]
"Your favorite is chocolate"
ghci> favorites ["mint","earwax"]
"You have two favorites: mint and earwax"
ghci> favorites ["chocolate","egg","dirt"]
"You have many favs, but chocolate is #1!"
ghci> favorites ["tea","coffee","carrot"]
"You have at least three favorites!"
There is another notation that uses brackets ([]
) instead of parens (()
). It’s not used as commonly as the parenthetic version. The core difference here is that it only matches lists of an exact length; each item in the list is just an item (there are no tails, etc.).
-- lists.hs\
whats_in_your_list [10] =
"Your list has just 1 value which is 10."
whats_in_your_list [a,b] = "Your list has 2 values: "
++ (show a) ++ " and " ++ (show b)
whats_in_your_list [9,_,c] =
"Starts with 9 and its third item is:" ++ (show c)
whats_in_your_list _ =
"Your list had something else in it."
ghci> :load lists
[1 of 1] Compiling Main ( lists.hs, interpreted )
ghci> whats_in_your_list [10]
"Your list has just 1 value which is 10."
ghci> whats_in_your_list [10,20]
"Your list has 2 values: 10 and 20"
ghci> whats_in_your_list [9,10,20]
"Starts with 9 and its third item is 20."
ghci> whats_in_your_list [9,10,20,30]
"Your list had something else in it."
Broadly, pattern matching with lists makes our code more concise and readable! Consider our previous guard-based sm0lest
:
sm0lest lst
| lst == [] = error "empty list"
| length lst == 1 = first
| otherwise = min first (sm0lest rest)
where
first = head lst
rest = tail lst
With pattern matching, the logic becomes more compact!
sm0lest [] = error "empty list"
sm0lest [x] = x
sm0lest (x:xs) = min x (sm0lest xs)
The same applies to the following reverse list example:
-- Reverse a list w/o pattern matching
rev lst
| lst == [] = []
| otherwise = (rev rest) ++ [first]
where
first = head lst
rest = tail lst
-- Reverse a list with pattern matching
rev [] = []
rev (x:xs) = (rev xs) ++ [x]
This reverse
algorithm is not optimal; it has O(N^2)
complexity (in the length of the list). Can you figure out an O(N)
algorithm?
Influence
Pattern matching is mostly used in functional languages, but it’s notably been adopted by Python and Rust.
Python added pattern matching very recently (in version 3.10
, released in 2021). Python’s pattern matching works on tuples, lists, dictionaries, and more! Python is a dynamically typed language, so match cases can have different types.
def func(val): # Python pattern matching example
match val:
case (0, x): # match tuple w/first elem zero
print(f"Tuple of 0 and {x}")
case ['a', x, 'c']: # match list w/3 elems: 'a', ??, 'c'
print(f"List of a, {x}, and c")
case [1, 2, *rest]: # seq of: 1, 2, ... other elements
print(f'A list with 1, 2, and *{rest}')
case {'foo': bar}: # match dict w/key 'foo'
print(f"foo maps to {bar} ")
case _:
print('no match')
Rust has almost always had pattern matching. It’s statically typed, so this example looks quite like Haskell. Here’s some pattern matching that Matt literally wrote two days ago:
match vtree_type {
VTreeType::LeftLinear => VTree::left_linear(vars),
VTreeType::RightLinear => VTree::right_linear(vars),
VTreeType::EvenSplit(num) => VTree::even_split(vars, num),
VTreeType::FromDTree => {
let dtree = DTree::from_cnf(&cnf, &VarOrder::linear_order(cnf.num_vars()));
VTree::from_dtree(&dtree).unwrap()
}
}