
Published On: 6/10/2022

by Bassim Eledath

## Motivation

This blog post is about writing better code. In particular, we focus on methods commonly espoused by functional programmers. A lot of the subsequent ideas presented are inspired by the works of Eric Normand and Robert Cecil Martin, and by my own share of recent trials and tribulations involved with refactoring legacy code.

### Why write better code?

Primarily, we write better code to reduce technical debt. Technical debt is the implied cost of additional rework caused by coding an easy solution [1]. It’s essentially a trade-off: you can build a quick ad hoc solution and spend more time debugging and modifying the code in the future, or spend more time on a robust solution now and less time debugging and modifying it later.

We refactor or rewrite existing code when technical debt is high enough. Such a decision is in no way trivial, though an oversimplification (as shown below) might be helpful in thinking through it.

$$
\begin{aligned}
T_d &= \text{Estimated technical debt (hrs)}\\
T_r &= \text{Estimated time to refactor (hrs)}\\
T_{diff} &= T_d - T_r\\
\mathbf{1}_{refactor}(T_{diff}) &:= \begin{cases}1 & \text{if } T_{diff} > 0,\\ 0 & \text{if } T_{diff} \le 0\end{cases}
\end{aligned}
$$
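The indicator above translates directly into a small helper. This is only a sketch: the function name should_refactor() and the hour estimates in the calls are hypothetical and chosen purely for illustration.

```r
# Hypothetical helper implementing the refactor indicator.
# t_debt:     estimated technical debt (hrs)
# t_refactor: estimated time to refactor (hrs)
should_refactor <- function(t_debt, t_refactor) {
  t_diff <- t_debt - t_refactor
  as.integer(t_diff > 0)
}

should_refactor(t_debt = 40, t_refactor = 25)
## [1] 1
should_refactor(t_debt = 10, t_refactor = 25)
## [1] 0
```

In practice both estimates are uncertain, so the result is better read as a prompt for discussion than as a verdict.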

### A case study: Introducing NoviParlor

We shall now look at a simple case study to illustrate some of the best practices of functional programmers. Introducing NoviParlor: Durham’s newest ice cream shop. You are tasked with writing code that calculates the final price of placed orders. The final price takes into account:

1. Number of scoops
2. Ice cream flavors: Vanilla, Chocolate, Gooey Cake and Haskell. All flavors have different prices associated with them.
3. Toppings: Dry, wet or gold.
4. Discounts: NoviSci employees get a 20% discount. Statisticians at NoviSci get a 30% discount (don’t ask me why). And orders over $20 get a 10% discount.

The data we are given as input follows the list structure shown below.

```r
# input data
inp_data <- list(
  buyer = list(
    novi_employee = TRUE,
    statistician = FALSE
  ),
  scoops = list(
    vanilla = 1,
    chocolate = 2,
    gooey_cake = 0.5,
    haskell = 1
  ),
  toppings = list(
    dry = 1,
    wet = 0,
    gold = 1
  ),
  scoops_price = list(
    vanilla = 2,
    chocolate = 2,
    gooey_cake = 3,
    haskell = 5
  ),
  toppings_price = list(
    dry = 0.5,
    wet = 0.25,
    gold = 10
  )
)
```

Now let’s look at some hastily written, though working, code. This is meant to represent someone’s ad hoc way of coding up the solution.

```r
# global var total initialized to 0
total <- 0

calculate_total <- function(inp_data) {
  # get dot product of scoops and scoop prices by flavor and add to total
  price_scoops <- c()
  for (i in 1:length(inp_data$scoops)) {
    price_scoops <- append(price_scoops, inp_data$scoops[[i]] * inp_data$scoops_price[[i]])
  }
  total <- total + sum(price_scoops)

  # get dot product of toppings and topping prices by flavor and add to total
  price_toppings <- c()
  for (i in 1:length(inp_data$toppings)) {
    price_toppings <- append(price_toppings, inp_data$toppings[[i]] * inp_data$toppings_price[[i]])
  }
  total <- total + sum(price_toppings)

  # apply discount with flag arguments
  ApplyDiscount <- function(novi_employee, statistician, over_20) {
    if (novi_employee) {
      if (statistician) {
        total <- total * 0.7
      } else {
        total <- total * 0.8
      }
    }
    if (over_20) {
      total <- total * 0.9
    }
    return(total)
  }

  # apply fn ApplyDiscount() to total price before discount
  total <- ApplyDiscount(
    novi_employee = inp_data$buyer$novi_employee,
    statistician = inp_data$buyer$statistician,
    over_20 = total > 20
  )
  return(total)
}

calculate_total(inp_data)
## [1] 16.56
```

While the code gives the right output, there are a few problems with how it was written that stick out. Let’s address these by diving into the next section: the 5 tenets of functional programming.

## 5 tenets of functional programming

### Tenet 1: Avoid a global mutable state

One of the biggest challenges in programming is working with a global mutable state. But what is it exactly? A global variable is one that is accessible from any other point in your code. Mutable means that it can be changed. The state of a variable is its value at a given point in the program’s execution, which depends on when and where the code runs.

In the code example above, “total” is a global variable, and its value is modified in multiple places in the code, i.e. it does not maintain state. We want to avoid having such a variable. And if we need to have a variable accessed several times, we wish to limit the scope of that variable as much as we can.

By removing these variables our code becomes more predictable: we don’t have to guess what the value of a variable is, as (1) it’s not accessed in multiple places and (2) its value doesn’t change. By modifying “total” frequently, we also make our code depend heavily on the order of its statements, which can easily break when modifying or adding code. Here’s a silly example demonstrating this.
```r
x <- "You're hired!"
print(x)
## [1] "You're hired!"
x <- "You're fired. Go back home."
```

Now switch the order of the print statement.

```r
x <- "You're hired!"
x <- "You're fired. Go back home."
print(x)
## [1] "You're fired. Go back home."
```

When modifying “x” in-place, the order in which we call “print(x)” makes all the unfortunate difference. In Functional Programming (FP), modifying a variable after it’s been initialized is heresy.

### Tenet 2: Minimize side-effects

Side effects are anything a function does other than returning a value. A pure function is a function with absolutely no side-effects, and the return value is only determined by its input values. This is a strict definition and below we show three cases:

1. A pure function that maps the input directly to the output.
2. A function that is not pure as it prints “x”, which is a side effect.
3. A function that references a variable outside its own scope. It is not pure either, because its return value no longer depends only on its inputs.

```r
pure_function <- function(x) {
  return(x^2)
}

side_effect <- function(x) {
  print(x)
  return(x^2)
}

y <- 2
side_effect2 <- function(x) {
  return(y^2)
}
```

Side effects are usually inevitable (in practice) but we must try to minimize them as they introduce more places where code can break. Going back to our NoviParlor code, we see many instances of side-effects. We have side effects in the form of variable assignments (like “price_scoops”) that can be eliminated or decomposed into smaller functions.

### Tenet 3: Avoid flag arguments

A flag argument is a function argument that tells the function to carry out a different operation depending on its value [2]. Flag arguments reduce the cohesion of a function, so we try to avoid them. In example1() below we see a flag argument that changes the behavior of the function. It looks harmless but things get messy when (1) the flag argument can take on many possible values and (2) more flag arguments are introduced (as in example2()).
```r
flag_argument <- "green"
flag_argument2 <- "red"

# messy code
example1 <- function(flag_argument) {
  if (flag_argument == "green") {
    print("green")
  } else if (flag_argument == "red") {
    print("red")
  } else {
    print("no color")
  }
}

# even messier code
example2 <- function(flag_argument, flag_argument2) {
  if (flag_argument == "green" & flag_argument2 == "red") {
    print("green red")
  } else if (flag_argument == "red" & flag_argument2 == "green") {
    print("red green")
  } else if (flag_argument == "green" & flag_argument2 == "green") {
    print("green green")
  } else {
    print("red red")
  }
}
```

In our poorly written code example (above), ApplyDiscount() has three flag arguments: novi_employee, statistician and over_20.

To better address tenets 4 and 5, let’s look at some well written code.

```r
library(magrittr) # provides the %>% pipe

# abstraction
calculate_total_v2 <- function(data) {
  ## constants
  items <- c(data$scoops, data$toppings)
  prices <- c(data$scoops_price, data$toppings_price)

  return(
    calc_dot_prod(items, prices) %>%
      apply_discount(., data)
  )
}

# actions
apply_discount <- function(total, data) {
  ## bool constants
  novi_employee <- data$buyer$novi_employee
  statistician <- novi_employee && data$buyer$statistician
  order_20 <- total > 20

  if (statistician) {
    total <- discount_stat_novisci(total)
  } else if (novi_employee) {
    total <- discount_novisci(total)
  }

  if (order_20) {
    total <- discount_order_total(total)
  }

  return(total)
}

# calculations
discount_novisci <- function(total) {
  return(total * 0.8)
}

discount_stat_novisci <- function(total) {
  return(total * 0.7)
}

discount_order_total <- function(total) {
  return(total * 0.9)
}

calc_dot_prod <- function(item, price) {
  return(
    purrr::map2(
      .x = item,
      .y = price,
      .f = ~ {
        sum(.x * .y)
      }
    ) %>% purrr::reduce(`+`)
  )
}

calculate_total_v2(inp_data)
## [1] 16.56
```

### Tenet 4: Separate actions, calculations and data

A good functional programmer, according to Eric Normand, separates actions, calculations and data. Let’s define these terms individually.

1. Actions: An action is anything that depends on when it is called or how many times it is called [3]. Actions are simply functions with side-effects. In our code, apply_discount() is an action as we can only apply the discount after calculating the total, i.e. it depends on the state of the program. Actions are usually unavoidable in our code but we can try to minimize how many we end up using. It’s useful to think of actions as “code that needs the most attention”, as side effects introduce more opportunities for code to break. Bottom line: we have to be extra careful with them.

2. Calculations: Calculations are simply pure functions. They are 100% deterministic and don’t affect the world when they run. In our revised code discount_novisci(), discount_stat_novisci(), discount_order_total() and calc_dot_prod() are all calculations. Calculations are also really easy to test, as all it takes is to pass in known inputs and confirm the (expected) result. Calculations are your friends.

3. Data: Formally, data are facts about events [3]. We want data to be as immutable as possible. If not, our code will behave in unpredictable ways. In our well written code example, we maintain the state of the input data and do not modify it in-place.

In my experience, thinking of code in terms of actions, calculations and data not only improves code organization, it also significantly improves code readability.
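The three categories can be sketched side by side. The names order, order_price() and stamp_order() below are hypothetical and not part of the NoviParlor code; the example only assumes base R.

```r
# Data: a plain list of facts about an order (illustrative values).
order <- list(scoops = 2, price_per_scoop = 3)

# Calculation: the result depends only on the arguments.
order_price <- function(order) {
  order$scoops * order$price_per_scoop
}

# Action: the result depends on when it is called (it reads the system clock).
stamp_order <- function(order) {
  order$placed_at <- Sys.time()
  order
}

# Calculations are easy to test: fixed input, fixed expected output.
stopifnot(order_price(order) == 6)

# The list is modified only inside the function, so the caller's data is untouched.
stamped <- stamp_order(order)
stopifnot(is.null(order$placed_at), !is.null(stamped$placed_at))
```

Testing stamp_order() is harder than testing order_price() because its output changes from call to call, which is the practical reason to keep actions few and small.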

### Tenet 5: Abstract!

Abstraction might be controversial to bring up here as it is commonly associated with Object Oriented Programming (OOP). But FP can be extremely helpful in managing complexity, and abstraction is one way of doing so. Not to mention it’s used all the time.

Abstraction, for the unacquainted, is taking a specific problem and making a general solution for it. To drive a car, you do not need to know how the engine works - you only need to learn how to utilize configurable parameters (steering wheel, gear etc.) that depend on the lower level components such as the engine. Of course, a balance is required as abstracting away too much functionality leaves us with a system that is harder to modify. Generally we keep configurable data at high levels. In our well written code example, we use the function apply_discount() that depends on lower level calculations to get the price after discount.
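One way to picture “configurable data at high levels” is to pull the discount rates out of the functions and into a single list. This is a sketch rather than the code used in this post: discount_rates, apply_rate() and discounted_price() are hypothetical names, and only the rates themselves come from the case study.

```r
# Configurable data lives at the top level (rates from the case study).
discount_rates <- list(
  novi_employee = 0.20,
  statistician = 0.30,
  large_order = 0.10
)

# Low-level detail: how a single percentage discount is applied.
apply_rate <- function(total, rate) {
  total * (1 - rate)
}

# High-level function: callers only choose which discount applies.
discounted_price <- function(total, discount, rates = discount_rates) {
  apply_rate(total, rates[[discount]])
}

# 23 is the pre-discount total of the example order.
employee_price <- discounted_price(23, "novi_employee")
stopifnot(isTRUE(all.equal(employee_price, 18.4)))
```

Changing a rate now means editing one entry in discount_rates, while the lower-level apply_rate() stays untouched.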

### Bonus tenet: Use higher-order functions

Languages with strong functional features, like R, have several built-in higher-order functions that make life easier. A higher-order function is a function that takes a function as an argument, or returns a function.
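Both halves of that definition can be shown in a few lines of base R. The names make_discount(), half_off() and apply_to_totals() are hypothetical and exist only for this illustration.

```r
# Returns a function: a small discount "factory".
make_discount <- function(rate) {
  function(total) total * (1 - rate)
}

half_off <- make_discount(0.5)
half_off(10)
## [1] 5

# Takes a function as an argument: apply any pricing rule to several totals.
apply_to_totals <- function(totals, f) {
  vapply(totals, f, numeric(1))
}

apply_to_totals(c(10, 20), half_off)
## [1]  5 10
```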

Remember the for loops used in the bad code example earlier? In the well written code example, we use map2() and reduce() (from the purrr package) to find the dot product of two vectors and so calculate the total price of an ice cream order (before discount). These functions replace explicit loops with a declarative mapping, which is easier to understand and reduces duplication of code. They still iterate element by element, though, so for plain numeric vectors R’s vectorized arithmetic, such as sum(x * y), is generally faster.
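As a rough comparison, the scoop subtotal from the case study can be computed three ways. This sketch uses base R’s Map() and Reduce() as stand-ins for purrr’s map2() and reduce() so that it runs without extra packages; the variable names are illustrative.

```r
scoops <- c(vanilla = 1, chocolate = 2, gooey_cake = 0.5, haskell = 1)
prices <- c(vanilla = 2, chocolate = 2, gooey_cake = 3, haskell = 5)

# Loop version, in the spirit of the ad hoc code.
total_loop <- 0
for (i in seq_along(scoops)) {
  total_loop <- total_loop + scoops[[i]] * prices[[i]]
}

# Higher-order version: multiply pairwise, then fold the products with `+`.
total_hof <- Reduce(`+`, Map(`*`, scoops, prices))

# Vectorized arithmetic on the numeric vectors directly.
total_vec <- sum(scoops * prices)

# All three give the scoop subtotal of the example order.
stopifnot(total_loop == 12.5, total_hof == 12.5, total_vec == 12.5)
```

The higher-order and vectorized versions state what is being computed without any index bookkeeping, which is the main readability gain over the loop.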

## Conclusion

The real beauty of functional programming is that it deals with beneficial, universal coding practices. The tenets discussed in this blog can be used regardless of what programming paradigm (object oriented, procedural etc.) you choose. Note, the “tenets” discussed in this post are by no means universally agreed upon and are also by no means hard rules to follow as exceptions always exist. They are merely perspectives that could help you build robust and maintainable code.

## References

1. Techopedia. What is technical debt? - definition from techopedia. Techopedia.com (2017).
2. PSA: Avoid flag arguments. ClearSlide (2016).
3. Normand, E. Grokking simplicity: Taming complex software with functional thinking. (Manning, 2021).
