ASSIGNMENT 3
1. 14624761 - Trung Duong Nguyen
2. 63842660 - Syifa Nadhira Nurul Izzah
3. 19722024 - Hussein Al Aaref
1. One important component of problem-solving is to extract general insights from specific mistakes.
Find one problem that you solved incorrectly on a previous assignment. In one paragraph, describe the
specific mistake you made. In a second paragraph, describe a general modification you can make to your
problem-solving approach to avoid making similar mistakes in the future.
(If you achieved a perfect score on all of your previous assignments, describe a general part of your
problem-solving approach that helps you do so well!)
In the last assignment, we made a mistake in question 3a. Although our math and reasoning were correct, we did not use the word "derivative" clearly enough to make it evident that we were calculating the derivative of p(t), rather than its limit. Specifically, in the first paragraph, we wrote “We want to firstly calculate $\lim_{x \to T^-} p(t)$” without using the word “derivative”, which gave the impression that we were on the wrong track.
To avoid this kind of mistake, we now set aside time to re-read our written answers as many times as possible, looking for any sentence that could cause a misunderstanding. We also take note of how the professor phrases things in both the small class and the big class, and use that wording as a reference when writing upcoming assignments.
The rectifier function
$$r(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}$$
is used in artificial neural networks to model the firing of neurons. However, r(x) is not differentiable at 0.
Differentiability can improve the stability and performance of neural networks. Two common differentiable
approximations to r(x) are the softplus function
$$p(x) = \log(1 + e^x)$$
and the swish function
$$s(x) = \frac{x}{1 + e^{-x}}.$$
In this assignment, you may use without proof the facts that p(x) > r(x) and s(x) ≤ r(x) for all x, and p(x)
and r(x) are both continuous.
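To make these definitions concrete, here is a small Python sketch of r, p and s (our own illustration, not part of the assignment). It assumes "log" means the natural logarithm, and it evaluates p and s in algebraically equivalent forms chosen so that `math.exp` never overflows for large |x|. It then spot-checks the two facts we are allowed to assume on a finite grid over [−20, 20]; the grid is an arbitrary choice, and this is a sanity check rather than a proof.

```python
import math

def r(x: float) -> float:
    """Rectifier: 0 for x < 0, x for x >= 0."""
    return x if x >= 0 else 0.0

def p(x: float) -> float:
    """Softplus log(1 + e^x) with the natural log, in an overflow-safe form."""
    # For x > 0, log(1 + e^x) = x + log(1 + e^-x), so we never compute a huge e^x.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def s(x: float) -> float:
    """Swish x / (1 + e^-x), in an overflow-safe form."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    # For x < 0, use the equivalent form x e^x / (e^x + 1) so e^-x is never huge.
    ex = math.exp(x)
    return x * ex / (ex + 1.0)

# Spot-check p(x) > r(x) and s(x) <= r(x) on the grid -20, -19.5, ..., 20.
grid = [i / 2 for i in range(-40, 41)]
print(all(p(x) > r(x) for x in grid))   # True
print(all(s(x) <= r(x) for x in grid))  # True
```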
2. (a) Explain why p(x) approximates r(x) well for large (positive and negative) values of x.
i. For large positive values of x, the 1 is negligible because $e^x$ is much larger than 1. Therefore $p(x) \approx \log(e^x) = x$, which is also $r(x)$, since $r(x) = x$ for all $x > 0$.
ii. For large negative values of x, $e^x$ is negligible compared to 1, since $e^x$ approaches 0 as $x \to -\infty$. Therefore $p(x) \approx \log(1) = 0$, which is also $r(x)$, since $r(x) = 0$ for all $x < 0$.
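The argument in (a) can be made exact: for x > 0, log(1 + e^x) − x = log(1 + e^{−x}), so in both cases p(x) − r(x) = log(1 + e^{−|x|}), which shrinks to 0 as |x| grows. A minimal Python check of this identity, assuming the natural logarithm; the sample points are arbitrary:

```python
import math

def r(x: float) -> float:
    return max(0.0, x)

def p(x: float) -> float:
    # Softplus log(1 + e^x), rewritten as x + log(1 + e^-x) for x > 0 to avoid overflow.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

for x in [-20.0, -10.0, -5.0, 5.0, 10.0, 20.0]:
    gap = p(x) - r(x)
    closed_form = math.log1p(math.exp(-abs(x)))  # log(1 + e^{-|x|})
    print(f"x = {x:6.1f}   p(x) - r(x) = {gap:.3e}   log(1 + e^-|x|) = {closed_form:.3e}")
```

The gap is the same for x and −x, which matches the symmetric behaviour described in (i) and (ii).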
(b) Explain why s(x) approximates r(x) well for large (positive and negative) values of x.
We first use algebra to eliminate the negative exponent:
$$s(x) = \frac{x}{1 + e^{-x}} = \frac{x}{1 + \frac{1}{e^x}} = \frac{x}{\frac{e^x + 1}{e^x}} = \frac{x e^x}{e^x + 1}$$
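Because the rewritten form is only an algebraic rearrangement, the two expressions for s(x) should agree at every x. A quick numerical spot-check in Python (our own illustration; agreement at a few floating-point sample points supports the algebra but does not prove it):

```python
import math

def swish_original(x: float) -> float:
    # s(x) = x / (1 + e^{-x}), as given
    return x / (1.0 + math.exp(-x))

def swish_rewritten(x: float) -> float:
    # s(x) = x e^x / (e^x + 1), after eliminating the negative exponent
    ex = math.exp(x)
    return x * ex / (ex + 1.0)

for x in [-10.0, -1.0, 0.0, 0.5, 3.0, 10.0]:
    assert math.isclose(swish_original(x), swish_rewritten(x), rel_tol=1e-12, abs_tol=1e-15)
print("both forms agree on the sample points")
```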
i. For large positive values of x, the 1 in the denominator can be ignored, since it is very small compared to $e^x$. The denominator then reduces to $e^x$, which cancels with the $e^x$ in the numerator. Therefore $s(x) \approx x$ for large positive x, which is also $r(x)$, since $r(x) = x$ for all $x > 0$.
ii. For large negative values of x, the denominator approaches 1 (since $e^x$ approaches 0). The numerator $x e^x$ also approaches 0: although $|x|$ grows, $e^x$ decays to 0 much faster than $|x|$ grows. Combined, $s(x)$ approaches $\frac{0}{1} = 0$, which is $r(x)$, since $r(x) = 0$ for all $x < 0$.
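Putting (i) and (ii) together, one can check that the error is exactly r(x) − s(x) = |x| / (1 + e^{|x|}) for every x, which tends to 0 as |x| → ∞ because e^{|x|} grows much faster than |x|. A small Python sketch comparing the direct difference with this closed form (our own illustration; `gap` is a hypothetical helper name, and s uses the rewritten form for x < 0 to avoid overflow):

```python
import math

def r(x: float) -> float:
    return max(0.0, x)

def s(x: float) -> float:
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    ex = math.exp(x)  # rewritten form x e^x / (e^x + 1) for x < 0
    return x * ex / (ex + 1.0)

def gap(x: float) -> float:
    # Closed form of r(x) - s(x), valid for all x.
    return abs(x) / (1.0 + math.exp(abs(x)))

for x in [-40.0, -20.0, -5.0, 5.0, 20.0, 40.0]:
    print(f"x = {x:6.1f}   r(x) - s(x) = {r(x) - s(x):.3e}   closed form = {gap(x):.3e}")
```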
3. Where is p(x) the worst approximation to r(x)? In other words, where is the vertical distance between
the two functions maximized?
Firstly, we define the function f(x) = p(x) − r(x). From the definitions of r(x) and p(x),
$$f(x) = \begin{cases} \log(1 + e^x) & \text{if } x < 0 \\ \log(1 + e^x) - x & \text{if } x \ge 0 \end{cases} \qquad (1)$$
Since p(x) > r(x) for all x, f(x) is positive and equals the vertical distance between p(x) and r(x). To find the maximum of f, we examine its derivative:
$$f'(x) = \begin{cases} \dfrac{e^x}{e^x + 1} & \text{if } x < 0 \\[6pt] -\dfrac{1}{e^x + 1} & \text{if } x > 0 \end{cases} \qquad (2)$$
Firstly, we notice that f′(x) > 0 when x < 0 and f′(x) < 0 when x > 0, because $e^x > 0$ for all x.
Secondly, f′(0) does not exist, because f(x) has a corner (cusp) at x = 0: the derivative approaches $\frac{e^0}{e^0 + 1} = \frac{1}{2}$ from the left and $-\frac{1}{e^0 + 1} = -\frac{1}{2}$ from the right.
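The corner at x = 0 can also be seen numerically: one-sided difference quotients of f at 0 should approach 1/2 from the left and −1/2 from the right, matching the two branches of (2) evaluated at 0. A minimal Python sketch, assuming the natural logarithm; the names `f`, `f_prime` and the step `h` are our own:

```python
import math

def f(x: float) -> float:
    """Vertical distance p(x) - r(x), piecewise as in (1)."""
    if x < 0:
        return math.log(1.0 + math.exp(x))
    return math.log(1.0 + math.exp(x)) - x

def f_prime(x: float) -> float:
    """Derivative from (2); undefined at x = 0."""
    if x < 0:
        return math.exp(x) / (math.exp(x) + 1.0)
    if x > 0:
        return -1.0 / (math.exp(x) + 1.0)
    raise ValueError("f'(0) does not exist")

h = 1e-6
left = (f(0.0) - f(-h)) / h   # backward difference, approaches 1/2
right = (f(h) - f(0.0)) / h   # forward difference, approaches -1/2
print(left, right)
```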
Hence there is no x with f′(x) = 0; in other words, f′(x) ≠ 0 wherever it is defined. The only critical point is therefore x = 0, a singular point where f′(x) is undefined (it is also the point where the two pieces of f meet). We made a table to examine the behaviour of f′(x) and f(x) around x = 0:
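| x     | x < 0      | x = 0        | x > 0      |
|-------|------------|--------------|------------|
| f′(x) | +          | DNE          | −          |
| f(x)  | increasing | f(0) = log 2 | decreasing |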
From the table, x = 0 is a local maximum, since f is increasing to its left and decreasing to its right. Because f is increasing on the whole interval (−∞, 0) and decreasing on the whole interval (0, ∞), and there is no other critical point, this is also the global maximum. Therefore, the vertical distance between p(x) and r(x) is maximized at x = 0, where it equals f(0) = log 2 ≈ 0.693.
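As a sanity check (not a substitute for the argument above), a brute-force search over a finite grid should also place the maximum of f(x) = p(x) − r(x) at x = 0 with value log 2. A short Python sketch, assuming the natural logarithm; the grid range [−10, 10] and spacing 0.01 are arbitrary choices:

```python
import math

def f(x: float) -> float:
    # p(x) - r(x); natural log, consistent with the derivative e^x / (e^x + 1) in (2)
    return math.log(1.0 + math.exp(x)) - max(0.0, x)

xs = [i / 100 for i in range(-1000, 1001)]  # grid on [-10, 10] with step 0.01
x_best = max(xs, key=f)
print(x_best, f(x_best), math.log(2))  # 0.0 0.6931471805599453 0.6931471805599453
```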
4. (a) On the interval (−∞, 0), where is s(x) the worst approximation to r(x)? You may not be able to
determine an exact x-value, but find the integer a < 0 such that s(x) is the worst approximation to
r(x) somewhere in the interval [a, a + 1].