FRTN65
Exam Aug 2023

  • Due No Due Date
  • Points 50
  • Submitting a file upload

Solutions are available here: exam2023Aug_solutions.pdf

----------

Instructions:

Name files handed in to Canvas using your anonymization code, e.g. NR.zip or NR-problem1.pdf (this facilitates matching any handwritten solutions to the correct person in Canvas during the anonymized exam grading). Any handwritten solutions should be marked with both your anonymization code and a personal identifier chosen by you.

All solutions must be well motivated. Code should be understandable; comments will help.

Preliminary limits for grades: 3: 25, 4: 33, 5: 42.

Good luck!

------------------------------------------------------------------------------------------------------------------------

1. Dimensional Analysis [6p]

When a flowing liquid breaks up into small droplets, such as when using a spray can, the resulting droplet diameter $d$ [m] is assumed to depend on

  • liquid velocity $v$ [m/s]
  • density $\rho$ [kg/m^3]
  • viscosity $\mu$ [kg/(m*s)]
  • surface tension $\sigma$ [kg/s^2]
  • outlet diameter $D$ [m]

Show that dimensional analysis indicates a relationship of the form

$\frac{d}{D} = f(\Pi_1, \Pi_2)$

with two dimensionless variables $\Pi_1$ and $\Pi_2$ (that depend on $v, \rho, \mu, \sigma, D$, but do not depend on $d$).

(You should not aim to determine the function $f$; it needs to be found from experiments.)
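
As a non-authoritative sketch of where the question leads (the full solution is in the linked PDF): with the six quantities $d, v, \rho, \mu, \sigma, D$ and the three base dimensions kg, m and s, the Buckingham Pi theorem gives $6 - 3 = 3$ dimensionless groups, one of which can be taken as $d/D$. One common (non-unique) choice for the remaining two is

% One possible choice of dimensionless groups (not unique)
\Pi_1 = \frac{\rho v D}{\mu}       % Reynolds number: inertial vs. viscous forces
\qquad
\Pi_2 = \frac{\rho v^2 D}{\sigma}  % Weber number: inertial vs. surface-tension forces

Both groups are built from $v, \rho, \mu, \sigma, D$ only, consistent with the requested form $d/D = f(\Pi_1, \Pi_2)$.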

------------------------------------------------------------------------------------------------------------------------

2. Logistic Regression for Classification [6p]

(In this problem you do not need to write any code; it can be solved with pen and paper.)

The data in the picture below is used to fit a logistic regression model

$\mathrm{Prob}(y=1 \mid x, w) = \sigma(w_0 + w_1 x_1 + w_2 x_2)$

where $\sigma(z) = 1/(1+\exp(-z))$ denotes the sigmoid function.

[Figure: examlogreg-1.png]

The two input features are $x_1$ and $x_2$. The output is known to be $y=0$ for the red samples (o) and $y=1$ for the blue samples (+).

a) Explain briefly how one can fit the model above by maximum likelihood, i.e. find the weight vector $w = [w_0; w_1; w_2]$ through minimizing a certain loss function

$J(w) = -L(w, D_{\mathrm{train}})$

Describe $L$. Also describe how the obtained model can be used for classification of future data $x$.
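
For reference, a standard way to write the log-likelihood $L$ for this type of model (a sketch only, using training pairs $(x^{(i)}, y^{(i)})$, $i = 1, \ldots, n$; the exact notation is not prescribed by the problem) is

% Bernoulli log-likelihood of the logistic model over the training set,
% with p_i = \sigma(w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)})
L(w, D_{\mathrm{train}}) = \sum_{i=1}^{n} \Big[ y^{(i)} \log p_i + (1 - y^{(i)}) \log (1 - p_i) \Big]

so that $J(w) = -L(w, D_{\mathrm{train}})$ is the familiar cross-entropy loss; a future input $x$ is then typically classified as $y=1$ when the predicted probability $\sigma(w_0 + w_1 x_1 + w_2 x_2)$ exceeds 1/2.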

b) Sketch a possible decision boundary corresponding to a good weight vector $w$, when trained as in a) on the data in the figure. How many classification errors does your method make on the training set? (You can make a copy of the figure and draw the decision line by hand.)

c) Now suppose we regularize the $w_0$ parameter, i.e. we minimize

$J_0(w) = -L(w, D_{\mathrm{train}}) + \lambda w_0^2$

Assume $\lambda$ is a very large number, so $w_0$ is regularized down to 0, but the other parameters are unregularized. Sketch a possible decision boundary. How many classification errors does your method make on the training data?

d) Now suppose we instead heavily regularize the $w_1$ parameter, i.e. we minimize

$J_1(w) = -L(w, D_{\mathrm{train}}) + \lambda w_1^2$

Sketch a possible decision boundary. How many classification errors does your method make on the training data?

e) Same question as in c) and d), but with heavy regularization on the $w_2$ parameter:

$J_2(w) = -L(w, D_{\mathrm{train}}) + \lambda w_2^2$

Sketch the decision boundary and determine the number of classification errors on the training data.
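
A geometric fact that may help when sketching (standard for linear logistic regression, stated here as a hint rather than as part of the problem): the decision boundary is the line where the predicted probability equals 1/2,

% Decision boundary of the model above (predicted probability = 1/2)
w_0 + w_1 x_1 + w_2 x_2 = 0
% w_0 = 0  forces the line through the origin,
% w_1 = 0  makes the line horizontal (x_2 = -w_0 / w_2),
% w_2 = 0  makes the line vertical   (x_1 = -w_0 / w_1).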

------------------------------------------------------------------------------------------------------------------------

3. Supervised Learning - linear and nonlinear regression [12p]

This Google Colab file (external link) studies prediction of the strength of concrete based on 8 different input features. In total 824 different experiments have been performed. The data is given in the form of an 824×9 matrix, where the first 8 columns contain the input features and the last column is the output, a numerical value $y$ related to the concrete strength. The goal is to predict $y$ from the corresponding input features.

The file contains two initial attempts: a linear regression achieving a mean absolute prediction error (MAE) of around 8, and a kernel-based linear regression with Tikhonov regularization (so-called kernel ridge regression) giving an MAE of around 13.

a) Improve the code and the prediction performance (accuracy) of the Kernel Ridge method.

b) Also try at least one more method, of your own choice.
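
The Colab notebook itself is not reproduced on this page, so the following is only a hedged sketch of one way to approach a): assuming the data has already been loaded into an 824×9 NumPy array called data (that name, and the train/test split, are illustrative assumptions), feature scaling combined with a cross-validated choice of kernel hyperparameters typically improves the kernel ridge baseline considerably.

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_absolute_error

# Assumption: `data` is the 824x9 matrix from the notebook,
# first 8 columns = input features, last column = output y.
X, y = data[:, :8], data[:, 8]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale the features before the RBF kernel, then cross-validate alpha and gamma.
model = make_pipeline(StandardScaler(), KernelRidge(kernel="rbf"))
param_grid = {
    "kernelridge__alpha": [1e-3, 1e-2, 1e-1, 1.0],
    "kernelridge__gamma": [1e-3, 1e-2, 1e-1, 1.0],
}
search = GridSearchCV(model, param_grid, scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("test MAE:", mean_absolute_error(y_test, search.best_estimator_.predict(X_test)))

For b), tree-based regressors such as sklearn's RandomForestRegressor or HistGradientBoostingRegressor are natural additional methods to compare against in the same train/test setup.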

------------------------------------------------------------------------------------------------------------------------

4. Causal Inference [8p]

The following linear equations describe a structural causal model.

A = na
X = c1*A + nx
Y = c2*X + ny
B = c3*A + c4*Y + nb
C = c5*Y + nc

Here na, nx, ny, nb, nc are independent N(0,1) random variables and c1, c2, c3, c4, c5 are unknown coefficients. In this problem we are interested in the causal effect of X on Y.

a) Draw a DAG describing the model, and determine $\frac{\partial}{\partial x} E[Y \mid \mathbf{do}(X:=x)]$, i.e. the causal effect of X on Y (as a function of c1, c2, c3, c4, c5).

Which of the following ordinary least squares identifications will give a correct estimate of this causal effect (found as the estimated coefficient in front of the X variable)?

b) Y ~ X

c) Y ~ X + A

d) Y ~ X + B

e) Y ~ X + C

f) Y ~ X + A + B

g) In all the suggested LS models above, a bias (intercept) term is included in the estimation. Is this a problem for correct estimation of the causal effect? Note that no such bias term was present in the true structural causal model.

Motivate your answers.

(Feel free to generate data yourself to verify your conclusions)
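
Following that suggestion, here is a minimal simulation sketch (the coefficient values and sample size are arbitrary assumptions chosen only for illustration) that draws samples from the structural equations and reports the OLS coefficient on X for the regressions in b)-f):

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
c1, c2, c3, c4, c5 = 0.7, 1.5, -0.4, 0.8, 1.2  # arbitrary example coefficients

# Simulate the structural causal model
na, nx, ny, nb, nc = rng.standard_normal((5, n))
A = na
X = c1 * A + nx
Y = c2 * X + ny
B = c3 * A + c4 * Y + nb
C = c5 * Y + nc

def ols_coef_on_x(*extra_regressors):
    """OLS of Y on [1, X, extra regressors]; returns the coefficient on X."""
    Phi = np.column_stack([np.ones(n), X, *extra_regressors])
    beta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return beta[1]

print("true causal effect (d/dx) E[Y | do(X:=x)]:", c2)
print("b) Y ~ X         :", ols_coef_on_x())
print("c) Y ~ X + A     :", ols_coef_on_x(A))
print("d) Y ~ X + B     :", ols_coef_on_x(B))
print("e) Y ~ X + C     :", ols_coef_on_x(C))
print("f) Y ~ X + A + B :", ols_coef_on_x(A, B))

Comparing each printed coefficient with c2 indicates which adjustment sets recover the causal effect in this particular parameterization.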

------------------------------------------------------------------------------------------------------------------------

5. System Identification [12p]

The file sysiddata230817.mat contains some data from a linear system with one input u and one output y, sampled at the rate h = 0.01.

The code sysidproblem230817.m contains an initial investigation of the data and some not-so-successful identification of ARX models.

Identify a discrete-time linear model of the system. Aim for using few model parameters. Be sure to describe your methodology, including outlier analysis, choice of a suitable model structure and model order, and include model validation with residual analysis. Also hand in your MATLAB code.

(Hint: Useful commands might include help ident, systemIdentification, arx, oe, armax, bj, present, compare, resid, bodeplot, pzmap, ...)
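
As a language-neutral illustration only (the exam asks for MATLAB code, and the System Identification Toolbox commands in the hint are the intended tools; the variable names u and y inside the .mat file are assumptions here), an ARX model can in principle be fitted by ordinary least squares on lagged outputs and inputs:

import numpy as np
from scipy.io import loadmat

# Assumption: the file stores the signals as arrays named 'u' and 'y'.
data = loadmat("sysiddata230817.mat")
u = data["u"].ravel()
y = data["y"].ravel()

def fit_arx(y, u, na, nb, nk=1):
    """Least-squares fit of an ARX(na, nb, nk) model:
    y[t] + a1*y[t-1] + ... + a_na*y[t-na] = b1*u[t-nk] + ... + b_nb*u[t-nk-nb+1] + e[t]
    Returns the parameter vector [a1..a_na, b1..b_nb] and the one-step residuals."""
    start = max(na, nb + nk - 1)
    rows = []
    for t in range(start, len(y)):
        past_y = [-y[t - i] for i in range(1, na + 1)]
        past_u = [u[t - nk - j] for j in range(nb)]
        rows.append(past_y + past_u)
    Phi = np.asarray(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[start:], rcond=None)
    residuals = y[start:] - Phi @ theta
    return theta, residuals

theta, res = fit_arx(y, u, na=2, nb=2, nk=1)
print("estimated ARX parameters:", theta)
print("residual standard deviation:", res.std())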

------------------------------------------------------------------------------------------------------------------------

6. System Identification Theory [6p]

We want to estimate the parameters $\alpha$ and $\beta$ in a nonlinear input-output relation of the form

$y(t) = \alpha \exp(u(t)) + \beta \exp(-u(t)) + e(t)$

where the input $u(t) \in N(0,1)$ and the measurement noise $e(t) \in N(0,1)$ are independent normally distributed samples. The signals $y(t)$ and $u(t)$ are measured for $t = 1, \ldots, N$.

a) [1p] Describe how to use least squares linear regression to estimate the true parameter vector $\theta_0 = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}$.
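
As a sketch of the reformulation behind a) (standard, not the official solution): the model is linear in $\theta = [\alpha; \beta]$, so it can be written as a linear regression and solved by ordinary least squares,

% Linear-in-parameters form of the model and the least-squares estimate
y(t) = \varphi(t)^T \theta + e(t), \qquad
\varphi(t) = \begin{bmatrix} \exp(u(t)) \\ \exp(-u(t)) \end{bmatrix}, \qquad
\hat\theta_N = \Big( \sum_{t=1}^{N} \varphi(t)\varphi(t)^T \Big)^{-1} \sum_{t=1}^{N} \varphi(t)\, y(t)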

b) [5p] Describe the asymptotic statistical properties of the estimation error $\hat\theta_N - \theta_0$ as $N \to \infty$: Will the estimate be bias-free? Will it be Gaussian? Also calculate the asymptotic behavior of the error covariance matrix $E[(\hat\theta_N - \theta_0)(\hat\theta_N - \theta_0)^T]$ as $N \to \infty$.

Hint: For a normally distributed variable $u(t) \in N(0,1)$ it holds that $E[\exp(\gamma u)] = \exp(\gamma^2/2)$.
