April 2022 On Campus Exam
Solutions to this exam are available here: FRTN65_examApr2022_solutions.pdf
A code solution to Problem 3 could look like this: solutioncode3.m
Use your anonymization code when marking papers or sending in your files.
You can either hand in hand-written solutions on paper or send files to our administrator mika@control.lth.se, or use a mix of both, whichever is easiest for you. If mailing, send one single zip file and name it NR.zip, where NR is your anonymization code.
Allowed aids: All material may be used, including online resources. You will need web access in Problems 2, 3 and 4. Communication with other persons, except exam supervisors, is not allowed!
The total is 50p. The limit for a pass is 25p.
Good luck!
1 Dimension analysis [10p]
Background: The attraction force between a proton and an electron depends on their charge and on Coulomb's constant. In the formula for the binding energy of the electron in a hydrogen atom (i.e. the energy needed to kick the electron away from the proton) the mass is also involved, and because it is a quantum effect, Planck's constant is also expected to appear.
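a) Determine integer coefficients so that the product of these quantities (binding energy, charge, Coulomb's constant, mass and Planck's constant), each raised to its coefficient, becomes a dimensionless variable. Your analysis should describe how you found the coefficients from a linear equation system Ax=0. Hint: You can use null(A) in MATLAB to calculate solutions to Ax=0.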
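The null-space computation hinted at in a) can be sketched in Python as follows. This is only an illustration under stated assumptions: the symbols E, k, e, m, h (binding energy, Coulomb's constant, charge, mass, Planck's constant), the base units kg, m, s, C and the column order are choices made here, and `integer_nullspace` is a hypothetical helper written for this sketch. It uses exact rational arithmetic so that the resulting exponents come out as integers.

```python
from fractions import Fraction
from math import gcd


def integer_nullspace(rows):
    """Basis of {x : A x = 0} for an integer matrix A, as integer vectors."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivot column of each reduced row, in row order
    r = 0
    for c in range(n_cols):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free variable
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [v / pivot for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in range(n_cols):
        if free in pivots:
            continue
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)
        for row, pc in zip(m, pivots):
            x[pc] = -row[free]
        scale = 1
        for v in x:  # least common multiple of the denominators
            scale = scale * v.denominator // gcd(scale, v.denominator)
        basis.append([int(v * scale) for v in x])
    return basis


# Rows: exponents of kg, m, s, C. Columns: E, k, e, m, h (assumed symbols).
A = [
    [1, 1, 0, 1, 1],     # kg
    [2, 3, 0, 0, 2],     # m
    [-2, -2, 0, 0, -1],  # s
    [0, -2, 1, 0, 0],    # C
]
exponents = integer_nullspace(A)
print(exponents)
```

Each returned vector lists the exponents of E, k, e, m, h in one dimensionless product. With a different unit system or column order the matrix changes, but the null-space step stays the same.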
b) From the analysis, choosing the solution with , we expect a formula for the energy of the form for some dimensionless constant . This also turns out to be correct, with the constant . Calculate the binding energy for the electron.
2 Supervised learning [10p]
This Google Colab notebook investigates a data set consisting of a few hundred penguins of 3 different species ('Adelie', 'Gentoo' or 'Chinstrap'). There are 6 features for the penguins: 4 of them are numerical and 2 are categorical. The goal is to find a good classifier that can determine the species from (some of) the attributes. To reduce future penguin measurement work, one wants to use few features. We will try to use only 2 instead of all 6.
Answer subproblems a, b, c and d described in the notebook. You do NOT have to hand in any code.
a) The first logistic regression does not work very well. Explain why.
b) Explain how the improvement is achieved.
c) Make a better choice of which 2 features to use for classification. Describe what accuracy you achieve.
d) We now use all 6 features. Describe what is done in the code and how the categorical variables have been used. Any idea why the LDA method seems to work slightly better than the logistic regression for this data set?
Problem 3 [12 points] System Identification Hands-on
The file problem3.mat contains input u and output y for a SISO system with sampling period h = 0.01 s. Identify a discrete-time linear model of the system. Aim to use few model parameters. Be sure to describe your methodology, including outlier analysis, the choice of a suitable model structure and model order, and include model validation with residual analysis. Also hand in your MATLAB code.
Hint: Useful commands might include help ident, systemIdentification, arx, oe, armax, bj, present, compare, resid, bodeplot, pzmap, ...
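To illustrate what an ARX fit does under the hood, here is a minimal Python sketch of least-squares estimation for a first-order ARX model, followed by the one-step-ahead residuals used in validation. Everything in it is an assumption made for illustration: the data are synthetic (problem3.mat is not used), the model order is fixed to one, and the helper names `simulate_arx`, `fit_arx_first_order` and `residuals` are hypothetical. It is not the posted solution code.

```python
import random


def simulate_arx(a, b, u, noise_std=0.0, seed=0):
    """Simulate y[k] = -a*y[k-1] + b*u[k-1] + e[k] with y[0] = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for k in range(1, len(u)):
        y.append(-a * y[k - 1] + b * u[k - 1] + rng.gauss(0.0, noise_std))
    return y


def fit_arx_first_order(u, y):
    """Least-squares estimate of (a, b) via the 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(1, len(y)):
        p1, p2 = -y[k - 1], u[k - 1]  # regressor [-y[k-1], u[k-1]]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[k]
        r2 += p2 * y[k]
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


def residuals(a, b, u, y):
    """One-step-ahead prediction errors of the fitted model."""
    return [y[k] + a * y[k - 1] - b * u[k - 1] for k in range(1, len(y))]


# Synthetic experiment: random binary input, noise-free output.
rng = random.Random(1)
u = [rng.choice([-1.0, 1.0]) for _ in range(500)]
y = simulate_arx(a=-0.9, b=0.5, u=u)
a_hat, b_hat = fit_arx_first_order(u, y)
res = residuals(a_hat, b_hat, u, y)
print(a_hat, b_hat, max(abs(r) for r in res))
```

On real data one would compare several structures and orders, and check that the residuals look white and uncorrelated with the input, which is what resid reports in MATLAB.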
Problem 4 [10 points] Causal inference and DAGs
a) Draw the DAG corresponding to the following structural causal model. Be careful to get it right, since you will use the DAG in problems b)-i).
The next task is to draw conclusions about statements such as "T is independent of Y" (which is true) or "T is independent of Y given W" (which is false), using the DAG and this Google Colab code.
Are the following statements true or false? Motivate the answers.
b) T and W are independent
c) Y is independent of Z given X and V
d) S is independent of Y given U
e) T and U are independent
f) X and S are independent
g) X is independent of U given W
h) X is independent of V given Z
i) X is independent of V given Z and U
Correct answers give +1 and wrong answers -1. You can skip subproblems if you want.
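Statements of this kind can also be checked graphically with d-separation. The sketch below is a hedged illustration only: the DAG `toy` is a hypothetical four-node example (A -> C <- B, C -> D), not the DAG of this problem, and `d_separated` is a helper written here using the moralized ancestral graph criterion, which may differ from what the linked Colab code does.

```python
def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG.

    parents maps each node to the set of its parents (roots may be omitted).
    """
    # 1. Keep only x, y, the conditioning set and all their ancestors.
    keep = set()
    stack = [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: link each child to its parents and co-parents to each other.
    nbrs = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            nbrs[child].add(p)
            nbrs[p].add(child)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Delete the conditioning nodes and test whether y is reachable from x.
    blocked = set(given)
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for n in nbrs[node]:
            if n not in seen and n not in blocked:
                seen.add(n)
                stack.append(n)
    return True


# Hypothetical toy DAG: A -> C <- B and C -> D.
toy = {"C": {"A", "B"}, "D": {"C"}}
print(d_separated(toy, "A", "B"))               # collider C blocks the path
print(d_separated(toy, "A", "B", given={"C"}))  # conditioning on C opens it
print(d_separated(toy, "A", "D", given={"C"}))  # C blocks the chain A -> C -> D
```

The toy graph shows the same pattern as the example in the text: two variables can be independent marginally yet dependent once a common effect (or one of its descendants) is conditioned on.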
Problem 5 [8 points]
A certain genotype can have three different possibilities: AA, Aa and aa. Assume these three cases occur with probability respectively. To estimate the unknown parameter in these probabilities, you make independent observations, getting a number of samples from each category (so the three counts sum to the total number of observations).
a) Find a lower bound on the achievable error variance for any unbiased estimator.
b) Find an optimal estimator achieving this lower bound, or prove that no such estimator exists.
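A numerical sketch of the Cramér-Rao bound for categorical data is given below. The expressions for the three probabilities are not reproduced in the text above, so the sketch ASSUMES the Hardy-Weinberg parameterization (theta^2, 2*theta*(1-theta), (1-theta)^2) for AA, Aa and aa; the function names and the values theta = 0.3, n = 100 are likewise illustrative choices, not part of the exam.

```python
def crlb_categorical(theta, n, probs, dprobs):
    """Cramér-Rao lower bound 1/I(theta) for n i.i.d. categorical observations.

    probs(theta) returns the category probabilities and dprobs(theta) their
    derivatives. Fisher information: I(theta) = n * sum_i p_i'(theta)^2 / p_i(theta).
    """
    info = n * sum(dp * dp / p for p, dp in zip(probs(theta), dprobs(theta)))
    return 1.0 / info


# ASSUMPTION: Hardy-Weinberg probabilities for AA, Aa, aa.
def hw_probs(t):
    return (t * t, 2 * t * (1 - t), (1 - t) ** 2)


def hw_dprobs(t):
    return (2 * t, 2 - 4 * t, -2 * (1 - t))


bound = crlb_categorical(0.3, 100, hw_probs, hw_dprobs)
print(bound)
```

Under this assumed parameterization the Fisher information simplifies to 2n/(theta(1-theta)), so the bound equals theta(1-theta)/(2n). A different parameterization only requires swapping the two probability functions.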