NTU-Exam board



Course name: Machine Learning
Course type: EE elective
Instructor: 吴沛远
College: College of Electrical Engineering and Computer Science
Department: Department of Electrical Engineering
Exam date (Y/M/D): 2018/12/28

Exam:

The paper consists of 9 questions. Total 105 points + 10 bonus points. In this exam we denote

● Sigmoid function: $\sigma(z) = \frac1{1+e^{-z}}$. You may apply the approximation $\sigma(z) \approx \begin{cases}0,&\text{if }z\le-10,\\1,&\text{if }z\ge10. \end{cases}$
● Sign function: $\operatorname{sgn}(z) = \begin{cases}1,&\text{if }z>0,\\0, &\text{if }z=0,\\-1,&\text{if }z<0.\end{cases}$
● Unless otherwise specified, all logs refer to the natural log, i.e., $\log_e$.

Problem 1: (20 pts) Multiple Selection (wrong selections incur negative marks, deducted at most down to zero on this problem)

Please answer the following multiple-selection questions. Wrong selections will result in deducted points. No derivation required.

(1) Suppose you are using a hard-margin linear SVM classifier on a 2-class classification problem. You are given the following data, in which the dash-circled points represent the support vectors.

https://i.imgur.com/0YlXylk.png

\begin{tikzpicture}
\def\ok{ -2.3/2.6, -1.8/1.9, -1.7/1, -1.6/2.4, -1.3/3.2, -1.2/1.8, -1/3.9, -.9/2.5, -.7/1.4, -.5/2.2, -.5/3.4, -.4/2.9, .2/2.6, .2/3.1, .7/3.3 }
\def\ng{ .7/0, .8/.6, 1.1/-.3, 1.2/.8, 1.3/.3, 1.6/1.1, 1.8/-.1, 1.9/.7, 2/1.7, 2.4/1.2, 2.4/2.1, 2.5/.6, 2.7/1.7, 3.1/2.2 }
\foreach \x/\y in \ok{ \draw (\x, \y) circle(1pt); }
\foreach \x/\y in \ng{ \filldraw (\x, \y) circle(1pt); }
\def\sx{{0, .2, 1.4}}
\def\sy{{1.7, .5, 1.8}}
\draw (\sx[0], \sy[0]) circle(1pt);
\draw[densely dotted] (\sx[0], \sy[0]) circle(2pt);
\foreach \i in {1, 2}{ \filldraw (\sx[\i], \sy[\i]) circle(1pt); \draw[densely dotted] (\sx[\i], \sy[\i]) circle(2pt); }
\foreach \i in {0, 1, 2}{ \draw[very thin, -stealth] (1.3, 3) -- (\sx[\i], \sy[\i]); }
\draw (2, 3) node[fill=white]{\footnotesize support vectors};
\end{tikzpicture}

(A) Removing any dash-circled point from the data will change the decision boundary.
(B) Removing any dash-circled point from the data will not change the decision boundary.
(C) Removing any non-dash-circled point from the data will change the decision boundary.
(D) Removing any non-dash-circled point from the data will not change the decision boundary.
(E) Removing all non-dash-circled points from the data will not change the decision boundary.

(2) If we increase the parameter C in a soft-margin linear SVM classifier, what will happen?
(A) The training error decreases.
(B) The training error increases.
(C) The margin decreases.
(D) The margin increases.
(E) The testing error decreases.

(3) Suppose you are using a kernel SVM on a 2-class classification problem, where the data points are distributed on the x-y plane (i.e., data points are 2-dimensional). Suppose we choose the kernel function $k((x, y), (x', y')) = (xx'+yy')^2$. Which of the following decision boundaries, as described by the equation f(x, y) = 0, are possible?
(A) $f(x, y) = x + y$.
(B) $f(x, y) = x^2 + y^2$.
(C) $f(x, y) = (x+y)^2$.
(D) $f(x, y) = (x-1)^2 + 3(y+2)^2$.
(E) $f(x, y) = x^2 - 4y$.

(4) Suppose you are using a kernel SVM on a 2-class classification problem, where the data points are distributed on the x-y plane (i.e., data points are 2-dimensional). Suppose we choose the kernel function $k((x, y), (x', y')) = (1+xx'+yy')^2$. Which of the following decision boundaries, as described by the equation f(x, y) = 0, are possible?
(A) $f(x, y) = x + y$.
(B) $f(x, y) = x^2 + y^2$.
(C) $f(x, y) = (x+y)^2$.
(D) $f(x, y) = (x-1)^2 + 3(y+2)^2$.
(E) $f(x, y) = x^2 - 4y$.
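A useful way to attack (3) and (4) is to expand each kernel into an explicit feature map: $(xx'+yy')^2$ is the inner product of the feature vectors $\phi(x, y) = (x^2, \sqrt2\,xy, y^2)$, so every decision function this kernel SVM can realize is a homogeneous quadratic form in x and y plus the bias term, while $(1+xx'+yy')^2$ adds the features $1, \sqrt2\,x, \sqrt2\,y$ and therefore reaches arbitrary quadratic polynomials. A minimal numerical check of the first claim (a sketch assuming numpy; not part of the exam):

import numpy as np

def k(u, v):
    # homogeneous degree-2 polynomial kernel: (x x' + y y')^2
    return (u @ v) ** 2

def phi(u):
    # explicit feature map with <phi(u), phi(v)> = k(u, v)
    x, y = u
    return np.array([x**2, np.sqrt(2) * x * y, y**2])

rng = np.random.default_rng(0)
for _ in range(5):
    u, v = rng.normal(size=2), rng.normal(size=2)
    assert np.isclose(k(u, v), phi(u) @ phi(v))
print("k is an inner product of the (x^2, sqrt(2)xy, y^2) features")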
(5) Suppose an SVM classifier is trained from the data set $\{(\bm x_i, y_i)\}_{i=1}^N$, where $y_i \in \{+1, -1\}$ denotes the labels, and the classifier assigns x the positive label if $f(\bm x) = \bm w^T\bm x+b \ge 0$. The primal problem for solving w is given by

\begin{tabular}{ll}
Minimize & $\frac12\|\bm w\|^2+C\sum_{i=1}^N\xi_i$\\
Subject to & $y_i(\bm w^T\bm x_i+b)\ge1-\xi_i,\forall i=1,\ldots,N$\\
Variables & $\bm w\in\mathbb R^d,b\in\mathbb R,\xi_1,\ldots,\xi_N\ge0$
\end{tabular}

The dual problem for solving the $\alpha_i$'s in $\bm w = \sum_{i=1}^N\alpha_i y_i\bm x_i$ is given by

\begin{tabular}{ll}
Maximize & $\sum_{i=1}^N\alpha_i-\frac12\sum_{i=1}^N\sum_{j=1}^N \alpha_i\alpha_jy_iy_j(\bm x_i^T\bm x_j)$\\
Subject to & $\sum_{i=1}^N\alpha_iy_i=0$\\
Variables & $0\le\alpha_i\le C$
\end{tabular}

When both the primal and the dual problems achieve their optima,
(A) If $\alpha_i > 0$ then $\xi_i > 0$.
(B) If $\xi_i > 0$ then $\alpha_i > 0$.
(C) If $\alpha_i = C$ then $\xi_i > 0$.
(D) If $\xi_i > 0$ then $\alpha_i = C$.
(E) If $\alpha_i = 0$ then $\xi_i = 0$.

(6) Suppose a neural network was trained with dropout rate p = 0.2, in the sense that each neuron had probability p of passing only zero to the subsequent neurons. After the neural network is trained, how should we modify its weights so that it can be applied without dropout?
(A) Multiply each weight by 0.2.
(B) Multiply each weight by 0.8.
(C) Multiply each weight by 1.2.
(D) Multiply each weight by 1.25.
(E) No modification is needed.

(7) Suppose you have an input volume of dimension 48×48×3. That is, the inputs are images of size 48×48 with 3 channels (RGB). How many parameters would a single 5×5 convolutional filter have (not including bias)? Note: since there is just a single 5×5 convolutional filter, the output has only 1 channel.
(A) 3 (B) 25 (C) 75 (D) 2304 (E) 6912

(8) In the context of ensemble methods, which of the following statements are true?
(A) In bagging, the weak classifiers are independent of each other.
(B) In boosting, the weak classifiers are independent of each other.
(C) In case of under-fitting, we expect bagging to be a better remedy than boosting.
(D) In case of over-fitting, we expect bagging to be a better remedy than boosting.
(E) AdaBoost (adaptive boosting) considers hinge loss.

(9) Which of the following are convex functions on $\mathbb R^2$?
(A) $f(u, v) = u^2 - 4uv + v^2$
(B) $f(u, v) = u - 3v$
(C) $f(u, v) = \log(u^2+1)$
(D) $f(u, v) = u^2 + v^2 + \max(-u, 0)$
(E) $f(u, v) = \operatorname{sgn}(u)$

(10) Select all that belong to unsupervised learning algorithms.
(A) Deep auto-encoder
(B) Hierarchical Agglomerative Clustering
(C) K-means
(D) Linear regression
(E) Logistic regression
(F) Locally Linear Embedding (LLE)
(G) Principal Component Analysis (PCA)
(H) Random forest
(I) Support Vector Machine (SVM)
(J) t-Distributed Stochastic Neighbor Embedding (t-SNE)
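A quick way to see question (6) above: during training each upstream activation is zeroed with probability p, so the expected pre-activation seen downstream is (1 − p) times what the full network would produce; matching this expectation at test time means scaling each weight by 1 − p. A simulation sketch (assuming numpy; w and a are arbitrary illustrative values):

import numpy as np

rng = np.random.default_rng(1)
p = 0.2                         # dropout rate: a neuron outputs zero w.p. p
w, a = 1.7, 3.0                 # an arbitrary weight and upstream activation

# during training, the contribution w*a survives with probability 1 - p
kept = rng.random(1_000_000) >= p
print((w * a * kept).mean())    # ~ (1 - p) * w * a

# so to run the trained net without dropout, scale the weights to match
print((1 - p) * w * a)          # = 0.8 * w * a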
Problem 2: (10 pts) Linear Regression

Consider the regression function $f_{\bm w}(x) = w_0 + w_1x + w_2x^2$ and the following 10 data points.

\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}\hline
$i$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline
$x_i$ & 0.89 & 0.03 & 0.49 & 0.17 & 0.98 & 0.71 & 0.50 & 0.47 & 0.06 & 0.68\\\hline
$y_i$ & 3.03 & $-1.14$ & 0.96 & $-0.53$ & 3.90 & 2.21 & 1.09 & 0.78 & $-0.77$ & 1.97\\\hline
\end{tabular}

Find the values of $w_0, w_1, w_2$ that minimize the loss function
\[L(\bm w) = \sum_{i=1}^{10}|y_i-f_{\bm w}(x_i)|^2\]

Problem 3: (12 pts) Fitting a Single Gaussian Distribution

Denote by $\mathcal N(\bm\mu, \bm\Sigma)$ the Gaussian distribution with mean μ and covariance matrix Σ.

(1) (2 pts) Write down the probability density function for $X \sim \mathcal N \left(\begin{bmatrix}3\\-2\end{bmatrix}, \begin{bmatrix}2&-1\\-1&3 \end{bmatrix}\right)$.

(2) (10 pts) Suppose the following 10 data points
(4.48, 1.27), (2.36, 1.78), (4.21, -1.10), (5.42, 9.42), (3.48, -1.91), (1.56, -2.39), (3.71, -2.97), (3.37, -1.13), (3.35, 1.04), (4.26, -1.65)
are independently generated from $\mathcal N(\bm\mu, \bm\Sigma)$. Find the maximum likelihood estimators of the mean μ and the covariance matrix Σ.

Problem 4: (3 pts) Cross-entropy

Let X = {braised pork rice, beef noodles, pineapple, sushi, vegetarian}. Consider two probability distributions $P_X, Q_X$ as follows:

\begin{tabular}{|c|c|c|c|c|c|}\hline
$x$ & braised pork rice & beef noodles & pineapple & sushi & vegetarian\\\hline
$P_X(x)$ & 0.3 & 0.4 & 0.15 & 0.1 & 0.05\\\hline
$Q_X(x)$ & 0.2 & 0.1 & 0.05 & 0.25 & 0.4\\\hline
\end{tabular}

Find the cross entropy
\[H(P_X, Q_X) = \sum_{x\in X}P_X(x)\ln\left(\frac1{Q_X(x)}\right)\]

Problem 5: (10 pts) Logistic Regression

A group of 10 students spent various numbers of hours studying for the machine learning (ML) exam. The following table shows the number of hours each student spent studying and whether they passed (1) or failed (0).

\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}\hline
Hours ($X$) & 0.5 & 1 & 1.5 & 1.75 & 2.5 & 2.75 & 3.25 & 4 & 4.5 & 5\\\hline
Pass ($Y$) & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1\\\hline
\end{tabular}

Consider the logistic model that predicts the probability of passing the exam from the hours spent studying:
\[P(Y = 1|X = x) = \sigma(wx + b)\]
Find the cross-entropy loss if we fit the data with this logistic model with parameters w = 1.5, b = -4.

Problem 6: (5 pts + 10 pts Bonus) Gaussian Mixture Model and Expectation Maximization

Suppose we wish to fit the following 10 data points (distributed in 1-D space)
-12.72, -2.05, -6.56, 2.55, -1.77, 9.19, 8.85, -3.34, -3.74, 3.63
by the Gaussian mixture model $p_\theta(x)$ parameterized by $\theta = (\pi_1, \mu_1, \sigma_1, \pi_2, \mu_2, \sigma_2)$, given as follows:
\[p_\theta(x) = \pi_1(2\pi\sigma_1^2)^{-1/2}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + \pi_2(2\pi\sigma_2^2)^{-1/2}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}\]
Suppose the initial guess of the parameters is $\theta^{(0)} = \left(\pi_1^{(0)}, \mu_1^{(0)}, \sigma_1^{(0)}, \pi_2^{(0)}, \mu_2^{(0)}, \sigma_2^{(0)}\right) = (0.3, -1, 2, 0.7, 2, 3)$.

(a) (5 pts) Compute the log likelihood of the parameters $\theta^{(0)}$.

(b) (10 pts Bonus) Apply the expectation maximization algorithm to find the next update of the parameters
\[\theta^{(1)} = \left(\pi_1^{(1)}, \mu_1^{(1)}, \sigma_1^{(1)}, \pi_2^{(1)}, \mu_2^{(1)}, \sigma_2^{(1)}\right)\]
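Problem 2 above is ordinary least squares in the monomial features $(1, x, x^2)$: stacking the features into a design matrix A, the loss is $\|y - A\bm w\|^2$ and the minimizer is the least-squares solution. A numerical sketch assuming numpy, with the data from the table:

import numpy as np

x = np.array([0.89, 0.03, 0.49, 0.17, 0.98, 0.71, 0.50, 0.47, 0.06, 0.68])
y = np.array([3.03, -1.14, 0.96, -0.53, 3.90, 2.21, 1.09, 0.78, -0.77, 1.97])

# columns of the design matrix: 1, x, x^2, so f_w(x_i) = (A @ w)_i
A = np.stack([np.ones_like(x), x, x**2], axis=1)

# L(w) = ||y - A w||^2 is minimized by the least-squares solution
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)    # (w0, w1, w2)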
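For Problem 3(2), the maximum likelihood estimators are the sample mean and the 1/N sample covariance (note the 1/N, not the unbiased 1/(N − 1)). A sketch assuming numpy:

import numpy as np

X = np.array([[4.48, 1.27], [2.36, 1.78], [4.21, -1.10], [5.42, 9.42],
              [3.48, -1.91], [1.56, -2.39], [3.71, -2.97], [3.37, -1.13],
              [3.35, 1.04], [4.26, -1.65]])

mu = X.mean(axis=0)              # MLE of the mean: sample average
D = X - mu
Sigma = D.T @ D / len(X)         # MLE of the covariance: divide by N
print(mu)
print(Sigma)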
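Problems 4 and 5 are both direct cross-entropy evaluations: Problem 4 sums $P_X(x)\ln(1/Q_X(x))$ over the five items, and Problem 5 evaluates the binary cross entropy $-\sum_i [y_i\ln p_i + (1-y_i)\ln(1-p_i)]$ at the given w, b (summed over the 10 students here; divide by 10 if your convention averages). A sketch assuming numpy:

import numpy as np

# Problem 4: H(P, Q) = sum_x P(x) ln(1 / Q(x))
P = np.array([0.3, 0.4, 0.15, 0.1, 0.05])
Q = np.array([0.2, 0.1, 0.05, 0.25, 0.4])
print(np.sum(P * np.log(1 / Q)))

# Problem 5: cross-entropy loss of sigma(w x + b) on the pass/fail data
def sigma(z):
    return 1 / (1 + np.exp(-z))

hours  = np.array([0.5, 1, 1.5, 1.75, 2.5, 2.75, 3.25, 4, 4.5, 5])
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])
w, b = 1.5, -4.0
p = sigma(w * hours + b)
print(-np.sum(passed * np.log(p) + (1 - passed) * np.log(1 - p)))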
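For Problem 6, part (a) is a direct evaluation of $\sum_i \log p_\theta(x_i)$, and part (b) is one E-step (the posterior responsibility of each component for each point) followed by one M-step (responsibility-weighted mixing proportions, means, and standard deviations). A sketch assuming numpy:

import numpy as np

x = np.array([-12.72, -2.05, -6.56, 2.55, -1.77, 9.19, 8.85, -3.34, -3.74, 3.63])
pi1, mu1, s1, pi2, mu2, s2 = 0.3, -1.0, 2.0, 0.7, 2.0, 3.0   # theta^(0)

def gauss(x, mu, s):
    return np.exp(-(x - mu)**2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

# (a) log likelihood of theta^(0): sum_i log p_theta(x_i)
p = pi1 * gauss(x, mu1, s1) + pi2 * gauss(x, mu2, s2)
print(np.sum(np.log(p)))

# (b) one EM iteration: E-step responsibilities, then M-step re-estimates
r1 = pi1 * gauss(x, mu1, s1) / p
r2 = 1 - r1
pi1, pi2 = r1.mean(), r2.mean()
mu1, mu2 = r1 @ x / r1.sum(), r2 @ x / r2.sum()
s1 = np.sqrt(r1 @ (x - mu1)**2 / r1.sum())
s2 = np.sqrt(r2 @ (x - mu2)**2 / r2.sum())
print(pi1, mu1, s1, pi2, mu2, s2)   # theta^(1)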
Problem 7: (14 pts) Principal Component Analysis

Consider the following 10 data points distributed in 2-D space:
(1.91, -0.11), (-2.24, -1.09), (1.36, -0.20), (0.33, 0.13), (-0.33, 0.37), (0.00, -0.63), (-3.10, -0.47), (-0.34, 2.38), (2.43, -3.00), (-0.02, 2.62)

(a) (10 pts) Find the first and second principal axes.

(b) (4 pts) Find the first and second principal components for the data point (0.96, 0.28).

Note: The data points have zero mean.
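The principal axes in Problem 7 are the eigenvectors of the sample covariance matrix ordered by decreasing eigenvalue, and the principal components of a point are its projections onto those axes; eigenvector signs are arbitrary, so answers may differ by an overall sign. A sketch assuming numpy:

import numpy as np

X = np.array([[1.91, -0.11], [-2.24, -1.09], [1.36, -0.20], [0.33, 0.13],
              [-0.33, 0.37], [0.00, -0.63], [-3.10, -0.47], [-0.34, 2.38],
              [2.43, -3.00], [-0.02, 2.62]])

S = X.T @ X / len(X)              # covariance (data already has zero mean)
vals, vecs = np.linalg.eigh(S)    # eigh: eigenvalues in ascending order
axes = vecs[:, ::-1]              # columns: first, second principal axes
print(axes)

# (b) principal components of (0.96, 0.28): projections onto the axes
print(axes.T @ np.array([0.96, 0.28]))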
Problem 8: (9 pts) LSTM

Consider an LSTM node as follows. Please fill out the following table in the answer sheet. No derivation required.

https://i.imgur.com/YDzJH7M.png

\begin{tikzpicture}
\draw[very thick] (0, 0) circle(.5) node{\Large $c^{(t)}$};
\filldraw (0, -2) circle(.1);
\draw[-stealth] (0, -1.9) -- (0, -.5);
\draw (0, -4) circle(.5);
\draw (-.3, -4.3) -- (.3, -3.7);
\draw (.15, -4.15) node{\scriptsize $g$};
\draw (.3, -3.7) node[anchor=south west]{\scriptsize identity};
\draw[-stealth] (0, -3.5) -- (0, -2.1);
\draw (0, 2) circle(.5);
\draw (-.3, 1.7) -- (.3, 2.3);
\draw (.15, 1.85) node{\scriptsize $h$};
\draw (.3, 2.3) node[anchor=south west]{\scriptsize identity};
\draw[-stealth] (0, .5) -- (0, 1.5);
\filldraw (0, 4) circle(.1);
\draw[-stealth] (0, 2.5) -- (0, 3.9);
\draw (-3, -2) circle(.5);
\draw[domain=-6:6, samples=97, variable=\t] plot({\t/20-3}, {.5/(1+exp(-\t))-2.25});
\draw (-2.85, -2.15) node{\scriptsize $f$};
\draw (-3, -2.5) node[anchor=north]{\scriptsize sigmoid};
\draw (-2.7, -1.7) node[anchor=south west]{\bf\scriptsize Input Gate};
\draw[-stealth] (-2.5, -2) -- (-.1, -2);
\filldraw (.9, 0) circle(.1);
\draw[-stealth] (.4, .3) to[out=0, in=120] (.84, .08);
\draw[-stealth] (.84, -.08) to[out=240, in=0] (.4, -.3);
\draw (3, 0) circle(.5);
\draw[domain=-6:6, samples=97, variable=\t] plot({\t/20+3}, {.5/(1+exp(-\t))-.25});
\draw (3.15, -.15) node{\scriptsize $f$};
\draw (3, -.5) node[anchor=north]{\scriptsize sigmoid};
\draw (2.7, .3) node[anchor=south east]{\bf\scriptsize Forget Gate};
\draw[-stealth] (2.5, 0) -- (1, 0);
\draw (-3, 4) circle(.5);
\draw[domain=-6:6, samples=97, variable=\t] plot({\t/20-3}, {.5/(1+exp(-\t))+3.75});
\draw (-2.85, 3.85) node{\scriptsize $f$};
\draw (-3, 3.5) node[anchor=north]{\scriptsize sigmoid};
\draw (-2.7, 3.7) node[anchor=north west]{\bf\scriptsize Output Gate};
\draw[-stealth] (-2.5, 4) -- (-.1, 4);
\draw[thick] (3.5, -4.5) node[anchor=south east]{\bf\scriptsize Block} rectangle (-3.5, 4.5);
\filldraw[lightgray] (-.3, -5.8) rectangle (.3, -5.2);
\draw (0, -5.5) node{+};
\draw[-stealth] (0, -5.2) -- (0, -4.5);
\filldraw[lightgray] (-2.4, -7.8) rectangle (-1.8, -7.2);
\draw (-2.1, -7.5) node{$x_1^{(t)}$};
\draw[-stealth] (-2.1, -7.2) node[anchor=south]{\tiny 1} -- (0, -5.8);
\filldraw[lightgray] (-1, -7.8) rectangle (-.4, -7.2);
\draw (-.7, -7.5) node{$x_2^{(t)}$};
\draw[-stealth] (-.7, -7.2) node[anchor=south]{\tiny\color{gray}0} -- (0, -5.8);
\filldraw[lightgray] (.4, -7.8) rectangle (1, -7.2);
\draw (.7, -7.5) node{$x_3^{(t)}$};
\draw[-stealth] (.7, -7.2) node[anchor=south]{\tiny\color{gray}0} -- (0, -5.8);
\filldraw[gray] (1.8, -7.8) rectangle (2.4, -7.2);
\draw (2.1, -7.5) node{1};
\draw[-stealth] (2.1, -7.2) node[anchor=south]{\tiny\color{gray}0} -- (0, -5.8);
\filldraw[lightgray] (-4.8, -2.3) rectangle (-4.2, -1.7);
\draw (-4.5, -2) node{+};
\draw[-stealth] (-4.2, -2) -- (-3.5, -2);
\filldraw[lightgray] (-6.8, -.2) rectangle (-6.2, .4);
\draw (-6.5, .1) node{$x_1^{(t)}$};
\draw[-stealth] (-6.2, .1) node[anchor=west]{\tiny\color{gray}0} -- (-4.8, -2);
\filldraw[lightgray] (-6.8, -1.6) rectangle (-6.2, -1);
\draw (-6.5, -1.3) node{$x_2^{(t)}$};
\draw[-stealth] (-6.2, -1.3) node[anchor=west]{\tiny\color{gray}0} -- (-4.8, -2);
\filldraw[lightgray] (-6.8, -3) rectangle (-6.2, -2.4);
\draw (-6.5, -2.7) node{$x_3^{(t)}$};
\draw[-stealth] (-6.2, -2.7) node[anchor=west]{\tiny $-100$} -- (-4.8, -2);
\filldraw[gray] (-6.8, -4.4) rectangle (-6.2, -3.8);
\draw (-6.5, -4.1) node{1};
\draw[-stealth] (-6.2, -4.1) node[anchor=west]{\tiny $-10$} -- (-4.8, -2);
\filldraw[lightgray] (4.2, -.3) rectangle (4.8, .3);
\draw (4.5, 0) node{+};
\draw[-stealth] (4.2, 0) -- (3.5, 0);
\filldraw[lightgray] (6.2, 1.8) rectangle (6.8, 2.4);
\draw (6.5, 2.1) node{$x_1^{(t)}$};
\draw[-stealth] (6.2, 2.1) node[anchor=east]{\tiny\color{gray}0} -- (4.8, 0);
\filldraw[lightgray] (6.2, .4) rectangle (6.8, 1);
\draw (6.5, .7) node{$x_2^{(t)}$};
\draw[-stealth] (6.2, .7) node[anchor=east]{\tiny\color{gray}0} -- (4.8, 0);
\filldraw[lightgray] (6.2, -1) rectangle (6.8, -.4);
\draw (6.5, -.7) node{$x_3^{(t)}$};
\draw[-stealth] (6.2, -.7) node[anchor=east]{\tiny $-100$} -- (4.8, 0);
\filldraw[gray] (6.2, -2.4) rectangle (6.8, -1.8);
\draw (6.5, -2.1) node{1};
\draw[-stealth] (6.2, -2.1) node[anchor=east]{\tiny 10} -- (4.8, 0);
\filldraw[lightgray] (-4.8, 3.7) rectangle (-4.2, 4.3);
\draw (-4.5, 4) node{+};
\draw[-stealth] (-4.2, 4) -- (-3.5, 4);
\filldraw[lightgray] (-6.8, 5.8) rectangle (-6.2, 6.4);
\draw (-6.5, 6.1) node{$x_1^{(t)}$};
\draw[-stealth] (-6.2, 6.1) node[anchor=west]{\tiny\color{gray}0} -- (-4.8, 4);
\filldraw[lightgray] (-6.8, 4.4) rectangle (-6.2, 5);
\draw (-6.5, 4.7) node{$x_2^{(t)}$};
\draw[-stealth] (-6.2, 4.7) node[anchor=west]{\tiny $-100$} -- (-4.8, 4);
\filldraw[lightgray] (-6.8, 3) rectangle (-6.2, 3.6);
\draw (-6.5, 3.3) node{$x_3^{(t)}$};
\draw[-stealth] (-6.2, 3.3) node[anchor=west]{\tiny 100} -- (-4.8, 4);
\filldraw[gray] (-6.8, 1.6) rectangle (-6.2, 2.2);
\draw (-6.5, 1.9) node{1};
\draw[-stealth] (-6.2, 1.9) node[anchor=west]{\tiny $-10$} -- (-4.8, 4);
\draw[-stealth] (0, 4.1) -- (0, 5.5) node[anchor=south]{\Large $y^{(t)}$};
\end{tikzpicture}

\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}\hline
Time & $t$ & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline
\multirow{3}{*}{Input} & $x_1^{(t)}$ & 0 & 1 & 3 & 2 & 4 & $-3$ & 7 & 2 & 3 & $-5$ & 8\\\cline{2-13}
& $x_2^{(t)}$ & 0 & 0 & 1 & $-2$ & $-3$ & $-1$ & 0 & 0 & 0 & 0 & $-2$\\\cline{2-13}
& $x_3^{(t)}$ & 0 & 0 & $-1$ & $-1$ & $-1$ & 0 & 0 & 1 & 1 & $-1$ & $-1$\\\hline
Memory cell & $c^{(t)}$ & 0 &&&&&&&&&&\\\hline
Output & $y^{(t)}$ & 0 &&&&&&&&&&\\\hline
\end{tabular}
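Reading the weights off the figure above: the block input is $g = x_1^{(t)}$ (identity activation, zero bias), and the gate pre-activations are $-100x_3^{(t)} - 10$ (input gate), $-100x_3^{(t)} + 10$ (forget gate), and $-100x_2^{(t)} + 100x_3^{(t)} - 10$ (output gate). Assuming the standard update $c^{(t)} = g\,\sigma(z_{\text{in}}) + c^{(t-1)}\sigma(z_{\text{forget}})$ and $y^{(t)} = h(c^{(t)})\,\sigma(z_{\text{out}})$ with identity h, the table can be traced mechanically; a sketch assuming numpy:

import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# rows (x_1, x_2, x_3) of the input table for t = 1..10 (t = 0 is all zero)
xs = np.array([[1, 0, 0], [3, 1, -1], [2, -2, -1], [4, -3, -1], [-3, -1, 0],
               [7, 0, 0], [2, 0, 1], [3, 0, 1], [-5, 0, -1], [8, -2, -1]])

c = 0.0
for t, (x1, x2, x3) in enumerate(xs, start=1):
    g = x1                                # block input (identity)
    i = sigma(-100 * x3 - 10)             # input gate
    f = sigma(-100 * x3 + 10)             # forget gate
    o = sigma(-100 * x2 + 100 * x3 - 10)  # output gate
    c = g * i + c * f                     # memory cell c^(t)
    y = c * o                             # output y^(t) (identity h)
    print(t, round(float(c), 3), round(float(y), 3))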
Problem 9: (22 pts) Feedforward and Back Propagation

Consider the following neural network.

https://i.imgur.com/RzcU43s.png

\begin{tikzpicture}
\draw (0, 3) node{Input};
\filldraw[lightgray] (-.4, -.5) rectangle (.4, 2.5);
\filldraw[gray] (-.3, -.3) rectangle (.3, .3);
\draw (0, 0) node{$x_2$};
\filldraw[gray] (-.3, 1.7) rectangle (.3, 2.3);
\draw (0, 2) node{$x_1$};
\foreach \i in {1, ..., 4}{
\filldraw[gray] (2.7, {5.7-2*\i}) rectangle (3.3, {6.3-2*\i});
\draw (3, {6-2*\i}) node{+};
\draw[-stealth] (.3, 2) -- (2.7, {6-2*\i}) node[anchor=south]{\tiny $w^1_{\i1}$};
\draw[-stealth] (.3, 0) -- (2.7, {6-2*\i}) node[anchor=north]{\tiny $w^1_{\i2}$};
}
\filldraw[lightgray] (4.6, -2.5) rectangle (5.4, 4.5);
\filldraw[gray] (5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (4.7, {5.7-2*\i}) rectangle (5.3, {6.3-2*\i});
\draw[white] (5, {6-2*\i}) node{$z^1_\i$};
\draw[-stealth] (3.3, {6-2*\i}) -- (4.7, {6-2*\i});
}
\filldraw[gray] (7, 3) ellipse(.5 and .3);
\draw (7, 3) node{Max};
\draw[-stealth] (5.3, 4) -- (6.5, 3);
\draw[-stealth] (5.3, 2) -- (6.5, 3);
\filldraw[gray] (7, -1) ellipse(.5 and .3);
\draw (7, -1) node{Max};
\draw[-stealth] (5.3, 0) -- (6.5, -1);
\draw[-stealth] (5.3, -2) -- (6.5, -1);
\filldraw[lightgray] (8.1, -1.5) rectangle (8.9, 3.5);
\filldraw[darkgray] (8.2, 2.7) rectangle (8.8, 3.3);
\draw[white] (8.5, 3) node{$a_1$};
\draw[-stealth] (7.5, 3) -- (8.2, 3);
\filldraw[darkgray] (8.2, -1.3) rectangle (8.8, -.7);
\draw[white] (8.5, -1) node{$a_2$};
\draw[-stealth] (7.5, -1) -- (8.2, -1);
\foreach \i in {1, ..., 4}{
\filldraw[gray] (11.2, {5.7-2*\i}) rectangle (11.8, {6.3-2*\i});
\draw (11.5, {6-2*\i}) node{+};
\draw[-stealth] (8.8, 3) -- (11.2, {6-2*\i}) node[anchor=south]{\tiny $w^2_{\i1}$};
\draw[-stealth] (8.8, -1) -- (11.2, {6-2*\i}) node[anchor=north]{\tiny $w^2_{\i2}$};
}
\filldraw[lightgray] (13.1, -2.5) rectangle (13.9, 4.5);
\filldraw[gray] (13.5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (13.5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (13.2, {5.7-2*\i}) rectangle (13.8, {6.3-2*\i});
\draw[white] (13.5, {6-2*\i}) node{$z^2_\i$};
\draw[-stealth] (11.8, {6-2*\i}) -- (13.2, {6-2*\i});
}
\filldraw[gray] (15.5, 3) ellipse(.5 and .3);
\draw (15.5, 3) node{Max};
\draw[-stealth] (13.8, 4) -- (15, 3);
\draw[-stealth] (13.8, 2) -- (15, 3);
\filldraw[gray] (15.5, -1) ellipse(.5 and .3);
\draw (15.5, -1) node{Max};
\draw[-stealth] (13.8, 0) -- (15, -1);
\draw[-stealth] (13.8, -2) -- (15, -1);
\draw (17, 4) node{Output};
\filldraw[lightgray] (16.6, -1.5) rectangle (17.4, 3.5);
\filldraw[darkgray] (16.7, 2.7) rectangle (17.3, 3.3);
\draw[white] (17, 3) node{$y_1$};
\draw[-stealth] (16, 3) -- (16.7, 3);
\filldraw[darkgray] (16.7, -1.3) rectangle (17.3, -.7);
\draw[white] (17, -1) node{$y_2$};
\draw[-stealth] (16, -1) -- (16.7, -1);
\end{tikzpicture}

The above neural network can be represented as a function $f_\theta$, namely
\[\begin{bmatrix}y_1\\y_2\end{bmatrix} = f_\theta\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right),\]
where the parameter θ records all the weights $w^k_{ij}$.
(a) (6 pts) Suppose the weights are initialized as follows.

https://i.imgur.com/gcHnMYA.png

\begin{tikzpicture}
\def\w{{1, 3, 2, -1, 0, 0, 4, -2, 5, 2, -2, -1, 0, 1, 1, -1}}
\draw (0, 3) node{Input};
\filldraw[lightgray] (-.4, -.5) rectangle (.4, 2.5);
\filldraw[gray] (-.3, -.3) rectangle (.3, .3);
\draw (0, 0) node{$x_2$};
\filldraw[gray] (-.3, 1.7) rectangle (.3, 2.3);
\draw (0, 2) node{$x_1$};
\foreach \i in {1, ..., 4}{
\filldraw[gray] (2.7, {5.7-2*\i}) rectangle (3.3, {6.3-2*\i});
\draw (3, {6-2*\i}) node{+};
\draw[-stealth] (.3, 2) -- (2.7, {6-2*\i}) node[anchor=south]{\pgfmathparse{int(\w[2*\i-2])} $\pgfmathresult$};
\draw[-stealth] (.3, 0) -- (2.7, {6-2*\i}) node[anchor=north]{\pgfmathparse{int(\w[2*\i-1])} $\pgfmathresult$};
}
\filldraw[lightgray] (4.6, -2.5) rectangle (5.4, 4.5);
\filldraw[gray] (5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (4.7, {5.7-2*\i}) rectangle (5.3, {6.3-2*\i});
\draw[white] (5, {6-2*\i}) node{$z^1_\i$};
\draw[-stealth] (3.3, {6-2*\i}) -- (4.7, {6-2*\i});
}
\filldraw[gray] (7, 3) ellipse(.5 and .3);
\draw (7, 3) node{Max};
\draw[-stealth] (5.3, 4) -- (6.5, 3);
\draw[-stealth] (5.3, 2) -- (6.5, 3);
\filldraw[gray] (7, -1) ellipse(.5 and .3);
\draw (7, -1) node{Max};
\draw[-stealth] (5.3, 0) -- (6.5, -1);
\draw[-stealth] (5.3, -2) -- (6.5, -1);
\filldraw[lightgray] (8.1, -1.5) rectangle (8.9, 3.5);
\filldraw[darkgray] (8.2, 2.7) rectangle (8.8, 3.3);
\draw[white] (8.5, 3) node{$a_1$};
\draw[-stealth] (7.5, 3) -- (8.2, 3);
\filldraw[darkgray] (8.2, -1.3) rectangle (8.8, -.7);
\draw[white] (8.5, -1) node{$a_2$};
\draw[-stealth] (7.5, -1) -- (8.2, -1);
\foreach \i in {1, ..., 4}{
\filldraw[gray] (11.2, {5.7-2*\i}) rectangle (11.8, {6.3-2*\i});
\draw (11.5, {6-2*\i}) node{+};
\draw[-stealth] (8.8, 3) -- (11.2, {6-2*\i}) node[anchor=south]{\pgfmathparse{int(\w[2*\i+6])} $\pgfmathresult$};
\draw[-stealth] (8.8, -1) -- (11.2, {6-2*\i}) node[anchor=north]{\pgfmathparse{int(\w[2*\i+7])} $\pgfmathresult$};
}
\filldraw[lightgray] (13.1, -2.5) rectangle (13.9, 4.5);
\filldraw[gray] (13.5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (13.5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (13.2, {5.7-2*\i}) rectangle (13.8, {6.3-2*\i});
\draw[white] (13.5, {6-2*\i}) node{$z^2_\i$};
\draw[-stealth] (11.8, {6-2*\i}) -- (13.2, {6-2*\i});
}
\filldraw[gray] (15.5, 3) ellipse(.5 and .3);
\draw (15.5, 3) node{Max};
\draw[-stealth] (13.8, 4) -- (15, 3);
\draw[-stealth] (13.8, 2) -- (15, 3);
\filldraw[gray] (15.5, -1) ellipse(.5 and .3);
\draw (15.5, -1) node{Max};
\draw[-stealth] (13.8, 0) -- (15, -1);
\draw[-stealth] (13.8, -2) -- (15, -1);
\draw (17, 4) node{Output};
\filldraw[lightgray] (16.6, -1.5) rectangle (17.4, 3.5);
\filldraw[darkgray] (16.7, 2.7) rectangle (17.3, 3.3);
\draw[white] (17, 3) node{$y_1$};
\draw[-stealth] (16, 3) -- (16.7, 3);
\filldraw[darkgray] (16.7, -1.3) rectangle (17.3, -.7);
\draw[white] (17, -1) node{$y_2$};
\draw[-stealth] (16, -1) -- (16.7, -1);
\end{tikzpicture}

If $(x_1, x_2) = (1, -1)$, please fill out the following table in the answer sheet. No derivation required.

\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}\hline
Variable & $z^1_1$ & $z^1_2$ & $z^1_3$ & $z^1_4$ & $a_1$ & $a_2$ & $z^2_1$ & $z^2_2$ & $z^2_3$ & $z^2_4$ & $y_1$ & $y_2$\\\hline
Value &&&&&&&&&&&&\\\hline
\end{tabular}

(b) (16 pts) Continuing (a), if the ground truth is $(\hat y_1, \hat y_2) = (-10, 7)$ and the loss function is defined as
\[L(\theta) = \left\|\begin{bmatrix}\hat y_1\\\hat y_2\end{bmatrix} - f_\theta\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)\right\|^2\]
perform back propagation and fill out the following table in the answer sheet. No derivation required.

\begin{tabular}{|c|c|c|c|c|c|c|c|c|}\hline
Variable & $\frac{\partial L}{\partial w^1_{11}}$ & $\frac{\partial L}{\partial w^1_{12}}$ & $\frac{\partial L}{\partial w^1_{21}}$ & $\frac{\partial L}{\partial w^1_{22}}$ & $\frac{\partial L}{\partial w^1_{31}}$ & $\frac{\partial L}{\partial w^1_{32}}$ & $\frac{\partial L}{\partial w^1_{41}}$ & $\frac{\partial L}{\partial w^1_{42}}$\\\hline
Value &&&&&&&&\\\hline
Variable & $\frac{\partial L}{\partial w^2_{11}}$ & $\frac{\partial L}{\partial w^2_{12}}$ & $\frac{\partial L}{\partial w^2_{21}}$ & $\frac{\partial L}{\partial w^2_{22}}$ & $\frac{\partial L}{\partial w^2_{31}}$ & $\frac{\partial L}{\partial w^2_{32}}$ & $\frac{\partial L}{\partial w^2_{41}}$ & $\frac{\partial L}{\partial w^2_{42}}$\\\hline
Value &&&&&&&&\\\hline
\end{tabular}
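For part (a): each layer is a bias-free linear map followed by pairwise max units, so the forward pass is three short steps. A sketch assuming numpy, with the weight matrices read off the figure (row i holds $w^k_{i1}, w^k_{i2}$):

import numpy as np

W1 = np.array([[1, 3], [2, -1], [0, 0], [4, -2]])   # row i: w^1_{i1}, w^1_{i2}
W2 = np.array([[5, 2], [-2, -1], [0, 1], [1, -1]])  # row i: w^2_{i1}, w^2_{i2}

x = np.array([1, -1])                               # (x_1, x_2)
z1 = W1 @ x                                         # z^1_1 .. z^1_4
a = np.array([z1[:2].max(), z1[2:].max()])          # a_1, a_2
z2 = W2 @ a                                         # z^2_1 .. z^2_4
y = np.array([z2[:2].max(), z2[2:].max()])          # y_1, y_2
print(z1, a, z2, y)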
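For part (b): the gradient of $L = \|\hat y - y\|^2$ flows back through each max unit only to the larger of its two inputs, and $\partial L/\partial w^k_{ij}$ equals the gradient arriving at $z^k_i$ times the layer's j-th input. A sketch assuming numpy (ties at a max, which do not occur for this input, would need a subgradient convention):

import numpy as np

W1 = np.array([[1, 3], [2, -1], [0, 0], [4, -2]])
W2 = np.array([[5, 2], [-2, -1], [0, 1], [1, -1]])
x = np.array([1, -1])
y_hat = np.array([-10, 7])

# forward pass, as in part (a)
z1 = W1 @ x
a = np.array([z1[:2].max(), z1[2:].max()])
z2 = W2 @ a
y = np.array([z2[:2].max(), z2[2:].max()])

# backward pass: dL/dy = 2 (y - y_hat)
dy = 2 * (y - y_hat)
dz2 = np.zeros(4)
dz2[np.argmax(z2[:2])] = dy[0]       # max routes the gradient to its argmax
dz2[2 + np.argmax(z2[2:])] = dy[1]
dW2 = np.outer(dz2, a)               # dL/dw^2_{ij} = (dL/dz^2_i) a_j

da = W2.T @ dz2
dz1 = np.zeros(4)
dz1[np.argmax(z1[:2])] = da[0]
dz1[2 + np.argmax(z1[2:])] = da[1]
dW1 = np.outer(dz1, x)               # dL/dw^1_{ij} = (dL/dz^1_i) x_j

print(dW1)   # row i: dL/dw^1_{i1}, dL/dw^1_{i2}
print(dW2)   # row i: dL/dw^2_{i1}, dL/dw^2_{i2}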
--
Episode 01: I Feel Like I Heard This in Lecture
Episode 02: That Would Be Truly Despair-Inducing
Episode 03: There Is Nothing Left to Hope For
Episode 04: Failing and Flunking Out Both Exist
Episode 05: As If I Could Ever Get an All-Pass
Episode 06: There Is Definitely Something Wrong with This Exam
Episode 07: Can You Face Your Real Score?
Episode 08: I Really Am an Idiot
Episode 09: With a Score Like This, the Professor Will Never Let Me Pass
Episode 10: I Will Never Rely on Past Exams Again
Episode 11: The Make-up Exam Left at the Very End
Episode 12: My Most Beloved Credits
--



※ Posted from: PTT (ptt.cc), from: 36.230.21.38 (Taiwan)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/NTU-Exam/M.1745260641.A.6DF.html
※ Edited: xavier13540 (36.230.6.106 Taiwan), 04/22/2025 15:06:46






