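It is hard to find a detailed, step-by-step explanation of neural networks in one place. Courses and videos always seem to leave some part unexplained. So I tried to gather all the information and explanations into one blog post, step by step.

I split this blog post into 8 sections, since that turned out to be the most sensible division: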

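1. Model Representation

2. Model Representation Mathematics

3. Activation Functions

4. Bias Node

5. Cost Function

6. Forward Propagation Calculation

7. Backpropagation Algorithm

8. Code Implementation

Let's get started.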

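Model Representation

An artificial neural network is a computing system inspired by the biological neural networks that constitute animal brains. Such systems "learn" to perform tasks by considering examples, generally without being programmed with task-specific rules.

Image 1: Neural network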

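A neural network is built from the following three types of layers:

1. Input layer: the first layer of the neural network

2. Hidden layers: the intermediate layers between the input and output layers, where all the computation is done

3. Output layer: produces the result for the given inputs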

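In the image above there are three yellow circles. They represent the input layer and are usually denoted as the vector X. The four blue circles and the green circles represent the hidden layers. These circles are the "activation" nodes (the weights on the connections into them are usually denoted W or θ). The red circle is the output layer, i.e. the predicted value (or values, in the case of multiple output classes/types).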

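Each node is connected to every node in the next layer, and each connection (arrow) has a particular weight. A weight can be seen as the impact that the node has on the node in the next layer. So if we take a look at a single node, it looks like this:

Image 2: A node from the neural network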

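Let's look at the top blue node in Image 1. All the nodes from the previous (yellow) layer are connected to it, and each of these connections carries a weight (impact). The values of all the yellow nodes are multiplied by their weights and summed, which gives a value to the top blue node.

The blue node has a predefined "activation" function (the unit step function in Image 2) which determines, based on this summed value, whether the node is activated and how "active" it is. The additional node with value 1 is called the "bias" node.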
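As a minimal sketch of the single node described above, assuming NumPy and made-up input values and weights (the names `unit_step` and `node_output` are mine, not from any library):

```python
import numpy as np

def unit_step(z, threshold=0.0):
    """Return 1.0 if z exceeds the threshold, otherwise 0.0."""
    return 1.0 if z > threshold else 0.0

def node_output(inputs, weights, bias_weight):
    """Weighted sum of the inputs plus the bias node (value 1) times its weight,
    passed through the unit step activation."""
    z = np.dot(weights, inputs) + bias_weight * 1.0
    return unit_step(z)

# Hypothetical values for three yellow input nodes and their weights.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, 0.1])
print(node_output(x, w, bias_weight=-0.2))  # weighted sum is negative: node stays off
print(node_output(x, w, bias_weight=0.2))   # weighted sum is positive: node fires
```

Changing only the bias weight flips the node from inactive to active, which is a first hint of why the bias node matters.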

Model Representation Mathematics

To understand the mathematical equations, I will use a simpler neural network model.

This model has 4 input nodes (3 + 1 "bias"), one hidden layer with 4 nodes (3 + 1 "bias") and one output node.

Image 3: Simple neural network

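We mark the "bias" nodes as x0 and a0 respectively. The input nodes can then be placed in one vector X and the hidden layer nodes in a vector A.

Image 4: X (input layer) and A (hidden layer) vectors

The weights (arrows) are usually denoted θ or W.

Here I will denote them θ. The weights between the input layer and the hidden layer form a 3x4 matrix. The weights between the hidden layer and the output layer form a 1x4 matrix.

If a network has a units in layer j and b units in layer j+1, then θj has dimension b×(a+1).

Image 5: Layer 1 weights matrix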

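Next, we want to compute the "activation" nodes of the hidden layer. To do that, we multiply the input vector X by the weight matrix θ1 of the first layer (X*θ1) and then apply the activation function g.

What we get is:

Image 6: Compute activation nodes

By multiplying the hidden layer vector by the weight matrix θ of the second layer (A*θ), we get the output of the hypothesis function:

Image 7: Compute output node value (hypothesis)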

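This example has only one hidden layer with 4 nodes. If we generalize to a neural network with multiple hidden layers and multiple nodes in each layer, we get the following equation.

Image 8: Generalized compute node value function

where layer L has n nodes and layer L-1 has m nodes.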
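Since the equations in this section live in images, here they are written out as a sketch. The index names are my own choice, g is the activation function, and x_0 = a_0 = 1 are the bias nodes. The product is written as a sum over each row of θ so that the 3x4 and 1x4 shapes line up with the definitions above.

```latex
% Hidden layer of the simple network: \theta^{(1)} is 3 x 4
a_i = g\Big(\sum_{k=0}^{3} \theta^{(1)}_{ik}\, x_k\Big), \qquad i = 1, 2, 3, \qquad a_0 = 1

% Output node (hypothesis): \theta^{(2)} is 1 x 4
h_\theta(x) = g\Big(\sum_{i=0}^{3} \theta^{(2)}_{1i}\, a_i\Big)

% General layer: m nodes in layer L-1 (plus the bias a_0 = 1), n nodes in layer L
a^{(L)}_j = g\Big(\sum_{k=0}^{m} \theta^{(L-1)}_{jk}\, a^{(L-1)}_k\Big), \qquad j = 1, \dots, n
```

In the general form, θ^(L-1) has dimension n×(m+1), which matches the b×(a+1) rule stated earlier.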

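Activation Functions

In a neural network, the activation function determines whether a given node is "activated" or not, based on the weighted sum. Let's call this weighted sum value x.

In this section I will explain why the step function and the linear function won't work, and talk about the sigmoid function, one of the most popular activation functions. There are other functions as well, which I will leave aside for now.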

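Step function

One of the first ideas is to use the so-called step function (discrete output values), where we define a threshold value and:

if (x > threshold) then "activate" the node (value 1)

if (x < threshold) then don't "activate" the node (value 0)

This looks nice, but it has a drawback: the node can only output the value 1 or 0. If we want to map multiple output classes (nodes), we run into a problem: several output classes/nodes can be activated (have the value 1) at the same time, so we cannot properly classify or decide.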
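The multi-class problem is easy to reproduce. A small sketch, assuming NumPy and three made-up weighted sums for the output nodes:

```python
import numpy as np

def step(z, threshold=0.0):
    """Element-wise step activation: 1 where z > threshold, else 0."""
    return (np.asarray(z) > threshold).astype(int)

# Hypothetical weighted sums for three output nodes (e.g. dog, cat, bird).
z = np.array([0.3, 2.5, 0.1])
print(step(z))  # every node fires, so no single class stands out
```

All three sums are above the threshold, so all three nodes output 1 and the information that the second node had by far the largest sum is lost.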

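Linear function

Another possibility is to define a linear function and get a range of output values:

y = ax

However, if we use only linear functions in the neural network, the output will also be just a linear function of the input,

so we cannot map any non-linear data. The proof is given by function composition:

f(x) = x + 3

g(x) = 2x + 5

Then by function composition

g(f(x)) = 2(x + 3) + 5 = 2x + 11

which is again a linear function.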
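The same composition can be checked numerically. This is only a sketch, assuming NumPy: it fits a straight line through the composed outputs and recovers the slope and intercept derived above.

```python
import numpy as np

def f(x):
    return x + 3

def g(x):
    return 2 * x + 5

xs = np.linspace(-5.0, 5.0, 11)
composed = g(f(xs))

# Fit a degree-1 polynomial; if the composition is linear, the fit is exact
# and the coefficients are the slope 2 and the intercept 11.
slope, intercept = np.polyfit(xs, composed, 1)
print(slope, intercept)
```

However many linear layers are stacked, the result collapses into a single line like this one, which is why a non-linear activation is needed.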

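Sigmoid function

This is one of the most widely used activation functions. Its equation is given by the formula below.

s(x) = 1/(1 + e^(-x)) = e^x/(e^x + 1)

Image 9: Sigmoid equation. source: wikipedia

Image 10: Sigmoid function. source: wikipedia

It has several properties that make it very suitable:

・It is non-linear

・Its range of values is (0,1)

・Between (-2,2) on the x-axis the function is very steep, which makes it tend to classify values as either 1 or 0

Thanks to these properties, a node can take any value between 0 and 1. In the case of multiple output classes, this results in different probabilities of "activation" for each output class, and we choose the one with the highest "activation" (probability) value.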
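A short sketch of this, assuming NumPy and the same kind of made-up weighted sums for three output classes:

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)); mathematically its output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical weighted sums for three output nodes (e.g. dog, cat, bird).
z = np.array([0.3, 2.5, 0.1])
activations = sigmoid(z)

# Pick the class whose node has the highest activation value.
predicted_class = int(np.argmax(activations))
print(activations, predicted_class)
```

Unlike the step function, every node now gets a different value between 0 and 1, so the node with the largest weighted sum can be picked.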

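Bias Node

Using a "bias" node is usually critical for creating a successful learning model. In short, a bias value allows us to shift the activation function to the left or right, which helps it fit the data better (a better prediction function as output).

Below are three sigmoid functions where you can notice how multiplying the variable x by some value, or adding/subtracting some value, affects the function.

・Multiplying x makes the function steeper

・Adding to or subtracting from x shifts the function left or right

Image 11: Sigmoid functions. source: desmos.com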
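The two effects can be sketched in a few lines, assuming NumPy. The function name `shifted_sigmoid` and the weight/bias values are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shifted_sigmoid(x, weight=1.0, bias=0.0):
    """sigmoid(weight * x + bias): weight controls steepness, bias shifts the curve."""
    return sigmoid(weight * x + bias)

# The curve crosses 0.5 where weight * x + bias == 0, i.e. at x = -bias / weight.
for weight, bias in [(1.0, 0.0), (5.0, 0.0), (1.0, 2.0)]:
    midpoint = -bias / weight
    print(weight, bias, midpoint, shifted_sigmoid(midpoint, weight, bias))
```

With a bias of 2 the midpoint moves from x = 0 to x = -2, so the whole curve shifts to the left, while a weight of 5 leaves the midpoint in place and only makes the transition sharper.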

Cost Function

Let's start by defining the general equation for the cost function.

This function represents the sum of the errors, i.e. the difference between the predicted value and the real (labeled) value.

Image 12: General cost function. source: coursera.org

Since this is a classification problem, y can only take the discrete values {0,1}.

Each example can belong to only one class.

For example, suppose we classify images of dogs (class 1), cats (class 2) and birds (class 3).

If the input image is a dog, the output will be 1 for the dog class and 0 for the other classes.

This means that we want our hypothesis to satisfy

Image 13: Hypothesis function range values

That's why we define our hypothesis as

Image 14: Hypothesis function

where g in this case is the sigmoid function, since its values lie in the range (0,1).

Our goal is to optimize the cost function, so we need to find min J(θ).

But with the sigmoid function inside the hypothesis, the general cost function above becomes a "non-convex" function of θ (Image 15), which means that it can have multiple local minima. Gradient descent is therefore not guaranteed to converge to (find) the global minimum.

What we need is a "convex" function, so that the gradient descent algorithm is able to find the global minimum (minimize J(θ)).

To get that, we use the log function.

Image 15: Convex vs non-convex function. source: researchgate.com

That's why we use the following cost function for neural networks:

Image 16: Neural network cost function. source: coursera.org

When the labeled value y is equal to 1, the cost is -log(h(x)); otherwise it is -log(1-h(x)).

The intuition is pretty simple if we look at the function graphs.

Let's first look at the case where y=1.

Then -log(h(x)) looks like the graph below.

We are only interested in the (0,1) interval on the x-axis, since the hypothesis can only take values in that range (Image 13).

Image 17: Cost function -log(h(x)). source: desmos.com

What we can see from the graph is that if y=1 and h(x) approaches 1 (x-axis), the cost approaches 0 (h(x)-y would be 0), since it is the right prediction.

If instead h(x) approaches 0, the cost function goes to infinity (a very large cost).

In the other case, where y=0, the cost function is -log(1-h(x)).

Image 18: -log(1-h) cost function. source: desmos.com

From this graph we can see that if h(x) approaches 0, the cost approaches 0, since it is also the right prediction in this case.

Since y (the labeled value) is always equal to 0 or 1, we can write the cost function as one equation.

Image 19: Cost function equation. source: coursera.org
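A minimal sketch of that single equation, assuming NumPy. It uses the standard way of merging the two cases, where the label y switches one of the two log terms off. The function name `example_cost` is my own.

```python
import numpy as np

def example_cost(h, y):
    """Cost for one prediction h in (0, 1) and one label y in {0, 1}.

    For y = 1 only -log(h) remains; for y = 0 only -log(1 - h) remains.
    """
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

for h in (0.99, 0.5, 0.01):
    print(h, example_cost(h, 1), example_cost(h, 0))
```

A confident correct prediction (h close to the label) costs almost nothing, while a confident wrong one is penalized heavily, exactly as the two graphs suggest.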

If we write out our cost function in full, with the summation, we get:

Image 20: Cost function in case of one output node. source: coursera.org

This is for the case where there is only one node in the output layer of the neural network.

If we generalize this to multiple output nodes (multiclass classification), we get:

Image 21: Generalized cost function. source: coursera.org

The right-hand part of the equation represents the cost function "regularization".

This regularization prevents the model from "overfitting" the data by reducing the magnitude of the θ values.
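Here is how the generalized cost could look in code. This is a sketch under assumptions: I assume the usual form with a cross-entropy term averaged over m examples plus an L2 penalty scaled by λ/(2m), with the bias weights left out of the penalty. The name `lam` stands in for λ and the sample numbers are made up.

```python
import numpy as np

def regularized_cost(H, Y, thetas, lam):
    """Cross-entropy cost averaged over m examples plus an L2 penalty on the weights.

    H, Y   : arrays of shape (m, K) with predictions in (0, 1) and one-hot labels
    thetas : list of weight matrices whose first column multiplies the bias node
    lam    : regularization strength (hypothetical name for the lambda parameter)
    """
    m = Y.shape[0]
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # The bias column (index 0) is left out of the penalty, a common convention.
    penalty = sum(np.sum(theta[:, 1:] ** 2) for theta in thetas)
    return data_term + lam / (2 * m) * penalty

H = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
Y = np.array([[1, 0, 0], [0, 1, 0]])
thetas = [np.ones((3, 4)), np.ones((3, 4))]
print(regularized_cost(H, Y, thetas, lam=0.0), regularized_cost(H, Y, thetas, lam=1.0))
```

The only difference between the two printed values is the penalty term, which grows with the size of the weights and therefore pushes the optimizer toward smaller θ values.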

Forward Propagation Calculation

Forward propagation is the process of getting the neural network's output value for a given input.

This algorithm is used to calculate the cost value.

It is the same mathematical process as the one described in section 2, "Model Representation Mathematics",

at the end of which we get our hypothesis value (Image 7).

After we get the h(x) value (hypothesis), we use the cost function equation (Image 21) to calculate the cost for the given set of inputs.

Image 22: Calculate forward propagation

Here we can see how forward propagation works and how a neural network generates its predictions.
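Forward propagation can be sketched as a loop over the weight matrices. The layout below is my assumption, not something fixed by the text: training examples are stored as rows, and each θ has shape (units_out, units_in + 1) with the bias weight in the first column.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, thetas):
    """Return the activations of every layer; the last entry is the hypothesis h(x).

    X      : array of shape (m, n) with one training example per row
    thetas : list of weight matrices; thetas[l] has shape (units_out, units_in + 1)
    """
    activations = []
    a = X
    for theta in thetas:
        # Prepend the bias node (value 1) to every example.
        a = np.hstack([np.ones((a.shape[0], 1)), a])
        activations.append(a)
        a = sigmoid(a @ theta.T)
    activations.append(a)
    return activations

rng = np.random.default_rng(0)
# Random weights with the 3x4 and 1x4 shapes of the simple network in section 2.
thetas = [rng.normal(size=(3, 4)), rng.normal(size=(1, 4))]
X = rng.normal(size=(5, 3))
h = forward_propagation(X, thetas)[-1]
print(h.shape)
```

Keeping the intermediate activations around is deliberate, because backpropagation needs them later.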

Backpropagation Algorithm

What we want to do is minimize the cost function J(θ) by finding the optimal set of values for θ (weights).

Backpropagation is the method we use to compute the partial derivatives of J(θ).

These partial derivative values are then used in the gradient descent algorithm (Image 23) to calculate the θ values for the neural network that minimize the cost function J(θ).

Image 23: General form of gradient descent. source: coursera.org
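The general form of gradient descent is the repeated update θ := θ - α·∂J/∂θ. Here is a sketch on a toy convex cost, assuming NumPy. The learning rate `alpha`, the iteration count and the toy cost are all made up for illustration.

```python
import numpy as np

def gradient_descent(theta, gradient_fn, alpha=0.1, iterations=100):
    """Repeat theta := theta - alpha * dJ/dtheta for a fixed number of iterations."""
    theta = np.array(theta, dtype=float)
    for _ in range(iterations):
        theta = theta - alpha * gradient_fn(theta)
    return theta

# Toy convex cost J(theta) = sum((theta - 3)^2), whose gradient is 2 * (theta - 3).
theta_min = gradient_descent([0.0, 10.0], lambda t: 2 * (t - 3.0))
print(theta_min)
```

In a neural network the `gradient_fn` role is played by backpropagation, which supplies the partial derivatives for every weight.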

The backpropagation algorithm has 5 steps:

1. Set a(1) = X for the training examples

2. Perform forward propagation and compute a(l) for the other layers (l = 2…L)

3. Use y and compute the delta value for the last layer, δ(L) = h(x) - y

4. Compute the δ(l) values backwards for each layer (described in the "Math behind Backpropagation" section)

5. Calculate the derivative values Δ(l) = (a(l))^T ∘ δ(l+1) for each layer, which represent the derivative of the cost J(θ) with respect to θ(l) for layer l
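The five steps can be sketched for a network with a single hidden layer. Several choices here are my assumptions: examples are stored as rows, the activation is the sigmoid, the result of step 5 is transposed so that it has the same shape as θ, and it is divided by m to match the averaged cost. All names and sample data are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    """Prepend the bias node (value 1) to every row."""
    return np.hstack([np.ones((a.shape[0], 1)), a])

def backprop_step(X, Y, theta1, theta2):
    """One pass of the five steps for a network with a single hidden layer."""
    m = X.shape[0]
    a1 = add_bias(X)                       # step 1: a(1) = X (plus bias)
    a2 = add_bias(sigmoid(a1 @ theta1.T))  # step 2: forward propagation
    h = sigmoid(a2 @ theta2.T)
    delta3 = h - Y                         # step 3: delta of the last layer
    # step 4: propagate delta backwards; a2 * (1 - a2) is the sigmoid derivative
    delta2 = (delta3 @ theta2) * a2 * (1 - a2)
    delta2 = delta2[:, 1:]                 # the bias node receives no error
    # step 5: (a(l))^T delta(l+1), transposed to match theta's shape, averaged over m
    grad1 = (a1.T @ delta2).T / m
    grad2 = (a2.T @ delta3).T / m
    return grad1, grad2

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
Y = rng.integers(0, 2, size=(6, 1)).astype(float)
theta1, theta2 = rng.normal(size=(3, 4)), rng.normal(size=(1, 4))
grad1, grad2 = backprop_step(X, Y, theta1, theta2)
print(grad1.shape, grad2.shape)
```

Each gradient has the same shape as the weight matrix it belongs to, so it can be plugged straight into the gradient descent update.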

Backpropagation is about determining how changing the weights impacts the overall cost in the neural network.

It propagates the "error" backwards through the neural network. On the way back, it finds how much each weight contributes to the overall "error". The weights that contribute more to the overall "error" will have larger derivative values, which means that they will change more (when computing gradient descent).

Now that we have a sense of what the backpropagation algorithm does, we can dive deeper into the concepts and math behind it.

Why derivatives?

The derivative of a function (in our case J(θ)) with respect to each variable (in our case each weight θ) tells us the sensitivity of the function to that variable, i.e. how changing the variable impacts the function value.

Let's look at a simple example neural network.

Image 24: Simple neural network

There are two input nodes, x and y. The output function calculates the product of x and y. We can now compute the partial derivatives for both nodes.

Image 25: Derivatives of f(x,y) = xy with respect to x and y

The partial derivative with respect to x says that if x increases by some small value ϵ, the function (the product xy) increases by 7ϵ, and the partial derivative with respect to y says that if y increases by ϵ, the function increases by 3ϵ. Since ∂f/∂x = y and ∂f/∂y = x, these numbers correspond to x = 3 and y = 7.
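This sensitivity can be checked numerically with a small finite-difference sketch. The values x = 3 and y = 7 are assumed here because they reproduce the 7ϵ and 3ϵ in the text, and the helper `partial` is my own.

```python
def f(x, y):
    return x * y

def partial(func, point, index, eps=1e-6):
    """Central-difference estimate of the partial derivative at the given point."""
    up = list(point)
    down = list(point)
    up[index] += eps
    down[index] -= eps
    return (func(*up) - func(*down)) / (2 * eps)

x, y = 3.0, 7.0  # assumed values, chosen to reproduce the 7 and 3 in the text
df_dx = partial(f, (x, y), 0)
df_dy = partial(f, (x, y), 1)
print(df_dx, df_dy)
```

Nudging x changes the product about 7 times as much as the nudge, and nudging y about 3 times as much. This is the same kind of information backpropagation collects for every weight.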

As we defined, the backpropagation algorithm calculates the derivative of the cost function with respect to each θ weight parameter.

By doing this we determine how sensitive the cost function J(θ) is to each of these θ weight parameters.

It also helps us determine how much we should change each θ weight parameter when computing gradient descent.

So in the end we get the model that best fits our data.

Math behind Backpropagation

We will use the neural network model below as a starting point to derive the equations.

Image 26: Neural network

In this model we have 3 output nodes (K) and 2 hidden layers. As previously defined, the cost function for the neural network is:

Image 27: Generalized cost function. source: coursera.org

What we need is to compute the partial derivative of J(θ) with respect to each θ parameter.

We are going to leave out the summation, since we are using a vectorized implementation (matrix multiplication).

We can also leave out the regularization (the right-hand part of the equation above) and compute it separately at the end. Since it is an addition, its derivative can be computed independently.

NOTE: A vectorized implementation will be used, so we calculate for all training examples at once.

We start by defining the derivative rules that we will use.

Now we define the basic equations for our neural network model, where l denotes a layer and L is the last layer.

Image 29: Initial neural network model equations

In our case L has the value 4, since our model has 4 layers.

So let's start by computing the partial derivative with respect to the weights between the 3rd and 4th layers.

Image 30: Derivative of θ parameters between 3rd and 4th layer

Step (6): Sigmoid derivative

To explain step (6) we need to calculate the derivative of the sigmoid function.

Image 31: Derivative of sigmoid function
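For reference, the derivative can be worked out in a few lines. This is the standard calculus result, written with g for the sigmoid and z for its argument:

```latex
g(z) = \frac{1}{1 + e^{-z}}

\frac{d}{dz}\, g(z) = \frac{e^{-z}}{(1 + e^{-z})^{2}}
                    = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
                    = g(z)\,\bigl(1 - g(z)\bigr)
```

The last step uses 1 - g(z) = e^{-z} / (1 + e^{-z}). Because a = g(z), the derivative can be computed from the activation alone as a(1 - a), with no need to keep z.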

In the case of the last layer L we have:

Image 32: Output layer equation

so,

Image 33: Output layer equation

Step (11): Get rid of the summation (Σ)

Also, in the last step (11), it is important to note that we need to multiply δ by the transpose of a in order to get rid of the summation (1…m over the training examples).

δ: matrix with dimensions

[number_of_training_examples, output_layer_size]

so this also means that we get rid of the second summation (1…K over the output nodes).

a: matrix with dimensions

[hidden_layer_size, number_of_training_examples]

Now we continue with the next derivative, for the θ parameters between the 2nd and 3rd layers.

For this derivation we can start from step (9) (Image 30).

Since θ(2) is inside the a(3) function, we need to apply the "chain rule" when calculating the derivative (step (6) of the derivative rules in Image 28).

Image 34: Derivative of θ parameters between 2nd and 3rd layer

Now we have the derivative for the θ parameters between the 2nd and 3rd layers.

What is left to do is compute the derivative for the θ parameters between the input layer and the 2nd layer.

By doing this we will see that the same process (equations) is repeated, so we can derive general δ and derivative equations.

Again we continue from step (3) (Image 34).

Image 35: Derivative of θ parameters between input and 2nd layer

From the equations above we can derive the equations for the δ parameter and for the derivative with respect to the θ parameter.

Image 36: Recursive δ equation

Image 37: Derivative of J (cost) with respect to θ in layer l
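Written out, the two general equations take roughly the following form. This is a sketch under my own layout assumptions: sigmoid activations, and training examples stored as rows, so that δ^(l) has one row per example, as in the dimensions listed for step (11).

```latex
% Last layer (L = 4 in this model)
\delta^{(L)} = h_\theta(x) - y

% Earlier layers, l = L-1, \dots, 2 (\circ is the element-wise product;
% the column belonging to the bias node is dropped afterwards)
\delta^{(l)} = \bigl(\delta^{(l+1)}\, \theta^{(l)}\bigr) \circ a^{(l)} \circ \bigl(1 - a^{(l)}\bigr)

% Derivative of the cost with respect to the weights of layer l
\frac{\partial J(\theta)}{\partial \theta^{(l)}} = \frac{1}{m}\, \bigl(a^{(l)}\bigr)^{T} \delta^{(l+1)}
```

The factor 1/m comes from the averaging over the m training examples in the cost function. With this layout the product (a^(l))^T δ^(l+1) is the transpose of the shape of θ^(l), so an implementation transposes it before using it in the update.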

In the end we get three matrices (one per θ weight matrix), with the same dimensions as the θ weight matrices, holding the calculated derivative for each θ parameter.

Add the regularization

As already mentioned, regularization is needed to prevent the model from overfitting the data.

We have already defined the regularization for our cost function: it is the right-hand part of the equation in Image 21.

Image 38: Regularization equation for cost function

In order to add the regularization to the gradient (partial derivative), we need to compute the partial derivative of the regularization term above.

Image 39: Regularization equation for gradient (partial derivative)

This just means adding a term proportional to each θ value, in every layer, to the partial derivative with respect to that θ.

Code Implementation

We can now implement all the equations in code, calculating the cost and the derivatives (using backpropagation), so that we can later use them in the gradient descent algorithm to optimize the θ parameters of our model.

Image 40: Code implementation of neural network cost function and backpropagation algorithm
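The original listing is an image, so here is my own minimal NumPy sketch of the same idea rather than the original code. It assumes sigmoid activations everywhere, training examples stored as rows, one-hot labels, an L2 penalty of λ/(2m) that skips the bias weights, and made-up layer sizes. The names `cost_and_gradients` and `lam` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradients(thetas, X, Y, lam=0.0):
    """Return the regularized cost J and a list of gradients, one per weight matrix.

    thetas : list of arrays; thetas[l] has shape (units_out, units_in + 1)
    X, Y   : inputs of shape (m, n) and one-hot labels of shape (m, K)
    lam    : regularization strength (hypothetical parameter name)
    """
    m = X.shape[0]

    # Forward propagation, keeping each layer's activations (with bias column).
    activations = []
    a = X
    for theta in thetas:
        a = np.hstack([np.ones((m, 1)), a])
        activations.append(a)
        a = sigmoid(a @ theta.T)
    h = a

    # Cost: cross-entropy plus an L2 penalty that skips the bias column.
    cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m
    cost += lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)

    # Backpropagation: start from the output error and walk the layers backwards.
    grads = [None] * len(thetas)
    delta = h - Y
    for l in reversed(range(len(thetas))):
        a_l = activations[l]
        grads[l] = delta.T @ a_l / m
        grads[l][:, 1:] += lam / m * thetas[l][:, 1:]
        if l > 0:
            # Sigmoid derivative a * (1 - a); drop the bias column afterwards.
            delta = ((delta @ thetas[l]) * a_l * (1 - a_l))[:, 1:]
    return cost, grads

rng = np.random.default_rng(1)
sizes = [3, 5, 5, 3]  # assumed layer sizes: input, two hidden layers, K = 3 outputs
thetas = [rng.normal(scale=0.5, size=(sizes[i + 1], sizes[i] + 1)) for i in range(3)]
X = rng.normal(size=(8, 3))
Y = np.eye(3)[rng.integers(0, 3, size=8)]
cost, grads = cost_and_gradients(thetas, X, Y, lam=1.0)
print(cost, [g.shape for g in grads])
```

A useful sanity check for any implementation like this is to compare a few entries of the returned gradients with finite-difference estimates of the cost. If the two disagree, the backward pass has a bug.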

Conclusion

Hopefully this was clear and easy to understand.

If you think that some part needs a better explanation, please feel free to add a comment or suggestion. For any questions, feel free to contact me.