1 <?xml version="1.0" encoding="UTF-8"?>
3     <refnamediv>
4         <refname>derivative</refname>
5         <refpurpose>approximate derivatives of a function</refpurpose>
6     </refnamediv>
7     <refsynopsisdiv>
8         <title>Calling Sequence</title>
9         <synopsis>derivative(F,x)
10             [J [,H]] = derivative(F,x [,h ,order ,H_form ,Q])
11         </synopsis>
12     </refsynopsisdiv>
13     <refsection>
14         <title>Arguments</title>
15         <variablelist>
16             <varlistentry>
17                 <term>F</term>
18                 <listitem>
19                     <para>
                        a Scilab function F: <literal>R^n --&gt; R^m</literal> or a
                        <literal>list(F,p1,...,pk)</literal>, where F is a Scilab function
                        of the form <literal>y=F(x,p1,...,pk)</literal> and p1, ..., pk are
                        any Scilab objects (matrices, lists, ...).
24                     </para>
25                 </listitem>
26             </varlistentry>
27             <varlistentry>
28                 <term>x</term>
29                 <listitem>
30                     <para>real column vector of dimension n.</para>
31                 </listitem>
32             </varlistentry>
33             <varlistentry>
34                 <term>h</term>
35                 <listitem>
                    <para>(optional) real, the step size used in the finite difference
37                         approximations.
38                     </para>
39                 </listitem>
40             </varlistentry>
41             <varlistentry>
42                 <term>order</term>
43                 <listitem>
                    <para>(optional) integer, the order of the finite difference formula
                        used to approximate the derivatives (order = 1, 2 or 4; the
                        default is order = 2).
47                     </para>
48                 </listitem>
49             </varlistentry>
50             <varlistentry>
51                 <term>H_form</term>
52                 <listitem>
                    <para>(optional) string, the form in which the Hessian will be
                        returned. Possible forms are:
55                     </para>
56                     <variablelist>
57                         <varlistentry>
58                             <term>H_form='default'</term>
59                             <listitem>
60                                 <para>
                                    H is an m x (<literal>n^2</literal>) matrix; in this
                                    form, the k-th row of H corresponds to the Hessian of the k-th
                                    component of F, given as the following row vector:
64                                 </para>
65                                 <informalequation>
66                                     <mediaobject>
67                                         <imageobject>
68                                             <imagedata align="center" fileref="../mml/derivative_equation_1.mml"/>
69                                         </imageobject>
70                                     </mediaobject>
71                                 </informalequation>
                                <para>(grad(F_k) being a row vector).</para>
73                             </listitem>
74                         </varlistentry>
75                         <varlistentry>
                            <term>H_form='blockmat'</term>
77                             <listitem>
                                <para>H is an (m*n) x n block matrix: the classic Hessian
                                    matrices of each component of F are stacked by row
                                    (H = [H1 ; H2 ; ... ; Hm] in Scilab syntax).
81                                 </para>
82                             </listitem>
83                         </varlistentry>
84                         <varlistentry>
                            <term>H_form='hypermat'</term>
86                             <listitem>
                                <para>H is an n x n matrix for m=1, and an n x n x m hypermatrix
                                    otherwise. H(:,:,k) is the classic Hessian matrix of the k-th
                                    component of F.
90                                 </para>
91                             </listitem>
92                         </varlistentry>
93                     </variablelist>
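                    <para>
                        As an illustration (a sketch, reusing the function F of the
                        examples below), the classic n x n Hessian of the k-th component
                        can be recovered from the default form by reshaping the k-th
                        row of H (the Hessian is symmetric, so the reshape order does
                        not matter):
                    </para>
                    <programlisting role="example"><![CDATA[
function y=F(x)
  y=[sin(x(1)*x(2))+exp(x(2)*x(3)+x(1)) ; sum(x.^3)];
endfunction
x=[1;2;3]; n=3; k=1;
[J,H]=derivative(F,x);    // H_form='default': H is m x n^2
Hk=matrix(H(k,:),n,n);    // classic Hessian of the k-th component
]]></programlisting>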
94                 </listitem>
95             </varlistentry>
96             <varlistentry>
97                 <term>Q</term>
98                 <listitem>
                    <para>(optional) real orthogonal matrix (default is eye(n,n)). Q lets the
                        user choose the basis in which the directional derivatives are
                        approximated, instead of being restricted to the canonical basis.
                        Orthogonality is not mandatory, but it is preferable: recovering the
                        derivative requires the inverse of Q, which for an orthogonal matrix
                        is simply Q' instead of inv(Q).
103                     </para>
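                    <para>
                        A suitable Q can be obtained, for instance, from the QR
                        factorization of a random matrix, as in the examples below:
                    </para>
                    <programlisting role="example"><![CDATA[
n = 3;
// the QR factor Q is orthogonal: Q'*Q is (numerically) the identity
Q = qr(rand(n,n));
clean(Q'*Q - eye(n,n))
]]></programlisting>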
104                 </listitem>
105             </varlistentry>
106             <varlistentry>
107                 <term>J</term>
108                 <listitem>
109                     <para>approximated Jacobian</para>
110                 </listitem>
111             </varlistentry>
112             <varlistentry>
113                 <term>H</term>
114                 <listitem>
115                     <para>approximated Hessian</para>
116                 </listitem>
117             </varlistentry>
118         </variablelist>
119     </refsection>
120     <refsection>
121         <title>Description</title>
        <para>Numerical approximation of the first and second derivatives of a
            function F: <literal> R^n --&gt; R^m</literal> at the point x. The
            Jacobian is computed by approximating the directional derivatives of the
            components of F in the direction of the columns of Q (for m=1 and
            v=Q(:,k), grad(F(x))*v = Dv(F(x))). The second derivatives are computed by
            composition of first order derivatives. If H is given in its default form,
            the Taylor series of F(x) up to terms of second order is given by:
129         </para>
130         <informalequation>
131             <mediaobject>
132                 <imageobject>
133                     <imagedata align="center" fileref="../mml/derivative_equation_2.mml"/>
134                 </imageobject>
135             </mediaobject>
136         </informalequation>
        <para>(with [J,H]=derivative(F,x,H_form='default'), J=J(x) and H=H(x)).</para>
138     </refsection>
139     <refsection>
140         <title>Performances</title>
141         <para>
            If the problem is correctly scaled, increasing the order of the formula
            reduces the total error but requires more function evaluations.
            The following list gives the number of function evaluations required to
            compute the Jacobian depending on the order of the formula and the dimension of <literal>x</literal>,
            denoted by <literal>n</literal>:
147         </para>
148         <itemizedlist>
149             <listitem>
150                 <para>
151                     <literal>order=1</literal>, the number of function evaluations is <literal>n+1</literal>,
152                 </para>
153             </listitem>
154             <listitem>
155                 <para>
156                     <literal>order=2</literal>, the number of function evaluations is <literal>2n</literal>,
157                 </para>
158             </listitem>
159             <listitem>
160                 <para>
161                     <literal>order=4</literal>, the number of function evaluations is <literal>4n</literal>.
162                 </para>
163             </listitem>
164         </itemizedlist>
        <para>Computing the Hessian matrix requires the square of the number of
            function evaluations used for the Jacobian, as detailed in the following list.
167         </para>
168         <itemizedlist>
169             <listitem>
170                 <para>
171                     <literal>order=1</literal>, the number of function evaluations is <literal>(n+1)^2</literal>,
172                 </para>
173             </listitem>
174             <listitem>
175                 <para>
176                     <literal>order=2</literal>, the number of function evaluations is <literal>4n^2</literal>,
177                 </para>
178             </listitem>
179             <listitem>
180                 <para>
181                     <literal>order=4</literal>, the number of function evaluations is <literal>16n^2</literal>.
182                 </para>
183             </listitem>
184         </itemizedlist>
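        <para>
            These counts can be checked empirically, for instance by incrementing
            a global counter inside the evaluated function (a sketch; the names
            <literal>evals</literal> and <literal>Fcount</literal> are arbitrary):
        </para>
        <programlisting role="example"><![CDATA[
// Sketch: count the evaluations performed by derivative
global evals;
function y=Fcount(x)
  global evals;
  evals = evals + 1;
  y = sum(x.^3);
endfunction
x = [1;2;3];      // n = 3
evals = 0;
J = derivative(Fcount,x,order=2);
mprintf("Jacobian, order=2: %d evaluations (2n = 6 expected)\n", evals);
evals = 0;
[J,H] = derivative(Fcount,x,order=2);
mprintf("Jacobian and Hessian, order=2: %d evaluations\n", evals);
]]></programlisting>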
185     </refsection>
186     <refsection>
187         <title>Remarks</title>
        <para>The step size h must be small to get a low truncation error, but if
            it is too small, floating point cancellation errors will dominate. As a
            rule of thumb, do not change the default step size. To work around
            numerical difficulties one may also change the order and/or choose
            different orthogonal matrices Q (the default is eye(n,n)), especially if
            the approximate derivatives are used in optimization routines. All the
            optional arguments may also be passed as named arguments, so that one can
            use calls of the form:
196         </para>
197         <programlisting><![CDATA[
198 derivative(F, x, H_form = "hypermat")
derivative(F, x, order = 4)
// etc.
200  ]]></programlisting>
201     </refsection>
202     <refsection>
203         <title>Examples</title>
204         <programlisting role="example"><![CDATA[
205 function y=F(x)
206   y=[sin(x(1)*x(2))+exp(x(2)*x(3)+x(1)) ; sum(x.^3)];
207 endfunction
209 function y=G(x,p)
210   y=[sin(x(1)*x(2)*p)+exp(x(2)*x(3)+x(1)) ; sum(x.^3)];
211 endfunction
213 x=[1;2;3];
214 [J,H]=derivative(F,x,H_form='blockmat')
216 n=3;
217 // form an orthogonal matrix :
218 Q = qr(rand(n,n))
219 // Test order 1, 2 and 4 formulas.
220 for i=[1,2,4]
221   [J,H]=derivative(F,x,order=i,H_form='blockmat',Q=Q);
222   mprintf("order= %d \n",i);
223   H,
224 end
226 p=1;
227 h=1e-3;
228 [J,H]=derivative(list(G,p),x,h,2,H_form='hypermat');
229 H
230 [J,H]=derivative(list(G,p),x,h,4,Q=Q);
231 H
233 // Taylor series example:
234 dx=1e-3*[1;1;-1];
235 [J,H]=derivative(F,x);
236 F(x+dx)
237 F(x+dx)-F(x)
238 F(x+dx)-F(x)-J*dx
239 F(x+dx)-F(x)-J*dx-1/2*H*(dx .*. dx)
241 // A trivial example
242 function y=f(x,A,p,w)
243   y=x'*A*x+p'*x+w;
244 endfunction
// with Jacobian and Hessian given by J(x)=x'*(A+A')+p', and H(x)=A+A'.
246 A = rand(3,3);
247 p = rand(3,1);
248 w = 1;
249 x = rand(3,1);
250 [J,H]=derivative(list(f,A,p,w),x,h=1,H_form='blockmat')
// Since f(x) is quadratic in x, approximate derivatives of order=2 or 4 by finite
// differences should be exact for all h~=0. The apparent errors are caused by
// cancellation in the floating point operations, so a "big" h is chosen.
255 // Comparison with the exact matrices:
256 Je = x'*(A+A')+p'
257 He = A+A'
258 clean(Je - J)
259 clean(He - H)
260  ]]></programlisting>
261     </refsection>
262     <refsection>
263         <title>Accuracy issues</title>
264         <para>
265             The <literal>derivative</literal> function uses the same step <literal>h</literal>
266             whatever the direction and whatever the norm of <literal>x</literal>.
267             This may lead to a poor scaling with respect to <literal>x</literal>.
268             An accurate scaling of the step is not possible without many evaluations
            of the function. Still, the user can compare the results
270             produced by the <literal>derivative</literal> and the <literal>numdiff</literal>
271             functions. Indeed, the <literal>numdiff</literal> function scales the
272             step depending on the absolute value of <literal>x</literal>.
273             This scaling may produce more accurate results, especially if
274             the magnitude of <literal>x</literal> is large.
275         </para>
276         <para>
            In the following Scilab script, we compute the derivative of a
            univariate quadratic function. The exact derivative can be
279             computed analytically and the relative error is computed.
280             In this rather extreme case, the <literal>derivative</literal> function
281             produces no significant digits, while the <literal>numdiff</literal>
282             function produces 6 significant digits.
283         </para>
284         <programlisting role="example"><![CDATA[
// Difference between derivative and numdiff when x is large
286 function y = myfunction (x)
287   y = x*x;
288 endfunction
289 x = 1.e100;
290 fe = 2.0 * x;
291 fp = derivative(myfunction,x);
292 e = abs(fp-fe)/fe;
293 mprintf("Relative error with derivative: %e\n",e)
294 fp = numdiff(myfunction,x);
295 e = abs(fp-fe)/fe;
296 mprintf("Relative error with numdiff: %e\n",e)
297 ]]></programlisting>
298         <para>
299             The previous script produces the following output.
300         </para>
301         <programlisting role="example"><![CDATA[
302 Relative error with derivative: 1.000000e+000
303 Relative error with numdiff: 7.140672e-006
304 ]]></programlisting>
305         <para>
            In a practical situation, we may not know the correct value of the
            derivative. Still, this comparison warns us that numerical
            derivatives should be used with caution in this specific case.
309         </para>
310     </refsection>
311     <refsection role="see also">
312         <title>See Also</title>
313         <simplelist type="inline">
            <member>
                <link linkend="numdiff">numdiff</link>
            </member>
323             <member>