<?xml version="1.0" encoding="UTF-8"?>
<refentry xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:svg="http://www.w3.org/2000/svg" xmlns:ns4="http://www.w3.org/1999/xhtml" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:db="http://docbook.org/ns/docbook" xmlns:scilab="http://www.scilab.org" xml:id="derivative" xml:lang="en">
    <refnamediv>
        <refname>derivative</refname>
        <refpurpose>
            approximate derivatives of a function.
            <emphasis role="bold">
                This function is obsolete. Please use the <link linkend="numderivative">numderivative</link> function instead.
            </emphasis>
        </refpurpose>
    </refnamediv>
    <refsynopsisdiv>
        <title>Calling Sequence</title>
        <synopsis>
            derivative(F, x)
            [J [, H]] = derivative(F, x [, h, order, H_form, Q])
        </synopsis>
    </refsynopsisdiv>
    <refsection>
        <title>Arguments</title>
        <variablelist>
            <varlistentry>
                <term>F</term>
                <listitem>
                    <para>
                        a Scilab function F: <literal>R^n --&gt; R^m</literal> or a
                        <literal>list(F,p1,...,pk)</literal>, where F is a Scilab function
                        of the form <literal>y=F(x,p1,...,pk)</literal>, p1, ..., pk being
                        any Scilab objects (matrices, lists, ...).
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>x</term>
                <listitem>
                    <para>real column vector of dimension n.</para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>h</term>
                <listitem>
                    <para>
                        (optional) real, the step size used in the finite difference
                        approximations.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>order</term>
                <listitem>
                    <para>
                        (optional) integer, the order of the finite difference formula
                        used to approximate the derivatives (order = 1, 2 or 4; the
                        default is order = 2).
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>H_form</term>
                <listitem>
                    <para>
                        (optional) string, the form in which the Hessian will be
                        returned. Possible forms are:
                    </para>
                    <variablelist>
                        <varlistentry>
                            <term>H_form='default'</term>
                            <listitem>
                                <para>
                                    H is an m x (<literal>n^2</literal>) matrix; in this
                                    form, the k-th row of H corresponds to the Hessian of the k-th
                                    component of F, given as the following row vector:
                                </para>
                                <informalequation>
                                    <mediaobject>
                                        <imageobject>
                                            <imagedata align="center" fileref="../mml/derivative_equation_1.mml"/>
                                        </imageobject>
                                    </mediaobject>
                                </informalequation>
                                <para>(grad(F_k) being a row vector).</para>
                            </listitem>
                        </varlistentry>
                        <varlistentry>
                            <term>H_form='blockmat'</term>
                            <listitem>
                                <para>
                                    H is an (m x n) x n block matrix: the classic Hessian
                                    matrices (of each component of F) are stacked row by row
                                    (H = [H1 ; H2 ; ... ; Hm] in Scilab syntax).
                                </para>
                            </listitem>
                        </varlistentry>
                        <varlistentry>
                            <term>H_form='hypermat'</term>
                            <listitem>
                                <para>
                                    H is an n x n matrix for m=1, and an n x n x m hypermatrix
                                    otherwise. H(:,:,k) is the classic Hessian matrix of the k-th
                                    component of F.
                                </para>
                            </listitem>
                        </varlistentry>
                    </variablelist>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>Q</term>
                <listitem>
                    <para>
                        (optional) real orthogonal matrix (default is eye(n,n)). Q removes
                        the arbitrariness of using the canonical basis to approximate the
                        derivatives of the function. Requiring Q to be orthogonal makes it
                        cheap to recover the derivatives, since the inverse of Q is then
                        simply Q' instead of inv(Q).
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>J</term>
                <listitem>
                    <para>approximated Jacobian</para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>H</term>
                <listitem>
                    <para>approximated Hessian</para>
                </listitem>
            </varlistentry>
        </variablelist>
    </refsection>
    <refsection>
        <title>Description</title>
        <para>
            Numerical approximation of the first and second derivatives of a
            function F: <literal>R^n --&gt; R^m</literal> at the point x. The
            Jacobian is computed by approximating the directional derivatives of the
            components of F in the direction of the columns of Q (for m=1 and v=Q(:,k),
            grad(F(x))*v = Dv(F(x))). The second derivatives are computed by
            composition of first order derivatives. If H is given in its default form,
            the Taylor series of F(x) up to second order terms is given by:
        </para>
        <informalequation>
            <mediaobject>
                <imageobject>
                    <imagedata align="center" fileref="../mml/derivative_equation_2.mml"/>
                </imageobject>
            </mediaobject>
        </informalequation>
        <para>([J,H]=derivative(F,x,H_form='default'), J=J(x), H=H(x).)</para>
    </refsection>
    <refsection>
        <title>Performances</title>
        <para>
            If the problem is correctly scaled, increasing the accuracy reduces
            the total error but requires more function evaluations.
            The following list gives the number of function evaluations required to
            compute the Jacobian, depending on the order of the formula and the
            dimension of <literal>x</literal>, denoted by <literal>n</literal>:
        </para>
        <itemizedlist>
            <listitem>
                <para>
                    <literal>order=1</literal>, the number of function evaluations is <literal>n+1</literal>,
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order=2</literal>, the number of function evaluations is <literal>2n</literal>,
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order=4</literal>, the number of function evaluations is <literal>4n</literal>.
                </para>
            </listitem>
        </itemizedlist>
        <para>
            Computing the Hessian matrix requires the square of the number of
            function evaluations used for the Jacobian, as detailed in the
            following list.
        </para>
        <itemizedlist>
            <listitem>
                <para>
                    <literal>order=1</literal>, the number of function evaluations is <literal>(n+1)^2</literal>,
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order=2</literal>, the number of function evaluations is <literal>4n^2</literal>,
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order=4</literal>, the number of function evaluations is <literal>16n^2</literal>.
                </para>
            </listitem>
        </itemizedlist>
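        <para>
            As an illustration (this sketch is not part of the original examples), the
            evaluation counts above can be checked by incrementing a counter inside the
            function; the global variable <literal>nbeval</literal> and the function
            <literal>Fcounted</literal> are introduced here only for this purpose:
        </para>
        <programlisting role="example">
            <![CDATA[
global nbeval
function y=Fcounted(x)
    global nbeval
    nbeval = nbeval + 1;   // count one evaluation of the function
    y = [sin(x(1)*x(2)) ; sum(x.^3)];
endfunction

x = [1;2;3];
// For n=3, the Jacobian should take 4, 6 and 12 evaluations respectively.
for o = [1, 2, 4]
    nbeval = 0;
    J = derivative(Fcounted, x, order=o);
    mprintf("order=%d: %d evaluations\n", o, nbeval);
end
            ]]>
        </programlisting>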
    </refsection>
    <refsection>
        <title>Remarks</title>
        <para>
            The step size h must be small to get a low truncation error, but if it
            is too small, floating point cancellation errors will dominate. As a
            rule of thumb, do not change the default step size. To work around
            numerical difficulties one may also change the order and/or choose a
            different orthogonal matrix Q (the default is eye(n,n)), especially if
            the approximate derivatives are used in optimization routines. All the
            optional arguments may also be passed as named arguments, so that one
            can use calls of the form:
        </para>
        <programlisting>
            <![CDATA[
derivative(F, x, H_form = "hypermat")
derivative(F, x, order = 4)
// etc.
            ]]>
        </programlisting>
    </refsection>
    <refsection>
        <title>Examples</title>
        <programlisting role="example">
            <![CDATA[
function y=F(x)
    y = [sin(x(1)*x(2)) + exp(x(2)*x(3) + x(1)) ; sum(x.^3)];
endfunction

function y=G(x, p)
    y = [sin(x(1)*x(2)*p) + exp(x(2)*x(3) + x(1)) ; sum(x.^3)];
endfunction

x = [1;2;3];
[J, H] = derivative(F, x, H_form='blockmat')

n = 3;
// Form an orthogonal matrix:
Q = qr(rand(n,n))
// Test order 1, 2 and 4 formulas.
for i = [1, 2, 4]
    [J, H] = derivative(F, x, order=i, H_form='blockmat', Q=Q);
    mprintf("order = %d \n", i);
    H,
end

p = 1;
h = 1e-3;
[J, H] = derivative(list(G,p), x, h, 2, H_form='hypermat');
H
[J, H] = derivative(list(G,p), x, h, 4, Q=Q);
H

// Taylor series example:
dx = 1e-3*[1;1;-1];
[J, H] = derivative(F, x);
F(x+dx)
F(x+dx) - F(x)
F(x+dx) - F(x) - J*dx
F(x+dx) - F(x) - J*dx - 1/2*H*(dx .*. dx)

// A trivial example
function y=f(x, A, p, w)
    y = x'*A*x + p'*x + w;
endfunction
// with Jacobian and Hessian given by J(x)=x'*(A+A')+p' and H(x)=A+A'.
A = rand(3,3);
p = rand(3,1);
w = 1;
x = rand(3,1);
[J, H] = derivative(list(f,A,p,w), x, h=1, H_form='blockmat')

// Since f(x) is quadratic in x, approximate derivatives of order 2 or 4 by finite
// differences should be exact for all h~=0. The apparent errors are caused by
// cancellation in the floating point operations, so a "big" h is chosen.
// Comparison with the exact matrices:
Je = x'*(A+A') + p'
He = A + A'
clean(Je - J)
clean(He - H)
            ]]>
        </programlisting>
    </refsection>
    <refsection>
        <title>Accuracy issues</title>
        <para>
            The <literal>derivative</literal> function uses the same step <literal>h</literal>
            whatever the direction and whatever the norm of <literal>x</literal>.
            This may lead to poor scaling with respect to <literal>x</literal>.
            An accurate scaling of the step is not possible without many evaluations
            of the function. Still, the user can compare the results
            produced by the <literal>derivative</literal> and the <literal>numdiff</literal>
            functions. Indeed, the <literal>numdiff</literal> function scales the
            step depending on the absolute value of <literal>x</literal>.
            This scaling may produce more accurate results, especially if
            the magnitude of <literal>x</literal> is large.
        </para>
        <para>
            In the following Scilab script, we compute the derivative of a
            univariate quadratic function. The exact derivative can be
            computed analytically, and the relative error is computed.
            In this rather extreme case, the <literal>derivative</literal> function
            produces no significant digits, while the <literal>numdiff</literal>
            function produces 6 significant digits.
        </para>
        <programlisting role="example">
            <![CDATA[
// Difference between derivative and numdiff when x is large
function y = myfunction (x)
    y = x*x;
endfunction
x = 1.e100;
fe = 2.0 * x;
fp = derivative(myfunction, x);
e = abs(fp-fe)/fe;
mprintf("Relative error with derivative: %e\n", e)
fp = numdiff(myfunction, x);
e = abs(fp-fe)/fe;
mprintf("Relative error with numdiff: %e\n", e)
            ]]>
        </programlisting>
        <para>
            The previous script produces the following output.
        </para>
        <programlisting role="example">
            <![CDATA[
Relative error with derivative: 1.000000e+000
Relative error with numdiff: 7.140672e-006
            ]]>
        </programlisting>
        <para>
            In a practical situation, we may not know the correct value of the
            derivative. Still, this specific case warns us that numerical
            derivatives should be used with caution.
        </para>
    </refsection>
    <refsection role="see also">
        <title>See Also</title>
        <simplelist type="inline">
            <member>
                <link linkend="interp">interp</link>
            </member>
            <member>
                <link linkend="interp2d">interp2d</link>
            </member>
            <member>
                <link linkend="splin">splin</link>
            </member>
            <member>
                <link linkend="eval_cshep2d">eval_cshep2d</link>
            </member>
            <member>
                <link linkend="diff">diff</link>
            </member>
            <member>
                <link linkend="numdiff">numdiff</link>
            </member>
            <member>
                <link linkend="derivat">derivat</link>
            </member>
            <member>
                <link linkend="numderivative">numderivative</link>
            </member>
        </simplelist>
    </refsection>
    <refsection>
        <title>History</title>
        <revhistory>
            <revision>
                <revnumber>5.5.0</revnumber>
                <revremark>Tagged as obsolete. Will be removed in Scilab 6.0.0.</revremark>
            </revision>
        </revhistory>
    </refsection>
    <refsection>
        <title>Appendix</title>
        <para>
            We now discuss how a script using the <literal>derivative</literal> function can be
            updated to use the <literal>numderivative</literal> function.
        </para>
        <para>
            Consider the function:
        </para>
        <programlisting role="example">
            <![CDATA[
function y = F(x)
    f1 = sin(x(1)*x(2)) + exp(x(2)*x(3) + x(1))
    f2 = sum(x.^3)
    y = [f1 ; f2];
endfunction
            ]]>
        </programlisting>
        <para>
            and the point:
        </para>
        <programlisting role="example">
            <![CDATA[
x = [1 ; 2 ; 3];
            ]]>
        </programlisting>
        <para>
            Then the statement:
        </para>
        <programlisting role="example">
            <![CDATA[
[J, H] = derivative(F, x)
            ]]>
        </programlisting>
        <para>
            can be replaced with:
        </para>
        <programlisting role="example">
            <![CDATA[
[J, H] = numderivative(F, x)
            ]]>
        </programlisting>
        <para>
            The statement:
        </para>
        <programlisting role="example">
            <![CDATA[
[J, H] = derivative(F, x, order=4)
            ]]>
        </programlisting>
        <para>
            can be replaced by:
        </para>
        <programlisting role="example">
            <![CDATA[
[J, H] = numderivative(F, x, [], 4)
            ]]>
        </programlisting>
        <para>
            The statement:
        </para>
        <programlisting role="example">
            <![CDATA[
[J, H] = derivative(F, x, H_form="blockmat")
            ]]>
        </programlisting>
        <para>
            can be replaced by:
        </para>
        <programlisting role="example">
            <![CDATA[
[J, H] = numderivative(F, x, [], [], "blockmat")
            ]]>
        </programlisting>
        <para>
            We emphasize that <literal>numderivative</literal> and <literal>derivative</literal> do not
            use the same strategy for the choice of the default step <literal>h</literal>.
            Hence, in general, the functions <literal>numderivative</literal> and <literal>derivative</literal>
            do not produce exactly the same outputs.
            Still, in general, we expect <literal>numderivative</literal> to be more accurate.
            It is not easy to use the same step in <literal>numderivative</literal> as in <literal>derivative</literal>,
            because the choice of the step depends on the degree of
            differentiation (Jacobian or Hessian) and on the order of the formula.
        </para>
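        <para>
            Finally, a call that passes extra parameters and an explicit step can be
            updated in the same way. The sketch below (the function <literal>G</literal>
            and the parameter <literal>p</literal> are taken from the Examples section
            above) relies on <literal>numderivative</literal> also accepting the
            <literal>list(G, p)</literal> convention and taking <literal>h</literal>
            as its third argument:
        </para>
        <programlisting role="example">
            <![CDATA[
function y = G(x, p)
    y = [sin(x(1)*x(2)*p) + exp(x(2)*x(3) + x(1)) ; sum(x.^3)];
endfunction
p = 1;
x = [1 ; 2 ; 3];
// Before:
// [J, H] = derivative(list(G, p), x, 1e-3, 4)
// After:
[J, H] = numderivative(list(G, p), x, 1e-3, 4)
            ]]>
        </programlisting>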
    </refsection>
</refentry>