<?xml version="1.0" encoding="UTF-8"?>
<refentry xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:svg="http://www.w3.org/2000/svg" xmlns:ns4="http://www.w3.org/1999/xhtml" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:db="http://docbook.org/ns/docbook" xmlns:scilab="http://www.scilab.org" xml:id="derivative" xml:lang="en">
   <refnamediv>
      <refname>derivative</refname>
      <refpurpose>approximate derivatives of a function. <emphasis role="bold">This function is obsolete. Please use the <link linkend="numderivative">numderivative</link> function instead.</emphasis></refpurpose>
   </refnamediv>
   <refsynopsisdiv>
      <title>Calling Sequence</title>
      <synopsis>
         derivative(F, x)
         [J [, H]] = derivative(F, x [, h, order, H_form, Q])
      </synopsis>
   </refsynopsisdiv>
   <refsection>
      <title>Arguments</title>
      <variablelist>
         <varlistentry>
            <term>F</term>
            <listitem>
               <para>
                  a Scilab function F: <literal>R^n --&gt; R^m</literal> or a
                  <literal>list(F,p1,...,pk)</literal>, where F is a Scilab function
                  of the form <literal>y=F(x,p1,...,pk)</literal>, with p1, ..., pk being
                  any Scilab objects (matrices, lists, ...).
               </para>
            </listitem>
         </varlistentry>
         <varlistentry>
            <term>x</term>
            <listitem>
               <para>real column vector of dimension n.</para>
            </listitem>
         </varlistentry>
         <varlistentry>
            <term>h</term>
            <listitem>
               <para>
                  (optional) real, the step size used in the finite difference
                  approximations.
               </para>
            </listitem>
         </varlistentry>
         <varlistentry>
            <term>order</term>
            <listitem>
               <para>
                  (optional) integer, the order of the finite difference formula
                  used to approximate the derivatives (order = 1, 2 or 4; default is
                  order = 2).
               </para>
            </listitem>
         </varlistentry>
         <varlistentry>
            <term>H_form</term>
            <listitem>
               <para>
                  (optional) string, the form in which the Hessian will be
                  returned. Possible forms are:
               </para>
               <variablelist>
                  <varlistentry>
                     <term>H_form='default'</term>
                     <listitem>
                        <para>
                           H is a m x (<literal>n^2</literal>) matrix; in this
                           form, the k-th row of H corresponds to the Hessian of the k-th
                           component of F, given as the following row vector:
                        </para>
                        <informalequation>
                           <mediaobject>
                              <imageobject>
                                 <imagedata align="center" fileref="../mml/derivative_equation_1.mml"/>
                              </imageobject>
                           </mediaobject>
                        </informalequation>
                        <para>(grad(F_k) being a row vector).</para>
                     </listitem>
                  </varlistentry>
                  <varlistentry>
                     <term>H_form='blockmat'</term>
                     <listitem>
                        <para>
                           H is a (m x n) x n block matrix: the classic Hessian
                           matrices (of each component of F) are stacked by row (H = [H1
                           ; H2 ; ... ; Hm] in Scilab syntax).
                        </para>
                     </listitem>
                  </varlistentry>
                  <varlistentry>
                     <term>H_form='hypermat'</term>
                     <listitem>
                        <para>
                           H is a n x n matrix for m=1, and a n x n x m hypermatrix
                           otherwise. H(:,:,k) is the classic Hessian matrix of the k-th
                           component of F.
                        </para>
                     </listitem>
                  </varlistentry>
               </variablelist>
            </listitem>
         </varlistentry>
         <varlistentry>
            <term>Q</term>
            <listitem>
               <para>
                  (optional) real orthogonal matrix (default is eye(n,n)). Q lets the
                  user remove the arbitrariness of approximating the derivatives along
                  the canonical basis: the directional derivatives are taken along the
                  columns of Q instead. Orthogonality is not strictly mandatory, but it
                  is preferable, because recovering the derivatives requires the inverse
                  of Q, which for an orthogonal matrix is simply its transpose (Q'
                  instead of inv(Q)).
               </para>
            </listitem>
         </varlistentry>
         <varlistentry>
            <term>J</term>
            <listitem>
               <para>approximated Jacobian.</para>
            </listitem>
         </varlistentry>
         <varlistentry>
            <term>H</term>
            <listitem>
               <para>approximated Hessian.</para>
            </listitem>
         </varlistentry>
      </variablelist>
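      <para>
         As a small sketch of how <literal>H_form</literal> changes the shape of
         <literal>H</literal>, consider a function with m=2 outputs and n=3 inputs
         (the function <literal>F</literal> below is introduced only for illustration):
      </para>
      <programlisting role="example">
         <![CDATA[
         // Sketch: compare the three H_form layouts for F: R^3 --> R^2.
         function y=F(x)
             y = [x(1)*x(2) ; sum(x.^2)];
         endfunction
         x = [1;2;3];
         [J, Hd] = derivative(F, x);                      // default:  Hd is 2 x 9
         [J, Hb] = derivative(F, x, H_form='blockmat');   // blockmat: Hb is 6 x 3
         [J, Hh] = derivative(F, x, H_form='hypermat');   // hypermat: Hh is 3 x 3 x 2
         size(Hd), size(Hb), size(Hh)
         ]]>
      </programlisting>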
126    </refsection>
   <refsection>
      <title>Description</title>
      <para>
         Numerical approximation of the first and second derivatives of a
         function F: <literal>R^n --&gt; R^m</literal> at the point x. The
         Jacobian is computed by approximating the directional derivatives of the
         components of F in the direction of the columns of Q. (For m=1 and v=Q(:,k):
         grad(F(x))*v = Dv(F(x)).) The second derivatives are computed by
         composition of first order derivatives. If H is given in its default form,
         the Taylor series of F(x) up to terms of second order is given by:
      </para>
      <informalequation>
         <mediaobject>
            <imageobject>
               <imagedata align="center" fileref="../mml/derivative_equation_2.mml"/>
            </imageobject>
         </mediaobject>
      </informalequation>
      <para>(with [J,H]=derivative(F,x,H_form='default'), J=J(x), H=H(x)).</para>
   </refsection>
   <refsection>
      <title>Performances</title>
      <para>
         If the problem is correctly scaled, increasing the order of the formula
         reduces the total error but requires more function evaluations.
         The following list gives the number of function evaluations required to
         compute the Jacobian, depending on the order of the formula and on the
         dimension of <literal>x</literal>, denoted by <literal>n</literal>:
      </para>
      <itemizedlist>
         <listitem>
            <para>
               <literal>order=1</literal>: the number of function evaluations is <literal>n+1</literal>,
            </para>
         </listitem>
         <listitem>
            <para>
               <literal>order=2</literal>: the number of function evaluations is <literal>2n</literal>,
            </para>
         </listitem>
         <listitem>
            <para>
               <literal>order=4</literal>: the number of function evaluations is <literal>4n</literal>.
            </para>
         </listitem>
      </itemizedlist>
      <para>
         Computing the Hessian matrix requires the square of the number of
         function evaluations used for the Jacobian, as detailed in the following list:
      </para>
      <itemizedlist>
         <listitem>
            <para>
               <literal>order=1</literal>: the number of function evaluations is <literal>(n+1)^2</literal>,
            </para>
         </listitem>
         <listitem>
            <para>
               <literal>order=2</literal>: the number of function evaluations is <literal>4n^2</literal>,
            </para>
         </listitem>
         <listitem>
            <para>
               <literal>order=4</literal>: the number of function evaluations is <literal>16n^2</literal>.
            </para>
         </listitem>
      </itemizedlist>
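      <para>
         These counts can be checked empirically with a sketch like the following,
         which counts the evaluations made while computing the Jacobian (the global
         counter <literal>fevals</literal> is introduced here only for illustration):
      </para>
      <programlisting role="example">
         <![CDATA[
         // Sketch: count the function evaluations used by each formula (n=3).
         global fevals
         function y=F(x)
             global fevals
             fevals = fevals + 1;
             y = sum(x.^2);
         endfunction
         x = [1;2;3];
         for o = [1 2 4]
             fevals = 0;
             J = derivative(F, x, order=o);
             // expected counts from the list above: n+1, 2n and 4n
             mprintf("order=%d: %d evaluations\n", o, fevals);
         end
         ]]>
      </programlisting>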
   </refsection>
   <refsection>
      <title>Remarks</title>
      <para>
         The step size h must be small to get a low truncation error, but if it is
         too small, floating point cancellation errors will dominate. As a rule of
         thumb, do not change the default step size. To work around numerical
         difficulties, one may also change the order and/or choose a different
         orthogonal matrix Q (the default is eye(n,n)), especially if the
         approximate derivatives are used in optimization routines. All the
         optional arguments may also be passed as named arguments, so that one can
         use calls of the form:
      </para>
      <programlisting>
         <![CDATA[
         derivative(F, x, H_form = "hypermat")
         derivative(F, x, order = 4)
         // etc.
         ]]>
      </programlisting>
   </refsection>
   <refsection>
      <title>Examples</title>
      <programlisting role="example">
         <![CDATA[
         function y=F(x)
             y = [sin(x(1)*x(2))+exp(x(2)*x(3)+x(1)) ; sum(x.^3)];
         endfunction

         function y=G(x,p)
             y = [sin(x(1)*x(2)*p)+exp(x(2)*x(3)+x(1)) ; sum(x.^3)];
         endfunction

         x = [1;2;3];
         [J,H] = derivative(F, x, H_form='blockmat')

         n = 3;
         // form an orthogonal matrix:
         Q = qr(rand(n,n))
         // Test order 1, 2 and 4 formulas.
         for i = [1,2,4]
             [J,H] = derivative(F, x, order=i, H_form='blockmat', Q=Q);
             mprintf("order = %d \n", i);
             disp(H);
         end

         p = 1;
         h = 1e-3;
         [J,H] = derivative(list(G,p), x, h, 2, H_form='hypermat');
         H
         [J,H] = derivative(list(G,p), x, h, 4, Q=Q);
         H

         // Taylor series example:
         dx = 1e-3*[1;1;-1];
         [J,H] = derivative(F, x);
         F(x+dx)
         F(x+dx) - F(x)
         F(x+dx) - F(x) - J*dx
         F(x+dx) - F(x) - J*dx - 1/2*H*(dx .*. dx)

         // A trivial example
         function y=f(x,A,p,w)
             y = x'*A*x + p'*x + w;
         endfunction
         // with Jacobian and Hessian given by J(x)=x'*(A+A')+p' and H(x)=A+A'.
         A = rand(3,3);
         p = rand(3,1);
         w = 1;
         x = rand(3,1);
         [J,H] = derivative(list(f,A,p,w), x, h=1, H_form='blockmat')

         // Since f(x) is quadratic in x, approximate derivatives of order 2 or 4 by
         // finite differences should be exact for all h~=0. The apparent errors are
         // caused by cancellation in the floating point operations, so a "big" h
         // is chosen.
         // Comparison with the exact matrices:
         Je = x'*(A+A') + p'
         He = A + A'
         clean(Je - J)
         clean(He - H)
         ]]>
      </programlisting>
   </refsection>
   <refsection>
      <title>Accuracy issues</title>
      <para>
         The <literal>derivative</literal> function uses the same step <literal>h</literal>
         whatever the direction and whatever the norm of <literal>x</literal>.
         This may lead to a poor scaling with respect to <literal>x</literal>.
         An accurate scaling of the step is not possible without many evaluations
         of the function. Still, the user has the possibility to compare the results
         produced by the <literal>derivative</literal> and the <literal>numdiff</literal>
         functions. Indeed, the <literal>numdiff</literal> function scales the
         step depending on the absolute value of <literal>x</literal>.
         This scaling may produce more accurate results, especially if
         the magnitude of <literal>x</literal> is large.
      </para>
      <para>
         In the following Scilab script, we compute the derivative of a
         univariate quadratic function. The exact derivative can be
         computed analytically and the relative error is computed.
         In this rather extreme case, the <literal>derivative</literal> function
         produces no significant digits, while the <literal>numdiff</literal>
         function produces 6 significant digits.
      </para>
      <programlisting role="example">
         <![CDATA[
         // Difference between derivative and numdiff when x is large
         function y = myfunction(x)
             y = x*x;
         endfunction
         x = 1.e100;
         fe = 2.0 * x;
         fp = derivative(myfunction, x);
         e = abs(fp-fe)/fe;
         mprintf("Relative error with derivative: %e\n", e)
         fp = numdiff(myfunction, x);
         e = abs(fp-fe)/fe;
         mprintf("Relative error with numdiff: %e\n", e)
         ]]>
      </programlisting>
      <para>
         The previous script produces the following output:
      </para>
      <programlisting role="example">
         <![CDATA[
         Relative error with derivative: 1.000000e+000
         Relative error with numdiff: 7.140672e-006
         ]]>
      </programlisting>
      <para>
         In a practical situation, we may not know the correct value of the
         derivative. Still, we are warned that numerical derivatives
         should be used with caution in this specific case.
      </para>
   </refsection>
   <refsection role="see also">
      <title>See Also</title>
      <simplelist type="inline">
         <member>
            <link linkend="interp">interp</link>
         </member>
         <member>
            <link linkend="interp2d">interp2d</link>
         </member>
         <member>
            <link linkend="splin">splin</link>
         </member>
         <member>
            <link linkend="eval_cshep2d">eval_cshep2d</link>
         </member>
         <member>
            <link linkend="diff">diff</link>
         </member>
         <member>
            <link linkend="numdiff">numdiff</link>
         </member>
         <member>
            <link linkend="derivat">derivat</link>
         </member>
         <member>
            <link linkend="numderivative">numderivative</link>
         </member>
      </simplelist>
   </refsection>
   <refsection>
      <title>History</title>
      <revhistory>
         <revision>
            <revnumber>5.5.0</revnumber>
            <revremark>Tagged as obsolete. Will be removed in Scilab 6.0.0.</revremark>
         </revision>
      </revhistory>
   </refsection>
   <refsection>
      <title>Appendix</title>
      <para>
         We now discuss how a script using the <literal>derivative</literal> function can be
         updated to use the <literal>numderivative</literal> function.
      </para>
      <para>
         Consider the function:
      </para>
      <programlisting role="example">
         <![CDATA[
         function y = F(x)
             f1 = sin(x(1)*x(2)) + exp(x(2)*x(3)+x(1));
             f2 = sum(x.^3);
             y = [f1 ; f2];
         endfunction
         ]]>
      </programlisting>
      <para>
         and the point:
      </para>
      <programlisting role="example">
         <![CDATA[
         x = [1 ; 2 ; 3];
         ]]>
      </programlisting>
      <para>
         Then the statement:
      </para>
      <programlisting role="example">
         <![CDATA[
         [J, H] = derivative(F, x)
         ]]>
      </programlisting>
      <para>
         can be replaced with:
      </para>
      <programlisting role="example">
         <![CDATA[
         [J, H] = numderivative(F, x)
         ]]>
      </programlisting>
      <para>
         The statement:
      </para>
      <programlisting role="example">
         <![CDATA[
         [J, H] = derivative(F, x, order=4)
         ]]>
      </programlisting>
      <para>
         can be replaced by:
      </para>
      <programlisting role="example">
         <![CDATA[
         [J, H] = numderivative(F, x, [], 4)
         ]]>
      </programlisting>
      <para>
         The statement:
      </para>
      <programlisting role="example">
         <![CDATA[
         [J, H] = derivative(F, x, H_form="blockmat")
         ]]>
      </programlisting>
      <para>
         can be replaced by:
      </para>
      <programlisting role="example">
         <![CDATA[
         [J, H] = numderivative(F, x, [], [], "blockmat")
         ]]>
      </programlisting>
      <para>
         We emphasize that <literal>numderivative</literal> and <literal>derivative</literal> do not
         use the same strategy for the choice of the default step <literal>h</literal>.
         Hence, in general, the functions <literal>numderivative</literal> and <literal>derivative</literal>
         will not produce exactly the same outputs.
         Still, in general, we expect <literal>numderivative</literal> to be more accurate.
         It is not easy to get the same step in <literal>numderivative</literal> as in <literal>derivative</literal>,
         because the choice of the step depends on the degree of
         differentiation (Jacobian or Hessian) and on the order of the formula.
      </para>
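      <para>
         The following sketch illustrates this point: the two functions typically
         agree to several digits, but not exactly, because their default steps
         differ (the small test function is introduced here only for illustration):
      </para>
      <programlisting role="example">
         <![CDATA[
         // Sketch: compare the Jacobians returned by derivative and numderivative.
         function y=F(x)
             y = [sin(x(1)*x(2)) ; sum(x.^3)];
         endfunction
         x = [1;2];
         J1 = derivative(F, x);
         J2 = numderivative(F, x);
         max(abs(J1 - J2))  // small, but in general nonzero
         ]]>
      </programlisting>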
   </refsection>
</refentry>