<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This manual is for GNU Diffutils
(version 3.5, 4 August 2016),
and documents the GNU diff, diff3,
sdiff, and cmp commands for showing the
differences between files and the GNU patch command for
using their output to update files.
Copyright (C) 1992-1994, 1998, 2001-2002, 2004, 2006, 2009-2016 Free
Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled
"GNU Free Documentation License." -->
<!-- Created by GNU Texinfo 6.3, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Comparing and Merging Files: Projects</title>
<meta name="description" content="Comparing and Merging Files: Projects">
<meta name="keywords" content="Comparing and Merging Files: Projects">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index.html#Index" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html#Top" rel="up" title="Top">
<link href="Copying-This-Manual.html#Copying-This-Manual" rel="next" title="Copying This Manual">
<link href="Standards-conformance.html#Standards-conformance" rel="prev" title="Standards conformance">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Projects"></a>
<div class="header">
<p>
Next: <a href="Copying-This-Manual.html#Copying-This-Manual" accesskey="n" rel="next">Copying This Manual</a>, Previous: <a href="Standards-conformance.html#Standards-conformance" accesskey="p" rel="prev">Standards conformance</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Future-Projects"></a>
<h2 class="chapter">18 Future Projects</h2>
<p>Here are some ideas for improving <acronym>GNU</acronym> <code>diff</code> and
<code>patch</code>. The <acronym>GNU</acronym> project has identified some
improvements as potential programming projects for volunteers. You
can also help by reporting any bugs that you find.
</p>
<p>If you are a programmer and would like to contribute something to the
<acronym>GNU</acronym> project, please consider volunteering for one of these
projects. If you are seriously contemplating work, please write to
<a href="mailto:gvc@gnu.org">gvc@gnu.org</a> to coordinate with other volunteers.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top">• <a href="#Shortcomings" accesskey="1">Shortcomings</a>:</td><td> </td><td align="left" valign="top">Suggested projects for improvements.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Bugs" accesskey="2">Bugs</a>:</td><td> </td><td align="left" valign="top">Reporting bugs.
</td></tr>
</table>
<hr>
<a name="Shortcomings"></a>
<div class="header">
<p>
Next: <a href="#Bugs" accesskey="n" rel="next">Bugs</a>, Up: <a href="#Projects" accesskey="u" rel="up">Projects</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Suggested-Projects-for-Improving-GNU-diff-and-patch"></a>
<h3 class="section">18.1 Suggested Projects for Improving <acronym>GNU</acronym> <code>diff</code> and <code>patch</code></h3>
<a name="index-projects-for-directories"></a>
<p>One should be able to use <acronym>GNU</acronym> <code>diff</code> to generate a
patch from any pair of directory trees and then, given the patch and a copy
of one of those trees, use <code>patch</code> to generate a faithful copy of
the other. Unfortunately, some changes to directory trees cannot be
expressed using current patch formats; also, <code>patch</code> does not
handle some of the existing formats. These shortcomings motivate the
following suggested projects.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top">• <a href="#Internationalization" accesskey="1">Internationalization</a>:</td><td> </td><td align="left" valign="top">Handling multibyte and varying-width characters.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Changing-Structure" accesskey="2">Changing Structure</a>:</td><td> </td><td align="left" valign="top">Handling changes to the directory structure.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Special-Files" accesskey="3">Special Files</a>:</td><td> </td><td align="left" valign="top">Handling symbolic links, device special files, etc.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Unusual-File-Names" accesskey="4">Unusual File Names</a>:</td><td> </td><td align="left" valign="top">Handling file names that contain unusual characters.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Time-Stamp-Order" accesskey="5">Time Stamp Order</a>:</td><td> </td><td align="left" valign="top">Outputting diffs in time stamp order.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Ignoring-Changes" accesskey="6">Ignoring Changes</a>:</td><td> </td><td align="left" valign="top">Ignoring certain changes while showing others.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Speedups" accesskey="7">Speedups</a>:</td><td> </td><td align="left" valign="top">Improving performance.
</td></tr>
</table>
<hr>
<a name="Internationalization"></a>
<div class="header">
<p>
Next: <a href="#Changing-Structure" accesskey="n" rel="next">Changing Structure</a>, Up: <a href="#Shortcomings" accesskey="u" rel="up">Shortcomings</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Handling-Multibyte-and-Varying_002dWidth-Characters"></a>
<h4 class="subsection">18.1.1 Handling Multibyte and Varying-Width Characters</h4>
<a name="index-multibyte-characters"></a>
<a name="index-varying_002dwidth-characters"></a>
<p><code>diff</code>, <code>diff3</code>, and <code>sdiff</code> treat each line of
input as a string of unibyte characters, so they can mishandle multibyte
characters in some cases. For example, when asked to ignore spaces,
<code>diff</code> does not properly ignore a multibyte space character.
</p>
<p>Also, <code>diff</code> currently assumes that each byte is one column
wide, and this assumption is incorrect in some locales, e.g., locales
that use UTF-8 encoding. This causes problems with the <samp>-y</samp> or
<samp>--side-by-side</samp> option of <code>diff</code>.
</p>
<p>These problems need to be fixed without unduly affecting the
performance of the utilities in unibyte environments.
</p>
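<p>The following Python sketch illustrates both problems. It is not code from <code>diff</code>; the function names are hypothetical, and the width rule (two columns for East Asian Wide and Fullwidth characters, none for combining marks, one otherwise) is a simplifying assumption, since terminals disagree about some characters. Counting one column per byte overestimates the width of accented and CJK text, and a byte-oriented blank test does not recognize U+3000 IDEOGRAPHIC SPACE, which occupies three bytes in UTF-8.
</p>
<div class="example">
<pre class="example">import unicodedata

# Hypothetical helpers for illustration only; not part of diffutils.


def byte_columns(text):
    """Width under the byte-per-column assumption: one column per UTF-8 byte."""
    return len(text.encode("utf-8"))


def display_columns(text):
    """Approximate terminal width: East Asian Wide/Fullwidth characters take
    two columns, combining marks take none, everything else takes one."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # drawn on top of the preceding character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def strip_ascii_blanks(data):
    """Byte-oriented blank removal: only ASCII space and tab are recognized."""
    return bytes(b for b in data if b not in b" \t")


def strip_unicode_blanks(text):
    """Character-oriented blank removal using Unicode white space."""
    return "".join(ch for ch in text if not ch.isspace())


for sample in ["cafe", "caf\u00e9", "\u65e5\u672c"]:
    print(repr(sample), byte_columns(sample), display_columns(sample))

spaced = "a\u3000b"  # U+3000 IDEOGRAPHIC SPACE
print(strip_ascii_blanks(spaced.encode("utf-8")))  # the space survives
print(strip_unicode_blanks(spaced))                # the space is removed
</pre></div>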
<p>The IBM GNU/Linux Technology Center Internationalization Team has
proposed
<a href="http://oss.software.ibm.com/developer/opensource/linux/patches/i18n/diffutils-2.7.2-i18n-0.1.patch.gz">patches
to support internationalized <code>diff</code></a>.
Unfortunately, these patches are incomplete and apply to an older
version of <code>diff</code>, so more work needs to be done in this area.
</p>
<hr>
<a name="Changing-Structure"></a>
<div class="header">
<p>
Next: <a href="#Special-Files" accesskey="n" rel="next">Special Files</a>, Previous: <a href="#Internationalization" accesskey="p" rel="prev">Internationalization</a>, Up: <a href="#Shortcomings" accesskey="u" rel="up">Shortcomings</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Handling-Changes-to-the-Directory-Structure"></a>
<h4 class="subsection">18.1.2 Handling Changes to the Directory Structure</h4>
<a name="index-directory-structure-changes"></a>
<p><code>diff</code> and <code>patch</code> do not handle some changes to directory
structure. For example, suppose one directory tree contains a directory
named ‘<samp>D</samp>’ with some subsidiary files, and another contains a file
with the same name ‘<samp>D</samp>’. ‘<samp>diff -r</samp>’ does not output enough
information for <code>patch</code> to transform the directory subtree into
the file.
</p>
<p>There should be a way to specify that a file has been removed without
having to include its entire contents in the patch file. There should
also be a way to tell <code>patch</code> that a file was renamed, even if
there is no way for <code>diff</code> to generate such information.
There should be a way to tell <code>patch</code> that a file’s time stamp
has changed, even if its contents have not changed.
</p>
<p>These problems can be fixed by extending the <code>diff</code> output format
to represent changes in directory structure, and extending <code>patch</code>
to understand these extensions.
</p>
<hr>
<a name="Special-Files"></a>
<div class="header">
<p>
Next: <a href="#Unusual-File-Names" accesskey="n" rel="next">Unusual File Names</a>, Previous: <a href="#Changing-Structure" accesskey="p" rel="prev">Changing Structure</a>, Up: <a href="#Shortcomings" accesskey="u" rel="up">Shortcomings</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Files-that-are-Neither-Directories-Nor-Regular-Files"></a>
<h4 class="subsection">18.1.3 Files that are Neither Directories Nor Regular Files</h4>
<a name="index-special-files"></a>
<p>Some files are neither directories nor regular files: they are unusual
files like symbolic links, device special files, named pipes, and
sockets. Currently, <code>diff</code> treats symbolic links as if they
were the pointed-to files, except that a recursive <code>diff</code>
reports an error if it detects infinite loops of symbolic links (e.g.,
symbolic links to <samp>..</samp>). <code>diff</code> treats other special
files like regular files if they are specified at the top level, but
simply reports their presence when comparing directories. This means
that <code>patch</code> cannot represent changes to such files. For
example, if you change which file a symbolic link points to,
<code>diff</code> outputs the difference between the two files, instead
of the change to the symbolic link.
</p>
<p><code>diff</code> should optionally report changes to special files specially,
and <code>patch</code> should be extended to understand these extensions.
</p>
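<p>As a rough illustration of what reporting special files specially might involve, the Python sketch below inspects files with <code>os.lstat</code> and <code>os.readlink</code>, so symbolic links are examined rather than followed. The function names and the message wording are hypothetical; they are not an output format that <code>diff</code> or <code>patch</code> understands.
</p>
<div class="example">
<pre class="example">import os
import stat

# Hypothetical helpers for illustration only; not part of diffutils.


def classify(path):
    """Return (kind, detail) for path without following symbolic links."""
    st = os.lstat(path)  # lstat describes the link itself, not its target
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return ("symbolic link", os.readlink(path))
    if stat.S_ISDIR(mode):
        return ("directory", None)
    if stat.S_ISREG(mode):
        return ("regular file", None)
    if stat.S_ISFIFO(mode):
        return ("named pipe", None)
    if stat.S_ISSOCK(mode):
        return ("socket", None)
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return ("device special file", st.st_rdev)
    return ("unknown", None)


def report_special_change(old, new):
    """Describe a change involving a special file in one line, or return
    None when nothing special changed (e.g. two regular files, whose
    contents an ordinary diff already handles)."""
    old_kind, old_detail = classify(old)
    new_kind, new_detail = classify(new)
    if old_kind == new_kind == "regular file":
        return None
    if old_kind != new_kind:
        return f"{old} is a {old_kind} but {new} is a {new_kind}"
    if old_detail != new_detail:
        return (f"{old_kind} changed: {old_detail!r} in {old}, "
                f"{new_detail!r} in {new}")
    return None
</pre></div>
<p>For two symbolic links that point to different targets, <code>report_special_change</code> returns a line naming both targets, whereas <code>diff</code> currently compares the contents of the targets themselves.
</p>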
<hr>
<a name="Unusual-File-Names"></a>
<div class="header">
<p>
Next: <a href="#Time-Stamp-Order" accesskey="n" rel="next">Time Stamp Order</a>, Previous: <a href="#Special-Files" accesskey="p" rel="prev">Special Files</a>, Up: <a href="#Shortcomings" accesskey="u" rel="up">Shortcomings</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="File-Names-that-Contain-Unusual-Characters"></a>
<h4 class="subsection">18.1.4 File Names that Contain Unusual Characters</h4>
<a name="index-file-names-with-unusual-characters"></a>
<p>When a file name contains an unusual character like a newline or
white space, ‘<samp>diff -r</samp>’ generates a patch that <code>patch</code> cannot
parse. The problem is with the format of <code>diff</code> output, not just with
<code>patch</code>, because with odd enough file names one can cause
<code>diff</code> to generate a patch that is syntactically correct but
patches the wrong files. The format of <code>diff</code> output should be
extended to handle all possible file names.
</p>
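<p>The Python sketch below shows the problem with a line-oriented header and one possible remedy. The quoting scheme, which wraps the name in double quotes and uses C-style backslash escapes, and the function names are assumptions made for illustration; this section does not specify what an extended format should look like.
</p>
<div class="example">
<pre class="example"># Hypothetical quoting scheme for illustration; not a format that
# diff or patch currently reads or writes.
ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}
UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t", "r": "\r"}


def naive_header(name):
    """Header line with the file name inserted verbatim."""
    return "--- " + name + "\n"


def quote_name(name):
    """Escape special characters C-style and wrap the name in double quotes."""
    return '"' + "".join(ESCAPES.get(c, c) for c in name) + '"'


def unquote_name(quoted):
    """Invert quote_name; raise ValueError on a missing quote or bad escape."""
    if not (len(quoted) >= 2 and quoted.startswith('"') and quoted.endswith('"')):
        raise ValueError("not a quoted file name")
    chars = iter(quoted[1:-1])
    out = []
    for c in chars:
        if c == "\\":
            code = next(chars, "")  # a backslash consumes the next character
            if code not in UNESCAPES:
                raise ValueError("bad escape in quoted file name")
            c = UNESCAPES[code]
        out.append(c)
    return "".join(out)


evil = "notes\n+++ other.c"
print(naive_header(evil).splitlines())           # two lines; the second mimics a "+++" header
print(("--- " + quote_name(evil)).splitlines())  # one line
print(unquote_name(quote_name(evil)) == evil)    # the name round-trips
</pre></div>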
<hr>
<a name="Time-Stamp-Order"></a>
<div class="header">
<p>
Next: <a href="#Ignoring-Changes" accesskey="n" rel="next">Ignoring Changes</a>, Previous: <a href="#Unusual-File-Names" accesskey="p" rel="prev">Unusual File Names</a>, Up: <a href="#Shortcomings" accesskey="u" rel="up">Shortcomings</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Outputting-Diffs-in-Time-Stamp-Order"></a>
<h4 class="subsection">18.1.5 Outputting Diffs in Time Stamp Order</h4>
<p>Applying <code>patch</code> to a multiple-file diff can result in files
whose time stamps are out of order. <acronym>GNU</acronym> <code>patch</code> has
options to restore the time stamps of the updated files
(see <a href="Merging-with-patch.html#Patching-Time-Stamps">Patching Time Stamps</a>), but sometimes it is useful to generate
a patch that works even if the recipient does not have <acronym>GNU</acronym> <code>patch</code>,
or does not use these options. One way to do this would be to
implement a <code>diff</code> option to output diffs in time stamp order.
</p>
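<p>One possible shape for such an option is sketched below in Python. Because <code>patch</code> updates files in the order in which their diffs appear, emitting the diffs oldest first, by the modification time of each new file, means the files are rewritten in the same order as their time stamps in the sender’s tree, so the new time stamps are not out of order (though files written within the same clock tick may share one). The function name is hypothetical, and <code>difflib.unified_diff</code> stands in for the output code of <code>diff</code>.
</p>
<div class="example">
<pre class="example">import difflib
import os


def diffs_in_time_stamp_order(old_dir, new_dir, names):
    """Hypothetical helper: return (ordered_names, patch_text), where the
    per-file unified diffs appear oldest first, ordered by the modification
    time of each file in new_dir."""

    def new_mtime(name):
        return os.stat(os.path.join(new_dir, name)).st_mtime_ns

    ordered = sorted(names, key=new_mtime)  # stable: ties keep input order
    chunks = []
    for name in ordered:
        with open(os.path.join(old_dir, name)) as f:
            old_lines = f.readlines()
        with open(os.path.join(new_dir, name)) as f:
            new_lines = f.readlines()
        chunks.extend(difflib.unified_diff(
            old_lines, new_lines,
            fromfile=os.path.join(old_dir, name),
            tofile=os.path.join(new_dir, name)))
    return ordered, "".join(chunks)
</pre></div>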
<hr>
<a name="Ignoring-Changes"></a>
<div class="header">
<p>
Next: <a href="#Speedups" accesskey="n" rel="next">Speedups</a>, Previous: <a href="#Time-Stamp-Order" accesskey="p" rel="prev">Time Stamp Order</a>, Up: <a href="#Shortcomings" accesskey="u" rel="up">Shortcomings</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Ignoring-Certain-Changes"></a>
<h4 class="subsection">18.1.6 Ignoring Certain Changes</h4>
<p>It would be nice to have a feature for specifying two strings, one in
<var>from-file</var> and one in <var>to-file</var>, which should be considered to
match. Thus, if the two strings are ‘<samp>foo</samp>’ and ‘<samp>bar</samp>’, then if
two lines differ only in that ‘<samp>foo</samp>’ in file 1 corresponds to
‘<samp>bar</samp>’ in file 2, the lines are treated as identical.
</p>
<p>It is not clear how general this feature can or should be, or
what syntax should be used for it.
</p>
<p>A partial substitute is to filter one or both files before comparing,
e.g.:
</p>
<div class="example">
<pre class="example">sed 's/foo/bar/g' file1 | diff - file2
</pre></div>
<p>However, the differences are then shown in terms of the filtered text, not the original lines of the first file.
</p>
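<p>A comparison that ignores the substitution while still reporting the original lines can be sketched in Python with <code>difflib.SequenceMatcher</code>: match the filtered copy of the first file against the second file, then use the resulting line indexes to print the unfiltered lines. The function name and the <samp>-</samp>/<samp>+</samp> output style are assumptions for illustration, and <code>str.replace</code> performs a literal substitution rather than the regular-expression substitution that <code>sed</code> performs.
</p>
<div class="example">
<pre class="example">import difflib


def diff_ignoring_substitution(from_lines, to_lines, old, new):
    """Hypothetical helper: compare from_lines, with every occurrence of old
    replaced by new, against to_lines, but report the original lines."""
    filtered = [line.replace(old, new) for line in from_lines]
    matcher = difflib.SequenceMatcher(None, filtered, to_lines, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Index into the ORIGINAL from_lines, not the filtered copy.
        report.extend("- " + line for line in from_lines[i1:i2])
        report.extend("+ " + line for line in to_lines[j1:j2])
    return report


file1 = ["int foo = 1;", "foo(old);"]
file2 = ["int bar = 1;", "bar(new);"]
for line in diff_ignoring_substitution(file1, file2, "foo", "bar"):
    print(line)
# prints:
# - foo(old);
# + bar(new);
</pre></div>
<p>The first lines are treated as identical, and the changed line is reported as <samp>foo(old);</samp>, whereas the <code>sed</code> pipeline would show it as <samp>bar(old);</samp>.
</p>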
<hr>
<a name="Speedups"></a>
<div class="header">
<p>
Previous: <a href="#Ignoring-Changes" accesskey="p" rel="prev">Ignoring Changes</a>, Up: <a href="#Shortcomings" accesskey="u" rel="up">Shortcomings</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Improving-Performance"></a>
<h4 class="subsection">18.1.7 Improving Performance</h4>
<p>When comparing two large directory structures, one of which was
originally copied from the other with time stamps preserved (e.g.,
with ‘<samp>cp -pR</samp>’), it would greatly improve performance if an option
told <code>diff</code> to assume that two files with the same size and
time stamps have the same content. See <a href="diff-Performance.html#diff-Performance">diff Performance</a>.
</p>
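<p>The heuristic itself is simple. The Python sketch below (hypothetical function names, not an existing <code>diff</code> option) walks one tree and yields only those files whose size or modification time differs from the corresponding file in the other tree, so only those would need to be read and compared.
</p>
<div class="example">
<pre class="example">import os

# Hypothetical helpers for illustration only; not part of diffutils.


def probably_identical(path1, path2):
    """Assume equal content when size and modification time both match.
    This is a heuristic, not a guarantee."""
    s1, s2 = os.stat(path1), os.stat(path2)
    return s1.st_size == s2.st_size and s1.st_mtime_ns == s2.st_mtime_ns


def files_to_compare(dir1, dir2):
    """Yield relative paths of files present in both trees that the
    heuristic cannot skip; only these need a full comparison."""
    for root, _dirs, files in os.walk(dir1):
        for name in files:
            path1 = os.path.join(root, name)
            rel = os.path.relpath(path1, dir1)
            path2 = os.path.join(dir2, rel)
            if not os.path.isfile(path2):
                continue  # only in dir1; a real tool would report it separately
            if not probably_identical(path1, path2):
                yield rel
</pre></div>
<p>The shortcut is safe in only one direction: if time stamps differ merely because the two file systems store them at different resolutions, the files are simply compared as usual, but two files that differ while sharing size and time stamp would be wrongly treated as identical. That is why such behavior would need to be requested explicitly with an option.
</p>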
<hr>
<a name="Bugs"></a>
<div class="header">
<p>
Previous: <a href="#Shortcomings" accesskey="p" rel="prev">Shortcomings</a>, Up: <a href="#Projects" accesskey="u" rel="up">Projects</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Reporting-Bugs"></a>
<h3 class="section">18.2 Reporting Bugs</h3>
<a name="index-bug-reports"></a>
<a name="index-reporting-bugs"></a>
<p>If you think you have found a bug in <acronym>GNU</acronym> <code>cmp</code>,
<code>diff</code>, <code>diff3</code>, or <code>sdiff</code>, please report it
by electronic mail to the
<a href="http://mail.gnu.org/mailman/listinfo/bug-diffutils">GNU utilities
bug report mailing list</a> <a href="mailto:bug-diffutils@gnu.org">bug-diffutils@gnu.org</a>. Please send
bug reports for <acronym>GNU</acronym> <code>patch</code> to
<a href="mailto:bug-patch@gnu.org">bug-patch@gnu.org</a>. Send as precise a description of the
problem as you can, including the output of the <samp>--version</samp>
option and sample input files that produce the bug, if applicable. If
you have a nontrivial fix or a patch for the bug, please send it as
well. It may simplify the maintainer’s
job if the patch is relative to a recent test release, which you can
find in the directory <a href="ftp://alpha.gnu.org/gnu/diffutils/">ftp://alpha.gnu.org/gnu/diffutils/</a>.
</p>
</body>
</html>