/usr/share/doc/enca/TODO is in enca 1.18-1.
This file is owned by root:root, with mode 0o644.
#============================================================================
# Enca v1.18 (2016-01-07)  guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti) <yeti@physics.muni.cz>
# Copyright (C) 2009-2016 Michal Cihar <michal@cihar.com>
#============================================================================
TO THE NEXT RELEASE:
(this list must be empty at the time of release)
IN FUTURE:
(should be done, but maybe not right now)
* LCUC check for Cyrillic charsets.
* Backups -- like cp, mv, etc.  This will be hard to get right with all the
  silly converters.
* More tests
* Structured documentation (the manual page is ugly)
  - keep a reasonably brief manual page
  - put all the boring doc stuff somewhere else, there are possibilities:
    info: searchable, has links, partly portable, has console viewers
    HTML: poorly searchable, has links, most portable, has console viewers
    TeX (ps): not searchable, no links, portable, most pleasant to read,
          no console viewers
    => use SGML (or info itself?) and generate the others
MAYBE SOMEDAY:
(when I am in the mood for it; items are freely moved here and removed again)
* Detect all-caps texts correctly.
  After several experiments it seems we have to
  - use pair occurrences, at least, with specifically computed
    difference-maximising weights
  - guess in two steps
  - first with uncapitalization and pair weights, and check whether the
    sample looks like natural text (garbageness test, but better)
  - if the first approach fails, do it as we do it now
* design better levels of verbosity/warnings (or: remove the --verbose option,
  keep important messages and remove all others?)
  0: only messages followed by exit(EXIT_FAILURE) (or abort()) are printed
     plus `cannot convert...'
  1: all nonfatal errors/warnings
  2: what converters are tried, what language gets detected (do not duplicate
     --details)
  >2: debug
* _real_ paranoiac behaviour assuring that nothing gets lost and that
  conversion output is either correctly converted text or untouched original
  (requires major redesign of all the conversion stuff)
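The verbosity levels proposed above could be sketched roughly as follows.
This is only an illustration in ISO C, not code from enca: the names
MsgClass, MSG_*, msg_min_level(), msg_enabled() and msg_printf() are all
hypothetical, and the mapping of message classes to levels simply follows
the 0 / 1 / 2 / >2 list in this file.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical message classes; none of these names exist in enca. */
typedef enum {
    MSG_FATAL,           /* followed by exit(EXIT_FAILURE) or abort() */
    MSG_CANNOT_CONVERT,  /* the `cannot convert...' message */
    MSG_WARNING,         /* nonfatal errors and warnings */
    MSG_PROGRESS,        /* converters tried, language detected */
    MSG_DEBUG            /* everything else */
} MsgClass;

/* Lowest verbosity level at which a message class is printed. */
int
msg_min_level(MsgClass cls)
{
    switch (cls) {
    case MSG_FATAL:
    case MSG_CANNOT_CONVERT:
        return 0;
    case MSG_WARNING:
        return 1;
    case MSG_PROGRESS:
        return 2;
    default:
        return 3;   /* debug: any level above 2 */
    }
}

/* Nonzero if a message of class cls is printed at the given verbosity. */
int
msg_enabled(int verbosity, MsgClass cls)
{
    return verbosity >= msg_min_level(cls);
}

/* Print to stderr if enabled; returns 1 if printed, 0 if suppressed. */
int
msg_printf(int verbosity, MsgClass cls, const char *fmt, ...)
{
    va_list ap;

    if (!msg_enabled(verbosity, cls))
        return 0;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}
```

A caller would pass the current verbosity and the class of each message,
e.g. msg_printf(verbosity, MSG_WARNING, "%s: ...\n", fname); keeping the
level table in one function makes it easy to move a message class to a
different level later.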
NEVER:
(you can do anything GNU GPL v2 allows, but I'll refrain)
* features that nobody needs (mm, well, ... ok, let it be)
* duplicate other tools' functionality more than necessary, use them instead
* dependency on anything that is not ISO C and/or POSIX (moreover do not use
  braindead features of both); important functionality must be present
  everywhere nevertheless, enca can be smaller, faster or cleverer on some
  (GNU) systems
* localization; please correct my English instead ;->
* converter calling generalization (would require including the whole wordexp
  thing in enca, and: launching an external converter is a Bad Thing(TM) anyway)
* data in run-time files (needs parser (could live with) and disallows hooks
  (can't live without))
* loadable module support (it's not very portable)
-------------
KNOWN ISO C CONFLICTS:
(perhaps to be solved someday)
All constants and typedefs.  They start with ENCA_ and Enca, but:
  Names beginning with a capital `E' followed by a digit or uppercase
  letter may be used for additional error code names.               [errno.h]
And additionally inside libenca (i.e. not so serious):
* libenca.h: #define EPSILON                                        [errno.h]
* filters.c: isvbox[]                                               [ctype.h]
* guess.c: #define isbinary                                         [ctype.h]
* guess.c: #define istext                                           [ctype.h]
* multibyte.c: is_valid_utf7()                                      [ctype.h]
* multibyte.c: is_valid_utf8()                                      [ctype.h]
Some probably can't conflict.
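To see which of the names above fall under the reserved patterns, a small
checker can be sketched as below.  It assumes only the two rules referred to
in this list: [errno.h] reserves names starting with `E' followed by a digit
or uppercase letter, and [ctype.h] reserves names starting with `is' or `to'
followed by a lowercase letter.  NameStatus and check_name() are hypothetical
helpers, not part of enca or libenca.

```c
#include <ctype.h>

/* Hypothetical result type; not part of libenca. */
typedef enum {
    NAME_OK,
    NAME_CLASH_ERRNO,   /* E followed by a digit or uppercase letter */
    NAME_CLASH_CTYPE    /* is/to followed by a lowercase letter */
} NameStatus;

/* Classify an identifier against the two reserved-name patterns. */
NameStatus
check_name(const char *name)
{
    /* Read at most three characters, never past the terminating NUL. */
    unsigned char c0 = (unsigned char)name[0];
    unsigned char c1 = c0 ? (unsigned char)name[1] : 0;
    unsigned char c2 = c1 ? (unsigned char)name[2] : 0;

    if (c0 == 'E' && (isdigit(c1) || isupper(c1)))
        return NAME_CLASH_ERRNO;
    if (((c0 == 'i' && c1 == 's') || (c0 == 't' && c1 == 'o'))
        && islower(c2))
        return NAME_CLASH_CTYPE;
    return NAME_OK;
}
```

With these rules EPSILON and anything prefixed ENCA_ match the errno.h
pattern, isvbox, isbinary and istext match the ctype.h pattern, while
is_valid_utf7 and is_valid_utf8 do not match, because the character after
`is' is an underscore rather than a lowercase letter.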