An important challenge in software reengineering is to encapsulate
collections of related data that, due to the absence of appropriate
constructs for encapsulation in legacy programming languages, may be
distributed throughout the code. The encapsulation of such collections
is a necessary step for reengineering a legacy system into an
object-oriented design or implementation. Encapsulating a set of related
symbolic constants into an enumeration type is an instance of this
problem. We present a classification of how enumeration types are
modeled using symbolic constants in real-world programs, a set of
heuristics to identify candidate enumeration types, and an
experimental evaluation of the heuristics.
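
As a minimal illustration of the problem, and not an example taken from the paper, the C sketch below contrasts a group of related symbolic constants with the enumeration type that could encapsulate them. All identifiers (`LEGACY_SHAPE_*`, `enum shape`, `shape_name`) are hypothetical and chosen only for this illustration.

```c
#include <string.h>

/* Legacy style: related values modeled as independent symbolic constants.
   Nothing in the language ties these three macros together. */
#define LEGACY_SHAPE_CIRCLE   0
#define LEGACY_SHAPE_SQUARE   1
#define LEGACY_SHAPE_TRIANGLE 2

/* Reengineered style: the same values encapsulated in one enumeration type
   (hypothetical names, kept numerically equal to the legacy constants). */
enum shape {
    SHAPE_CIRCLE   = LEGACY_SHAPE_CIRCLE,
    SHAPE_SQUARE   = LEGACY_SHAPE_SQUARE,
    SHAPE_TRIANGLE = LEGACY_SHAPE_TRIANGLE
};

/* A function whose parameter type now documents which values it accepts. */
const char *shape_name(enum shape s)
{
    switch (s) {
    case SHAPE_CIRCLE:   return "circle";
    case SHAPE_SQUARE:   return "square";
    case SHAPE_TRIANGLE: return "triangle";
    }
    return "unknown";
}
```

In the legacy form, the relationship among the three constants is visible only through naming conventions and usage; the enumeration makes it explicit in the type.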