DoD Q&A on Use of Open Source Software
5 August 2002 (Version 1.6)
Disclaimer
This Q&A attempts to clarify the terms and conditions generally used to describe software whose source code is available for public use. There are two categories of such software: open source and public domain. DoD believes the following definitions and explanations are based upon current, commonly held understandings of these terms and conditions within the software community. However, these meanings are not universally accepted, and many of them are still evolving. Also, this Q&A is not intended to be technically precise; rather, it is intended to facilitate intelligent discussion of the various issues involving open source software.
Q1: What is "open source software"?
A1: Before defining this phrase, we need to first discuss source code and binary code. Source code is ordinary text that looks like a precisely structured and mathematical version of English. Programmers use source code to create, repair, and extend software, since it is easier to read and understand than lower-level computer languages. Binary code is source code that has been translated into a machine-readable form consisting of only two digits, 0 and 1. The software that people acquire (e.g., desktop personal computer software) is almost always binary code. If there is a problem with the software, it is difficult for even knowledgeable users to change or repair binary code. Instead, users must usually ask the owner of the software, who has access to the source code, to fix or extend it for them. In a nutshell, open source software deals with whether and how source code should be made available to users of binary software. This can be a delicate issue, since source code usually represents a substantial intellectual and financial investment by the software owner.
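To make the source/binary distinction concrete, the following Python sketch compiles a two-line program and prints the first few translated bytes as strings of 0s and 1s. It is only an analogy: Python translates source into bytecode for its own virtual machine rather than into native machine code, and the exact bytes differ between Python versions.

```python
# Source code: ordinary, human-readable text.
source = "total = 2 + 3\nprint(total)\n"

# Translate the text into Python bytecode (instructions for Python's virtual
# machine, used here as a stand-in for native machine code).
code = compile(source, "<example>", "exec")

# Show the first few translated bytes as 8-digit strings of 0s and 1s.
binary_view = " ".join(f"{byte:08b}" for byte in code.co_code[:6])
print(binary_view)

# The translated form still runs, but it is no longer meant for people to read or edit.
exec(code)  # prints 5
```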
There are two accepted definitions of open source software. The definition most used by the general public is software whose source code is publicly available and which others may modify and redistribute. This definition encompasses all software where the source code is accessible. In contrast, many members of the software community use this phrase more restrictively. Under this second definition, "open source software" refers to an approach that aggressively seeks to make source code available to future as well as current users of software. This is accomplished by software licenses that require users who redistribute open source software to provide all the source code, even if the user modified the original software. Also, open source licenses go well beyond simply ensuring distribution of both the original and modified source code. Specifically, open source licenses give all software users the rights to:
The phrase "open source" was first coined by a group of Internet and Unix community leaders to describe what had previously been called "free software." These leaders felt that the earlier term "free" impeded acceptance of open source software by creating inaccurate impressions that such software could not be sold, or that it was of inferior quality. In this document, either of the above two definitions for "open source software" may be used. The context will determine whether the first or second definition is appropriate.
Open source software is further complicated by the fact that there are many different licenses that meet the strict definition of the phrase. Each of these licenses has its own set of terms and conditions, and understanding how they differ can be a daunting task. The single most important open source license is the GNU General Public License (GPL), which is used in more open source software than any other open source license. Other important open source licenses include the GNU Lesser General Public License (LGPL), the X License, the Berkeley Software Distribution (BSD) License, the Apache License, the Artistic License, the Netscape Public License, and the Mozilla Public License.
Q2: What is proprietary or closed software?
A2: Proprietary or closed software, in contrast with open source, is licensed to the user by the owner and grants very few rights beyond mere use. Occasionally, other rights are granted, but further distribution is almost always forbidden. Proprietary software source code is usually not publicly available. Rather than being developed by an informal community, proprietary software is generally privately developed and intended to produce income for the company or individuals that own it. Like open source software, proprietary software comes in a variety of license formats. When proprietary software is protected by a patent, the patented technique is publicly disclosed in the patent (although the source code itself need not be), but the software owner may exclude others from using it without a license. Proprietary software is sometimes referred to as closed software to contrast it with open source software.
Q3: What is free software?
A3: Free software is a specific type of open source software. It is the "ancestral" form of open source software, since free software licenses were the very first open source licenses. It should be noted that the word "free" in "free software" does not refer to the cost of the software (it can be sold at market prices), but rather to a set of four freedoms granted to all users of free software. According to the Free Software Foundation (FSF), which defines and controls the use of the phrase "free software" (see http://www.gnu.org/philosophy/free-sw.html), free software users must be granted the freedom to:
(1) Run the program for any purpose;
(2) Study how the program works and adapt it to their needs (access to the source code is a precondition for this);
(3) Redistribute copies; and
(4) Improve the program and release their improvements to the public.
In the case of the fourth freedom, users of free software always have the right to refrain from releasing changes or improvements to the public. However, if they release the modified software to the public, they must also make the corresponding source code public. These freedoms are captured in the licensing rules of the GNU General Public License (GPL), which is described in Q4. While open source and free software are very closely linked in terms of underlying principles, the FSF differentiates free software from open source software (http://www.gnu.org/philosophy/free-software-for-freedom.html). The FSF does not agree with being "lumped in" under the more recent and somewhat broader open source approach to ensuring access to source code.
Q4: What is the GNU General Public License (GPL), and how does it relate to open source?
A4: The GNU GPL is a broadly adopted licensing approach encouraging the use of open source software and the expansion of its capability via the user community. In addition to granting open source rights, the GPL imposes an obligation upon software developers. If a software developer fixes or extends source code provided under a GPL license (GPL-ed code) and distributes the modified software program, the developer must also include the source code for all fixes or extensions. The open source community refers to this obligation as a "copyleft" requirement. According to GNU, proprietary software developers use copyright to restrict users' rights to use software, whereas GNU uses copyright to guarantee users the freedom to redistribute and change GNU software. That is why GNU reverses the term, changing "copyright" into "copyleft."
Q5: What is public domain software and how does it differ from open source?
A5: Public domain software is software whose source code is available to the public, and which lacks any license or copyright restrictions on how it can be used. Public domain software is also free of cost, and anyone may copy and distribute it. Public domain software is generally considered by the public to be open source software because the availability of its source code allows it to be modified, repaired, or extended by any user.
The most important difference between public domain software and the more restrictive open source software is that since public domain software lacks license clauses that guarantee the rights of later users to access the source code, public domain software can be "captured" by proprietary software. In this sense, capture means incorporated into and hidden from the public. While capture does not affect the public domain status of the original source code, it does have two important practical consequences. The first consequence is that public domain software is more likely to be used in proprietary software than the more restrictive open source software. The second consequence is that the original public domain software is likely to fall into disuse over time, since people tend to migrate to the improved proprietary software. Both these consequences are much less likely for open source software than for public domain software, since open source licenses generally try to prevent capture of the original source code.
There is an important historical link between public domain software and the founders of the Free Software Foundation. These individuals were frustrated that public domain software often became inactive over time or was eclipsed by proprietary software derived from it. In response, they adopted the strategy of releasing software under copyleft licenses such as the GPL, which require that redistributed versions, including modified ones, remain available in source form. This strategy achieved its goal of keeping publicly held software active and growing over time, since it led directly to the large body of open source software that is now available over the Internet. However, a side effect of this strategy was that it made it much more difficult for companies to profit by capturing and building on open source code in the same way they had for public domain software. Currently, whether blocking commercial capture of open source software is good or bad is a hotly debated issue within the software industry.
Q6: What are "shareware" and "freeware," and how do they differ from open source?
A6: Shareware is software that can be redistributed without charge (shared) by users. However, the owner of the software usually expects to be compensated by regular users, and imposes various restrictions if a fee is not paid. These restrictions may be contractual (e.g., requesting payment if the shareware is used for more than 30 days), or they may be built into the shareware itself (e.g., it may stop functioning after 30 days). Freeware is very similar to shareware, but places no restrictions on how or how long it can be used. In both cases, the software is normally distributed only as binary code, rather than as source code. This is in sharp contrast to public domain and open source software, both of which allow source code to be available to users. Since shareware and freeware are normally distributed only in binary form, they usually display proprietary licenses whenever they are executed. Shareware and freeware products occasionally make their source code available to users, but when they do they always maintain ownership of the source code.
Q7: What are the differences between the license terms and conditions of the various types of open source licenses?
A7: As of mid-2002, there were nearly three dozen software licenses that qualified as "open source" (see http://www.opensource.org/licenses/bsd-license.php). In practice, however, only a small number of these licenses are widely used, and the less frequently used licenses are often similar to the more commonly used ones. Table 1 summarizes a number of differences among four of the most important open source licenses (GPL, LGPL, BSD, and Apache). Additionally, the table includes the related concept of public domain software and an example of a proprietary software license that is notable for precluding the use of open source software.
Table 1. A Comparison of Open Source and Related Licenses
| Property | GPL | LGPL | BSD | Apache | Public Domain | Microsoft MIT EULA (4) |
|---|---|---|---|---|---|---|
| (a) Can be stored on disk with other license types | X | X | X | X | X | (Forbids OSS) |
| (b) Can be executed in parallel with other license types | X | X | X | X | X | (Forbids OSS) |
| (c) Can be executed on top of other license types | X | X | X | X | X | (Forbids OSS) |
| (d) Can be executed underneath other license types | X (1) | X | X | X | X | (Forbids OSS) |
| (e) Source can be integrated with other license types | | X | X | X | X | (Forbids OSS) |
| (f) User decides if and when to publish derived code | X (2) | X | X | X | X | X |
| (g) Software can be sold for a profit | X | X | X | X | X | X |
| (h) Binary code can be replicated by users as desired | X | X | X | X | X | |
| (i) Binary code can be redistributed as desired | X (3) | X | X | X | X | |
| (j) Binary code can be used as desired by users | X | X | X | X | X | |
| (k) New recipients always receive source code | X | | | | | |
| (l) New recipients receive full source modification rights | X | | | | | |
| (m) New recipients receive full redistribution rights | X | | | | | |
| (n) Binary code can be released without source code | | | X | X | X | X |
| (o) Derived code can have a different type of license | | | X | X | X | |
| (p) Original source can be incorporated into closed source products | | | X | X | X | |

Notes:
(1) Provided that both programs are fully and independently usable in other unrelated contexts.
(2) Provided that the binary code has not been previously released to the public.
(3) Provided that source code is always redistributed along with the binary code.
(4) Specifically bans use of the GPL, LGPL, Artistic, Perl, Mozilla, Netscape, Sun Community, and Sun Industry Standards licenses.

License Acronyms:
GPL – GNU General Public License
LGPL – GNU Lesser General Public License
BSD – Berkeley Software Distribution
MPL – Mozilla Public License
(Microsoft) MIT – Mobile Internet Toolkit
EULA – End-User License Agreement
OSS – Open Source Software (including Free)
Properties (a) through (e) in the table examine the ability of a license to coexist with other types of software, e.g., the ability of open source licenses to coexist with proprietary software. In this category, the most exclusive license is the Microsoft MIT EULA (http://msdn.microsoft.com/downloads/eula_mit.htm), which prohibits placing open source software on the same platform as the EULA software. No other open source or proprietary license comes even remotely close to this level of exclusivity. The GPL takes a distant second place for exclusivity, since it forbids design-time incorporation of GPL source code into non-GPL source code. However, unlike the Microsoft MIT EULA, the GPL places no constraints on other software running on the same system. The GPL also allows non-GPL software to use GPL software as long as the two programs are not inextricably linked to each other (that is, they can both be used independently in other contexts). The GNU Lesser GPL (LGPL) is even more accommodating, since it allows LGPL software to be directly incorporated into non-free software. The BSD and Apache licenses are more accommodating still, since they also allow distribution in binary form only. Finally, and not surprisingly, the most permissive category of all is public domain software, which allows essentially any type of use.
Properties (k) through (m) in the table point out the flip side of the somewhat restrictive nature of the GPL. Of the licenses compared, the GPL is the only one that ensures that later-generation users of the software will retain exactly the same rights to use, change, and redistribute GPL software as the first user.
Q8: How does DoD use open source software?
A8: A two-week email-based survey conducted by The MITRE Corporation in March 2002 identified 110 open source software applications used in DoD, and 249 specific examples of how they are being used. There were four main categories of uses:
(1) Infrastructure support,
(2) Software development,
(3) Security, and
(4) Research.
Analysis of the results indicates that for certain Internet-related and development-related open source software applications, the number of uses might be several orders of magnitude higher than identified in the two-week survey.
The largest category of open source use in the DoD is infrastructure support. Sixty-five applications were used to support the day-to-day operations of DoD facilities. Examples include the Linux and OpenBSD operating systems, the Apache web-support application, and the Sendmail email-support application. Linux is used in a small number of field-deployed systems, and nearly two dozen examples of using Apache to provide web services were found. Due to a strong historical link between open source and emergence of the early Internet as a collaborative sharing tool, a more exhaustive study of DoD use of Internet-related tools such as Sendmail and Apache would likely identify orders of magnitude more examples than were found in this short study. Finally, since most infrastructure uses of open source software do not involve the modification of source code, the fact that infrastructure is the largest category means that most DoD uses of open source are not impacted by the terms of the GPL license.
Development applications are the second largest category of open source software applications identified by the survey, with a total of 53 tools identified (some of which are also used for infrastructure applications). DoD developers and subcontractors use these applications to create software in such computer languages as Ada (GNAT); C, C++, and Java (GCC); and Perl. These applications are also used to write and edit software (EMACS), keep software source code organized (CVS), look for errors (GDB), provide libraries of software components (MySQL, C++ Boost), and provide overall computer support (Linux, Cygwin). Because development tools have had an even longer relationship to open source than Internet applications, the small number of examples identified in the study very likely represents only a tiny fraction of the total number of examples that would be found in a more extensive survey.
A total of 40 different tools (again with some overlap with other categories) are used in security-related roles supporting DoD and homeland security. OpenBSD and Linux are used in applications such as firewalls and network auditing. Tools such as SARA, Snort, SNARE, and ACID protect networks by preemptively finding security vulnerabilities and by watching for attempts to break into networks.
Finally, 20 of the tools identified in the survey are used to support some aspect of research or advanced development. As these tend to be specialized uses, the scaling factor for a larger survey would not be nearly as large as for infrastructure and development. Applications such as Linux (with the "Beowulf" clustering additions) and Condor are used to turn suites of old, inexpensive computers into effective yet very-low-cost supercomputers. This makes it possible to dedicate more financial resources to the actual research. Other research infrastructure applications (e.g., TAO, CVW, JADE, and Jikes) support the use of advanced software architectures. Libraries of mathematical routines (e.g., Colt, Maxima, and Octave) make analysis easier and also provide a convenient way to share mature, high-quality algorithms. Specialized analysis and simulation routines (e.g., EADSIM, SCA, VOCAL, Weka, and Xpatch) provide tools specific to particular DoD research domains. In addition, visualization libraries and tools (e.g., VisAD, VTK, and XGobi) help researchers identify patterns in data.
Q9: What are the concerns about DoD’s use of open source software?
A9: The three most widespread concerns about DoD use of open source software are that:
(1) Revealing source code could expose vulnerabilities,
(2) Hostile groups or individuals might introduce Trojan code into open source applications, and
(3) Use of GNU GPL software could "capture" proprietary software.
All three of these issues have pros and cons, which are discussed below.
Issue (1) – Exposure of vulnerabilities
In general, exposing source code increases the risk of successful attack against software by making it easier for hostile human analysts to read and understand the software. While this is by no means the only way in which software can be attacked, it is an important one because of the unique ability of human analysts to develop unexpected insights on how to subvert an otherwise reliable piece of software. Thus if the risk of making source code available to hostile analysts were the only factor in using open source applications in high-risk systems, the decision would invariably be to use closed software.
However, in the long term the best defense against security flaws is not to try to hide them, but rather to seek them out aggressively by using friendly analysts with resources at least as good as those of potential hostile analysts. The more aggressive and better-supported this "friendly attack" strategy is, the more likely the software will be in the long term to resist a concerted hostile attack.
Consequently, the issue in comparing open source software to closed software is not only which one is easier for a hostile analyst to read, but also which form of software can better support a thorough and aggressive "friendly attack" strategy.
Since the source for closed software is seen on a regular basis by only a limited set of people, open source software on average tends to provide a superior framework for thorough friendly-attack analyses of the software. This is because active open source systems typically have larger groups of people who are both familiar with the source code and interested in exercising its capabilities than similar closed source systems do. If properly motivated, this extended pool of resources is likely to provide a broader and more insightful analysis that can more than compensate for the initial penalty of making the source code more accessible.
However, there is no single answer to this question, since in the end it will depend on factors such as the quality of the software and how the software is supported. Low-quality software, for example, should not in general be made open, since it would expose too many flaws. High-quality software is much more likely to benefit from the kinds of friendly attack analyses that can be staged in an open source process. Similarly, a software application that is of little interest to any broader community is a poor candidate for an open source approach, since it is unlikely to garner enough attention to provide the necessary levels of friendly analysis. Finally, a potential for friendly analysis of either open or closed source software should never be confused with the actual analysis. The analysis process generally requires some degree of structure and incentives, and is unlikely to just "happen" by itself.
OpenBSD (http://www.openbsd.org/security.html#goals) is an interesting example of an open source project that has chosen to focus almost entirely on removing security vulnerabilities through unusually thorough "friendly attack" analysis of its own code base. The result is a code base that appears more likely to frustrate hostile groups than to inspire them (http://www.openbsd.org/security.html#process). As a result, and despite the fact that its source code is readily available, the OpenBSD operating system is widely recognized not as a security risk, but rather as one of the better choices available for high-security operating systems (http://geodsoft.com/howto/harden/hardintro.htm).
Issue (2) – Introduction of Trojan software
Trojan software is hostile software that has been covertly placed into ordinary "friendly" software applications. The second issue that is often a concern in DoD use of open source software is that the open process by which it is developed and supported could be used to insert Trojan programs into open source applications.
As with exposing code vulnerabilities, the issue of the possible introduction of Trojan code requires an examination of relative risk issues. Closed-source systems suffer from the danger of Trojan software being introduced in the form of binary executables. While presumably harder to insert into commercial software processes, a closed source binary Trojan can in principle be large, complex, and very difficult to detect. The surprisingly common practice of placing "Easter Eggs" (innocuous "surprise" software) into commercial software demonstrates how serious a problem this can be even in mature commercial software processes.
For an open source project, Trojan code could presumably be introduced either as source code contributed to the development process, or through the capture and corruption of source code and/or executable download files at an open source web site. In the first case of using contributed source code to introduce a Trojan, the chances of successful insertion of a large Trojan are close to nil, since the necessary large additions of source code would almost certainly be recognized quickly as being suspect. In the second and more likely scenario of corruption of download sites, authoritative sources are available for most or all of the open source software that DoD currently uses. These authoritative sites often include integrity checking measures that significantly reduce the risk of downloading Trojan software. As a whole, the risks of Trojan software in open source appear to be no greater than for proprietary software, and may even be significantly less due to the reduced opportunity for introducing large binary Trojans into an open source process.
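The "integrity checking measures" mentioned above usually amount to publishing a cryptographic checksum (or a signed checksum) next to each download. The following is a minimal Python sketch of the receiving side, assuming the site publishes a SHA-256 hex digest; the choice of SHA-256 and the function names are illustrative assumptions, not any particular site's procedure.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, published_hex: str) -> bool:
    """True if the file matches the checksum published by the authoritative site."""
    actual = sha256_of_file(path)
    expected = published_hex.strip().lower()  # tolerate whitespace and upper-case hex
    # compare_digest compares in constant time instead of stopping at the first mismatch.
    return hmac.compare_digest(actual, expected)
```

A checksum only helps if it is obtained through a channel the attacker has not also corrupted; that is why projects often sign their checksums (for example, with PGP) rather than simply posting them beside the file.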
Issue (3) – Capture of software by GPL Licenses
One of the more notorious features of the GNU General Public License, or GPL, is its insistence that when GPL source code is directly incorporated into new software (e.g., as part of the source code of a new module), then the entire new module must be given a GPL license. The true level of risk from this feature of the GPL may be overstated, since the conditions under which it applies are limited to certain steps in the design (versus execution) of new software. For example, the GPL "capture" issue is irrelevant both for software that simply resides on the same system, and for software that uses services provided by executable forms of GPL software (e.g., a Linux kernel).
For DoD software, the issue of GPL "capture" of proprietary software is significant only for the two categories of DoD open source use that deal extensively with source code: software development and research support. Where capture of software is a concern, several methods exist for separating the proprietary software so that GPL provisions do not propagate into it. The most important such GPL propagation boundary is the pipe/socket boundary, by which users can combine GPL and non-GPL modules via Unix-style data-flow connections called pipes and sockets. Another is the aggregation boundary, by which GPL and non-GPL software may simply exist on the same system without any direct links and have no effect on each other. Finally, developers may choose to use the Lesser GPL (LGPL) to create software that can be freely used by others. The advantage of the LGPL in this case is that it allows others to use such software without any risk of invoking GPL constraints. Thus, for example, a DoD group that would like to make security-related software freely available to for-profit commercial companies could provide that software under an LGPL license, which would allow those companies to create proprietary software packages that nonetheless freely incorporate the LGPL security modules.
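The pipe boundary is a runtime arrangement: two separately built programs exchange only a stream of bytes, and neither contains the other's code. The Python sketch below shows the mechanism, with a one-line child program standing in for a hypothetical separately licensed filter (the program and function names are illustrative). It illustrates the plumbing only; whether a particular combination keeps the programs legally separate under the GPL depends on how they are designed and distributed, which this sketch does not settle.

```python
import subprocess
import sys

# Hypothetical stand-in for a separately licensed program (for example, a GPL
# text filter). It runs as its own process and sees only the bytes on its stdin.
FILTER_PROGRAM = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def run_filter_over_pipe(text: str) -> str:
    """Send text to a separate process through a pipe and return what it writes back."""
    completed = subprocess.run(
        [sys.executable, "-c", FILTER_PROGRAM],  # separate executable, separate process
        input=text,            # written to the child's stdin pipe
        capture_output=True,   # child's stdout/stderr come back through pipes
        text=True,             # exchange str rather than bytes
        check=True,            # raise CalledProcessError if the child fails
    )
    return completed.stdout


print(run_filter_over_pipe("hello pipe"))  # HELLO PIPE
```

The calling program never links against or includes the filter's code; replacing the filter with any other program that reads stdin and writes stdout would require no change to the caller.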
With reasonable care, and in particular for the most important category of infrastructure (operational) use, the GPL propagation boundaries and LGPL licensing option can be used to incorporate GNU software into a system without disrupting other licenses.
In general, the GPL and other similar licenses become an issue in operational systems only if unusually restrictive licenses such as the Microsoft MIT EULA are introduced into the same system (see Table 1). For example, if a user wants to use internally-developed GPL software, that software cannot be run on the same platform as a Microsoft MIT EULA component because this particular EULA bans simultaneous distribution with open source software. Such licenses would also bar GPL-based software add-on components from being considered for any follow-on procurements. Compared to this issue, the GPL capture issue is relatively minor, since GPL is far more tolerant of other licenses, and because it provides a number of ways to minimize its effect on other software modules during software development.
Q10: What are the concerns with respect to use of GPL-ed software within the DoD?
A10: DoD relies on the commercial software industry to develop new and innovative products to meet its IT needs and to serve the goal of information superiority. DoD facilitates such innovation through direct funding of academic and industry researchers and by transferring technology developed with DoD funding to the private sector for commercialization.
Intellectual property rights play a key role in the private sector and provide the economic incentive for commercial firms to invest in innovative products. This is recognized throughout the DoD acquisition community. Indeed, a recent report prepared for the Undersecretary for Acquisitions, Technology and Logistics notes:
"Innovation requires substantial financial investment and effort over a long period of time and uses scarce resources. To make this investment worthwhile, industry relies on IP rights as the primary means to recoup these nonrecurring costs and seek profit. A developer’s IP rights ensure that the developer has the exclusive right to exploit his or her innovation commercially and financially, with the understanding that the technology must be embodied in products or services that will ensure a return on investment. The end result of protecting IP rights is that technology is advanced and disseminated widely, and innovators are rewarded for their efforts."
(Source: Intellectual Property: Navigating Through Commercial Waters, http://www.acq.osd.mil/ar/doc/intelprop.pdf, p. 19, Oct. 15, 2001.)
Because of the range of revenue models and business practices adopted by the marketplace, many commercial platform and software providers have chosen to use GPL-ed technology notwithstanding copyleft licensing provisions. Some commercial software developers, however, see the broadening use of GPL-ed software as a challenge to their legitimate IP rights. DoD must endeavor to protect the IP rights of commercial developers and not endanger those rights through inadvertent or intentional disclosure of their software code.
Q11: What licenses do DoD open source software applications use?
A11: Table 2 shows the types of open source license found in the 110 DoD tools, listed from most frequently used to least frequently used.
Table 2. Open Source License Use in the DoD
| License | Number of DoD Applications |
|---|---|
| GPL | 58 |
| BSD | 7 |
| Apache | 6 |
| Closed (from open) | 4 |
| Community (U.S. government only) | 4 |
| LGPL | 3 |
| ACE/TAO License | 2 |
| Proprietary with source access | 2 |
| SATAN License | 2 |
| Aladdin Ghostscript Free Public License | 1 |
| Artistic | 1 |
| C++ Boost License | 1 |
| Colt Distribution Licenses | 1 |
| Freeware | 1 |
| Gnuplot License | 1 |
| Government Public Domain | 1 |
| IBM Public License | 1 |
| ISC License | 1 |
| MIT/X11 License | 1 |
| MITRE | 1 |
| Mozilla | 1 |
| OpenMAP | 1 |
| OpenSSL | 1 |
| Publicly available government software | 1 |
| Qmail License | 1 |
| RTLinux Open Patent License Version 2 | 1 |
| Sendmail License | 1 |
| VTK License | 1 |
| WU-FTPD License | 1 |
| Zlib License | 1 |
| Zope License | 1 |
| Total | 110 |
The most notable feature of this table is the overwhelming dominance of the GNU General Public License (GPL), which is used by 58 of the 110 DoD open source applications, or about 53%. The BSD and Apache licenses are a distant second (7) and third (6). Several of the remaining licenses, such as the "RTLinux Open Patent License Version 2," are either fully compatible with the GPL or closely modeled after it.
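The totals and percentages quoted above can be checked directly against Table 2. The following is a short Python tally with the counts transcribed from the table; the variable and function names are illustrative.

```python
# Licenses used by more than one DoD application (counts from Table 2).
MULTI_USE = {
    "GPL": 58,
    "BSD": 7,
    "Apache": 6,
    "Closed (from open)": 4,
    "Community (U.S. government only)": 4,
    "LGPL": 3,
    "ACE/TAO License": 2,
    "Proprietary with source access": 2,
    "SATAN License": 2,
}

# Licenses used by exactly one DoD application each (from Table 2).
SINGLE_USE = [
    "Aladdin Ghostscript Free Public License", "Artistic", "C++ Boost License",
    "Colt Distribution Licenses", "Freeware", "Gnuplot License",
    "Government Public Domain", "IBM Public License", "ISC License",
    "MIT/X11 License", "MITRE", "Mozilla", "OpenMAP", "OpenSSL",
    "Publicly available government software", "Qmail License",
    "RTLinux Open Patent License Version 2", "Sendmail License", "VTK License",
    "WU-FTPD License", "Zlib License", "Zope License",
]

# Combine into a single license-name -> application-count mapping.
COUNTS = {**MULTI_USE, **{name: 1 for name in SINGLE_USE}}


def total_applications() -> int:
    """Total number of surveyed applications across all licenses."""
    return sum(COUNTS.values())


def share_percent(license_name: str) -> float:
    """Percentage of all surveyed applications that use the given license."""
    return 100.0 * COUNTS[license_name] / total_applications()


print(total_applications())            # 110
print(f"{share_percent('GPL'):.1f}%")  # 52.7%
```

The exact GPL share is 58/110, about 52.7%, which rounds to the 53% cited in the text.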