http://www.datamation.com/open-source/
Accessibility
Launched in 2013, this site aims to provide information on making other websites accessible to people with a variety of impairments, particularly those who are blind. You can read the content on the project’s website; if you’d like to contribute, visit the project’s GitHub page. Operating System: OS Independent
Accounting
This web-based accounting package was created with small and medium-sized businesses (SMBs) in mind. It includes CRM, work order and invoice capabilities as well as standard accounting features. Check out the online demo to see it in action. Operating System: OS Independent
Another web-based accounting option for SMBs, FrontAccounting boasts inventory tracking and manufacturing management abilities. It’s been downloaded more than 200,000 times. Operating System: OS Independent
4. GnuCash
GnuCash combines personal finance software with small business accounting software, which some small business owners find helpful. It can track investments, create graphs, import financial data, set up scheduled transactions and perform standard double-entry accounting. Operating System: Windows, Linux, OS X
5. LedgerSMB
LedgerSMB combines ERP and accounting capabilities in a single package, and it also includes a flexible development framework for extending its features. It has been downloaded more than 86,000 times since 2006. Operating System: Windows, Linux, OS X
6. TurboCASH
Used by more than 80,000 businesses, TurboCASH is a flexible accounting package that compares favorably with QuickBooks and Sage. It was created in South Africa but also has a chart of accounts and currency features designed for U.S. businesses. Operating System: Windows
App Collection
7. OpenDisc
The OpenDisc project collects many of the most popular open source applications for Windows into one download. You can also get the project on a CD for a donation of $10. Operating System: Windows
Anti-Spam/Email Filtering
8. ASSP
ASSP claims to be "the absolute best SPAM fighting weapon that the world has ever known!" It offers easy, browser-based setup and works with most mail servers. Operating System: OS Independent.
9. MailScanner
Downloaded more than 1.3 million times, MailScanner is based on SpamAssassin and works with anti-virus software like ClamAV to protect mail servers at companies or ISPs. Support is available through third-party companies. Operating System: OS Independent.
10. Scrollout F1
This full-featured mail security solution incorporates anti-spam, anti-virus and other capabilities with an interface that the project creators say is as easy to use as a car radio. Paid support is available. Operating System: Windows, Linux.
11. SpamAssassin
This Apache project claims to be the "#1 Enterprise Open-Source Spam Filter." It uses a wide variety of methods to identify and block spam, and it works with nearly all mail servers. Operating System: primarily Linux and OS X, although Windows versions are available.
12. SpamBayes
SpamBayes uses statistical algorithms to calculate the probability that an incoming message is spam, and it adapts over time as spammers change their methods. It’s available as a plug-in for many popular email services and clients, including Outlook, Thunderbird and others. Operating System: OS Independent.
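To give a rough idea of how per-word probabilities turn into a single spam score, here is a minimal Python sketch. It uses the simpler naive-Bayes-style combination popularized by Paul Graham, not SpamBayes' own, more refined chi-squared-based method, and the function name spam_score and the token probabilities are hypothetical, purely for illustration.

```python
import math
from typing import Sequence


def spam_score(token_probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities into one message score.

    Simplified naive-Bayes (Graham-style) combination, NOT SpamBayes' own
    chi-squared scheme:  P(spam) = prod(p) / (prod(p) + prod(1 - p)).
    """
    log_odds = 0.0
    for p in token_probs:
        # Clamp so a single token can never force the score to exactly 0 or 1.
        p = min(max(p, 0.01), 0.99)
        # Summing log-odds avoids underflow from multiplying many small numbers.
        log_odds += math.log(p) - math.log(1.0 - p)
    # The formula above equals the logistic function of the summed log-odds;
    # the two branches keep math.exp from overflowing.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical probabilities for three tokens found in one message.
    print(spam_score([0.99, 0.95, 0.2]))  # close to 1: likely spam
    print(spam_score([0.01, 0.1]))        # close to 0: likely legitimate
```

The "adapts over time" part comes from recomputing each token's probability as users mark new messages as spam or not spam; the combining step itself stays the same.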
Anti-Virus/Anti-Malware
13. ClamAV
One of the most popular open source security applications, ClamAV has been incorporated into many different products and has been called "the de facto standard for mail gateway scanning." The core program works on UNIX-based systems, but the website also offers information on Immunet, a ClamAV-based Windows solution that is available in both free and paid versions. Operating System: Linux, but front-ends and additional versions are available for other OSes.
14. ClamTk
This variation on ClamAV adds an easy-to-use GUI to the popular anti-virus engine. Now ten years old, this is a mature project that is included in many Linux distributions. Operating System: Linux.
15. ClamWin
This Windows-based version of ClamAV boasts more than 600,000 users. It offers a scanning scheduler, integration with Windows Explorer and Outlook, automatic downloads of the updated malware database and support for Windows 7 and 8. Operating System: Windows.
Artificial Intelligence
16. Caffe
The brainchild of a UC Berkeley PhD candidate, Caffe is a deep learning framework built around an expressive architecture and extensible code. Its claim to fame is its speed, which makes it popular with both researchers and enterprise users. According to its website, it can process more than 60 million images in a single day using just one NVIDIA K40 GPU. It is managed by the Berkeley Vision and Learning Center (BVLC), and companies like NVIDIA and Amazon have made grants to support its development.
17. CNTK
Short for Computational Network Toolkit, CNTK is one of Microsoft’s open source artificial intelligence tools. It boasts outstanding performance whether it is running on a system with only CPUs, a single GPU, multiple GPUs or multiple machines with multiple GPUs. Microsoft has primarily utilized it for research into speech recognition, but it is also useful for applications like machine translation, image recognition, image captioning, text processing, language understanding and language modeling.
18. Deeplearning4j
Deeplearning4j is an open source deep learning library for the Java Virtual Machine (JVM). It runs in distributed environments and integrates with both Hadoop and Apache Spark. It makes it possible to configure deep neural networks, and it’s compatible with Java, Scala and other JVM languages.
The project is managed by a commercial company called Skymind, which offers paid support, training and an enterprise distribution of Deeplearning4j.
19. Distributed Machine Learning Toolkit
Like CNTK, the Distributed Machine Learning Toolkit (DMTK) is one of Microsoft’s open source artificial intelligence tools. Designed for use in big data applications, it aims to speed up the training of AI systems. It consists of three key components: the DMTK framework, the LightLDA topic model algorithm, and the Distributed (Multisense) Word Embedding algorithm. As proof of DMTK’s speed, Microsoft says that on an eight-cluster machine, it can "train a topic model with 1 million topics and a 10-million-word vocabulary (for a total of 10 trillion parameters), on a document collection with over 100-billion tokens," a feat the company describes as unmatched by other tools.
20. H2O
Focused more on enterprise uses for AI than on research, H2O counts large companies like Capital One, Cisco, Nielsen Catalina, PayPal and Transamerica among its users. It claims to make it possible for anyone to use the power of machine learning and predictive analytics to solve business problems. It can be used for predictive modeling, risk and fraud analysis, insurance analytics, advertising technology, healthcare and customer intelligence.
It comes in two open source versions: standard H2O and Sparkling Water, which is integrated with Apache Spark. Paid enterprise support is also available.
21. NuPIC
Managed by a company called Numenta, NuPIC is an open source artificial intelligence project based on a theory called Hierarchical Temporal Memory, or HTM. Essentially, HTM is an attempt to create a computer system modeled after the human neocortex. The goal is to create machines that "approach or exceed human level performance for many cognitive tasks."
In addition to the open source license, Numenta offers NuPIC under a commercial license, and it also licenses the patents that underlie the technology.
22. OpenCyc
Developed by a company called Cycorp, OpenCyc provides access to the Cyc knowledge base and commonsense reasoning engine. It includes more than 239,000 terms, about 2,093,000 triples, and about 69,000 owl:sameAs links to external semantic data namespaces. It is useful for rich domain modeling, semantic data integration, text understanding, domain-specific expert systems and game AIs. The company also offers two other versions of Cyc: one for researchers that is free but not open source and one for enterprise use that requires a fee.
23. OpenNN
Designed for researchers and developers with advanced understanding of artificial intelligence, OpenNN is a C++ programming library for implementing neural networks. Its key features include deep architectures and fast performance. Extensive documentation is available on the website, including an introductory tutorial that explains the basics of neural networks. Paid support for OpenNN is available through Artelnics, a Spain-based firm that specializes in predictive analytics.
24. SystemML
First developed by IBM, SystemML is now an Apache big data project. It offers a highly scalable platform that can implement high-level math and algorithms written in an R-like or Python-like syntax. Enterprises are already using it to track customer service on auto repairs, to direct airport traffic and to link social media data with banking customers. It can run on top of Spark or Hadoop.
25. TensorFlow
TensorFlow is one of Google’s open source artificial intelligence tools. It offers a library for numerical computation using data flow graphs. It can run on a wide variety of systems with one or more CPUs and GPUs, and it even runs on mobile devices. It boasts deep flexibility, true portability, automatic differentiation capabilities and support for Python and C++. The website includes an extensive list of tutorials and how-tos for developers or researchers interested in using or extending its capabilities.
26. Torch
Torch describes itself as "a scientific computing framework with wide support for machine learning algorithms that puts GPUs first." The emphasis here is on flexibility and speed. In addition, it’s fairly easy to use, with packages for machine learning, computer vision, signal processing, parallel processing, image, video, audio and networking. It is built on LuaJIT, a just-in-time compiler for the Lua scripting language.
Astronomy
27. Celestia
Travel virtually to anywhere in the known universe at any time with Celestia. It displays hundreds of thousands of celestial bodies as they would appear in the night skies. Operating System: Windows, Linux, OS X.
28. KStars
Similar to Stellarium, KStars lets users view "up to 100 million stars, 13,000 deep-sky objects, all 8 planets, the sun and moon, and thousands of comets and asteroids." It also includes a number of tools helpful for amateur astronomers, such as an observation list, an FOV editor, a sky calendar, supernova alerts and a glossary of technical terms. (Note that in order to use KStars on Windows, you’ll have to download KDE for Windows.) Operating System: Windows, Linux
29. Stellarium
Another option for budding astronomers, Stellarium confines the point of view to planet Earth rather than allowing users to zoom throughout the universe, but it is so accurate that many planetariums use it. Operating System: Windows, Linux, OS X.
Audio Tools
30. Amarok
Amarok invites users to rediscover their music. It integrates with a variety of Web services and includes features like dynamic playlists, collection management, bookmarking, file tracking and import from other music databases, including iTunes. Operating System: Windows, Linux, OS X, iOS.
31. Ardour
Designed for use by professional audio engineers, musicians, soundtrack editors and composers, Ardour is a complete audio recording, mixing and editing suite. Key features include support for most hardware, flexible recording, unlimited multichannel tracks, unlimited undo/redo and much more. Operating System: Linux, OS X
32. aTunes
This Java-based music player and manager displays complete information, including lyrics, for the song currently playing. It’s a good option for users with particularly large music collections. Operating System: OS Independent
33. Audacious
Unlike some audio players, Audacious doesn’t use a lot of system resources, so it doesn’t degrade system performance when you’re using your PC for other tasks as well as listening to music. The latest update offers improved playlist shuffling, easier recording of Internet streams and a better equalizer interface. Operating System: Windows, Linux.
34. Audacity
A perennial favorite among Linux desktop users, Audacity gets hundreds of thousands of downloads per month. It was updated in July with new scrubbing and seeking features, preset effects and improved plug-in installation. Operating System: Windows, Linux, OS X
35. CDex
Downloaded more than 60 million times, CDex is a simple, handy tool for converting CDs to data files. It supports multiple file formats, including WAV, MP3, FLAC, AAC, WMA and OGG. Operating System: Windows.
36. Cdrtools
This suite of command-line tools includes the cdrecord CD/DVD/Blu-ray recording software, as well as tools for reading optical media, extracting audio, and more. It’s a mature project that has been around for quite a few years. Operating System: Windows, Linux, OS X.
37. cdrtfe
Cdrtfe serves as a front-end for cdrtools and some other command-line recording applications. It can burn audio CDs, data discs, bootable discs, DVD-Video discs, ISO images and other types of optical media. The latest version supports Windows 10. Operating System: Windows.
38. Clementine
Based on an older version of Amarok, Clementine focuses on providing "a fast and easy-to-use interface for searching and playing your music." It supports Internet radio streams, cloud storage services like Dropbox and Google Drive, CUE sheets, tabbed playlists, audio CD playback and much more. Operating System: Windows, Linux, OS X, Android.
39. DeaDBeeF
This self-proclaimed "ultimate music player" supports a very long list of file formats. Key features include cue sheet support, tabbed playlists, cover art display, 18-band graphic equalizer, tag editor, gapless playback and more. Operating System: Linux, Unix, Android.
40. EasyTAG
EasyTAG allows users to view and edit the tag fields on MP3, MP2, MP4/AAC, FLAC, Ogg Vorbis, MusePack, Monkey’s Audio, and WavPack files. It includes a tree-based browser and CDDB support for manual and automatic searches. Operating System: Windows, Linux
41. Exaile
Another option for Linux users, Exaile offers both playback and a powerful music manager. Key features include smart playlists, advanced track tagging, multiple plug-ins, automatic album art, lyrics and much more. Operating System: Linux.
42. FlacSquisher
This tool was made for audiophiles who like to keep their original music in the lossless FLAC file format. FlacSquisher converts those files to MP3s so that users can carry their music on mobile devices without taking up too much space. Operating System: Windows.
43. Fre:ac
Fre:ac stands for "free audio converter," and it can rip audio CDs or convert among numerous file formats. It’s also portable, meaning that you can run it from a USB thumb drive without installing it on your system. Operating System: Windows.
44. Frinika
Java-based Frinika is a lightweight but fairly complete music workstation. It includes a sequencer, soft-synths, real-time effects and recording capabilities. Operating System: OS Independent
45. Giada
Giada describes itself as "a free, minimal, hardcore audio tool for DJs, live performers and electronic musicians." It’s not quite as full-featured as the other options on the list, but it is an effective, lightweight looping tool. Operating System: Linux.
46. Guayadeque
Created for "all music enthusiasts," Guayadeque is a full-featured music management system that can handle large file collections. Noteworthy features include a configurable crossfader engine, configurable silence detector for gapless playback, labeling, smart play mode, last.fm support and more. Operating System: Linux
47. Hydrogen
"Professional yet simple and intuitive," Hydrogen is a drum machine. The video on the site helps you quickly see how it works and what it can do. Operating System: Windows, Linux, OS X.
48. Jajuk
Java-based Jajuk works on multiple platforms. Aimed at advanced users, it offers a very full feature set as well as an intuitive interface. Operating System: OS Independent.
49. Jams
Formerly a paid app, Jams is now an open source Android music player with an elegant interface. It can connect to Google Play Music for purchasing songs and includes features like tag support, blacklisting, 9-band equalizer, scrobbling, crossfade, album art download and more. Operating System: Android.
50. KMid
This KDE app plays both MIDI and karaoke files, making it easy for you to serenade your sweetheart. It includes a piano player interface and also accepts input from external keyboards. Operating System: Windows
51. Linux MultiMedia Studio (LMMS)
"Made by musicians, for musicians," LMMS is a full-featured music production system with plenty of presets and samples built in. Note that despite the word "Linux" in the name, it is available for Windows and OS X as well. Operating System: Windows, Linux, OS X
52. Mixxx
Made for professional DJs, Mixxx offers "everything you need to start making DJ mixes in a tight, integrated package." It supports more than 30 DJ MIDI controllers, integrates with iTunes and includes BPM detection and sync. Operating System: Windows, Linux, OS X.
53. MOC
Simply select a directory, and the MOC (Music On Console) audio player will play all files in that directory. Supported file formats include MP3, Ogg Vorbis, FLAC, Musepack, Speex, WAVE, AIFF, and AU. Operating System: Linux/Unix, OS X
54. Mp3splt
Mp3splt is an audio utility that does just one thing—it lets you cut mp3 and ogg files into smaller files and rename them. It’s especially useful if you need to split an entire album into individual tracks. Operating System: Windows, Linux, OS X
55. MuseScore
If you are a musician, teacher or composer interested in generating your own sheet music, MuseScore makes it very easy and offers most of the same features you’ll find in the proprietary software. The website includes some tutorials and plenty of other help to get you started, and the interface is very intuitive. Operating System: Windows, Linux, OS X
56. Nightingale
Nightingale promises users "a beautiful interface with a wide range of supported audio formats, all with multi-platform support." It has a large library of add-ons that extend its capabilities. Operating System: Windows, Linux, OS X.
57. orDrumbox
Another open source option for creating your own drum loops and feeds, orDrumbox offers an easy-to-use interface. Features include auto-composition, poly-rhythms, an arpeggiator, automatic sounds/track matching, custom soft-synths and low-fi rendering. Operating System: Windows, Linux, OS X.
58. Qmmp
Qmmp, which stands for "Qt-based MultiMedia Player," offers features like support for skins, 10-band equalizer, streaming playback, cover art, cue sheet support and multiple playlists. Its interface is very simple and similar to older apps like Winamp and XMMS. Operating System: Windows, Linux
59. Radio Downloader
If your favorite online radio station only offers streaming content, you can turn it into a podcast you can listen to any time with Radio Downloader. It comes with built-in support for BBC content and a helpful "favourites" tab. Operating System: Windows
60. Rhythmbox
Rhythmbox is a Linux-only audio player for the GNOME desktop. The interface and feature set are fairly basic. Operating System: Linux.
61. SoX
This cross-platform command line tool calls itself the "Swiss Army knife of sound processing programs." It can convert files from one type to another, record and play audio files, and add effects. Operating System: Windows, Linux, OS X.
62. TEncoder
This app provides an interface to three other popular open source video tools: FFMPEG, MEncoder and Mplayer. It can convert video files, rip unprotected DVDs, add subtitles, download from YouTube, extract audio or video and more. Operating System: Windows.
63. XiX Music Player
This cross-platform player supports album art and lyrics, reverse play, crossfading, trimming, shuffle, repeat, song rating, search, and more. It’s also small enough to run on a Raspberry Pi board. Operating System: Windows, Linux, OS X
64. xwax
This Linux-only tool was designed for beat mixing and scratch mixing. Features include needle drops, pitch changes, scratching, spinbacks and rewinds. Operating System: Linux.
65. Yoshimi
Yoshimi is a Linux-only software synthesizer forked from an older version of ZynAddSubFX. The project name comes from a song by The Flaming Lips. Operating System: Linux.
66. ZynAddSubFX
This software synthesizer comes in Windows and Linux versions. Features include real-time, polyphonic, multitimbral and microtonal capabilities and a long list of effects and filters. Operating System: Windows, Linux.
Backup
67. AMANDA
The Advanced Maryland Automatic Network Disk Archiver, or AMANDA, is a popular network backup solution that can save data from Linux, Unix or Windows systems to hard drives, tape or optical media. Zmanda, which sponsors the project, offers commercial products based on the same technology. Operating System: Windows, Linux, OS X.
68. Areca Backup
For standalone systems, Areca is an easy-to-use but versatile backup solution. Key features include delta backup, compression, encryption, filters, as-of-date recovery and more. Operating System: Windows, Linux
69. Attic
If you are looking to minimize the amount of storage space you need for backups, consider Attic, which includes built-in deduplication. It also includes optional 256-bit AES encryption and can transfer files to remote hosts via SSH. Operating System: Linux
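The storage savings from deduplication come from storing each distinct piece of data only once and keeping a list of references for every backup. The minimal Python sketch below illustrates that idea with fixed-size chunks identified by SHA-256 hashes. Attic itself uses variable-size, content-defined chunks (so inserting a few bytes doesn't shift every later chunk), and the function names dedup_store and restore are hypothetical.

```python
import hashlib
from typing import Dict, List


def dedup_store(data: bytes, store: Dict[str, bytes], chunk_size: int = 4096) -> List[str]:
    """Split data into fixed-size chunks and keep each unique chunk only once.

    Returns the ordered list of chunk IDs (a "recipe") needed to rebuild data.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Content addressing: identical chunks always get the same ID.
        chunk_id = hashlib.sha256(chunk).hexdigest()
        # Only previously unseen content consumes storage space.
        store.setdefault(chunk_id, chunk)
        recipe.append(chunk_id)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble the original bytes from a recipe."""
    return b"".join(store[chunk_id] for chunk_id in recipe)


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    # Tiny chunk size so the repetition is easy to see.
    first = dedup_store(b"ABCDABCDABCD", store, chunk_size=4)
    second = dedup_store(b"ABCDEFGH", store, chunk_size=4)
    print(len(store))  # 2 unique chunks stored for 5 chunk references
```

Because a second backup of mostly unchanged files produces mostly identical chunk IDs, it adds very little new data to the store.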
70. Backup
This Ruby-based tool promises "easy full stack backup operations on UNIX-like systems." It includes a tool for modeling backups. Operating System: Linux, OS X
71. Backupninja
This tool makes it easier to coordinate and manage backups on your network. It incorporates several of the other tools on this list including Duplicity and rsync. Operating System: Linux
72. BackupPC
Robust enough for enterprise use, BackupPC backs up data from Linux and Windows systems to disk. Noteworthy features include a unique pooling scheme, optional compression, a web interface and support for mobile devices. Operating System: Windows, Linux
73. Back In Time
Inspired by an older solution called FlyBack, Back In Time takes snapshots of specified directories. It’s easy to set up and includes a simple scheduler. Operating System: Linux
74. Bacula
Another option for enterprises, Bacula is a network backup solution that aims to be easy to use and very efficient. Commercial support and services for the solution are available through Bacula Systems. Operating System: Windows, Linux, OS X
75. Bareos
Forked from Bacula, Bareos is a popular open source backup option that is under very active development. Bareos.com offers paid support and services for the tool. Operating System: Windows, Linux, OS X
76. Box Backup
This "completely automatic" backup solution creates backups continuously and can also create snapshots when desired. It includes encryption and optional RAID capabilities. Operating System: Windows, Linux
77. BURP
Short for "BackUp And Restore Program," BURP is a network backup solution based on librsync (see below). It is designed to be easier to configure than some other open source solutions, and it can do delta backups. Operating System: Windows, Linux
78. Clonezilla
Designed to replace Acronis True Image or Norton Ghost, Clonezilla is useful for both system deployment and backup and recovery. It comes in two flavors: live for standalone systems and SE for network backup or cloning multiple systems at once. Operating System: Linux
Powerful but lightweight, this backup tool takes up only 220KB of space on your drive. It supports multiple languages, has an intuitive interface and includes a scheduler. Operating System: Windows
80. DAR
Disk Archive, a.k.a. DAR, is an older command-line tool for backup. For those who prefer a GUI, one is available through DarGUI. Operating System: Windows, Linux, OS X
81. DirSync Pro
This "small but powerful" utility offers incremental backup, filtering and scheduling capabilities. It also boasts a user-friendly interface, and it offers the ability to analyze two sets of files or folders and detect the changes between them. Operating System: Windows
82. DriverBackup!
While this utility isn’t a complete system backup solution, it does back up Windows drivers. It can also remove unwanted drivers. Operating System: Windows
83. Duplicity
Based on the librsync library, Duplicity creates encrypted archives and uploads them to remote or local servers. It can use GnuPG to encrypt and sign archives if desired. Operating System: Linux
84. FOG
FOG offers cross-platform cloning and imaging capabilities for networks of any size from 5 to 50,000 systems. It boasts that it "offers commercial-grade support at no cost." Operating System: Linux, Windows, OS X.
85. FreeFileSync
A tool for standalone systems, FreeFileSync aims to save users time when setting up and running backups. It is cross-platform and includes 64-bit support. Operating System: Linux, Windows, OS X
86. FullSync
Although it was designed to help web developers push updates to their sites, FullSync can also be used by anyone to create backups. Key features include multiple modes, flexible rules, buffered filesystems, support for multiple file transfer protocols and more. Operating System: Linux, Windows, OS X
87. Grsync
Grsync takes the older rsync synchronization tool and adds an easy-to-use GUI. Noteworthy features include unlimited sessions, highlighted errors, batch capabilities and more. Operating System: Linux, Windows, OS X
88. LuckyBackup
Like Grsync, LuckyBackup is based on rsync. It has won several awards, but development on this project has slowed. Operating System: Linux, Windows
89. Mondo Rescue
For Linux and FreeBSD only, Mondo Rescue is a disaster recovery solution that supports tape, disk, network or optical media backups. According to its website, its users include "Lockheed-Martin, Nortel Networks, Siemens, HP, IBM, NASA’s JPL, the US Dept of Agriculture, dozens of smaller companies." Operating System: Linux, FreeBSD
90. Obnam
Easy-to-use and secure, Obnam is a snapshot backup solution with built-in deduplication and encryption capabilities. It stores data to hard disks or online via SFTP. Operating System: Linux
91. Partimage
This tool saves partitions of drives as image files, making it useful for backup or installing the same image on multiple systems. It can run across networks or on a standalone PC. Operating System: Linux
92. Redo
Redo boasts that it can get a crashed system back up and running in as little as 10 minutes. It’s very easy to use and has bare-metal restore capabilities. Operating System: Windows, Linux
93. Rsnapshot
As you might expect from the name, this utility makes a snapshot of your file system for remote or local backup. According to the website, it can be set up in just a few minutes. Operating System: Linux, OS X
94. Rsync
Rsync is a Unix-based file-transfer utility that has synchronization capabilities that make it suitable for creating backups or mirroring. It’s a useful tool but is best used by advanced users. Operating System: Linux, Windows, OS X
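Much of rsync's efficiency comes from a cheap "rolling" checksum that can slide along a file one byte at a time, so unchanged blocks can be found even when they have shifted position. The minimal Python sketch below shows the two 16-bit running sums (a and b) of that weak checksum and how each one is updated in constant time; rsync packs them into one 32-bit value (a plus b shifted left 16 bits) and confirms candidate matches with a strong hash, for which a direct byte comparison stands in here. The function names are hypothetical.

```python
from typing import List, Tuple

M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute rsync-style sums a and b for a whole block."""
    n = len(block)
    a = sum(block) % M
    # Earlier bytes get larger weights: n, n-1, ..., 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window forward one byte in O(1) time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_block(data: bytes, block: bytes) -> List[int]:
    """Return every offset in data where block occurs."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    matches = []
    for k in range(len(data) - n + 1):
        # Cheap checksum first; a direct comparison stands in for rsync's strong hash.
        if (a, b) == target and data[k:k + n] == block:
            matches.append(k)
        if k + n < len(data):
            a, b = roll(a, b, data[k], data[k + n], n)
    return matches


if __name__ == "__main__":
    print(find_block(b"xxhello worldhello", b"hello"))  # [2, 13]
```

Recomputing the checksum from scratch at every offset would cost time proportional to the block size; rolling it keeps the scan linear in the file size.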
95. SafeKeep
For Linux users only, SafeKeep focuses on security and simplicity. It’s a command-line tool that is a good option for a small LAN. Operating System: Linux
96. SMS Backup+
This tool allows you to back up your text messages and call logs to Gmail. You can also transfer data from Gmail back to your phone. Operating System: Android
97. SnapBackup
Designed to be as easy to use as possible, SnapBackup backs up files with just one click. It can copy files to a flash drive, external hard drive or the cloud, and it includes compression capabilities. Operating System: Windows, Linux, OS X
98. Synkron
While this app is focused primarily on synchronization, it can be used for creating backups as well. Key features include analysis capabilities, blacklisting, restores and cross-platform support. Operating System: Windows, Linux, OS X
99. Unison
Like Synkron, Unison is a file synchronization tool. It can copy files between any two systems connected to the internet, and it has features in common with source code management tools as well as with backup utilities. Operating System: Windows, Unix
100. UrBackup
This client-server backup solution does both image and file backups. It promises "both data safety and a fast restoration time." Operating System: Windows, Linux
101. Weex
The Weex developers intended it primarily as a tool for pushing content to websites, but it can also be used to synchronize or back up files. It supports FTP file transfer. Operating System: Windows, Linux
102. Win32DiskImager
Averaging more than 50,000 downloads every week, this tool is a very popular way to copy a disk image to a new machine. It’s very useful for systems administrators and developers. Operating System: Windows
103. XSIbackup
XSIbackup can back up VMware ESXi environments version 5.1 or greater. It’s a command-line tool with a scheduler, and it runs directly on the hypervisor. Operating System: VMware ESXi
Big Data Tools
104. Alluxio
Formerly known as Tachyon, Alluxio describes itself as "a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks." It works with tools like Spark and Hadoop to speed performance on big data queries. Operating System: Linux, OS X
105. Ambari
Part of the Hadoop ecosystem, this Apache project offers an intuitive Web-based interface for provisioning, managing, and monitoring Hadoop clusters. It also provides RESTful APIs for developers who want to integrate Ambari’s capabilities into their own applications. Operating System: Windows, Linux, OS X.
106. Avro
This Apache project provides a data serialization system with rich data structures and a compact format. Schemas are defined with JSON and it integrates easily with dynamic languages. Operating System: OS Independent.
107. Cascading
Cascading is an application development platform based on Hadoop. Commercial support and training are available. Operating System: OS Independent.
108. Chukwa
Based on Hadoop, Chukwa collects data from large distributed systems for monitoring purposes. It also includes tools for analyzing and displaying the data. Operating System: Linux, OS X.
109. Data Torrent RTS
Data Torrent has been around a while, but it first open sourced its Core RTS technology in June of this year. It claims to be "the industry’s only open source enterprise-grade unified stream and batch platform." It comes in community, standard and enterprise versions. Operating System: Linux
110. Disco
Originally developed by Nokia, Disco is a distributed computing framework that, like Hadoop, is based on MapReduce. It includes a distributed filesystem and a database that supports billions of keys and values. Operating System: Linux, OS X.
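For readers new to MapReduce, the classic illustration is a word count: a map step emits key/value pairs, the framework groups the pairs by key, and a reduce step combines each group. The minimal, single-process Python sketch below shows only that programming model. It does not use Disco's or Hadoop's actual APIs, the function names are hypothetical, and a real framework would run many map and reduce tasks in parallel across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be split across machines without coordination, which is what makes the model scale.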
111. Flume
Flume collects log data from other applications and delivers it into Hadoop. The website boasts, "It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms." Operating System: Linux, OS X.
112. Genie
Created by Netflix, Genie allows IT administrators to manage Hadoop jobs running on cloud computing services. Netflix uses it to run many thousands of Hadoop jobs every day. Operating System: Windows, Linux, OS X
113. Hadoop
This Apache-sponsored project is the best-known big data tool available. Numerous companies, including Amazon Web Services, Cloudera, Hortonworks, IBM, Pivotal, SyncSort and VMware, offer related products or commercial support for Hadoop. Well-known users include Alibaba, AOL, eBay, Facebook, Google, Hulu, LinkedIn, Spotify, Twitter and Yahoo. Operating System: Windows, Linux, OS X
114. Hadoop Distributed File System
HDFS is the file system for Hadoop, but it can also be used as a standalone distributed file system. It's Java-based, fault-tolerant, highly scalable and highly configurable. Operating System: Windows, Linux, OS X.
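Much of HDFS's fault tolerance comes from splitting each file into large fixed-size blocks and storing several copies of every block on different DataNodes. The sketch below illustrates that idea using the HDFS defaults of 128 MB blocks (Hadoop 2.x and later) and a replication factor of 3. The placement function is a deliberately simple rotation that we assume for illustration; real HDFS uses a rack-aware placement policy, and the node names and file size are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3                  # HDFS default replication factor

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the byte length of each block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Assign each block to `replication` distinct nodes.

    Simplified rotation for illustration only, NOT HDFS's rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication factor")
    placements = []
    for b in range(num_blocks):
        start = b % len(datanodes)
        placements.append([datanodes[(start + i) % len(datanodes)] for i in range(replication)])
    return placements

size = 300 * 1024 * 1024              # a hypothetical 300 MB file
nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
blocks = split_into_blocks(size)
for i, (length, replicas) in enumerate(zip(blocks, place_replicas(len(blocks), nodes))):
    print(f"block {i}: {length // (1024 * 1024)} MB on {replicas}")
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block, which the file system can use to re-replicate.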
115. HPCC
This alternative to Hadoop also offers massive parallel processing and storage of big data workloads. Paid enterprise services are available. Operating System: Linux
116. Hypertable
Very popular with Web companies, Hypertable is modeled on Google's Bigtable design and aims to make databases more scalable. Its users include Baidu, eBay, Groupon and Yelp. It is compatible with Hadoop, and commercial support and training are available. Operating System: Linux, OS X
117. Ignite
This Apache project describes itself as "a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies." The platform includes data grid, compute grid, service grid, streaming, Hadoop acceleration, advanced clustering, file system, messaging, events and data structure capabilities. Operating System: OS Independent.
118. Kudu
Currently in beta trials, Kudu is an Apache project that is part of the Hadoop ecosystem. It combines a simple data model with columnar storage, low latency and a distributed architecture. Operating System: Windows, Linux, OS X
119. Lipstick
This Netflix project provides an easy-to-understand graphical representation of Hadoop Pig jobs. It updates as the job executes so that administrators and developers no longer need to sift through log data. Operating System: Windows, Linux, OS X
120. Lucene
Java-based Lucene performs full-text searches very quickly. According to the website, it can index more than 150GB per hour on modern hardware, and it includes powerful and efficient search algorithms. Development is sponsored by the Apache Software Foundation. Operating System: OS Independent.
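Lucene's speed rests on an inverted index: instead of scanning every document at query time, it maps each term to the documents that contain it. The following is a toy sketch of that idea in Python, assuming a naive lowercase-and-split tokenizer and plain boolean AND matching. It is not Lucene's API; Lucene adds analyzers (stemming, stop words and more), compressed postings lists and relevance scoring, none of which appear here. The sample documents are made up.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: term -> set of IDs of documents containing that term."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents that contain every query term (boolean AND, no ranking)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Hadoop stores big data in HDFS",
    2: "Lucene performs full-text search",
    3: "Search big data with Lucene and Hadoop",
}
index = build_index(docs)
print(sorted(search(index, "big data")))
print(sorted(search(index, "lucene search")))
```

A query touches only the postings for its own terms, so lookup cost depends on how many documents match rather than on the size of the whole collection.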
121. Lumify
Created by a company called Altamira Technologies, Lumify describes itself as an "open source big data analysis and visualization platform." It makes it easy to create 2D or 3D graphs that show the relationships between entities, or to overlay data on maps. The website offers several videos of Lumify in action, as well as a demo site where users can upload their own data and try out the software. Operating System: Linux.
122. MapReduce
An integral part of Hadoop, MapReduce is a programming model for processing large distributed datasets. It was originally developed by Google, and it is also used by several other big data tools on our list, including CouchDB, MongoDB and Riak. Operating System: OS Independent.
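The MapReduce model is easiest to see with the classic word-count example. The single-process Python sketch below walks through the three stages: a map function emits key/value pairs, a shuffle step groups values by key, and a reduce function combines each group. The function names and sample input are our own illustrations, not Hadoop's Java API; in Hadoop, the framework runs many map and reduce tasks in parallel across a cluster and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str):
    # Map: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all intermediate values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())

lines = ["big data tools", "big data on Hadoop", "hadoop tools"]
print(word_count(lines))
```

Because each map call sees only one record and each reduce call sees only one key's values, both stages can be split across machines without coordination, which is what lets the model scale.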
123. Mesos
Apache Mesos is a resource abstraction tool that makes it possible for enterprises to treat their entire data center as a single pool of resources, and it is popular with companies that are also running Hadoop, Spark and similar applications. Organizations that use it include Airbnb, CERN, Cisco, Coursera, Foursquare, Groupon, Netflix, Twitter and Uber. Operating System: Linux, OS X
124. Oozie
This workflow scheduler is specifically designed to manage Hadoop jobs. It can trigger jobs by time or by data availability, and it integrates with MapReduce, Pig, Hive, Sqoop and many other related tools. Operating System: Linux, OS X.
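Oozie's two triggers, time and data availability, can be combined: a run is due at each scheduled time, but it launches only once its input data exists. The sketch below models that logic in Python. Oozie itself expresses this in XML coordinator definitions, so nothing here is Oozie syntax; the daily `/data/logs/YYYY-MM-DD` layout, the function names and the set of "available" paths are hypothetical stand-ins for checks against HDFS.

```python
from datetime import datetime, timedelta

def nominal_times(start: datetime, now: datetime, frequency: timedelta) -> list[datetime]:
    """Time trigger: every start + k * frequency that is not later than `now`."""
    times, t = [], start
    while t <= now:
        times.append(t)
        t += frequency
    return times

def input_path(t: datetime) -> str:
    # Hypothetical dataset layout: one input directory per day.
    return f"/data/logs/{t:%Y-%m-%d}"

def ready_runs(start: datetime, now: datetime, frequency: timedelta,
               available_paths: set[str]) -> list[datetime]:
    """Data trigger: keep only the due runs whose input directory already exists."""
    return [t for t in nominal_times(start, now, frequency) if input_path(t) in available_paths]

start = datetime(2024, 1, 1)
now = datetime(2024, 1, 3, 12)
available = {"/data/logs/2024-01-01", "/data/logs/2024-01-03"}  # Jan 2 data has not landed yet
for run in ready_runs(start, now, timedelta(days=1), available):
    print("launch Hadoop job for", run.date())
```

Here three daily runs are due, but the January 2 run keeps waiting until its input directory appears, while the other two can start immediately.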
125. Pandas
The Pandas project includes data structures and data analysis tools based on the Python programming language. It allows organizations to use Python as an alternative to R for big data analysis projects. Operating System: Windows, Linux, OS X.
126. Pig
Apache Pig is a platform for distributed big data analysis. It relies on a programming language called Pig Latin, which boasts simplified parallel programming, optimization and extensibility. Operating System: OS Independent.