ETL (extract, transform, load) and formal analytics have long been seen as a competitive weapon employed by large companies with the resources both to collect meaningful data and to manage the process of crunching it. The software segment supplying this market is, if anything, oversupplied. The analytics segment in particular has been in a process of slow-motion consolidation for several years; see, for instance, acquisitions by SAS, Informatica, and IBM. Attacking this market with new ventures looks unappealing, given large capital requirements and questionable ROI.
(BTW, I distinguish the general-purpose analytics market from web analytics, where you have IBM divesting instead of acquiring, and a very capable free offering from Google that sets a floor under the market.)
Like many parts of the software industry, ETL and analytics have been migrating down market, making tools available to smaller enterprises that have the talent to wield them. The offerings include network-based (SaaS) services and suites of open source tools assembled from grass-roots projects. Notwithstanding some adoption issues in SaaS, both approaches have the potential to produce offerings better suited to the SMB segment than those of conventional vendors. So, what's out there? Here's a somewhat raw dump from a recent survey I did in the area.
Open Source Companies and Projects
As in the open source operating system and server worlds, many of the underlying OSS analytics and ETL projects have been aggregated by commercial packagers, some of them venture funded: a sign of the VC community looking for a way around the logjam at the top of the market.
- Pentaho - collecting a suite of OSS-based components into a platform, including WEKA, JPivot, Firebird RDBMS, Enhydra Shark, Eclipse BIRT, Kettle ETL, JFreeReport, and Mondrian OLAP. Florida-based. Funding from NEA and Index. Embeddable and/or usable as a 'workflow platform'. Differentiates itself as 'not just OSS reporting', a dig at JasperSoft. Looking for support from ISVs/SIs.
- Kinetic Networks - based on the open-sourced, Java-based KETL ETL engine. Kinetic also offers 'DQP', a data quality profiler. Appears to have a SaaS/professional-services-based, bootstrapped business model; no venture capital. See a year-old blog post. The Kinetic website offers "Custom Data Security and Stewardship Tools", including custom work to "desensitize information". The open source community appears weak. (KETL is different from the Kettle OSS project acquired by Pentaho.)
- Talend - open source, Perl- and Java-based ETL engine. Paris-based; Fabrice Bonan and Bertrand Diard are the primary developers. Backing unclear. Only moderate activity in their community area. GPLed open source version. Objects defining the ETL chain are stored as XML, hosted over MySQL, PostgreSQL, or another RDBMS.
- JasperSoft - JasperReports and a new BI module. Backing from DCM, Morgenthaler, Partech, and Discovery.
Here's some more detail on one of the constituent OSS projects, as well as several that don't fit into any of these suites.
- Kettle - Java-based, metadata-driven extract, transform, transport, load. Part of the Pentaho suite. Kettle is LGPL'ed and, according to the primary developer, has a plug-in interface for extensions whose use does not require open-sourcing them. Appears to have an active community. Kettle uses the Eclipse SWT library for its GUI. See also a short comment on the plug-in interface with links to minimal documents. Kettle suffers from over-cute, idiosyncratic names for its components: Spoon, Pan, Chef, and Kitchen. I found it distracting; YMMV.
- CloverETL - CSV, fixed-width, and JDBC connectivity. Default, NULL, and other more advanced value handling. Metadata and transformation graphs are defined in XML. Threaded and parallelizable. This is a Java code generator, a non-traditional approach to the problem space that may make it a hard fit into a suite. LGPL'ed. Includes CloverGUI, an Eclipse-integrated graphical editor for ETL pipelines. In spite of the nice graphics treatment, it would appear to be the hardest for others to work with, since it wants to manage the whole pipeline in its own style (though I suppose one could 'wrap' other components in a compatible style). There doesn't seem to be a plug-in interface per se, but rather some standard classes to extend in order to build one's own component, which will be invoked by the generated code. It's unclear what effect the LGPL has in that circumstance. Seems to have seven active developers; the main developer is in Germany.
- Octopus - Java / JDBC connectors to RDBMS and other data sources, including CSV. Ant and JUnit support. Could be useful for building quick, live connections as well as homebrew ETLs. Part of the Enhydra OSS Java app server project.
- BEE Project - an older OSS BI project, focused on what they call 'Relational OLAP'. Corporate version from Insight Strategy, a Czech company. ETL processes are 'under development'. Implemented in Perl with some C and Tk.
- SpagoBI - part of the ObjectWeb group of projects. An integration framework for various BI-related open source tools, including BIRT, Weka, and R. Project based in Italy.
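Both Kettle and CloverETL describe themselves as metadata driven: the pipeline is declared as data (XML, in CloverETL's case), and new steps are added by extending a base class or a plug-in interface. The Python sketch below illustrates that general idea under loudly stated assumptions. The XML element names, the `Component` base class, the `register` decorator, and the three step classes are all hypothetical, invented for illustration, and are not the real file format or API of either project. It also handles only a linear chain of steps, not the parallel graphs CloverETL supports.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical graph definition: element/attribute names are invented for this
# sketch and do NOT follow CloverETL's or Kettle's actual file formats.
GRAPH_XML = """
<graph>
  <node id="read" type="ReadCsv"/>
  <node id="fill" type="FillDefault" field="region" value="UNKNOWN"/>
  <node id="upper" type="UpperCase" field="name"/>
  <edge from="read" to="fill"/>
  <edge from="fill" to="upper"/>
</graph>
"""

class Component:
    """Hypothetical base class a user extends to add a step (stand-in for a plug-in API)."""
    def __init__(self, **params):
        self.params = params

    def process(self, rows):
        raise NotImplementedError

REGISTRY = {}

def register(cls):
    """Make a Component subclass addressable by its class name from the XML."""
    REGISTRY[cls.__name__] = cls
    return cls

@register
class ReadCsv(Component):
    def process(self, rows):
        # Source step: ignores upstream input and parses the CSV text it was given.
        return list(csv.DictReader(io.StringIO(self.params["source"])))

@register
class FillDefault(Component):
    def process(self, rows):
        field, value = self.params["field"], self.params["value"]
        # Simple NULL handling: replace missing or empty values with a default.
        return [{**r, field: r.get(field) or value} for r in rows]

@register
class UpperCase(Component):
    def process(self, rows):
        field = self.params["field"]
        return [{**r, field: r[field].upper()} for r in rows]

def run_graph(xml_text, source_csv):
    """Instantiate steps from the XML metadata and run them as a linear chain."""
    root = ET.fromstring(xml_text.strip())
    nodes = {}
    for n in root.findall("node"):
        attrs = {k: v for k, v in n.attrib.items() if k not in ("id", "type")}
        if n.get("type") == "ReadCsv":
            attrs["source"] = source_csv
        nodes[n.get("id")] = REGISTRY[n.get("type")](**attrs)
    edges = {e.get("from"): e.get("to") for e in root.findall("edge")}
    # The start node is the one with no incoming edge.
    targets = set(edges.values())
    current = next(node_id for node_id in nodes if node_id not in targets)
    rows = []
    while current is not None:
        rows = nodes[current].process(rows)
        current = edges.get(current)
    return rows
```

For example, `run_graph(GRAPH_XML, "name,region\nacme,west\nglobex,\n")` uppercases the names and fills the empty region with `UNKNOWN`. Adding a step means writing one more `@register` subclass and one more `<node>` element; the runner itself does not change, which is the appeal of the metadata-driven style.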
For further information, see an older blog post specifically discussing OSS ETL.
Service Model Vendors
This sector is a grab bag. You've got venture-funded startups built specifically around the model, traditional software companies trying SaaS, and functional and vertical-market specialists. Again, a fairly raw list:
- LucidEra - recently emerged from beta. Its pitch is aimed at small companies outgrowing ad hoc customer info systems. Backing from Benchmark and Matrix. An on-demand service accessed via the web, it consists of two layers: the LucidEra platform and pre-built solutions. The LucidEra platform integrates the following components:
- Pre-built connectors to common data sources
- Data extraction, transformation, and loading technology that transfers data from the customer's data resources into the LucidEra database, on a refresh schedule established by the customer
- A data cleansing service that de-duplicates, merges, and matches data from multiple sources.
- The LucidEra database, optimized for query and reporting, houses the user's combined data.
- An OLAP server that provides multidimensional views of the data.
- The LucidEra user interface allows the creation of the business reports (quote: "Most LucidEra users will only interact with this component.")
- Seatab Software - offers 'Pivotlink Business Intelligence', SaaS analytics and reporting. Focuses on retail, consumer packaged goods, and high tech. Pretty decent customer base. Seattle-based; funding unclear, but no VCs on the board.
- OCO - outsourced reporting and BI provider, with the emphasis on canned reports and alerts. Claims a six-week implementation cycle. They seem to use extractors on the customer site that produce flat files, which are batched offsite for cleaning and reporting. Based in Massachusetts. Funding is unclear.
- Actuate - develops business intelligence tools and reporting software, all of which seem to be focused on enterprise reporting. Actuate has four offerings for enterprise reporting:
- Actuate reports - core product, written in a proprietary Visual Basic derivative, Actuate Basic.
- e.SpreadSheet - allows developers to create and distribute reports that follow a spreadsheet metaphor.
- FormulaOne Reporting Engines - a pure Java reporting toolset for use with J2EE development environments and application servers.
- BIRT Reporting - (see SpagoBI above) an open source reporting solution developed as part of the Eclipse Foundation.
Actuate is publicly traded (NASDAQ: ACTU) and based in South San Francisco. Most of its business seems to be in conventional software sales models rather than SaaS.
- Host Analytics - positioned as a 'business performance management' SaaS provider rather than general-purpose analytics/BI; for instance, it includes budget-tracking modules. Built entirely on the Microsoft technology stack. Missouri/Oklahoma-based. Funding unclear.
- Sharp Analytics - SaaS BI firm. Seems focused on sales functions. Emphasis on reports, with dashboards and ad hoc queries. 'SharpView' is the wraparound product offering. Salt Lake City-based. Funding is unclear.
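LucidEra's component list above reads like a classic layered pipeline: connectors extract, ETL loads, a cleansing service de-duplicates and merges, a query-optimized store holds the result, and a reporting layer sits on top. The Python sketch below walks through those stages at toy scale using only the standard library. It is an illustration under assumptions, not LucidEra's (or anyone's) implementation: the two in-memory sources, the normalized email as the match key, the "first non-null value wins" merge rule, and all function names are hypothetical. The refresh schedule and the OLAP server are omitted, with an SQLite `GROUP BY` standing in for the reporting layer.

```python
import sqlite3

def extract():
    """Stand-in for pre-built connectors: two hypothetical sources of raw records."""
    crm = [
        {"email": "Ann@Example.com ", "name": "Ann Lee", "phone": None},
        {"email": "bob@example.com", "name": "Bob Roy", "phone": "555-0101"},
    ]
    billing = [
        {"email": "ann@example.com", "name": "Ann Lee", "phone": "555-0199", "amount": 120.0},
        {"email": "BOB@example.com", "name": "Bob Roy", "phone": None, "amount": 80.0},
        {"email": "bob@example.com", "name": "Bob Roy", "phone": None, "amount": 40.0},
    ]
    return crm, billing

def transform(record):
    """Normalize the match key so one customer looks identical across sources."""
    return {**record, "email": record["email"].strip().lower()}

def cleanse(*sources):
    """De-duplicate and merge customers on the normalized email.
    Later sources only fill fields that are still missing (None)."""
    merged = {}
    for source in sources:
        for rec in source:
            cust = merged.setdefault(
                rec["email"], {"email": rec["email"], "name": None, "phone": None})
            for field in ("name", "phone"):
                if cust[field] is None and rec.get(field) is not None:
                    cust[field] = rec[field]
    return list(merged.values())

def load(conn, customers, invoices):
    """Load the cleansed data into a query-oriented store (SQLite here)."""
    conn.execute("CREATE TABLE customer (email TEXT PRIMARY KEY, name TEXT, phone TEXT)")
    conn.execute("CREATE TABLE invoice (email TEXT REFERENCES customer(email), amount REAL)")
    # Named placeholders look up only the keys they mention; extra keys are ignored.
    conn.executemany("INSERT INTO customer VALUES (:email, :name, :phone)", customers)
    conn.executemany("INSERT INTO invoice VALUES (:email, :amount)", invoices)

def revenue_by_customer(conn):
    """A simple aggregate standing in for the reporting/OLAP layer."""
    return conn.execute(
        "SELECT c.name, SUM(i.amount) FROM customer c "
        "JOIN invoice i ON i.email = c.email GROUP BY c.name ORDER BY c.name"
    ).fetchall()

def run_pipeline():
    crm, billing = extract()
    crm = [transform(r) for r in crm]
    billing = [transform(r) for r in billing]
    customers = cleanse(crm, billing)
    conn = sqlite3.connect(":memory:")
    load(conn, customers, billing)
    return customers, revenue_by_customer(conn)
```

Calling `run_pipeline()` yields two merged customers rather than five raw rows, with Ann's missing phone number filled in from the billing source. The cleansing step is where most real-world effort goes; an exact-match key like this one is the simplest possible rule, and fuzzy matching on names or addresses would replace it in practice.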
If you're looking for a big concluding wrap-up, I don't have one. If you're a potential user, there is enough variety in approach, business model, functionality, and specialty here that some upfront research, and even trials, are warranted. Investors eyeing the category might take that to heart and figure out how to arrange a consolidation that makes both the shopping process and implementation easier for the buyer.