Crux Informatics

Economies of scale for data engineering

Hamlin Lovell

Demand for data to drive investment decision-making is growing, and financial firms increasingly need to find and harness a fragmented array of sources. The push to derive a competitive edge by interrogating vast volumes and new types of data is coming from the buy side, the sell side, research houses and other financial companies. 

Crux Informatics launched in 2017 with a business model that alleviates the burdensome task of managing data, so institutions can focus on creating returns. Crux operates a technology platform that acts as a neutral interface between vendors and buyers of data. This improves the quality and usability of data and speeds its delivery, while also reducing data management costs by sharing them among multiple clients – and over time, a growing pool of clients – so customers benefit from economies of scale. As demand for datasets grows, notable investors have also become Crux’s customers.

Neutrality  

Crux does not buy or sell data or take a revenue share; it simply helps the data flow. Founder and CEO Philip Brittan believes this neutrality is an essential part of the business model: “to get real scale, we need to be completely un-conflicted. Crux is not disintermediating any vendors.” As the message spreads, Crux is receiving plenty of inbound queries from vendors, who license the data directly to clients and maintain direct relationships with their customers. “The Crux entitlement system is granular, and vendors have ongoing sight of who is accessing the data,” Brittan explains. Crux intends to keep accelerating client on-boarding, a process that has historically taken months or years – and for some vendors never completes at all, as customers drift away during the lengthy process.

To get real scale, we need to be completely un-conflicted. Crux is not disintermediating any vendors.

Philip Brittan, Founder & CEO, Crux Informatics

Pooling resources 

“We aim to short-circuit laborious, slow and costly efforts to on-board data, providing it in an actionable format that does not need to be further honed or scrubbed. Traditionally, firms built their own costly infrastructure and sourced, stored, processed, cleaned and managed their own data in house, which meant 50 or 100 firms were repeating identical tasks, often taking months. Errors also resulted in data vendors receiving the same complaints dozens of times,” says Brittan. Now, Crux wires up, stores, cleans and maps data only once on behalf of many clients, and uses automation to detect, alert on and remediate data issues as early in the data lifecycle as possible.

Data engineering

Crux’s vision – backed by Goldman Sachs’ Principal Strategic Investment Group, Citi and Two Sigma – is an industry-wide solution for managing data, carrying out the commoditised aspects of data engineering, which includes applying broad industry standards. Crux does the non-controversial data crunching, enabling companies to focus on proprietary data science. “Data engineering is anything that involves physically manipulating data. This includes ingesting data, for instance via an application programming interface (API) or a file transfer protocol (FTP) site, packed or compressed in some way; downloading updates; unpacking files; and checking, validating and storing them. Checks can include whether the database schema changed, whether the amount of data is correct, whether time series start and end dates match up, whether all constituents are present in an index, and whether each column conforms to rules,” says Brittan. “Our Informatics platform operates and maintains firms’ data supply chains, so firms get past this first-mile challenge of ingesting data and speed ahead to generating returns. The repetitive tasks are carried out by machines, while our operators provide oversight and judgment.” 
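
To make the kinds of checks Brittan lists concrete, the sketch below shows how an automated validation pass over a freshly ingested dataset might look. It is purely illustrative – the column names, expected constituents and rules are invented for the example and are not Crux’s actual code or schema.

```python
# Illustrative sketch (not Crux's actual system) of automated ingest checks:
# schema drift, data volume, time series alignment, index constituents, and
# per-column rules, each reported as a human-readable issue.
import pandas as pd

EXPECTED_COLUMNS = {"date", "ticker", "close"}          # hypothetical schema
EXPECTED_CONSTITUENTS = {"AAA", "BBB", "CCC"}           # hypothetical index members

def validate(df: pd.DataFrame, min_rows: int = 1) -> list[str]:
    """Return a list of issues found; an empty list means the file passed."""
    issues = []
    # 1. Has the database schema changed? (missing or unexpected columns)
    missing = EXPECTED_COLUMNS - set(df.columns)
    extra = set(df.columns) - EXPECTED_COLUMNS
    if missing:
        issues.append(f"schema: missing columns {sorted(missing)}")
    if extra:
        issues.append(f"schema: unexpected columns {sorted(extra)}")
    # 2. Is the amount of data correct?
    if len(df) < min_rows:
        issues.append(f"volume: only {len(df)} rows, expected at least {min_rows}")
    # 3. Do time series start and end dates match up across series?
    if {"date", "ticker"} <= set(df.columns) and not df.empty:
        spans = df.groupby("ticker")["date"].agg(["min", "max"])
        if spans["min"].nunique() > 1 or spans["max"].nunique() > 1:
            issues.append("dates: series start/end dates do not match across tickers")
    # 4. Are all index constituents present?
    if "ticker" in df.columns:
        absent = EXPECTED_CONSTITUENTS - set(df["ticker"])
        if absent:
            issues.append(f"constituents: missing {sorted(absent)}")
    # 5. Does each column conform to its rules? (e.g. no negative prices)
    if "close" in df.columns and (df["close"] < 0).any():
        issues.append("rules: negative values in 'close' column")
    return issues
```

In a pipeline like the one described, a non-empty result would trigger an alert for a human operator rather than silently passing the file through to clients.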

Crux’s managed service for data engineering uses a range of analytical tools that can include artificial intelligence (AI), machine learning and natural language processing, which Brittan studied at Harvard University while pursuing his degree in Computer Science. These techniques are used for cleaning, tagging and pattern recognition, but the firm is not positioning itself as an AI or machine learning shop. “We are a data engineering and information supply chain operator that offers a managed service through our Informatics Platform, taking on the burdensome information supply chain task,” explains Brittan.

These nitty-gritty routines, readying data for analysis, consume 80% of the time companies spend on harnessing data, according to Crux. Crux thus frees up data consumers’ time to devote more energy to using data science to glean differentiated insights and connect with alpha-generating information.

Running the buy side gamut

The early adopters of Crux have been hedge funds and systematic funds, such as Two Sigma, which manages $51 billion and is already expert in data. Crux’s buy-side clients range from brand-name firms to start-ups, and the charging structure is designed to maximise the user base. “The modular business model means that the minimum commitment is one dataset for a few months. The pricing model is designed to make it possible for firms of all sizes to experiment with different datasets. It is nearly a no-risk proposition. Firms can start with three datasets and work their way up to 3,000,” says Brittan. 

Strategic investors 

Crux’s clients are also its investors. Brittan has been an angel and venture capital (VC) investor himself, but Crux deliberately sought out strategic investors rather than financial or pure VC investors. Goldman Sachs’ Principal Strategic Investment Group, Citi and Two Sigma have taken stakes in Crux using proprietary capital, and the firm has raised at least USD 21 million from these and other undisclosed investors. “We want investors to be clients and bring a client perspective, to help their firms and the industry become better,” says Brittan. Indeed, Crux’s Board advisors include Goldman Sachs Asset Management’s Chief Data Officer, Jon Neitzell. Crux founders and staff own an undisclosed percentage of the firm.

Data compliance and security

Crux expects to work with any data provider that satisfies its vetting criteria for being legitimate and compliant. Vendors warrant that their data is compliant and legal, and Crux handles all supplier data with appropriate measures to preserve content confidentiality and security. If, during its data checks, Crux noticed that a dataset had inadvertently included Personally Identifiable Information (PII), it would pre-emptively alert the vendor to try to prevent any client having sight of the PII. Crux has not spotted any PII so far.
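
A pre-emptive PII scan of the kind described might look something like the sketch below. The patterns and field layout are invented for illustration – real PII detection is considerably more involved, and this is not Crux’s actual implementation.

```python
# Hypothetical sketch of scanning ingested records for accidental PII
# (emails, US social security numbers, phone numbers) before any client
# sees the data, so the vendor can be alerted first.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def scan_for_pii(records: list[dict]) -> list[tuple[str, int]]:
    """Return (pattern_name, row_index) pairs for every suspected hit."""
    hits = []
    for i, row in enumerate(records):
        for value in row.values():
            if not isinstance(value, str):
                continue
            for name, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    hits.append((name, i))
    return hits
```

Any hit would be routed to the vendor rather than surfaced to clients, matching the pre-emptive alerting described above.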

Crux has invested heavily in information security, including hiring outside firms to try to hack it as a test of its defences. Clients also carry out their own security audits. Brittan, whose publications include Turn Your Antivirus Strategy Inside Out, says, “We take information security extremely seriously and are building world-class InfoSec capabilities.”

Data types 

Brittan is also the author of Breaking Down Big Data, and says, “We work with traditional and alternative data, which can be private or public, and are constantly seeking new sources and sorts of data of all sizes and types, across all geographies.” Crux recently co-sponsored the SIIA Financial Information Services Association’s Alternative Data Forum, where it participated in a panel on the opportunities and challenges of sourcing data, making the case for why market data managers should fight to be integral to the sourcing process.

Beyond this, Brittan is a sought-after speaker at industry conferences and has appeared at quantitative investing and data conferences such as BattleFin, Battle of the Quants, the Quandl Alternative Data conference, the Goldman Sachs Systematic Investing Conference, and the Goldman Sachs European Quantitative Investing Conference. 

Examples of vendors that Crux works with include MSCI, Thomson Reuters and FactSet for fundamental and macro data, and eMBS for mortgage data. SafeGraph is one of its alternative data providers, supplying geolocation information. Crux’s model is an open platform for the industry that supports all types of data from a variety of vendors and public sources. Crux ingests, scrubs and stores all datasets in a consistent way, so when a client logs into Crux, all their data is clean, organised and ready for analysis. 

As Crux is not reselling data nor taking revenue shares, it does not favour any one data provider over any other or recommend one data set over another. While Crux will not express its own opinions over datasets, the Crux platform does facilitate clients running their own tests to evaluate the quality of datasets from multiple vendors and discover the data that is right for their use case.

Data and finance acumen 

Crux employs over 50 staff and is growing fast. “We have a strong team combining strengths in technology, strategy, investing and communication to focus on quality and automation. Many of our staff come from the financial data and FinTech parts of the industry. It is important that staff have the domain expertise to understand problems, businesses and data needs,” says Brittan, who sits on the Board of FINCAD, which received The Hedge Fund Journal’s 2018 award for “Best Derivatives and Risk Solution.” 

The Crux team is a mix of data engineers, software engineers, data operations experts, and sales and account managers, with decades of leadership experience in financial services, analytics and data engineering. Staff include Elizabeth Pritchard, formerly COO of AIG Science and earlier Global Head of Market Data Services at Goldman Sachs, who recently spoke at the Goldman Sachs Hedge Fund Tech seminar. Other staff, such as Head of Platform Engineering Jonathan Major, Chief Architect Ben Frank, and Head of Data Engineering Kesh Iyer, have worked at buy-side firms such as BlackRock, Citadel and Bridgewater Associates, respectively. Some staff hold the CFA and FRM designations. Front-end, client-facing staff are mainly in the New York office, while the San Francisco office is home to engineering, architecture and operations. Crux expects eventually to open offices in Europe and Asia.

Scalability and futureproofing   

“Everything we do is engineered for inherent scalability, because datasets are getting larger, and data and client numbers are unbounded. There has been massive growth of new data sources,” says Brittan.

Crux has built a scalable business model, delivered entirely through a cloud platform. Once Crux has ingested data, clients access a unique delivery platform that lets them easily and efficiently connect with and use alpha-generating information. Clients access the data through a rich RESTful API that lets them write queries and slice and dice the data, to connect with what they need, when and how they need it. 
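
As a rough illustration of what “slice and dice” via a RESTful API means in practice, the sketch below builds a query that asks the server for only a subset of columns over a date range, rather than downloading a whole file. The host, endpoint path and parameter names are invented for the example – they are not Crux’s actual API.

```python
# Hypothetical sketch of composing a RESTful query for a slice of a dataset:
# a column subset and a date range, instead of the full file.
import urllib.parse

def build_query_url(base_url: str, dataset: str,
                    fields: list[str], start: str, end: str) -> str:
    """Compose a URL requesting only the named columns within [start, end]."""
    query = urllib.parse.urlencode({
        "fields": ",".join(fields),  # column subset to return
        "start": start,              # first date of the slice
        "end": end,                  # last date of the slice
    })
    return f"{base_url}/datasets/{dataset}/rows?{query}"
```

A client would then issue an authenticated GET against the resulting URL (for example with a bearer token tied to its entitlements) and receive just the slice it asked for, which is what lets queries stay fast as datasets grow.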

“Longer term, we expect to offer a rich set of customisation services, though for now, the priority is more standardised data engineering,” clarifies Brittan. Crux also aspires to stay at the leading edge of technology in all facets, bringing in new technology and infrastructure, so that clients are “futureproof.” Most recently, Crux closed a $20 million Series B funding round, which will enable them to continue to drive innovation and scale their business and platform, so clients can have access to best-in-class technologies and services.