The business of Experian, a global leader
in credit reporting and marketing services with annual revenues exceeding US$4.3
billion (for 2017), is all about data.
Experian has four main business units:
Credit Information Services, Decision Analytics, Business Information Services,
and Marketing Services. Experian Marketing Services (EMS) helps marketers
connect with customers through relevant communications across a variety of
channels, driven by advanced analytics on an extensive database of geographic,
demographic, and lifestyle data. EMS has built its business on the effective
collection, analysis, and use of data.
The company has always handled large
amounts of data, billions and quadrillions of records, on who consumers are,
how they’re connected, and how they interact. With today’s proliferation of digital
channels and information such as social media likes, web interactions, and email
responses, older systems no longer have the capacity to deal with the data.
In the past, there was no requirement to
provide data in real-time. Experian sent customer database updates to clients
once a month for campaign adjustments, allowing Experian to process large
volumes of data through a number of diverse platforms, which were mostly mainframes.
That’s changing. Today’s consumers leave a
digital trail of behaviors and preferences for marketers to leverage so they
can enhance the customer experience. Experian’s clients, which include many of
the top retail companies in the world, are asking for more frequent updates on
consumers’ latest purchasing behaviors, online browsing patterns and social
media activity so they can respond in real time. They are increasingly looking
for a single, integrated view of their customer.
Infrastructure for real-time reporting
Meeting the need for immediacy of
information and customisation of data in real time for clients would require a
technological infrastructure that can accommodate rapid processing, large-scale
storage, and flexible analysis of multi-structured data. Experian’s mainframes
were hitting their limits in terms of performance, flexibility and scalability.
EMS set an internal goal to process more
than 100 million records of data per hour, translating to about 28,000 records per second.
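As a quick check on that target, the hourly figure can be converted to a per-second rate. The sketch below is only arithmetic on the number quoted above; the variable names are illustrative and not taken from any Experian system.

```python
# Convert the stated hourly processing target into a per-second rate.
RECORDS_PER_HOUR = 100_000_000  # target quoted in the text
SECONDS_PER_HOUR = 60 * 60

records_per_second = RECORDS_PER_HOUR / SECONDS_PER_HOUR
print(f"{records_per_second:,.0f} records per second")
```

The result is a little under 27,800 records per second, which rounds to the 28,000 figure cited.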
The team decided to look for new
architectures that could handle the new volumes of data. About 30 criteria were
identified for the new platform, ranging from depth and breadth of offering to
support capabilities to price to unique distribution features. Two criteria
were prioritized: both batch and real-time data processing capabilities, and scalability
to accommodate large and growing data volumes.
The North America Experian Marketing
Services group led the evaluation of NoSQL technologies within Experian. Hadoop
and HBase quickly surfaced as a natural fit for Experian’s needs. EMS engineers
downloaded raw Apache Hadoop.
They saw certain gaps that could be filled
by a commercial distribution. EMS evaluated several distributions and selected
Cloudera to meet EMS’ enterprise-level Hadoop needs, such as meeting client
SLAs (service level agreements) and having 24×7 reliability.
Experian invested in Cloudera Enterprise,
which comprises three components: Cloudera’s open source Hadoop stack (CDH),
a management toolkit (Cloudera Manager), and expert technical support.
A production version of Experian’s
Cross-Channel Identity Resolution (CCIR) engine was launched. CCIR is a linkage
engine that is used to keep a persistent repository of client touch points.
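To make the idea of a linkage engine more concrete, here is a minimal sketch of linking touch points that are believed to belong to the same person, using a union-find structure. This is an illustration only and not Experian's implementation: the class name, method names, and touch point identifiers are all hypothetical, and the structure lives in memory rather than in a persistent repository.

```python
class TouchPointLinker:
    """Toy linkage store: groups touch points believed to belong to one person.

    Hypothetical illustration; not how CCIR is actually built.
    """

    def __init__(self):
        # Maps each touch point to its parent; a root is its own parent.
        self._parent = {}

    def _find(self, key):
        # Register unseen touch points as their own group.
        self._parent.setdefault(key, key)
        root = key
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path directly at the root.
        while self._parent[key] != root:
            self._parent[key], key = root, self._parent[key]
        return root

    def link(self, a, b):
        """Record that touch points a and b belong to the same identity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_identity(self, a, b):
        """Return True if a and b have been linked, directly or transitively."""
        return self._find(a) == self._find(b)


linker = TouchPointLinker()
linker.link("email:ann@example.com", "cookie:abc123")
linker.link("cookie:abc123", "loyalty:77")
print(linker.same_identity("email:ann@example.com", "loyalty:77"))
print(linker.same_identity("email:ann@example.com", "email:bob@example.com"))
```

Links are transitive here: the email address and the loyalty number end up in the same group because both were linked to the same cookie. A production engine would also need matching rules to decide when two touch points should be linked at all, which this sketch leaves out.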
CCIR runs on HBase,
a high-performance, distributed data store that integrates with Cloudera's
platform to deliver a secure and easy-to-manage NoSQL database.
EMS’ HBase system spanned five billion rows
of data, as of 2017, and the number is expected to grow tenfold in the near
future. HBase offers a shared architecture that is distributed, fault tolerant,
and optimised for storage. In addition, HBase enables both batch and real-time data processing.
Experian feeds data into the CDH-powered
CCIR engine using custom extract, transform, load (ETL) scripts from in-house
mainframes and relational databases including IBM DB2, Oracle, SQL Server, and
Performance accelerated by 50x
The new platform is delivering operational
efficiency to Experian by accelerating processing performance by 50x, at a
fraction of the cost of the legacy environment. The new system can process 100
million records per hour, compared with 50 million matches per day previously.
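The 50x figure can be sanity-checked from the two throughput numbers. The sketch below assumes the new system sustains its hourly rate around the clock and that "records" and "matches" are comparable units; neither assumption is stated in the text.

```python
# Compare the new hourly rate with the old daily rate on a per-day basis.
HOURS_PER_DAY = 24
new_records_per_hour = 100_000_000  # new platform, from the text
old_matches_per_day = 50_000_000    # legacy environment, from the text

# Assumption: the new platform runs at its stated rate for all 24 hours.
new_records_per_day = new_records_per_hour * HOURS_PER_DAY
speedup = new_records_per_day / old_matches_per_day
print(f"speedup: {speedup:.0f}x")
```

Under those assumptions the ratio comes out at 48x, consistent with the rounded 50x claim.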
Cloudera Enterprise allows Experian to get
maximum operational efficiency out of its Hadoop clusters. Due to a wide
variation in use cases for customers, the team had to do a lot of tweaking on
the platform to get the performance it needed. Cloudera Enterprise provides the
ability to store these different configuration settings and version those
McCullough added, “Not only has Cloudera
Manager simplified our process, but it’s made it possible at all. Without a
Linux background, I would not have been able to deploy Hadoop across a cluster
and configure it and have anything up and running in nearly the timeframe that
Furthermore, Cloudera Manager enabled the
deployment and configuration of Hadoop across a cluster in the timeframe
Experian had. Cloudera Manager monitors services running on the cluster and reports
when servers are unhealthy, services have stopped, or nodes are bad. It automates
distribution across the cluster, monitors CPU usage across various applications
and data storage availability and provides a single portal to see into all
The deployment allowed Experian to process
orders of magnitude more information through its systems. Experian’s platform is
the first data management platform of its kind that accepts data, links
information together across an entire marketing ecosystem, and puts it into a
usable format for an enhanced customer experience. These data processing capabilities
combined with Experian’s expertise in bringing together data assets provided
new insights into tomorrow’s marketing environments.
In January 2017, it was announced
that Experian was integrating Cloudera Enterprise into
its cloud environment for its Credit Information Services, Decision Analytics,
and Business Information Services business lines, with the aim of improving credit data processing speeds for
clients. Thus, Cloudera continues to transform the way Experian provides
consumer and business credit data to its clients.