This is “Where Does Data Come From?”, section 11.3 from the book Getting the Most Out of Information Systems (v. 2.0).

11.3 Where Does Data Come From?

Learning Objectives

  1. Understand various internal and external sources for enterprise data.
  2. Recognize the function and role of data aggregators, the potential for leveraging third-party data, the strategic implications of relying on externally purchased data, and key issues associated with aggregators and firms that leverage externally sourced data.

Organizations can pull together data from a variety of sources. While the examples that follow aren’t meant to be an encyclopedic listing of possibilities, they will give you a sense of the diversity of options available for data gathering.

Transaction Processing Systems

For most organizations that sell directly to their customers, transaction processing systems (TPS) represent a fountain of potentially insightful data. A TPS is a system that records a transaction (some form of business-related exchange), such as a cash register sale, ATM withdrawal, or product return. Every time a consumer uses a point-of-sale system, an ATM, or a service desk, there’s a transaction occurring, representing an event that’s likely worth tracking.

The cash register is the data generation workhorse of most physical retailers, and the primary source that feeds data to the TPS. But while TPS can generate a lot of bits, it’s sometimes tough to match this data with a specific customer. For example, if you pay a retailer in cash, you’re likely to remain a mystery to your merchant because your name isn’t attached to your money. Grocers and retailers can tie you to cash transactions if they can convince you to use a loyalty card, a system that provides rewards and usage incentives, typically in exchange for a method that provides more detailed tracking and recording of customer activity. In addition to enhancing data collection, loyalty cards can represent a significant switching cost. Use one of these cards and you’re in effect giving up information about yourself in exchange for some kind of financial incentive. The explosion in retailer cards is directly related to each firm’s desire to learn more about you and to turn you into a more loyal and satisfied customer.

Some cards provide an instant discount (e.g., the CVS Pharmacy ExtraCare card), while others allow you to build up points over time (Best Buy’s Reward Zone). The latter has the additional benefit of acting as a switching cost. A customer may think “I could get the same thing at Target, but at Best Buy, it’ll increase my existing points balance and soon I’ll get a cash back coupon.”
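
To make the linking idea concrete, here is a minimal sketch in Python. The record layout, card IDs, and points rate are all hypothetical and are not drawn from any retailer’s actual system. It shows how a loyalty card number attached to a register transaction lets otherwise anonymous cash purchases be grouped by customer, and how a points balance (the switching cost described above) might accumulate.

```python
from collections import defaultdict

# Hypothetical register records:
# (transaction_id, payment_type, loyalty_card_id or None, amount)
transactions = [
    ("t1", "cash", None, 12.50),
    ("t2", "cash", "card-001", 30.00),
    ("t3", "credit", "card-002", 45.25),
    ("t4", "cash", "card-001", 8.75),
]

POINTS_PER_DOLLAR = 1  # assumed earning rate, for illustration only


def summarize_by_card(records):
    """Group spending by loyalty card; sales without a card stay anonymous."""
    totals = defaultdict(float)
    for _txn_id, _payment, card_id, amount in records:
        key = card_id if card_id is not None else "anonymous"
        totals[key] += amount
    return dict(totals)


def points_balance(records, card_id):
    """Whole points earned by one card at the assumed rate."""
    spent = sum(amount for _, _, cid, amount in records if cid == card_id)
    return int(spent * POINTS_PER_DOLLAR)


spend = summarize_by_card(transactions)
print(spend)
print(points_balance(transactions, "card-001"))
```

Note that the cash sale without a card (t1) cannot be tied to anyone, while the two cash sales made with card-001 can be combined into a single customer history.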

Tesco: Tracked Transactions, Increased Insights, and Surging Sales

UK grocery giant Tesco, the planet’s third-largest retailer, is envied worldwide for what analysts say is the firm’s unrivaled ability to collect vast amounts of retail data and translate this into sales. [K. Capell, “Tesco: ‘Wal-Mart’s Worst Nightmare,’” BusinessWeek, December 29, 2008.]

Tesco’s data collection relies heavily on its ClubCard loyalty program, an effort pioneered back in 1995. But Tesco isn’t just a physical retailer. As the world’s largest Internet grocer, the firm gains additional data from Web site visits, too. Remove products from your virtual shopping cart? Tesco can track this. Visited a product comparison page? Tesco watches which product you’ve chosen to go with and which you’ve passed over. Done your research online, then traveled to a store to make a purchase? Tesco sees this, too.

Tesco then mines all this data to understand how consumers respond to factors such as product mix, pricing, marketing campaigns, store layout, and Web design. Consumer-level targeting allows the firm to tailor its marketing messages to specific subgroups, promoting the right offer through the right channel at the right time and the right price. To get a sense of Tesco’s laser-focused targeting possibilities, consider that the firm sends out close to ten million different, targeted offers each quarter. [T. Davenport and J. Harris, “Competing with Multichannel Marketing Analytics,” Advertising Age, April 2, 2007.] Offer redemption rates are the best in the industry, with some coupons scoring an astronomical 90 percent usage! [M. Lowenstein, “Tesco: A Retail Customer Divisibility Champion,” CustomerThink, October 20, 2002.]

The firm’s data-driven management is clearly delivering results. Even while operating in the teeth of a global recession, Tesco repeatedly posted record corporate profits and the highest earnings ever for a British retailer. [K. Capell, “Tesco Hits Record Profit, but Lags in U.S.,” BusinessWeek, April 21, 2009; A. Hawkes, “Tesco Reports Record Profits of £3.8bn,” Guardian, April 19, 2011.]

Enterprise Software (CRM, SCM, and ERP)

Firms increasingly set up systems to gather additional data beyond conventional purchase transactions or Web site monitoring. CRM or customer relationship management systems are often used to empower employees to track and record data at nearly every point of customer contact. Someone calls for a quote? Brings a return back to a store? Writes a complaint e-mail? A well-designed CRM system can capture all these events for subsequent analysis or for triggering follow-up events.

Enterprise software includes not just CRM systems but also categories that touch every aspect of the value chain, including supply chain management (SCM) and enterprise resource planning (ERP) systems. More importantly, enterprise software tends to be more integrated and standardized than the prior era of proprietary systems that many firms developed themselves. This integration helps in combining data across business units and functions, and in getting that data into a form where it can be turned into information (for more on enterprise systems, see Chapter 9 "Understanding Software: A Primer for Managers").


Surveys

Sometimes firms supplement operational data with additional input from surveys and focus groups. Oftentimes, direct surveys can tell you what your cash register can’t. Zara store managers informally survey customers in order to help shape designs and product mix. Online grocer FreshDirect (see Chapter 2 "Strategy and Technology: Concepts and Frameworks for Understanding What Separates Winners from Losers") surveys customers weekly and has used this feedback to drive initiatives from reducing packaging size to including star ratings on produce. [R. Braddock, “Lessons of Internet Marketing from FreshDirect,” Wall Street Journal, May 11, 2009.] Many CRM products also have survey capabilities that allow for additional data gathering at all points of customer contact.

Can Technology “Cure” U.S. Health Care?

The U.S. health care system is broken. It’s costly, inefficient, and problems seem to be getting worse. Estimates suggest that health care spending makes up a whopping 18 percent of U.S. gross domestic product. [J. Zhang, “Recession Likely to Boost Government Outlays on Health Care,” Wall Street Journal, February 24, 2009.] U.S. automakers spend more on health care than they do on steel. [S. Milligan, “Business Warms to Democratic Leaders,” Boston Globe, May 28, 2009.] Even more disturbing, it’s believed that medical errors cause as many as ninety-eight thousand unnecessary deaths in the United States each year, more than motor vehicle accidents, breast cancer, or AIDS. [R. Appleton, “Less Independent Doctors Could Mean More Medical Mistakes,” June 14, 2009; and B. Obama, President’s Speech to the American Medical Association, Chicago, IL, June 15, 2009.]

For years it’s been claimed that technology has the potential to reduce errors, improve health care quality, and save costs. Now pioneering hospital networks and technology companies are partnering to help tackle cost and quality issues. For a look at possibilities for leveraging data throughout the doctor-patient value chain, consider the “event-driven medicine” system built by Dr. John Halamka and his team at Boston’s Beth Israel Deaconess Medical Center (part of the Harvard Medical School network).

When docs using Halamka’s system encounter a patient with a chronic disease, they generate a decision support “screening sheet.” Each event in the system, such as an office visit or a lab results report (think the medical equivalent of transactions and customer interactions), updates the patient database. Combine that electronic medical record information with artificial intelligence (computer software that seeks to reproduce or mimic, perhaps with improvements, human thought, decision making, or brain functions) on best practice, and the system can offer recommendations for care, such as, “Patient is past due for an eye exam” or, “Patient should receive pneumovax [a vaccine against infection] this season.” [J. Halamka, “IT Spending: When Less Is More,” BusinessWeek, March 2, 2009.] The systems don’t replace decision making by doctors and nurses, but they do help to ensure that key issues are on a provider’s radar.
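
The event-driven pattern described above can be sketched in a few lines of Python. This is an illustration only, not Beth Israel Deaconess’s actual system: the function names, the record layout, and the time intervals are hypothetical, the intervals are not clinical guidance, and simple fixed rules stand in for the best-practice intelligence the text describes. Each event updates the patient record, and a rule check then produces reminders for the provider.

```python
from datetime import date

# Hypothetical care rules:
# (event type to look for, maximum days since the last one, reminder message).
# The intervals are illustrative assumptions, not medical guidance.
RULES = [
    ("eye_exam", 365, "Patient is past due for an eye exam"),
    ("pneumovax", 365 * 5, "Patient should receive pneumovax this season"),
]


def record_event(patient, event_type, when):
    """Each event (office visit, lab result, exam) updates the patient record."""
    patient.setdefault("events", []).append((event_type, when))


def recommendations(patient, today):
    """Return reminders for rules whose event is missing or too old."""
    messages = []
    for event_type, max_age_days, message in RULES:
        dates = [d for kind, d in patient.get("events", []) if kind == event_type]
        if not dates or (today - max(dates)).days > max_age_days:
            messages.append(message)
    return messages


patient = {"name": "example patient"}
record_event(patient, "office_visit", date(2009, 1, 10))
record_event(patient, "eye_exam", date(2007, 6, 1))

alerts = recommendations(patient, today=date(2009, 3, 2))
for alert in alerts:
    print(alert)
```

As in the text, the output is a list of prompts for a clinician to consider; nothing in the sketch makes a decision on the provider’s behalf.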

More efficiencies and error checks show up when prescribing drugs. Docs are presented with a list of medications covered by that patient’s insurance, allowing them to choose quality options while controlling costs. Safety issues, guidelines, and best practices are also displayed. When the correct, safe medication in the right dose is selected, the electronic prescription is routed to the patient’s pharmacy of choice. As Halamka puts it, the prescription goes from “doctor’s brain to patient’s vein” without any of that messy physician handwriting, all while squeezing out layers where errors from human interpretation or data entry might occur.

President Obama believes technology initiatives can save health care as much as $120 billion a year, or roughly two thousand five hundred dollars per family. [D. McCullagh, “Q&A: Electronic Health Records and You,” CNET, May 19, 2009.] An aggressive number, to be sure. But with such a large target to aim at, it’s no wonder that nearly every major technology company now has a health solutions group. Microsoft and Google even offer competing systems for electronically storing and managing patient health records. If systems like Halamka’s and others realize their promise, big benefits may be just around the corner.

External Sources

Sometimes it makes sense to combine a firm’s data with bits brought in from the outside. Many firms, for example, don’t sell directly to consumers (this includes most drug companies and packaged goods firms). If your firm has partners that sell products for you, then you’ll likely rely heavily on data collected by others.

Data bought from sources available to all might not yield competitive advantage on its own, but it can provide key operational insight for increased efficiency and cost savings. And when combined with a firm’s unique data assets, it may give firms a high-impact edge.

Consider restaurant chain Brinker, a firm that runs seventeen hundred eateries in twenty-seven countries under the Chili’s, On The Border, and Maggiano’s brands. Brinker (whose ticker symbol is EAT) supplements its own data with external feeds on weather, employment statistics, gas prices, and other factors, and uses this in predictive models that help the firm in everything from determining staffing levels to switching around menu items. [R. King, “Intelligence Software for Business,” BusinessWeek podcast, February 27, 2009.]
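
As a rough illustration of combining internal and external data, the Python sketch below joins a restaurant’s own daily guest counts with an outside weather feed and uses the result to suggest staffing. Every number, name, and the staffing ratio here is a made-up assumption, and a simple group average stands in for the far richer predictive models the text mentions; it is not a description of Brinker’s method.

```python
import math
from statistics import mean

# Hypothetical internal data: guests served per day at one restaurant
guests_by_day = {
    "2009-02-01": 420,
    "2009-02-02": 310,
    "2009-02-03": 450,
    "2009-02-04": 290,
}

# Hypothetical external feed: weather for the same market
weather_by_day = {
    "2009-02-01": "clear",
    "2009-02-02": "rain",
    "2009-02-03": "clear",
    "2009-02-04": "rain",
}

GUESTS_PER_SERVER = 50  # assumed staffing ratio


def average_guests_by_weather(guests, weather):
    """Join the two sources on date and average guest counts per weather type."""
    grouped = {}
    for day, count in guests.items():
        if day in weather:  # keep only days present in both sources
            grouped.setdefault(weather[day], []).append(count)
    return {kind: mean(counts) for kind, counts in grouped.items()}


def servers_needed(forecast, averages):
    """Round the expected guest count up to a whole number of servers."""
    return math.ceil(averages[forecast] / GUESTS_PER_SERVER)


averages = average_guests_by_weather(guests_by_day, weather_by_day)
print(averages)
print(servers_needed("rain", averages), servers_needed("clear", averages))
```

The weather feed is available to any competitor, but the guest counts are not, which is the point of the preceding paragraphs: the value comes from the combination.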

In another example, Carnival Cruise Lines combines its own customer data with third-party information tracking household income and other key measures. This data plays a key role in a recession, since it helps the firm target limited marketing dollars on those past customers that are more likely to be able to afford to go on a cruise. So far it’s been a winning approach. For three years in a row, the firm has experienced double-digit increases in bookings by repeat customers. [R. King, “Intelligence Software for Business,” BusinessWeek podcast, February 27, 2009.]

Who’s Collecting Data about You?

There’s a thriving industry collecting data about you. Buy from a catalog, fill out a warranty card, or have a baby, and there’s a very good chance that this event will be recorded in a database somewhere, added to a growing digital dossier that’s made available for sale to others. If you’ve ever gotten catalogs, coupons, or special offers from firms you’ve never dealt with before, this was almost certainly a direct result of a behind-the-scenes trafficking in the “digital you.”

Firms that trawl for data and package it up for resale are known as data aggregators. They include Acxiom, a $1.3 billion a year business that combines public source data on real estate, criminal records, and census reports, with private information from credit card applications, warranty card surveys, and magazine subscriptions. The firm holds data profiling some two hundred million Americans. [A. Gefter and T. Simonite, “What the Data Miners Are Digging Up about You,” CNET, December 1, 2008.]

Or maybe you’ve heard of Lexis-Nexis. Many large universities subscribe to the firm’s electronic newspaper, journal, and magazine databases. But the firm’s parent, Reed Elsevier, is a data sales giant, with divisions packaging criminal records, housing information, and additional data used to uncover corporate fraud and other risks. In February 2008, the firm got even more data rich, acquiring Acxiom competitor ChoicePoint for $4.1 billion. With that kind of money involved, it’s clear that data aggregation is very big business. [A. Greenberg, “Companies That Profit from Your Data,” Forbes, May 14, 2008.]

The Internet also allows for easy access to data that had been public but otherwise difficult to access. For one example, consider home sale prices and home value assessments. While technically in the public record, someone wanting this information previously had to traipse down to their Town Hall and speak to a clerk, who would hand over a printed log book. Not exactly a Google-speed query. Contrast this with the Web, where a free site lets you pull up a map of your town and instantly peek at how much your neighbors paid for their homes. And it lets them see how much you paid for yours, too.

Computerworld’s Robert Mitchell uncovered a more disturbing issue when public record information is made available online. His New Hampshire municipality had digitized and made available some of his old public documents without obscuring that holy grail for identity thieves, his Social Security number. [R. Mitchell, “Why You Should Be Worried about Your Privacy on the Web,” Computerworld, May 11, 2009.]

Then there are accuracy concerns. A record incorrectly identifying you as a cat lover is one thing, but being incorrectly named to the terrorist watch list is quite another. During a five-week period, airline agents tried to block a particularly high-profile U.S. citizen from boarding airplanes on five separate occasions because his name resembled an alias used by a suspected terrorist. That citizen? The late Ted Kennedy, who at the time was the senior U.S. senator from Massachusetts. [R. Swarns, “Senator? Terrorist? A Watch List Stops Kennedy at Airport,” New York Times, August 20, 2004.]

For the data trade to continue, firms will have to treat customer data as the sacred asset it is. Step over that “creep-out” line, and customers will push back, increasingly pressing for tighter privacy laws. Data aggregator Intelius used to track cell phone customers, but backed off in the face of customer outrage and threatened legislation.

Another concern: sometimes data aggregators are just plain sloppy, committing errors that can be costly for the firm and potentially devastating for victimized users. For example, in 2005, ChoicePoint accidentally sold records on 145,000 individuals to a cybercrime identity theft ring. The ChoicePoint case resulted in a $15 million fine from the Federal Trade Commission. [A. Greenberg, “Companies That Profit from Your Data,” Forbes, May 14, 2008.] In 2011, hackers stole at least 60 million e-mail addresses from marketing firm Epsilon, prompting firms as diverse as Best Buy, Citi, Hilton, and the College Board to go through the time-consuming, costly, and potentially brand-damaging process of warning customers of the breach. Epsilon faces liability charges of almost a quarter of a billion dollars, but some estimate that the total price tag for the breach could top $4 billion. [F. Rashid, “Epsilon Data Breach to Cost Billions in Worst-Case Scenario,” eWeek, May 3, 2011.] Just because you can gather data and traffic in bits doesn’t mean that you should. Any data-centric effort should involve input not only from business and technical staff, but from the firm’s legal team as well (for more, see the box “Privacy Regulation: A Moving Target”).

Privacy Regulation: A Moving Target

New methods for tracking and gathering user information appear daily, testing user comfort levels. For example, the firm Umbria uses software to analyze millions of blog and forum posts every day, using sentence structure, word choice, and quirks in punctuation to determine a blogger’s gender, age, interests, and opinions. While Google refused to include facial recognition as an image search product (“too creepy,” said its chairman), [M. Warman, “Google Warns against Facial Recognition Database,” Telegraph, May 16, 2011.] Facebook, with great controversy, turned on facial recognition by default. [N. Bilton, “Facebook Changes Privacy Settings to Enable Facial Recognition,” New York Times, June 7, 2011.] It’s quite possible that in the future, someone will be able to upload a photo to a service and direct it to find all the accessible photos and video on the Internet that match that person’s features. And while targeting is getting easier, a Carnegie Mellon study showed that it doesn’t take much to find someone with a minimum of data. Simply by knowing gender, birth date, and postal zip code, 87 percent of people in the United States could be pinpointed by name. [A. Gefter and T. Simonite, “What the Data Miners Are Digging Up about You,” CNET, December 1, 2008.] Another study showed that publicly available data on state and date of birth could be used to predict U.S. Social Security numbers, a potential gateway to identity theft. [E. Mills, “Report: Social Security Numbers Can Be Predicted,” CNET, July 6, 2009.]
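
Why do three innocuous-looking fields pinpoint so many people? The short Python sketch below, built on a handful of made-up records rather than the study’s data, measures what share of rows in a name-free data set carry a combination of values that no other row shares. The function name and field names are hypothetical. A row that is unique on gender, birth date, and zip code can be matched to a name by anyone holding another list (such as a voter roll) that contains the same three fields.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, but three fields remain
records = [
    {"gender": "F", "birth_date": "1975-03-14", "zip": "02139"},
    {"gender": "M", "birth_date": "1982-11-02", "zip": "02139"},
    {"gender": "F", "birth_date": "1975-03-14", "zip": "02139"},
    {"gender": "M", "birth_date": "1990-07-21", "zip": "15213"},
]


def unique_fraction(rows, fields):
    """Share of rows whose combination of the given fields appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    unique_rows = sum(1 for key in keys if counts[key] == 1)
    return unique_rows / len(rows)


by_gender = unique_fraction(records, ["gender"])
by_all_three = unique_fraction(records, ["gender", "birth_date", "zip"])
print(by_gender, by_all_three)
```

In this toy data, gender alone singles out no one, while the three fields together single out half the rows. The pattern, not the specific fraction, is what carries over to real data sets.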

Some feel that Moore’s Law, the falling cost of storage, and the increasing reach of the Internet have us on the cusp of a privacy train wreck. And that may inevitably lead to more legislation that restricts data-use possibilities. Noting this, strategists and technologists need to be fully aware of the legal environment their systems face (see Chapter 14 "Google in Three Parts: Search, Online Advertising, and Beyond" for examples and discussion) and consider how such environments may change in the future. Many industries have strict guidelines on what kind of information can be collected and shared.

For example, HIPAA (the U.S. Health Insurance Portability and Accountability Act) includes provisions governing data use and privacy among health care providers, insurers, and employers. The financial industry has strict requirements for recording and sharing communications between firm and client (among many other restrictions). There are laws limiting the kinds of information that can be gathered on younger Web surfers. And there are several laws operating at the state level as well.

International laws also differ from those in the United States. Europe, in particular, has a strict European Privacy Directive. The directive includes governing provisions that limit data collection, require notice and approval of many types of data collection, and require firms to make data available to customers with mechanisms for stopping collection efforts and correcting inaccuracies at customer request. Data-dependent efforts plotted for one region may not fully translate to another if the law there limits key components of technology use. The constantly changing legal landscape also means that what works today might not be allowed in the future.

Firms beware—the public will almost certainly demand tighter controls if the industry is perceived as behaving recklessly or inappropriately with customer data.

Key Takeaways

  • For organizations that sell directly to their customers, transaction processing systems (TPS) represent a source of potentially useful data.
  • Grocers and retailers can link you to cash transactions if they can convince you to use a loyalty card which, in turn, requires you to give up information about yourself in exchange for some kind of financial incentive such as points or discounts.
  • Enterprise software (CRM, SCM, and ERP) is a source for customer, supply chain, and enterprise data.
  • Survey data can be used to supplement a firm’s operational data.
  • Data obtained from outside sources, when combined with a firm’s internal data assets, can give the firm a competitive edge.
  • Data aggregators are part of a multibillion-dollar industry that provides genuinely helpful data to a wide variety of organizations.
  • Data that can be purchased from aggregators may not in and of itself yield sustainable competitive advantage since others may have access to this data, too. However, when combined with a firm’s proprietary data or integrated with a firm’s proprietary procedures or other assets, third-party data can be a key tool for enhancing organizational performance.
  • Data aggregators can also be quite controversial. Among other things, they represent a big target for identity thieves, are a method for spreading potentially incorrect data, and raise privacy concerns.
  • Firms that mismanage their customer data assets risk lawsuits, brand damage, lower sales, and fleeing customers, and can prompt more restrictive legislation.
  • Further raising privacy issues and identity theft concerns, recent studies have shown that in many cases it is possible to pinpoint users through allegedly anonymous data, and to guess Social Security numbers from public data.
  • New methods for tracking and gathering user information are raising privacy issues that may be addressed through legislation that restricts data use.

Questions and Exercises

  1. Why would a firm use a loyalty card? What is the incentive for the firm? What is the incentive for consumers to opt in and use loyalty cards? What kinds of strategic assets can these systems create?
  2. In what ways does Tesco gather data? Can other firms match this effort? What other assets does Tesco leverage that help the firm remain among top performing retailers worldwide?
  3. Make a list of the kind of data you might give up when using a cash register, a Web site, or a loyalty card, or when calling a firm’s customer support line. How might firms leverage this data to better serve you and improve their performance?
  4. Are you concerned by any of the data-use possibilities that you outlined in prior questions, discussed in this chapter, or that you’ve otherwise read about or encountered? If you are concerned, why? If not, why not? What might firms, governments, and consumers do to better protect consumers?
  5. What are some of the sources data aggregators tap to collect information?
  6. Privacy laws are in a near constant state of flux. Conduct research to identify the current state of privacy law. Has major legislation recently been proposed or approved? What are the implications for firms operating in affected industries? What are the potential benefits to consumers? Do consumers lose anything from this legislation?
  7. Self-regulation is often proposed as an alternative to legislative efforts. What kinds of efforts would provide “teeth” to self-regulation? Are there steps firms could take to make you believe in their ability to self-regulate? Why or why not?
  8. What is HIPAA? What industry does it impact?
  9. How do international privacy laws differ from U.S. privacy laws?