Business Analytics from Scratch: Design Patterns

This article is a guideline for building a scalable business analytics framework on the modern business stack. It is broken into three sections: Data, Analytics, and Infrastructure Strategies.

Jesse Orshan
12 min read · Dec 15, 2022

Quick Intro:

Data-driven decision-making is a fundamental element of business growth. Moreover, the field is undergoing a revival due to many concurrent industry trends including the proliferation of APIs/digital data, increased basic coding literacy of new workforce entrants (high school and college grads), and new business team structures (such as Fusion Teams).

The rise of data and digital workflows has created enormous opportunities and challenges for businesses. This trend helps explain why we are seeing a rise in Business and Data Analytics teams across organizations large and small.

As a Co-Founder at WayScript, I spend a large amount of time talking to business analytics leaders about their strategies for modern analytics. I constantly find myself asking: how would I build a business analytics strategy from scratch? And how can I apply those learnings to our own product?

Here is what I currently think are the essentials of any homegrown business analytics strategy.

Data Strategy

A data strategy is the crucial first step toward comprehensive business analytics, and it yields faster, more scalable analytics across an organization. Before any automation or analytics can happen at scale, all data sources should be standardized. Without this, you inevitably end up with ‘hacky’, error-prone (and possibly insecure) applications.

As any analyst knows, data is now everywhere. It can exist in various places, including external applications (CRMs, SaaS tools, Marketing Tools, …) and internal sources (databases, data warehouses, digital documents). Some of this data is structured, but a lot is also unstructured (documents, spreadsheets, etc.). And each system has its own interfaces, authentication flows, APIs, and credentials.

The key goal of a data strategy is to establish a platform for interfacing with all business data via a series of standardized, secure REST APIs. This can be referred to as ‘Everything-As-A-Service’ (EaaS).

Create Services

If this sounds intimidating, don’t worry. There is a strategy for growing your EaaS platform over time. The key is to start simple. Let’s start with 3 examples: a SQL database table, a social media platform like Instagram, and a CRM such as Salesforce.

For a SQL database table, there are a few key endpoints we likely want to expose to data analysts: read, write, delete, and create. Here is a quick example of how a read endpoint might look (pseudo-code).

# /services/sql_service.py

import os

from flask import Flask, request, jsonify

app = Flask(__name__)

# DB credentials are 'built-in' as environment variables
# Credentials discussed in-depth in the Infrastructure Section
database_url = os.environ.get('DB_URL')
database_password = os.environ.get('DB_PASSWORD')

# Database connection for all SQL service endpoints
# ('database' is a stand-in for your DB client library)
db = database.connect(database_url, database_password)

# Endpoint looks up a customer in a SQL database by email address
@app.get('/sql/customer/<email>')
def get_customer_by_email(email: str):
    # Each analyst has a unique API Key
    consumer_api_key = request.args.get('api_key')  # Implementation discussed later

    # Validate that the analyst has permission to use this service
    if has_permission(consumer_api_key):
        user = db.users.query(email=email).get()
        if user:
            return jsonify(user), 200
        else:
            return 'no user found', 404
    else:
        return 'unauthorized', 401

For Instagram, maybe there are a few data points we want to pull, such as the current follower count:

# /services/instagram.py

import os

from flask import request

from instagram_sdk import Instagram

# 'app' is the same Flask app shared by all services
instagram_api_key = os.environ.get('INSTAGRAM_API_KEY')

@app.get('/instagram/get_follower_count')
def follower_count():
    # Notice the same API key pattern as sql_service.py above
    consumer_api_key = request.args.get('api_key')
    if has_permission(consumer_api_key):
        count = Instagram(instagram_api_key).get_followers()
        return str(count), 200
    else:
        return 'unauthorized', 401

For a CRM, there are a few data points we initially want to expose: get a customer, update a customer, create a customer, and delete a customer.

# /services/salesforce.py

import os

from flask import request

from salesforce_sdk import Salesforce

salesforce_api_key = os.environ.get('SALESFORCE_API_KEY')

@app.post('/salesforce/create_customer')
def create_customer():
    consumer_api_key = request.args.get('api_key')
    if has_permission(consumer_api_key):

        # Able to pull query parameters, supplied in documentation
        # Documentation discussed later
        email = request.args.get('email')
        address = request.args.get('address')
        first_name = request.args.get('first_name')
        last_name = request.args.get('last_name')

        # Function to validate these user inputs
        is_valid, error_message = user_object_validation(
            email=email,
            address=address,
            first_name=first_name,
            last_name=last_name,
        )

        if not is_valid:
            return error_message, 400

        Salesforce(salesforce_api_key).create_customer(
            email=email,
            address=address,
            first_name=first_name,
            last_name=last_name,
        )
        return 'customer added', 200

    else:
        return 'unauthorized', 401

Let’s take a second to look at some of the key benefits of the code blocks written above:

  1. Each employee (API consumer) has their own API Key for querying these services. This creates two fundamental benefits. First, user identification is standardized across all services. Second, users never touch the underlying credentials of the services themselves.
  2. The has_permission() function enables per-service / per-endpoint permission management. This enables a scalable and auditable way to manage which individuals, teams, groups, etc. can interface with any given service. We will discuss how to architect this in more detail in the Infrastructure section below.
  3. This service is easy to consume. Anyone within the organization who knows how to query an API can readily consume these endpoints without needing to learn the intricacies of each underlying application API. Additionally, these endpoints can be consumed by other tooling, including BI tools, No-Code apps, etc. (see the quick sketch below).
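
For example, consuming any of these services requires nothing more than an HTTP client and a personal key. A quick sketch (the api.internal.example host is a placeholder for wherever your services platform lives):

# consume_services.py
import os

import requests

# The analyst's personal key (injection discussed in the Infrastructure section)
api_key = os.environ.get('INTERNAL_API_KEY')

# Look up a customer -- no DB drivers or shared credentials required
customer = requests.get(
    'https://api.internal.example/sql/customer/jane@example.com',
    params={'api_key': api_key},
).json()

# Pull a follower count -- no Instagram SDK or credentials required
followers = requests.get(
    'https://api.internal.example/instagram/get_follower_count',
    params={'api_key': api_key},
).text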

Hopefully, the ‘big-picture’ of this type of architecture is setting in. The modern business analytics leader should strive to turn everything into a standardized, permission-managed service. This can include product metrics (how many DAUs today?), documents (add a new row to this spreadsheet on Google Drive), developer tooling (how many open pull requests does our repo have?), marketing (send a custom email), sales (what’s X customer’s current contract size?), etc.

Documentation

Standardized documentation should be a requirement for all services and endpoints that are hosted on your internal services platform. There are so many resources on building beautiful, standardized documentation, so I won’t go too deep on that here. However, documentation is only as good as it is current. Updating the docs for any service change should be a required step of the deployment process for these services.
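
One lightweight way to enforce this is a pre-deploy check. Here is a minimal sketch, assuming your services are Flask apps and your docs live in a simple JSON index keyed by endpoint path (the file layout and index format are hypothetical):

# check_docs.py -- illustrative pre-deploy documentation check
import json
import sys

from app import app  # the Flask app that registers all service endpoints

# Hypothetical docs index: {"/sql/customer/<email>": {...}, ...}
with open('docs/endpoints.json') as f:
    documented = set(json.load(f).keys())

undocumented = [
    str(rule) for rule in app.url_map.iter_rules()
    if str(rule) not in documented and rule.endpoint != 'static'
]

if undocumented:
    print(f'Missing docs for: {undocumented}')
    sys.exit(1)  # fail the deployment until docs are updated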

Conclusion on a Data Strategy: Everything as a Service

By transforming all data into services, you will unlock a speed and consistency of business and data analytics that is unparalleled by other investments. Once even a basic version is in place, it will rapidly improve the ability to perform analytics, workflow automation, and cross-team learning. This approach also scales well: code is centralized, so phantom scripts and shadow IT don’t accumulate around the organization.

As a strategy, whenever a new data source or data pull type is required by an analyst, the goal should be to build it into a service instead of having them hard-code the data connections into their unique application/script code. You can call this ‘Services Driven Development’.

EaaS should be the primary objective of any business analytics team.

Analytics Strategy

Background Concepts

Analytical Categories — Generally speaking, there are four categories of business analytics: Descriptive (‘What’s happening to my business right now?’), Diagnostic (‘Why is it happening?’), Predictive (‘What’s likely to happen in the future?’), and Prescriptive (‘What do I need to do to succeed?’) — Harvard Business School.

Analytical Outputs — With all four categories, it is helpful to think about the most common outputs a business analytics team is generating. These often include dashboards, reports, BI tooling data, ML training models, alerts/notifications, etc.

Analytical Pace — The analytical pace is the rate or frequency at which the analytics are queried. It is determined by how up-to-date the data needs to be in order to be useful. For example, if I want to generate a weekly revenue report, then a batch process on Friday afternoons is one possible analytical pace. If I want a dashboard full of continuously updating data, that would be another pace. And if I want to immediately fire off an alert to a product manager if the number of users interacting with a certain feature drops to zero, then that would be what is known as an ‘event-driven’ pace.

An Analytics Architecture

Any framework for business analytics needs to ensure multiple key things are in place: scalability, repeatability/consistency, tooling for common automation, discoverability, access to data (now very easy with Everything as a Service), and security (discussed in Infrastructure).

The solution to achieving this is via Templates.

The best way to think about this is with an example:

Microservice Template — It will be common for analysts to have programs that they want to run on a schedule (say, to generate a report once a day). For this report, they will want to pull data from a variety of sources, perform programmatic analysis, and generate an output (say, an Excel file that gets emailed to relevant stakeholders).

The Microservice Template is already configured for easily setting up automated runs on a schedule and might look something like this:

# microservice_template.py
import os
import requests

'''
This is our internal template for python scripts that run on a schedule.
In order to set up your cron, go to the Triggers view and set the time.
Once deployed, this script will automatically run on the given schedule
'''

# Discussed more in infra section
# Note, this is the CONSUMER_API_KEY discussed in the Services section.
# This key is unique to each user who runs this script.
api_key = os.environ.get('INTERNAL_API_KEY')

'''
Step 1) Pull in relevant data sources: see <internal_docs_url> for available
apis.
Step 2) Perform analysis
Step 3) Generate Output: see <internal_docs_url> for available apis.
'''

As an analyst, I can now provision a template, write code, interface with data sources (via EaaS), and easily deploy this application (discussed later). As an example, here is what a program that posts new social media follower counts to a Slack channel once a week might look like (pretend it was written by a data analyst for the NY Knicks):

# instagram_followers_to_slack.py
import os
import requests

'''
This is our internal template for python scripts that run on a schedule.
In order to set up your cron, go to the Triggers view and set the time.
Once deployed, this script will automatically run on the given schedule
'''

# Discussed more in infra section
# Note, this is the CONSUMER_API_KEY discussed in the services section.
# This key will be unique to each user who runs this script.
api_key = os.environ.get('INTERNAL_API_KEY')

'''
Step 1) Pull in relevant data sources: see <internal_docs_url> for available
apis.
Step 2) Perform analysis
Step 3) Generate Output: see <internal_docs_url> for available apis.
'''

# Step 1 -- Pull Data from EaaS sources (responses parsed into numbers)
instagram_followers = int(requests.get(f'https://api.knicks.io/instagram/get_follower_count?api_key={api_key}').text)
tik_tok_views = int(requests.get(f'https://api.knicks.io/tiktok/get_views?days=7&api_key={api_key}').text)
tweets = int(requests.get(f'https://api.knicks.io/twitter/tweet_count?days=7&api_key={api_key}').text)

url = f'https://api.knicks.io/instagram/get_follower_count_by_date?date=2022-12-13&api_key={api_key}'
last_week_instagram = int(requests.get(url).text)

# Step 2 -- Perform Analysis
instagram_pct_change = (instagram_followers - last_week_instagram) / last_week_instagram
# ... analysis for other social platforms

# Step 3 -- Post Analysis to Slack
slack_message = f'''
Today's Instagram Followers: {instagram_followers}
Percent Change from Last Week: {instagram_pct_change}
...
'''

requests.post(
    'https://api.knicks.io/slack/post_message_to_channel',
    params={'message': slack_message, 'channel_name': 'Marketing', 'api_key': api_key},
)

The above example is focused on a scheduled task (e.g., run every Friday at 9 am EST). Another use case might be to run a script based on an event (say, when an email is received with a certain subject line). For this use case, a similar template can be developed for webhook-based microservices:

# webhook_microservice.py

import os
import requests
from internal_sdk import webhook_response

'''
This is our internal template for python scripts that run via a webhook.

Step 1) Pull in relevant data sources: see <internal_docs_url> for available
apis.
Step 2) Perform analysis
Step 3) Serve Response back to the Web Request
'''

# Step 1 -- Pull Data from EaaS sources
# ... same as above

# Step 2 -- Perform Analysis
#... same as above
#... analysis for other social platforms

# Step 3
'''
Example Code of How to Respond:
'''
payload = {"hello": "world"}
headers = {"content-type": "application/json"}
status_code = 200

webhook_response.send_response(data=payload, headers=headers, status_code=status_code)

With a framework for webhooks, data analysis can now be performed in real time via requests from other programs, BI tools, web apps, etc.

Hopefully, it is clear that this same templated framework could be applied to other outputs, such as pre-built dashboard templates (e.g., a Node.js static site).

Conclusion

Templates are key for a scalable analytics strategy. By creating standardization augmented by Everything as a Service, any analyst can rapidly begin building robust, scalable analytics tooling in a consistent way.

Infrastructure Strategy

Over the past two sections, we’ve established frameworks for structuring data and analytics. Lastly, we get to address the paradigm for all of the development and cloud infrastructure required to accomplish these goals.

The infrastructure concepts displayed in the examples above are user-based API keys for permission validation, environment variables, configurations for common paradigms, and easy deployments.

1) API Keys for User Identification and Permissioning

Here is an earlier snippet of code from the Data Strategy section:

@app.get('/sql/customer/<email>')
def get_customer_by_email(email: str):
    consumer_api_key = request.args.get('api_key')
    if has_permission(consumer_api_key):
        ...
        return 'success', 200
    else:
        return 'unauthorized', 401

As shown in this code snippet, the SQL service uses has_permission() to validate that the user-provided api_key may interface with this endpoint. This permission logic can be standardized across all services within your organization and should do the following:

First, validate that the key belongs to a person within the organization. By generating a unique API key for every individual (you can leverage your SSO provider / Active Directory for this), you can quickly identify users and protect your service endpoints. Second, pull metadata on each user, such as teams or groups, for more custom permissions.

A more explicit version might look something like this:

def has_permission(api_key: str):
    user = sso_provider.get_user(api_key)
    if not user:
        return None
    return user

Moreover, a User object might look something like this:

# USER object
{
  "First Name": "Julius",
  "Last Name": "Randle",
  "Teams": ["data_science", "marketing"],
  "Email": "julius@nyknicks.com"
}

The /sql/customer/ endpoint above only checks whether the api_key is affiliated with anyone in the organization. However, you can hopefully see how you could build more sophisticated logic, such as “only allow a user to hit this endpoint if they are on X team” (a sketch of this follows below).
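
For example, a team-gated version of has_permission() might look like this sketch (reusing the sso_provider client and User object from above):

def has_permission(api_key: str, required_team: str = None):
    user = sso_provider.get_user(api_key)
    if not user:
        return False
    # If the endpoint is team-gated, require team membership
    if required_team and required_team not in user['Teams']:
        return False
    return True

# e.g., only the data_science team may hit a given endpoint:
# if has_permission(consumer_api_key, required_team='data_science'): ...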

2) Environment Variables

In the above sections, environment variables are used in two places:

  1. Pulling credentials into your services:
salesforce_api_key = os.environ.get('SALESFORCE_API_KEY')

Notice that this code snippet lives inside your data services and is not available to end users in their analytics application code. These credentials are injected into your services as environment variables. To facilitate this, you need a way to pull the appropriate credentials (say, from a secrets/password manager) into your code environment; using Docker containers can make this simpler to do at scale. I plan to follow up in future blog posts with a more in-depth example of how to do this. The key idea, however, is that underlying credentials are available to your data services but never to end users.
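
As a minimal sketch of that injection (the secrets_manager client and secret names are hypothetical stand-ins; in practice this could be Vault, AWS Secrets Manager, etc., wired into your container entrypoint):

# entrypoint.py -- runs inside the service container before the app starts
import os
import subprocess

from secrets_manager import SecretsClient  # hypothetical internal client

secrets = SecretsClient()

# Inject credentials into the service's environment -- analysts never see them
os.environ['DB_URL'] = secrets.get('prod/db_url')
os.environ['DB_PASSWORD'] = secrets.get('prod/db_password')
os.environ['SALESFORCE_API_KEY'] = secrets.get('prod/salesforce_api_key')

# Hand off to the service process, which inherits the populated environment
subprocess.run(['flask', 'run', '--host', '0.0.0.0', '--port', '8080'], check=True)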

2. Allowing analysts to automatically pull their API keys into their scripts for interfacing with published services:

api_key = os.environ.get('INTERNAL_API_KEY')

# User can use api_key to query a service
data = requests.get(f'https://api.knicks.io/instagram/get_follower_count?api_key={api_key}')

This second use of environment variables lets end users automatically pull their unique API keys into their scripts. These keys should similarly be injected as environment variables into the development environment at runtime. This enables code shareability: different analysts can all run the same program with their own API keys, without hardcoding keys into scripts.

3) Deployment Criteria: Scheduled Tasks, Webhooks, and Servers

Besides code that runs manually, there are three common ways developers will want their code to run: Scheduled Jobs, Webhooks, and Servers. The easiest way to enable analysts to ‘productize’ their code is to create simple configuration files that handle all deployment infrastructure. Let’s explore a quick example:

a) Scheduled / Cron / Background Tasks — As an example, you could include a cron.json file in your Cron Task Template for a user to fill out. It might look something like this (cron_frequency uses standard cron syntax; the example below runs every Friday at 9 am):

# cron.json
{
  "cron_frequency": "0 9 * * 5",
  "script_to_run": "instagram_followers_to_slack.py"
}

When deployed (discussed below), this file is identified and the program is automatically set to run at the scheduled time on behalf of its author.
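
Behind the scenes, the deployment system only needs to parse this file and register the job. A sketch using the python-crontab package as one possible backend (your scheduler of choice may differ):

# deploy_cron.py -- illustrative deployment-side handler for cron.json
import json

from crontab import CronTab

with open('cron.json') as f:
    config = json.load(f)

# Register the analyst's script with the system crontab
cron = CronTab(user=True)
job = cron.new(command=f"python {config['script_to_run']}")
job.setall(config['cron_frequency'])  # e.g. '0 9 * * 5' = Fridays at 9 am
cron.write()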

b) Triggered via Webhook — As a similar example, you may want to create a webhook configuration via a file webhook.json that is automatically included with the Webhook Task Template. It might look something like this:

# webhook.json
{
  "sub_domain": "salesforce-data-lookup",
  "script_to_run": "webhook_microservice.py"
}
# The program webhook_microservice.py would now run
# when https://salesforce-data-lookup.nyknicks.io is queried

c) As a Deployed Hosted Server / Static Site — This should now feel similar to the two previous examples. In this case, you might want a configuration file called server_deployment.json:

# server_deployment.json
{
  "sub_domain": "my-flask-server",
  "build_command": "pip install -r requirements.txt",
  "start_command": "flask run --port 8080 --host 0.0.0.0",
  "port": 8080
}

# This Flask server would now run at
# https://my-flask-server.nyknicks.io

4) Deploying to Production

Once templated, deploying should be as easy as a single button or terminal command (depending on the preferred UX) accompanied by a user’s api_key. Based on the configuration file, the deployment is handled and all infrastructure is automatically provisioned so the program can run appropriately.
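
For the terminal flow, the command can be a thin client that ships the project bundle and its configuration to an internal deployment API (the endpoint and payload below are hypothetical):

# deploy.py -- illustrative 'single command' deploy client
import os

import requests

api_key = os.environ.get('INTERNAL_API_KEY')

# Ship the project to the deployment service, which reads cron.json /
# webhook.json / server_deployment.json and provisions the right infrastructure
with open('project.tar.gz', 'rb') as bundle:
    response = requests.post(
        'https://deploy.internal.example/v1/deployments',
        params={'api_key': api_key},
        files={'bundle': bundle},
    )

print(response.status_code, response.text)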

On WayScript, we’ve created a single-click Deployment button in each development environment that does this on the user's behalf.

5) Bonus — Other ‘Advanced’ Considerations

As your infrastructure is being built, there are a few added considerations.

A) Include your SRE/IT team on the Infrastructure Side! They will want control and visibility into these applications. Also, they can facilitate lower-level configurations such as security parameters, instance sizes, etc.

B) Build in observability and auditability. Ideally, all programs send logs to Datadog, New Relic, an internal dashboard, etc.

C) Built-in Alerting — Because the program author is known, you can notify them whenever their programs fail or throw errors (a sketch of this follows below).
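
As a sketch of that alerting pattern, the templates could wrap the analyst’s entrypoint in a decorator that reports failures through the internal Slack service from the Analytics section (the ‘alerts’ channel routing is illustrative):

# alerting.py -- illustrative failure-alerting wrapper for templated scripts
import functools
import os
import traceback

import requests

api_key = os.environ.get('INTERNAL_API_KEY')  # identifies the program author

def alert_on_failure(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Route the traceback back to the author via the Slack service
            requests.post(
                'https://api.knicks.io/slack/post_message_to_channel',
                params={
                    'api_key': api_key,
                    'channel_name': 'alerts',
                    'message': traceback.format_exc(),
                },
            )
            raise
    return wrapper

@alert_on_failure
def main():
    ...  # the analyst's scheduled job logic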

Conclusion

This article is meant to be a playbook for building a modern business and data analytics strategy from the ground up, covering Data, Analytics, and Infrastructure frameworks/design patterns. While by no means exhaustive, it should give direction on patterns for creating a scalable, secure internal data system.

Quick disclaimer — At WayScript, we are building a platform specifically geared to help teams create this type of infrastructure without needing to build it all from scratch. This article is based on our own learnings and experience in the space, and we will surely add more over time.

If you have feedback, comments, or would like to talk about this stuff, please reach out to me at jesse[at]wayscript.com.
