Unlocking AI potential in e-commerce and online business – ethics

Ethics in AI

Welcome! Our “Unlocking AI potential in e-commerce and online business” series aims to provide basic guidance on applying AI (Artificial Intelligence) in businesses which use the internet as a primary source for delivering value to their customers. In this article, we focus on ethical aspects of training and using AI models.

  • AI is biased
  • AI is sexist
  • AI is racist
  • AI is unfair
  • AI is not reliable
  • Can we trust AI? Who is responsible for the decisions it makes?

We see headlines like these more and more often, especially in human resources, healthcare and almost every other field where personal data is used for machine learning. Our last article revealed what happens behind the scenes of selected artificial intelligence models. Let’s put these insights into context. Enjoy your reading!


Are you new to AI and Machine Learning? If so, we recommend the first article in our “Unlocking AI potential in e-commerce and online business” series, which deals with basic concepts.


Black mirror

The Confederation of British Industry recently published a report recommending that companies “monitor” decisions made by artificial intelligence and audit how personal data is being used. Many officials and decision makers are aware of the disruption AI could bring to many industries and to everyday life.

AI is not a black box, AI is a black mirror

In the EU, for example, the General Data Protection Regulation (GDPR) could be considered as an official response to new, data-powered technologies. Similarly, the California Consumer Privacy Act (CCPA) sets the bar for U.S. companies. It is probably just a matter of time before lawmakers in other states and countries follow.

In this article, we will not comment on the political dimension of the whole thing. Instead, we will focus on practical information which could be used on a daily basis.

Last fall, there was much talk about Amazon scrapping its AI recruiting tool. The reason was a bias against CVs containing the word “women”. Similarly, Google Photos caused embarrassment when its machine learning labeled a photo of a black couple as “gorillas”. How is something like that even possible? Data.

As you sow…

In a way, we can think of machine learning as a tool for making a mathematical representation of the training data. In her research, Dr. J.J. Bryson, an AI expert from the University of Bath, states that if we apply machine learning to historical text data, we will implicitly include historical biases as well. Likewise, if we train our NLP (Natural Language Processing) model on texts from current human discussions, we need to deal with current stereotypes.




But here comes the question – who will decide what is still acceptable and what has crossed the line? Is it possible to develop “fair” artificial intelligence? If we want to try, we need to think about related issues:

  • What kinds of stereotypes could be present in the population that produces our data?
  • Is it possible to identify bias in our training data for machine learning?
  • Which features are connected to discrimination and could potentially cause the model to behave unfairly?

The “risky” variables are obviously race, gender and similar “human features”. The problem is that in some cases it is precisely this information that determines the right product or service for the customer. For instance, skin color and skin type could be quite important when training a recommender for cosmetic products.
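One simple way to look for bias in training data is to compare how often the positive label occurs in each group of a “risky” variable. The sketch below is only an illustration: the field names (`gender`, `hired`), the toy records and the helper functions are all hypothetical, and a ratio of lowest to highest rate is just one of several possible fairness indicators.

```python
from collections import defaultdict


def selection_rates(records, group_key, label_key):
    """Return the share of positive labels for each value of group_key."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        positives[group] += 1 if row[label_key] else 0
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Ratio of the lowest to the highest rate (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical toy data: historical hiring decisions used as training labels.
training_data = [
    {"gender": "f", "hired": 0},
    {"gender": "f", "hired": 1},
    {"gender": "f", "hired": 0},
    {"gender": "f", "hired": 0},
    {"gender": "m", "hired": 1},
    {"gender": "m", "hired": 1},
    {"gender": "m", "hired": 0},
    {"gender": "m", "hired": 1},
]

rates = selection_rates(training_data, "gender", "hired")
print(rates, disparity_ratio(rates))
```

A ratio far below 1.0 does not prove discrimination, but it is a signal that the labels deserve a closer look before a model is trained on them.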


How to deal with data selection and knowledge representation in AI projects? You can find a few tips in the third article in our “Unlocking AI potential in e-commerce and online business” series!


When preparing the training data, it is good to consider not only the business domain (or the product category) but also the moral boundaries of the target customer segment which will be the source of our data or the end user of our model.

Training the “censor”

Another tough task for AI is to distinguish what is meant seriously from what is just humor, irony or sarcasm. If we think about it, detecting such language constructions requires a deeper understanding of the context. Let’s face it, sometimes this is difficult not only for artificial intelligence. Joking aside, the real problem arises in online content moderation. The Verge nicely describes such troubles in companies like Facebook.

We can always prepare the training data by hand and label particular phrases as “good” or “bad”; nevertheless, without capturing what exactly makes things funny, our AI will probably just memorize particular sentences without the ability to generalize. It makes much more sense to build a rule-based tool which either sends every suspicious piece of content to a moderator for approval or filters it out, just to be sure.
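A rule-based moderation tool of this kind can be very small. The following sketch assumes two hypothetical rule lists (patterns that are filtered out immediately and patterns that are sent to a human moderator); the patterns themselves are placeholders, not a recommended rule set.

```python
import re

# Hypothetical rules; a real deployment would maintain these with moderators.
BLOCK_PATTERNS = [re.compile(r"\bbuy followers\b", re.IGNORECASE)]
REVIEW_PATTERNS = [re.compile(r"\b(idiot|stupid)\b", re.IGNORECASE)]


def route_content(text):
    """Return 'block', 'review' or 'publish' for a piece of user content."""
    if any(pattern.search(text) for pattern in BLOCK_PATTERNS):
        return "block"    # filtered out, just to be sure
    if any(pattern.search(text) for pattern in REVIEW_PATTERNS):
        return "review"   # a human moderator decides
    return "publish"


for comment in ["Buy followers now!", "You are an idiot", "Nice shoes"]:
    print(comment, "->", route_content(comment))
```

The rules stay transparent and easy to adjust, and the moderator keeps the final word on anything ambiguous such as irony or sarcasm.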

Falsifying data

Anyway, we need to decide to what extent we are willing to censor free speech. The fundamental question is:

  • Do we want to use AI to improve our products and services or as a tool for enforcing our own beliefs?

In the previously mentioned example of labeling a black couple as “gorillas” we can see how Google approached the problem. Two years after the embarrassment, Google Photos preferred to ignore gorillas completely.

Everything is relative, everything is subjective

We have already suggested that some problems are difficult not only for AI but for humans as well. Are we, humans, intelligent enough to create something which might one day replace our own decisions? Let’s think about how our thought process works and how to use this knowledge to design AI.

Illusion

Rorschach test

No matter whether we are reading a text or looking at a picture, we always base our decisions on our knowledge and experience but, most importantly, on our own subjective feelings. This principle inspired the Swiss psychologist Hermann Rorschach to create his famous inkblot test. A (human) subject is presented with a series of inkblots and asked to describe what is in the picture. The individual’s thoughts and feelings are involuntarily projected onto the vague shape of the inkblot.

Using a bit of imagination, we can say that artificial intelligence faces the Rorschach test each time it’s presented with testing data. The input is usually a feature vector – a particular combination of numbers or encoded information. Unfortunately, it often lacks context and the option to ask for more details. The output of the model is basically a “melt-down product” of training samples, which are most similar to the input by the chosen metric.
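The idea of an output “melted down” from the most similar training samples can be illustrated with a k-nearest-neighbours regressor. This is a minimal sketch, not a description of any particular production model: the Euclidean metric, the value of `k` and the toy samples are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(training, query, k=3, metric=euclidean):
    """Average the targets of the k training samples closest to the query."""
    nearest = sorted(training, key=lambda sample: metric(sample[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical training samples: (feature vector, target value).
training = [
    ((1.0, 1.0), 10.0),
    ((1.0, 2.0), 12.0),
    ((2.0, 1.0), 14.0),
    ((8.0, 9.0), 50.0),
]

print(knn_predict(training, (1.5, 1.5), k=3))
```

Changing the metric or `k` changes which samples are blended, which is exactly why the same input can be “read” differently by differently configured models.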

A duck or a rabbit?

Data with multiple meanings is tough for AI as well. This applies to images and language but also to the other types of data such as signal or online events.

Connecting the pieces

As mentioned earlier, our brain uses knowledge and context information in addition to historical data. Let’s illustrate this with two images. Which bear is alive?

Which bear is alive?

The correct answer is picture 1. What kind of knowledge and contextual information did we use to solve this puzzle?

  • The bear is moving: We know about the existence of such a thing as movement. We are aware that some kind of force is needed for movement to happen. We also know that this force needs some kind of source. We have a basic knowledge of a bear’s anatomy as well. The position of the bear’s body suggests that the source of the movement force is probably the bear itself. Only live bears are able to move on their own.
  • Water splashing and wet fur: We know about the existence of such a thing as water. We are able to detect it because it looks transparent. We know it can be present in the form of a river or lake. We have a basic knowledge of physics and we know that some kind of force must be applied to make the water splash. We also know that a bear’s fur looks different when it comes into contact with water. The bear’s fur has wet spots and the position of the bear’s body suggests that it’s the source of the force which causes the water to splash. Only live bears are capable of such a thing.
  • The bear is fighting with another bear: We have some basic knowledge about bears and their natural habits such as fighting for food or territory. We know what a fight is and what it looks like. We also know that it could occur between two live individuals. Only live bears are capable of fighting.

How do we know that the bear in picture 2 is not alive? Apart from the water splashing and the fighting, we could apply several thoughts in the same way as in picture 1. The bear looks quite natural and it also seems to be moving. The key evidence is the small label stating “Grizzly Bear” which is placed in front of the bear. We know about the existence of such things as a museum and taxidermy. We also know that in the museum, exhibits are usually marked with labels. Live bears don’t use labels.

What did we just do?

  1. A decomposition of the problem into parts
  2. Each part of the problem was solved separately, using relevant domain knowledge and experience
  3. A synthesis of partial solutions into the final result

In this case, our decision-making process could be described as a fallback model which returns the most probable option and improves its accuracy with every further piece of evidence.
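One way to read “improves its accuracy with every further piece of evidence” is as repeated application of Bayes’ rule. The sketch below applies it to the bear puzzle; all likelihood values are invented for illustration, and treating the pieces of evidence as independent is a simplifying assumption.

```python
def update(prior, likelihood_if_alive, likelihood_if_not):
    """Apply Bayes' rule for one piece of evidence; return P(alive | evidence)."""
    numerator = prior * likelihood_if_alive
    denominator = numerator + (1.0 - prior) * likelihood_if_not
    return numerator / denominator


# Hypothetical likelihoods: (P(evidence | alive), P(evidence | not alive)).
evidence = {
    "appears to move": (0.9, 0.5),
    "water splashing": (0.8, 0.05),
    "fighting another bear": (0.7, 0.02),
}

belief = 0.5  # start undecided
for name, (p_alive, p_not) in evidence.items():
    belief = update(belief, p_alive, p_not)
    print(f"{name}: P(alive) = {belief:.3f}")
```

Evidence such as a museum label would enter the same way, only with a likelihood that is far higher for the “not alive” case, pulling the belief in the other direction.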

We can take a similar approach in machine learning when solving a complex problem. The final solution can be a combination of several models specializing in particular domains. Besides image recognition, fake news detection is a good example: suspicious content can be decomposed into single statements, these statements can be validated, and the partial results can be synthesized into the final decision. Our success depends on how well we are able to capture the context information using available data.
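The decompose, validate and synthesize steps can be sketched as three small functions. Everything here is a stand-in: the naive sentence split, the `KNOWN_FACTS` lookup (a placeholder for real, domain-specific validators) and the synthesis rule are assumptions chosen to keep the example short.

```python
def split_into_statements(text):
    """Decompose content into single statements (naive split on full stops)."""
    return [part.strip() for part in text.split(".") if part.strip()]


# Hypothetical lookup standing in for real, domain-specific validators.
KNOWN_FACTS = {
    "water boils at 100 degrees celsius at sea level": True,
    "the moon is made of cheese": False,
}


def validate(statement):
    """Return True, False, or None when the statement cannot be checked."""
    return KNOWN_FACTS.get(statement.lower())


def assess(text):
    """Synthesize partial results: any refuted statement flags the content."""
    results = [validate(statement) for statement in split_into_statements(text)]
    if any(result is False for result in results):
        return "suspicious"
    if results and all(result is True for result in results):
        return "verified"
    return "unknown"


print(assess("The moon is made of cheese. Water boils at 100 degrees Celsius at sea level."))
```

In a real system each step would be a separate specialized model, but the overall shape of the pipeline stays the same.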


How to successfully execute AI-related projects? You can find a few tips in this article in our “Unlocking AI potential in e-commerce and online business” series!


I know how to deal with ethical issues in AI, what next?

If you want to benefit from AI, you need a few things: a proper understanding of the AI project life cycle, the right data and the right people. We will cover all these aspects in our series “Unlocking AI potential in e-commerce and online business”. Stay tuned!


Unlocking AI potential in e-commerce and online business – project management

Welcome! Our “Unlocking AI potential in e-commerce and online business” series aims to provide basic guidance on applying AI (Artificial Intelligence) in businesses which use the internet as a primary source for delivering value to their customers. In this article, we focus on the processes and workflow in AI-related projects. Enjoy your reading!

AI project life cycle

A common beginner’s mistake is considering an AI project to be a regular software development project. That’s why we think it’s important to point out how these two differ in the workflow, goals and deliverables.


Are you new to AI and Machine Learning? We recommend the first article in our “Unlocking AI potential in e-commerce and online business” series, which deals with basic concepts.


AI development is not just software development

Sometimes we work with companies that are trying to use tools and procedures typical for engineering processes when it comes to prototyping an AI. In those cases, software engineers are in charge of analytical tasks. This often leads to a wrong goal metric, missed deadlines and time wasted on unnecessary meetings.

Software development is like building a house

Software development is like building a house. The goal is obvious from the very beginning, defined via product features whose specification usually does not change during the whole development process. Milestones are clear, progress can be easily measured and each product feature can be tested using unit tests. You can easily split independent tasks among several development teams.

AI development could be compared to solving a puzzle

AI development, on the other hand, is iterative research which could be compared to solving a puzzle or finding a path in a maze. The primary goal can change repeatedly based on new findings from the data. Instead of product features, there is the precision and consistency of mathematical models, which can be tricky to measure correctly. Success or failure depends on the quality of the data and the ability to simulate reality. All these aspects mean you need to bear in mind greater uncertainty caused by data variability and changing conditions. In any case, you will gain valuable insights which will help you with strategic decisions and push your business forward.

When it comes to creating a team, quality is much more important than quantity. Two or three experts who are familiar with the business domain and state-of-the-art technologies usually outperform an army of freshers and junior analysts. That’s because AI projects are about the right ideas and inventive thinking, not just about the number of man-days spent coding.

If you want to leverage AI in e-commerce and online business, you need the right tools. One of them is a good understanding of AI project processes.

CRISP-DM and its role in AI projects

Do you already have some experience with analytics? If you do, you might be familiar with CRISP-DM (Cross Industry Standard Process for Data Mining). It’s been around since 1996, originally crafted by five companies: Daimler AG, Integral Solutions Ltd (ISL), NCR Corporation, OHRA and Teradata. CRISP-DM is a widely used process model which describes a workflow typical for almost any data-related project. Thankfully, AI projects are in many ways similar to regular data-related projects.

CRISP-DM
  • A good process for AI development enables quick orientation in the project life cycle and helps you not to forget about important things.

At pbi.ai we use a three-phase methodology inspired by CRISP-DM:

  1. Discovery phase – The first phase focuses on problem understanding and initial data analysis. The main purpose of this phase is to assess available options, make a decision regarding the overall strategy and define clear, quantifiable and measurable goals.
  2. Prototyping phase – Mathematical modeling, simulations and experiments with AI algorithms happen in the second phase. Most of the time is usually spent on feature engineering and optimization of machine learning hyperparameters.
  3. Deployment phase – In the final phase, the code of the AI prototype is optimized and deployed into the production environment. Depending on the situation, the final AI program is used as a stand-alone module or integrated into the streaming pipelines and ETL procedures (ETL – Extract, Transform, Load).
pbi.ai approach to project cycle

Real case scenario

Now that you know the basics, you might be wondering how this works in the real world. Imagine a fictional company named Fictional Online Fashion Store which would like to personalize its website content in real time and offer customers relevant products. Now let’s see how to approach the project from the management point of view.

Discovery phase

The essential part of each AI project is understanding the problem. We’ve seen many companies underestimate this and dive head-first into coding complex mathematical models and torturing data to achieve performance slightly better than a random guess. Instead, the most valuable activity at the start of the project is brainstorming and thinking about knowledge representation.

One of the most important tasks is to create an ontology. Ontology is a description of entities and their relations. In our case, this could mean performing a revision of the product tagging, defining custom variables for online tracking and creating a nomenclature for sequences of customer online actions.
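To make the notion of an ontology more concrete, entities and their relations can be stored as simple subject, predicate, object triples. The sketch below is one possible minimal representation; the class names, the entity kinds and the relation names (`belongs_to`, `refers_to`) are hypothetical choices for the fictional fashion store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "product", "category", "event"


@dataclass
class Ontology:
    # Each relation is a (subject, predicate, object) triple.
    relations: set = field(default_factory=set)

    def relate(self, subject, predicate, obj):
        self.relations.add((subject, predicate, obj))

    def objects_of(self, subject, predicate):
        """All entities linked to `subject` through `predicate`."""
        return {o for s, p, o in self.relations if s == subject and p == predicate}


dress = Entity("summer dress", "product")
dresses = Entity("dresses", "category")
view = Entity("product_detail_view", "event")

shop = Ontology()
shop.relate(dress, "belongs_to", dresses)
shop.relate(view, "refers_to", dress)

print(shop.objects_of(dress, "belongs_to"))
```

Even a structure this small forces the team to agree on names for products, categories and tracked events before any modeling starts.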

Another part of this phase is crafting a data strategy. We need to analyze available data sources and decide whether we are able to get sufficient information about online customer behavior and the data context. In terms of content personalization, it would be good to check the availability and consistency of the historical online events data. It would also be useful to perform a basic correlation and clustering analysis to get initial insights. This will help determine the most promising direction for the next steps and define baseline goals. A well-defined goal in our case could be, for example, to increase the number of product detail views in the target customer segment, during a particular period of time, in the selected location and validated through A/B testing.
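A basic correlation analysis needs very little code. The following sketch computes the Pearson correlation coefficient by hand; the two per-session series (`pages_viewed`, `detail_views`) are made-up numbers used only to show the call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-session values from historical online events.
pages_viewed = [1, 2, 3, 4, 5]
detail_views = [0, 1, 1, 2, 3]

print(round(pearson(pages_viewed, detail_views), 3))
```

A strong correlation is only a starting hint about which variables deserve attention in the prototyping phase; it says nothing about causation.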


How to deal with data selection and knowledge representation in AI projects? You can find a few tips in the third article in our “Unlocking AI potential in e-commerce and online business” series!


Prototyping phase

Once an overall strategy is crafted, we can proceed to the next phase. Prototyping is a repetitive hypothesis-driven research process focused on finding the best performing combination of mathematical models and features.

In the case of our Fictional Online Fashion Store and the website content customization, we will probably dig into the transaction history. Good features enable a simple and robust AI model to be developed. A simple model based on good features always outperforms a complex model based on bad features. It is also much better in terms of future maintenance and adjustments.

In our case, we need to find patterns in the online shopping behavior and to identify customer and product attributes linked to the buying decision process. Besides features for our model, we will get valuable insights for other marketing tasks like session scoring or lead scoring.

When dealing with online data, make sure its context is examined as well. Common external factors include weather, what competitors offer or trends among celebrities. All these aspects may influence the customer in a particular season or time period and cause anomalies in the data. If we manage to represent this information in our feature vector, we are on the right track to a machine learning model with consistent performance.
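Representing external context in the feature vector can be as simple as appending a few extra numbers to the behavioral features. In this sketch the field names (`pages_viewed`, `items_in_cart`, `temperature_c`, `competitor_sale`, `season`) and the one-hot encoding of the season are illustrative assumptions.

```python
SEASONS = ["winter", "spring", "summer", "autumn"]


def build_feature_vector(session, context):
    """Combine behavioral features with external context into one flat list."""
    season_one_hot = [1.0 if context["season"] == s else 0.0 for s in SEASONS]
    return [
        float(session["pages_viewed"]),
        float(session["items_in_cart"]),
        float(context["temperature_c"]),               # weather
        1.0 if context["competitor_sale"] else 0.0,    # competitor activity
        *season_one_hot,
    ]


vector = build_feature_vector(
    {"pages_viewed": 7, "items_in_cart": 2},
    {"season": "summer", "temperature_c": 28.5, "competitor_sale": True},
)
print(vector)
```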

Deployment phase

The final phase covers implementation of the AI prototype into the production environment and its validation on real data. AI models, for the most part, work as stand-alone modules. Thus, the deployment phase is usually about building an API and a database schema to store the model values.

Another task typical for this phase is the development of ETL pipelines to clean data and prepare features for machine learning. It is useful to integrate some logging and define inputs for reporting so that performance can be measured and recorded continuously.
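A minimal ETL pipeline with logging might look like the sketch below. The in-memory lists stand in for real source systems and a feature store, and the field names and the `is_premium` feature are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")


def extract(raw_rows):
    """Stand-in for reading events from a source system."""
    return list(raw_rows)


def transform(rows):
    """Drop incomplete events and derive a simple feature."""
    clean = [r for r in rows if r.get("customer_id") and r.get("price") is not None]
    log.info("dropped %d incomplete rows", len(rows) - len(clean))
    return [{**r, "is_premium": r["price"] >= 100} for r in clean]


def load(rows, target):
    """Stand-in for writing to a feature store; here a plain list."""
    target.extend(rows)
    log.info("loaded %d rows", len(rows))
    return len(rows)


raw = [
    {"customer_id": "c1", "price": 120},
    {"customer_id": None, "price": 30},
    {"customer_id": "c2", "price": 45},
]
feature_store = []
load(transform(extract(raw)), feature_store)
```

The log lines double as raw input for reporting: counting dropped and loaded rows over time is often the first indicator that something upstream has changed.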

Once everything is ready, we can test the model, check whether it met the goals and decide on the next steps. Make sure you perform A/B testing from time to time to check the model performance. The external context should be reviewed on a regular basis as well in order to identify new factors which influence customers.
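An A/B test of the model against a control group can be evaluated with a two-proportion z-test. The sketch below uses only the standard library; the conversion counts are invented, and the normal approximation it relies on is only reasonable for fairly large samples.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return z, p_value


# Hypothetical counts: A = control, B = personalized content.
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```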

I know how to manage the AI project, what now?

Are you interested in learning more about AI and how it can benefit your business? Now that you know the basics of AI project management, you might be interested in how to select the right data sources and machine learning algorithms for particular use cases. We will cover all these aspects in our series “Unlocking AI potential in e-commerce and online business”. Stay tuned!


Unlocking AI potential in e-commerce and online business – basic concepts

Welcome! Our “Unlocking AI potential in e-commerce and online business” series aims to provide basic guidance on applying AI (Artificial Intelligence) in businesses which use the internet as a primary source for delivering value to their customers. In this article, we focus on explaining some terms and buzzwords which are often used in connection with AI. Enjoy your reading!

How can we profit from AI?

This is a question many CEOs and CMOs ask us. No wonder. According to a recent article published by Google:

  • 85% of executives believe AI will allow their companies to obtain or sustain a competitive advantage
  • 66% of marketing leaders agree automation and machine learning will enable their team to focus more on strategic marketing activities

How to successfully execute AI-related projects? You can find a few tips in the second article in our “Unlocking AI potential in e-commerce and online business” series!


AI has been around for 60 years and evolved from simple rule-based systems into advanced mathematical models, capable of adapting to changing conditions in a real environment. Powerful hardware in combination with data availability enables machine learning solutions to be developed for almost every e-commerce and online business. Nevertheless, there are several questions each executive should ask before starting an AI-oriented project:

  • What exactly does AI mean in the context of our business?
  • Which phases will the AI project have and what kind of preparation is needed?
  • Should we build an in-house team? What tasks should we outsource?
  • How do we measure progress?
  • Will the investment into AI pay off?

The answers to these questions lead to a precise plan for getting value from AI and machine learning; we will help you find them.

Is AI just a fancy name for statistics?


One issue with Artificial Intelligence is that there is no exact definition of what it really stands for. The most common approach is to relate AI to human behavior:

“Artificial Intelligence is the science of making machines do things that would require intelligence if done by men.” (Marvin Minsky, 1967)

But here comes the problem – is human behavior always intelligent? Can we expect “rational” behavior from our customers when developing a predictive machine learning solution? The most important part of each AI project is the initial business analysis, which should reveal general aspects of the problem we are trying to solve. We want experiments and simulations to be as realistic as possible; otherwise, we are at risk of developing an inconsistent mathematical model with poor performance in a real environment. The amount of customer irrationality and the level of abstraction of the business case indicate the requirements on the solution.

Human behavior vs. intelligent behavior

Based on the complexity, we can split AI projects into four categories:

  1. Rule-based systems without internal representation of the real world: simple IF-THEN rules; e.g. an e-mailing program which sends messages based on the day of the week and the customer demographic characteristics
  2. Rule-based systems with internal representation of the real world: similar to the previous one, but enhanced by definition of states and relations between the entities in the environment; e.g. a recommendation engine recommending products to the customer based on his or her recent online journey
  3. Machine learning systems without continuous learning: a mathematical model is trained on historical or prepared training data; no further learning is done after deployment; e.g. a random forest for churn prediction trained on the historical behavioral data
  4. Machine learning systems with continuous learning: a mathematical model is updated continuously, using recent or real-time data; e.g. a dynamic customer segmentation used by social media companies
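The first category is easy to picture in code. The sketch below shows a rule-based e-mailing decision built from fixed IF-THEN rules on the day of the week and one demographic attribute; the campaign names and the age threshold are hypothetical.

```python
import datetime


def choose_campaign(customer, today):
    """Category 1: fixed IF-THEN rules with no model of the environment."""
    if today.weekday() == 4 and customer["age"] < 30:  # 4 = Friday
        return "weekend_streetwear"
    if today.weekday() == 0:                           # 0 = Monday
        return "new_arrivals"
    return None  # no message today


print(choose_campaign({"age": 25}, datetime.date(2024, 1, 5)))  # a Friday
```

Nothing in this function learns or remembers; moving up the categories means first adding state about the customer journey and then replacing the hand-written conditions with a trained model.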

Therefore, as a scientific discipline, AI covers everything from simple decision trees to advanced mathematical models.

Sometimes people ask us, “Hey, can we borrow your AI for a while and test it?” We should point out that AI is an approach to problem-solving rather than one particular computer program or algorithm. From a philosophical point of view, we can define two types of AI:

  • Weak AI: a simulation of the human decision-making process using mathematical models and available data – this basically covers all instances and implementations of AI nowadays
  • Strong AI: a machine or algorithm which realizes its own existence, has its own desires and goals and can reproduce itself (make another machine or algorithm with similar properties) – this is the subject of philosophical and scientific debates and the inspiration for sci-fi books and movies; we will probably not see anything like this in the near future

How to deal with data selection and knowledge representation in AI projects? You can find a few tips in the third article in our “Unlocking AI potential in e-commerce and online business” series!


How to leverage AI in e-commerce?

There are many drivers of success in e-commerce and internet companies. Based on data sources and the nature of the AI solution we can distinguish two main directions:

  1. Improvement of customer satisfaction
  2. Optimization of internal processes

1. Improvement of customer satisfaction

A happier customer leads to bigger profits. Better personalization and customer support lead to a happier customer. AI could be a real game changer in these areas.

Marketing persona

The most commonly known type of solution is probably a product recommendation engine which chooses products to show to the customer based on their transaction and browsing history. In some cases, that alone can boost online sales by tens of percent. Besides the product recommendation, almost every piece of online communication, including banners, navigation and messages, could be personalized in real time using the right data.
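A very basic recommendation engine can be built from transaction history alone by counting which products are bought together. The sketch below is a simple co-occurrence approach, chosen for brevity rather than as the method behind any particular engine; the orders are invented.

```python
from collections import Counter
from itertools import permutations


def build_co_occurrence(orders):
    """Count how often each ordered pair of products appears in the same order."""
    counts = {}
    for order in orders:
        for a, b in permutations(set(order), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, history, top_n=2):
    """Score products not yet in the history by co-occurrence with it."""
    scores = Counter()
    for product in history:
        for other, n in counts.get(product, {}).items():
            if other not in history:
                scores[other] += n
    return [product for product, _ in scores.most_common(top_n)]


# Hypothetical transaction history.
orders = [
    ["jeans", "belt"],
    ["jeans", "belt", "t-shirt"],
    ["jeans", "t-shirt"],
    ["dress", "sandals"],
    ["jeans", "belt"],
]

counts = build_co_occurrence(orders)
print(recommend(counts, ["jeans"]))
```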

Advanced algorithms can dynamically define marketing personas or identify the ones you have already created. It is much easier to increase sales when you know which prospects are ready to buy. Customer segmentation is a big deal and can be a basis for the whole marketing strategy.

2. Optimization of internal processes

Another way to leverage AI in e-commerce is to use NLP (Natural Language Processing) techniques for semantic modeling and keyword optimization. This will help you create product tags and categories automatically. Many companies do this manually, employing several people to read product descriptions each time a new product comes out. This is not just time-consuming; it also results in lots of synonyms among tags and keywords, which makes further semantic analysis almost impossible and reduces the performance of the search engine. With NLP, you can standardize the whole process, save time and money and improve the product search for your customers.
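The synonym problem among tags can be illustrated without any NLP library. In the sketch below a hand-written synonym map stands in for what an NLP model (for example one based on word embeddings) might derive automatically; the map entries are hypothetical.

```python
# Hypothetical synonym map; in practice it could be derived from word embeddings.
CANONICAL = {
    "tee": "t-shirt",
    "tshirt": "t-shirt",
    "t shirt": "t-shirt",
    "trousers": "pants",
    "slacks": "pants",
}


def normalize_tags(tags):
    """Lower-case, trim, map synonyms to one canonical tag and de-duplicate."""
    seen = []
    for tag in tags:
        key = " ".join(tag.lower().split())
        canonical = CANONICAL.get(key, key)
        if canonical not in seen:
            seen.append(canonical)
    return seen


print(normalize_tags(["Tee", "T-Shirt", " tshirt ", "Slacks", "pants"]))
```

Five manually entered tags collapse into two canonical ones, which is the kind of standardization that keeps search and later semantic analysis workable.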

NLP techniques - word embedding

Many e-commerce companies use a BI (Business Intelligence) solution to support their strategic decisions. If you are one of them, it may interest you that more and more manual BI-related tasks are being replaced by machine-learning-driven automation. In fact, it is quite easy to delegate simple BI tasks to AI algorithms so that your experts can focus on the tough ones. Typical areas for this kind of optimization are pricing, buying, logistics and warehousing. You can also use smart algorithms as an alarm system to detect unusual developments in your data, either to discover possible threats or to spot interesting business opportunities.
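An alarm system for unusual values can start as a simple z-score check. The sketch below flags values that lie more than a chosen number of standard deviations from the mean; the threshold of 2.0 and the daily order counts are assumptions for the example, and real data with trends or seasonality would need a more careful baseline.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical daily order counts with one unusual day.
daily_orders = [100, 104, 98, 101, 99, 103, 97, 180, 102, 100]
print(find_anomalies(daily_orders))
```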

Should we implement AI in our company?

If you want to benefit from AI, you need a few things: a proper understanding of the AI project life cycle, the right data and the right people. We will cover all these aspects in our series “Unlocking AI potential in e-commerce and online business”. Stay tuned!
