Machine learning is constantly evolving and plays a huge role in the global economy, as it allows quick, automatic analysis of large amounts of data.
To bring machine learning technology closer to programmers, Amazon offers more than 10 machine learning and artificial intelligence services on its AWS platform. These services make it simple to start building models and to add ML features that can take your business to the next level.
Most of these services are fully managed, and many of them rely on pre-trained models, so you can use them without any machine learning experience. Depending on your business problem, you can choose pre-trained ML services in areas such as computer vision, natural language processing, recommendations, and forecasting. The graph below shows a machine learning solution workflow, along with the AWS tools you can use at each stage.
How to apply Machine Learning to business with AWS
Firstly: Collecting the Data
The most important element in creating ML solutions is data. There are three types of data: structured, semi-structured, and unstructured.
- Structured data has a predefined schema, and its elements are individually addressable, so it can be stored in a relational database. Typical examples are tables of numeric and string (text) values, such as customer or sales records.
- Semi-structured data is not stored in relational tables, but it still contains structural elements (such as tags or keys) that make it easier to analyze. Examples of semi-structured file types are XML, HTML, RDF, and JSON.
- Unstructured data is everything else. It has no predefined structure and is usually stored as a set of files. The most common examples are text documents, photos, video and audio files, and application logs.
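To make the three categories concrete, here is a minimal sketch that loads the same kind of information (an order) in each form using only the Python standard library. The sample values are made up for illustration.

```python
import csv
import io
import json

# Structured: every row has the same predefined columns (a fixed schema),
# so it maps directly onto a relational table.
structured = "order_id,customer,amount\n1,alice,19.99\n2,bob,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: keys describe the data, but fields can be nested,
# optional, or repeated, so there is no single fixed table layout.
semi_structured = '{"order_id": 3, "customer": {"name": "carol"}, "tags": ["gift"]}'
doc = json.loads(semi_structured)

# Unstructured: no schema at all; facts such as "delivery was late" must be
# extracted by analysis (for example with an NLP service), not looked up.
unstructured = "Carol wrote: the gift arrived late, but the quality is great."

print(rows[0]["amount"])        # 19.99 (CSV values are read as strings)
print(doc["customer"]["name"])  # carol
```

The structured rows can be queried by column name immediately, the JSON document needs path-style access that tolerates missing fields, and the free text has no fields to access at all.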
Data loading – what is Kinesis?
Amazon Kinesis ingests data that can be generated continuously by many sources, such as web and mobile applications. It is a real-time data streaming service that can capture gigabytes of data per second. Kinesis offers the following tools:
- Kinesis Video Streams – streams video from connected devices to AWS
- Kinesis Data Streams – collects data such as IT logs, website clicks, or financial transactions
- Kinesis Data Firehose – loads streaming data into data stores (e.g. S3, Redshift) or analytics tools
- Kinesis Data Analytics – processes streaming data in real time with SQL or Java
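As a rough sketch of how an application feeds a stream, the function below builds the keyword arguments for the Kinesis `put_record` call in boto3 (the AWS SDK for Python), which takes `StreamName`, `Data`, and `PartitionKey`. The stream name `clickstream`, the event fields, and the helper name are hypothetical; no AWS call is made here.

```python
import json


def build_put_record_kwargs(stream_name, event, partition_field):
    """Build keyword arguments for boto3's Kinesis client.put_record().

    Kinesis hashes the partition key to pick a shard, so events that share
    a key (here: the same user) land on the same shard, in order.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # the payload is raw bytes
        "PartitionKey": str(event[partition_field]),
    }


# A hypothetical website click event.
click = {"user_id": 42, "page": "/pricing", "ts": "2021-01-01T12:00:00Z"}
kwargs = build_put_record_kwargs("clickstream", click, "user_id")

# With credentials configured, you would then send it with:
#   boto3.client("kinesis").put_record(**kwargs)
```

Choosing the user ID as the partition key is one reasonable design; a key with too few distinct values would concentrate traffic on a single shard.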
Data loading – what is Glue?
Another AWS service that can help with data loading is AWS Glue, a fully managed extract, transform, and load (ETL) service that runs its jobs on Apache Spark. It can be used to prepare data before it is used for analytics, and it works with both structured and semi-structured data.
Glue has three main components: the Data Catalog, the ETL engine, and the scheduler. The Data Catalog is the most important part of the tool. It stores metadata about your data, which is discovered automatically by crawlers that scan the data sources and detect their schemas.
The ETL engine can generate Python or Scala code for ETL jobs, which helps users who do not program. It can also run code provided by the user. The scheduler can monitor jobs and trigger them on a schedule or in response to events (e.g. at a specific time every Monday, or when another job completes or fails).
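To illustrate what "detecting a schema" means, here is a toy version of a crawler's job: scan sample JSON records and infer a type for each column. This is not Glue's implementation; the function name, the widening rules, and the sample records are assumptions, and the type names are simply chosen to resemble those shown in a data catalog.

```python
import json


def infer_schema(records):
    """Infer a column -> type mapping from sample records (toy crawler)."""
    type_names = {bool: "boolean", int: "bigint", float: "double", str: "string"}
    schema = {}
    for record in records:
        for column, value in record.items():
            if value is None:
                continue  # nulls say nothing about the column type
            inferred = type_names.get(type(value), "string")
            previous = schema.get(column)
            if previous is None:
                schema[column] = inferred
            elif previous != inferred:
                # Conflicting types: widen bigint/double to double,
                # otherwise fall back to the most general type, string.
                numeric = {previous, inferred} == {"bigint", "double"}
                schema[column] = "double" if numeric else "string"
    return schema


sample = [json.loads(line) for line in (
    '{"id": 1, "price": 10, "name": "lamp"}',
    '{"id": 2, "price": 12.5, "name": "desk", "in_stock": true}',
)]
print(infer_schema(sample))
# {'id': 'bigint', 'price': 'double', 'name': 'string', 'in_stock': 'boolean'}
```

Note how the semi-structured input yields a column (`in_stock`) that only some records contain; a real crawler has to reconcile such differences across many files.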
Secondly: Choosing the Right Machine Learning Tools
Once we have collected the data we need, we can start building ML solutions. AWS offers several machine learning tools that can process different types of data.
Let us now look at each of these tools and their main areas of application in business.
What is SageMaker?
SageMaker is most useful for machine learning developers and data scientists. It is a complete solution that helps take machine learning models from concept to production with minimal effort. Amazon SageMaker includes a rich set of tools (Ground Truth, Notebooks, Experiments, Debugger, Model Monitor, Neo) that help with labeling data and with building, training, testing, optimizing, and deploying models.
Finding the right algorithm for a given problem manually often requires hours of training and testing. SageMaker Autopilot automates this search: it explores data preprocessing options, algorithms, and hyperparameters, trains and evaluates a set of candidate models, and ranks them so you can choose the best one for the case at hand. Developers can use it to quickly find a baseline model.
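The core idea behind automated model selection, fitting several candidate models and keeping the one with the best score on held-out validation data, can be sketched in plain Python. This is a generic illustration, not how Autopilot works internally; the two candidate models, the mean squared error metric, and the data are all assumptions chosen for brevity.

```python
from statistics import mean


def fit_constant(xs, ys):
    # Baseline: always predict the training mean.
    c = mean(ys)
    return lambda x: c


def fit_linear(xs, ys):
    # Ordinary least squares for y = a * x + b.
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_best(candidates, train, valid):
    """Fit every candidate on train, score on valid, return (name, score) of the best."""
    scores = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])
best, score = select_best({"constant": fit_constant, "linear": fit_linear}, train, valid)
print(best)  # linear
```

Scoring on data the models were not trained on is what keeps the search honest: a candidate that merely memorizes the training set would not win.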
What is Personalize?
Personalize is a machine learning service for building recommendation systems. It can process activity streams from your applications (e.g. clicks, page views, and purchases) and use them to create personalized recommendations. You can also supply additional information about your users, such as age or geographic location. Recommendations can be shown in your application with simple API calls. The machine learning technology behind Personalize has been refined over years of use at Amazon.com.
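To show how activity data turns into recommendations, here is a minimal item-to-item co-occurrence recommender ("users who viewed this also viewed"). Personalize uses its own, far more sophisticated models; this sketch only illustrates the general idea, and the users, items, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_cooccurrence(sessions):
    """Count how often two items appear in the same user's activity."""
    counts = defaultdict(Counter)
    for items in sessions.values():
        unique = set(items)
        for a in unique:
            for b in unique:
                if a != b:
                    counts[a][b] += 1
    return counts


def recommend(counts, item, k=2):
    # Most frequently co-viewed items first; ties broken alphabetically.
    ranked = sorted(counts[item].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


views = {
    "u1": ["tent", "sleeping_bag", "stove"],
    "u2": ["tent", "sleeping_bag"],
    "u3": ["tent", "lantern"],
}
print(recommend(build_cooccurrence(views), "tent"))  # ['sleeping_bag', 'lantern']
```

An item nobody has interacted with yet gets no recommendations at all; this "cold start" problem is one reason extra user and item metadata, as described above, is useful.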
What is Comprehend?
Comprehend is a natural language processing (NLP) service that uses machine learning to extract valuable insights from unstructured text. It provides capabilities such as sentiment analysis, tokenization, and part-of-speech tagging that reveal key features of a text. Comprehend can help you understand how positive or negative a given text is.
Comprehend also has a variant built for the healthcare industry: Amazon Comprehend Medical. It can analyze medical documentation (such as patient records and clinical notes) and extract information about medications, dosages, and frequencies. Like Comprehend, it is a fully managed service.
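A minimal sketch of working with sentiment results: one helper builds the arguments for boto3's Comprehend `detect_sentiment` call (`Text`, `LanguageCode`), and another condenses a response into a label plus a single "how positive" number. The response keys (`Sentiment`, `SentimentScore` with `Positive`, `Negative`, `Neutral`, `Mixed`) follow the real API, but the sample scores are made up and the positive-minus-negative summary is our own assumption, not a Comprehend feature.

```python
def build_detect_sentiment_kwargs(text, language_code="en"):
    # Keyword arguments for boto3's Comprehend client.detect_sentiment().
    return {"Text": text, "LanguageCode": language_code}


def summarize_sentiment(response):
    """Reduce a detect_sentiment-style response to (label, net score)."""
    scores = response["SentimentScore"]
    # Positive minus negative gives one number in [-1, 1].
    net = scores["Positive"] - scores["Negative"]
    return response["Sentiment"], round(net, 2)


# Hypothetical response: same shape as the API output, invented scores.
sample_response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {"Positive": 0.91, "Negative": 0.02, "Neutral": 0.06, "Mixed": 0.01},
}
print(summarize_sentiment(sample_response))  # ('POSITIVE', 0.89)

# With credentials configured, a real call would look like:
#   boto3.client("comprehend").detect_sentiment(**build_detect_sentiment_kwargs("Great product!"))
```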
What is Forecast?
Forecast uses machine learning to build time-series forecasting models. It can combine historical time-series data with additional variables that you believe may affect the forecast. Typical applications include predicting values such as stock prices or customer demand for products. Forecast is also a fully managed service and scales to your business needs.
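Whatever model produces a forecast, it is worth comparing it against a simple baseline. The sketch below implements a seasonal naive forecast (repeat the value from one season earlier) and a mean absolute error check; it is a generic technique, not one of Forecast's algorithms, and the weekly demand numbers are illustrative only.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast each future step with the value from one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Two weeks of daily demand with a weekly (7-day) pattern; illustrative numbers.
demand = [20, 22, 21, 23, 30, 45, 40,
          21, 23, 22, 24, 31, 47, 41]

print(seasonal_naive(demand, season_length=7, horizon=3))  # [21, 23, 22]

# Backtest: forecast week 2 from week 1 and measure the error.
mae = mean_absolute_error(demand[7:], seasonal_naive(demand[:7], 7, 7))
print(round(mae, 2))  # 1.14
```

A more sophisticated model earns its place only if its backtest error is clearly lower than this baseline's.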
What is Lex?
Lex uses automatic speech recognition (ASR) to convert speech to text and natural language understanding (NLU) to recognize the intent behind the text. This lets you build conversational bots.
For example, you can use Lex to automate customer support with a bot that answers customer queries. Amazon Lex uses the same deep learning technology as Amazon Alexa, Amazon's virtual assistant.
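A Lex bot is organized around intents (what the user wants) and slots (the details needed to fulfil it). The toy matcher below illustrates those two concepts with keyword overlap and a regular expression. Lex uses trained NLU models rather than keyword matching, so treat this strictly as an illustration; the intent names, keywords, and slot pattern are hypothetical.

```python
import re

# Hypothetical intents for a support bot: keywords plus an optional slot pattern.
INTENTS = {
    "CheckOrderStatus": {"keywords": {"order", "status", "track"}, "slot": r"\b(\d{5,})\b"},
    "ResetPassword": {"keywords": {"password", "reset", "login"}, "slot": None},
}


def recognize(utterance):
    """Return (intent_name, slot_value) for the best keyword overlap, or (None, None)."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    best, best_overlap = None, 0
    for name, spec in INTENTS.items():
        overlap = len(tokens & spec["keywords"])
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    if best is None:
        return None, None
    pattern = INTENTS[best]["slot"]
    match = re.search(pattern, utterance) if pattern else None
    return best, (match.group(1) if match else None)


print(recognize("Can you track my order 123456?"))  # ('CheckOrderStatus', '123456')
```

When a required slot comes back empty, a real bot would ask a follow-up question ("What is your order number?") before fulfilling the intent.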
What is Polly?
Polly is a cloud service that uses deep learning to convert text into lifelike speech. At the time of writing, it supports 60 male and female voices across 29 languages, including Japanese, Chinese, Korean, and Arabic. Polly can also correctly read times, dates, units, fractions, and abbreviations. This lets you create applications that talk.
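Polly accepts SSML (Speech Synthesis Markup Language) input, which lets you tell it explicitly how to read values such as dates. The sketch below builds an SSML string with the `<speak>`, `<break>`, and `<say-as>` tags and the matching arguments for boto3's `synthesize_speech` call. The helper name, the default pause, the sample sentence, and the choice of the `Joanna` voice are assumptions; check the Polly SSML reference for the date formats your language supports.

```python
from xml.sax.saxutils import escape


def build_ssml(text, date_iso=None, pause_ms=300):
    """Wrap text in SSML, optionally appending a date that Polly should read as a date."""
    parts = ["<speak>", escape(text)]  # escape &, < and > so the XML stays valid
    if date_iso:
        parts.append(f'<break time="{pause_ms}ms"/>')
        parts.append(f'<say-as interpret-as="date" format="ymd">{escape(date_iso)}</say-as>')
    parts.append("</speak>")
    return "".join(parts)


ssml = build_ssml("Your delivery is scheduled for", date_iso="2021-03-15")
kwargs = {"Text": ssml, "TextType": "ssml", "OutputFormat": "mp3", "VoiceId": "Joanna"}

# With credentials configured, this would return an audio stream:
#   boto3.client("polly").synthesize_speech(**kwargs)
```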
What is Fraud Detector?
Fraud Detector is an AWS service that helps identify potentially fraudulent online activities, such as payment fraud or the creation of fake accounts. Because the service is fully managed, you can create a fraud detection model with just a few clicks.
What is Textract?
Textract is a service that automatically extracts text and data from scanned documents. It can process millions of pages in a matter of hours and helps automate document workflows. It is useful for processing documents such as loan applications or medical documentation.
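Textract returns its results as a list of `Blocks`, each with a `BlockType` such as `PAGE`, `LINE`, or `WORD`. A minimal sketch of pulling the recognized lines out of a `detect_document_text` response is shown below. The sample response is hypothetical and heavily trimmed: real blocks also carry fields such as `Id`, `Confidence`, `Geometry`, and `Relationships`.

```python
def lines_from_blocks(response):
    """Collect the text of LINE blocks from a detect_document_text-style response."""
    return [block["Text"] for block in response.get("Blocks", [])
            if block.get("BlockType") == "LINE"]


# Hypothetical, trimmed response with the same overall shape as the real one.
sample = {"Blocks": [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Loan Application"},
    {"BlockType": "WORD", "Text": "Loan"},
    {"BlockType": "WORD", "Text": "Application"},
    {"BlockType": "LINE", "Text": "Applicant: Jane Doe"},
]}
print(lines_from_blocks(sample))  # ['Loan Application', 'Applicant: Jane Doe']

# A real response would come from, for example:
#   boto3.client("textract").detect_document_text(Document={"Bytes": image_bytes})
```

Filtering on `LINE` rather than `WORD` keeps the reading order of each line intact, which is usually what downstream parsing of forms needs.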
What is Translate?
Translate is an AWS machine learning service for translating text from one language to another. It uses deep learning models to deliver more accurate and more natural-sounding translations than traditional statistical algorithms. At the time of writing, Translate supports 54 languages (including, e.g., Afrikaans, Bulgarian, and Estonian) and 2,804 language pairs.
What is Rekognition?
Rekognition is a computer vision service that can recognize objects, people, and text in images and videos. It can detect, analyze, and compare faces, and identify facial features such as the mouth, nose, or eyes.
Rekognition can also detect emotions such as happiness, sadness, or surprise in facial images. In addition, it can perform face verification, confirming a user's identity by comparing a real-time image with a stored reference image.
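Face verification with Rekognition is typically built on its `compare_faces` operation, which takes a `SourceImage` and a `TargetImage` plus a `SimilarityThreshold`, and returns `FaceMatches` with a `Similarity` score. The sketch below builds the request and turns a response into a yes/no decision. The bucket, object key, threshold of 90, and sample responses are hypothetical; choosing the threshold is a trade-off between false accepts and false rejects that depends on your use case.

```python
def build_compare_faces_kwargs(bucket, reference_key, live_image_bytes, threshold=90.0):
    # Keyword arguments for boto3's Rekognition client.compare_faces():
    # the stored reference photo lives in S3, the live capture is sent as bytes.
    return {
        "SourceImage": {"S3Object": {"Bucket": bucket, "Name": reference_key}},
        "TargetImage": {"Bytes": live_image_bytes},
        "SimilarityThreshold": threshold,
    }


def is_same_person(response, threshold=90.0):
    """Accept the identity only if some matched face meets the threshold.

    The API already filters FaceMatches by SimilarityThreshold; checking
    again locally keeps the decision explicit in application code.
    """
    return any(match["Similarity"] >= threshold for match in response.get("FaceMatches", []))


kwargs = build_compare_faces_kwargs("my-user-photos", "users/123.jpg", b"<jpeg bytes>")
print(is_same_person({"FaceMatches": [{"Similarity": 97.5}]}))  # True
print(is_same_person({"FaceMatches": [], "UnmatchedFaces": [{}]}))  # False
```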
Thirdly: Deploying Machine Learning Solutions
The most widely used way of deploying models on AWS is Amazon SageMaker, which you can use in one of two ways:
- Using SageMaker hosting services to set up HTTPS endpoints. Client applications send requests to an HTTPS endpoint to get predictions from the deployed model. To use this option, you provide a Docker container image for inference (your own or one provided by AWS) together with your model. If you need to deploy multiple models, you can also use multi-model endpoints.
- Using SageMaker Batch Transform to get predictions for an entire dataset at once. To deploy a model with Batch Transform, you need an S3 bucket to store the model, the input dataset, and the predictions.
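For real-time endpoints, a client sends its features to the `invoke_endpoint` operation of the `sagemaker-runtime` client (parameters `EndpointName`, `ContentType`, `Body`). The sketch below serializes feature rows as headerless CSV and builds that request, assuming the deployed container accepts `text/csv` (the built-in XGBoost algorithm does, for example). The endpoint name and feature values are hypothetical.

```python
import csv
import io


def to_csv_payload(rows):
    """Serialize feature rows as CSV text: one record per line, no header."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def build_invoke_endpoint_kwargs(endpoint_name, rows):
    # Real-time inference: keyword arguments for boto3's
    # sagemaker-runtime client.invoke_endpoint().
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "Body": to_csv_payload(rows).encode("utf-8"),
    }


features = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
kwargs = build_invoke_endpoint_kwargs("my-model-endpoint", features)

# With credentials configured:
#   boto3.client("sagemaker-runtime").invoke_endpoint(**kwargs)
```

The same one-record-per-line CSV format also suits Batch Transform: you would upload the (usually much larger) file to S3 and configure the transform job with `SplitType="Line"` so each line is treated as a separate record.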
An alternative deployment option is AWS IoT Greengrass, which extends AWS to Internet of Things (IoT) devices. With Greengrass, devices can collect, filter, and process data, run Lambda functions and Docker containers, and make predictions with ML models even without a cloud connection. When a connection is available, Greengrass synchronizes data with cloud services.
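The offline-then-sync behaviour follows a store-and-forward pattern, sketched below as a toy class. Greengrass provides its own components for this, so the class, its method names, and the `max_items` limit are hypothetical; the sketch only shows the idea of buffering local predictions and uploading them once the device is online.

```python
from collections import deque


class EdgeBuffer:
    """Toy store-and-forward queue: keep results locally, upload when online."""

    def __init__(self, max_items=1000):
        # If the device stays offline too long, the oldest results are dropped first.
        self.pending = deque(maxlen=max_items)

    def record(self, result):
        self.pending.append(result)

    def sync(self, is_connected, upload):
        """Upload everything buffered if connected; return how many items were sent."""
        if not is_connected:
            return 0
        sent = 0
        while self.pending:
            upload(self.pending[0])  # send first, so a failed upload keeps the item
            self.pending.popleft()
            sent += 1
        return sent


uploaded = []
buffer = EdgeBuffer()
buffer.record({"sensor": "pump-1", "prediction": "normal"})
buffer.record({"sensor": "pump-1", "prediction": "anomaly"})
print(buffer.sync(is_connected=False, upload=uploaded.append))  # 0
print(buffer.sync(is_connected=True, upload=uploaded.append))   # 2
```

Bounding the buffer is a deliberate choice: an edge device has limited storage, so it is better to lose the oldest readings than to run out of space.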
As you can see, Amazon Web Services offers a rich set of tools that can help you create impactful machine learning solutions for your business. With AWS ML tools, you can add new features to your applications, such as face detection, chatbots, speech recognition, or sentiment analysis of social media content. AWS adds new ML services for new use cases every few months, which makes it one of the fastest-growing platforms for creating AI solutions.