Introduction to
Amazon Web Services

Outline

  • Context
  • Overview of the different AWS services
  • The Web Console
  • Focus on EC2 & S3
  • Cost management

A bit of History…

Once upon a time…

  • 1995: first major online shop, amazon.com
  • 1996: $15.7 million revenue during its 1st full year
  • About 0.5% of Walmart’s sales volume

Business Week cover

Scaling up soon became an issue.

From a shop to a platform

  • 2003: Specification of a novel platform for Amazon, based on Web Services
  • 2004: First AWS service launched for the public: Amazon Simple Queue Service
  • 2006: Official launch of AWS
  • 2010: All Amazon.com retail activities moved to AWS

timeline

AWS follows a Service-Oriented Architecture (SOA) design.

AWS: a few figures

  • 9 regions, including special ones
    • 1 for the US Government in North America
    • 1 for the Chinese market in Beijing
  • Revenue: $7.88 billion (2015)
  • Profit: ≈ $1 billion
  • Growth ≈ 50%/year since 2010

AWS: services

Main AWS services

  • 55 services accessible through the Web Console
  • Stack organization: higher level services rely on low-level services

AWS in details

A resilient infrastructure

  • 9 isolated regions
  • Each region divided into availability zones
  • Availability zones inside a region are
    • Physically separate (distinct DCs)
    • Interconnected with a single network

AWS map Availability zones

Regions

  • North America
    • us-east-1 (Northern Virginia)
    • us-west-1 (Northern California)
    • us-west-2 (Oregon)
    • GovCloud (limited to the US Government and their agencies)
  • Europe
    • eu-west-1 (Ireland)
    • eu-west-2 (London)
    • eu-central-1 (Frankfurt)
  • South America
    • sa-east-1 (São Paulo)
  • Asia Pacific
    • ap-northeast-1 (Tokyo)
    • ap-northeast-2 (Seoul)
    • ap-southeast-1 (Singapore)
    • ap-southeast-2 (Sydney)
    • ap-south-1 (Mumbai)
    • China (limited to the Chinese domestic market)

Plus ≈ 40 edge locations.

Edge locations

  • North America
    • Ashburn, VA
    • Atlanta, GA
    • Chicago, IL
    • Dallas/Fort Worth, TX
    • Hayward, CA
    • Jacksonville, FL
    • Los Angeles, CA
    • Miami, FL
    • Montreal, QC
    • Newark, NJ
    • New York, NY
    • Palo Alto, CA
    • San Jose, CA
    • Seattle, WA
    • South Bend, IN
    • St. Louis, MO
    • Toronto, ON
  • South America
    • Rio de Janeiro, Brazil
    • São Paulo, Brazil
  • Europe
    • London, England
    • Marseille, France
    • Paris, France
    • Frankfurt, Germany
    • Milan, Italy
    • Dublin, Ireland
    • Amsterdam, the Netherlands
    • Warsaw, Poland
    • Madrid, Spain
    • Stockholm, Sweden
  • Asia Pacific
    • Melbourne, Australia
    • Sydney, Australia
    • Hong Kong, China
    • Chennai, India
    • Mumbai, India
    • New Delhi, India
    • Osaka, Japan
    • Tokyo, Japan
    • Seoul, Korea
    • Manila, the Philippines
    • Singapore
    • Taipei, Taiwan

These edge locations are used by CloudFront (CDN) and Route 53 (DNS) only.

Region layout

Each region is composed of multiple isolated Data Centers called Availability Zones

Example: eu-west-1 region

eu-west-1 map

Average distance between two AZs: ≈40km.

AWS: main services

  • EC2 (Elastic Compute Cloud)
    • Manage virtual appliances (VMs)
  • S3 (Simple Storage Service)
    • Manage storage (buckets)
  • EBS (Elastic Block Store)
    • Manage block storage (raw devices)
  • ELB (Elastic Load Balancing)
    • Manage load balancers
  • RDS (Relational Database Service)
    • Manage relational databases
  • DynamoDB
    • Manage NoSQL databases
  • ElastiCache
    • Manage in-memory caches

EC2

  • Manage Virtual Machines (VMs)
    • Based on the Xen hypervisor
  • Many instance types (or flavors)
    • CPU, RAM, disk, GPU, etc.
    • Pricing varies
  • VMs rely on Disk Images
    • Marketplace (predefined)
    • Custom
  • Multiple billing options
    • On-demand, Reserved, Spot
    • Pricing varies (a lot!)

EC2 terminology

EC2 instances

An AMI (Amazon Machine Image) is a snapshot of a system that is used as a base for VMs. It can be seen as an archive of an entire filesystem.

  • Each VM uses its own copy of the AMI
  • 2 types of instance:
    • Instance storage: non-persistent storage
    • EBS storage: the root filesystem is a persistent EBS volume

Instance types

Category  Usage
t2        General purpose
m4        General purpose
m3        General purpose
c4        Compute intensive
c3        Compute intensive
x1        Compute & memory intensive
r3        Memory intensive
g2        GPU
i2        I/O intensive
d2        Storage

Focus: t2 instances

The cheapest category of instances.

Model      vCPU  Mem (GiB)  Storage
t2.nano    1     0.5        EBS only
t2.micro   1     1          EBS only
t2.small   1     2          EBS only
t2.medium  2     4          EBS only
t2.large   2     8          EBS only

These are Burstable Performance Instances! (Good for irregular performance requirements.)
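As a small illustration of how the t2 table can be used, the sketch below picks the smallest t2 model that satisfies a vCPU and memory requirement. The figures are copied from the table above; the function name `smallest_t2` is hypothetical and pricing is deliberately left out.

```python
# (model, vCPU, memory in GiB), copied from the t2 table, smallest first.
T2_MODELS = [
    ("t2.nano", 1, 0.5),
    ("t2.micro", 1, 1),
    ("t2.small", 1, 2),
    ("t2.medium", 2, 4),
    ("t2.large", 2, 8),
]


def smallest_t2(min_vcpu, min_mem_gib):
    """Return the first (smallest) t2 model meeting both requirements, or None."""
    for model, vcpu, mem_gib in T2_MODELS:
        if vcpu >= min_vcpu and mem_gib >= min_mem_gib:
            return model
    return None  # no t2 model is large enough
```

For example, a need for 1 vCPU and 1.5 GiB of memory leads to `t2.small`, while a need for 4 vCPUs cannot be met by this category.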

Focus: m4 instances

The new generation of general purpose instances.

Model        vCPU  Mem (GiB)  SSD Storage  Dedicated EBS Bandwidth (Mbps)
m4.large     2     8          EBS only     450
m4.xlarge    4     16         EBS only     750
m4.2xlarge   8     32         EBS only     1,000
m4.4xlarge   16    64         EBS only     2,000
m4.10xlarge  40    160        EBS only     4,000
m4.16xlarge  64    256        EBS only     10,000

Focus: c4 instances

The new generation of compute-optimized instances.

Model       vCPU  Mem (GiB)  SSD Storage  Dedicated EBS Bandwidth (Mbps)
c4.large    2     3.75       EBS only     500
c4.xlarge   4     7.5        EBS only     750
c4.2xlarge  8     15         EBS only     1,000
c4.4xlarge  16    30         EBS only     2,000
c4.8xlarge  36    60         EBS only     4,000

Focus: r3 instances

Instances optimized for memory-intensive applications.

Model       vCPU  Mem (GiB)  SSD Storage (GB)
r3.large    2     15.25      1 x 32
r3.xlarge   4     30.5       1 x 80
r3.2xlarge  8     61         1 x 160
r3.4xlarge  16    122        1 x 320
r3.8xlarge  32    244        2 x 320

Focus: g2 instances

Instances optimized for graphics and general purpose GPU compute applications.

Model       GPUs  vCPU  Mem (GiB)  SSD Storage (GB)
g2.2xlarge  1     8     15         1 x 60
g2.8xlarge  4     32    60         2 x 120

Focus: i2 instances

Instances optimized for storage, with high I/O performance.

Model       vCPU  Mem (GiB)  SSD Storage (GB)
i2.xlarge   4     30.5       1 x 800
i2.2xlarge  8     61         2 x 800
i2.4xlarge  16    122        4 x 800
i2.8xlarge  32    244        8 x 800

Good for databases and clustered filesystems.

Focus: d2 instances

Dense-storage instance.

Model       vCPU  Mem (GiB)  Storage (GB)
d2.xlarge   4     30.5       3 x 2000 HDD
d2.2xlarge  8     61         6 x 2000 HDD
d2.4xlarge  16    122        12 x 2000 HDD
d2.8xlarge  36    244        24 x 2000 HDD

Good for data warehouses, parallel filesystems, Hadoop MapReduce.

EC2: 3 types of instances

…that are actually billing options:

  • On-demand instances
  • Reserved instances
  • Spot instances

For all of them, license fees can be added!

EC2: On-demand instances

The most common type of instances:

  • No long-term commitment
  • Charged by the hour
  • The most convenient

EC2: Reserved instances

Pay upfront for a defined period and save up to 75%. Two modes:

  • Always-on: on 24x7, always available (and always charged)
  • Scheduled: reserved on a recurring schedule, accommodates important but irregular needs
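To make the trade-off concrete, here is a minimal sketch comparing an on-demand instance with an always-on reservation over one year. It assumes a deliberately simplified model in which the reservation is the on-demand hourly rate reduced by a flat discount and charged for every hour of the year; the function names and any rates passed in are hypothetical, not actual AWS prices.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def on_demand_cost(hourly_rate, hours_used):
    """On-demand: pay only for the hours actually used."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate, discount):
    """Always-on reservation: every hour of the year is charged,
    at the on-demand rate reduced by `discount` (0.75 means 75% off)."""
    return hourly_rate * (1 - discount) * HOURS_PER_YEAR


def break_even_hours(discount):
    """Yearly usage (in hours) above which the reservation is cheaper."""
    return (1 - discount) * HOURS_PER_YEAR
```

Under these assumptions, a 75% discount pays off as soon as the instance runs more than a quarter of the year; below that, on-demand stays cheaper.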

EC2: Spot instances

The cheapest option: save 50-90% by renting unused capacity at a low price. The price fluctuates according to platform capacity and demand.

The customer sets a bid. At any point in time:

  • If the current instance price is lower than the bid, your instance can run.
  • Otherwise, your instance cannot run: you have to wait for the price to drop or increase your bid.

Difficult to use: instances can disappear anytime due to price variations!
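The bid rule can be sketched in a few lines. This is only a toy model of the rule stated above (the instance runs while the price is lower than the bid), applied to a hypothetical hourly price trace; the function names are illustrative and no AWS API is involved.

```python
def can_run_per_hour(bid, hourly_prices):
    """For each hour, True if the spot price is lower than the bid."""
    return [price < bid for price in hourly_prices]


def count_interruptions(states):
    """Count transitions from running (True) to not running (False)."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev and not cur)


# Hypothetical price trace in $/hour, for illustration only.
prices = [0.03, 0.04, 0.06, 0.05, 0.02]
states = can_run_per_hour(0.05, prices)
interruptions = count_interruptions(states)
```

Each interruption is a point where a running job is lost, which is why workloads made of small independent tasks cope best with this billing option.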

Well suited for batch or Bag of Tasks (BoT) applications.

Other important AWS services

EC2: Security Groups

A software firewall for instances

  • Defines a set of firewall rules
  • By default: outbound traffic allowed, inbound traffic blocked (including SSH!)
  • Every instance is in a security group (default group)
  • Custom rules can be added
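The default behaviour described above can be modelled as follows. This is a toy sketch, not the EC2 API: the class `SecurityGroup` and its methods are hypothetical names, and the addresses in the usage note come from ranges reserved for documentation.

```python
import ipaddress


class SecurityGroup:
    """Toy model: outbound always allowed, inbound denied unless a rule matches."""

    def __init__(self):
        self.inbound_rules = []  # list of (protocol, port, network)

    def allow_inbound(self, protocol, port, cidr):
        """Add a custom rule opening one port to one source range."""
        self.inbound_rules.append((protocol, port, ipaddress.ip_network(cidr)))

    def permits_inbound(self, protocol, port, source_ip):
        source = ipaddress.ip_address(source_ip)
        return any(
            protocol == rule_protocol and port == rule_port and source in network
            for rule_protocol, rule_port, network in self.inbound_rules
        )

    def permits_outbound(self, protocol, port, dest_ip):
        return True  # default behaviour: all outbound traffic is allowed
```

With a fresh group, `permits_inbound("tcp", 22, "203.0.113.7")` is false; after `allow_inbound("tcp", 22, "203.0.113.0/24")` the same SSH connection is accepted, while other sources remain blocked.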

Elastic IP

Addressing: customers can rent public IPs from Amazon

Each instance has:

  • 1 private IPv4 address
  • 1 public IPv4 address

By default, the instance public IP address is dynamic (NAT).

Optionally, a static address (paid) can be assigned from the customer's Elastic IP pool.

IAM

Identity & Access Management

  • Roles and access management
  • User groups
  • Limited API access depending on user roles
  • Detailed billing per role/group
  • Fine grained access policies
    • Per role
    • Per resource
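The idea of per-role, per-resource policies can be illustrated with a toy evaluator. This sketch is not the IAM policy language: the `POLICIES` table, the role names and the resource paths are all hypothetical, and the only behaviour modelled is "deny unless some rule of the role allows the action on the resource".

```python
from fnmatch import fnmatchcase

# Hypothetical policies: role -> list of (action pattern, resource pattern).
POLICIES = {
    "developer": [("s3:Get*", "bucket/reports/*")],
    "admin": [("*", "*")],
}


def is_allowed(role, action, resource):
    """Deny by default; allow only if a rule of the role matches both fields."""
    return any(
        fnmatchcase(action, action_pattern) and fnmatchcase(resource, resource_pattern)
        for action_pattern, resource_pattern in POLICIES.get(role, [])
    )
```

Here a `developer` may read objects under `bucket/reports/` but not write them, and an unknown role is refused everything.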

EBS

Elastic Block Store

  • Offers persistent storage volumes
  • Volumes can be attached to/detached from one or more EC2 instances
  • Snapshots: create a copy of an EBS volume at a given time
  • Create new volumes from snapshots


What      Identifier
Volume    vol-XXXX
Snapshot  snap-XXXX

ELB

Elastic Load Balancing

Balances incoming requests to a pool of EC2 instances

  • Supports TCP, HTTP(S) and SSL protocols
  • Instances can be attached/detached dynamically
  • Compatible with auto-scaling features:
    • provision/deprovision instances dynamically
    • according to pre-defined policies (response time, CPU load, etc.)
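A minimal sketch of a balancer with a dynamic pool is given below. Round-robin is chosen here only for simplicity and is an assumption, not a statement about the algorithm ELB actually applies; the class name and the instance identifiers are hypothetical.

```python
class RoundRobinBalancer:
    """Toy load balancer: hands requests to attached instances in turn."""

    def __init__(self):
        self.instances = []
        self._next = 0  # index of the next request, modulo the pool size

    def attach(self, instance_id):
        self.instances.append(instance_id)

    def detach(self, instance_id):
        self.instances.remove(instance_id)

    def route(self):
        """Return the instance that should serve the next request."""
        if not self.instances:
            raise RuntimeError("no instance attached")
        instance = self.instances[self._next % len(self.instances)]
        self._next += 1
        return instance
```

An auto-scaling policy would call `attach` and `detach` as load changes, while clients keep calling `route` without knowing how large the pool currently is.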

S3

Simple Storage Service

  • Object storage (up to 5 TB per object)
  • Each object is associated with a unique key
  • Unlimited number of objects
  • Objects are stored in containers called buckets
  • Each bucket is assigned to a region
  • Fine-grained authentication and ACL
    • Objects can be public/private
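The bucket/key model can be pictured with an in-memory stand-in. This is a sketch under strong simplifications, not the S3 API: the `Bucket` class is hypothetical, access control is reduced to a public/private flag, and the 5 TB limit is taken as 5 * 1024**4 bytes.

```python
MAX_OBJECT_SIZE = 5 * 1024**4  # 5 TB per object (binary units assumed here)


class Bucket:
    """Toy bucket: a region-bound container mapping unique keys to objects."""

    def __init__(self, name, region):
        self.name = name
        self.region = region
        self._objects = {}  # key -> (data, public flag)

    def put(self, key, data, public=False):
        if len(data) > MAX_OBJECT_SIZE:
            raise ValueError("object larger than 5 TB")
        self._objects[key] = (data, public)  # same key: object is replaced

    def get(self, key, authenticated=False):
        data, public = self._objects[key]
        if not (public or authenticated):
            raise PermissionError(f"{key} is private")
        return data
```

Keys such as `logs/2016/app.txt` look like paths, but in this model, as in the list above, they are just unique names inside one bucket.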

An Infrequent Access storage class offers lower cost for data that is accessed less frequently.

Billing:

  • Put/Get requests
  • Transferred volume

Glacier

Long-term storage, archival.

  • For very infrequent access
  • Very cheap (from $0.007/GB/month)
  • Long restoration delays (3 to 5 hours)
  • Accessed through S3

Route 53

DNS-as-a-Service

  • Manages DNS inside and outside AWS
  • Routing policies based on
    • Latency
    • Locality
    • Weighted Round Robin
  • Compatible with EC2, EIP and ELB
  • Very low propagation latency
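Among the routing policies listed, the weighted one is the easiest to sketch. The code below is an assumption-laden illustration that draws an endpoint at random in proportion to its weight; the function name, the endpoint names and the weights are hypothetical, and the way Route 53 itself implements weighting is not described by the source.

```python
import random


def pick_endpoint(weighted_endpoints, rng=random):
    """Pick one endpoint at random, proportionally to its weight.

    `weighted_endpoints` maps an endpoint name to a positive weight.
    `rng` can be a seeded random.Random instance for reproducible runs.
    """
    endpoints = list(weighted_endpoints)
    weights = [weighted_endpoints[name] for name in endpoints]
    return rng.choices(endpoints, weights=weights, k=1)[0]
```

With weights `{"blue": 3, "green": 1}`, roughly three answers out of four point to `blue`, which is a common way to shift traffic gradually between two deployments.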

SQS

Simple Queue Service

A messaging queue for web applications & services to communicate reliably.

  • Distributed design
  • Built for SOA

General principles

  • Producer/consumer paradigm
  • File queues (buffers)
  • A message: 64 KB of text
  • Delivery guaranteed
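The producer/consumer principle can be sketched with an in-memory queue. This is a local toy model, not the SQS API: the `MessageQueue` class is hypothetical, strict first-in-first-out order is a simplification, and the 64 KB limit is the figure given in the list above.

```python
from collections import deque

MAX_MESSAGE_BYTES = 64 * 1024  # 64 KB of text per message


class MessageQueue:
    """Toy queue: producers send, consumers pull when they are ready."""

    def __init__(self):
        self._messages = deque()

    def send(self, text):
        """Producer side: append a message if it fits the size limit."""
        if len(text.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError("message larger than 64 KB")
        self._messages.append(text)

    def receive(self):
        """Consumer side (pull): return the oldest message, or None if empty."""
        return self._messages.popleft() if self._messages else None
```

The producer never waits for the consumer: messages accumulate in the buffer until something pulls them, which is what decouples the two services.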

Messaging queue principle

Simple Notification Service

Notification service for applications and clients

  • Publish/subscribe (PubSub) paradigm

Principles

  • Topic: a named messaging queue
  • Push model
  • Interfaces with iOS, Android, Windows Phone, SMS, email, etc.

Property      SQS              SNS
Consumers     No subscription  Mandatory subscription/confirmation
Multiplicity  1-to-1           1-to-N (Broadcast)
Com. model    Pull             Push
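The push model and the 1-to-N multiplicity can be sketched as a topic that calls every subscriber on publish. This is a local toy model, not the SNS API: the `Topic` class is hypothetical, and subscribers are plain Python callables standing in for email, SMS or mobile endpoints.

```python
class Topic:
    """Toy topic: every published message is pushed to all subscribers."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callable that will receive each message."""
        self._subscribers.append(callback)

    def publish(self, message):
        """Push the message to every subscriber; return how many were notified."""
        for callback in self._subscribers:
            callback(message)
        return len(self._subscribers)
```

Unlike a queue, nothing is stored for later here: a subscriber that registers after a `publish` call never sees that message.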

Relational Database Service

  • DBaaS (Amazon Aurora, Oracle, MS SQL Server, PostgreSQL, MySQL, MariaDB)
  • High-availability SQL DB with replication
  • Easy scaling
  • Backup/Restore
  • Effortless
  • Import/export feature

DynamoDB

Amazon’s NoSQL (key/value) database.

  • SSD based: very low latency
  • Replicated over 3 Availability Zones inside any given region
  • Strong consistency
  • Very scalable

(EMR) Elastic MapReduce

Hadoop is great (but hard to set up!). EMR is an integrated Hadoop-based MapReduce framework:

  • Put your data in S3
  • Launch EMR:
    • EC2 cluster automatically provisioned
    • MR tasks automatically run on top of the EC2 cluster
  • Get the results back from S3
  • EC2 cluster automatically de-provisioned

More than just Hadoop: HDFS, Pig (Pig Latin), Hive, Spark, …
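For readers new to MapReduce, the kind of task EMR runs on the cluster can be illustrated by a word count executed locally. This is a single-process sketch of the map, shuffle and reduce phases, with hypothetical function names; it involves neither Hadoop nor EMR, where the same phases are distributed over the EC2 cluster and the input comes from S3.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word of one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over as many machines as the cluster provides.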

Virtual Private Cloud

Simulates a private cloud with EC2 instances and VPN techniques.

  • Isolated network (software)
  • A dedicated IP pool
  • Security: limit access to S3 to a VPC
  • Allows binding an EIP to an instance inside a VPC
  • Cost: $0.05/h/VPN access

Amazon EC2 Container Service

The new buzzword in Cloud Computing

  • Allows easy deployment of Docker containers in EC2 instances
  • AMI that includes the Docker daemon
  • No virtualized OS: faster deployment*
  • Same instances charge

*once the EC instances and Docker daemons are deployed

Deployment services

Elastic Beanstalk

AWS PaaS service.

Lets you deploy a complete stack…

  • Java, .NET, PHP, Node.js, Python, Ruby, Go, Docker…
  • Relies on other AWS services to deploy and maintain the platform
  • Does not deploy DB backends
  • Scenario based deployments

…without having to maintain the underlying software.

CodeCommit

Private git repo hosting.

  • No administrative duty
  • IAM takes care of users and ACL
  • Elastic: grows with your data
  • Low cost: $1/user/month
    • 10GB and 2000 queries/month

CodeDeploy

Deploy development code on EC2 instances

  • The app must be in S3 or in a git repo
  • Integrates with common development tools & platforms
    • GitHub, Jenkins, Travis CI, CloudBees, etc.
    • Apache, Nginx, Passenger and IIS.
  • Deployment configured by recipes (declarative text)
  • Web Console, CLI and Python SDK
  • Minimizes downtime

CodePipeline

Rolling release of development code.

  • Based on various external events
  • A GUI to define deployment pipelines

CodePipeline workflow

OpsWorks

Automated infrastructure deployment based on Chef.

  • Deploy complete appliances filled with industry-standard software (HAProxy, Memcached, MySQL, etc.)
  • Compatible with your existing Chef recipes
  • Manageable with APIs

Amazon Workspaces

A virtual desktop in EC2

  • A set of desktop applications inside a web browser
  • Windows-based
  • Lets you choose applications to deploy from a catalog
  • From $21/month (BYOL)

Do you want more?

  • Direct Connect: connect your network directly to Amazon
  • CloudFront: CDN (Content Delivery Network) on the Edge
  • AWS Storage Gateway: connect your local IT infrastructure to S3
  • ElastiCache: distributed in-memory cache (Redis, Memcached)
  • Redshift: managed data warehouse
  • CloudFormation: Infrastructure-as-Code
  • CloudTrail: logging service for AWS API calls
  • CloudWatch: monitoring service
  • Directory Service: MS Active Directory

AWS Console

https://console.aws.amazon.com

Cost

Tips and tricks

  • EC2
    • Every hour started is charged
    • Incoming traffic is free
    • Data transfers within an AZ are free
    • Free tier during the 1st year: 750 h/month of a t2.micro instance
    • EIP are charged even when unused!
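The first tip, "every hour started is charged", amounts to rounding the runtime up to a whole number of hours. A minimal sketch, assuming per-hour billing exactly as stated above; the function names are hypothetical and the hourly rate is whatever the chosen instance type costs.

```python
import math


def billed_hours(runtime_seconds):
    """Round the runtime up to whole hours: every hour started is charged."""
    if runtime_seconds <= 0:
        return 0
    return math.ceil(runtime_seconds / 3600)


def on_demand_charge(runtime_seconds, hourly_rate):
    """Charge for one run of an on-demand instance."""
    return billed_hours(runtime_seconds) * hourly_rate
```

Under this rule, an instance stopped after 61 minutes is billed for 2 hours, so restarting instances frequently for short jobs can cost more than keeping one running.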

Prefer reserved and spot instances when possible