Top 8 Tips for picking the right AWS Storage Option

For organizations moving to the AWS cloud, there are always a lot of questions about which storage solution to use.

As with most services offered by AWS, there are plenty of choices to select from. The selection boils down to a few criteria:

  1. What do you need the storage for?
  2. How often do you need to access it?
  3. How quickly do you need access? And,
  4. Of course – cost!

The main options to choose from as of 2018 are:

  1. S3 – the primary file/object-based storage platform. This is AWS's primary offering, which we will cover in more detail.
  2. EFS – Elastic File System, a network-attached file system.
  3. Glacier – used for data archival.
  4. Snowball – a way to move data without a network, via physical appliance shipment.
  5. Storage Gateway – virtual appliances for data replication.

Let’s get into a little more detail on each of them….

S3

If regular flat file or object-based storage is what you are looking for, then S3 is the right option. It is bucket-based storage with unlimited capacity, where you can store objects from 0 bytes to 5 TB in size. Data stored here is secure, durable and highly scalable. S3 uses simple web service interfaces to store and retrieve unlimited amounts of data from anywhere on the web. It's built to achieve 99.99% availability and 99.999999999% (eleven nines) durability.

Within S3, AWS offers different storage tiers:

  1. S3 Standard – 99.99% availability and 99.999999999% durability; data is stored redundantly across multiple devices and can sustain the loss of two facilities.
  2. S3 Infrequent Access (IA) – as the name suggests, this tier is suitable for files that are not accessed on a regular basis. It has a lower storage fee than Standard, but users are charged for data retrieval. Data can still be accessed rapidly.
  3. S3 Reduced Redundancy Storage, recently renamed One Zone – for content that can be regenerated if necessary. This tier is ideal for things like image thumbnails that can be recreated if required, and offers lower durability (99.99%) than the other tiers.
  4. Glacier – mainly meant for data archival, not for data that is used regularly. This is the cheapest storage option, but restores take around 3-5 hours.
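The tiers above correspond to the `StorageClass` values in the S3 API. As a minimal illustration, here is a Python sketch using boto3 (the AWS SDK for Python); the bucket name, object key and access-frequency thresholds are all illustrative assumptions, not AWS recommendations.

```python
def choose_storage_class(reads_per_month, regenerable=False):
    """Map an expected access pattern to an S3 StorageClass value.
    The thresholds are illustrative, not AWS guidance."""
    if reads_per_month == 0:
        return "GLACIER"        # archival only; restores take hours
    if regenerable:
        return "ONEZONE_IA"     # cheaper, for data you can recreate
    if reads_per_month < 1:
        return "STANDARD_IA"    # lower storage fee, retrieval charge
    return "STANDARD"           # frequently accessed data

def upload_example():
    """Not invoked here: calling this needs boto3 and AWS credentials.
    The bucket and key names are placeholders."""
    import boto3
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="my-example-bucket",
        Key="reports/summary.csv",
        Body=b"col1,col2\n1,2\n",
        StorageClass=choose_storage_class(reads_per_month=0.5),
    )
```

The helper simply encodes the tier descriptions above as a decision rule; in practice you would tune the thresholds to your own workload and retrieval-cost tolerance.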

Elastic Block Store

Despite the similar name, Elastic Block Store (EBS) is distinct from EFS, the network file system mentioned above. EBS volumes can be thought of as disks in the cloud that you attach to an EC2 instance. They are storage volumes in the cloud that can be used as a file system or for a database, and in some cases as a root (boot) volume. There are five EBS volume types:

  1. General Purpose SSD (GP2) – the most generic variety, for low-intensity I/O.
  2. Provisioned IOPS SSD (IO1) – designed for I/O-intensive applications such as large RDBMS or NoSQL databases; use it whenever you need more than 10,000 IOPS.
  3. Throughput Optimized HDD (ST1) – the magnetic option for high throughput. It cannot be used as a boot volume. Good for big data, data warehousing, log processing, etc.
  4. Cold HDD (SC1) – basically a magnetic file server. It is the lowest-cost option for infrequently accessed workloads, and it also cannot be a boot volume.
  5. Magnetic (Standard) – the lowest cost per GB that is bootable. For infrequently accessed data where low cost is the priority; ideal for test and development environments.
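The five volume types above map to the `VolumeType` strings in the EC2 API. A hedged sketch of picking one programmatically with boto3; the decision thresholds mirror the rough guidance in the text, and the availability zone and size in the example are placeholders.

```python
def choose_ebs_volume_type(iops_needed=0, high_throughput=False, boot_volume=False):
    """Pick one of the five EBS volume types described above.
    The thresholds mirror the rough 2018-era guidance in the text."""
    if iops_needed > 10000:
        return "io1"            # Provisioned IOPS SSD
    if boot_volume:
        # HDD-backed st1/sc1 cannot serve as boot volumes
        return "gp2" if iops_needed > 0 else "standard"
    if high_throughput:
        return "st1"            # Throughput Optimized HDD
    if iops_needed > 0:
        return "gp2"            # General Purpose SSD
    return "sc1"                # Cold HDD: cheapest, infrequent access

def create_volume_example():
    """Not invoked here: calling this needs boto3 and AWS credentials.
    The availability zone and size are placeholders."""
    import boto3
    ec2 = boto3.client("ec2")
    ec2.create_volume(
        AvailabilityZone="us-east-1a",
        Size=100,  # GiB
        VolumeType=choose_ebs_volume_type(iops_needed=3000),
    )
```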

Snowball

Snowballs are appliances that AWS ships to customers, who connect them locally and load large amounts of data onto them. The devices are then shipped back to AWS, where the data is transferred into the AWS infrastructure. Snowball uses multiple layers of security, including 256-bit encryption, and the appliances are tamper-resistant.

There are three flavors of Snowball:

  1. Regular Snowball – used to transfer up to 80 TB of information per Snowball, at about one fifth the cost of moving the same data over the network. This is a good way to transfer large amounts of data without using the network, which also makes it more secure.
  2. Snowball Edge – like a data center in a box. These appliances come with up to 100 TB of storage and also include compute power. They can act as mobile data centers in places like airplanes, capturing and processing large amounts of information without the need for a network.
  3. Snowmobile – the largest option in this group, storing up to 100 PB per Snowmobile. These are mobile data centers housed in 45-foot shipping containers! They are mainly used for complete data center migrations.

Storage Gateway

Storage Gateway connects an on-premises software appliance with cloud-based storage, so your local IT infrastructure is directly connected to AWS storage.

The software appliance can be downloaded as a VM image and installed in the host's data center. It supports VMware ESXi and Microsoft Hyper-V.

There are three types of Storage Gateway:

  1. File Gateway (NFS) – for flat files stored on S3. Once transferred, they can be managed as regular S3 objects with bucket policies, including versioning, lifecycle management and cross-region replication.
  2. Volume Gateway – virtual hard disks for block storage, best used for database storage such as SQL Server. Volumes can be asynchronously backed up as point-in-time snapshots, which are incremental and compressed. Volume Gateways are further subcategorized into stored volumes and cached volumes:
    1. Stored volumes – data is stored on-site, and all your primary data is backed up to Amazon S3 in the form of Amazon Elastic Block Store (EBS) snapshots. This is used for low-latency requirements.
    2. Cached volumes – all data is stored on S3, and only a limited cache of recently accessed data stays on-premises, reducing the need for local storage. You can create storage volumes of up to 32 TB.
  3. Tape Gateway – a durable and affordable solution for archiving data to virtual tape in the AWS cloud using a VTL interface. It is supported by NetBackup, Backup Exec, Veeam and most other backup software in use in the market today.
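Because File Gateway objects land in S3 as regular objects, standard S3 lifecycle management applies to them. A hedged sketch of building such a lifecycle configuration with boto3, tiering aging objects down to Infrequent Access and then Glacier; the bucket name, prefix and day counts are illustrative assumptions.

```python
def build_archive_lifecycle(prefix, days_to_ia=30, days_to_glacier=90):
    """Return an S3 lifecycle configuration that moves objects under
    `prefix` to Infrequent Access, then Glacier, as they age.
    The prefix and day counts are illustrative."""
    return {
        "Rules": [{
            "ID": "archive-" + prefix.strip("/"),
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
                {"Days": days_to_glacier, "StorageClass": "GLACIER"},
            ],
        }]
    }

def apply_lifecycle_example():
    """Not invoked here: calling this needs boto3 and AWS credentials.
    The bucket name is a placeholder."""
    import boto3
    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration=build_archive_lifecycle("backups/"),
    )
```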

So, as mentioned earlier, there are lots of options to choose from. Based on your size, throughput, accessibility, scale and business requirements, you should be able to narrow it down to one of the options presented above.

We hope this helps you choose the right AWS cloud storage solution. We can work with you to further support your decision-making process.

Please reach out to us at [email protected] for further enquiries.  

Dashboard Designs Principles using Jaspersoft

Jaspersoft is gaining ground rapidly, and as users get accustomed to using it on a daily basis, the problem of designing optimal dashboards and visualizations becomes pressing.

Having designed dashboards and other BI artifacts for a number of years, I have come to adopt a few simple fundamental principles that have helped me a great deal.

The five core principles are described below:

  1. Data complexity: Generally, it is important to identify the complexity of the data at the very beginning. The complexity of data usually depends on the source system of record as well as the use cases attached to the data. As an example, an accountant will be able to understand accounting data (and KPIs) much more easily than the average Joe. So if you are designing a dashboard for data sourced from an accounting system, it is better to "simplify" the data for general consumption based on the user groups. This leads directly to principle #2…
  2. User Expertise: Make the users' expertise with the data the primary yardstick for your dashboard design. I have often found that, depending on the end user's expertise, even a simple combo chart with two Y-axes can be difficult to read. The user-expertise problem is sometimes compounded by the volume of data and the refresh frequency, which brings us to principle #3…
  3. Data Refresh: Providing timestamp context to users as you design the dashboards is fairly important. Most organizations would like to see data refreshed in real time or near real time, BUT a key consideration is to determine WHO is monitoring the data refreshes and towards what end.
  4. Screen Resolution: Screen size and resolution should play a critical role in your considerations as well. I have seen requirements from customers where dashboards needed to be part of shop floors, manufacturing plants, retail spaces, etc. Clearly, 20-inch monitors would not work for these venues. Having access to more "real estate" makes the job of designing dashboards a little bit easier.
  5. Dashboard Delivery: Knowing the technology you have to use for dashboard delivery is also important. Some technologies make it easier to distribute dashboards to mobile devices, while others are more geared towards desktop delivery.

Hope this provides you with a good starting point. Do not hesitate to reach out to us at [email protected] if you have further questions.