“Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud” (AWS, no date). Redshift is inexpensive: pricing starts at $0.25 per hour, with no upfront payments or commitments, and a cluster can scale to a petabyte or more for $1,000 per terabyte per year (AWS, no date).
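To put the two quoted prices into perspective, the arithmetic can be sketched as follows. This is only a rough illustration: the rates are the ones cited above and may no longer be current, the function names are made up for this example, and a petabyte is taken as 1,000 terabytes.

```python
# Rates quoted in the text above (assumed, not current AWS pricing).
HOURLY_RATE_USD = 0.25
PER_TB_YEAR_USD = 1000


def yearly_on_demand_cost(hourly_rate, hours_per_day=24, days=365):
    """Cost of keeping one node running around the clock for a year."""
    return hourly_rate * hours_per_day * days


def yearly_storage_cost(terabytes, per_tb_year=PER_TB_YEAR_USD):
    """Cost of a given amount of data at the quoted per-terabyte rate."""
    return terabytes * per_tb_year


print(yearly_on_demand_cost(HOURLY_RATE_USD))  # 2190.0
print(yearly_storage_cost(1000))               # 1000000 (one petabyte)
```

In other words, the entry-level rate works out to roughly $2,190 per year if the node never stops, and a full petabyte at the quoted rate is about one million dollars per year.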

Redshift can run either on a single 160 GB node or in a multi-node configuration. In a multi-node cluster, a leader node handles client connections and receives query requests, while the compute nodes store the data and process complex queries. A cluster can have up to 128 compute nodes.
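The two layouts can be modelled in a few lines. The following is a minimal sketch, not the AWS API: `ClusterLayout` and its attributes are hypothetical names used only for illustration, and the 128-node limit is simply the figure quoted above.

```python
from dataclasses import dataclass

MAX_COMPUTE_NODES = 128  # limit quoted in the text, treated as an assumption


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical description of a cluster, for illustration only."""
    compute_nodes: int

    def __post_init__(self):
        # Reject layouts outside the range described in the text.
        if not 1 <= self.compute_nodes <= MAX_COMPUTE_NODES:
            raise ValueError(
                f"compute_nodes must be between 1 and {MAX_COMPUTE_NODES}"
            )

    @property
    def cluster_type(self):
        # One node does everything; several nodes need a separate leader.
        return "single-node" if self.compute_nodes == 1 else "multi-node"

    @property
    def has_separate_leader(self):
        return self.compute_nodes > 1


small = ClusterLayout(compute_nodes=1)
large = ClusterLayout(compute_nodes=4)
print(small.cluster_type, small.has_separate_leader)  # single-node False
print(large.cluster_type, large.has_separate_leader)  # multi-node True
```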

Additionally, Redshift uses advanced compression. Data that is stored in columns can be compressed much more than data in row-based stores because similar values are stored sequentially on disk. Redshift applies various compression techniques and can usually deliver significant compression compared to conventional relational data stores. Moreover, it does not require indexes or materialized views, so it uses less space than other relational database systems. When data is loaded into an empty table, Redshift automatically samples the data and selects the most suitable compression scheme.
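Why similar values stored next to each other compress better can be shown with a toy example. The sketch below uses run-length encoding, chosen only because it is easy to follow; it is not a claim about which scheme Redshift would select, and the sample rows and function name are invented for this illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Invented sample table: (country, status)
rows = [
    ("US", "shipped"),
    ("US", "shipped"),
    ("US", "pending"),
    ("DE", "pending"),
    ("DE", "pending"),
    ("DE", "shipped"),
]

# Row layout: the values of each row are stored one after another,
# so neighbouring values come from different columns.
row_layout = [value for row in rows for value in row]
row_runs = run_length_encode(row_layout)

# Column layout: all values of one column are stored together.
countries = [country for country, _ in rows]
statuses = [status for _, status in rows]
column_runs = run_length_encode(countries) + run_length_encode(statuses)

print(len(row_runs))     # 12 runs: nothing collapses
print(len(column_runs))  # 5 runs: repeated values collapse
```

The same twelve values need twelve entries in the row layout but only five in the column layout, which is the effect the paragraph above describes.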

Another outstanding characteristic of Redshift is massively parallel processing (MPP). Redshift automatically distributes data and query load across all nodes. This makes it simple to add nodes to the data warehouse and maintain query performance as the data grows (AWS, no date).
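The idea behind MPP can be sketched as a small simulation: rows are spread across nodes by hashing a key, each node computes a partial result on its own slice, and the partial results are combined at the end. This is a simplified, sequential illustration under assumed names (`node_for`, `distribute`, `parallel_sum`); it does not reproduce how Redshift distributes data internally.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count):
    """Pick a node for a key with a stable hash (CRC32, chosen for the demo)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Assign every (key, amount) row to the slice of one node."""
    slices = defaultdict(list)
    for key, amount in rows:
        slices[node_for(key, node_count)].append((key, amount))
    return slices


def parallel_sum(rows, node_count):
    """Each node sums its own slice; the partial sums are then combined."""
    slices = distribute(rows, node_count)
    partial_sums = [sum(amount for _, amount in rows_on_node)
                    for rows_on_node in slices.values()]
    return sum(partial_sums)


sales = [("alice", 10), ("bob", 25), ("carol", 5), ("dave", 60), ("erin", 40)]

# The result is the same regardless of how many nodes share the work.
print(parallel_sum(sales, node_count=2))  # 140
print(parallel_sum(sales, node_count=4))  # 140
```

Adding nodes changes only how the rows are split, not the answer, which is why a cluster can grow without queries having to change.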

Backups are enabled by default with a one-day retention period, which can be extended to a maximum of 35 days. Redshift always attempts to maintain at least three copies of the data: the original and a replica on the compute nodes, plus a backup in Amazon S3. Furthermore, Redshift can asynchronously replicate snapshots to S3 in a different AWS region for disaster recovery.
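As a small illustration of what the retention period means in practice, the sketch below computes when a snapshot would fall out of retention. The function and constant names are hypothetical, and the limits are the ones quoted in the text rather than values read from the service.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION_DAYS = 1   # default quoted in the text
MAX_RETENTION_DAYS = 35      # maximum quoted in the text


def snapshot_expiry(taken_at, retention_days=DEFAULT_RETENTION_DAYS):
    """Return the moment a snapshot taken at `taken_at` leaves retention."""
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention_days must be between 1 and {MAX_RETENTION_DAYS}"
        )
    return taken_at + timedelta(days=retention_days)


taken = datetime(2020, 4, 8)
print(snapshot_expiry(taken))                     # 2020-04-09 00:00:00
print(snapshot_expiry(taken, retention_days=35))  # 2020-05-13 00:00:00
```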

Regarding security, Redshift encrypts data at rest using AES-256 encryption and, by default, takes care of key management. Alternatively, the keys can be managed through a hardware security module (HSM) or AWS Key Management Service (KMS).

Lastly, Redshift currently runs in a single Availability Zone (AZ) only. Multi-AZ deployment is not possible at the moment, but as an alternative, snapshots can be restored to a different AZ if there is an outage (AWS, no date).

According to Kamp (2017), Redshift has several use cases:

  • Collect data through traditional data warehousing. Many companies use Redshift for traditional data warehouse capabilities in the cloud, such as business reporting and complex queries.
  • Store and process data with log analysis. Because Redshift is powerful and cheap, companies use it to analyze machine-generated logs. Lyft, for example, uses it to analyze pricing and product development.
  • Analyze data for business applications. Accenture uses Redshift to provide Analytics as a Service to other businesses that do not have these capabilities.
  • Time-sensitive data reporting for mission-critical workloads. Data stored in Redshift can be critical for the business. NASDAQ, for example, delivers reports daily, and those reports cannot be incorrect.

References

AWS (no date) What is Amazon Redshift? [Online] Available at: https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html (Accessed 8 April 2020)

AWS (no date) Amazon Redshift FAQs [Online] Available at: https://aws.amazon.com/redshift/faqs/ (Accessed 8 April 2020)

Kamp, L (2017) Amazon Redshift Use Cases [Online] Available at: https://www.intermix.io/blog/amazon-redshift-use-cases/ (Accessed 8 April 2020)
