Thursday, August 27, 2015

UPSERT in Amazon Redshift

Introduction:

I am writing this post as this methodology is not known to many.

What is an UPSERT?

UPSERT is a combination of both INSERT and UPDATE, which insert data if the same data do not exist in the table and update data if the same data exist in the table.

Does AMAZON REDSHIFT support UPSERT?

The answer is NO. But we can achieve the same by using the following methodology.

Suppose I have a table named, "TableA", in which the data need to be inserted and updated, and we have another table "TableATemp", which contains the data that need to be inserted and updated.

Say TableA has below data.


EmpId
EmpName
1
AAA
2
BBB
3
CCC
4
DDD
5
EEE

and TableATemp has below data


EmpId
EmpName
4
DDDD
5
EEEE
6
FFF
7
GGG






The data that need to be inserted into TableA is 


EmpId
EmpName
6
FFF
7
GGG





and the data that need to be updated into TableA is 



EmpId
EmpName
4
DDDD
5
EEEE





How to achieve this in Amazon Redshift?

As a first step, update the existing data into TableA using below query

UPDATE TableA
   SET a.EmpId = temp.EmpId,
       a.EmpName = temp.EmpName
FROM TableATemp temp
  JOIN TableA a ON temp.EmpId = a.EmpId;


 Now we are going to INSERT the new data into TableA using below query

INSERT INTO TableA
SELECT *
FROM TableATemp temp
  LEFT JOIN TableA a ON temp.EmpId = a.EmpId
WHERE a.EmpId IS NULL;


 These two queries would help to achieve the UPSERT in Amazon Redshift.