Introduction:
I am writing this post as this methodology is not known to many.
What is an UPSERT?
UPSERT is a combination of both INSERT and UPDATE, which insert data if the same data do not exist in the table and update data if the same data exist in the table.
Does AMAZON REDSHIFT support UPSERT?
The answer is NO. But we can achieve the same by using the following methodology.
Suppose I have a table named, "TableA", in which the data need to be inserted and updated, and we have another table "TableATemp", which contains the data that need to be inserted and updated.
Say TableA has below data.
EmpId
|
EmpName
|
1
|
AAA
|
2
|
BBB
|
3
|
CCC
|
4
|
DDD
|
5
|
EEE
|
and TableATemp has below data
EmpId
|
EmpName
|
4
|
DDDD
|
5
|
EEEE
|
6
|
FFF
|
7
|
GGG
|
The data that need to be inserted into TableA is
EmpId
|
EmpName
|
6
|
FFF
|
7
|
GGG
|
and the data that need to be updated into TableA is
EmpId
|
EmpName
|
4
|
DDDD
|
5
|
EEEE
|
How to achieve this in Amazon Redshift?
As a first step, update the existing data into TableA using below query
SET a.EmpId = temp.EmpId,
a.EmpName = temp.EmpName
FROM TableATemp temp
JOIN TableA a ON temp.EmpId = a.EmpId;
Now we are going to INSERT the new data into TableA using below query
SELECT *
FROM TableATemp temp
LEFT JOIN TableA a ON temp.EmpId = a.EmpId
WHERE a.EmpId IS NULL;
These two queries would help to achieve the UPSERT in Amazon Redshift.