Table Partition Split partition part 5

Advertisements

Split

When we have data stored in wrong partition we need to split it to the right partition. This is usually comes in last partition in right range if we forgot to add next partition for the given boundary.

We can do this in two ways 1) with data split and 2) without data split.

Method 1 is easy, we can do this for small tables only, since Split will grow the log file very large.

Method 2 is proper one this will not grow the log file bigger for this we need to move(Switch) data from partition to another table then create split and we need t o move (Switch) data back in the data to right or correct newly created partition numbers.

Example: 2024 & 2025 are stored together. We need to create two staging table and then insert into both tables based on select from $partition with condition. Truncate the partition and then Switch to current partition and create next partition with split the range then switch in.

To use split we need both of the following 2 commands and once we have given next used file group. That will create a new empty partition and we can specify the split range the data will start use the partition with that range.

How do you create new empty partition?

By using Split option you can create new empty partition

MSDN: Adds a filegroup to a partition scheme or alters the designation of the NEXT USED filegroup for the partition scheme.

Alter partition scheme PS_myPartitionScheme next used [FG_2025_01_01];

MSDN: Alters a partition function by splitting or merging its boundary values. Running an ALTER PARTITION FUNCTION statement can split one table or index partition that uses the partition function into two partitions. The statement can also merge two partitions into one partition.

alter partition function PF_myDateRange() split range (‘2025-01-01 00:00:00.000’)

MSDN:

SPLIT RANGE ( boundary_value )

Adds one partition to the partition function. boundary_value determines the range of the new partition, and must differ from the existing boundary ranges of the partition function. Based on boundary_value, the database engine splits one of the existing ranges into two. Of these two ranges, the one with the new boundary_value is the new partition.

we should split with correct boundary.

Splitting a non-empty partition is an expensive operation, requiring about 4 times the logging compared to DML. Checkout the below links.

Solution.
Make your partition empty before split. That is switch the data to a staging table and split the range then switch data back in.

Test Code:

Use the same script used in part 1 for creating database and table.

--============
-- Split and Merge
--============


-- Split method 1 

-- you can split with data (Non empty partition). This is easy method can only run for small table, say less than 5 GB.

/*
-- we have to create new split partition with their boundary from 06 to 11th month
Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-06-01 00:00:00.0000000')

Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-07-01 00:00:00.0000000')

Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-08-01 00:00:00.0000000')

Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-09-01 00:00:00.0000000')

Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-10-01 00:00:00.0000000')

Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-11-01 00:00:00.0000000')

*/

-- Split method 2

/*

NOTE: 
We cannot move data to a backup partition table for split , since it will use the same partition function and scheme in the new backup partition table will split automatically

we cannot move all the data into a non partition table using Switch option, since we cannot create a CHECK constraint for max date.

Here we have only option is to create new parition table each respective month and move data using insert into and empty current partition and create split and finally switch in to current partition table.


*/

-- This is proper method and this will not grow the log file bigger. But here we need to do lot of tasks.


--select * from [tbl_partition_test]

-- Load some data into last partition (after '2024-01-05 00:00:00.0000000' ) including next month data then split it
-- in real case mostly we forgot to ad partition for upcoming data years, months or days whatever partition we used.
-- For those cases we need to split the boundary and create partition and merge it

-- Here in this example data may and june both are stored in the same partition 6
-- we need to move to right partition

-- create one table for month may

CREATE TABLE tbl_Staging_05 (
ID Int identity NOT NULL,
Date_Col datetime2(7) NOT NULL,
 CONSTRAINT [PK_C1810429_tbl_Staging_05] PRIMARY KEY CLUSTERED 
(
    [ID] ASC,
    [Date_Col] ASC
)
) ON [PRIMARY];


CREATE UNIQUE CLUSTERED INDEX [PK_C1810429_tbl_Staging_05] ON [tbl_Staging_05]
(
    [ID] ASC,
    [Date_Col] ASC
) 
WITH (DROP_EXISTING=ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)
ON PS_myPartitionScheme([Date_Col]);
GO

-- create one table for month june
CREATE TABLE tbl_Staging_06 (
ID Int identity NOT NULL,
Date_Col datetime2(7) NOT NULL,
 CONSTRAINT [PK_C1810429_tbl_Staging_06] PRIMARY KEY CLUSTERED 
(
    [ID] ASC,
    [Date_Col] ASC
)
) ON [PRIMARY];


CREATE UNIQUE CLUSTERED INDEX [PK_C1810429_tbl_Staging_06] ON [tbl_Staging_06]
(
    [ID] ASC,
    [Date_Col] ASC
) 
WITH (DROP_EXISTING=ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)
ON PS_myPartitionScheme([Date_Col]);
GO


---Say example 7th mnoth is coming and i wanted to keep 7th month in seperate partition and 8th, 9th month so on... 

--insert into [tbl_partition_test] ([Date_Col]) values ('2024-01-07 00:00:00.0000000')

---- create pre request check constraint with exact right range boundary

---1
alter table tbl_Staging_05 with check add constraint ch_tbl_Staging_05_min
check ([Date_Col] is not null and [Date_Col] >='2024-05-01 00:00:00.0000000')

alter table tbl_Staging_05 with check add constraint ch_tbl_Staging_05_max
check ([Date_Col] is not null and [Date_Col] <'2024-06-01 00:00:00.0000000')


----2

alter table tbl_Staging_06 with check add constraint ch_tbl_Staging_06_min
check ([Date_Col] is not null and [Date_Col] >='2024-06-01 00:00:00.0000000')

alter table tbl_Staging_06 with check add constraint ch_tbl_Staging_06_max
check ([Date_Col] is not null and [Date_Col] <'2024-07-01 00:00:00.0000000')


-- insert the data out from the partition to staging table and then truncate the partition and load again
-- This is time and resource consumming task that is the reason we need to create more extra partition and set up alert for upcoming partition months

-- insert may month data to tbl_Staging_05
SET IDENTITY_INSERT tbl_Staging_05 ON

insert into tbl_Staging_05 (ID,Date_Col)
select ID, Date_Col
from [tbl_partition_test]
--where $partition.PF_myDateRange([Date_Col]) =6 and 
where [Date_Col] between '2024-05-01 00:00:00.0000000' and '2024-05-31 23:59:59.9999999' order by [Date_Col]

SET IDENTITY_INSERT tbl_Staging_05 OFF
-------

-- insert may june data to tbl_Staging_05

SET IDENTITY_INSERT tbl_Staging_06 ON

insert into tbl_Staging_06 (ID,Date_Col)
select ID, Date_Col
from [tbl_partition_test]
--where $partition.PF_myDateRange([Date_Col]) =6 and 
where [Date_Col] between '2024-06-01 00:00:00.0000000' and '2024-06-01 23:59:59.9999999' order by [Date_Col]

SET IDENTITY_INSERT tbl_Staging_06 OFF

-- check how many data stored here
select * from tbl_Staging_05
select * from tbl_Staging_06



-- check how many data available in partition 6 before truncate
select [Date_Col] , $partition.PF_myDateRange([Date_Col]) as [partition]
from tbl_partition_test 
where $partition.PF_myDateRange([Date_Col]) in (6)

-- here in this example I just wanted to get may and june data. If we have data after that, we need to create seperate table and move those data into it
-- I am wantedly skipping the month JUL, Aug, SEP, OCT,NOV, DEC month. So we loose this all month data

TRUNCATE TABLE [tbl_partition_test] WITH (PARTITIONS (6));


-- create new partition by spliting for May, june . Since upto april we have partition. After April all stored in last partition. 
Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-06-01 00:00:00.0000000')

-- create partition for july
Alter partition scheme PS_myPartitionScheme next used [Primary];
alter partition function PF_myDateRange() split range ('2024-07-01 00:00:00.0000000')

--Switch data from staging to partition 

select * from tbl_Staging_05
select * from tbl_Staging_06

ALTER TABLE tbl_Staging_05 SWITCH PARTITION 6 TO [tbl_partition_test] PARTITION 6; 
ALTER TABLE tbl_Staging_06 SWITCH PARTITION 7 TO [tbl_partition_test] PARTITION 7; 


select * from tbl_Staging_05
select * from tbl_Staging_06


-- verify the data moved in
select [Date_Col] , $partition.PF_myDateRange([Date_Col]) as [partition]
from tbl_partition_test 
where $partition.PF_myDateRange([Date_Col]) in (6)


select [Date_Col] , $partition.PF_myDateRange([Date_Col]) as [partition]
from tbl_partition_test 
where $partition.PF_myDateRange([Date_Col]) in (7)

-- if we have data the july data we could move it here as well
select [Date_Col] , $partition.PF_myDateRange([Date_Col]) as [partition]
from tbl_partition_test 
where $partition.PF_myDateRange([Date_Col]) in (8,9,10,11,12)

Table Partition Split partition part 5

Muthukkumaran

Leave a Reply Cancel reply