Sunday, February 19, 2012

Execute multiple statements at once...

I am not even sure if this is possible, but I want to be able to execute several statements (or SPs) at once from inside an SP (or by any other method).

What I am doing is taking data from a single column across multiple rows and concatenating it into one row (i.e. data1 + data2 + data3...). But I am doing this to a total of 2.1 million individual rows, and the result will be about 204k rows... What I have written is basically a nested loop, and it works fine but very slow... slow as in it has been running for 24 hours now and is about 60% done... I need this to finish in under 36 hours, preferably...

If I could handle more than one set of data at once (there will be no duplicates), I could speed up the process by however many sets I feel like working with at a time...

So is there a way to execute a statement (SP or function) and move on to the next statement without waiting for the first to finish?

Thanks Big Time... I hope this is possible.

OH.... SQL 2000 SP4 on Windows 2003.

You could open multiple connections from the client, but something tells me that there may be a better way to process your data.

Could you post the relevant schema and code?

|||

AFAIK, within SQL Server 2000 you do not have that option. On a single CPU, I doubt it would be any different under 2005 (but I am not an MSSQL expert).

However "24 hours" and even "hours" for 2.1 million rows processing to create a 204K data sounded to be very slow to me. Maybe if you select the data outside of SQL server with an isolation like uncommitted, prepare outside and do a single bulk load, it'd be much faster.

How do you do this? All in a single T-SQL batch?

|||

It is all within a single T-SQL SP... It is extremely slow and I can't seem to get it to use more processor or memory...

I am thinking of using a DTS package and working with multiple temp tables, then combining all the temp tables at the end... it should be quicker that way, and I might be able to push my server harder (it is Enterprise Edition with 12GB of RAM and 2 processors)...

The current schema is:

Final Table = ID_NUM (Key, char(10)), data (varchar(6000))

Working table is ID_NUM, date varchar(100), importance (smallint) - key is ID_NUM and DATA

Working temp table: data varchar(100), importance smallint -- no key, always fewer than 200 rows.

Plus one variable @data that is a varchar(6000).

Loop 1
    Grab an ID_NUM
    Get the list of data for that ID_NUM into the working temp table
    Loop 2
        Loop through until all rows have been concatenated or @data goes over 6000 characters
        @data = @data + data
    End loop 2
    Insert ID_NUM, @data into the final table
End loop 1

That is what I am doing... no cursors, just WHILE loops... roughly like the sketch below:
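(A simplified sketch, not my exact code; the table names match the schema above, and it ignores the importance ordering:)

declare @ID_NUM char(10), @data varchar(6000), @row varchar(100)

-- loop 1: walk the ID_NUMs one at a time
select @ID_NUM = min(ID_NUM) from dbo.WorkingTable

while @ID_NUM is not null
begin
    set @data = ''
    select @row = min(data) from dbo.WorkingTable where ID_NUM = @ID_NUM

    -- loop 2: concatenate rows for this ID_NUM until the 6000-character limit
    while @row is not null
      and datalength(@data) + datalength(@row) <= 6000
    begin
        set @data = @data + @row
        select @row = min(data)
        from dbo.WorkingTable
        where ID_NUM = @ID_NUM and data > @row
    end

    insert into dbo.FinalTable (ID_NUM, data) values (@ID_NUM, @data)

    select @ID_NUM = min(ID_NUM) from dbo.WorkingTable where ID_NUM > @ID_NUM
end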

|||

William:

Where you say, "... date varchar (100), ... " for your "Working table" do you really mean "... data varchar (100) ... "?

Dave

|||

Yes, that is what I meant...

Sorry... working on a million things at once and I don't always proofread...

|||

It sounds like you're just trying to concatenate all of the rows in the child table into a single row in a third table. If so, the following article may help.

http://databases.aspfaq.com/general/how-do-i-concatenate-strings-from-a-column-into-a-single-row.html

You'll likely not get much parallelism out of the looping code you described, so if you can find a set-based method of processing, you'll be much better off.
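The core trick in that article is this kind of variable concatenation (a throwaway sketch against sysobjects, just to show the shape):

-- Sketch: build one delimited string from a column in a single SELECT
declare @list varchar(8000)

select @list = coalesce(@list + ', ', '') + name
from sysobjects
where type = 'U'

select @list AS TableList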

|||

That actually looks good... the problem will be hitting the varchar limit.

I will have to see if I hit it or not...

What fun... lol...

Thanks for all the help.

|||

My apologies - I forgot you're on SQL Server 2000, so the XML method of processing won't work in this situation. You still want a process where you can build the comma-separated list in a set-based operation rather than one row at a time.
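(For anyone reading this on SQL Server 2005 or later, the XML method is roughly the following, written here against the pubs jobs/employee tables used in the working sample further down; treat it as a sketch:)

-- SQL Server 2005+ only: FOR XML PATH('') concatenation (sketch)
SELECT J.job_id,
       J.job_desc,
       STUFF((SELECT '; ' + E.lname + ', ' + E.fname
              FROM employee E
              WHERE E.job_id = J.job_id
              ORDER BY E.lname, E.fname
              FOR XML PATH('')), 1, 2, '') AS EmployeeList
FROM jobs J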

There's a method whereby you can assign a value to a variable in one row and use it in subsequent rows of an UPDATE statement. AFAIK, it's undocumented but it works in SQL 2005 as well so you're safe for a little while.

The pseudo-code would look something like this:

-- Place all of the data into a work table
CREATE TABLE #WorkTable (
    ID_NUM     char(10),
    data       varchar(6000),
    importance smallint,
    IdentCol   INT IDENTITY,
    PRIMARY KEY (ID_NUM, IdentCol)
)

INSERT INTO #WorkTable (ID_NUM, data, importance)
SELECT ID_Num, data, Importance
FROM SourceTable
ORDER BY ID_NUM, Importance

-- Update the work table. Each row gets the data of itself plus the row
-- preceding it whenever there is a matching ID_NUM.
DECLARE @ID_NUM CHAR(10)
DECLARE @data VARCHAR(6000)

SET @ID_NUM = ''  -- Initialize this to a value that does not exist in the data

UPDATE #WorkTable
SET @data = data = CASE WHEN ID_NUM <> @ID_NUM THEN data
                        ELSE @data + data
                   END,
    @ID_NUM = ID_NUM

INSERT INTO FinalTable (ID_NUM, Data)
SELECT ID_NUM, MAX(DATA)
FROM #WorkTable
GROUP BY ID_NUM

In case the pseudo-code doesn't work, here's a working sample for the pubs database:

CREATE TABLE #JobTitles (
    job_id       INT,
    job_desc     VARCHAR(100),
    employees    VARCHAR(1000),
    EmployeeName VARCHAR(100),
    PRIMARY KEY (job_id, EmployeeName)
)

DECLARE @job_id INT, @EmployeeNames VARCHAR(1000)

INSERT INTO #JobTitles (job_id, job_desc, EmployeeName)
SELECT J.job_id, J.job_desc, E.lname + ', ' + E.fname AS EmployeeName
FROM jobs J
INNER JOIN employee E ON J.job_id = E.job_id

SET @job_id = -1  -- Initialize this to a number that does not exist
                  -- in the list of values.

UPDATE #JobTitles
SET @EmployeeNames = employees = CASE WHEN @job_id <> job_id THEN EmployeeName
                                      ELSE @EmployeeNames + '; ' + EmployeeName
                                 END,
    @job_id = job_id

-- Grab the largest row of each grouping.
SELECT job_id, job_desc, MAX(employees) AS EmployeeList
FROM #JobTitles
GROUP BY job_id, job_desc
ORDER BY job_id

DROP TABLE #JobTitles

|||

William:

Here is yet another example. I mocked this up with 2.1 million rows and was able to get it to run in about 3 minutes and 20 seconds. This problem is definitely easier to solve with 2005 than with 2000, and there has to be a better way of doing this, but this example might help you see a direction; I hope it is helpful.

Dave

I used these two tables and faked the data:

create table dbo.final
( [key] char (10) not null,
[data] varchar (6000) not null
)

create table workingTable
( ID_Num varchar (10) not null,
[data] varchar (100) not null,
importance smallint not null,

constraint pk_mockInput primary key (ID_Num, [data])
)


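(small_iterator here is just a numbers table with a single integer column named iter; if you do not have one handy, something like this quick-and-dirty version covers the range this example needs:)

-- Assumed helper: a simple numbers table; the mock data below only needs
-- values up to 21300.
create table dbo.small_iterator (iter int not null primary key)

declare @i int
set @i = 1
while @i <= 21300
begin
    insert into dbo.small_iterator (iter) values (@i)
    set @i = @i + 1
end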
insert into workingTable
select convert (varchar (5), a.iter),
'{ ' + convert (varchar (5), b.iter) + ' }'
+ replicate ('-', 1 + b.iter % 29 ) + '-->',
b.iter
from small_iterator a (nolock)
inner join small_iterator b (nolock)
on a.iter <= 21300
and b.iter <= 40 + a.iter % 119
and b.iter <= 200
order by a.iter,
b.iter

update statistics workingTable
exec sp_recompile workingTable

I would prefer not to use a function because there is a fair amount of overhead associated with functions; nonetheless, I implemented the method using the following function:

create function dbo.assembleData
( @arg_key     varchar (10),
  @arg_minData varchar (100),
  @arg_maxData varchar (100)
)
returns varchar (6000)
as
begin

    declare @retValue varchar (6000)
    set @retValue = ''

    -- concatenate the data values for this key within the requested range
    select @retValue = @retValue + [data]
    from workingTable
    where [id_num] = @arg_key
    and [data] >= @arg_minData
    and [data] <= @arg_maxData

    return ( @retValue )

end

I then tested out the overall procedure with this query:

--truncate table dbo.final

-- The whole thing takes about 3 minutes and 20 seconds
create table #sequencer
( [key] varchar (10) not null,
[data] varchar (100) not null,
importance smallint not null,
seq integer not null,
siz integer not null,

constraint pk_#sequencer primary key ([key], [data])
)

insert into #sequencer
select a.id_num,
a.[data],
a.importance,
count(*) as seq,
sum (datalength (b.[data])) as siz
from workingTable a
inner join workingTable b
on a.id_num = b.id_num
and b.[data] <= a.[data]
group by a.id_num, a.[data], a.importance
order by a.id_num, a.[data]

insert into dbo.final
select [key],
dbo.assembleData ( [key], min_data, max_data )
from ( select [key],
min ([data]) as min_data,
max ([data]) as max_data,
min (seq) as min_seq,
max (seq) as max_seq,
case when siz = 0 then 0
else siz - 1
end / 6000 as segment
from #sequencer
group by [key],
case when siz = 0 then 0
else siz - 1
end / 6000
) x

drop table #sequencer

select top 20 *
from dbo.final

|||

I will give that method a try...

I have been trying all kinds of things, including DTS with multiple connections,

and none have produced speedy results... my last attempt took about 10 hours.

|||

THAT WORKED GREAT!!!!!

My job now completes in 1 min 43 sec.... HUGE improvement...

Thanks.

|||

Great that you solved it.

Would you tell me which method you used? (I know it was not the one I suggested. :) I'm asking because I'm not SQL Server oriented and have been seriously diving into SQL Server for the last few days; before that I was only using it "superficially". Though I was impressed with T-SQL in 2000, 2005 made me an addict. :)

|||

I used the solution posted by Jared Ko.
