I am not even sure if this is possible, but I want to be able to execute several statements (or SPs) at once from inside an SP (or any other method).
What I am doing is taking data from a single column, multiple rows, and making it into one row (i.e. data1 + data2 + data3...). But I am doing this to a total of 2.1 million individual rows, and the result will be about 204k rows... What I have written is basically a nested loop and it works fine, but it is very slow... slow as in it has been running for 24 hours now and it is about 60% done... I need this to finish in under 36 hours, preferably...
If I could handle more than one set of data at once (there will be no duplicates) I could speed up the process by however many I feel like working with...
So is there a way to execute a statement (sp or function) and go to the next statement without waiting for the first to finish?
Thanks Big Time... I hope this is possible.
OH.... SQL 2000 SP4 on Windows 2003.
You could open multiple connections from the client, but something tells me that there may be a better way to process your data.
Could you post the relevant schema and code?
|||AFAIK, within SQL Server 2000 you do not have such an option. On a single CPU, I doubt the case would be different under 2005 (but I am not an MSSQL expert).
However, "24 hours", or even "hours", for processing 2.1 million rows into 204K rows sounds very slow to me. Maybe if you select the data outside of SQL Server with an isolation level like read uncommitted, prepare it outside, and do a single bulk load, it would be much faster.
How do you do this? All in a single T-SQL batch?
|||It is all within a single T-SQL SP... It is extremely slow and I can't seem to get it to use more processor or memory...
I am thinking of using a DTS package and working with multiple temp tables, then combining all the temp tables at the end... it should be quicker that way and I might be able to push my server (which is Enterprise with 12GB of RAM and 2 processors)...
The current schema is:
Final Table = ID_NUM (Key, char(10)), data (varchar(6000))
Working table is ID_NUM, date varchar(100), importance (smallint) - key is ID_NUM and DATA
Working temp table data varchar(100), importance smallint -- no key, always less than 200 rows.
Plus one variable @data that is a varchar(6000)
Loop 1
    Grab an ID_NUM
    Get the list of data for that ID_NUM into the working table
    Loop 2
        Loop through until all rows have been added together or it goes over 6000 characters
        @data = data + data + data ...
    End loop 2
    Insert ID_NUM, @data into the final table
End loop 1
That is what I am doing... no cursor, just while loops...
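Roughly, the loops look something like this (a simplified sketch only, not my exact code; table and column names follow the schema above):
-- Simplified sketch of the WHILE-loop approach described above.
-- WorkingTable / FinalTable / ID_NUM / data are taken from the schema posted earlier.
DECLARE @ID_NUM char(10), @data varchar(6000), @row varchar(100)
SELECT @ID_NUM = MIN(ID_NUM) FROM WorkingTable          -- loop 1: one ID_NUM at a time
WHILE @ID_NUM IS NOT NULL
BEGIN
    SET @data = ''
    SELECT @row = MIN(data) FROM WorkingTable WHERE ID_NUM = @ID_NUM
    WHILE @row IS NOT NULL AND DATALENGTH(@data) + DATALENGTH(@row) <= 6000
    BEGIN                                               -- loop 2: concatenate row by row
        SET @data = @data + @row
        SELECT @row = MIN(data) FROM WorkingTable
         WHERE ID_NUM = @ID_NUM AND data > @row
    END
    INSERT INTO FinalTable (ID_NUM, data) VALUES (@ID_NUM, @data)
    SELECT @ID_NUM = MIN(ID_NUM) FROM WorkingTable WHERE ID_NUM > @ID_NUM
END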
|||William:
Where you say, "... date varchar (100), ... " for your "Working table" do you really mean "... data varchar (100) ... "?
|||Dave
Yes that is what I mean...
sorry... working on a million things at once and don't always proofread..
|||It sounds like you're just trying to concatenate all of your rows in the child table into a single row in a third table. If so, the following scenario may help.
http://databases.aspfaq.com/general/how-do-i-concatenate-strings-from-a-column-into-a-single-row.html
You'll likely not get much parallelism in the looping code that you described, so if you can find a set-based method of processing then you'll be much better off.
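The core trick that article describes, for a single ID_NUM, is roughly this (just a sketch; I'm assuming your table and column names from the schema you posted):
-- Rough sketch of the variable-concatenation pattern from the article above.
-- WorkingTable / ID_NUM / data / importance are assumed from the posted schema.
DECLARE @data varchar(6000)
SET @data = ''
SELECT @data = @data + data
FROM WorkingTable
WHERE ID_NUM = '0000000001'   -- hypothetical key value
ORDER BY importance
SELECT @data AS concatenated_data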
|||That actually looks good... the problem will be hitting the varchar limit.
I will have to see if I hit it or not...
What fun... lol...
Thanks for all the help.
|||My apologies - I forgot you're on SQL Server 2000. The XML method of processing won't work in this situation. You still want a process where you can build the comma-separated list in a SET operation rather than one row at a time.
There's a method whereby you can assign a value to a variable in one row and use it in subsequent rows of an UPDATE statement. AFAIK, it's undocumented but it works in SQL 2005 as well so you're safe for a little while.
The pseudo-code would look something like this:
-- Place all of the data into a work table
CREATE TABLE #WorkTable (ID_NUM char(10), data varchar(6000), importance smallint, IdentCol INT IDENTITY, PRIMARY KEY (ID_NUM, IdentCol))

INSERT INTO #WorkTable (ID_NUM, data, importance)
SELECT ID_NUM, data, Importance
FROM SourceTable
ORDER BY ID_NUM, Importance

-- Update the work table. Each row gets the data of itself + the row preceding it when there is a matching ID_NUM
DECLARE @ID_NUM CHAR(10)
DECLARE @data VARCHAR(6000)
SET @ID_NUM = '' -- Initialize this to a value that does not exist in the data

UPDATE #WorkTable
SET @data = data = CASE WHEN ID_NUM <> @ID_NUM THEN data
                        ELSE @data + data
                   END
  , @ID_NUM = ID_NUM

INSERT INTO FinalTable (ID_NUM, Data)
SELECT ID_NUM, MAX(DATA)
FROM #WorkTable
GROUP BY ID_NUM
In case the pseudo-code doesn't work, here's a working sample against the pubs sample database:
CREATE TABLE #JobTitles (job_id INT, job_desc VARCHAR(100), employees VARCHAR(1000), EmployeeName VARCHAR(100), PRIMARY KEY (job_id, EmployeeName))

DECLARE @job_id INT, @EmployeeNames VARCHAR(1000)

INSERT INTO #JobTitles (job_id, job_desc, EmployeeName)
SELECT J.job_id, J.job_desc, E.lname + ', ' + E.fname AS EmployeeName
FROM jobs J
INNER JOIN employee E ON J.job_id = E.job_id

SET @job_id = -1 -- Initialize this to a number that does not exist
                 -- in the list of values.

UPDATE #JobTitles
SET @EmployeeNames = employees = CASE
                                     WHEN @job_id <> job_id THEN EmployeeName
                                     ELSE @EmployeeNames + '; ' + EmployeeName
                                 END
  , @job_id = job_id

-- Grab the largest row of each grouping.
SELECT job_id, job_desc, MAX(employees) AS EmployeeList
FROM #JobTitles
GROUP BY job_id, job_desc
ORDER BY job_id

DROP TABLE #JobTitles
|||William:
Here is yet another example. I mocked this up with 2.1 million rows. I was able to get this to run in about 3 minutes and 20 seconds. This problem is definitely easier to solve with 2005 than 2000 and there has to be a better way of doing this, but this example might also help you see some direction; I hope it is helpful.
Dave
I used these two tables and faked the data:
create table dbo.final
  ( [key]  char (10) not null,
    [data] varchar (6000) not null
  )

create table workingTable
  ( ID_Num     varchar (10) not null,
    [data]     varchar (100) not null,
    importance smallint not null,
    constraint pk_mockInput primary key (ID_Num, [data])
  )

insert into workingTable
select convert (varchar (5), a.iter),
       '{ ' + convert (varchar (5), b.iter) + ' }'
         + replicate ('-', 1 + b.iter % 29) + '-->',
       b.iter
  from small_iterator a (nolock)
 inner join small_iterator b (nolock)
    on a.iter <= 21300
   and b.iter <= 40 + a.iter % 119
   and b.iter <= 200
 order by a.iter,
          b.iter

update statistics workingTable
exec sp_recompile workingTable
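(small_iterator is just a sequential-numbers helper table with a single iter column; if you don't have one handy, something like this sketch would build an equivalent for the data generation above:)
-- Assumed helper: a plain numbers table with one iter column.
-- 25000 rows is enough for the a.iter <= 21300 range used above.
create table dbo.small_iterator
  ( iter int not null,
    constraint pk_small_iterator primary key (iter)
  )

declare @i int
set @i = 1
while @i <= 25000
begin
  insert into dbo.small_iterator (iter) values (@i)
  set @i = @i + 1
end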
I would prefer not to use the function because there is a fair amount of overhead associated with functions; nonetheless, I implemented the method using the following function:
alter function dbo.assembleData
  ( @arg_key     varchar (10),
    @arg_minData varchar (100),
    @arg_maxData varchar (100)
  )
returns varchar (6000)
as
begin

  declare @retValue varchar (6000)
  set @retValue = ''

  select @retValue = @retValue + [data]
    from workingTable
   where [id_num] = @arg_key
     and [data] >= @arg_minData
     and [data] <= @arg_maxData

  return ( @retValue )

end
I then tested out the overall procedure with this query:
|||--truncate table dbo.final
-- The whole thing takes about 3 minutes and 20 seconds

create table #sequencer
  ( [key]      varchar (10) not null,
    [data]     varchar (100) not null,
    importance smallint not null,
    seq        integer not null,
    siz        integer not null,
    constraint pk_#sequencer primary key ([key], [data])
  )

insert into #sequencer
select a.id_num,
       a.[data],
       a.importance,
       count(*) as seq,
       sum (datalength (b.[data])) as siz
  from workingTable a
 inner join workingTable b
    on a.id_num = b.id_num
   and b.[data] <= a.[data]
 group by a.id_num, a.[data], a.importance
 order by a.id_num, a.[data]

insert into dbo.final
select [key],
       dbo.assembleData ( [key], min_data, max_data )
  from ( select [key],
                min ([data]) as min_data,
                max ([data]) as max_data,
                min (seq) as min_seq,
                max (seq) as max_seq,
                case when siz = 0 then 0
                     else siz - 1
                end / 6000 as segment
           from #sequencer
          group by [key],
                   case when siz = 0 then 0
                        else siz - 1
                   end / 6000
       ) x

drop table #sequencer

select top 20 *
  from dbo.final
I will give that method a try...
I have been trying all kinds of things, including DTS with multiple connections,
but all have been unsuccessful at producing speedy results... my last attempt took about 10 hours.
|||THAT WORKED GREAT!!!!!
My job now completes in 1 min 43 sec.... HUGE improvement...
Thanks.
|||Great that you solved it.
Would you tell me which method you used? (I know it is not the one I suggested :)) I'm asking because I'm not SQL Server oriented and for the last few days I have been seriously diving into SQL Server. Before, I was using it "superficially". Though I was impressed with T-SQL in 2000, 2005 made me an addict :)
|||I used the solution posted by Jared Ko