I am not even sure if this is possible, but I want to be able to execute several statements (or SPs) at once from inside an SP (or any other method).
What I am doing is taking data from a single column, multiple rows, and combining it into one row (i.e. data1 + data2 + data3...). But I am doing this to a total of 2.1 million individual rows, and the result will be about 204k rows... What I have written is basically a nested loop and it works fine but very slow... slow as in it has been running for 24 hours now and it is about 60% done... I need this to finish in under 36 hours, preferably...
If I could handle more than one set of data at once (there will be no duplicates) I could speed up the process by however many I feel like working with...
So is there a way to execute a statement (sp or function) and go to the next statement without waiting for the first to finish?
Thanks Big Time... I hope this is possible.
OH.... SQL 2000 SP4 on Windows 2003.
You could open multiple connections from the client, but something tells me that there may be a better way to process your data.
Could you post the relevant schema and code?
|||AFAIK, within SQL Server 2000 you do not have that option. On a single CPU, I doubt the situation would be different under 2005 (but I'm not an MSSQL expert).
However, "24 hours" (or even hours) to process 2.1 million rows into 204K rows sounds very slow to me. Maybe if you select the data outside of SQL Server with an isolation level like read uncommitted, prepare it outside, and do a single bulk load, it would be much faster.
How do you do this? All in a single T-SQL batch?
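For what it's worth, a minimal sketch of the export-and-reload idea above, assuming hypothetical working/final tables named WorkingTable and FinalTable (the flat file path and the client-side concatenation step are also hypothetical):
-- Read the source rows without blocking writers (read uncommitted / NOLOCK).
SELECT ID_NUM, data, importance
FROM WorkingTable WITH (NOLOCK)
ORDER BY ID_NUM, importance
-- Concatenate per ID_NUM in the client program, write the finished rows to a
-- flat file, then reload them in a single bulk operation:
BULK INSERT FinalTable
FROM 'C:\work\final_rows.txt' -- hypothetical output file from the client step
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK)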
|||It is all within a single T-SQL SP... It is extremely slow and I can't seem to get it to use more processor or memory...
I am thinking of using a DTS package and working with multiple temp tables, then combining all the temp tables at the end... it should be quicker that way and I might be able to push my server harder (which is Enterprise Edition with 12GB of RAM and 2 processors)...
Current schema is:
Final Table = ID_NUM (Key, char(10)), data (varchar(6000))
Working table is ID_NUM, date varchar(100), importance (smallint) - key is ID_NUM and DATA
Working temp table data varchar(100), importance smallint -- no key, always less than 200 rows.
Plus one variable @data that is a varchar(6000)
Loop 1
Grab an ID_NUM
get list of data for ID_NUM into working table
Loop 2
Loop through till all rows have been added together or it goes over 6000 characters
@data = data + data + data
end loop 2
insert ID_NUM, @.DATA into final table
end loop 1
That is what I am doing... no cursor, just while loops...
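For concreteness, a rough sketch of that nested WHILE loop in T-SQL, assuming tables named WorkingTable and FinalTable matching the schema above (the real SP orders by importance rather than by data, and its exact control flow may differ):
DECLARE @ID_NUM char(10), @data varchar(6000), @piece varchar(100)
-- Loop 1: walk the distinct ID_NUM values
SELECT @ID_NUM = MIN(ID_NUM) FROM WorkingTable
WHILE @ID_NUM IS NOT NULL
BEGIN
    SET @data = ''
    -- Loop 2: append each data value for this ID_NUM until none are left
    -- or the next piece would push the string past 6000 characters
    SELECT @piece = MIN(data) FROM WorkingTable WHERE ID_NUM = @ID_NUM
    WHILE @piece IS NOT NULL AND DATALENGTH(@data) + DATALENGTH(@piece) <= 6000
    BEGIN
        SET @data = @data + @piece
        SELECT @piece = MIN(data) FROM WorkingTable
        WHERE ID_NUM = @ID_NUM AND data > @piece
    END
    INSERT INTO FinalTable (ID_NUM, data) VALUES (@ID_NUM, @data)
    SELECT @ID_NUM = MIN(ID_NUM) FROM WorkingTable WHERE ID_NUM > @ID_NUM
END
Each pass scans the working table several times per row, which is why this kind of row-by-row processing scales so poorly compared with the set-based approaches suggested below.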
|||William:
Where you say, "... date varchar (100), ... " for your "Working table" do you really mean "... data varchar (100) ... "?
|||Dave
Yes that is what I mean...
sorry... working on a million things at once and don't always proofread..
|||It sounds like you're just trying to concatenate all of your rows in the child table into a single row in a third table. If so, the following scenario may help.
http://databases.aspfaq.com/general/how-do-i-concatenate-strings-from-a-column-into-a-single-row.html
You'll likely not get much parallelism in the looping code that you described, so if you can find a set-based method of processing then you'll be much better off.
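For reference, on SQL Server 2005 and later (not on 2000, as noted further down the thread) the usual set-based way to build such a list is FOR XML PATH. A minimal sketch, using the same hypothetical WorkingTable/FinalTable names as above:
INSERT INTO FinalTable (ID_NUM, data)
SELECT w.ID_NUM,
       STUFF((SELECT ', ' + w2.data
              FROM WorkingTable w2
              WHERE w2.ID_NUM = w.ID_NUM
              ORDER BY w2.importance
              FOR XML PATH('')), 1, 2, '')
FROM WorkingTable w
GROUP BY w.ID_NUM
The varchar(6000) target column still limits how long each concatenated result can be, so the length concern mentioned below applies either way.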
|||That actually looks good... the problem will be hitting the varchar limit.
I will have to see if I hit it or not...
What fun... lol...
Thanks for all the help.
|||My apologies - I forgot you're on SQL Server 2000. The XML method of processing won't work in this situation. You still want a process where you can build the comma-separated list in a SET operation rather than one row at a time.
There's a method whereby you can assign a value to a variable in one row and use it in subsequent rows of an UPDATE statement. AFAIK, it's undocumented but it works in SQL 2005 as well so you're safe for a little while.
The pseudo-code would look something like this:
-- Place all of the data into a work table
CREATE TABLE #WorkTable (ID_NUM char(10), data varchar(6000), importance smallint, IdentCol INT IDENTITY, PRIMARY KEY(ID_NUM, IdentCol))
INSERT INTO #WorkTable (ID_NUM, data, importance)
SELECT ID_Num, data, Importance
FROM SourceTable
ORDER BY ID_NUM, Importance
-- Update work table. Each row has the data of itself + the row preceding it when there is a matching ID_NUM
DECLARE @ID_NUM CHAR(10)
DECLARE @data VARCHAR(6000)
SET @ID_NUM = '' -- Initialize this to a value that does not exist in the data
UPDATE #WorkTable
SET @data = data = CASE WHEN ID_NUM <> @ID_NUM THEN data
                        ELSE @data + data
                   END
   ,@ID_NUM = ID_NUM
INSERT INTO FinalTable(ID_NUM, Data)
SELECT ID_NUM, MAX(DATA)
FROM #WorkTable
GROUP BY ID_NUM
In case the pseudo-code doesn't work, here's a working sample against the pubs sample database:
CREATE TABLE #JobTitles (job_id INT, job_desc VARCHAR(100), employees VARCHAR(1000), EmployeeName VARCHAR(100),
    PRIMARY KEY (job_id, EmployeeName))
DECLARE @job_id INT, @EmployeeNames VARCHAR(1000)
INSERT INTO #JobTitles (job_id, job_desc, EmployeeName)
SELECT J.job_id, J.job_desc, E.lname + ', ' + E.fname AS EmployeeName
FROM jobs J
INNER JOIN employee E ON J.job_id = E.job_id
SET @job_id = -1 -- Initialize this to a number that does not exist
                 -- in the list of values.
UPDATE #JobTitles
SET @EmployeeNames = employees = CASE
                                     WHEN @job_id <> job_id THEN EmployeeName
                                     ELSE @EmployeeNames + '; ' + EmployeeName
                                 END
  , @job_id = job_id
-- Grab the largest row of each grouping.
SELECT job_id, job_desc, MAX(employees) AS EmployeeList FROM #JobTitles GROUP BY job_id, job_desc ORDER BY job_id
DROP TABLE #JobTitles
|||William:
Here is yet another example. I mocked this up with 2.1 million rows. I was able to get this to run in about 3 minutes and 20 seconds. This problem is definitely easier to solve with 2005 than 2000 and there has to be a better way of doing this, but this example might also help you see some direction; I hope it is helpful.
Dave
I used these two tables and faked the data:
create table dbo.final
( [key] char (10) not null,
  [data] varchar (6000) not null
)

create table workingTable
( ID_Num varchar (10) not null,
  [data] varchar (100) not null,
  importance smallint not null,
  constraint pk_mockInput primary key (ID_Num, [data])
)
insert into workingTable
select convert (varchar (5), a.iter),
       '{ ' + convert (varchar (5), b.iter) + ' }'
         + replicate ('-', 1 + b.iter % 29 ) + '-->',
       b.iter
from small_iterator a (nolock)
inner join small_iterator b (nolock)
  on a.iter <= 21300
 and b.iter <= 40 + a.iter % 119
 and b.iter <= 200
order by a.iter,
         b.iter

update statistics workingTable
exec sp_recompile workingTable
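(small_iterator isn't defined in the post; it appears to be a small numbers table with a single iter column. A minimal sketch of one, assuming values 1 through 32767, in case you want to reproduce the mock data:)
create table dbo.small_iterator (iter int not null primary key)
declare @i int
set @i = 1
while @i <= 32767
begin
    insert into dbo.small_iterator (iter) values (@i)
    set @i = @i + 1
end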
I would prefer not to use the function because there is a fair amount of overhead associated with functions; nonetheless, I implemented the method using the following function:
create function dbo.assembleData
( @arg_key varchar (10),
  @arg_minData varchar (100),
  @arg_maxData varchar (100)
)
returns varchar (6000)
as
begin

    declare @retValue varchar (6000)
    set @retValue = ''

    select @retValue = @retValue + [data]
    from workingTable
    where [id_num] = @arg_key
      and [data] >= @arg_minData
      and [data] <= @arg_maxData

    return ( @retValue )

end
I then tested out the overall procedure with this query:
|||--truncate table dbo.final
-- The whole thing takes about 3 minutes and 20 seconds
create table #sequencer
( [key] varchar (10) not null,
  [data] varchar (100) not null,
  importance smallint not null,
  seq integer not null,
  siz integer not null,
  constraint pk_#sequencer primary key ([key], [data])
)

insert into #sequencer
select a.id_num,
a.[data],
a.importance,
count(*) as seq,
sum (datalength (b.[data])) as siz
from workingTable a
inner join workingTable b
on a.id_num = b.id_num
and b.[data] <= a.[data]
group by a.id_num, a.[data], a.importance
order by a.id_num, a.[data]

insert into dbo.final
select [key],
dbo.assembleData ( [key], min_data, max_data )
from ( select [key],
min ([data]) as min_data,
max ([data]) as max_data,
min (seq) as min_seq,
max (seq) as max_seq,
case when siz = 0 then 0
else siz - 1
end / 6000 as segment
from #sequencer
group by [key],
case when siz = 0 then 0
else siz - 1
end / 6000
       ) x

drop table #sequencer
select top 20 *
from dbo.final
|||I will give that method a try...
I have been trying all kinds of things, including DTS with multiple connections;
all have been unsuccessful at producing speedy results... my last attempt took about 10 hours.
THAT WORKED GREAT!!!!!
My job now completes in 1 min 43 sec.... HUGE improvement...
Thanks.
|||Great that you solved it.
Would you tell me which method you used? (I know it's not the one I suggested :)) I'm asking because I'm not SQL Server oriented and have spent the last few days seriously diving into SQL Server; before that I was using it "superficially". Though I was impressed with T-SQL in 2000, 2005 made me an addict :)
|||I used the solution posted by Jared Ko