Wolfmarsh
I figured I'd turn to TWW to see if anyone has any suggestions; I'm running low on them.
I have a huge dataset (50 million rows) in Table A that needs to be moved to Table B in SQL Server 2008.
Table A has to have some stuff done to it programmatically, so it has to be processed somehow in .NET.
Table A also has very few constraints on it, whereas Table B has stricter constraints.
I would like to know which specific rows from Table A get rejected by Table B's constraints.
The slowest, easiest method is to read Table A into a DataSet and roll through it, building an INSERT statement for each row and executing it against Table B. If the row throws an exception, log that somewhere. This runs at around 1,500 records per second.
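In code, the row-by-row version looks roughly like this (a minimal sketch; SourceA, DestB, and the column names are placeholders, not the real schema):

[code]
// Minimal sketch of the row-by-row approach.
using System;
using System.Data;
using System.Data.SqlClient;

static void CopyRowByRow(string connString)
{
    using (var conn = new SqlConnection(connString))
    {
        conn.Open();
        var source = new DataTable();
        new SqlDataAdapter("SELECT Id, FirstName, LastName FROM SourceA", conn).Fill(source);

        foreach (DataRow row in source.Rows)
        {
            try
            {
                using (var cmd = new SqlCommand(
                    "INSERT INTO DestB (Id, FirstName, LastName) VALUES (@id, @fn, @ln)", conn))
                {
                    cmd.Parameters.AddWithValue("@id", row["Id"]);
                    cmd.Parameters.AddWithValue("@fn", row["FirstName"]);
                    cmd.Parameters.AddWithValue("@ln", row["LastName"]);
                    cmd.ExecuteNonQuery();
                }
            }
            catch (SqlException ex)
            {
                // Row-level reporting: log the offending row and keep going.
                Console.Error.WriteLine("Row {0} rejected: {1}", row["Id"], ex.Message);
            }
        }
    }
}
[/code]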
The next fastest I have found uses two DataAdapters and a DataSet with two tables: move the rows from DataTable A to DataTable B, then call DataAdapterB.Update. Do this in chunks of about 100,000 records to keep it from timing out or running out of memory. This runs at around 2,000-2,500 records per second.
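Roughly like this (again just a sketch; it assumes the two tables' columns line up, and all names are placeholders):

[code]
// Sketch of the two-adapter, chunked approach.
using System;
using System.Data;
using System.Data.SqlClient;

static void CopyInChunks(string connString, int chunkSize)
{
    using (var conn = new SqlConnection(connString))
    {
        conn.Open();
        var adapterA = new SqlDataAdapter("SELECT * FROM SourceA", conn);
        var adapterB = new SqlDataAdapter("SELECT * FROM DestB", conn);
        var builder = new SqlCommandBuilder(adapterB); // generates the INSERT for Update()
        adapterB.ContinueUpdateOnError = true;         // failed rows get RowError set instead of aborting

        int offset = 0;
        while (true)
        {
            var ds = new DataSet();
            if (adapterA.Fill(ds, offset, chunkSize, "A") == 0) break;
            adapterB.FillSchema(ds, SchemaType.Source, "B");

            foreach (DataRow src in ds.Tables["A"].Rows)
            {
                var dest = ds.Tables["B"].NewRow();
                dest.ItemArray = src.ItemArray; // assumes identical column order
                ds.Tables["B"].Rows.Add(dest);
            }

            adapterB.Update(ds, "B");
            foreach (DataRow bad in ds.Tables["B"].GetErrors())
                Console.Error.WriteLine("Rejected: {0}", bad.RowError);

            offset += chunkSize;
        }
    }
}
[/code]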
The ultimate in speed I found was a SqlBulkCopy object. This runs at around 8,000 records per second. The caveat is that the data being copied has to fit within the constraints of Table B or the entire bulk operation fails, and I also lose row-level reporting on exceptions.
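The bulk copy itself is about this simple (sketch; "DestB" is a placeholder):

[code]
// Sketch of the SqlBulkCopy approach. Fastest, but all-or-nothing:
// one bad row fails the whole WriteToServer call.
using System.Data;
using System.Data.SqlClient;

static void BulkCopy(DataTable chunk, string connString)
{
    // CheckConstraints makes the copy enforce CHECK/FK constraints too.
    using (var bulk = new SqlBulkCopy(connString, SqlBulkCopyOptions.CheckConstraints))
    {
        bulk.DestinationTableName = "DestB"; // placeholder
        bulk.BatchSize = 100000;
        bulk.BulkCopyTimeout = 0;            // disable the timeout
        bulk.WriteToServer(chunk);           // throws on any constraint violation
    }
}
[/code]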
Does anyone know of any faster ways of getting this done? This is starting to frustrate the hell out of me.
3/24/2009 9:33:41 PM
dakota_man
Not specifically
But I'd probably use the SqlBulkCopy. When a bulk fails, split it and retry a few times, then take the still-failing bulks and go through those row by row.
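Something like this, maybe (rough sketch; "DestB" is a placeholder):

[code]
// Rough sketch of the split-and-retry idea: bulk copy a chunk; on failure,
// bisect until single rows remain, then log those individually.
using System;
using System.Data;
using System.Data.SqlClient;

static void CopyOrBisect(DataRow[] rows, string connString)
{
    if (rows.Length == 0) return;
    try
    {
        using (var bulk = new SqlBulkCopy(connString, SqlBulkCopyOptions.CheckConstraints))
        {
            bulk.DestinationTableName = "DestB"; // placeholder
            bulk.WriteToServer(rows);
        }
    }
    catch (SqlException ex)
    {
        if (rows.Length == 1)
        {
            // Down to a single row: this is the offender, so log it.
            Console.Error.WriteLine("Row rejected: {0}", ex.Message);
            return;
        }
        int mid = rows.Length / 2;
        var left = new DataRow[mid];
        var right = new DataRow[rows.Length - mid];
        Array.Copy(rows, 0, left, 0, mid);
        Array.Copy(rows, mid, right, 0, right.Length);
        CopyOrBisect(left, connString);
        CopyOrBisect(right, connString);
    }
}
[/code]

Good rows still move at bulk speed, and each bad row only costs a handful of extra bulk attempts.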
[Edited on March 24, 2009 at 9:38 PM. Reason: seems like something should do that for you though]
3/24/2009 9:36:55 PM
qntmfred
been there, feel your pain. don't really have any great suggestions
3/24/2009 9:40:32 PM
ScHpEnXeL
just copy paste that shit
3/24/2009 9:45:13 PM
HaLo
Quote : | "Table A has to have some stuff done to it programmatically, so it has to be processed somehow in .net." |
I don't understand this point. Do you mean that after the programmatic processing the data then needs to be added to Table B, or is that just why you need to use .NET, with no processing before the move to Table B?
Can you create a result set that selects only the rows from A that meet B's restrictions, then insert that set, or just do a make-table query with it? (Don't know if make-table works with SQL 2008.)
3/24/2009 9:48:56 PM
Wolfmarsh
Thanks for the suggestions so far. To answer your questions:
Quote : | "don't understand this point, do you mean after the programmatic processing, the data then needs to be added to table b, or just thats why you need to use .net, no processing before move to table b " |
Table A's structure doesn't match Table B's exactly. Table A will be populated by non-technical people, so certain "liberties" were taken with its design. For example, let's say Table A has biographical information in it, with fields like FirstName, LastName, Title, Suffix. In Table A, we allow them to put "Mr." in the Title field. In Table B, that's actually a GUID that's a foreign key to a code table. I know I can do a join, but I also need to be able to call some functions to add code table entries if necessary.
This also has to be a repeatable process, and there are hundreds of source tables and hundreds of destination tables, not just a simple A to B. Basically, we're converting from one database schema to another.
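The look-up-or-create step is roughly this (hypothetical sketch; TitleCodes, Id, and Code are made-up names, and it ignores concurrent inserts):

[code]
// Hypothetical look-up-or-create for the Title -> GUID code-table mapping.
using System;
using System.Data.SqlClient;

static Guid GetOrAddTitleCode(string title, SqlConnection conn)
{
    // Try to find an existing code-table entry first.
    using (var find = new SqlCommand("SELECT Id FROM TitleCodes WHERE Code = @code", conn))
    {
        find.Parameters.AddWithValue("@code", title);
        object id = find.ExecuteScalar();
        if (id != null) return (Guid)id;
    }

    // Not there: create it and return the new key.
    var newId = Guid.NewGuid();
    using (var insert = new SqlCommand("INSERT INTO TitleCodes (Id, Code) VALUES (@id, @code)", conn))
    {
        insert.Parameters.AddWithValue("@id", newId);
        insert.Parameters.AddWithValue("@code", title);
        insert.ExecuteNonQuery();
    }
    return newId;
}
[/code]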
Quote : | "can you create a tableset that only selects items from A that meet B's restrictions, then insert this tableset or just maketable with it (don't know if maketable works with sql2008)" |
It's actually really fast to flip through a DataSet in code and throw exceptions when I know I've violated B's restrictions; the issue is that it would be nice not to have to code for those restrictions at all. By throwing all of the data at the destination table and letting SQL Server throw the exceptions on the bad rows, the program doing the moving doesn't have to understand the constraints in the destination.
I also don't have control over the database design for the destination, so if they release a new database with changed constraints and the program is hard-coded to work within the old ones, someone has to go back and update the program.
This would be much easier if DataTables in .NET allowed the full set of constraints that actual SQL tables allow (minus foreign keys; I can deal with those).
3/24/2009 10:15:55 PM