Monday, March 19, 2012

Log File Help!

I have a database that has a data file of size 50GB (500 million records). It's a lot of data. For speed, we have set up a 9-disk SCSI RAID 5 (4 drives striped, mirrored, and parity). Total space of 130GB. As I said, the data file consumes 50GB.

I needed to run a pretty vanilla SQL INSERT/UPDATE on the data. 7 hours later, the query failed with a "log file full" error. The log file was at 80+GB and shortly afterward dropped to 1MB by itself.

I'm running SQL Server 2000 SP3. Recovery model is simple and auto shrink is enabled. What could possibly consume 80GB of disk? That is way larger than the amount of data that I have!

What is the best solution? Get a 200GB IDE disk to house the log file?

I need a solution fast! This project is way over schedule; these 7-hour statement run times are killing me.

Thank you in advance!

You will be better off breaking your operation into a separate insert and then an update, but I'd recommend replacing the insert with BULK INSERT, and the update with a bcp ... out that produces a file already containing the desired values; then truncate the table and bulk insert the clean "updated" data. From what you describe, it sounds like you are updating the majority of the 500 million records.
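A rough sketch of that bcp-out / truncate / bulk-insert approach on SQL Server 2000. The table, column, file, and server names below are placeholders, and the export query is assumed to write out rows that already contain the "updated" values:

-- 1. Export the "updated" rows to a flat file (run from a command prompt;
--    -c = character format, -T = trusted connection, -S = server name):
--
--    bcp "SELECT b.KeyValue, l.[ID] FROM MyDB..BigTable b JOIN MyDB..Lookup l ON l.KeyValue = b.KeyValue" queryout C:\load\bigtable_updated.txt -c -T -S MYSERVER

-- 2. Empty the table; TRUNCATE is minimally logged, unlike DELETE:
TRUNCATE TABLE BigTable

-- 3. Reload the clean data; with TABLOCK and no indexes or triggers this is minimally logged:
BULK INSERT BigTable
FROM 'C:\load\bigtable_updated.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK)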

"SCSI RAID 5 (4 drives striped, mirrored, and parity)"

I don't know how you set this up, but this is not RAID 5. What you should probably have done from a performance perspective is put your logs on RAID 1 or 10 and the data files on RAID 5.

In addition, it does sound like you need to break your operation up into smaller steps. If you want to post the script, we can probably give you some quick pointers on how to make it run faster and not use as much log.
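One way to take a big statement in "smaller steps" on SQL Server 2000 is to batch it with SET ROWCOUNT. This is only a sketch with placeholder table and column names, and it assumes rows that still need work can be identified (here, by a NULL LookupID):

-- Batch a large UPDATE so committed log space can be reused between batches
-- under the simple recovery model.
SET ROWCOUNT 100000               -- cap the rows affected by each statement

WHILE 1 = 1
BEGIN
    UPDATE b
    SET    LookupID = l.[ID]
    FROM   BigTable b
    JOIN   Lookup l ON l.KeyValue = b.KeyValue
    WHERE  b.LookupID IS NULL     -- only rows not yet processed

    IF @@ROWCOUNT = 0 BREAK       -- nothing left to update

    CHECKPOINT                    -- simple recovery: allow the log to truncate between batches
END

SET ROWCOUNT 0                    -- turn the limit back off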

Thanks for the feedback!

Last night, I ran out and bought a 250GB IDE disk, put the log file on that, and reran. It finished in 5.5 hours and consumed 150GB of log (which shortly afterward shrank down to 1MB). That's a damn lot of log space! My entire data file is only 50GB!

Here is the exact SQL I ran:

INSERT INTO Domains (Domain)
SELECT DISTINCT Domain FROM Stages WHERE Domain NOT IN (SELECT Domain FROM Domains)

UPDATE Stages SET DomainID = (SELECT Domains.[ID] FROM Domains WHERE Domains.Domain = Stages.Domain)

Now for the more complex TSQL... Hopefully the 250GB will hold and everything will be complete in 48 hours...

I don't suppose you've wrapped a BEGIN TRANSACTION around this whole thing, have you? Just for fun, put a CHECKPOINT between each of your statements. This shouldn't help keep the log small, but it will force it to write to disk after each statement. You will want to begin and commit each statement separately as well.
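A minimal sketch of that suggestion applied to the two statements posted above, with each statement begun and committed on its own and a CHECKPOINT in between:

BEGIN TRANSACTION
    INSERT INTO Domains (Domain)
    SELECT DISTINCT Domain FROM Stages WHERE Domain NOT IN (SELECT Domain FROM Domains)
COMMIT TRANSACTION

CHECKPOINT   -- force dirty pages to disk before the next statement

BEGIN TRANSACTION
    UPDATE Stages SET DomainID = (SELECT Domains.[ID] FROM Domains WHERE Domains.Domain = Stages.Domain)
COMMIT TRANSACTION

CHECKPOINT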

No, I didn't use transactions at all. I ran the exact SQL that I posted in a Query Analyzer window.

That piece is done. However, if you can suggest an optimization for the dedup process, I would be really grateful. I'm posting that under a separate thread: "Optimize Dedup TSQL?"
