Monthly Archives: February 2013

Concurrent reading and writing in an Oracle database

I’m working on a job scheduler. I have an IMPORTJOBS table in an Oracle database, into which thousands of import jobs are inserted. Initially they all have the status ‘JOBEXTRACTED’.

Then I have a Quartz scheduler with 10 Quartz jobs executing in parallel. Each Quartz job retrieves the first import job from the IMPORTJOBS table where status is ‘JOBEXTRACTED’ and changes that status to ‘EXECUTIONINPROGRESS’ so that other Quartz jobs don’t retrieve this import job again.
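For context, the Quartz side looks roughly like the sketch below. This is a minimal sketch only, assuming Quartz.NET 2.x; the ImportJobExecutor job, the identities and the trigger interval are placeholders rather than the actual scheduler configuration.

// Minimal sketch (assumption: Quartz.NET 2.x). ImportJobExecutor is a hypothetical IJob
// that claims the next available import job and executes it.
public class ImportJobExecutor : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        // Claim the next import job (see PutNextImportJobToExecuting below) and run it.
    }
}

// Schedule 10 independent job/trigger pairs so that up to 10 imports run in parallel.
IScheduler scheduler = new StdSchedulerFactory().GetScheduler();
scheduler.Start();

for (var i = 0; i < 10; i++)
{
    IJobDetail job = JobBuilder.Create<ImportJobExecutor>()
        .WithIdentity("importJobExecutor" + i)
        .Build();

    ITrigger trigger = TriggerBuilder.Create()
        .WithIdentity("importJobExecutorTrigger" + i)
        .StartNow()
        .WithSimpleSchedule(s => s.WithIntervalInSeconds(5).RepeatForever())
        .Build();

    scheduler.ScheduleJob(job, trigger);
}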

Because multiple Quartz jobs do this in parallel, I ran into a concurrency problem: the same IMPORTJOBS row was sometimes retrieved by different Quartz jobs, so import jobs were executed more than once.

The solution was a locking strategy that enables a READ/LOCK/UPDATE/UNLOCK pattern, so that each import job is retrieved only once. This is known as pessimistic locking: the resource is locked from its first access in a transaction until the end of that transaction, so other transactions cannot access it in the meantime (as opposed to optimistic locking).

To do that, I created the following operation:

public Guid? PutNextImportJobToExecuting(string connectionString)
{
    Guid? id = null;

    const string sqlQuery = "SELECT * FROM IMPORTJOBS WHERE STATUSCODE = 'JOBEXTRACTED' AND ROWNUM <= 1 FOR UPDATE SKIP LOCKED";
    const string sqlUpdate = "UPDATE IMPORTJOBS SET STATUSCODE = 'EXECUTIONINPROGRESS' WHERE ID = :id";

    using (var transaction = new TransactionScope())
    {
        using (var oracleConnection = new OracleConnection(connectionString))
        {
            oracleConnection.Open();

            // Select and lock the first available import job; rows locked by other
            // transactions are skipped thanks to FOR UPDATE SKIP LOCKED.
            using (var command = oracleConnection.CreateCommand())
            {
                command.CommandText = sqlQuery;
                var returnedImportJobId = command.ExecuteScalar() as byte[];
                if (returnedImportJobId != null) id = new Guid(returnedImportJobId);
            }

            // Mark the locked row as being executed, still inside the same transaction.
            if (id.HasValue)
            {
                using (var command = oracleConnection.CreateCommand())
                {
                    command.CommandText = sqlUpdate;
                    command.Parameters.Add(":id", OracleDbType.Raw, id, ParameterDirection.Input);
                    command.ExecuteNonQuery();
                }
            }
        }
        transaction.Complete();
    }

    return id;
}

As you can see, we first select the first import job with status ‘JOBEXTRACTED’, using the ‘FOR UPDATE SKIP LOCKED’ clause. This makes sure the returned row remains locked for the current transaction, so other sessions running the same query skip it instead of retrieving it again.

Next, we update the status of this import job to ‘EXECUTIONINPROGRESS’ and commit. It is important that the SELECT and the UPDATE happen in the same transaction!

If you now call this operation in parallel from different threads, you always get back unique import job IDs.
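A quick way to verify this is to call the operation from many threads at once and check that no ID comes back twice. The following is a minimal sketch only; the connection string and the number of iterations are placeholders:

// Minimal sketch: claim jobs from many threads at once and check for duplicates.
var ids = new ConcurrentBag<Guid>();

Parallel.For(0, 100, _ =>
{
    Guid? id = PutNextImportJobToExecuting("Data Source=...;User Id=...;Password=...");
    if (id.HasValue) ids.Add(id.Value);
});

Console.WriteLine(ids.Count == ids.Distinct().Count()
    ? "All returned import job IDs are unique."
    : "Duplicate import job IDs detected!");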

Later on I noticed that under heavy load sometimes no new import job ID was found although there were still import jobs available. This is presumably because ROWNUM restricts the query to a single candidate row before the lock is attempted: if that one row happens to be locked by another session, it is skipped and nothing is returned, even though other unlocked rows exist. So I added some logic to retry a few times:

private Guid? PutNextImportJobToExecuting(string connectionString, short retries)
{
    short retryCounter = retries;
    while (retryCounter > 0)
    {
        Guid? id = PutNextImportJobToExecuting(connectionString);
        if (id.HasValue) return id;
        retryCounter--;
    }
    return null;
}

Bulk insert into an Oracle database

One of the things I worked on today was importing thousands of records (originating from an XML file) into an Oracle database from a .NET component. Normally the data access layer uses Entity Framework with Devart to handle database operations, but that was clearly not going to be a good solution here, as it would be far too slow. A quick test proved this: inserting 25,000 records took more than half an hour and locked up the database server during that time.

So I needed a better way to bulk import data into my Oracle database.

After some research I found two options:

  • ODP.NET (Oracle Data Provider for .NET)
  • Devart’s dotConnect for Oracle provider

Both allow you to do bulk inserts using something called array binding. This basically means that you can insert multiple records at once in one database call, thereby avoiding unnecessary round trips.

Because we were already using Devart in the company and have a license for it, I decided to use that one (the way of working is about the same with ODP.NET).

The table that I had to fill was called IMPORTJOBS:

CREATE TABLE IMPORTJOBS
(
  ID             RAW(16)              NOT NULL,
  IMPORTBATCHID  RAW(16)              NOT NULL,
  DATA           VARCHAR2(4000 BYTE)  NOT NULL,
  STATUSCODE     VARCHAR2(20 BYTE)    NOT NULL,
  RESULTID       RAW(16)
)

The component that has to call the bulk insert is an operation in the business layer. Because I like abstracting out components to reduce coupling and dependencies, I created an OracleBulkInsertProvider that contains the actual data access logic to insert the records, so that my operations in the business layer do not have to worry about the technical details.

public interface IOracleBulkInsertProvider
{
    void BulkInsertImportBatches(string connectionString, List<ImportJob> importJobs);
}

public class OracleBulkInsertProvider : IOracleBulkInsertProvider
{
    public void BulkInsertImportBatches(string connectionString, List<ImportJob> importJobs)
    {
        // One array of values per column, all in the same order as the import jobs.
        var ids = importJobs.Select(ib => ib.Id).ToList();
        var importBatchIds = importJobs.Select(ib => ib.ImportBatchId).ToList();
        var datas = importJobs.Select(ib => ib.Data).ToList();
        var statusCodes = importJobs.Select(ib => ib.ImportJobStatus.Code).ToList();

        const string sql = "insert into IMPORTJOBS (ID, IMPORTBATCHID, DATA, STATUSCODE) values (:id, :importBatchId, :data, :statusCode)";

        using (var oracleConnection = new OracleConnection(connectionString))
        {
            oracleConnection.Open();
            using (var command = oracleConnection.CreateCommand())
            {
                command.CommandText = sql;
                command.CommandType = CommandType.Text;

                // Bind an array of values to each parameter (array binding).
                command.Parameters.Add(":id", OracleDbType.Raw, ids.ToArray(), ParameterDirection.Input);
                command.Parameters.Add(":importBatchId", OracleDbType.Raw, importBatchIds.ToArray(), ParameterDirection.Input);
                command.Parameters.Add(":data", OracleDbType.VarChar, datas.ToArray(), ParameterDirection.Input);
                command.Parameters.Add(":statusCode", OracleDbType.VarChar, statusCodes.ToArray(), ParameterDirection.Input);

                // Execute the insert once for all rows in a single round trip.
                command.ExecuteArray(importJobs.Count);
            }
        }
    }
}

I think the code is straightforward. The idea is to create a single command, add the input parameters and bind an array of actual values to each parameter. At the end we simply call the ExecuteArray operation with the number of rows to insert.
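Calling the provider from the business layer then looks roughly like the sketch below. Note that the ImportJob and ImportJobStatus shapes shown here are assumptions that only mirror the properties used by the provider, and records stands for the data extracted from the XML file:

// Minimal usage sketch; ImportJob/ImportJobStatus are hypothetical shapes that only
// mirror the properties referenced by the provider above.
var importBatchId = Guid.NewGuid();
var importJobs = records // e.g. an IEnumerable<string> extracted from the XML file
    .Select(data => new ImportJob
    {
        Id = Guid.NewGuid(),
        ImportBatchId = importBatchId,
        Data = data,
        ImportJobStatus = new ImportJobStatus { Code = "JOBEXTRACTED" }
    })
    .ToList();

IOracleBulkInsertProvider bulkInsertProvider = new OracleBulkInsertProvider();
bulkInsertProvider.BulkInsertImportBatches(connectionString, importJobs);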

This way, inserting 25,000 records takes about 2 seconds.

Parsing large XML files

Today I had to parse large XML files in various ways: getting the value of an attribute, counting elements, and getting all elements and their contents.

There are various ways of parsing XML files: using DataSet, XmlSerializer, XPathDocument, XmlDocument, XDocument or XmlTextReader. Because my XML files could be large, I did not want to load them completely into memory before parsing them.

The obvious choice was therefore XmlTextReader, because it lets you advance to the next node or attribute without having to load the file completely.

For simplicity, assume that my xml file looks like this:

<?xml version="1.0" encoding="utf-8" ?> 
<personImport batchType="PmH">
<persons>
<person>
<senderData>
<name>Lincoln</name>
<fileCreationDate>25/09/2012</fileCreationDate>
</senderData>
<personData>
<lastName>Steward</lastName>
<firstName>Michael</firstName>
</personData>
</person>
<person>
<senderData>
<name>Mercator</name>
<fileCreationDate>25/09/2012</fileCreationDate>
</senderData>
<personData>
<lastName>Miles</lastName>
<firstName>David</firstName>
</personData>
</person>
</persons>
</personImport>

The first thing I needed to do was to retrieve the batchType value from the root element. So I created an ImportFileTypeResolver to do the work:

public class ImportFileTypeResolver : IImportFileTypeResolver
{
    public ImportFileType Resolve(string filePath)
    {
        using (var reader = new XmlTextReader(filePath))
        {
            reader.ReadToFollowing("personImport");
            switch (reader.GetAttribute("batchType"))
            {
                case "PmH": return ImportFileType.Person;
                default: return ImportFileType.Unknown;
            }
        }
    }
}
 
As you can see, convenience methods such as ReadToFollowing and GetAttribute make it very easy to advance to the next element and read an attribute.
 
The next thing was to return the number of person elements, so I created a PersonImportFileJobCountResolver:
 
public class PersonImportFileJobCountResolver : IImportFileJobCountResolver
{
    public int GetNumberOfJobs(string filePath)
    {
        using (var reader = new XmlTextReader(filePath))
        {
            var nodeCount = 0;
            while (reader.ReadToFollowing("person")) nodeCount++;
            return nodeCount;
        }
    }
}
 
Again, all I needed to do was call ReadToFollowing until the file has been read completely, increasing the counter on each iteration.
 
The third thing I wanted to do was to retrieve all person elements as Person objects. To do that I created a PersonImportBatchFileExtractor as follows:
 
public interface IImportBatchFileExtractor<TEntity> : IDisposable
{
    TEntity ExtractNext();
}

public class PersonImportBatchFileExtractor : IImportBatchFileExtractor<PersonImportBatchFileExtractor.Person>
{
    private XmlReader _xmlReader;

    public Person ExtractFirst(string personImportBatchFilePath)
    {
        _xmlReader = new XmlTextReader(personImportBatchFilePath);
        return ExtractNext();
    }

    public Person ExtractFirst(XDocument data)
    {
        var stream = new MemoryStream();
        data.Save(stream);
        stream.Position = 0;
        _xmlReader = XmlReader.Create(stream);
        return ExtractNext();
    }

    public Person ExtractNext()
    {
        if (_xmlReader == null) throw new ApplicationException("Call ExtractFirst before calling ExtractNext");

        // Advance to the next person element; at the end of the file there is no current node.
        _xmlReader.ReadToFollowing("person");
        if (_xmlReader.NodeType == XmlNodeType.None) return null;

        // Load only this person element into an XDocument and query it with XPath.
        var xDocument = XDocument.Parse(_xmlReader.ReadOuterXml());
        if (xDocument.Root == null) throw new ApplicationException("Something went wrong during parsing a person");

        var name = xDocument.Root.XPathSelectElement("senderData/name").Value;
        var fileCreationDate = xDocument.Root.XPathSelectElement("senderData/fileCreationDate").Value;
        var lastName = xDocument.Root.XPathSelectElement("personData/lastName").Value;
        var firstName = xDocument.Root.XPathSelectElement("personData/firstName").Value;

        return new Person(xDocument, new SenderData(name, DateTime.Parse(fileCreationDate)), new PersonData(firstName, lastName));
    }

    public void Dispose()
    {
        if (_xmlReader != null) _xmlReader.Close();
    }

    public class Person
    {
        public Person(XDocument xmlRepresentation, SenderData senderData, PersonData personData)
        {
            XmlRepresentation = xmlRepresentation;
            SenderData = senderData;
            PersonData = personData;
        }

        public XDocument XmlRepresentation { get; set; }
        public SenderData SenderData { get; set; }
        public PersonData PersonData { get; set; }
    }

    public class SenderData
    {
        public SenderData(string name, DateTime fileCreationDate)
        {
            Name = name;
            FileCreationDate = fileCreationDate;
        }

        public string Name { get; set; }
        public DateTime FileCreationDate { get; set; }
    }

    public class PersonData
    {
        public PersonData(string firstName, string lastName)
        {
            FirstName = firstName;
            LastName = lastName;
        }

        public string FirstName { get; set; }
        public string LastName { get; set; }
    }
}

 
This simply looks for the next person element, creates an XDocument from it and then uses XPath to find the elements within that person, builds a Person object and returns it.
 
Usage looks like this:
 
using (var personImportBatchFileExtractor = new PersonImportBatchFileExtractor())
{
    var person = personImportBatchFileExtractor.ExtractFirst(@"C:\temp\batchimport\pmh.xml");
    while (person != null)
    {
        // ... do something with person ...
        person = personImportBatchFileExtractor.ExtractNext();
    }
}

 

Expanding and compacting a virtual disk of a VirtualBox machine

Today I suddenly got errors due to low disk space in my VirtualBox virtual machine. It appeared that I only had a few bytes of free space left on my dynamically expanding hard disk of 60 GB, so I was getting into trouble!

To solve this, I closed all programs, removed unneeded data and applications on the virtual hard disk (by deleting or uninstalling them manually and then using the disk cleanup utility):

[Screenshot: the Disk Cleanup utility]

After that I was able to free up 3 GB, but of course this was not enough. It turns out you have to do a number of additional things to really shrink the virtual disk file, and you can make the disk a little bigger at the same time.

The next thing to do was to run the defragmenter. In my case, however, the disk was not fragmented, so this step was not really necessary.

Then I downloaded a tool called sdelete and executed it in the virtual machine in an elevated command prompt (run as administrator), as follows:

[Screenshot: running sdelete in an elevated command prompt]

This tool zeroes out unused bytes, which is needed before actually compacting the disk. After this, I shut down the virtual machine.

Because the format of my virtual disk was VMDK, I had to download a tool called CloneVDI, which lets you clone a virtual hard disk and at the same time compact and expand it. Cloning is also the safer method: if something goes wrong you haven’t touched your original hard disk, so you can still go back and try again:

[Screenshot: CloneVDI clone settings]

Note that I also checked the ‘Keep old UUID’ option, as I want the exact same machine (otherwise Windows would have to be reactivated). If the new size is bigger, make sure to also check ‘Increase partition size’ (alternatively you can extend the partition later in Disk Management of Windows 7).

The clone process may take a while to complete:

[Screenshot: CloneVDI clone progress]

The end result, however, is a virtual disk in native VDI format. Now it was just a matter of creating a new virtual machine and selecting the new VDI hard disk for it. But before you can do this, you have to remove the original hard disk from the Virtual Media Manager, otherwise it will complain that a disk with the same UUID already exists:

[Screenshot: VirtualBox Virtual Media Manager]

After this, starting the new virtual machine resulted in a blue screen. The reason is that the hard disk was added under the SATA controller, so I removed it and added it again under the IDE controller:

[Screenshot: the cloned hard disk attached under the IDE controller]

After that it worked as expected.

Toad for Oracle freeware corruption

After running out of disk space on my virtual hard drive in VirtualBox, starting the Schema Browser in Toad for Oracle resulted in a Stream Read Error. Uninstalling and reinstalling Toad did not resolve the issue. What did help was resetting Toad for Oracle, which you can do as follows:

  • Close Toad for Oracle.
  • Go to ‘c:\users\[user name]\AppData\Roaming\Quest Software\Toad for Oracle’ and rename the existing Toad folder; in my case I renamed ‘11.0 freeware’ to ‘_old_11.6 freeware’.
  • Start Toad for Oracle again.

You will have to add your connections again, but as the corrupted user data is now gone, the error is fixed as well.