Export Data from Flat Files to a Database: The Groovy Way

June 3, 2009

Recently I had to port a lot of data from flat files into a database. Looking at the alternatives, I thought it would be neat to do this with a small Groovy script. Here's the crux of the script:

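// Connect to the local MySQL database
def sql = groovy.sql.Sql.newInstance('jdbc:mysql://localhost:3306/test',
        'root', ' ', 'com.mysql.jdbc.Driver')

// Walk every file in the data directory and load it line by line
new File('C:/data/').eachFile { file ->
    file.eachLine { line ->
        def args = line.split(',')
        // Only lines with exactly three comma-separated fields are inserted
        if (args.length == 3) {
            def col1 = args[0]
            def col2 = args[1]
            def col3 = args[2]
            // Groovy binds the GString values as prepared-statement parameters
            sql.execute """
                INSERT INTO table (col1, col2, col3)
                VALUES (${col1}, ${col2}, ${col3})
            """
        }
    }
}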
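
The split-and-validate step of the script can be pulled out and checked on its own before any rows reach the database. Here is a minimal Java sketch of that step; the class name FlatFileRows and the method parse are made up for illustration and are not part of the original script. Like the script, it uses String.split(","), so trailing empty fields are dropped and a line such as "a,b," counts as two fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original Groovy script.
public class FlatFileRows {

    /**
     * Splits each line on commas and keeps only the lines that have
     * exactly expectedColumns fields, mirroring the length check in the script.
     */
    static List<String[]> parse(List<String> lines, int expectedColumns) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length == expectedColumns) {
                rows.add(fields);
            }
        }
        return rows;
    }
}
```

Each returned array could then be bound to a prepared INSERT statement, which is what the Groovy version does for every accepted line.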


Unit Testing Recipes

April 22, 2009

I often come across unit tests where the developer builds a List by hand and then compares a returned List against it element by element. As an example, consider the following test case:

public void testResultsHasSameElements() {
    List<String> expectedResults = new ArrayList<String>();
    expectedResults.add("James");
    expectedResults.add("Jack");

    List<String> customerNames = CustomerDTO.getCustomerNames();
    // Compare the two lists element by element
    assertEquals(expectedResults.size(), customerNames.size());
    for (int i = 0; i < customerNames.size(); i++) {
        assertEquals(expectedResults.get(i), customerNames.get(i));
    }
}

The test case above involves a lot of typing. We can simplify it using the fact that two Lists are equal if they contain equal elements at the same indexes. Another shortcut is the Arrays.asList method, which creates the expectedResults list in a single call.
The improved version of the test would look like this:

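public void testResultsHasSameElements() {
    // Arrays.asList builds the expected list in one call
    List<String> expectedResults = Arrays.asList("James", "Jack");

    List<String> customerNames = CustomerDTO.getCustomerNames();
    // List.equals compares size and elements in order
    assertEquals(expectedResults, customerNames);
}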
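
The List equality rule the improved test relies on can be checked in isolation. The sketch below is only an illustration; the class name ListEqualityDemo and its methods are made up for this example. It shows that a list from Arrays.asList equals an ArrayList with the same elements in the same order, and that order matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original test.
public class ListEqualityDemo {

    /** Two different List implementations holding the same elements in the same order. */
    static boolean sameOrder() {
        List<String> expected = Arrays.asList("James", "Jack");
        List<String> actual = new ArrayList<>();
        actual.add("James");
        actual.add("Jack");
        return expected.equals(actual);
    }

    /** Same elements, different order. */
    static boolean differentOrder() {
        List<String> expected = Arrays.asList("James", "Jack");
        List<String> actual = Arrays.asList("Jack", "James");
        return expected.equals(actual);
    }
}
```

Here sameOrder() returns true and differentOrder() returns false, so a single assertEquals on the two lists also catches ordering mistakes.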


Getting teams started quickly on projects

September 30, 2008

Getting a team started on a project takes a lot of time and resources. A few ways to mitigate that and get teams up to speed quickly are:
1) Use Buildix (http://buildix.thoughtworks.com/) to set up the project infrastructure: version control, continuous integration, agile project management, a wiki and a bug tracker. Buildix is open source under the Apache license and automates setting up the basic infrastructure required for the project.
2) Use Panopticode (http://www.panopticode.org/) to set up tools for gathering code metrics. Panopticode provides customized build scripts that integrate tools like Emma, CheckStyle, JDepend, JavaNCSS, Simian etc.
3) Use a VMware image to set up each developer box, so as to remove any discrepancies across developer environments.


Windows Shortcuts and useful tools for developers

September 28, 2008

Here is a list of Windows shortcuts and tools that every developer should know to work effectively.

Firefox Shortcuts
CTRL + number: switches between tabs in Firefox. For example, CTRL+1 selects the first tab, CTRL+2 the second tab, and so on.
Explorer Shortcuts
Alt+D: jumps to the address bar. The address bar has auto-completion, much like Tab in a shell.
Command Prompt
F7: shows the command history.
F8: searches the command history. Type the first few characters of a command and press F8 to complete it from the history.
Use pushd and popd. From inside a directory, pushd navigates to another directory and popd returns to the directory you came from. These commands work like a stack (LIFO), so prefer them over plain cd.

Tools
CLCL: a multi-clipboard utility
Command Prompt Explorer Bar: a command prompt docked inside Explorer
PowerToys
Tweak UI
TaskSwitch
Virtual Desktop Manager


Software Architecture Structures

August 28, 2008

This post talks about Software Architecture Structures and how they relate in the creation of an architecture. Software Architecture Structures can be divided into three broad categories:

1) Modules

2) Components and Connectors

3) Allocations

1) Modules: These are the units of implementation. They represent code-based views of the system, and the intent here is not related to any runtime considerations. Module-based structures can be further sub-divided into:

1. Decomposition: The units here are the submodules of a particular module. The intent is to break modules into smaller units until each unit is small enough to be understood. This is normally the starting point of high-level design and the subsequent low-level design. It also covers things like implementations and test plans. In addition, a decomposition can address questions about changes made to the system: the intent should be to keep changes local to a few submodules.
2. Uses: One unit uses another if the correctness of the first depends on the presence of a correct version of the second. The uses structure can be used to extend the capabilities of a system as well as to extract subsets of functionality for incremental development.
3. Layered: When the uses structure is carefully modeled, layers normally emerge. These layers are areas of common functionality. Layered structures can be used to assess the dependence of one layer on another and to ensure that a layer interacts only with the layers directly above or below it. In addition, layer n should depend for services only on layer n-1 and on no other.
4. Class or Generalization: The units in this structure are classes. Classes can be used to reason about collections of similar behavior or capability. The class structure is used for assessing reuse and the incremental addition of functionality.
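
The layering rule above can be checked mechanically once the uses relation is written down. The following Java sketch is one possible way to do it, not a standard tool: the class LayerCheck, the method violations and the module names in the usage note are all hypothetical. It assumes every module appears in the layer map, and it assumes that modules in the same layer may use each other, which the rule above does not address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical checker for the "layer n uses only layer n-1" rule.
public class LayerCheck {

    /**
     * layerOf maps a module name to its layer number (1 is the lowest layer).
     * uses maps a module name to the modules it uses.
     * Returns every "from -> to" dependency that skips a layer or points upward.
     */
    static List<String> violations(Map<String, Integer> layerOf,
                                   Map<String, List<String>> uses) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : uses.entrySet()) {
            int from = layerOf.get(entry.getKey());
            for (String target : entry.getValue()) {
                int to = layerOf.get(target);
                // Allowed: same layer (assumption) or the layer directly below
                if (to != from && to != from - 1) {
                    result.add(entry.getKey() + " -> " + target);
                }
            }
        }
        return result;
    }
}
```

For example, with a "ui" module in layer 3, "service" in layer 2 and "dao" in layer 1, a dependency from "ui" to "dao" is reported because it skips layer 2, while "ui" to "service" and "service" to "dao" pass.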

2) Components and Connectors: Component-and-connector structures help answer questions such as: What are the major executing components and how do they interact? What are the major shared data stores? Which parts of the system are replicated? How does data flow through the system? What parts of the system can run in parallel? How can the system's structure change as it executes? These structures can be further sub-divided into:

1. Processes: The process structure shows processes connected to each other through connectors, synchronizers, exclusions etc. This structure is important when looking at a system's performance and availability.
2. Concurrency: This structure helps determine the areas where resource contention can occur or where parallelism can be employed.
3. Shared data: This comprises the connectors and components that create, store or access shared persistent data. It shows how data is produced and consumed by the runtime components and can be used to ensure performance and data integrity.
4. Client-Server: This structure shows the components as clients and servers; the connectors are the underlying protocols used for communication.

3) Allocation Structures: These structures show the relation between the software elements and the environment in which the software executes. They can be classified as follows:

1. Deployment: The deployment structure shows how software is assigned to hardware-processing and communication elements. The elements are software, hardware entities and communication pathways. The relation is allocated-to, showing on which physical units the software elements reside. This view allows one to reason about performance, data integrity, availability, and security.
2. Implementation: This structure shows how modules are mapped to the file structure in the system's development, integration, or configuration environments. This is critical for the management of development activities and build processes.


Programming Problems for Job Interviews: Post I

August 25, 2008

Inspired by the programming problems blog, I decided to post a few problems that demonstrate some interesting concepts and are moderately tough. Some of these are inspired by Programming Pearls, a gem that every developer should read, re-read and own. Here's the first programming problem.

Given a sequence of characters, find the longest duplicated substring in it. For example, for the string 'Ask not what your country can do for you, but what you can do for your country.', ' can do for you' is the longest duplicated substring. How would you write a program to solve this problem?

Please do not post the solution on this site. You can mail your solutions to maneesh.chaturvedi@gmail.com or, alternatively, post them on your own blog.