esProc Joins Text Files and Generates a Computed Column

There are two tab-separated structured text files. The chr column in AssociatedMarkers.txt is a logical foreign key pointing to the Chr column in DiseaseMarkers.txt. We want to create a new structured text file with two columns. The first comes from the snps_BCG24 column of AssociatedMarkers.txt. The second is a computed column whose values follow this rule: if a value in the hg19pos column of AssociatedMarkers.txt falls between startLoc and endLoc of the associated DiseaseMarkers.txt row, output inLocus; otherwise output an empty string. Samples of the two files are shown below:

AssociatedMarkers.txt

esProc_text_computed_column_1

DiseaseMarkers.txt

esProc_text_computed_column_2

esProc approach

esProc_text_computed_column_3

A1, A2: Import the files into memory. The @t option reads the first line of each file as column names.
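For readers who do not use esProc, the import step can be approximated in Python. This is only a sketch with the standard csv module, not the esProc script itself; the helper name read_tsv and the sample rows are made up for illustration, since the real file contents are not reproduced here.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text whose first line holds the column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


# Hypothetical sample rows in the shape of AssociatedMarkers.txt.
sample = (
    "snps_BCG24\tchr\thg19pos\n"
    "rs0001\t1\t1500\n"
    "rs0002\t2\t900\n"
)

rows = read_tsv(sample)
print(rows[0]["snps_BCG24"], rows[0]["hg19pos"])
```

Note that the csv module returns every value as a string, so numeric columns such as hg19pos have to be converted before they are compared.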

A3: Perform the join operation. The result is as follows:

esProc_text_computed_column_4
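The shape of the joined result can be sketched in Python as well. The sketch assumes an inner join (rows without a matching chromosome are dropped) and assumes that Chr is unique in DiseaseMarkers.txt, as the foreign-key description suggests. The aliases _1 and _2 mirror the _1.hg19pos notation used for A4; the function name join_on_chr and the sample rows are hypothetical.

```python
def join_on_chr(associated, disease):
    """Pair each AssociatedMarkers row (_1) with the DiseaseMarkers row (_2)
    that has the same chromosome. Unmatched rows are dropped (inner join,
    an assumption of this sketch)."""
    by_chr = {d["Chr"]: d for d in disease}  # assumes Chr is unique
    return [
        {"_1": a, "_2": by_chr[a["chr"]]}
        for a in associated
        if a["chr"] in by_chr
    ]


# Hypothetical sample rows, already parsed into dictionaries.
associated = [
    {"snps_BCG24": "rs0001", "chr": "1", "hg19pos": "1500"},
    {"snps_BCG24": "rs0002", "chr": "3", "hg19pos": "100"},
]
disease = [
    {"Chr": "1", "startLoc": "1000", "endLoc": "2000"},
    {"Chr": "2", "startLoc": "200", "endLoc": "300"},
]

joined = join_on_chr(associated, disease)
print(joined[0]["_1"]["hg19pos"], joined[0]["_2"]["startLoc"])
```

Each joined record keeps both source rows intact, which is why a later step can address a field as _1.hg19pos or _2.startLoc.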

A4: Retrieve the desired columns from A3. The _1.hg19pos column corresponds to the hg19pos column of AssociatedMarkers.txt. The final result is as follows:

esProc_text_computed_column_5
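The computed column itself can be sketched in Python as follows. The sketch starts from already joined records, treats the startLoc and endLoc bounds as inclusive (the text only says the value "falls within" them), and writes the result as tab-separated text. The output column name locus, the function names, and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical joined records: _1 is an AssociatedMarkers row,
# _2 is the matching DiseaseMarkers row.
joined = [
    {"_1": {"snps_BCG24": "rs0001", "hg19pos": 1500},
     "_2": {"startLoc": 1000, "endLoc": 2000}},
    {"_1": {"snps_BCG24": "rs0002", "hg19pos": 9000},
     "_2": {"startLoc": 1000, "endLoc": 2000}},
    {"_1": {"snps_BCG24": "rs0003", "hg19pos": 300},
     "_2": {"startLoc": 200, "endLoc": 300}},
]


def locus_label(pair):
    """Return 'inLocus' when hg19pos lies in [startLoc, endLoc], else ''."""
    pos = pair["_1"]["hg19pos"]
    inside = pair["_2"]["startLoc"] <= pos <= pair["_2"]["endLoc"]
    return "inLocus" if inside else ""


def to_tsv(pairs):
    """Render the two output columns as tab-separated text with a header."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["snps_BCG24", "locus"])  # 'locus' is a made-up name
    for pair in pairs:
        writer.writerow([pair["_1"]["snps_BCG24"], locus_label(pair)])
    return out.getvalue()


result = to_tsv(joined)
print(result, end="")
```

Whether the boundaries should be inclusive depends on how the locus coordinates are defined in the real data, so the comparison operators may need adjusting.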

About datathinker

A technical consultant on database performance optimization, database storage expansion, and off-database computation. Personal blog: datakeywrod; website: raqsoft