So I had to write a Perl script that:

·         finds all the k-mers of length 15 in the 2L chromosome of Drosophila melanogaster

·         loads them into a hash

·         counts the number of occurrences of each k-mer

 

A k-mer is a contiguous subsequence of length k taken from a longer sequence. For example, the 3-mers of ATGCA are ATG, TGC, and GCA.
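
A minimal sketch of the counting step, assuming the chromosome sequence is already held in one string. The sub name count_kmers is my own, and upper-casing the sequence first is an assumption (it treats soft-masked lowercase bases the same as uppercase ones). The window slides one base at a time and each substring becomes a hash key.

```perl
use strict;
use warnings;

# Count every k-mer of length $k in $seq using a sliding window.
# Returns a reference to a hash mapping k-mer => number of occurrences.
sub count_kmers {
    my ($seq, $k) = @_;
    my %count;
    $seq = uc $seq;                       # assumption: ignore soft-masking case
    my $last_start = length($seq) - $k;   # last position where a full window fits
    for my $i (0 .. $last_start) {        # empty range if the sequence is shorter than $k
        my $kmer = substr($seq, $i, $k);
        $count{$kmer}++;
    }
    return \%count;
}

# Tiny in-memory example; the real input would be the chr2L sequence with $k = 15.
my $counts = count_kmers('ATGATGA', 3);
print "$_\t$counts->{$_}\n" for sort keys %$counts;
```

On the toy sequence ATGATGA this counts ATG twice, GAT once, and TGA twice.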

 

I need to loop through the hash and print each k-mer on a line, followed by a tab, then the number of occurrences of that k-mer.
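
A sketch of that printing loop, assuming the counts live in a hash keyed by k-mer. The sub name print_kmer_counts and the example hash are hypothetical. Sorting the keys is my own choice: Perl hashes have no guaranteed order, so sorting only makes the output reproducible.

```perl
use strict;
use warnings;

# Print each k-mer, a tab, and its count, one pair per line.
# $counts is a hash reference (k-mer => count); $fh defaults to STDOUT.
sub print_kmer_counts {
    my ($counts, $fh) = @_;
    $fh //= \*STDOUT;
    for my $kmer (sort keys %$counts) {   # sorted only for reproducible output
        print {$fh} "$kmer\t$counts->{$kmer}\n";
    }
    return;
}

# Hypothetical counts standing in for the real k-mer hash.
my %example = (ATG => 2, GAT => 1, TGA => 2);
print_kmer_counts(\%example);
```

For a hash as large as a whole chromosome's k-mers, iterating with each instead of sort keys would avoid building a sorted key list in memory, at the cost of an arbitrary output order.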

 

Then I have to open a file handle for writing output to uniqueKmersEndingGG.fasta, change the window length from 15 to 23, go through the hash of k-mers, print only the first 1000 that end with GG, and put a FASTA header before each k-mer.
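
A sketch of the FASTA output step under a few labeled assumptions: I read "unique" in the file name as "occurs exactly once", the header format >kmer_N is made up for illustration, and "first 1000" is taken in sorted key order because hash order is arbitrary. The sub name write_gg_kmers and the example hash are hypothetical.

```perl
use strict;
use warnings;

# Write up to $limit k-mers that end in "GG" to $fh in FASTA format.
# Assumption: "unique" (from the output file name) means the k-mer
# occurs exactly once; drop the count test if that is not intended.
sub write_gg_kmers {
    my ($counts, $fh, $limit) = @_;
    my $written = 0;
    for my $kmer (sort keys %$counts) {
        last if $written >= $limit;
        next unless $counts->{$kmer} == 1;   # assumed meaning of "unique"
        next unless $kmer =~ /GG\z/;         # keep only k-mers ending in GG
        $written++;
        # Hypothetical header format: ">kmer_1", ">kmer_2", ...
        print {$fh} ">kmer_$written\n$kmer\n";
    }
    return $written;
}

# Hypothetical counts; the real hash would hold 23-mers from chr2L.
my %example = (ACGG => 1, TTGG => 3, CCGG => 1, ACGT => 1);
open(my $out, '>', 'uniqueKmersEndingGG.fasta')
    or die "Cannot open uniqueKmersEndingGG.fasta: $!";
my $n = write_gg_kmers(\%example, $out, 1000);
close($out) or die "Cannot close uniqueKmersEndingGG.fasta: $!";
print "Wrote $n k-mers\n";
```

With the toy hash, only ACGG and CCGG pass both filters, so the file gets two records.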

 

Vocab:

·         Drosophila melanogaster : fruit fly

·         2L chromosome of Drosophila melanogaster

·         k-mers of length 15

·         FASTA

·         k-mers of length 15 ending with GG in the 2L chromosome of Drosophila melanogaster

Perl References

·         http://stackoverflow.com/questions/5948360/perl-read-a-file-into-an-array

·         http://www.perlmonks.org/?node_id=73439

 

 

Biology References

·         http://flybase.org/reports/FBsp00000001.html

·         https://www.biostars.org/p/16396/

·        mysql  --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,size from chromInfo limit 5'

·        The sequences are long strings : http://genome.ucsc.edu/cgi-bin/hgTracks?db=dm3&chromInfoPage=

·        http://blast.ncbi.nlm.nih.gov/Blast.cgi

·        FASTA Format http://prodata.swmed.edu/promals/info/fasta_format_file_example.htm

·        http://en.wikipedia.org/wiki/FASTA_format

·        http://code.izzid.com/2011/10/13/How-to-write-a-fasta-file-in-perl.html
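
Since the input is a FASTA file, here is a sketch of reading one record into a single long string. It relies only on the basics of the format: a header line starting with ">" followed by sequence lines. The sub name read_fasta_sequence is hypothetical, and the temporary file stands in for the real chr2L download.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read the first record of a FASTA file and return its sequence as one string.
# Header lines start with ">"; sequence lines are joined with line endings removed.
sub read_fasta_sequence {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    my $seq = '';
    my $seen_header = 0;
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;           # strip newline (and any trailing \r)
        if ($line =~ /^>/) {
            last if $seen_header;     # stop at the start of a second record
            $seen_header = 1;
            next;
        }
        $seq .= $line;
    }
    close($in);
    return $seq;
}

# Small temporary file standing in for the real chromosome file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} ">chr2L\nACGT\nacgg\n";
close($tmp_fh);
print read_fasta_sequence($tmp_path), "\n";   # prints ACGTacgg
```

Reading line by line keeps memory use close to the size of the sequence itself, instead of first slurping every line into an array.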


 

 

mysql  --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,size from chromInfo limit 5'

 

--defaults-file=#       Only read default options from the given file #.

--defaults-extra-file=# Read this file after the global files are read.

--defaults-group-suffix=#

                        Also read groups with concat(group, suffix)

--login-path=#          Read this path from the login file.

 

Variables (--variable-name=value)

and boolean options {FALSE|TRUE}  Value (after reading options)

--------------------------------- ----------------------------------------

auto-rehash                       TRUE

auto-vertical-output              FALSE

bind-address                      (No default value)

character-sets-dir                (No default value)

column-type-info                  FALSE

comments                          FALSE

compress                          FALSE

debug-check                       FALSE

debug-info                        FALSE

database                          hg19

default-character-set             auto

delimiter                         ;

enable-cleartext-plugin           FALSE

vertical                          FALSE

force                             FALSE

named-commands                    FALSE

ignore-spaces                     FALSE

init-command                      (No default value)

local-infile                      FALSE

no-beep                           FALSE

host                              genome-mysql.cse.ucsc.edu

html                              FALSE

xml                               FALSE

line-numbers                      TRUE

unbuffered                        FALSE

column-names                      TRUE

sigint-ignore                     FALSE

port                              0

prompt                            mysql>

quick                             FALSE

raw                               FALSE

reconnect                         FALSE

shared-memory-base-name           (No default value)

socket                            (No default value)

ssl                               FALSE

ssl-ca                            (No default value)

ssl-capath                        (No default value)

ssl-cert                          (No default value)

ssl-cipher                        (No default value)

ssl-key                           (No default value)

ssl-crl                           (No default value)

ssl-crlpath                       (No default value)

ssl-verify-server-cert            FALSE

table                             FALSE

user                              genome

safe-updates                      FALSE

i-am-a-dummy                      FALSE

connect-timeout                   0

max-allowed-packet                16777216

net-buffer-length                 16384

select-limit                      1000

max-join-size                     1000000

secure-auth                       TRUE

show-warnings                     FALSE

plugin-dir                        (No default value)

default-auth                      (No default value)

histignore                        (No default value)

binary-mode                       FALSE

connect-expired-password          FALSE

 

C:\Program Files\MySQL\MySQL Workbench CE 6.1.6>mysql  --user=genome --host=genome-mysql.cse.ucsc.edu  -D hg19

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 14577255

Server version: 5.6.10-log MySQL Community Server (GPL)

 

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

 

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

 

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

 

mysql> describe chromeInfo

    -> ;

ERROR 1146 (42S02): Table 'hg19.chromeInfo' doesn't exist

mysql> describe chromInfo

    -> ;

+----------+------------------+------+-----+---------+-------+

| Field    | Type             | Null | Key | Default | Extra |

+----------+------------------+------+-----+---------+-------+

| chrom    | varchar(255)     | NO   | PRI |         |       |

| size     | int(10) unsigned | NO   |     | 0       |       |

| fileName | varchar(255)     | YES  |     | NULL    |       |

+----------+------------------+------+-----+---------+-------+

3 rows in set (0.10 sec)

 

 

mysql> select * from chromInfo;

+-----------------------+-----------+----------------------+

| chrom                 | size      | fileName             |

+-----------------------+-----------+----------------------+

| chr1                  | 249250621 | /gbdb/hg19/hg19.2bit |

| chr2                  | 243199373 | /gbdb/hg19/hg19.2bit |

| chr3                  | 198022430 | /gbdb/hg19/hg19.2bit |

| chr4                  | 191154276 | /gbdb/hg19/hg19.2bit |

| chr5                  | 180915260 | /gbdb/hg19/hg19.2bit |

| chr6                  | 171115067 | /gbdb/hg19/hg19.2bit |

| chr7                  | 159138663 | /gbdb/hg19/hg19.2bit |

| chrX                  | 155270560 | /gbdb/hg19/hg19.2bit |

| chr8                  | 146364022 | /gbdb/hg19/hg19.2bit |

| chr9                  | 141213431 | /gbdb/hg19/hg19.2bit |

| chr10                 | 135534747 | /gbdb/hg19/hg19.2bit |

| chr11                 | 135006516 | /gbdb/hg19/hg19.2bit |

| chr12                 | 133851895 | /gbdb/hg19/hg19.2bit |

| chr13                 | 115169878 | /gbdb/hg19/hg19.2bit |

| chr14                 | 107349540 | /gbdb/hg19/hg19.2bit |

| chr15                 | 102531392 | /gbdb/hg19/hg19.2bit |

| chr16                 |  90354753 | /gbdb/hg19/hg19.2bit |

| chr17                 |  81195210 | /gbdb/hg19/hg19.2bit |

| chr18                 |  78077248 | /gbdb/hg19/hg19.2bit |

| chr20                 |  63025520 | /gbdb/hg19/hg19.2bit |

| chrY                  |  59373566 | /gbdb/hg19/hg19.2bit |

| chr19                 |  59128983 | /gbdb/hg19/hg19.2bit |

| chr22                 |  51304566 | /gbdb/hg19/hg19.2bit |

| chr21                 |  48129895 | /gbdb/hg19/hg19.2bit |

| chr6_ssto_hap7        |   4928567 | /gbdb/hg19/hg19.2bit |

| chr6_mcf_hap5         |   4833398 | /gbdb/hg19/hg19.2bit |

| chr6_cox_hap2         |   4795371 | /gbdb/hg19/hg19.2bit |

| chr6_mann_hap4        |   4683263 | /gbdb/hg19/hg19.2bit |

| chr6_apd_hap1         |   4622290 | /gbdb/hg19/hg19.2bit |

| chr6_qbl_hap6         |   4611984 | /gbdb/hg19/hg19.2bit |

| chr6_dbb_hap3         |   4610396 | /gbdb/hg19/hg19.2bit |

| chr17_ctg5_hap1       |   1680828 | /gbdb/hg19/hg19.2bit |

| chr4_ctg9_hap1        |    590426 | /gbdb/hg19/hg19.2bit |

| chr1_gl000192_random  |    547496 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000225        |    211173 | /gbdb/hg19/hg19.2bit |

| chr4_gl000194_random  |    191469 | /gbdb/hg19/hg19.2bit |

| chr4_gl000193_random  |    189789 | /gbdb/hg19/hg19.2bit |

| chr9_gl000200_random  |    187035 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000222        |    186861 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000212        |    186858 | /gbdb/hg19/hg19.2bit |

| chr7_gl000195_random  |    182896 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000223        |    180455 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000224        |    179693 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000219        |    179198 | /gbdb/hg19/hg19.2bit |

| chr17_gl000205_random |    174588 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000215        |    172545 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000216        |    172294 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000217        |    172149 | /gbdb/hg19/hg19.2bit |

| chr9_gl000199_random  |    169874 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000211        |    166566 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000213        |    164239 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000220        |    161802 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000218        |    161147 | /gbdb/hg19/hg19.2bit |

| chr19_gl000209_random |    159169 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000221        |    155397 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000214        |    137718 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000228        |    129120 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000227        |    128374 | /gbdb/hg19/hg19.2bit |

| chr1_gl000191_random  |    106433 | /gbdb/hg19/hg19.2bit |

| chr19_gl000208_random |     92689 | /gbdb/hg19/hg19.2bit |

| chr9_gl000198_random  |     90085 | /gbdb/hg19/hg19.2bit |

| chr17_gl000204_random |     81310 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000233        |     45941 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000237        |     45867 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000230        |     43691 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000242        |     43523 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000243        |     43341 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000241        |     42152 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000236        |     41934 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000240        |     41933 | /gbdb/hg19/hg19.2bit |

| chr17_gl000206_random |     41001 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000232        |     40652 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000234        |     40531 | /gbdb/hg19/hg19.2bit |

| chr11_gl000202_random |     40103 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000238        |     39939 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000244        |     39929 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000248        |     39786 | /gbdb/hg19/hg19.2bit |

| chr8_gl000196_random  |     38914 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000249        |     38502 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000246        |     38154 | /gbdb/hg19/hg19.2bit |

| chr17_gl000203_random |     37498 | /gbdb/hg19/hg19.2bit |

| chr8_gl000197_random  |     37175 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000245        |     36651 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000247        |     36422 | /gbdb/hg19/hg19.2bit |

| chr9_gl000201_random  |     36148 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000235        |     34474 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000239        |     33824 | /gbdb/hg19/hg19.2bit |

| chr21_gl000210_random |     27682 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000231        |     27386 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000229        |     19913 | /gbdb/hg19/hg19.2bit |

| chrM                  |     16571 | /gbdb/hg19/hg19.2bit |

| chrUn_gl000226        |     15008 | /gbdb/hg19/hg19.2bit |

| chr18_gl000207_random |      4262 | /gbdb/hg19/hg19.2bit |

+-----------------------+-----------+----------------------+

93 rows in set (0.10 sec)