LINUX & PERL, computer tools for the study and analysis of biological information


by Carlos Andrés Pérez
<caperez /at/ usc.edu.co>

About the author:

Carlos Andrés Pérez is a specialist in molecular simulation and a doctoral candidate in biotechnology. He is technical adviser to the Grupo de Investigación en Educación Virtual (GIEV), the Research Group in Virtual Learning. Address: Universidad Santiago de Cali, Calle 5ª carrera 62 Campus Pampalinda, Cali – Colombia.


Abstract:

This article shows some of the advantages of programming in Perl under Unix for extracting biological information from DNA, RNA and protein sequence databases, for use in comparison or analysis. The Human Genome Project and DNA cloning techniques have driven progress in this science. The volume of information generated every day in this area often exceeds our capacity to process it.

The rapid growth of biological information on different genomes (the complete set of genes of an organism) is making bioinformatics one of the fundamental disciplines for handling and analysing these data.

_________________ _________________ _________________

Bioinformatics

Bioinformatics began when scientists started to store biological sequences in digital form and the first programs for comparing them appeared. For a long time bioinformatics was limited to sequence analysis. However, the importance of establishing the structure of molecules made computers an important research tool in theoretical biochemistry as well. Every day more data and more collections of 3D information about molecules become available. The study of genes has moved from single genes to whole sets of genes, which makes it easier to understand how genes and proteins interact and how they are organised in metabolic pathways. We are increasingly aware of how important it is to organise these data.

Each of these activities is interesting in at least two ways. On the one hand, the biological goal is to understand the relationships between organic molecules; on the other, reaching that goal is an interesting software design problem. Biological information has to be combined and integrated to obtain a global, effective view of the underlying biological processes. Different areas of computer science also have to be combined to reach an effective solution: database management and data integration, efficient algorithms, and powerful hardware such as grids and multiprocessors.

Perl

Larry Wall began developing Perl in 1986. Perl is an interpreted language and a powerful tool for handling text, files and processes. It lets you write small programs quickly. Perl can be described as an effective blend of a high-level language such as C and a scripting language such as bash.

Perl programs run on many operating systems and platforms, but Perl was born on Unix and spread from there. Thanks to its wide use in web programming, its growth far exceeded the initial expectations. Before Perl, people used awk, sed and grep to extract information from text files.

Perl unified these widely used UNIX tools in a single program, extending and modernising them to cover a wider range of needs.
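As a rough illustration of how one Perl script can cover jobs traditionally split between grep, sed and awk, here is a minimal sketch. The sample lines and the sub names (grep_like, sed_like, awk_like) are invented for this example and are not part of the article's programs.

```perl
use strict;
use warnings;

# Hypothetical sample data in the style of SWISS-PROT lines.
my @lines = (
    "ID   KPCA_HUMAN   STANDARD;   PRT;   672 AA.",
    "AC   P17252;",
    "ID   KPCB_HUMAN   STANDARD;   PRT;   671 AA.",
);

# grep-like: keep only the lines that match a pattern.
sub grep_like {
    my ($pattern, @in) = @_;
    return grep { /$pattern/ } @in;
}

# sed-like: substitute text in every line (works on copies).
sub sed_like {
    my ($from, $to, @in) = @_;
    return map { my $copy = $_; $copy =~ s/$from/$to/g; $copy } @in;
}

# awk-like: split each line into whitespace-separated fields
# and return the requested field (counted from 1).
sub awk_like {
    my ($field, @in) = @_;
    return map { my @f = split ' ', $_; $f[$field - 1] } @in;
}

# Second field of every ID line, joined with commas.
my @ids = awk_like(2, grep_like(qr/^ID/, @lines));
print join(",", @ids), "\n";
```

Chaining the subs as in the last lines plays the role of a shell pipeline, but everything stays inside one program and one set of data structures.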

Perl is a free programming language that runs on all the operating systems commonly found in biological research laboratories. On UNIX and MacOSX it comes pre-installed; on other systems Perl has to be installed first. The site http://www.cpan.org has a lot of practical information on installing and using Perl.

Under Linux, a Perl program is run by saving the instructions in a file and passing the name of that file to the perl command as an argument.

Another common method avoids typing the perl command. It takes two steps: (a) put a special comment on the first line of the program file:

#!/usr/bin/env perl

print "Hi\n";

(b) save the file and give it execute permission:

% chmod +x greetings.pl

After that, the program can be run directly by the name of its file:

% ./greetings.pl

Programs for handling files in Perl:

When we have a database of molecular sequences in text format, we can write a sequence search tool in Perl. The following example shows how to find a protein sequence in a database in SWISS-PROT format (db_human_swissprot), using its ID code.

#!/usr/bin/perl
# Look up an amino acid sequence in a SWISS-PROT formatted
# database, given the ID code of the entry
use strict;
use warnings;

# Ask for the ID code and read it from standard input (STDIN)
print "Enter the ID to search: ";
my $id_query = <STDIN>;
chomp $id_query;

# Open the database file; if that is not possible the program ends
my $db_file = "human_kinases_swissprot.txt";
open(my $db, '<', $db_file)
    or die "problem opening the file $db_file: $!\n";

# Mark: 0 = entry not found yet, 1 = ID matched,
# 2 = inside the sequence of the chosen entry
my $signal_good = 0;

# Read the database line by line
while (my $line = <$db>) {
    chomp $line;
    # Check whether we are in the ID field
    if ($line =~ /^ID/) {
        # If so, split the line on whitespace; the second field is the ID
        my (undef, $id_db) = split /\s+/, $line;
        # If the ID does not match, continue with the next line
        next if !defined $id_db || $id_db ne $id_query;
        # When it matches, set the mark
        $signal_good = 1;
    } elsif ($line =~ /^SQ/ && $signal_good == 1) {
        # SQ field of the chosen entry: set the mark to 2
        # to start collecting the sequence
        $signal_good = 2;
    } elsif ($signal_good == 2) {
        # Print each line of the sequence until a line begins
        # with //; in that case leave the while loop
        last if $line =~ m{^//};
        print "$line\n";
    }
}

# After the loop, check the mark: if it was never set,
# the chosen sequence was not found and we report an error
if (!$signal_good) {
    print "ERROR: Sequence not found\n";
}

# Finally, close the file, which is still open
close($db);
exit;
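When several IDs have to be looked up, rereading the file for each query becomes wasteful. A possible variation, sketched here under assumptions, reads the file once and stores every sequence in a hash keyed by ID. The sub name index_swissprot and the two-entry sample are invented for illustration; the sample is read through an in-memory filehandle so the block runs without the real database file.

```perl
use strict;
use warnings;

# Read SWISS-PROT style records from a filehandle and return a
# reference to a hash that maps each entry ID to its sequence
# (whitespace removed).
sub index_swissprot {
    my ($fh) = @_;
    my (%seq_by_id, $current_id);
    my $in_sequence = 0;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ID\s+(\S+)/) {
            $current_id  = $1;      # a new entry starts here
            $in_sequence = 0;
        } elsif ($line =~ /^SQ/) {
            $in_sequence = 1;       # sequence lines follow
        } elsif ($line =~ m{^//}) {
            $in_sequence = 0;       # end of the entry
        } elsif ($in_sequence && defined $current_id) {
            $line =~ s/\s+//g;
            $seq_by_id{$current_id} .= $line;
        }
    }
    return \%seq_by_id;
}

# Made-up sample with two short entries.
my $sample = <<'END';
ID   TEST1_HUMAN   STANDARD;   PRT;   10 AA.
SQ   SEQUENCE   10 AA;
     MKTAY IAKQR
//
ID   TEST2_HUMAN   STANDARD;   PRT;   5 AA.
SQ   SEQUENCE   5 AA;
     GGSWY
//
END
open(my $fh, '<', \$sample) or die "cannot open in-memory sample: $!\n";
my $index = index_swissprot($fh);
close($fh);

print "$_ => $index->{$_}\n" for sort keys %$index;
```

The trade-off is memory: the whole database is held in the hash, which is fine for a file of kinases but may not be for a full SWISS-PROT release.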

Search for amino acid patterns

#!/usr/bin/perl
# Search for amino acid patterns in a SWISS-PROT entry
use strict;
use warnings;

# Ask the user for the pattern to search for
print "Please enter the pattern to search for in query_seq.txt: ";
my $patron = <STDIN>;
chomp $patron;

# Open the sequence file; if that is not possible the program ends
open(my $query, '<', "query_seq.txt")
    or die "problem opening the file query_seq.txt: $!\n";

my $signal_seq      = 0;    # set to 1 once the SQ field has been seen
my $secuencia_total = '';   # the complete sequence as one string

# Read the SWISS-PROT entry line by line
while (my $line = <$query>) {
    chomp $line;
    if ($line =~ /^SQ/) {
        # On reaching the SQ field, set the mark to 1
        $signal_seq = 1;
    } elsif ($line =~ m{^//}) {
        # End of the sequence: leave the loop. This test must come
        # before the test of the mark, because the // line does not
        # belong to the amino acid sequence
        last;
    } elsif ($signal_seq == 1) {
        # Inside the sequence: remove the blanks from the line
        # and append it to the total sequence.
        # To concatenate, we can also write:
        # $secuencia_total .= $line;
        $line =~ s/ //g;
        $secuencia_total = $secuencia_total . $line;
    }
}

# Now check the complete sequence for the given pattern
# (the pattern is interpreted as a Perl regular expression)
if ($secuencia_total =~ /$patron/) {
    print "The sequence query_seq.txt contains the pattern $patron\n";
} else {
    print "The sequence query_seq.txt doesn't contain the pattern $patron\n";
}

# Finally, close the file and leave the program
close($query);
exit;

If we also want to know the exact position of the pattern in the sequence, we need the special variable `$&', which holds the text matched by the last regular expression (so it must be used after the `if ($secuencia_total =~ /$patron/)' test). It can be combined with the variables `$`' and `$'', which hold everything to the left and to the right of the match. Modifying the previous program with these variables, we can report the exact position of the pattern. Note: the length function, which returns the length of a string, is also useful here.

# Only the final if, where the pattern is checked, needs to change.
# Check the complete sequence for the given pattern
# and report its position in the sequence
if ($secuencia_total =~ /$patron/) {
    # $` holds the text before the match, so its length + 1
    # is the position (counted from 1) where the pattern starts
    my $posicion = length($`) + 1;
    print "The sequence query_seq.txt contains the pattern $patron at position $posicion\n";
} else {
    print "The sequence query_seq.txt doesn't contain the pattern $patron\n";
}
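The three match variables are easier to follow on a toy string. The sketch below prints them for a made-up sequence and then lists every occurrence of a pattern, not just the first. It uses the built-in array @-, whose first element $-[0] is the offset of the last match, the same number as length($`). The sub name find_positions and the sequence are assumptions made for this example.

```perl
use strict;
use warnings;

# Return the position (counted from 1) of every non-overlapping
# occurrence of $pattern in $sequence.
sub find_positions {
    my ($sequence, $pattern) = @_;
    my @positions;
    while ($sequence =~ /$pattern/g) {
        push @positions, $-[0] + 1;   # $-[0] is the 0-based match offset
    }
    return @positions;
}

my $toy = "MKTAYIAKQRAY";   # made-up sequence
if ($toy =~ /AY/) {
    print "before: $`  match: $&  after: $'\n";
    print "first position: ", length($`) + 1, "\n";
}
my @all = find_positions($toy, 'AY');
print "all positions: @all\n";
```

Because the /g loop resumes after each match, overlapping occurrences are not reported; a lookahead pattern would be needed for that.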

Calculating amino acid frequencies:

The frequency of each amino acid in a protein varies, as a consequence of the protein's function or of its preferred environment. The following example shows how to calculate the amino acid frequencies of a given sequence.

#!/usr/bin/perl
# Calculates the frequency of each amino acid in a protein sequence.
# Gets the file name (SWISS-PROT formatted) from the command line;
# it could also be asked for with print and read from <STDIN>
use strict;
use warnings;

if (!$ARGV[0]) {
    print "The execution line shall be: program.pl file_swissprot\n";
    exit 1;
}
my $fichero = $ARGV[0];

# The 20 amino acids in alphabetical order of their one-letter
# codes, and the index of each one in the frequency array
my @letras = qw(A C D E F G H I K L M N P Q R S T V W Y);
my %indice;
@indice{@letras} = (0 .. $#letras);

# Initialize the frequency array and the variable $errores
my @aa      = (0) x @letras;
my $errores = 0;

# Open the file for reading
open(my $ficha, '<', $fichero)
    or die "problem opening the file $fichero: $!\n";

# First we collect the sequence as we did in example 2
my $signal_good = 0;
my $secuencia   = '';
while (my $line = <$ficha>) {
    chomp $line;
    if ($line =~ /^SQ/) {
        $signal_good = 1;
    } elsif ($signal_good == 1) {
        last if $line =~ m{^//};
        $line =~ s/\s//g;
        $secuencia .= $line;
    }
}
close($ficha);

# Now use a loop that checks every position of the sequence
# (in a function of its own, so it can be reused in other programs)
comprueba_aa($secuencia);

# Print the results to the screen:
# first the 20 amino acids and then the array with their frequencies.
# 'sort' can't be used in the foreach here, because the array
# contains the frequencies (numbers)
print join("\t", @letras), "\n";
foreach my $each_aa (@aa) {
    print "$each_aa\t";
}
# Then report the number of errors and end the program
print "\nerrores = $errores\n";
exit;

# Functions

# This one calculates each amino acid frequency
# in a protein sequence
sub comprueba_aa {
    # Gets the sequence
    my ($secuencia) = @_;
    # and runs through it residue by residue, using a for loop
    # from 0 to the sequence length
    for (my $posicion = 0; $posicion < length $secuencia; $posicion++) {
        # Gets the amino acid at this position
        my $aa = substr($secuencia, $posicion, 1);
        if (exists $indice{$aa}) {
            # Known amino acid: add 1 to its frequency in the array,
            # whose slots are in alphabetical order
            $aa[ $indice{$aa} ]++;
        } else {
            # If the amino acid is not found, add 1 to the errors
            print "ERROR: Aminoacid not found: $aa\n";
            $errores++;
        }
    }
    # Finally returns the frequency array
    return @aa;
}
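The program above reports raw counts. A minimal sketch of a hash-based variant that also prints relative frequencies as percentages could look like this. The sub name count_residues, the 'errors' key and the toy sequence are assumptions made for this illustration, not part of the original program.

```perl
use strict;
use warnings;

# Count the residues of a protein sequence in a hash keyed by the
# one-letter code. Symbols outside the 20 standard letters are
# counted under the key 'errors'.
sub count_residues {
    my ($sequence) = @_;
    my %count = map { $_ => 0 } split //, 'ACDEFGHIKLMNPQRSTVWY';
    $count{errors} = 0;
    for my $aa (split //, uc $sequence) {
        if (exists $count{$aa}) {
            $count{$aa}++;
        } else {
            $count{errors}++;
        }
    }
    return %count;
}

my $toy   = "MKTAYIAKQRAY";   # made-up sequence
my %count = count_residues($toy);
my $total = length $toy;

# Print letter, count and percentage for every residue that occurs.
for my $aa (grep { $_ ne 'errors' } sort keys %count) {
    next unless $count{$aa};
    printf "%s\t%d\t%.1f%%\n", $aa, $count{$aa}, 100 * $count{$aa} / $total;
}
print "errors = $count{errors}\n";
```

With a hash the one-letter code itself is the key, so no separate table of array indices is needed.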

The next example follows the flow of genetic information in a cell, where the information in DNA is transcribed into RNA and then translated into a protein, a sequence of amino acids. For this we need the genetic code, the correspondence between RNA or DNA codons and amino acids. We extract the amino acid sequence corresponding to a gene of Escherichia coli from a record in EMBL (European Molecular Biology Laboratory) format; the result can then be checked against the translation given in the record itself. This example requires associative arrays, also called hash tables.
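Before the full program, the hash idea can be sketched in isolation. The table below holds only four of the 64 codons, and the names %mini_code and translate_mini, as well as the use of 'X' for codons missing from the table, are choices made for this illustration.

```perl
use strict;
use warnings;

# A deliberately tiny codon table (four of the 64 codons),
# just to show how a hash lookup works.
my %mini_code = (
    'ATG' => 'M',   # Methionine
    'AAA' => 'K',   # Lysine
    'TGG' => 'W',   # Tryptophan
    'TAA' => '*',   # Stop
);

# Translate a DNA string codon by codon. Codons that are not in
# the table are shown as 'X'; a trailing partial codon is ignored.
sub translate_mini {
    my ($dna) = @_;
    $dna = uc $dna;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        my $codon = substr($dna, $i, 3);
        $protein .= exists $mini_code{$codon} ? $mini_code{$codon} : 'X';
    }
    return $protein;
}

print translate_mini("atgaaatggtaa"), "\n";
```

Each lookup such as $mini_code{'ATG'} goes straight from the key to its value, which is why a hash suits the genetic code better than a long chain of if tests.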

#!/usr/bin/perl
# Translates a DNA sequence from an EMBL record
# into the corresponding amino acid sequence.
# Gets the file name (EMBL formatted) from the command line;
# it could also be asked for with print and read from <STDIN>
use strict;
use warnings;

if (!$ARGV[0]) {
    print "The program line shall be: program.pl ficha_embl\n";
    exit 1;
}
my $fichero = $ARGV[0];

# Open the file for reading
open(my $ficha, '<', $fichero)
    or die "problem opening the file $fichero: $!\n";

# First we collect the sequence as we did in example 2, together
# with the start and end of the coding region (FT CDS line)
my ($a1, $a2, $a3, $a4);
my $signal_good = 0;
my $secuencia   = '';
while (my $line = <$ficha>) {
    chomp $line;
    if ($line =~ /^FT\s+CDS/) {
        # Turn "start..end" into "start  end" and split into fields
        $line =~ tr/./ /;
        ($a1, $a2, $a3, $a4) = split(" ", $line);
    } elsif ($line =~ /^SQ/) {
        $signal_good = 1;
    } elsif ($signal_good == 1) {
        last if $line =~ m{^//};
        # Eliminate numbers and spaces
        $line =~ tr/0-9/ /;
        $line =~ s/\s//g;
        $secuencia .= $line;
    }
}
close($ficha);

# Stop if no simple "start..end" CDS location was found
die "no usable FT CDS line found in $fichero\n"
    unless defined $a4 && $a3 =~ /^\d+$/ && $a4 =~ /^\d+$/;

# Now we define an associative array (hash) with the
# correspondence between every codon and its amino acid
my %codigo_genetico = (
    'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S', 'TCT' => 'S',   # Serine
    'TTC' => 'F', 'TTT' => 'F',                               # Phenylalanine
    'TTA' => 'L', 'TTG' => 'L',                               # Leucine
    'TAC' => 'Y', 'TAT' => 'Y',                               # Tyrosine
    'TAA' => '*', 'TAG' => '*',                               # Stop
    'TGC' => 'C', 'TGT' => 'C',                               # Cysteine
    'TGA' => '*',                                             # Stop
    'TGG' => 'W',                                             # Tryptophan
    'CTA' => 'L', 'CTC' => 'L', 'CTG' => 'L', 'CTT' => 'L',   # Leucine
    'CCA' => 'P', 'CCC' => 'P', 'CCG' => 'P', 'CCT' => 'P',   # Proline
    'CAC' => 'H', 'CAT' => 'H',                               # Histidine
    'CAA' => 'Q', 'CAG' => 'Q',                               # Glutamine
    'CGA' => 'R', 'CGC' => 'R', 'CGG' => 'R', 'CGT' => 'R',   # Arginine
    'ATA' => 'I', 'ATC' => 'I', 'ATT' => 'I',                 # Isoleucine
    'ATG' => 'M',                                             # Methionine
    'ACA' => 'T', 'ACC' => 'T', 'ACG' => 'T', 'ACT' => 'T',   # Threonine
    'AAC' => 'N', 'AAT' => 'N',                               # Asparagine
    'AAA' => 'K', 'AAG' => 'K',                               # Lysine
    'AGC' => 'S', 'AGT' => 'S',                               # Serine
    'AGA' => 'R', 'AGG' => 'R',                               # Arginine
    'GTA' => 'V', 'GTC' => 'V', 'GTG' => 'V', 'GTT' => 'V',   # Valine
    'GCA' => 'A', 'GCC' => 'A', 'GCG' => 'A', 'GCT' => 'A',   # Alanine
    'GAC' => 'D', 'GAT' => 'D',                               # Aspartic acid
    'GAA' => 'E', 'GAG' => 'E',                               # Glutamic acid
    'GGA' => 'G', 'GGC' => 'G', 'GGG' => 'G', 'GGT' => 'G',   # Glycine
);

# Translate every codon into its corresponding amino acid and
# append it to the protein sequence. The loop stops before the
# last codon of the coding region, which is normally the stop codon
print "Coding region: $a3..$a4\n";
my $protein = '';
for (my $i = $a3 - 1; $i < $a4 - 3; $i += 3) {
    my $codon = substr($secuencia, $i, 3);
    # Convert the codon from lowercase (EMBL format) to uppercase
    $codon =~ tr/a-z/A-Z/;
    $protein .= codon2aa($codon);
}
print "The protein sequence of the gene:\n$secuencia\nis the following:\n$protein\n\n";
exit;

# Functions

# Returns the amino acid for a codon (in a function of its own, in
# case the same genetic code is used in another program). Codons
# that are not in the table are returned as 'X'; that fallback is
# an editorial assumption, the original listing did not define it
sub codon2aa {
    my ($codon) = @_;
    return exists $codigo_genetico{$codon} ? $codigo_genetico{$codon} : 'X';
}

Bibliographic References

http://bioperl.org/

http://changjiang.whlib.ac.cn/pylorus/download/book/Beginning%20Perl%20for%20Bioinformatics/contents.html

http://www.unix.org.ua/orelly/perl/prog3/

Example files :
- human_kinases_swissprot.txt
- query_seq.txt
- ecoli_embl.txt

© Carlos Andrés Pérez
"some rights reserved" see linuxfocus.org/license/
http://www.LinuxFocus.org
Translation information:

es --> -- : Carlos Andrés Pérez <caperez /at/ usc.edu.co>

en --> CN: <daxiawj(Q)gmail.com>

2005-05-06, generated by lfparser version 2.52