ProFAT is a tool for the functional prediction of protein sequences based on remote sequence similarity. By combining a sensitive sequence similarity search tool (PSI-BLAST) with a threading technique (Threader 3.5), ProFAT identifies remotely conserved proteins that can be used for functional prediction. It is furthermore the first tool that uses the wealth of published literature associated with identified hits for the functional annotation of a query sequence. A user-provided keyword list makes the tool configurable for any biological scenario.
Reference: Bradshaw CR, Surendranath V, Habermann B. ProFAT: a web-based tool for the functional annotation of protein sequences. BMC Bioinformatics 2006, 7:466.



Pipeline of ProFAT:

  • start by entering a protein sequence and a list of biological keywords describing any process or function your protein might be involved in
  • the first step is a domain search (RPS-BLAST); domain boundaries are used to split the sequence into conserved domains and inter-domain regions; each conserved domain or region is processed independently in the next steps
  • a PSI-BLAST search (Annotation Engine) and a threading step (Threading Engine) are carried out on the selected conserved domains and regions
  • a simple text mining step searches for the user-provided keywords in the annotations (including literature abstracts from PubMed) associated with the hit sequences identified by both search methods
  • finally, the output is combined and presented to the user; hits that contain keywords are shown separately from the completely annotated PSI-BLAST and threading results, and Gene Ontology (GO) terms are mapped to the PSI-BLAST results.
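
Two of the steps above, splitting the query at domain boundaries and sorting hits into keyword-positive and keyword-negative sets, can be illustrated with a minimal Python sketch. This is not ProFAT's actual implementation, which is not described here. The names `Region`, `split_by_domains`, `matched_keywords` and `partition_hits` are hypothetical, domain boundaries are assumed to be non-overlapping half-open (start, end) pairs, and keyword matching is assumed to be case-insensitive on whole words. The sketch does not run RPS-BLAST, PSI-BLAST or Threader; it only works on their presumed outputs.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive
    kind: str       # "domain" or "inter-domain"
    sequence: str


def split_by_domains(sequence: str,
                     domains: List[Tuple[int, int]],
                     min_len: int = 1) -> List[Region]:
    """Split a sequence into conserved domains and inter-domain regions.

    `domains` holds non-overlapping (start, end) pairs (assumption).
    Inter-domain stretches shorter than `min_len` are dropped.
    """
    regions = []
    pos = 0
    for start, end in sorted(domains):
        if start - pos >= min_len:
            regions.append(Region(pos, start, "inter-domain", sequence[pos:start]))
        regions.append(Region(start, end, "domain", sequence[start:end]))
        pos = end
    if len(sequence) - pos >= min_len:
        regions.append(Region(pos, len(sequence), "inter-domain", sequence[pos:]))
    return regions


def matched_keywords(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that occur in `text` as whole words (case-insensitive)."""
    found = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def partition_hits(hits: Dict[str, str],
                   keywords: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Separate hits (id -> annotation text) into keyword-positive and -negative."""
    positive, negative = {}, []
    for hit_id, annotation in hits.items():
        found = matched_keywords(annotation, keywords)
        if found:
            positive[hit_id] = found
        else:
            negative.append(hit_id)
    return positive, negative


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    for region in split_by_domains("ABCDEFGHIJ", [(2, 5), (7, 9)]):
        print(region)
    toy_hits = {
        "hit1": "Kinase involved in cell cycle control",
        "hit2": "Hypothetical protein",
    }
    print(partition_hits(toy_hits, ["cell cycle", "apoptosis"]))
```

In this sketch each `Region` would be passed on independently to the search steps, and the annotation text of each hit would include any associated abstracts. Whole-word matching is only one possible design choice; a real keyword search might also need stemming or synonym handling.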



ProFAT output.
(a) The output of the ProFAT server typically shows a graphical image of the input query with identified conserved domains (grey) and keyword-positive (red) or keyword-negative (blue) hits found by the Annotation and Threading Engines. The conserved domains and regions selected for ProFAT annotation are also shown in tabular format. The number of keyword-positive hits, the total number of hits, and a GO annotation of the output are given. Links lead to the annotated output of the two engines (see b).
(b) Formatted results of the Annotation Engine (PSI-BLAST based) show, next to the E-value and the region of similarity between query and database sequence, the PSI-BLAST iteration at which the sequence was picked up, annotated features, PubMed abstracts and Gene Ontology information. Two abstracts are shown in which the user-provided keywords are highlighted in bold.