GET scoreset
============

get a scoreset from MaveDB via the API
--------------------------------------

To begin, import the modeules below.

.. code:: ipython3

    import attr, os
    from pprint import PrettyPrinter
    from mavetools.client.client import Client
    from mavetools.models.scoreset import ScoreSet

Pretty printer is used to format the output nicely.

.. code:: ipython3

    pp = PrettyPrinter(indent=2)

Here your base_url is set to localhost, http://127.0.0.1:8000/api/. This
default funcionality is what you would want to use when working with a
local instance of MaveDB (e.g., a development branch). If working with
production mavedb you would set base url to https://www.mavedb.org/api/.

In the cell below, comment out the base_url you will not be using.

.. code:: ipython3

    base_url = 'http://127.0.0.1:8000/api/'
    #base_url = 'https://www.mavedb.org/api/'

Set experiment_urn to match the scoreset you want to get.

.. code:: ipython3

    scoreset_urn = 'urn:mavedb:00000001-a-1'

Next, you will need an auth_token to make POST requests to MaveDB. If
you have one, substitute it in the example provided below. If you need
one, please follow these instructions:

::

   1. go to https://www.mavedb.org
   2. login using your ORCID ID
   3. go to settings
   4. generate new auth token
   5. copy auth token and pase it in the auth_token field below

.. code:: ipython3

    # this is an example of what your auth_token should look like
    auth_token = 'R2skRbpBD3Rsf5dNHoQxDZevdEE74T5lCKMFyBhBwwPFH4ZfTrxDz7TZ0kbFLtEZ'

Here you instantiate the Client object. The Client object is the object
by which the POST request is performed. The client object is
instantiated with the value of base_url provided earlier, so make sure
that is up-to-date. If base_url does not exist, base_url is defaulted to
localhost, http://127.0.0.1:8000/api/.

.. code:: ipython3

    client = Client(base_url, auth_token=auth_token) if base_url else Client(auth_token=auth_token)

GET the model instance by passing the model type (Scoreset, in this
instance) and the scoreset_urn as arguments to the get_model_istance
funtion that operates on the Client object. This will GET the model
instance (resource) from the server via the approprate API endpoint.

.. code:: ipython3

    scoreset = client.get_model_instance(ScoreSet, scoreset_urn)

Now, display the results!

.. code:: ipython3

    pp.pprint(attr.asdict(scoreset))


.. parsed-literal::

    { 'abstract_text': 'Although we now routinely sequence human genomes, we can '
                       'confidently identify only a fraction of the sequence '
                       'variants that have a functional impact. Here, we developed '
                       'a deep mutational scanning framework that produces '
                       'exhaustive maps for human missense variants by combining '
                       'random codon mutagenesis and multiplexed functional '
                       'variation assays with computational imputation and '
                       'refinement. We applied this framework to four proteins '
                       'corresponding to six human genes: UBE2I (encoding SUMO E2 '
                       'conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 '
                       '(thiamin pyrophosphokinase), and CALM1/2/3 (three genes '
                       'encoding the protein calmodulin). The resulting maps '
                       'recapitulate known protein features and confidently '
                       'identify pathogenic variation. Assays potentially amenable '
                       'to deep mutational scanning are already available for 57% '
                       'of human disease genes, suggesting that DMS could '
                       'ultimately map functional variation for all human disease '
                       'genes. \r\n'
                       '\r\n'
                       'See [**Weile *et al.* '
                       '2017**](http://msb.embopress.org/content/13/12/957)',
      'approved': None,
      'contributors': ['0000-0003-1628-9390'],
      'count_columns': ['hgvs_nt', 'hgvs_splice', 'hgvs_pro'],
      'created_by': '0000-0003-1628-9390',
      'creation_date': '2018-06-26',
      'current_version': 'urn:mavedb:00000001-a-1',
      'data_usage_policy': '',
      'dataset_columns': None,
      'doi_ids': [],
      'experiment': 'urn:mavedb:00000001-a',
      'extra_metadata': {},
      'is_meta_analysis': False,
      'keywords': [ {'text': 'DMS-BarSeq'},
                    {'text': 'E2'},
                    {'text': 'sumoylation'},
                    {'text': 'imputation'},
                    {'text': 'DMS-TileSeq'},
                    {'text': 'complementation'}],
      'last_child_value': None,
      'licence': { 'link': 'https://creativecommons.org/licenses/by/4.0/',
                   'long_name': 'CC BY 4.0 (Attribution)',
                   'short_name': 'CC BY 4.0',
                   'version': '4.0'},
      'method_text': '##Scoring procedure:\r\n'
                     'DMS-BarSeq and DMS-TileSeq reads were processed using the '
                     '[dmsPipeline](https://bitbucket.org/rothlabto/dmspipeline) '
                     'software. Briefly, Barseq read counts were used to establish '
                     'relative frequencies of each strain at each timepoint and '
                     'converted to estimates of absolute frequencies using OD '
                     'measurement data. Absolute counts were used to establish '
                     'growth curves from which fitness parameters were estimated '
                     'and then normalized to 0-1 scale where 0 corresponds to null '
                     'controls and 1 corresponds to WT controls. Meanwhile, '
                     'TileSeq read counts were used to establish relative allele '
                     'frequencies in each condition. Non-mutagenized control '
                     'counts were subtracted from counts (as estimates of '
                     'sequencing error). log ratios of selection over '
                     'non-selection counts were calculated. The resulting TileSeq '
                     'fitness values were then rescaled to the distribution of the '
                     'BarSeq fitness scores. Fitness scores were joined using '
                     'confidence-weighted averages. Random-Forest base machine '
                     'learning was used to impute missing values and refine '
                     'low-confidence measurements, based on intrinsic, structural, '
                     'and biochemical features.\r\n'
                     '\r\n'
                     'See [**Weile *et al.* '
                     '2017**](http://msb.embopress.org/content/13/12/957) for more '
                     'details.\r\n'
                     '\r\n'
                     '## Additional columns:\r\n'
                     '* exp.score = experimental score from the joint '
                     'DMS-BarSeq/DMS-TileSeq screens\r\n'
                     '* exp.sd = standard deviation of the experimental score\r\n'
                     '* df = degrees of freedom (number of replicates contributing '
                     'to the experimental score)\r\n'
                     '* pred.score = machine-learning predicted score',
      'modification_date': '2019-08-08',
      'modified_by': '0000-0003-1628-9390',
      'next_version': None,
      'previous_version': None,
      'private': None,
      'publish_date': '2018-06-26',
      'pubmed_ids': [ { 'dbname': 'PubMed',
                        'dbversion': None,
                        'identifier': '29269382',
                        'url': 'http://www.ncbi.nlm.nih.gov/pubmed/29269382'}],
      'replaces': None,
      'score_columns': [ 'hgvs_nt',
                         'hgvs_splice',
                         'hgvs_pro',
                         'score',
                         'sd',
                         'se',
                         'exp.score',
                         'exp.sd',
                         'df',
                         'pred.score'],
      'short_description': 'A joint Deep Mutational Scan of the human SUMO E2 '
                           'conjugase UBE2I using functional complementation in '
                           'yeast, combining DMS-BarSeq and DMS-TileSeq data, '
                           'followed by machine-learning-based imputation and '
                           'refinement.',
      'sra_ids': None,
      'target': { 'ensembl': { 'dbname': 'Ensembl',
                               'dbversion': None,
                               'identifier': 'ENSG00000103275',
                               'offset': 0,
                               'url': 'http://www.ensembl.org/id/ENSG00000103275'},
                  'name': 'UBE2I',
                  'reference_maps': [ { 'genome': { 'assembly_identifier': { 'dbname': 'GenomeAssembly',
                                                                             'dbversion': None,
                                                                             'identifier': 'GCF_000001405.26',
                                                                             'url': 'http://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26'},
                                                    'organism_name': 'Homo sapiens',
                                                    'short_name': 'hg38'}}],
                  'reference_sequence': { 'sequence': 'ATGTCGGGGATCGCCCTCAGCAGACTCGCCCAGGAGAGGAAAGCATGGAGGAAAGACCACCCATTTGGTTTCGTGGCTGTCCCAACAAAAAATCCCGATGGCACGATGAACCTCATGAACTGGGAGTGCGCCATTCCAGGAAAGAAAGGGACTCCGTGGGAAGGAGGCTTGTTTAAACTACGGATGCTTTTCAAAGATGATTATCCATCTTCGCCACCAAAATGTAAATTCGAACCACCATTATTTCACCCGAATGTGTACCCTTCGGGGACAGTGTGCCTGTCCATCTTAGAGGAGGACAAGGACTGGAGGCCAGCCATCACAATCAAACAGATCCTATTAGGAATACAGGAACTTCTAAATGAACCAAATATCCAAGACCCAGCTCAAGCAGAGGCCTACACGATTTACTGCCAAAACAGAGTGGAGTACGAGAAAAGGGTCCGAGCACAAGCCAAGAAGTTTGCGCCCTCATAA',
                                          'sequence_type': 'dna'},
                  'refseq': { 'dbname': 'RefSeq',
                              'dbversion': None,
                              'identifier': 'NM_003345',
                              'offset': 159,
                              'url': 'http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_003345'},
                  'scoreset': 'urn:mavedb:00000001-a-1',
                  'type': 'Protein coding',
                  'uniprot': { 'dbname': 'UniProt',
                               'dbversion': None,
                               'identifier': 'P63279',
                               'offset': 0,
                               'url': 'http://purl.uniprot.org/uniprot/P63279'}},
      'title': 'UBE2I imputed & refined',
      'urn': 'urn:mavedb:00000001-a-1',
      'variant_count': 3180}