Oceanicsdotio

References proof of concept

October 03, 2020


Most technical and scientific writing references external sources liberally.

I need support for this, and have had an annoying time finding a reference manager I like and want to integrate with. So in the spirit of making this platform richer, I’m experimenting with the site itself being my reference manager.

Let’s start with a bit about our viridian friends from away:

Audet D, Miron G, Moriyasu M. 2008. Biological characteristics of a newly established green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada. Journal of Shellfish Research 27:427–441.

Inline method

As a component, this looks like:

// components/References.jsx
export const Reference = ({
    authors,
    year,
    title,
    pageRange, 
    volume, 
    journal,
    hash = null
}) => {

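    // Join the page range with an en dash so it renders as "427–441".
    const pages = pageRange ? `:${pageRange[0]}–${pageRange[1]}` : ``;
    // The closing period is added here so citations without pages still end cleanly.
    const text = `${authors.join(", ")}. ${year}. ${title.trim()}. ${journal} ${volume}${pages}.`;
    // Fall back to a computed hash when none is passed in.
    const _hash = hash || referenceHash({authors, title, year});

    return (
        <StyledBlock key={_hash}>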
            {text}
            <Link to={`/${REFERENCES_ROOT}/${_hash}/`}>{"[links]"}</Link>
        </StyledBlock>
    )
};

It is called from extended Markdown (.mdx) by passing it to an <MDXRenderer/>, and inlining it like so:

<Reference
    authors={["Audet D", "Miron G", "Moriyasu M"]} 
    year={2008} 
    title={"Biological characteristics of a newly established green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada"} 
    journal={"Journal of Shellfish Research"} 
    volume={27} 
    pageRange={[427, 441]}
/>

That renders to:

Audet D, Miron G, Moriyasu M. 2008. Biological characteristics of a newly established green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada. Journal of Shellfish Research 27:427–441.[links]

Nice. But I don’t want to inline all that information, even though I could theoretically pull references through GraphQL.

Notice that there’s a link generated, and some hashing function. More on this later.
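
The text formatting inside the component can also live in a plain function, which makes it easy to check without rendering anything. This is only a sketch: formatReference is a hypothetical name that isn't in the site code, and it assumes the same props as the component above.

```javascript
// Hypothetical helper, not part of the original component.
const formatReference = ({ authors, year, title, journal, volume, pageRange }) => {
    // Join the page range with an en dash, as in the rendered citation.
    const pages = pageRange ? `:${pageRange[0]}–${pageRange[1]}` : "";
    // Always close the citation with a period, with or without pages.
    return `${authors.join(", ")}. ${year}. ${title.trim()}. ${journal} ${volume}${pages}.`;
};

const text = formatReference({
    authors: ["Audet D", "Miron G", "Moriyasu M"],
    year: 2008,
    title: "Biological characteristics of a newly established green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada",
    journal: "Journal of Shellfish Research",
    volume: 27,
    pageRange: [427, 441]
});
console.log(text);
```

Keeping the string logic separate from the JSX means the same function could be reused for plain-text exports later.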

Page metadata

Our GatsbyJS setup builds pages from (extended) Markdown using templates. The Markdown is data and can be queried through GraphQL. The content is prefaced by a YAML metadata block, like this:

# index.mdx
title: References proof of concept
date: "2020-10-03T12:00:00.000Z"
description: "Testing out support for referencing scientific literature"
tags: ["interface", "research", "ux", "citation"]

We’re going to extend this with a citations array:

# index.mdx
citations:
  - authors: [Audet D, Miron G, Moriyasu M]
    year: 2008
    title: |
      Biological characteristics of a newly established green crab 
      (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada
    journal: Journal of Shellfish Research
    volume: 27
    pageRange: [427, 441]

This is getting closer to a format I actually want to use. I just have to pick this data up from JavaScript.

Because it’s GraphQL, we have to explicitly request all the fields, like a chump:

// templates/blog-post.js
query BlogPostBySlug($slug: String!) {
    site {
        siteMetadata { title }
    }
    mdx(fields: { slug: { eq: $slug } }) {
        id
        excerpt(pruneLength: 160)
        body
        frontmatter {
            title
            date(formatString: "MMMM DD, YYYY")
            description
            citations {
                authors, year, title, journal, volume, pageRange
            }
        }
    }
}

Now all of the local citations are available in the GatsbyJS page component.

Consuming references as a footer

Now that we get citations as inputs to the page component, we can process them however we want! That work will all be done during bundling, so we can validate or pull in additional information before anything gets pushed to production.

Quality control is good.
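
As a rough idea of what that validation could look like, here is a minimal sketch. The validateCitation function and its rules are assumptions for illustration, not something the site does yet. The field names follow the frontmatter above.

```javascript
// Hypothetical build-time check; field names follow the frontmatter above.
const REQUIRED = ["authors", "year", "title", "journal", "volume"];

const validateCitation = (citation) => {
    const errors = [];
    // Every required field must be present.
    REQUIRED.forEach((field) => {
        if (citation[field] === undefined || citation[field] === null) {
            errors.push(`missing ${field}`);
        }
    });
    // Authors should be a non-empty array of names.
    if (citation.authors != null && !(Array.isArray(citation.authors) && citation.authors.length > 0)) {
        errors.push("authors must be a non-empty array");
    }
    // YAML will happily hand back a quoted year as a string.
    if (citation.year != null && !Number.isInteger(citation.year)) {
        errors.push("year must be an integer");
    }
    // pageRange is optional, but if present it should be [first, last].
    const pages = citation.pageRange;
    if (pages != null && !(Array.isArray(pages) && pages.length === 2 && pages[0] <= pages[1])) {
        errors.push("pageRange must be [first, last]");
    }
    return errors;
};

const problems = validateCitation({ authors: ["Audet D"], year: "2008", title: "Some title" });
console.log(problems);
```

Returning a list of problems rather than throwing leaves the decision to the caller: the build could fail outright, or just log a warning for the offending page.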

We’ll render a reference list after any article with citations, typical of scientific literature. We pick up the data from data.mdx.frontmatter.citations and map this array in JSX. The <References/> component is just a basic wrapper that adds a heading and container:

// templates/blog-post.js
{frontmatter.citations && frontmatter.citations.length > 0 ? <References heading={"References"} references={frontmatter.citations} /> : null}

You can see from the final section below that it works!

Indexing all references

In addition to creating pages, the build process creates a /tags page that gives all content tags, and child pages for each tag. For example, /tags/ux links back to this page.

We can do a similar thing with references! We’ll end up with a /references endpoint, and then we want a page for each reference that links back to the articles containing that reference. It’s just a little bit more complicated because of nested objects.

We also want the <Reference/> component to link to its index page. The trick will be creating a unique URL for each reference without making too many assumptions about their formatting. No one is likely to access the page directly, so a data-oriented “name” like a hash is a good approach. That’s what the function mentioned before does:

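const referenceHash = ({authors, title, year}) => {
    // Normalize: lowercase and strip all whitespace so formatting differences don't change the hash.
    const stringRepr = `${authors.join("").toLowerCase()} ${year} ${title.toLowerCase()}`.replace(/\s/g, "");
    // 31-multiplier string hash, truncated to a signed 32-bit integer by the bitwise OR.
    const hashCode = s => s.split('').reduce((a,b) => (((a << 5) - a) + b.charCodeAt(0))|0, 0);
    return hashCode(stringRepr);
}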
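
To see why the normalization matters, here is a small check using a copy of the same function. The second title mimics how a multi-line YAML value might arrive, with a line break in the middle and a trailing newline. That input shape is an assumption for the sake of the example.

```javascript
// Copy of the hash function so the example runs on its own.
const referenceHash = ({ authors, title, year }) => {
    const stringRepr = `${authors.join("").toLowerCase()} ${year} ${title.toLowerCase()}`.replace(/\s/g, "");
    const hashCode = (s) => s.split("").reduce((a, b) => (((a << 5) - a) + b.charCodeAt(0)) | 0, 0);
    return hashCode(stringRepr);
};

const authors = ["Audet D", "Miron G", "Moriyasu M"];

// Title written on a single line, as in the inline component.
const inline = referenceHash({
    authors,
    year: 2008,
    title: "Biological characteristics of a newly established green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada"
});

// Same title with a line break and trailing newline, as a multi-line YAML value might produce.
const fromYaml = referenceHash({
    authors,
    year: 2008,
    title: "Biological characteristics of a newly established green crab \n(Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada\n"
});

console.log(inline === fromYaml); // true: whitespace and case are normalized away
console.log(Number.isInteger(inline)); // true: a signed 32-bit integer, possibly negative
```

Because the result is only 32 bits wide, two different references could in principle collide. For a small personal library that seems unlikely, but a longer digest would be a safer choice if the collection grows.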

This is used when we create pages at build time, and when we render links to citations. You can follow the links between pages and the reference index to feel it out.
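
The grouping step behind the reference index could look something like the sketch below. It is not the site's actual build code: buildReferenceIndex, the shape of the pages array, and the example slugs are all assumptions, and a stand-in key function is used so the example runs on its own. In practice the hash function above would be passed in instead.

```javascript
// Sketch only: `pages` is a hypothetical array of { slug, citations } objects,
// and `keyOf` is whatever function names a reference (the hash, in this post).
const buildReferenceIndex = (pages, keyOf) => {
    const index = new Map();
    pages.forEach(({ slug, citations }) => {
        // Pages without a citations array contribute nothing.
        (citations || []).forEach((citation) => {
            const key = keyOf(citation);
            if (!index.has(key)) index.set(key, { citation, backlinks: [] });
            const entry = index.get(key);
            // Record each citing page once, even if it cites the reference twice.
            if (!entry.backlinks.includes(slug)) entry.backlinks.push(slug);
        });
    });
    return index;
};

// Stand-in key function so the example is self-contained.
const keyOf = ({ authors, year, title }) =>
    `${authors.join("")}${year}${title}`.toLowerCase().replace(/\s/g, "");

const crab = {
    authors: ["Audet D", "Miron G", "Moriyasu M"],
    year: 2008,
    title: "Biological characteristics of a newly established green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada"
};

const index = buildReferenceIndex([
    { slug: "/references-proof-of-concept/", citations: [crab] },
    { slug: "/another-article/", citations: [{ ...crab }] },
    { slug: "/no-citations/" }
], keyOf);

index.forEach(({ backlinks }, key) => console.log(key, backlinks));
```

Each entry in the map would then correspond to one generated page per reference, listing the articles in backlinks.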

Bringing it home

In addition to a references section, inline references are extremely helpful. In office productivity software with reference manager integrations, you usually start inserting a citation, and the software matches what you type to help autocomplete it.

You have to provide just enough information to disambiguate between articles by the same authors in the same year, ’cause wouldn’t ya know, science is a weird incestuous community.

We’ll use the hash function again to create links, and introduce a new <Inline/> component. Not elegant, but it’s a start.
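
The label for an inline citation can be derived from the same data. Here is a minimal sketch of that step. The inlineLabel and surname helpers are hypothetical, and they assume every author is written as "Surname Initials", the way the frontmatter example does.

```javascript
// Hypothetical helpers for the text of an inline citation.
// Assumes each author is written "Surname Initials", e.g. "Audet D".
const surname = (author) => {
    const parts = author.trim().split(/\s+/);
    // Drop the trailing initials; keep multi-word surnames intact.
    return parts.length > 1 ? parts.slice(0, -1).join(" ") : parts[0];
};

const inlineLabel = ({ authors, year }) => {
    const names = authors.map(surname);
    // "A", "A & B", or "A, B & C".
    const joined = names.length > 1
        ? `${names.slice(0, -1).join(", ")} & ${names[names.length - 1]}`
        : names[0];
    return `(${joined} ${year})`;
};

console.log(inlineLabel({ authors: ["Audet D", "Miron G", "Moriyasu M"], year: 2008 }));
// (Audet, Miron & Moriyasu 2008)
```

A component could wrap this label in a link built from the reference hash. Long author lists would probably want an "et al." rule, which this sketch leaves out.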

The article (Audet, Miron & Moriyasu 2008) has some interesting traits:

  • There is a place: Gulf of St. Lawrence
  • There is a species: Carcinus maenas
  • It is about shellfish

But I know that because I know how to interpret the journal information and have a background in marine science. We could tag every reference with keywords, like the articles themselves, but that requires a lot of effort and thought.

Much better to automate a general solution.

This will allow us to enrich documents with contextual information about locations and species, for example. We can also link through words or phrases that are already present in the system. A hybrid trie/hash structure will be built during page generation to cross-reference terminology.
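
None of this is built yet, so the following is only one guess at what such a structure could look like: a word-level trie whose nodes keep their children in a hash map, and whose terminal nodes hold the set of pages a phrase links to. All names and slugs here are hypothetical.

```javascript
// Sketch of a word-level trie; children live in a Map (the "hash" part),
// and terminal nodes hold a Set of page slugs.
const createNode = () => ({ children: new Map(), pages: new Set() });

// Lowercase and split on anything that is not a letter or digit.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

const addPhrase = (root, phrase, slug) => {
    let node = root;
    tokenize(phrase).forEach((word) => {
        if (!node.children.has(word)) node.children.set(word, createNode());
        node = node.children.get(word);
    });
    node.pages.add(slug);
};

// Scan text and return the longest known phrase starting at each word position.
const findPhrases = (root, text) => {
    const words = tokenize(text);
    const matches = [];
    for (let start = 0; start < words.length; start++) {
        let node = root;
        let best = null;
        for (let end = start; end < words.length; end++) {
            node = node.children.get(words[end]);
            if (!node) break;
            if (node.pages.size > 0) {
                best = { phrase: words.slice(start, end + 1).join(" "), pages: [...node.pages] };
            }
        }
        if (best) matches.push(best);
    }
    return matches;
};

const root = createNode();
addPhrase(root, "Gulf of St. Lawrence", "/places/gulf-of-st-lawrence/");
addPhrase(root, "Carcinus maenas", "/species/carcinus-maenas/");

const hits = findPhrases(root, "A green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada");
console.log(hits.map((hit) => hit.phrase));
// ["carcinus maenas", "gulf of st lawrence"]
```

Matching on whole words rather than characters keeps the structure small and avoids linking fragments of longer words, at the cost of missing plurals and other variants.
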

Audet D, Miron G, Moriyasu M. 2008. Biological characteristics of a newly established green crab (Carcinus maenas) population in the Southern Gulf of St. Lawrence, Canada. Journal of Shellfish Research 27:427–441.[links]