Using TaskPipes versus Writing Code

One of the most exciting aspects of TaskPipes is that it lets non-developers do things that previously only developers could do.

Need to extract data straight from a PDF, modify it, and export it into a database, every time you get a message from a certain email address?

Or perhaps you need to pull data from a webpage, use it in an intermediate calculation, and export it to your accounting software?

Each of these processes would, by itself, take a developer a dozen hours of coding and testing to complete.

With TaskPipes, even non-technical users can get set up in just a few minutes.

An Example

Let’s look at an example of how you would approach a typical task in code versus with TaskPipes. Maybe you have a list of weekly usage for each customer in an Excel file:

[Screenshot: weekly usage per customer in Excel]

Now, we want to reshape the data so that there is a single row for each user for every week. In TaskPipes, this is simple with the “Columns to Rows” method.

[Screenshot: the “Columns to Rows” step]

We specify that the action should apply from the third column onwards, that we want to “pull down” only one column in each chunk of data, and that we also want to pull down the column headers (the week labels).

We are then left with the data in this format:

[Screenshot: the data after “Columns to Rows”]

We now need to perform some simple data manipulation: pulling out the week number, and renaming a header:

[Screenshot: the data manipulation steps]

From there, we apply a filter to allow through only non-zero values, and enter our database credentials to load these transactions into a MySQL database.

This leaves us with the following process:

[Screenshot: the complete process]

…and the following data:

[Screenshot: the final data]

If we were to automate this process with code, it might look something like this:

from xlrd import open_workbook  # assumes the xlrd library for reading Excel files

def run():
    # Step 1 - "Choose Input Data"
    rows = []
    path = BASE_PATH + '/using_taskpipes_vs_code.xlsx'
    workbook = open_workbook(path)
    worksheet = workbook.sheet_by_index(0)
    for r in range(worksheet.nrows):
        row = []
        for c in range(worksheet.ncols):
            cell_value = get_and_cast_value_in_cell(worksheet, r, c)
            row.append(cell_value)

        rows.append(row)
    headers = rows.pop(0)

    # Step 2 - "Columns to Rows"
    new_rows = []
    first_column_to_split = 2
    headers_to_split = headers[first_column_to_split:]
    for row in rows:
        cells_to_copy = row[:first_column_to_split]
        cells_to_split = row[first_column_to_split:]

        for cell, header in zip(cells_to_split, headers_to_split):
            new_rows.append(cells_to_copy + [header] + [cell])

    headers = headers[:first_column_to_split] + ['', '']
    rows = new_rows

    # Step 3 - "Extract Text"
    column_index = 2
    headers.append('Week Number')
    for row in rows:
        cell = row[column_index]
        start_index = cell.index('Week') + 4
        extracted_text = cell[start_index:]
        row.append(extracted_text)

    # Step 4 - "Remove Column"
    column_index = 2
    del headers[column_index]
    for row in rows:
        del row[column_index]

    # Step 5 - "Rename Column"
    headers[2] = 'Transaction Amount ($)'

    # Step 6 - "Filter"
    # Keep rows whose transaction amount (column index 2) is non-zero
    new_rows = []
    for row in rows:
        if row[2] != 0:
            new_rows.append(row)
    rows = new_rows

    # Step 7 - "Move Column"
    old_index = 3
    new_index = 2
    headers.insert(new_index, headers.pop(old_index))
    for row in rows:
        row.insert(new_index, row.pop(old_index))

    # Step 8 - "Insert into MySQL Database"
    database = connect_to_database(
        domain='',
        port='3306',
        username='',
        password='',
        database='',
    )
    table = database.table('table_name')

    for row in rows:
        entry = create_entry(table, **dict(zip(headers, row)))
        table.add_entry(entry)

Reviewing this code, it is clear that it covers only the most basic functionality. It doesn’t handle any of the edge cases that so often appear when dealing with data like this, and it would be fairly tricky to debug.

There are a few key reasons why we think TaskPipes tackles this better than code:

  1. Accessibility to non-technical users. Obviously code was never designed to be approachable to non-technical users. TaskPipes was.
  2. Easy To Modify. What if the incoming Excel files change format? In code, you’d have to change all the list indices yourself or, even worse, lean on your friend in the software department to help you out. With TaskPipes, it is as simple as changing a couple of drop-downs.
  3. Visual Ease of Use. The user interface of TaskPipes makes it simple to see how a process is working, and being able to see what the data looks like at each step makes it straightforward to ensure that the process is working as expected. To do something similar in code, you’d have to include a load of breakpoints.

See what processes you can automate by heading over to taskpipes.com now.


Using TaskPipes to Parse a Reddit RSS Feed

Since Yahoo Pipes shut down, users have been looking for an alternative way to parse and edit RSS feeds.

Thankfully, we’ve built TaskPipes with the objective of being able to pull in data from any location. This includes RSS and Atom feeds.

An Example

For this example, let’s extract posts from the homepage of Reddit, filter and clean only those we are interested in, and send ourselves a daily summary of the results by email.

Interestingly, you can get an RSS feed for any Reddit page by simply appending “.rss” to the end of the URL. Therefore, the first thing we do in TaskPipes is define the input data by referencing an “External Link”:

[Screenshot: the “External Link” input step]

This gives us data in the following format:

[Screenshot: the raw RSS feed data]
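Incidentally, the “.rss” trick amounts to one line of string handling, and Reddit’s feed is an Atom document that any XML parser can read. A minimal sketch using only the standard library; the sample feed and URLs here are made up, and the real feed carries more fields:

```python
import xml.etree.ElementTree as ET

def reddit_feed_url(page_url):
    # Reddit exposes a feed for any page by appending ".rss"
    return page_url.rstrip("/") + "/.rss"

# A tiny Atom document standing in for what Reddit actually returns
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First post</title>
    <link href="https://example.com/a"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="https://example.com/b"/>
  </entry>
</feed>"""

def parse_entries(feed_xml):
    # Return (title, link) pairs for every entry in the feed
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    root = ET.fromstring(feed_xml)
    return [
        (entry.findtext("atom:title", namespaces=ns),
         entry.find("atom:link", ns).get("href"))
        for entry in root.findall("atom:entry", ns)
    ]
```

TaskPipes does this fetching and parsing for you; the sketch just shows what is going on under the hood.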

Now let’s say we want to pull out the subreddit, the title of the post and the link to the content. Firstly, let’s extract the link from the “description” column.

We do this using two “Extract Text” steps in TaskPipes. The first pulls out the text between the start of the description and the first occurrence of “>[link]”.

[Screenshot: the first “Extract Text” step]

Now we have all the text between the start and the end of the link. We then pull out the text from the first occurrence of the ” character to the end of this text:

[Screenshot: the second “Extract Text” step]
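In code, that pair of “Extract Text” steps is just string slicing on two markers. A rough Python equivalent, using a made-up description cell (Reddit’s real markup differs slightly):

```python
def extract_before(text, marker):
    # Text from the start up to the first occurrence of marker
    idx = text.find(marker)
    return text[:idx] if idx != -1 else text

def extract_after(text, marker):
    # Text from just after the first occurrence of marker to the end
    idx = text.find(marker)
    return text[idx + len(marker):] if idx != -1 else text

# Hypothetical description cell, shaped like Reddit's markup
description = 'submitted by user <a href="https://example.com/post">[link]</a>'

step_one = extract_before(description, '">[link]')
link = extract_after(step_one, '"')
# link is now "https://example.com/post"
```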

We now just strip out the columns we don’t need using the “Remove Columns” method. Finally, since we don’t want to be sent links to images, let’s apply a filter to remove entries that contain “imgur.com”:

[Screenshot: the filter step]

The final step in the process is to send this data as an email. We set our email address, the subject and a message if required. The final process looks like this:

[Screenshot: the final process]

We can run this on a schedule, by using the “Triggers” functionality:

[Screenshot: the “Triggers” settings]

We’ll be sent an email every day at 4pm with a CSV of the cleaned data.

To use this pipe, head over to the TaskPipes Examples Page and make a copy of the Reddit process.


4 Great Workflow and Data Automation Tools

Here at TaskPipes, we’re always looking for ways to improve productivity and to automate the more repetitive aspects of work.

That’s why we wanted to highlight 4 of our favourite workflow automation tools.

Streak

We spend a lot of time in email. One of the most useful tools we’ve come across is Streak. They describe themselves as “CRM in your inbox” and are a Chrome extension that extends a lot of the functionality of Gmail.

A couple of their features that I find most useful are shortcuts and reminders. You can program shortcuts such that whenever you type e.g. “lookfwd” – it autocompletes to “I look forward to hearing from you.”, much like you can do in iOS. You can also set a reminder to follow up on an email, which sends you a follow up message at the date and time of your choosing.

TaskPipes

We’ve built TaskPipes to automate regular data processing tasks, where the incoming data is not always in the required format. Whether that’s extracting and cleaning data from the body of regular emails to send to an API, or pulling the appropriate data from a PDF file to clean and load into your CRM tool, TaskPipes has your back.

We’ve worked hard to make a lot of the complex data wrangling (that you probably have to do on a regular basis in Microsoft Excel) really easy through our user interface. You just define the process once, connect it to the desired endpoint, and data can flow through your pipe whenever you have a new data set.

Zapier

We love Zapier at TaskPipes. Zapier allows you to connect your different apps together, so that whenever you get, say, a signup on your site, you can automatically add a to-do item to your Trello board. They’ve provided all the API integrations, which makes it super easy to deal with.

And what is even better, once you’ve processed and manipulated your data in TaskPipes, you can send it straight to Zapier to load it into any of the 500+ applications they connect with. That TaskPipes-to-CRM example I referred to earlier? That was via Zapier.

Blockspring

Blockspring, like Zapier, allows you to pull data from a range of different applications. Their core feature though? They let you pull data from these apps directly into your Google Spreadsheet or Excel file via their Chrome extension or plugin. Analysts now have access to API data!


Sending Data From TaskPipes To Zapier

Zapier is a fantastic tool, and we’ve used it to build all sorts of integrations here at TaskPipes.

In case you don’t know, Zapier allows you to connect hundreds of your favourite applications together, like Trello, Salesforce, MailChimp and many more.

However, one problem we have had with Zapier is that data always has to be in exactly the right format when you are connecting these apps together. You can never change or modify it before loading it into your app of choice.

We wanted to open up the data processing, manipulation and reformatting capabilities of TaskPipes to a wider range of applications. So we are pleased to announce that you can now integrate TaskPipes with Zapier, sending the data from TaskPipes on to almost any third-party application.

Pull out new lead data straight from the body of an email and load it into your CRM, extract and clean the data from a PDF email attachment into your accounting software, or load rows of data from Excel files into your reminder application. Try this integration with Zapier now at taskpipes.com.

You can view a walkthrough of how to integrate TaskPipes with Zapier below.


Extracting Data from the Body of an Email using TaskPipes

One of the most powerful uses of TaskPipes is pulling data straight from the body of regular emails.

I personally receive dozens of email updates, ranging from property alerts via Zoopla/Zillow to payment emails sent by PayPal.

But what if you want to automatically pull out the square footage of every apartment; or extract the payment information from every transaction, whenever you receive a new email?

Copying and pasting is not the solution.

An Example

In this specific example, we introduce a fictional supplier of international SMS messages who sends us frequent pricing updates by email.

The emails look like this:

[Screenshot: the incoming pricing email]

Most of the data we are interested in is within that ugly green table. I wish “Supplier Five” hadn’t let go of their in-house designer.

To receive emails in this format in TaskPipes, we create a new pipe with a “Read from Email Body” step. Defined within this step is a unique TaskPipes preview email address.

[Screenshot: the “Read from Email Body” step]

If we forward our original message to that email address, TaskPipes will then display a preview of the incoming data. You will notice that TaskPipes cleverly pulls out tables and new lines of text like so:

[Screenshot: the parsed email data]
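TaskPipes’ table detection is internal to the product, but the core idea, pulling the rows of an HTML `<table>` out of an email body, can be sketched with Python’s standard library. The email content below is invented for illustration:

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects the cell text of every table row in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

# A made-up email body with a pricing table in it
html_body = """
<p>Hi, latest prices below.</p>
<table>
  <tr><th>Country</th><th>Price</th></tr>
  <tr><td>France</td><td>0.04</td></tr>
  <tr><td>Germany</td><td>0.05</td></tr>
</table>
"""
parser = TableExtractor()
parser.feed(html_body)
# parser.rows -> [['Country', 'Price'], ['France', '0.04'], ['Germany', '0.05']]
```

The real product also handles plain lines of text and messier markup; this sketch only covers the well-formed-table case.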

I can now define a few intermediate data manipulation steps, to get the data in the right format:

[Screenshot: the intermediate data manipulation steps]

To run this process, I simply assign a custom email address to the pipe.

[Screenshot: the email trigger address]

Whenever an email gets sent to this address, the body of the email will be parsed and processed through each of these steps.

Exporting

To send this to a final location, I can use one of the many output steps, including Google Spreadsheets, API webhooks or direct database writing.

[Screenshot: the complete pipe]

Use the above process via the TaskPipes Examples Page – it’s called “Parse Data From Body of Email”.


Scraping Hacker News on a Schedule with TaskPipes

Background

TaskPipes is a tool to turn any data into a spreadsheet. But why?

Well, spreadsheets are flexible. They’re accessible to the full spectrum of users, from completely non-technical to Turing Award winners. Tabular data is just really easy to manipulate, modify and get into the format you need.

An Example

If you’d like to follow along with this, please go to taskpipes.com/examples and make a copy of the Hacker News Pipe.

Let’s use TaskPipes to scrape the front page of Hacker News every day, pull out any stories from GitHub, and email me the results.

Although you can pull in data from a range of sources with TaskPipes, we want to use the “External Link” option. Let’s set this to HN:

[Screenshot: the “External Link” step pointing at Hacker News]

TaskPipes extracts any tables that are present in the HTML and, if you view the source on news.ycombinator.com, you’ll notice that there are three columns. We only want the third column, so let’s remove the first two.

[Screenshot: the “Remove Columns” step]

Next, we want to extract the number of points of each submission. We use the “Extract Text” functionality to get the text between the start position and the first occurrence of the word “points”.

[Screenshot: the “Extract Text” step]

No more regex!

We do a similar thing to pull out the headline, domain and the number of comments, to end up with the data in this format:

[Screenshot: the cleaned data]
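In code, each of these “Extract Text” steps is just substring slicing between two markers. A rough Python equivalent, applied to a made-up subtext line from the HN table:

```python
def text_between(s, start_marker, end_marker):
    # Text between the first start_marker and the next end_marker;
    # an empty start_marker means "from the beginning".
    start = s.find(start_marker)
    start = 0 if start == -1 else start + len(start_marker)
    end = s.find(end_marker, start)
    end = len(s) if end == -1 else end
    return s[start:end].strip()

# Hypothetical cell from the third HN column
subtext = "172 points by alice 3 hours ago | hide | 94 comments"

points = text_between(subtext, "", " points")               # "172"
comments = text_between(subtext, "| hide |", " comments")   # "94"
```

No regular expressions involved; finding a marker and slicing is all the “Extract Text” step needs to do.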

Now, let’s apply a filter to extract only the stories from github.com, and our pipe is set up.

We can set this process to run on a schedule, and we will be emailed a CSV file with the results.

[Screenshot: the schedule settings]
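Producing that CSV attachment from the cleaned rows is a few lines with Python’s csv module. The column names and rows below are illustrative, not the actual pipe output:

```python
import csv
import io

# Hypothetical cleaned rows from the pipe
headers = ["Headline", "Domain", "Points", "Comments"]
rows = [
    ["Show HN: TaskPipes", "github.com", "172", "94"],
    ["A Git Koan", "github.com", "88", "30"],
]

# Write headers plus one line per story into an in-memory buffer
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerows(rows)
csv_text = buffer.getvalue()
# csv_text now holds the attachment body
```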

Alternatively, send this data to an external API, a database, a Google spreadsheet, or elsewhere.

Wrapping Up

You can use the above example by visiting taskpipes.com/examples.

TaskPipes can pull data from almost anywhere, including web pages, the body of emails or even email attachments.

Clean and manipulate data, and send it to a range of different destinations.

Sign up for a free TaskPipes account at taskpipes.com
