Overview

This guide explains how to choose and configure AI Tables columns for company research, people discovery, contact enrichment, and scoring. It is meant to be actionable on its own: each supported column kind below includes the minimum working shape, required context, and the most common mistake.

Prerequisites

export EXTRUCT_API_TOKEN="YOUR_API_TOKEN"
Generate tokens in the Dashboard under API Tokens. For full setup, see Authentication.

Column Guide in 60 Seconds

A column is one field Extruct fills for each row. Every column has kind, name, and key. Some kinds also have value. If kind is agent, you also choose agent_type, output_format, and a prompt. Typical shape:
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
Exceptions:
  • input, email_finder, and phone_finder do not need a value block
  • reverse_email_lookup uses top-level email_column_key instead of value
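The three config shapes above can be sketched as plain dictionaries. This is a minimal sketch; the helper function names are ours, not part of the Extruct API, but the keys and nesting mirror the guide.

```python
def agent_column(name, key, agent_type, prompt, output_format, **extra):
    """Custom agent column: the agent config lives under "value"."""
    value = {"agent_type": agent_type, "prompt": prompt,
             "output_format": output_format}
    value.update(extra)  # e.g. labels=[...] or output_schema={...}
    return {"kind": "agent", "name": name, "key": key, "value": value}

def simple_column(kind, name, key):
    """input, email_finder, phone_finder: no "value" block at all."""
    return {"kind": kind, "name": name, "key": key}

def reverse_email_column(name, key, email_column_key):
    """reverse_email_lookup: email_column_key sits at the top level."""
    return {"kind": "reverse_email_lookup", "name": name, "key": key,
            "email_column_key": email_column_key}

cols = [
    simple_column("input", "Company", "input"),
    agent_column("Description", "company_description", "research_pro",
                 "Describe what the company does in under 25 words.", "text"),
    reverse_email_column("Profile from Email", "profile_from_email",
                         "work_email"),
]
```

Note how only the agent column carries a value block, matching the exceptions listed above.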

Column kinds at a glance

All column kinds require kind, name, and key.
Kind                  | What it does                                 | What you provide
input                 | stores a value you already have              | the row value
agent                 | researches, reasons, or transforms one field | agent_type, prompt, output_format
company_people_finder | finds people at each company by role         | roles
email_finder          | finds a work email for a person              | nothing extra
phone_finder          | finds a phone number for a person            | nothing extra
reverse_email_lookup  | resolves profile data from a known email     | email_column_key

If kind = agent

agent is the core Extruct pattern. Once you set kind: "agent", you make three further choices: the agent behavior, the output shape, and the prompt.
Choice        | What it controls
agent_type    | how Extruct gets or produces the answer
output_format | the shape of the returned result
prompt        | the exact job the column should do

Agent types at a glance

Agent type         | Best for                                        | Not best for
research_pro       | researched answers on company tables            | people or generic tables, direct contact lookups, LinkedIn fetch from a known URL
research_reasoning | researched answers on people and generic tables | company-table use cases that need company-context disambiguation
llm                | transforming data already in the row            | web research or source discovery
linkedin           | LinkedIn data from a known LinkedIn URL         | discovering the LinkedIn URL in the first place

Output formats at a glance

Format      | Use for                               | Extra required fields
text        | prose or notes                        | none
url         | one canonical URL                     | none
email       | one email-shaped answer from an agent | none
select      | exactly one allowed option            | labels
multiselect | zero or more allowed options          | labels
numeric     | counts and measured values            | none
money       | structured financial values           | none
date        | exact or partial dates                | none
phone       | one phone number in structured form   | none
grade       | bounded scoring with explanation      | none
json        | nested schema-defined output          | output_schema

Fast defaults

  • On company tables, start with the official website domain or URL as input
  • On company tables, default to kind: "agent" and agent_type: "research_pro"
  • On people and generic tables, use kind: "agent" and agent_type: "research_reasoning"
  • On company tables, choose research_reasoning only if you intentionally want to skip the company-context disambiguation step
  • For scoring, use output_format: "grade"
  • For people or contact workflows, prefer company_people_finder, email_finder, phone_finder, or reverse_email_lookup over generic prompts
  • When you add new columns, rerun only the new column IDs with mode: "new"

Choose the right column kind

Use the simplest column kind that matches the result you want back.

input

What it does:
  • stores a value you already have
When to use it:
  • company website domain or URL
  • manual notes
  • internal tags or metadata
  • local helper fields such as company_website on a standalone people table
Minimum config:
{
  "kind": "input",
  "name": "Company",
  "key": "input"
}
Context it needs:
  • none; you provide the value directly in row data
Best practice:
  • on company tables, the preferred input is the company’s official website domain or URL
  • a raw company name is a weaker fallback when you do not have the site
What it returns:
  • exactly the row value you send
Most common mistake:
  • using weak company names as the default company input when you already have a domain or URL

agent

What it does:
  • researches, reasons, or transforms one field
When to use it:
  • research a company fact
  • classify a company or person
  • summarize upstream evidence
  • return structured output such as select, money, grade, or json
Minimum config:
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
Context it needs:
  • on company tables, Extruct auto-injects company context
  • on people tables, Extruct auto-injects person context
  • on generic tables, use explicit prompt references for the row fields you need
What it returns:
  • one value in the output format you choose
Most common mistake:
  • one prompt tries to do multiple jobs or references {input} unnecessarily on company or people tables

company_people_finder

What it does:
  • finds people at each company by role
When to use it:
  • branch from company research into a people workflow
  • find leadership, decision-makers, or role-based contact targets
Minimum config:
{
  "kind": "company_people_finder",
  "name": "Decision Makers",
  "key": "decision_makers",
  "value": {
    "roles": ["VP Sales", "sales leadership", "revenue operations"]
  }
}
Context it needs:
  • a company table with the correct company rows
What it returns or creates:
  • fills the finder cell on the company table
  • creates or updates a child people table for downstream enrichment
Good role style:
  • broader role families for coverage: sales leadership, engineering leadership
  • exact titles for narrower targeting: CEO, Head of Engineering
Most common mistake:
  • adding it to a people table or using overly narrow role strings before checking coverage

email_finder

What it does:
  • finds a work email for a person
When to use it:
  • enrich a people table after person rows already exist
Minimum config:
{
  "kind": "email_finder",
  "name": "Work Email",
  "key": "work_email"
}
Context it needs:
  • full_name
  • profile_url
  • company_website
company_website can come from:
  • a parent company row, or
  • a local people-table input column keyed exactly company_website
What it returns:
  • one work email or no result
Most common mistake:
  • adding the column before company_website exists or trying to configure it with a value block
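Since email_finder has no value block, the only thing to verify before adding it is that the three context keys exist on the table. A small preflight check, assuming you already hold the table's column keys as a list (the helper is ours, not an Extruct API call):

```python
# Context keys the guide says email_finder and phone_finder rely on.
REQUIRED_CONTEXT = {"full_name", "profile_url", "company_website"}

def missing_finder_context(existing_column_keys):
    """Return the context keys a finder column would still be missing."""
    return sorted(REQUIRED_CONTEXT - set(existing_column_keys))

# A people table without company_website is the classic failure case:
print(missing_finder_context(["full_name", "profile_url", "role"]))
# → ['company_website']
```

If the check reports company_website, add a local input column keyed exactly company_website (or link a parent company row) before adding the finder.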

phone_finder

What it does:
  • finds a phone number for a person
When to use it:
  • enrich a people table after person rows already exist
Minimum config:
{
  "kind": "phone_finder",
  "name": "Direct Phone",
  "key": "direct_phone"
}
Context it needs:
  • full_name
  • profile_url
  • company_website
company_website can come from:
  • a parent company row, or
  • a local people-table input column keyed exactly company_website
What it returns:
  • one phone number or no result
Most common mistake:
  • expecting high coverage without first verifying that the person rows and company website context are strong

reverse_email_lookup

What it does:
  • resolves person or profile data from an email you already have
When to use it:
  • you already have an email column and want profile resolution rather than company-to-people branching
Minimum config:
{
  "kind": "reverse_email_lookup",
  "name": "Profile from Email",
  "key": "profile_from_email",
  "email_column_key": "work_email"
}
Context it needs:
  • a source email column keyed in email_column_key
What it returns:
  • person or profile metadata such as name, headline, location, and profile URL
Most common mistake:
  • putting email_column_key inside value or using this when the real job is company-to-people discovery

Agent types

Agent types change what Extruct can do for a custom agent column. research_pro and research_reasoning can both research, reason, and make judgments. The difference is whether Extruct first disambiguates the requested company using company context before answering.
Agent type         | Best for                                        | What it actually does
research_pro       | researched answers on company tables            | web research plus Extruct DB search and similar-company retrieval, with a company-context disambiguation step
research_reasoning | researched answers on people and generic tables | the same general research tool class, but without the company-context disambiguation step
llm                | row-local transformation                        | transforms existing row data only; no web research
linkedin           | LinkedIn data from a known LinkedIn URL         | uses LinkedIn company or profile lookup behavior
Research agents cannot do these jobs directly:
  • fetch LinkedIn data from a known LinkedIn URL
  • find people at a company by role
  • find work emails
  • find phone numbers
  • resolve a person from an email
Use these instead:
  • linkedin
  • company_people_finder
  • email_finder
  • phone_finder
  • reverse_email_lookup
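The decision rules above can be distilled into a rule-of-thumb chooser. The function is illustrative only, not part of the Extruct API, and it covers just the agent_type choice (the purpose-built kinds listed above still win for contact workflows):

```python
def pick_agent_type(table_kind, needs_web_research, has_linkedin_url=False):
    """Rule-of-thumb agent_type choice per the guide's defaults."""
    if has_linkedin_url:
        return "linkedin"          # fetch from a known LinkedIn URL
    if not needs_web_research:
        return "llm"               # row-local transformation only
    if table_kind == "company":
        return "research_pro"      # company-context disambiguation first
    return "research_reasoning"    # people and generic tables
```

For example, pick_agent_type("company", True) returns "research_pro", while a people-table research column gets "research_reasoning".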

research_pro

Use this for researched answers on company tables. What makes it different:
  • it uses the same general research and reasoning capability as research_reasoning
  • before answering, it gathers or confirms company context
  • it uses that company context to disambiguate whether the evidence belongs to the requested company or to some other company
Good fits:
  • pricing and packaging
  • funding or valuation
  • target segments
  • recent news
  • product or use-case research
  • competitor or alternative discovery
  • canonical company URLs such as LinkedIn or careers pages
Default role:
  • this is the safe default for custom agent columns on company tables
  • use it unless you intentionally want to skip the disambiguation step and you know the entity is already clear

research_reasoning

Use this for researched answers when you do not want the company-context disambiguation step. What makes it different:
  • it uses the same general research and reasoning capability as research_pro
  • it does not start with the company-context disambiguation step
  • it is the default research agent on people and generic tables
  • on company tables, it is an expert choice when the entity is already clear and you intentionally want to skip the disambiguation step
Good fits:
  • official-site disambiguation
  • custom 1-5 scoring
  • SMB vs enterprise judgments
  • ranked comparisons
  • researched answers on generic tables

llm

Use this for row-local transformation only. What to expect:
  • no web research
  • no page visits
  • no source URLs
  • cheaper downstream transformation after a research step already happened
Good fits:
  • turn notes into a select
  • summarize several upstream columns
  • normalize messy evidence
  • derive structured json from row context

linkedin

Use this when the row already has a known LinkedIn company or person URL and you want LinkedIn-derived data. Typical chain:
  1. use research_pro to find the right LinkedIn URL
  2. use linkedin to fetch LinkedIn data
  3. use llm to summarize or classify that data
Important people-table rule:
  • custom agent columns on people tables can use llm, research_reasoning, and linkedin
  • research_pro is not supported on custom agent columns on people tables

Built-in company and people context

company tables

company tables automatically create and fill:
  • company_profile
  • company_name
  • company_website
This is why most company-table prompts should be plain natural-language instructions. Weak:
Research the company {input} and summarize what it does.
Strong:
Describe what the company does in under 25 words.

people tables

Custom agent columns on people tables get baseline row context automatically:
  • full_name
  • profile_url
  • role
Use plain natural-language prompts when that baseline context is enough. Add explicit prompt references only when the column depends on another custom field or a local input field. Weak:
Classify this person using {input}.
Strong:
Classify this person's seniority level based on their current role.
Example where an explicit reference is appropriate:
Choose the company's pricing model based on:

Pricing Notes
---
{pricing_notes}
---
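Prompt references like {pricing_notes} are plain snake_case keys in braces, so you can list a prompt's implied dependencies with a regex. A small sketch, assuming references always use lowercase snake_case keys (the helper is ours):

```python
import re

def prompt_references(prompt):
    """Return the sorted set of {key} references found in a prompt."""
    return sorted(set(re.findall(r"\{([a-z_][a-z0-9_]*)\}", prompt)))

prompt = ("Choose the company's pricing model based on:\n\n"
          "Pricing Notes\n---\n{pricing_notes}\n---")
print(prompt_references(prompt))  # → ['pricing_notes']
```

A plain natural-language prompt with no braces yields an empty list, which is exactly what you want on company and people tables where context is auto-injected.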

Output formats

Choose the output format that best matches how the result will be used later.
Output format | Best for                         | Example
text          | short prose or notes             | company description
url           | one canonical URL                | careers page
email         | one email address                | normalized email
select        | one allowed option               | pricing model
multiselect   | zero or more allowed options     | markets served
numeric       | counts or measured values        | employee count
money         | structured financial values      | annual revenue
date          | exact or partial dates           | founded date
phone         | one phone number                 | direct phone
grade         | bounded scoring with explanation | ICP fit
json          | nested structured output         | competitors list
Guidance:
  • use text for short prose meant to be read directly
  • use select or multiselect when the answer should stay within labels
  • use money for revenue, ARR, valuation, or funding
  • use grade for scoring
  • use json only when the structure is genuinely useful downstream
For custom agent columns, the main result lives in rows[].data.<key>.value.answer. What changes by output_format is the shape of answer. Supporting fields such as explanation and sources may sit alongside it in rows[].data.<key>.value.
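Reading a result therefore always walks the same path regardless of format. A sketch of reading one cell from a row payload, using a hand-built row dict in place of a real API response:

```python
# Hand-built example row following the rows[].data.<key>.value shape.
row = {
    "data": {
        "company_description": {
            "value": {
                "answer": "AI research automation.",
                "explanation": "From the company site.",
                "sources": ["https://extruct.ai"],
            }
        }
    }
}

def read_answer(row, key):
    """Return (answer, sources) for one column key, tolerating gaps."""
    cell = row["data"].get(key, {}).get("value", {})
    return cell.get("answer"), cell.get("sources", [])

answer, sources = read_answer(row, "company_description")
```

Only the shape of answer changes with output_format; explanation and sources sit alongside it unchanged.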

text

Use this for prose or short notes that will be read directly. Minimum config:
{
  "kind": "agent",
  "name": "Description",
  "key": "company_description",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Describe what the company does in under 25 words.",
    "output_format": "text"
  }
}
Typical returned shape:
{
  "answer": "AI research automation for company and people workflows.",
  "explanation": "Company site and product messaging describe structured research automation.",
  "sources": ["https://extruct.ai"]
}
Best practice:
  • use text only when the result is meant to be read directly; if you need filtering or sorting later, prefer a structured format

url

Use this for one canonical URL. Minimum config:
{
  "kind": "agent",
  "name": "Careers URL",
  "key": "careers_page_url",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's official careers or jobs page URL.",
    "output_format": "url"
  }
}
Typical returned shape:
{
  "answer": "https://extruct.ai/careers",
  "explanation": "This is the company's official careers page.",
  "sources": ["https://extruct.ai/careers"]
}
Best practice:
  • ask for the single best URL only; do not ask for a list unless you actually need a json structure

email

Use this when you want one email-shaped answer from a custom agent. Minimum config:
{
  "kind": "agent",
  "name": "Support Email",
  "key": "support_email",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's primary support email address.",
    "output_format": "email"
  }
}
Typical returned shape:
{
  "answer": "support@extruct.ai",
  "explanation": "The support contact is listed on the company's website.",
  "sources": ["https://extruct.ai"]
}
Best practice:
  • if the real job is finding a work email for a person, prefer email_finder

select

Use this for exactly one allowed option. Minimum config:
{
  "kind": "agent",
  "name": "Pricing Model",
  "key": "pricing_model",
  "value": {
    "agent_type": "llm",
    "prompt": "Select the company's primary pricing model.\n\nPricing Notes\n---\n{pricing_notes}\n---",
    "output_format": "select",
    "labels": ["Free", "Freemium", "Subscription", "Usage-Based", "Custom Quote"]
  }
}
labels is required in the column config. Typical returned shape:
{
  "answer": {
    "value": "Subscription"
  },
  "explanation": "The company sells recurring subscription plans.",
  "sources": ["https://extruct.ai/pricing"]
}
Best practice:
  • labels should be stable categories, not free-form text or sentence-length answers

multiselect

Use this for zero or more allowed options. Minimum config:
{
  "kind": "agent",
  "name": "Markets Served",
  "key": "markets_served",
  "value": {
    "agent_type": "research_reasoning",
    "prompt": "Select the markets this company clearly serves.",
    "output_format": "multiselect",
    "labels": ["SMB", "Mid-Market", "Enterprise", "Public Sector", "Healthcare"]
  }
}
labels is required in the column config. Typical returned shape:
{
  "answer": {
    "values": ["Mid-Market", "Enterprise"]
  },
  "explanation": "The company shows clear messaging for larger commercial customers.",
  "sources": ["https://extruct.ai"]
}
Best practice:
  • use multiselect only when multiple labels can legitimately be true at the same time
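Note the shape difference between the two formats: select returns answer.value (one string) while multiselect returns answer.values (a list). A small normalizer (ours, illustrative) makes both list-shaped for downstream code:

```python
def selected_labels(answer):
    """Return the chosen labels as a list for select or multiselect."""
    if not isinstance(answer, dict):
        return []
    if "values" in answer:          # multiselect shape
        return list(answer["values"])
    if "value" in answer:           # select shape
        return [answer["value"]]
    return []

print(selected_labels({"value": "Subscription"}))
# → ['Subscription']
print(selected_labels({"values": ["Mid-Market", "Enterprise"]}))
# → ['Mid-Market', 'Enterprise']
```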

numeric

Use this for counts, ranges, or measured values. Minimum config:
{
  "kind": "agent",
  "name": "Employee Count",
  "key": "employee_count",
  "value": {
    "agent_type": "research_pro",
    "prompt": "How many employees does the company currently have?",
    "output_format": "numeric"
  }
}
Typical returned shape:
{
  "answer": {
    "value": 5234,
    "min_value": null,
    "max_value": null,
    "unit": "employees",
    "max_possible": null
  },
  "explanation": "Recent company sources point to approximately 5,234 employees.",
  "sources": ["https://example.com"]
}
Best practice:
  • prefer numeric over text when the result may be sorted, filtered, or bucketed later

money

Use this for structured financial values such as revenue, ARR, valuation, or funding. Minimum config:
{
  "kind": "agent",
  "name": "Annual Revenue",
  "key": "annual_revenue",
  "value": {
    "agent_type": "research_pro",
    "prompt": "What is the company's latest annual revenue or ARR?",
    "output_format": "money"
  }
}
Typical returned shape:
{
  "answer": {
    "amount": 15000000,
    "min_amount": null,
    "max_amount": null,
    "currency": "USD",
    "period": "annual"
  },
  "explanation": "The latest reliable figure is approximately $15M annually.",
  "sources": ["https://example.com"]
}
Best practice:
  • prefer money over prose for any financial figure you may compare or score later
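A display helper for the money shape above can cover both exact values (amount) and ranges (min_amount/max_amount). The formatter is ours; the field names come from the returned shape shown in this section:

```python
def format_money(answer):
    """Render a money answer as a readable string."""
    cur = answer.get("currency", "USD")
    if answer.get("amount") is not None:
        return f"{cur} {answer['amount']:,} ({answer.get('period', 'n/a')})"
    lo, hi = answer.get("min_amount"), answer.get("max_amount")
    if lo is not None and hi is not None:
        return f"{cur} {lo:,}-{hi:,}"
    return "unknown"

print(format_money({"amount": 15000000, "currency": "USD",
                    "period": "annual"}))
# → USD 15,000,000 (annual)
```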

date

Use this for exact dates, partial dates, or date ranges. Minimum config:
{
  "kind": "agent",
  "name": "Founded Date",
  "key": "founded_date",
  "value": {
    "agent_type": "research_pro",
    "prompt": "When was the company founded?",
    "output_format": "date"
  }
}
Typical returned shape:
{
  "answer": {
    "year": 2021,
    "month": null,
    "day": null,
    "end_year": null,
    "end_month": null,
    "end_day": null
  },
  "explanation": "Only the founding year is clearly supported by the available sources.",
  "sources": ["https://example.com"]
}
Best practice:
  • partial dates are valid and expected; do not force full day-level precision when the source does not support it
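Because partial dates are expected, downstream formatting should print only the precision the answer actually carries. A sketch (the formatter is ours; the field names follow the returned shape above):

```python
def format_partial_date(answer):
    """Render year, year-month, or year-month-day as available."""
    parts = [str(answer["year"])]
    if answer.get("month"):
        parts.append(f"{answer['month']:02d}")
        if answer.get("day"):
            parts.append(f"{answer['day']:02d}")
    return "-".join(parts)

print(format_partial_date({"year": 2021, "month": None, "day": None}))
# → 2021
print(format_partial_date({"year": 2021, "month": 6, "day": None}))
# → 2021-06
```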

phone

Use this for one phone number in structured form. Minimum config:
{
  "kind": "agent",
  "name": "Main Phone",
  "key": "main_phone",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's primary phone number.",
    "output_format": "phone"
  }
}
Typical returned shape:
{
  "answer": {
    "number": "+14155552671",
    "country_code": "1",
    "extension": null
  },
  "explanation": "This is the main published company phone number.",
  "sources": ["https://example.com"]
}
Best practice:
  • if the real job is finding a phone number for a person, prefer phone_finder

grade

Use this when you want a bounded score plus an explanation. The recommended scoring pattern is a custom agent column with output_format: "grade". Example:
{
  "kind": "agent",
  "name": "Enterprise Fit",
  "key": "enterprise_fit",
  "value": {
    "agent_type": "research_reasoning",
    "prompt": "Assess this statement about the company: The company clearly sells to enterprise customers. Use the default 1-5 grade scale. If there is not enough evidence to score confidently, return Not found.",
    "output_format": "grade"
  }
}
Default scale:
  • 5: strong yes
  • 4: likely yes
  • 3: mixed, partial, or uncertain
  • 2: likely no
  • 1: strong no
  • Not found: not enough evidence to score confidently
Typical returned shape:
{
  "answer": 4,
  "explanation": "Strong mid-market and enterprise signals, but not purely enterprise-only.",
  "sources": ["https://example.com"]
}
Not found shape:
{
  "answer": "Not found",
  "explanation": "Not enough reliable evidence to score this confidently.",
  "sources": []
}
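Since a grade answer is either an integer on the 1-5 scale or the string "Not found", downstream scoring code should normalize both cases explicitly. A minimal sketch (the helper is ours):

```python
def grade_score(answer):
    """Return an int 1-5, or None when the agent could not score."""
    return None if answer == "Not found" else int(answer)

scores = [grade_score(a) for a in (5, 3, "Not found")]
print(scores)  # → [5, 3, None]
```

Treating "Not found" as None (rather than 0 or 1) keeps unscorable rows out of averages and cutoffs.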

json

Use this for user-defined nested output. Minimum config:
{
  "kind": "agent",
  "name": "Competitors",
  "key": "competitors",
  "value": {
    "agent_type": "research_pro",
    "prompt": "Find the company's closest competitors and alternatives. Return one object with a competitors array. Each competitor should include name, domain, and short_reason.",
    "output_format": "json",
    "output_schema": {
      "type": "object",
      "properties": {
        "competitors": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "domain": { "type": "string" },
              "short_reason": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
output_schema is required. Typical returned shape:
{
  "answer": {
    "competitors": [
      {
        "name": "Example Co",
        "domain": "example.com",
        "short_reason": "Targets the same buyer with a similar core workflow."
      }
    ]
  },
  "explanation": "These companies are the closest alternatives based on product and buyer overlap.",
  "sources": ["https://example.com"]
}
Best practice:
  • unlike the fixed formats above, the shape of answer is defined by your output_schema
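Because the shape of a json answer is whatever your output_schema defines, it is worth sanity-checking results before reuse. A hand-rolled check for the competitors schema above, assuming answers follow that exact schema (the helper is ours; a full JSON Schema validator would be the heavier alternative):

```python
def check_competitors(answer):
    """True if answer matches the competitors schema sketched above."""
    comps = answer.get("competitors")
    if not isinstance(comps, list):
        return False
    return all(
        isinstance(c, dict)
        and all(isinstance(c.get(k), str)
                for k in ("name", "domain", "short_reason"))
        for c in comps
    )

good = {"competitors": [{"name": "Example Co", "domain": "example.com",
                         "short_reason": "Same buyer, similar workflow."}]}
print(check_competitors(good))  # → True
print(check_competitors({}))    # → False
```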

Dependencies and prompt writing

For custom agent columns, dependencies come from:
  • prompt references such as {pricing_notes}
  • extra_dependencies
Prompt references are usually the clearest default because the dependency is visible in the prompt itself. Use extra_dependencies only when you intentionally need an upstream dependency that is not referenced in the prompt. Dependency example:
{
  "kind": "agent",
  "name": "Pricing Model",
  "key": "pricing_model",
  "value": {
    "agent_type": "llm",
    "prompt": "Select the company's primary pricing model.\n\nPricing Notes\n---\n{pricing_notes}\n---",
    "output_format": "select",
    "labels": ["Free", "Freemium", "Subscription", "Usage-Based", "Custom Quote"]
  }
}
Prompt writing rules:
  • each column should do one job
  • say exactly what artifact should come back
  • prefer natural-language prompts on company and people tables
  • use explicit prompt references only when a column truly depends on another custom field
Research prompt pattern:
Find the company's official careers page URL. Return the single best URL only.
Competitor discovery pattern:
Find the company's closest competitors and alternatives.
Prefer companies that solve the same core problem for a similar buyer.
Return one object with a competitors array.
Each competitor should include name, domain, and short_reason.
If you are not confident a company is a real competitor, leave it out.

Troubleshooting

Unresolved column references

Cause:
  • the prompt references a missing column key
Fix:
  • create the source column first
  • or change the prompt to reference an existing key

Column is not compatible with the table kind

Common examples:
  • company_people_finder on a people table
  • research_pro custom agent columns on a people table

Cells never start running

Cause:
  • upstream dependency cells are not done
Fix:
  • inspect the source columns first
  • keep dependency chains short
  • rerun only the new column IDs with mode: "new"

Results are hard to reuse downstream

Cause:
  • too many text outputs
Fix:
  • move repeated decisions into select, multiselect, numeric, money, date, phone, grade, or json

A column feels too expensive

Cause:
  • web research is being used for a transformation problem
Fix:
  • research the source fact once
  • derive downstream fields with llm

If you are unsure

  • default to company tables
  • default to official website domain or URL as company-table input
  • default to built-in or purpose-built column kinds when available
  • default to research_pro on company tables
  • default to research_reasoning on people and generic tables
  • on company tables, switch to research_reasoning only when you intentionally want to skip company-context disambiguation
  • default to llm for downstream transformations
  • default to agent + grade for scoring
  • default to plain natural-language prompts on company and people tables
  • add explicit prompt references only for intentional chaining or on generic tables
For fuller reusable configs and multi-step patterns, continue to Column Library.