Presented by:


QUAN-HA LE

Infrastructure Platforms, Database Group, BlackBerry

Le Quan Ha has been a PostgreSQL Database Administrator in the IPG Database group at BlackBerry in Ontario, Canada, since 2015. From 2010 to 2015, he worked as an IT Manager on PostgreSQL at Anvy Digital, Canwrx and Hire Ground Software, and he has over 7 years of PostgreSQL experience. He received his PhD from Queen's University Belfast, UK, in 2005, and was a Research Assistant there from 1999 to 2000. Since 2016, Ha has been a member of the United States PostgreSQL Association in New York.


Mainur Rahman

MNP

Mainur Rahman is a Web Developer at MNP in Calgary, Alberta, Canada. From 2010 to 2015, he worked as Head of Research and Development on PostgreSQL at Circle Cardiovascular Imaging and Hire Ground Software, where he was Dr. Ha’s manager. He has over 7 years of PostgreSQL experience. From 2008 to 2010, he completed an MSc in Computer Science at the University of Calgary, Alberta, where he was also a Teaching Assistant.


Web search engines often federate many user queries to relevant structured databases. For example, a recruitment-related query might be federated to a jobseekers-and-employers database containing their resumes and skills. The relevant structured data items are then returned to the user along with web search results. Because each structured database is searched in isolation, the search often produces empty or incomplete results, as the database may not contain the information required to answer the query.

Our Applicant Tracking System, developed on Zend Server using PostgreSQL, has 16 development databases holding over 650,000 profile documents (resumes, cover letters and skills), with 238 keywords per document on average. There can be up to 200,000 transactions per minute across these databases. Our existing database search technique (PostgreSQL full-text keyword search) can freeze and take very long to respond: unless we cut the search off at the top profiles, the results are not ready within thirty seconds. This cut-off, however, returned incorrect results. For example, for the query “Jet fuel Thermal Oxidation”, which requests jobseekers whose resumes contain skills in the Oil/Gas industry, the top ten results showed a conflict in relevance ranking.
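As a rough illustration of why a cut-off can distort relevance ranking, the following Python sketch ranks a handful of invented profiles with and without a cut-off. It is not the Applicant Tracking System's query: the profile data, the `rank` function, the keyword-count score and the idea that the cut-off keeps the first N profiles in storage order are all assumptions made for the example.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical profiles: (profile id, set of lower-cased keywords).
Profile = Tuple[str, Set[str]]


def rank(profiles: List[Profile], query: str,
         cutoff: Optional[int] = None, top_k: int = 10) -> List[str]:
    """Rank profiles by how many query keywords they contain.

    If `cutoff` is given, only the first `cutoff` profiles are examined,
    which mimics (under our assumption) a search that stops early.
    """
    terms = set(query.lower().split())
    candidates = profiles if cutoff is None else profiles[:cutoff]
    scored = []
    for profile_id, keywords in candidates:
        score = len(terms & keywords)  # number of matching keywords
        if score > 0:
            scored.append((score, profile_id))
    # Highest score first; ties broken by profile id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [profile_id for _, profile_id in scored[:top_k]]


# Invented data: the best match happens to be stored last.
PROFILES: List[Profile] = [
    ("p1", {"jet", "engine", "mechanic"}),
    ("p2", {"fuel", "truck", "driver"}),
    ("p3", {"thermal", "insulation"}),
    ("p4", {"jet", "fuel", "thermal", "oxidation"}),
]

full_ranking = rank(PROFILES, "Jet fuel Thermal Oxidation")
cut_ranking = rank(PROFILES, "Jet fuel Thermal Oxidation", cutoff=3)

print(full_ranking)  # p4 matches all four keywords and ranks first
print(cut_ranking)   # p4 is never examined, so weaker matches fill the list
```

In this toy setting the cut-off run is faster only because it looks at fewer profiles, and it pays for that by never seeing the strongest match.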

To find a search technique better suited to our data than the existing database search, we considered semantic search models. Our semantic search achieves 88% to 91.22% accuracy with much faster queries: a four-keyword search completes in 1 to 28 seconds. Furthermore, our semantic search engine is robust enough that users can search by entering a whole paragraph of text. We designed a combination of semantic search, which looks up web pages for each search, and PostgreSQL database search.

Our semantic search includes the following novel features:

  • Federated partitions: huge PostgreSQL full-text databases can associate semantic searches with over 100 entity-and-keyword hashing partitions.

  • Scalable capability: hashing partitions with 300 well-tuned triggers can cover document databases accumulated over more than 20 years.

  • Query-cloud web module: users can increase or decrease the scores of individual keywords to influence the search strategy.

  • Ontology extraction: we focus on generating semantic ontologies using HTML web forms.

  • Semantic RDF ranking: the raw unstructured content of jobseeker documents is transformed into RDF format.

  • Intelligent web search engine.
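Two of the features above, hashing partitions and user-adjusted keyword scores, can be sketched in a few lines of Python. This is only a minimal sketch under our own assumptions, not the engine's implementation: the partition count of 100, the SHA-256-based routing, and the names `partition_for`, `partitions_for_query` and `weighted_score` are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Set

NUM_PARTITIONS = 100  # assumed value; the abstract only says "over 100"


def partition_for(entity: str, keyword: str,
                  num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an (entity, keyword) pair to a partition number.

    A stable hash is used so the same pair always lands in the same
    partition, regardless of process or machine.
    """
    key = f"{entity.lower()}\x1f{keyword.lower()}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_for_query(entity: str, keywords: Iterable[str]) -> List[int]:
    """Return the distinct partitions a multi-keyword query must visit."""
    return sorted({partition_for(entity, kw) for kw in keywords})


def weighted_score(doc_keywords: Set[str], weights: Dict[str, float]) -> float:
    """Sum the user-chosen weights of the query keywords found in a document."""
    return sum(w for kw, w in weights.items() if kw.lower() in doc_keywords)


# Hypothetical usage: the user raises the weight of the rarer terms.
weights = {"jet": 1.0, "fuel": 1.0, "thermal": 2.0, "oxidation": 2.0}
resume_keywords = {"jet", "fuel", "oxidation", "pipeline"}

print(partitions_for_query("skill", weights))    # partitions to search
print(weighted_score(resume_keywords, weights))  # 1.0 + 1.0 + 2.0
```

Routing by a hash of the entity-and-keyword pair means a query only touches the partitions its keywords map to, while the weights let the user change the ranking without changing which partitions are searched.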

The timing cost is shown in Table 1, "Performance of Semantic Search on Resume Database".

Keyword number per search | Timing cost
One | 1 s – 10 s
Two | 1 s – 12 s
Three | 1 s – 17 s
Four | 1 s – 28 s
Five | 1 s – 1 min
Short sentence | 2 s – 38 s
Long sentence | 9 s – 1 min 12 s
Paragraph | 21 s – 5 min

On average, a search takes 4 seconds per query.

We successfully developed a new semantic search architecture that is easy to apply in practice. Our future work aims to develop deeper click models for our semantic search and to bring the semantic search to cell phones.

Date:
2017 November 14 10:40
Duration:
50 min
Room:
Willow
Conference:
PGConf Local: Seattle
Language:
Track:
dev
Difficulty:
Hard