luccioman
58b9834729
Added HTML microdata typed items parsing capability.
...
This lets the HTML parser gather the type URLs of items annotated in
HTML tags with the itemscope and itemtype attributes (see the
microdata specification https://www.w3.org/TR/microdata/ ), notably
types from the schema.org vocabulary, but also types/classes from any
other vocabulary, such as the common ones listed in the RDFa core
context ( https://www.w3.org/2011/rdfa-context/rdfa-1.1.html ).
7 years ago
luccioman
80fb1026d0
Create recrawl requests with the relevant crawl profile.
...
The default recrawl profile was already used for the crawl stacker
acceptance check, but request entries were still created with the
"snippetGlobalText" profile.
7 years ago
luccioman
539925a275
Added a utility to generate/update the XLIFF master file from lng files.
7 years ago
luccioman
41a6b052d9
Updated master and French translation for the IndexReIndexMonitor_p page
7 years ago
luccioman
fa6d030b0b
Moved dbtest to the test source folder.
7 years ago
luccioman
6cd3847d0a
Fixed NullPointerException case on Table init with relative file path.
...
Can occur, for example, when running dbtest with a relative test table
file name (without an explicit parent folder).
7 years ago
luccioman
28883d8a71
Shutdown daemon threads at the end of dbtest
7 years ago
luccioman
929e0d6eae
Replaced improper ByteBuffer.equals() implementation with Arrays.equals()
...
Also renamed ByteBuffer.equals() to startsWith(), as this matches the
semantics of the function's implementation.
7 years ago
luccioman
098ee63911
Added a manual performance test for the HostBalancer.
...
Following the report in mantis 776
(http://mantis.tokeek.de/view.php?id=776 ).
Running the performance test with different control parameters seems to
show that YaCy's RowHandleMap, used in the balancer depthCache, is
ultimately more efficient than, for example, the ConcurrentHashMap from
JDK 8.
7 years ago
luccioman
fefe2d1b6e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
7 years ago
reger
5aa4fb1144
upd to metadata-extractor-2.11.0.jar
7 years ago
luccioman
46b5249c20
Removed time condition on HostBalancer initialization in JUnit test.
...
Its initialization in main application usage remains asynchronous.
7 years ago
luccioman
8b572b7337
Commit Solr index before simulating or starting recrawl job.
...
This ensures up-to-date simulation query results, and recrawl
processing.
7 years ago
luccioman
5b943c07ab
Merge pull request #155 from JeremyRand/readme-typo-fixes
...
Fix some typos in the README.
7 years ago
JeremyRand
dea856c854
Fix some typos in the README.
7 years ago
luccioman
733cacdbb8
Revised the RDFaParser main launcher for minimal proper operation.
...
This parser is still not enabled in the main text parsers list. More
would have to be done to make it functional.
7 years ago
luccioman
7baa99f26f
Fixed stored URL in web cache when redirection(s) occur.
...
Associate cached content with the last redirection location, instead of
the first URL of a redirection chain:
- for proper base URL processing in parsers (fixes mantis 636 -
http://mantis.tokeek.de/view.php?id=636 )
- to prevent duplicated content in Solr index when recrawling a
redirected URL
7 years ago
luccioman
5e2812c060
Automatically refresh running recrawl report when JavaScript is enabled.
...
For users who would prefer to keep JavaScript disabled, a manual Refresh
button is still available.
7 years ago
luccioman
19903a984f
Merge pull request #154 from tangdou1/master
...
update Chinese translation
7 years ago
tangdou1
49d103ad16
Merge pull request #1 from tangdou1/tangdou1-patch-1
...
Update zh.lng
7 years ago
tangdou1
dd4f93f049
Update zh.lng
...
translate some untranslated words to Chinese.
7 years ago
tangdou1
e585b4f597
Update zh.lng
7 years ago
luccioman
0fce264ba4
Set reindex page to html5 and removed presentational only html tables.
7 years ago
luccioman
83df922afc
Removed unused duplicated HTML id on header hidden field
7 years ago
luccioman
9ddf92d143
Removed unnecessary reflection usage for workflow tasks.
...
This improves code readability and maintainability (call hierarchies are
easier to read) and possibly performance.
7 years ago
luccioman
897d3d30cc
Added new recrawl job profile to the list of default crawl profiles
7 years ago
luccioman
9624516bf8
Refresh recrawl job profile threshold date like other default profiles
7 years ago
luccioman
b712a0671e
Added a specific default crawl profile for the recrawl job.
...
- with only a light constraint on the load date of known indexed
documents, as it can already be controlled by the selection query, and
the goal of the job is to recrawl the selected documents now
- using the iffresh cache strategy
7 years ago
luccioman
adf3fa493d
Added comments about crawl profiles recrawl cycles
7 years ago
luccioman
3638e16c2e
More comprehensive log on rejected recrawls caused by date constraint
7 years ago
luccioman
d47afe6fab
Use a constant for crawler reject reason prefix with specific processing
7 years ago
luccioman
4e03335625
Added more details to the recrawl job report
7 years ago
luccioman
d95d393a0d
Add a query link to local Solr to browse selected recrawl candidates
7 years ago
luccioman
59f7763af6
Display recrawl job report also when job is actively running
7 years ago
luccioman
6425963cee
Fixed internal tables exact value match iterator
7 years ago
luccioman
0c9e0b3566
Record recrawl calls to make them schedulable
7 years ago
luccioman
433e241e4f
Added a report info box about the last terminated recrawl job, if any
...
For easier monitoring of recrawls.
7 years ago
luccioman
b2af25b14f
Added a stop condition to the Recrawl busy thread
7 years ago
luccioman
421728d25a
Made it possible to customize the selection query before launching a recrawl
7 years ago
luccioman
fab6e54fec
Enforced controls (HTTP method, token) on ReIndex and ReCrawl operations
7 years ago
luccioman
36e9b1c5b3
Fixed occasional time-dependent failures of the SegmentTest test case
...
As highlighted by latest automated Travis builds.
7 years ago
luccioman
8a4ea1c11e
Added UI switch to control content domain constraint per search request
7 years ago
luccioman
36a45b3905
Added UI setting for strictness of content-type checking on media search
7 years ago
reger
cedb53be4e
upd to commons-io-2.6
7 years ago
reger
f8071ac8ae
Make TokenizedStringNavigator (used for keyword search facet) active
...
check case-insensitive.
As keywords are compared in lower case, make sure user input keyword:Key
or keyword:key is shown as active in the facet entry key.
7 years ago
reger
270b77074e
upd to httpclient-4.5.4 and httpmime-4.5.4
7 years ago
reger
6db7f5525b
upd to icu4j-60.2
7 years ago
luccioman
e6907fdab3
Added optional search parameter/setting to control content domain filter
...
This allows choosing, in the configuration or per search request,
whether or not to extend results beyond the strict content domain filter
(image, video, audio or application).
Related graphical controls are still to be added to the user interface.
7 years ago
luccioman
f52217c939
Enable full size images preview for users with extended search rights
7 years ago
luccioman
d42c1773c8
Added UI setting for optional encryption with https on p2p searches
7 years ago